Experimental Webcam Hand Tracking with OpenCV and MediaPipe (only for development purposes until I get this working)

MaryJane

Hi,
I watched the following video on YouTube:
It's a machine-learning-based hand detector, written in Python.
He wrote a wrapper package around MediaPipe called cvzone and created a UDP stream to Unity.
The hand detection looks very promising.

I thought this could also work in VAM.
So I wrote a VAM plugin that consumes that UDP stream; see the code below.

A video file can also be used as input:
cap = cv2.VideoCapture(0)                   # webcam
cap = cv2.VideoCapture("anyvideofile.mp4")  # video file

Here is the Python code:
Python:
import cv2
from cvzone.HandTrackingModule import HandDetector
import socket

# Parameters
width, height = 1280, 720
#width, height = 1920, 1080

# Video or Webcam
cap = cv2.VideoCapture(0)
cap.set(3, width)   # CAP_PROP_FRAME_WIDTH
cap.set(4, height)  # CAP_PROP_FRAME_HEIGHT

# Hand Detector
detector = HandDetector(maxHands=1, detectionCon=0.8)

# Communication
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
serverAddressPort = ("127.0.0.1", 5052)

while True:
    # Get the next frame from the webcam / video file
    success, img = cap.read()
    if not success:
        break

    # Detect hands and draw the landmarks onto the frame
    hands, img = detector.findHands(img)

    data = []
    # Landmark values - (x,y,z)*21
    if hands:
        # Get the first hand detected
        hand = hands[0]
        # Get the landmark list
        lmList = hand['lmList']
        #print(lmList)
        for lm in lmList:
            data.extend([lm[0], height - lm[1], lm[2]])
        #print(data)
        sock.sendto(str.encode(str(data)), serverAddressPort)

    cv2.imshow("Image", img)
    cv2.waitKey(1)

You need the following packages (exact versions):
  • Python 3.7.1
  • cvzone 1.5.6
  • mediapipe 0.9.0.1
And here is HandTracking.cs for VAM.
C#:
using System;
using System.Net;
using System.Net.Sockets;
using UnityEngine;

namespace VAMDev
{
    public class HandTracking : MVRScript
    {
        FreeControllerV3 controller;
        protected UIDynamicButton connectToServer;
        protected UIDynamicButton disconnectFromServer;

        UdpClient clientData;
        int portData = 5052;
        public int receiveBufferSize = 120000;

        public bool showDebug = false;
        IPEndPoint ipEndPointData;
        private object obj = null;
        private AsyncCallback AC;
        byte[] receivedBytes;

        public override void Init()
        {
            try
            {
                connectToServer = CreateButton("Connect", false);
                connectToServer.button.onClick.AddListener(ConnectToServerCallback);
                disconnectFromServer = CreateButton("Disconnect", true);
                disconnectFromServer.button.onClick.AddListener(DisconnectFromServerCallback);
            }
            catch (Exception e)
            {
                SuperController.LogError("Exception caught in Init(): " + e);
            }
        }

        protected void ConnectToServerCallback()
        {
            try
            {
                InitializeUDPListener();
                SuperController.LogMessage("Connected to server!");

            }
            catch (Exception e)
            {
                SuperController.LogError("Exception caught: " + e);
            }
        }

        protected void DisconnectFromServerCallback()
        {
            OnDestroy();
            SuperController.LogMessage("Disconnected from server.");
        }

        public void InitializeUDPListener()
        {
            ipEndPointData = new IPEndPoint(IPAddress.Any, portData);
            clientData = new UdpClient();
            clientData.Client.ReceiveBufferSize = receiveBufferSize;
            clientData.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, optionValue: true);
            clientData.ExclusiveAddressUse = false;
            clientData.EnableBroadcast = true;
            clientData.Client.Bind(ipEndPointData);
            clientData.DontFragment = true;
            if (showDebug) SuperController.LogMessage("BufSize: " + clientData.Client.ReceiveBufferSize);
            AC = new AsyncCallback(ReceivedUDPPacket);
            clientData.BeginReceive(AC, obj);
            SuperController.LogMessage("UDP - Start Receiving..");
        }

        void ReceivedUDPPacket(IAsyncResult result)
        {
            // Note: this callback runs on a thread-pool thread, not on Unity's main thread
            receivedBytes = clientData.EndReceive(result, ref ipEndPointData);

            ParsePacket();

            clientData.BeginReceive(AC, obj);
        }

        void ParsePacket()
        {
            string data = System.Text.Encoding.UTF8.GetString(receivedBytes);
            data = data.Remove(0, 1);
            data = data.Remove(data.Length - 1, 1);
            //SuperController.LogMessage(data);
            string[] points = data.Split(',');
            //SuperController.LogMessage(points[0]);

            //RIGHT HAND WRIST
            Vector3 pointA = new Vector3(float.Parse(points[0]), float.Parse(points[1]), float.Parse(points[2]));
            //RIGHT HAND MIDDLE_FINGER_TIP
            Vector3 pointB = new Vector3(float.Parse(points[36]), float.Parse(points[37]), float.Parse(points[38]));

            Vector3 dir = pointA - pointB;
            Quaternion rotation = Quaternion.LookRotation(Vector3.Cross((dir), Vector3.up).normalized);

            controller = containingAtom.GetStorableByID("rHandControl") as FreeControllerV3;
            SetControllerState(controller, FreeControllerV3.PositionState.On, FreeControllerV3.RotationState.On);

            controller.SetLocalPosition(new Vector3(pointA.x, pointA.y, pointA.z).normalized);
            controller.SetLocalEulerAngles(rotation.eulerAngles);
        }
        void OnDestroy()
        {
            if (clientData != null)
            {
                clientData.Close();
            }
        }

        protected void SetControllerState(FreeControllerV3 controller, FreeControllerV3.PositionState positionState, FreeControllerV3.RotationState rotationState)
        {
            controller.currentPositionState = positionState;
            controller.currentRotationState = rotationState;
        }
    }
}

But there are several issues:
  • x and y offsets: the correct values are not easy to find
  • can't get the fingers working correctly; they bend and stretch all over the screen
  • the VAM UI breaks after several runs
 
Found it: the rotation has to be set, too.
Now the hand is working, but I still have to fix the orientation.
I also checked the code from AcidBubbles' Leap Motion plugin, but didn't find the magic piece.
 
The code above is reduced and updated.
I need help; I can't get the rotation and the position right.

I'm trying to reach these values:
Local Position:
x = 0.1851619
y = 1.614938
z = 0.2535316

Rotation:
x = 47.01933
y = 263.1064
z = 82.36137

Position from WRIST:
x = 4.63
y = -0.44
z = 0

Position from MIDDLE_FINGER_TIP:
x = 4.34
y = 2.8
z = -0.2
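One way to attack this, just a suggestion and not something from the plugin above: treat the raw landmark values as arbitrary units and fit a per-axis linear mapping local = raw * scale + offset against a few controller positions read back from VaM, e.g. the WRIST / Local Position pair listed above. The scale/offset numbers and the helper class below are placeholders for illustration only.
C#:
// Hedged sketch: per-axis linear calibration from raw landmark coordinates into
// VaM local space, plus a rotation built from the wrist -> middle-finger-tip
// direction. All constants are placeholders and need real calibration data.
using UnityEngine;

public static class LandmarkCalibration
{
    // Placeholder calibration values - derive them by comparing a few known
    // hand poses against the local positions VaM reports for rHandControl.
    static readonly Vector3 scale  = new Vector3(0.001f, 0.001f, 0.001f);
    static readonly Vector3 offset = new Vector3(0.18f, 1.60f, 0.25f);

    // Map a raw landmark point (e.g. the WRIST) into VaM local coordinates.
    public static Vector3 ToLocal(Vector3 raw)
    {
        return new Vector3(
            raw.x * scale.x + offset.x,
            raw.y * scale.y + offset.y,
            raw.z * scale.z + offset.z);
    }

    // Build a hand rotation: wrist -> middle finger tip is used as "forward",
    // Vector3.up as an approximate up vector.
    public static Quaternion ToRotation(Vector3 wrist, Vector3 middleTip)
    {
        Vector3 forward = (middleTip - wrist).normalized;
        return Quaternion.LookRotation(forward, Vector3.up);
    }
}
In ParsePacket() the position line would then become controller.SetLocalPosition(LandmarkCalibration.ToLocal(pointA)); note that the current .normalized call collapses every position onto a unit sphere, which is probably one reason the offsets are so hard to pin down.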
 
The problem I've always seen with this tech is that hand tracking needs depth to feel right, so you need VR/AR/3D TV to make it usable. And right now the tech is not far enough ahead of VR's built-in hand tracking to bother using it, unless you're running a game on the headset itself or have a multi-cam solution. Honestly, I would kill to have this with a good 3D TV or AR visor in an animation suite. People use VR to mocap, but I'm thinking of them more as posable action figures. Or being able to pick up a character and directly mold the morphs with my hand instead of a slider system that hasn't innovated in 25 years.
 
First post here, hi!
I have zero experience in C# and Unity, but I still managed to run the scripts; the problem is I keep getting the GUI problems.
To be more precise: I run the Python script, start VAM, create a new scene, add a person, add the script/plugin to it, and click Connect. Everything goes normally until I rotate or pan in the scene with the mouse.
If I comment out the line that sets the rotation, then when I get the error the hand stops following the control, while the control keeps moving normally with the script; if I then move the control by other means (like grabbing it), the hand returns to the control.
If I leave the rotation in, the error does not seem to happen, but I think it actually does and the rotation fixes it automatically.
Now, the real problem is that the GUI goes crazy too: I close a menu, open another, and it is rotated and in the wrong position, for the rest of the session. All sliders behave weirdly, too. And there is no way to fix it, unlike with the hand control.
Also, the "error" does not throw exceptions (you know, the red windows).
I think the problem is the SetLocalPosition line, because if I comment it out I get no errors.
It's a real shame, because I managed to run a modified version using the legacy MediaPipe pose detector directly, without cvzone, and while not perfect it is usable, with depth too.
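A guess at what is going on here (nothing in the thread confirms it): ReceivedUDPPacket runs on a thread-pool thread, so ParsePacket ends up calling SetLocalPosition and the other VaM/Unity APIs off the main thread, which can quietly corrupt scene and UI state without throwing the usual red exception windows. Below is a minimal sketch of deferring the work to Update(), assuming that is the cause; the packetLock/latestPacket fields are additions of the sketch, everything else follows the plugin above.
C#:
        private readonly object packetLock = new object();
        private byte[] latestPacket;

        void ReceivedUDPPacket(IAsyncResult result)
        {
            // Background thread: only copy the bytes, never touch Unity objects here.
            byte[] bytes = clientData.EndReceive(result, ref ipEndPointData);
            lock (packetLock)
            {
                latestPacket = bytes;
            }
            clientData.BeginReceive(AC, obj);
        }

        void Update()
        {
            byte[] packet;
            lock (packetLock)
            {
                packet = latestPacket;
                latestPacket = null;
            }
            if (packet == null) return;

            // Main thread: now it is safe to parse and move the controller.
            receivedBytes = packet;
            ParsePacket();
        }
With this change the controller (and therefore the scene/UI state) is only ever touched from the main thread, which should at least rule the threading issue in or out.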
 
I am not giving up yet!
I started a new GitHub repository for this: https://github.com/MaryJaneVaM/VamBridgePrototype
What is done:
  • window camera plugin
    • read window camera position and fov
    • set window camera
    • make screenshots
  • person plugin
    • read the controllers
    • set the controllers
    • read the morphs
    • set the morphs
  • bridge server that reads and sends the VaM data over TCP and relays it over WebSocket (a rough client sketch follows the lists below)
  • pose server that detects the pose in an image and sends the related data back over REST
  • camera debug client for testing
  • person debug client for testing
  • pose debug client for testing

Next steps:
  • creating a mapping debug client that sends various test data and collects all controller data
  • creating a calibration debug client that creates a calibration table
  • first test with a real-world image
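To make the bridge part concrete, here is a rough sketch of what one of the debug clients could look like: a standalone WebSocket client that connects to the bridge server and asks for the current controller data. The URL, port, and JSON command are made-up placeholders; the real endpoints and wire format are whatever VamBridgePrototype defines.
C#:
// Hypothetical debug client - endpoint and message shape are placeholders only.
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class BridgeDebugClient
{
    static async Task Main()
    {
        using (var ws = new ClientWebSocket())
        {
            // Placeholder URL - the real bridge endpoint lives in the repo.
            await ws.ConnectAsync(new Uri("ws://127.0.0.1:6000/"), CancellationToken.None);

            // Hypothetical request: ask for the current controller data.
            byte[] request = Encoding.UTF8.GetBytes("{\"cmd\":\"getControllers\"}");
            await ws.SendAsync(new ArraySegment<byte>(request),
                               WebSocketMessageType.Text, true, CancellationToken.None);

            // Print whatever comes back, for inspection.
            var buffer = new byte[65536];
            WebSocketReceiveResult result =
                await ws.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
            Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, result.Count));
        }
    }
}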
PS: All of this was done in the last week. Now I need to rest.

Have nice holidays.

Best regards
MaryJane
 