Experimental Webcam Hand Tracking with OpenCV and MediaPipe (only for development purposes until I get this working)


I watched the following video on YouTube:
It shows machine-learning-based hand detection, written in Python.
He wrote a wrapper package for MediaPipe called cvzone and created a UDP stream for Unity.
The hand detection looks very promising.

I thought this could also work in VAM,
so I wrote a VAM plugin that consumes that UDP stream; see the code below.

A video file can also be used as input:
cap = cv2.VideoCapture(0)  # webcam
cap = cv2.VideoCapture("anyvideofile.mp4")  # video file
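To switch between the two without editing the script, the source can be taken from the command line. A minimal sketch, assuming the source is passed as the first argument (`parse_source` is a hypothetical helper, not part of OpenCV or cvzone): a purely numeric argument is treated as a camera index, anything else as a file path.

```python
import sys
from typing import Union


def parse_source(arg: str) -> Union[int, str]:
    """Return a camera index for numeric arguments, otherwise the argument as a file path."""
    arg = arg.strip()
    return int(arg) if arg.isdigit() else arg


if __name__ == "__main__":
    # Default to webcam 0 when no argument is given
    source = parse_source(sys.argv[1]) if len(sys.argv) > 1 else 0
    print("Video source:", source)
    # In the tracking script this value would be passed on: cap = cv2.VideoCapture(source)
```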

Here is the Python code:
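import cv2
from cvzone.HandTrackingModule import HandDetector
import socket

# Parameters
width, height = 1280, 720
#width, height = 1920, 1080

# Video or Webcam
cap = cv2.VideoCapture(0)
cap.set(3, width)   # 3 = CAP_PROP_FRAME_WIDTH
cap.set(4, height)  # 4 = CAP_PROP_FRAME_HEIGHT

# Hand Detector
detector = HandDetector(maxHands=1, detectionCon=0.8)

# Communication: UDP datagrams to the VAM plugin listening on port 5052
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Loopback address: the VAM plugin runs on the same PC
serverAddressPort = ("127.0.0.1", 5052)

while True:
    # Get the frame from the webcam (or video file)
    success, img = cap.read()
    if not success:
        break  # camera unavailable or end of the video file

    # Hands
    hands, img = detector.findHands(img)

    data = []
    # Landmark values - (x,y,z)*21
    if hands:
        # Get the first hand detected
        hand = hands[0]
        # Get the landmark list
        lmList = hand['lmList']
        for lm in lmList:
            # Flip y so that "up" is positive, as in Unity
            data.extend([lm[0], height - lm[1], lm[2]])
        sock.sendto(str.encode(str(data)), serverAddressPort)

    cv2.imshow("Image", img)
    # imshow only refreshes while waitKey runs the GUI event loop; press q to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()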
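Before wiring up VAM, it may help to check the stream on its own. The script sends one datagram per frame containing the text of a Python list, `[x0, y0, z0, x1, y1, z1, ...]`, i.e. 63 numbers for the 21 landmarks of one hand. The sketch below (function names are hypothetical) decodes that format and can listen on port 5052 in place of the plugin; run it only while the VAM plugin is disconnected, so the two don't compete for the packets.

```python
import socket
from typing import List, Tuple

Landmark = Tuple[float, float, float]


def parse_packet(payload: bytes) -> List[Landmark]:
    """Decode b"[x0, y0, z0, x1, ...]" into a list of (x, y, z) tuples."""
    text = payload.decode("utf-8").strip().lstrip("[").rstrip("]")
    values = [float(v) for v in text.split(",") if v.strip()]
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of 3 values, got %d" % len(values))
    return [(values[i], values[i + 1], values[i + 2]) for i in range(0, len(values), 3)]


def listen(host: str = "127.0.0.1", port: int = 5052) -> None:
    """Print the landmark count and wrist position of every packet (Ctrl+C to stop)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    try:
        while True:
            payload, _ = sock.recvfrom(65535)
            landmarks = parse_packet(payload)
            if landmarks:
                print("landmarks:", len(landmarks), "wrist:", landmarks[0])
    finally:
        sock.close()
```

With the tracking script running, call `listen()` from a Python prompt and stop it with Ctrl+C.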

You need the following packages (exact versions):
  • Python 3.7.1
  • cvzone 1.5.6
  • mediapipe
And here is HandTracking.cs for VAM:
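using System;
using System.Globalization;
using System.Net;
using System.Net.Sockets;
using UnityEngine;

namespace VAMDev
{
    public class HandTracking : MVRScript
    {
        FreeControllerV3 controller;
        protected UIDynamicButton connectToServer;
        protected UIDynamicButton disconnectFromServer;

        UdpClient clientData;
        int portData = 5052;
        public int receiveBufferSize = 120000;

        public bool showDebug = false;
        IPEndPoint ipEndPointData;
        private AsyncCallback AC;
        // Written by the socket worker thread, read by Update() on the main thread
        byte[] receivedBytes;
        private readonly object packetLock = new object();

        public override void Init()
        {
            try
            {
                // Store the buttons in the fields and hook up their click handlers
                connectToServer = CreateButton("Connect", false);
                connectToServer.button.onClick.AddListener(ConnectToServerCallback);
                disconnectFromServer = CreateButton("Disconnect", true);
                disconnectFromServer.button.onClick.AddListener(DisconnectFromServerCallback);
            }
            catch (Exception e)
            {
                SuperController.LogError("Exception caught in Init(): " + e);
            }
        }

        protected void ConnectToServerCallback()
        {
            try
            {
                InitializeUDPListener();
                SuperController.LogMessage("Connected to server!");
            }
            catch (Exception e)
            {
                SuperController.LogError("Exception caught: " + e);
            }
        }

        protected void DisconnectFromServerCallback()
        {
            CloseListener();
            SuperController.LogMessage("Disconnected from server.");
        }

        public void InitializeUDPListener()
        {
            CloseListener(); // don't leak a socket when "Connect" is clicked twice
            ipEndPointData = new IPEndPoint(IPAddress.Any, portData);
            clientData = new UdpClient();
            clientData.Client.ReceiveBufferSize = receiveBufferSize;
            clientData.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, optionValue: true);
            clientData.ExclusiveAddressUse = false;
            clientData.EnableBroadcast = true;
            // The socket must be bound to the port before it can receive
            clientData.Client.Bind(ipEndPointData);
            clientData.DontFragment = true;
            if (showDebug) SuperController.LogMessage("BufSize: " + clientData.Client.ReceiveBufferSize);
            AC = new AsyncCallback(ReceivedUDPPacket);
            // Pass the client as state so the callback always ends the receive on the right socket
            clientData.BeginReceive(AC, clientData);
            SuperController.LogMessage("UDP - Start Receiving..");
        }

        void ReceivedUDPPacket(IAsyncResult result)
        {
            // .NET runs this callback on a worker thread, so it only stores the bytes;
            // Unity/VAM objects are only touched from Update() on the main thread
            UdpClient client = (UdpClient)result.AsyncState;
            try
            {
                IPEndPoint sender = new IPEndPoint(IPAddress.Any, 0);
                byte[] bytes = client.EndReceive(result, ref sender);
                lock (packetLock) { receivedBytes = bytes; }
                client.BeginReceive(AC, client);
            }
            catch (ObjectDisposedException) { /* socket closed by Disconnect/OnDestroy */ }
            catch (SocketException) { /* receive failed: stop listening */ }
        }

        void Update()
        {
            byte[] bytes;
            lock (packetLock) { bytes = receivedBytes; receivedBytes = null; }
            if (bytes != null) ParsePacket(bytes);
        }

        void ParsePacket(byte[] bytes)
        {
            // Payload is the text of a Python list: "[x0, y0, z0, x1, ...]" (21 landmarks * 3)
            string data = System.Text.Encoding.UTF8.GetString(bytes).Trim('[', ']');
            string[] points = data.Split(',');
            if (points.Length < 63) return;

            //RIGHT HAND WRIST (landmark 0)
            Vector3 pointA = ParseLandmark(points, 0);
            // MIDDLE_FINGER_TIP (landmark 12, values 36..38)
            Vector3 pointB = ParseLandmark(points, 12);

            Vector3 dir = pointA - pointB;
            Quaternion rotation = Quaternion.LookRotation(Vector3.Cross(dir, Vector3.up).normalized);

            if (controller == null)
                controller = containingAtom.GetStorableByID("rHandControl") as FreeControllerV3;
            if (controller == null) return;
            SetControllerState(controller, FreeControllerV3.PositionState.On, FreeControllerV3.RotationState.On);

            controller.SetLocalPosition(pointA.normalized);
        }

        static Vector3 ParseLandmark(string[] points, int landmark)
        {
            // Invariant culture: "." is the decimal separator regardless of the Windows locale
            int i = landmark * 3;
            return new Vector3(
                float.Parse(points[i], CultureInfo.InvariantCulture),
                float.Parse(points[i + 1], CultureInfo.InvariantCulture),
                float.Parse(points[i + 2], CultureInfo.InvariantCulture));
        }

        void OnDestroy() { CloseListener(); }

        void CloseListener()
        {
            if (clientData != null)
            {
                clientData.Close();
                clientData = null;
            }
        }

        protected void SetControllerState(FreeControllerV3 controller, FreeControllerV3.PositionState positionState, FreeControllerV3.RotationState rotationState)
        {
            controller.currentPositionState = positionState;
            controller.currentRotationState = rotationState;
        }
    }
}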
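In `ParsePacket`, landmark *i* occupies values `3*i` to `3*i + 2` of the packet, so `points[36..38]` is landmark 12, MIDDLE_FINGER_TIP in MediaPipe's 21-point hand model (landmark 0 is WRIST). A small lookup sketch using MediaPipe's landmark names (the `value_range` helper itself is hypothetical) avoids hard-coding those offsets when experimenting with other joints:

```python
# MediaPipe hand landmark names, in index order (21 landmarks)
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def value_range(name: str) -> range:
    """Return the positions of a landmark's x, y, z values in the flat packet."""
    i = HAND_LANDMARKS.index(name)
    return range(3 * i, 3 * i + 3)


for name in ("WRIST", "MIDDLE_FINGER_TIP"):
    print(name, "->", list(value_range(name)))
# WRIST -> [0, 1, 2]
# MIDDLE_FINGER_TIP -> [36, 37, 38]
```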

But there are several issues:
  • the x and y offsets: the correct values are not easy to find
  • I can't get the fingers working correctly; they bend and stretch all over the screen
  • the VAM UI breaks after several runs
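For the jittery fingers, one common mitigation (not something cvzone or the plugin above does) is to smooth each value over time before it is applied, for example with an exponential moving average. A minimal sketch; the class name and the default `alpha` are assumptions to tune by eye:

```python
from typing import List, Optional, Sequence


class LandmarkSmoother:
    """Exponential moving average over a flat list of landmark values.

    alpha close to 1.0 follows the input quickly; close to 0.0 smooths more but lags.
    """

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state: Optional[List[float]] = None

    def update(self, values: Sequence[float]) -> List[float]:
        if self.state is None or len(self.state) != len(values):
            # First frame (or a changed layout): start from the raw values
            self.state = [float(v) for v in values]
        else:
            a = self.alpha
            self.state = [a * v + (1.0 - a) * s for v, s in zip(values, self.state)]
        return list(self.state)


smoother = LandmarkSmoother(alpha=0.5)
print(smoother.update([0.0, 10.0]))   # [0.0, 10.0]
print(smoother.update([10.0, 10.0]))  # [5.0, 10.0]
```

On the Python side it could be applied to `data` just before `sock.sendto`; a lower `alpha` trades responsiveness for stability.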
Found it: the rotation has to be set, too.
Now the hands are working, but I still have to fix the orientation.
I also checked AcidBubbles' Leap Motion code, but didn't find the magic code.
The code above has been reduced and updated.
I need help: I can't get the rotation and the position right.
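The plugin derives the hand orientation with Unity's `Quaternion.LookRotation`. To check that math outside VAM, the sketch below reproduces the textbook construction behind a look rotation (an assumption about what Unity computes, not Unity source): normalize `forward`, take `right = up × forward`, recompute `up = forward × right`, and convert the resulting rotation matrix to a quaternion in Unity's `(x, y, z, w)` order.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (x, y, z, w), Unity's component order


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n < 1e-9:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def look_rotation(forward: Vec3, up: Vec3 = (0.0, 1.0, 0.0)) -> Quat:
    """Quaternion whose local Z axis points along forward, local Y as close to up as possible."""
    f = normalize(forward)
    r = normalize(cross(up, f))  # raises if forward is parallel to up
    u = cross(f, r)
    # Rotation matrix with columns r, u, f
    m00, m01, m02 = r[0], u[0], f[0]
    m10, m11, m12 = r[1], u[1], f[1]
    m20, m21, m22 = r[2], u[2], f[2]
    trace = m00 + m11 + m22
    if trace > 0.0:
        s = math.sqrt(trace + 1.0) * 2.0
        return ((m21 - m12) / s, (m02 - m20) / s, (m10 - m01) / s, 0.25 * s)
    if m00 > m11 and m00 > m22:
        s = math.sqrt(1.0 + m00 - m11 - m22) * 2.0
        return (0.25 * s, (m01 + m10) / s, (m02 + m20) / s, (m21 - m12) / s)
    if m11 > m22:
        s = math.sqrt(1.0 + m11 - m00 - m22) * 2.0
        return ((m01 + m10) / s, 0.25 * s, (m12 + m21) / s, (m02 - m20) / s)
    s = math.sqrt(1.0 + m22 - m00 - m11) * 2.0
    return ((m02 + m20) / s, (m12 + m21) / s, 0.25 * s, (m10 - m01) / s)
```

In the plugin, `forward` is `Vector3.Cross(dir, Vector3.up)`, where `dir` points from the middle fingertip to the wrist. As a sanity check, `look_rotation((1, 0, 0))` gives roughly `(0, 0.7071, 0, 0.7071)`, a 90° turn about Y.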

I'm trying to reach these values:
Local Position:
x= 0.1851619
y= 1.614938
z= 0.2535316

x= 47.01933
y= 263.1064
z= 82.36137

Position from WRIST:

Position from MIDDLE_FINGER_TIP:
x= 4.34
y= 2.8
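For the offsets, instead of normalizing the pixel position, one option is a linear mapping per axis, `scene = scale * pixel + offset`, calibrated from two reference poses: hold the hand at two known spots, note the pixel coordinate and the scene coordinate you want for each, and solve for `scale` and `offset`. A sketch; the function names and the sample numbers are made up for illustration, not measured values:

```python
from typing import Tuple


def fit_axis(pixel_a: float, scene_a: float, pixel_b: float, scene_b: float) -> Tuple[float, float]:
    """Solve scene = scale * pixel + offset from two (pixel, scene) pairs."""
    if pixel_a == pixel_b:
        raise ValueError("the two reference pixels must differ")
    scale = (scene_b - scene_a) / (pixel_b - pixel_a)
    offset = scene_a - scale * pixel_a
    return scale, offset


def to_scene(pixel: float, scale: float, offset: float) -> float:
    return scale * pixel + offset


# Hypothetical calibration: wrist at x = 256 px should map to -0.25, at x = 768 px to 0.25
scale_x, offset_x = fit_axis(256.0, -0.25, 768.0, 0.25)
print(to_scene(512.0, scale_x, offset_x))  # 0.0
```

The same fit can be done separately for y (after the `height - y` flip) and, with more care, for the noisier z value.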

The problem I've always seen with this tech is that hand tracking needs depth to feel right, so you need VR, AR or a 3D TV to make it usable. And right now the tech isn't far enough ahead of the hand tracking built into VR headsets to bother with, unless you're running a game on the headset itself or have a multi-camera solution. Honestly, I would kill to have this with a good 3D TV or AR visor in an animation suite. People use VR for mocap, but I'm thinking of characters more as posable action figures: pick up a character and directly mold the morphs with my hand instead of using a slider system that hasn't innovated in 25 years.

First post here, Hi!
I have zero experience with C# and Unity, but I still managed to run the scripts; the problem is that I keep getting the GUI problems.
To be more precise: I run the Python script, start VAM, create a new scene, add a person, add the script/plugin to it and click Connect. Everything works normally until I rotate or pan the scene with the mouse.
If I comment out the line that sets the rotation, then once the error happens the hand stops following the control, while the control keeps moving normally with the script; if I then move the control by other means (e.g. grabbing it), the hand returns to the control.
If I leave the rotation in, the error does not seem to happen, but I think it actually does and the rotation fixes it automatically.
Now, the real problem is that the GUI goes crazy too: I close a menu, open another, and that one is rotated and in the wrong position, for the rest of the session. All sliders also behave weirdly, and there is no way to fix it, unlike with the hand control.
Also, the "error" does not throw exceptions (you know, the red windows).
I think the problem is the "SetLocalPosition" line, because if I comment it out I get no errors.
It's a real shame, because I managed to run a modified version that uses the legacy MediaPipe pose detector directly, without cvzone, and while it's not perfect it is usable, with depth too.
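One thing worth ruling out for the GUI breakage (an assumption, not confirmed in this thread): .NET runs `BeginReceive` callbacks on a worker thread, while most of Unity's scene API is only meant to be called from the main thread, so calls like `SetLocalPosition` belong in the per-frame `Update()` rather than in the receive callback. A common pattern is a single-slot mailbox: the network thread only stores the newest packet and the frame loop takes it. The handoff is sketched below in Python for brevity; the class name is hypothetical, and in C# the same structure is a field guarded by `lock`.

```python
import threading
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class LatestValue(Generic[T]):
    """Single-slot mailbox: writers overwrite, the reader takes the newest value once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Optional[T] = None

    def put(self, value: T) -> None:
        # Called from the network thread; older unread packets are simply dropped
        with self._lock:
            self._value = value

    def take(self) -> Optional[T]:
        # Called once per frame from the main loop; None means nothing new arrived
        with self._lock:
            value, self._value = self._value, None
            return value


mailbox: "LatestValue[bytes]" = LatestValue()


def producer() -> None:
    # Simulates the network thread receiving three packets in a row
    for i in range(3):
        mailbox.put(b"packet %d" % i)


worker = threading.Thread(target=producer)
worker.start()
worker.join()
print(mailbox.take())  # b'packet 2'
print(mailbox.take())  # None
```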
