Experimental Webcam Hand Tracking with OpenCV and MediaPipe (only for development purposes until I get this working)

MaryJane

Hi,
I watched the following video on YouTube:
It's a machine-learning-based hand detector, written in Python.
He wrote a wrapper package around MediaPipe called cvzone and created a UDP stream to Unity.
The hand detection looks very promising.

I thought this could also work in VAM.
So I wrote a VAM plugin that consumes that UDP stream; see the code below.

A video file can also be used as input:
cap = cv2.VideoCapture(0)                   # webcam
cap = cv2.VideoCapture("anyvideofile.mp4")  # video file

Here is the Python code:
Python:
import cv2
from cvzone.HandTrackingModule import HandDetector
import socket

# Parameters
width, height = 1280, 720
#width, height = 1920, 1080

# Video or Webcam
cap = cv2.VideoCapture(0)
cap.set(3, width)   # CAP_PROP_FRAME_WIDTH
cap.set(4, height)  # CAP_PROP_FRAME_HEIGHT

# Hand Detector
detector = HandDetector(maxHands=1, detectionCon=0.8)

# Communication
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
serverAddressPort = ("127.0.0.1", 5052)

while True:
    # Get the next frame from the webcam / video file
    success, img = cap.read()
    if not success:
        break

    # Detect hands and draw the landmarks onto the frame
    hands, img = detector.findHands(img)

    data = []
    # Landmark values - (x,y,z)*21
    if hands:
        # Get the first hand detected
        hand = hands[0]
        # Get the landmark list
        lmList = hand['lmList']
        #print(lmList)
        for lm in lmList:
            data.extend([lm[0], height - lm[1], lm[2]])
        #print(data)
        sock.sendto(str.encode(str(data)), serverAddressPort)

    cv2.imshow("Image", img)
    cv2.waitKey(1)

You need the following packages (exact versions):
  • Python 3.7.1
  • cvzone 1.5.6
  • mediapipe 0.9.0.1
And here is HandTracking.cs for VAM.
C#:
using System;
using System.Net;
using System.Net.Sockets;
using UnityEngine;

namespace VAMDev
{
    public class HandTracking : MVRScript
    {
        FreeControllerV3 controller;
        protected UIDynamicButton connectToServer;
        protected UIDynamicButton disconnectFromServer;

        UdpClient clientData;
        int portData = 5052;
        public int receiveBufferSize = 120000;

        public bool showDebug = false;
        IPEndPoint ipEndPointData;
        private object obj = null;
        private AsyncCallback AC;
        byte[] receivedBytes;

        public override void Init()
        {
            try
            {
                connectToServer = CreateButton("Connect", false);
                connectToServer.button.onClick.AddListener(ConnectToServerCallback);
                disconnectFromServer = CreateButton("Disconnect", true);
                disconnectFromServer.button.onClick.AddListener(DisconnectFromServerCallback);
            }
            catch (Exception e)
            {
                SuperController.LogError("Exception caught in Init(): " + e);
            }
        }

        protected void ConnectToServerCallback()
        {
            try
            {
                InitializeUDPListener();
                SuperController.LogMessage("Connected to server!");

            }
            catch (Exception e)
            {
                SuperController.LogError("Exception caught: " + e);
            }
        }

        protected void DisconnectFromServerCallback()
        {
            OnDestroy();
            SuperController.LogMessage("Disconnected from server.");
        }

        public void InitializeUDPListener()
        {
            ipEndPointData = new IPEndPoint(IPAddress.Any, portData);
            clientData = new UdpClient();
            clientData.Client.ReceiveBufferSize = receiveBufferSize;
            clientData.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, optionValue: true);
            clientData.ExclusiveAddressUse = false;
            clientData.EnableBroadcast = true;
            clientData.Client.Bind(ipEndPointData);
            clientData.DontFragment = true;
            if (showDebug) SuperController.LogMessage("BufSize: " + clientData.Client.ReceiveBufferSize);
            AC = new AsyncCallback(ReceivedUDPPacket);
            clientData.BeginReceive(AC, obj);
            SuperController.LogMessage("UDP - Start Receiving..");
        }

        void ReceivedUDPPacket(IAsyncResult result)
        {
            // Note: this callback runs on a thread-pool thread, not on Unity's main thread
            receivedBytes = clientData.EndReceive(result, ref ipEndPointData);

            ParsePacket();

            clientData.BeginReceive(AC, obj);
        }

        void ParsePacket()
        {
            string data = System.Text.Encoding.UTF8.GetString(receivedBytes);
            data = data.Remove(0, 1);
            data = data.Remove(data.Length - 1, 1);
            //SuperController.LogMessage(data);
            string[] points = data.Split(',');
            //SuperController.LogMessage(points[0]);

            //RIGHT HAND WRIST
            Vector3 pointA = new Vector3(float.Parse(points[0]), float.Parse(points[1]), float.Parse(points[2]));
            //RIGHT HAND MIDDLE_FINGER_TIP
            Vector3 pointB = new Vector3(float.Parse(points[36]), float.Parse(points[37]), float.Parse(points[38]));

            Vector3 dir = pointA - pointB;
            Quaternion rotation = Quaternion.LookRotation(Vector3.Cross((dir), Vector3.up).normalized);

            controller = containingAtom.GetStorableByID("rHandControl") as FreeControllerV3;
            SetControllerState(controller, FreeControllerV3.PositionState.On, FreeControllerV3.RotationState.On);

            controller.SetLocalPosition(new Vector3(pointA.x, pointA.y, pointA.z).normalized);
            controller.SetLocalEulerAngles(rotation.eulerAngles);
        }
        void OnDestroy()
        {
            if (clientData != null)
            {
                clientData.Close();
            }
        }

        protected void SetControllerState(FreeControllerV3 controller, FreeControllerV3.PositionState positionState, FreeControllerV3.RotationState rotationState)
        {
            controller.currentPositionState = positionState;
            controller.currentRotationState = rotationState;
        }
    }
}

But there are several issues:
  • x and y offsets: the correct values are not easy to find
  • can't get the fingers working correctly; they bend and stretch all over the screen
  • the VAM UI breaks after several runs
 
Found it: the rotation has to be set, too.
Now the hand is working, but I still have to fix the orientation.
I also checked the code from AcidBubbles' Leap Motion plugin, but didn't find the magic piece.
 
The code above is reduced and updated.
I need help; I can't get the rotation and the position right.

I'm trying to reach these values:
Local Position:
x = 0.1851619
y = 1.614938
z = 0.2535316

Rotation:
x = 47.01933
y = 263.1064
z = 82.36137

Position from WRIST:
x = 4.63
y = -0.44
z = 0

Position from MIDDLE_FINGER_TIP:
x = 4.34
y = 2.8
z = -0.2
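One way to attack this, just a suggestion and not something from the plugin above: treat the raw landmark values as arbitrary units and fit a per-axis linear mapping local = raw * scale + offset against a few controller positions read back from VaM, e.g. the WRIST / Local Position pair listed above. The scale/offset numbers and the helper class below are placeholders for illustration only.
C#:
// Hedged sketch: per-axis linear calibration from raw landmark coordinates into
// VaM local space, plus a rotation built from the wrist -> middle-finger-tip
// direction. All constants are placeholders and need real calibration data.
using UnityEngine;

public static class LandmarkCalibration
{
    // Placeholder calibration values - derive them by comparing a few known
    // hand poses against the local positions VaM reports for rHandControl.
    static readonly Vector3 scale  = new Vector3(0.001f, 0.001f, 0.001f);
    static readonly Vector3 offset = new Vector3(0.18f, 1.60f, 0.25f);

    // Map a raw landmark point (e.g. the WRIST) into VaM local coordinates.
    public static Vector3 ToLocal(Vector3 raw)
    {
        return new Vector3(
            raw.x * scale.x + offset.x,
            raw.y * scale.y + offset.y,
            raw.z * scale.z + offset.z);
    }

    // Build a hand rotation: wrist -> middle finger tip is used as "forward",
    // Vector3.up as an approximate up vector.
    public static Quaternion ToRotation(Vector3 wrist, Vector3 middleTip)
    {
        Vector3 forward = (middleTip - wrist).normalized;
        return Quaternion.LookRotation(forward, Vector3.up);
    }
}
In ParsePacket() the position line would then become controller.SetLocalPosition(LandmarkCalibration.ToLocal(pointA)); note that the current .normalized call collapses every position onto a unit sphere, which is probably one reason the offsets are so hard to pin down.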
 
The problem I've always seen with this tech is that hand tracking needs depth to feel right, so you need VR/AR/3D TV to make it usable. And right now the tech is not far enough ahead of VR's built-in hand tracking to bother using it, unless you're running a game on the headset itself or have a multi-cam solution. Honestly, I would kill to have this with a good 3D TV or AR visor in an animation suite. People use VR to mocap, but I'm thinking of them more as posable action figures. Or being able to pick up a character and directly mold the morphs with my hand instead of a slider system that hasn't innovated in 25 years.
 
First post here, hi!
I have zero experience in C# and Unity, but I still managed to run the scripts; the problem is I keep getting the GUI problems.
To be more precise: I run the Python script, start VAM, create a new scene, add a person, add the script/plugin to it, and click Connect. Everything goes normally until I rotate or pan in the scene with the mouse.
If I comment out the line that sets the rotation, then when I get the error the hand stops following the control, while the control keeps moving normally with the script; if I then move the control by other means (like grabbing it), the hand returns to the control.
If I leave the rotation in, the error does not seem to happen, but I think it actually does and the rotation fixes it automatically.
Now, the real problem is that the GUI goes crazy too: I close a menu, open another, and it is rotated and in the wrong position, for the rest of the session. All sliders behave weirdly, too. And there is no way to fix it, unlike with the hand control.
Also, the "error" does not throw exceptions (you know, the red windows).
I think the problem is the SetLocalPosition line, because if I comment it out I get no errors.
It's a real shame, because I managed to run a modified version using the legacy MediaPipe pose detector directly, without cvzone, and while not perfect it is usable, with depth too.
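A guess at what is going on here (nothing in the thread confirms it): ReceivedUDPPacket runs on a thread-pool thread, so ParsePacket ends up calling SetLocalPosition and the other VaM/Unity APIs off the main thread, which can quietly corrupt scene and UI state without throwing the usual red exception windows. Below is a minimal sketch of deferring the work to Update(), assuming that is the cause; the packetLock/latestPacket fields are additions of the sketch, everything else follows the plugin above.
C#:
        private readonly object packetLock = new object();
        private byte[] latestPacket;

        void ReceivedUDPPacket(IAsyncResult result)
        {
            // Background thread: only copy the bytes, never touch Unity objects here.
            byte[] bytes = clientData.EndReceive(result, ref ipEndPointData);
            lock (packetLock)
            {
                latestPacket = bytes;
            }
            clientData.BeginReceive(AC, obj);
        }

        void Update()
        {
            byte[] packet;
            lock (packetLock)
            {
                packet = latestPacket;
                latestPacket = null;
            }
            if (packet == null) return;

            // Main thread: now it is safe to parse and move the controller.
            receivedBytes = packet;
            ParsePacket();
        }
With this change the controller (and therefore the scene/UI state) is only ever touched from the main thread, which should at least rule the threading issue in or out.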
 
I am not giving up yet!
I started a new GitHub repository for this: https://github.com/MaryJaneVaM/VamBridgePrototype
What is done:
  • window camera plugin
    • read window camera position and fov
    • set window camera
    • make screenshots
  • person plugin
    • read the controllers
    • set the controllers
    • read the morphs
    • set the morphs
  • bridge server that reads and sends the VaM data over TCP and relays it over WebSocket (a rough client sketch follows the lists below)
  • pose server that detects the pose in an image and sends the related data back over REST
  • camera debug client for testing
  • person debug client for testing
  • pose debug client for testing

Next steps:
  • creating a mapping debug client that sends various test data and collects all controller data
  • creating a calibration debug client that creates a calibration table
  • first test with a real-world image
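To make the bridge part concrete, here is a rough sketch of what one of the debug clients could look like: a standalone WebSocket client that connects to the bridge server and asks for the current controller data. The URL, port, and JSON command are made-up placeholders; the real endpoints and wire format are whatever VamBridgePrototype defines.
C#:
// Hypothetical debug client - endpoint and message shape are placeholders only.
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class BridgeDebugClient
{
    static async Task Main()
    {
        using (var ws = new ClientWebSocket())
        {
            // Placeholder URL - the real bridge endpoint lives in the repo.
            await ws.ConnectAsync(new Uri("ws://127.0.0.1:6000/"), CancellationToken.None);

            // Hypothetical request: ask for the current controller data.
            byte[] request = Encoding.UTF8.GetBytes("{\"cmd\":\"getControllers\"}");
            await ws.SendAsync(new ArraySegment<byte>(request),
                               WebSocketMessageType.Text, true, CancellationToken.None);

            // Print whatever comes back, for inspection.
            var buffer = new byte[65536];
            WebSocketReceiveResult result =
                await ws.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
            Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, result.Count));
        }
    }
}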
PS: All of this was done in the last week. Now I need to rest.

Have nice holidays.

Best regards
MaryJane
 