Advanced Gesture Mapping

Gesture mapping is the process of associating “gesture actions”, such as 3D hand pose and 3D hand motion, with specific output or control events. A variety of gesture-mapping methods can be used to manage sets of gestures with similar properties by grouping actions, either as grouped motions or as grouped configurations such as poses.

User Pose Variation & Ambiguity

Advanced gesture mapping methods can be used to manage a variety of gesture conflicts and ambiguities, such as pose ambiguity from tracking errors (caused by motion blur or hand self-occlusion) and pose confusion arising from pose similarity or user error. When working with rich gestures (where many degrees of freedom are available) there is a greater chance of variability in how gestures are performed from one user to another. This is especially true of 3D motion gestures, which place fewer physical constraints on motion or pose than surface touch gestures (which limit motion to the plane of the 2D surface) or hand-held gamepad controllers (whose mechanical buttons limit motion). As a result, there is greater variation in user-performed actions (poses and motion) within bare-hand motion gestures.

To manage this variability, similar poses can be mapped to a single, common pose state that can be treated as a unified gesture action. For example, a simple “pinch” 3D hand motion gesture is commonly performed in several different ways. A user operating an interface for the first time may perform an index-thumb open-hand pinch, an index-middle-thumb pinch or even a full-hand grab; when these poses are mapped to a common state, all three yield the same result.
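The grouping described above can be sketched as a simple lookup table. This is a minimal illustration only: the pose labels, the `POSE_GROUPS` table and the `map_pose` helper are hypothetical names chosen for this example, not part of any particular tracking API.

```python
# Hypothetical pose labels; a real tracker would define its own vocabulary.
POSE_GROUPS = {
    "pinch": {"index_thumb_pinch", "index_middle_thumb_pinch", "full_hand_grab"},
    "point": {"index_point", "trigger_point"},
}

# Invert the groups into a flat lookup: raw pose label -> unified gesture action.
POSE_TO_ACTION = {
    pose: action
    for action, poses in POSE_GROUPS.items()
    for pose in poses
}


def map_pose(raw_pose):
    """Return the unified gesture action for a raw pose label, or None if unmapped."""
    return POSE_TO_ACTION.get(raw_pose)
```

With this table, `map_pose("full_hand_grab")` and `map_pose("index_thumb_pinch")` both resolve to the same `"pinch"` action, so the application only has to handle one event.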

Most hand tracking methods allow for simple, automated customizations which modify the base skeleton to better match the user. This can improve pose and gesture recognition confidence values so that similar poses become easier to separate. With these improvements, as the user operates the system over a longer period, clear patterns of preferred use emerge that indicate which pose (or set of poses) is most effective for that user. As a result, other (similar) poses can be given a lower weighted probability, and therefore a significantly lower chance, of triggering events. This frees these alternate poses (and gestures) as a refined subset that can be reserved for other discrete tasks or used to provide reliable micro-gesture transition paths.
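One possible reading of this preference-driven refinement is a running usage count per pose within a group. The sketch below is an assumption about how such weighting might work, not a description of an actual system; the `PosePreference` class, its thresholds and the pose labels are all hypothetical.

```python
from collections import Counter


class PosePreference:
    """Track how often each pose in a group is used and flag rarely used ones."""

    def __init__(self, group, release_below=0.05, min_samples=50):
        self.group = set(group)
        self.counts = Counter()
        self.release_below = release_below  # assumed weight threshold
        self.min_samples = min_samples      # wait for enough evidence first

    def observe(self, pose):
        """Record one recognized use of a pose that belongs to this group."""
        if pose in self.group:
            self.counts[pose] += 1

    def weights(self):
        """Return each pose's share of observed use (uniform before any data)."""
        total = sum(self.counts.values())
        if total == 0:
            return {p: 1 / len(self.group) for p in self.group}
        return {p: self.counts[p] / total for p in self.group}

    def released(self):
        """Return poses used so rarely that they could be reassigned."""
        if sum(self.counts.values()) < self.min_samples:
            return set()
        w = self.weights()
        return {p for p in self.group if w[p] < self.release_below}
```

Once a user has shown a clear preference, `released()` lists the alternate poses that could be reserved for other tasks; until `min_samples` observations have been collected, every pose in the group keeps triggering the shared action.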

Additionally, there are other poses that can be strategically grouped to create elegant degradation pathways to ensure user success. For example: pointing poses and trigger poses can be grouped to create a set of effective pointer gestures which can be used to control an on-screen cursor or select 3D objects from a distance.

Various types of tap motions along with tip-tap motions can be mapped to a single tap-select gesture which can then be used in similar fashion to click-event selections.

Multiple, richly-defined 3D hand poses, held in place for a short period of time, can be used to generate hold gestures. These can be mapped to “right-click” system events or other events that are used to access sub-controls such as menus or mode-switching methods such as enabling tools.
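A hold gesture of this kind can be sketched as a dwell timer over the stream of recognized poses. The `HoldDetector` class, the 0.8 second default and the pose labels are assumptions made for illustration; a real system would tune the duration and probably tolerate brief tracking dropouts.

```python
class HoldDetector:
    """Fire once when the same pose has been held for `hold_seconds`."""

    def __init__(self, hold_seconds=0.8):
        self.hold_seconds = hold_seconds
        self._pose = None     # pose currently being timed
        self._since = None    # timestamp at which that pose first appeared
        self._fired = False   # whether the hold event was already emitted

    def update(self, pose, timestamp):
        """Feed one tracked frame; return the pose name once when the hold completes."""
        if pose != self._pose:
            # Pose changed: restart the timer for the new pose.
            self._pose, self._since, self._fired = pose, timestamp, False
            return None
        held_long_enough = timestamp - self._since >= self.hold_seconds
        if pose is not None and not self._fired and held_long_enough:
            self._fired = True
            return pose
        return None
```

The returned pose name can then be looked up in whatever table maps hold gestures to events such as a “right-click” or a menu.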

3D motion gestures are particularly well suited for interactions with (virtual) 3D objects. Fast gathering methods that do not require explicit selection can be used to move objects or groups of objects with rapid actions that require less coordination than proper manipulations. A balance of pose and motion properties can be used to build a group of actions that map to a single move gesture event which can be used interchangeably to bump or gather objects to one side.

In summary, this “gesture cross-mapping” approach can be described as providing “strong, redundant, default pose and motion mapping” that ensures elegant gesture-degradation channels. When used as part of a supervised learning or training system, cross-mapping can, over time, provide an expanded, fully separable set of poses and gestures that can be used to confidently define distinct gesture subsets as clear patterns of natural user preference emerge.

Bi-manual Pose Cross-Mapping

Tracking the pose and motion of both hands can create a whole new set of 3D motion gestures. These gestures allow for each hand to be considered independently or as a coordinated group to create rich bi-manual input methods.

Classic multi-channel input schemes, such as the mouse and keyboard, leverage bi-manual controls to provide high-bandwidth input. When users operate a desktop mouse and keyboard, the left hand can work the keyboard while the right hand manipulates the mouse. In applications such as vector image editors, the right hand can control the cursor position (drawing tool) while the left hand controls tool selection. This means that users can leverage “pro” shortcut methods to hot-swap tools without moving the cursor off the canvas to the tool selection panel, which would otherwise take additional time and effort. Using the left hand to qualify the input from the right hand in this manner provides a method for real-time context-qualified controls.
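Carried over to hand tracking, this context-qualified pattern might look like the sketch below, where the left-hand pose selects the active tool and the right-hand position drives the cursor. The pose labels, tool names and the `qualified_input` function are hypothetical and chosen only for this example.

```python
# Hypothetical left-hand poses acting as tool "hot keys".
TOOL_BY_LEFT_POSE = {
    "fist": "eraser",
    "open_palm": "brush",
    "two_finger": "select",
}


def qualified_input(left_pose, right_position, default_tool="brush"):
    """Combine both hands into one input event.

    The left hand qualifies the event (which tool is active) while the
    right hand supplies the cursor position.
    """
    tool = TOOL_BY_LEFT_POSE.get(left_pose, default_tool)
    return {"tool": tool, "cursor": right_position}
```

Because the tool is resolved on every frame, the user can swap tools mid-stroke without moving the right hand away from the canvas.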

Other bi-manual controls use duplicated poses between hands to create an interaction-point pair. This “typed” pair can be treated as a unique cluster which can be analyzed for relative translation, rotation and separation to provide reliable direct manipulation tools.

In the example above, all pair types have been mapped to the same manipulation gesture. This way, users can create a pair of interaction points that can be used to manipulate a 3D object using a variety of well-defined poses, reducing the need for the user to remember an exact pose.
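The relative translation, rotation and separation of an interaction-point pair can be computed from two consecutive frames. The following is a minimal sketch under simplifying assumptions: each interaction point is a 3D tuple, rotation is reported only as the unsigned angle between the old and new hand-to-hand axes, and the `pair_delta` function and its helpers are hypothetical names.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _mid(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def pair_delta(prev_left, prev_right, cur_left, cur_right):
    """Return (translation, separation_change, rotation_angle) for a point pair.

    translation: movement of the midpoint between the two points.
    separation_change: change in distance between the points (scale cue).
    rotation_angle: unsigned angle in radians between old and new pair axes.
    """
    prev_axis = _sub(prev_right, prev_left)
    cur_axis = _sub(cur_right, cur_left)
    translation = _sub(_mid(cur_left, cur_right), _mid(prev_left, prev_right))
    separation_change = _norm(cur_axis) - _norm(prev_axis)
    rotation_angle = math.atan2(_norm(_cross(prev_axis, cur_axis)),
                                _dot(prev_axis, cur_axis))
    return translation, separation_change, rotation_angle
```

Since the calculation only uses the two point positions, it works the same way regardless of which pose type produced the pair, which is what allows every pair type to drive the same manipulation gesture.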

For more information about gesture control schemes and detailed application mapping for gaming, desktop and virtual reality (VR/AR/MR), see the Micro-Gesture index.

gestures/fusion/gesture_mapping_index.txt · Last modified: 2019/01/29 19:06 (external edit)