Every quarter we have a “Making Things Day”, where each employee is invited to spend a day working on something innovative, perhaps a personal area of interest, or simply something different from their everyday work.
Notwithstanding the member of R&D who misinterpreted the day as “Baking Things Day” and produced their first cake (very tasty, I might add, thanks Andy), I thought I would write about a project a couple of us worked on using the Microsoft Kinect.
The Kinect is, of course, an excellent example of a natural user interface, or NUI, and as such it makes it very easy to use gestures and body movement to control things. So we decided we would try using it to control sound.
As we could not obtain a Kinect for Windows sensor in time, we used the Kinect Xbox sensor. One key difference is that the Windows sensor supports “Near Mode”, allowing it to be used with objects as close as 40 cm, while the Xbox sensor requires a minimum distance of 80 cm.
We already have an app (called simply Automap and available in the App Store) that allows the iPhone to work as a controlling device for Automap Server. The app allows you to use your iPhone to control any parameter of an Automap client (such as an effect plugin, virtual instrument plugin, DAW mixer, external MIDI device…) simply by configuring appropriate mappings in the Automap Server application (which runs on your PC or Mac).
So we thought the easiest way to get up and running would be to create a Windows application using the Kinect C++ API, adapting it to speak the same protocol as the existing iPhone device so that it could connect to Automap Server without too many modifications.
The Kinect SDK exposes a vast array of sensor information, including depth frames (where each pixel in the frame is given RGB values and its distance from the sensor), skeletal tracking, microphone audio, speech recognition, and more. We decided to base our implementation on the skeletal tracking API. The returned skeletal data updates at 30 frames per second, and each frame contains the 3D positions of 20 skeletal vertices (head, shoulder left, elbow left, etc.) for each of up to 2 people in the scene. Up to 4 more people can be tracked, but only passively: the SDK reports just the position of their centre of mass rather than full skeletal data.
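To give a feel for how a per-frame skeleton update can be consumed, here is a minimal sketch. The `Joint` struct, joint indices, and the 0–1.5 m working range are illustrative assumptions of ours, not the actual Kinect SDK types; the mapping from hand height to a 0–127 controller value is the kind of scaling we used:

```cpp
#include <algorithm>
#include <array>

// Illustrative stand-in for one tracked joint position, in metres
// relative to the sensor (not the real Kinect SDK layout).
struct Joint { float x, y, z; };

// Indices for a few of the 20 skeletal vertices we care about
// (values are placeholders, not the SDK's enumeration).
enum JointIndex { HandLeft = 7, HandRight = 11, FootLeft = 15, FootRight = 19 };

using Skeleton = std::array<Joint, 20>;

// Map a hand's vertical position to a 0..127 controller value,
// assuming a useful working range of 0.0 m to 1.5 m (tune to taste).
int handHeightToController(const Skeleton& s, JointIndex hand) {
    const float lo = 0.0f, hi = 1.5f;
    float y = std::clamp(s[hand].y, lo, hi);
    return static_cast<int>((y - lo) / (hi - lo) * 127.0f + 0.5f);
}
```

Each 30 fps frame then just becomes two calls, one per hand, whose results are forwarded over the Automap protocol as continuous controller values.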
Fortunately, the Kinect SDK includes a sample application called Skeletal Viewer which tracks the image from the Kinect’s camera and superimposes the interpreted skeleton on top of the frame. We adapted this application, adding the SDKs for Automap and Bonjour (for the network communication), and a console window to output debug information in real-time.
We decided to use the vertical positions of the left hand and right hand as continuous controllers, and the left and right foot positions as toggles. Then we used Automap Server to map these as follows:
- LH vertical position → cutoff frequency
- RH vertical position → resonance
- left foot tap 25 cm to the left → next preset
- right foot tap 25 cm to the right → toggle reverb on/off
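The foot-tap toggles can be sketched as an edge-triggered detector with simple hysteresis, so that one sideways excursion fires exactly once. The class name and re-arm scheme below are our own illustration, not something the Kinect SDK provides; the 0.25 m threshold matches the mapping above:

```cpp
#include <cmath>

// Fires once when the foot moves more than `threshold` metres from
// its rest x position, then re-arms only after the foot returns
// inside the threshold, so holding the foot out doesn't retrigger.
class FootTapDetector {
public:
    explicit FootTapDetector(float restX, float threshold = 0.25f)
        : restX_(restX), threshold_(threshold) {}

    // Called once per skeleton frame; returns true exactly once
    // per excursion past the threshold.
    bool update(float footX) {
        bool outside = std::fabs(footX - restX_) > threshold_;
        bool fired = outside && armed_;
        armed_ = !outside;   // re-arm once the foot comes back
        return fired;
    }

private:
    float restX_;
    float threshold_;
    bool armed_ = true;
};
```

One detector per foot is enough; the rest position could be taken from the hip-centre vertex when tracking starts.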
After playing with this for a bit, we thought it would be nice to add some rhythm, with the ability to start and stop it. So we mapped a hand clap to start/stop the transport, and used it to control playback of a simple loop.
- hand clap → start/stop transport
A hand-clap event was defined as the distance between the left-hand and right-hand vertices dropping below 0.5 m, provided no clap has already fired within the last second (to prevent spurious toggling).
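That rule can be sketched roughly as follows. The `Vec3` type, class name, and timestamp convention are illustrative; the 0.5 m threshold and 1 s refractory period come straight from the definition above (at 30 fps, frame N arrives at roughly N / 30.0 seconds):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };  // joint position in metres

// Detects a hand clap: the left-hand/right-hand distance drops
// below 0.5 m, with a 1 s refractory period between claps.
class ClapDetector {
public:
    // Called once per skeleton frame with both hand positions and
    // the frame timestamp in seconds.
    bool update(const Vec3& lh, const Vec3& rh, double timeSec) {
        float dx = lh.x - rh.x, dy = lh.y - rh.y, dz = lh.z - rh.z;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (dist < 0.5f && timeSec - lastClap_ >= 1.0) {
            lastClap_ = timeSec;
            return true;   // toggle the transport
        }
        return false;
    }

private:
    double lastClap_ = -1e9;  // "long ago", so the first clap fires
};
```

A refinement would be to also require the hands to have been apart recently, so holding them together doesn't retrigger once per second; the refractory period alone was good enough for our demo.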
Check out the video to see what we got up to!
Future directions could include:
- choice of a particular scale for the cutoff frequency rather than a continuous value
- enhancement of Automap Server to add a custom UI for the Kinect
- support for additional gestures
What ideas do you have for how a NUI could be used in next generation music and audio production?