Monthly Archives: April 2012

As promised in my previous post, here is some information that may prove useful for those attempting to write USB drivers for Saffire 6USB Mk I (MkII is audio class 2 compliant and should not need a driver on Linux).  It's not going to be that different for our other USB 1.1 products (Ultranova, VRM Box etc.), so it could be extended for those devices later.

Before we get started, a word of warning – this is a work in progress and quite likely to be error prone, so please bear with us – we’ll try to correct any mistakes or omissions as they are discovered.

Finally, you may want to get access to a USB bus analyser; we find them incredibly useful for development!


Audio is transferred to and from Saffire 6USB on Interface 0, alternate setting 1.  Endpoint 0x01 is the output, transmitting four channels, interleaved in 24-bit little-endian format.  Endpoint 0x82 provides two channels of input in the same format.
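As a rough illustration of that sample layout, here is a minimal Python sketch that packs and unpacks interleaved 24-bit little-endian frames. It assumes the samples are signed two's complement values stored in three bytes each (which is consistent with the 588 and 294 byte maximum packet sizes in the descriptor dump); the function names are made up for this example.

```python
def pack_s24le(frames):
    """Pack frames (one signed 24-bit int per channel) into interleaved bytes."""
    out = bytearray()
    for frame in frames:
        for sample in frame:
            if not -0x800000 <= sample <= 0x7FFFFF:
                raise ValueError("sample out of 24-bit range")
            # Mask to 24 bits so negative values become two's complement.
            out += (sample & 0xFFFFFF).to_bytes(3, "little")
    return bytes(out)


def unpack_s24le(data, channels):
    """Split interleaved 24-bit little-endian bytes back into frames."""
    frame_size = 3 * channels
    if len(data) % frame_size:
        raise ValueError("data is not a whole number of frames")
    frames = []
    for off in range(0, len(data), frame_size):
        frames.append(tuple(
            int.from_bytes(data[off + 3 * c: off + 3 * c + 3], "little", signed=True)
            for c in range(channels)
        ))
    return frames
```

An output packet would be built with four values per frame, and an input packet decoded with `channels=2`.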

Saffire 6USB runs from the USB SOF (start of frame) clock, so it transfers a predictable number of samples for each 1ms USB frame.

At 48kHz, each packet contains 48 samples per channel.  At 44.1kHz, it's not possible to transfer an integer number of samples per frame, so instead we transmit / receive nine transfers of 44 samples followed by one of 45 samples (hence transferring 441 samples every 10ms).
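The packet schedule described above can be sketched in a few lines of Python. This is only an illustration: the names are hypothetical, and the byte counts assume three bytes per sample as in the format described earlier.

```python
from itertools import cycle, islice

# Samples per channel for successive 1 ms frames, repeated indefinitely.
PATTERNS = {
    48000: [48],
    44100: [44] * 9 + [45],  # 441 samples every 10 ms
}


def samples_per_frame(rate_hz):
    """Return an endless iterator of per-frame sample counts for a rate."""
    if rate_hz not in PATTERNS:
        raise ValueError("unsupported sample rate: %r" % rate_hz)
    return cycle(PATTERNS[rate_hz])


def packet_bytes(samples, channels):
    """Size in bytes of one packet of 24-bit samples."""
    return samples * channels * 3


# First ten frames at 44.1kHz: nine of 44 samples, then one of 45.
first_ten = list(islice(samples_per_frame(44100), 10))
```

With four output channels this gives packets of 528 or 540 bytes at 44.1kHz and 576 bytes at 48kHz, all within the 588 byte maximum packet size of endpoint 0x01.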

To read and set the sample rate, we use the same format as for USB audio class devices, documented in section and

After changing sample rates, it is advisable to wait for a few hundred milliseconds before attempting to start transfers again (the device needs to resynchronise its PLL).
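For reference, here is a hedged Python sketch of what a sample rate request might look like. It assumes the USB Audio Class 1.0 endpoint sampling frequency control (a three-byte little-endian rate sent with SET_CUR); which endpoint the device expects in wIndex is a guess, and the helper names are invented for illustration, so check against a bus trace before relying on it.

```python
# Request codes and control selector from the USB Audio Class 1.0 spec.
SET_CUR = 0x01
GET_CUR = 0x81
SAMPLING_FREQ_CONTROL = 0x01


def encode_rate(rate_hz):
    """Encode a sample rate as the 3-byte little-endian class payload."""
    return rate_hz.to_bytes(3, "little")


def decode_rate(payload):
    """Decode the 3-byte payload returned by a GET_CUR request."""
    if len(payload) != 3:
        raise ValueError("expected exactly 3 bytes")
    return int.from_bytes(payload, "little")


def set_rate_request(rate_hz, endpoint=0x01):
    """Describe the control transfer; the endpoint choice is an assumption."""
    return {
        "bmRequestType": 0x22,  # host-to-device, class, recipient endpoint
        "bRequest": SET_CUR,
        "wValue": SAMPLING_FREQ_CONTROL << 8,
        "wIndex": endpoint,
        "data": encode_rate(rate_hz),
    }
```

The dictionary fields map directly onto the arguments of a control transfer in most USB libraries.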


MIDI is transferred on Interface 1, transmitting up to 8 bytes of data per packet on endpoint 0x03, and receiving up to 16 bytes on endpoint 0x84.  The format is raw MIDI, not USB class formatted.  The hardware does not process the stream in any way; it just passes it directly to / from the physical ports.
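Because the stream is raw MIDI, sending it is mostly a matter of cutting the byte stream into packets no larger than the endpoint allows. A minimal Python sketch, with a hypothetical helper name:

```python
def chunk_midi(data, max_packet=8):
    """Split a raw MIDI byte stream into packets of at most max_packet bytes."""
    if max_packet < 1:
        raise ValueError("max_packet must be at least 1")
    return [bytes(data[i:i + max_packet]) for i in range(0, len(data), max_packet)]


# A note-on followed by a short SysEx message, 11 bytes in total.
stream = bytes([0x90, 0x3C, 0x7F, 0xF0, 0x7E, 0x7F, 0x06, 0x01, 0xF7, 0x80, 0x3C])
packets = chunk_midi(stream)
```

Messages may straddle packet boundaries; since the hardware passes the bytes straight through, that should not matter as long as the order is preserved.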

No data will be transferred unless the device has been set to configuration 1.


Below is a dump of the device descriptor, in case it proves useful!

    Device Descriptor   
        Descriptor Version Number:   0x0100
        Device Class:   0   (Composite)
        Device Subclass:   0
        Device Protocol:   0
        Device MaxPacketSize:   8
        Device VendorID/ProductID:   0x1235/0x0010   (unknown vendor)
        Device Version Number:   0x0100
        Number of Configurations:   1
        Manufacturer String:   1 "Focusrite Audio Engineering"
        Product String:   2 "Saffire 6USB"
        Serial Number String:   0 (none)
    Configuration Descriptor   
        Length (and contents):   64
            Raw Descriptor (hex)    0000: 09 02 40 00 02 01 00 80  F9 09 04 00 00 00 FF 00  
            Raw Descriptor (hex)    0010: 00 00 09 04 00 01 02 FF  00 00 00 07 05 01 01 4C  
            Raw Descriptor (hex)    0020: 02 01 07 05 82 01 26 01  01 09 04 01 00 02 FF 00  
            Raw Descriptor (hex)    0030: 00 00 07 05 03 03 08 00  01 07 05 84 03 10 00 01  
            Unknown Descriptor   0040: 
        Number of Interfaces:   2
        Configuration Value:   1
        Attributes:   0x80 (bus-powered)
        MaxPower:   498 ma
        Interface #0 - Vendor-specific   
            Alternate Setting   0
            Number of Endpoints   0
            Interface Class:   255   (Vendor-specific)
            Interface Subclass;   0   (Vendor-specific)
            Interface Protocol:   0
        Interface #0 - Vendor-specific (#1)   
            Alternate Setting   1
            Number of Endpoints   2
            Interface Class:   255   (Vendor-specific)
            Interface Subclass;   0   (Vendor-specific)
            Interface Protocol:   0
            Endpoint 0x01 - Isochronous Output   
                Address:   0x01  (OUT)
                Attributes:   0x01  (Isochronous no synchronization data endpoint)
                Max Packet Size:   588
                Polling Interval:   1 ms
            Endpoint 0x82 - Isochronous Input   
                Address:   0x82  (IN)
                Attributes:   0x01  (Isochronous no synchronization data endpoint)
                Max Packet Size:   294
                Polling Interval:   1 ms
        Interface #1 - Vendor-specific   
            Alternate Setting   0
            Number of Endpoints   2
            Interface Class:   255   (Vendor-specific)
            Interface Subclass;   0   (Vendor-specific)
            Interface Protocol:   0
            Endpoint 0x03 - Interrupt Output   
                Address:   0x03  (OUT)
                Attributes:   0x03  (Interrupt no synchronization data endpoint)
                Max Packet Size:   8
                Polling Interval:   1 ms
            Endpoint 0x84 - Interrupt Input   
                Address:   0x84  (IN)
                Attributes:   0x03  (Interrupt no synchronization data endpoint)
                Max Packet Size:   16
                Polling Interval:   1 ms
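To double-check the decoded fields against the raw bytes, here is a small Python sketch that walks the 64-byte configuration descriptor from the dump and lists its endpoints. The function names are invented for this example; the byte layout follows the standard USB descriptor format.

```python
# The raw configuration descriptor bytes copied from the dump.
RAW = bytes.fromhex(
    "09 02 40 00 02 01 00 80 F9 09 04 00 00 00 FF 00 "
    "00 00 09 04 00 01 02 FF 00 00 00 07 05 01 01 4C "
    "02 01 07 05 82 01 26 01 01 09 04 01 00 02 FF 00 "
    "00 00 07 05 03 03 08 00 01 07 05 84 03 10 00 01 "
)

TRANSFER_TYPES = ("control", "isochronous", "bulk", "interrupt")


def walk(raw):
    """Yield (descriptor type, descriptor bytes) for each descriptor in turn."""
    i = 0
    while i < len(raw):
        length = raw[i]
        if length < 2 or i + length > len(raw):
            raise ValueError("malformed descriptor at offset %d" % i)
        yield raw[i + 1], raw[i:i + length]
        i += length


def endpoints(raw):
    """Return (interface, alt setting, address, type, max packet, interval)."""
    found = []
    iface = alt = None
    for dtype, d in walk(raw):
        if dtype == 0x04:  # interface descriptor
            iface, alt = d[2], d[3]
        elif dtype == 0x05:  # endpoint descriptor
            max_packet = int.from_bytes(d[4:6], "little") & 0x07FF
            found.append((iface, alt, d[2], TRANSFER_TYPES[d[3] & 0x03], max_packet, d[6]))
    return found
```

Running `endpoints(RAW)` reproduces the four endpoints listed in the dump, including the 588 and 294 byte isochronous packet sizes.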

A number of our users have been asking for help using Saffire 6 USB on Linux.  Before we get to that, I thought it would be useful to clarify our interfaces' status on Linux, then I’ll post up some information that will be useful for brave driver developers wanting to attack the devices that don’t work.

Please note that this is cobbled together from the back of my head, so might well be inaccurate – I’ll endeavor to correct and update it as best I can.

Finally, please understand that Focusrite does not officially support Linux.  Although some people are seeing positive results in the comments, and some of our products are “known to work”, your mileage may vary.  Good luck!

USB Audio Interfaces

Could work: Scarlett 2i2, 2i4, 8i6, 18i6, 6i6, 18i8, 18i20, Saffire 6 USB MkII (USB audio class 2.0 compatible), Forte and iTrack Solo.

Note that Forte’s display will not function on Linux as its content is rendered by a daemon running on the host.  I don’t think this should affect its operation as a sound card though.

Could work (but no driver): Saffire 6 USB, Novation nio 2|4

VRM Box will work as an audio device, with two outputs (headphones).  However, the VRM processing will not work, as this is embedded in the kernel mode driver on OS X and Windows and would be a very complex task to port (sorry, we have no plans to open source the VRM algorithm any time soon).

FireWire Audio Interfaces

Saffire Pro 40, Pro 24, Pro 24DSP, Liquid Saffire 56: may work via FFADO drivers.

Saffire Pro 40 (second revision): does not work with FFADO driver. These devices can be identified by serial number – any unit with a serial greater than PF0000100000 will not (currently) work.

Saffire, Saffire Tracker, Pro 26i/o, Pro 10 i/o: full support via FFADO drivers

Novation USB Controllers

Impulse, ReMOTE SL MkII (& ZeRO), ReMOTE SL (& ZeRO), Nocturn Keyboard: should work (USB class-compliant MIDI ports).  Note that Impulse and SL/ZeRO MkII have extra vendor-specific USB endpoints that will not work without a driver; these are for communication with the Automap server application, which is not essential for MIDI control.

Launchpad: works (thanks to driver by Will Scott)

Nocturn: could work, though not USB class-compliant so would need a driver.  Could probably be adapted from the Launchpad driver with a trivial change of USB PID (see below).

Launchkey: should work – class compliant.

Launchpad S – class compliant!

Novation Synths

Ultranova: requires driver, not known if one exists.  Automap / plug-in editor interaction is complex (routing logic in the Windows / OS X drivers) but MIDI could work easily enough.

MiniNova: requires driver.  The MiniNova (and UltraNova) librarian and soon to be released editor have a back-door to communicate with the hardware (purely to prevent spam MIDI data clogging up your DAW), which can’t work with the class driver.  As with the Nocturn, UltraNova & Launchpad, the format for USB MIDI transfers is simply raw MIDI data (as opposed to the four-byte packeting of USB MIDI class data).

Xio & X-Station: should work (USB audio class-compliant ports).


Device                                         Vendor ID  Product ID
Scarlett 18i6                                  0x1235     0x8000
Scarlett 8i6                                   0x1235     0x8002
Scarlett 2i2                                   0x1235     0x8006
Scarlett 2i4                                   0x1235     0x800A
Scarlett 6i6                                   0x1235     0x8012
Scarlett 18i8                                  0x1235     0x8014
Scarlett 18i20                                 0x1235     0x800C
iTrack Solo                                    0x1235     0x800E
Forte                                          0x1235     0x8010
Saffire 6USB (USB 2.0 version)                 0x1235     0x8008
Remote                                         0x1235     0x4661
XStation (old)                                 0x1235     0x0001
Speedio                                        0x1235     0x0002
RemoteSL + ZeroSL                              0x1235     0x0003
RemoteLE                                       0x1235     0x0004
XIOSynth (old)                                 0x1235     0x0005
XStation                                       0x1235     0x0006
XIOSynth                                       0x1235     0x0007
Remote SL Compact                              0x1235     0x0008
nio                                            0x1235     0x0009
Nocturn                                        0x1235     0x000A
Remote SL MkII                                 0x1235     0x000B
ZeRO MkII                                      0x1235     0x000C
Launchpad                                      0x1235     0x000E
Saffire 6 USB                                  0x1235     0x0010
Ultranova                                      0x1235     0x0011
Nocturn Keyboard                               0x1235     0x0012
VRM Box                                        0x1235     0x0013
VRM Box Audio Class (2-out)                    0x1235     0x0014
Dicer                                          0x1235     0x0015
Ultranova                                      0x1235     0x0016
Twitch                                         0x1235     0x0018
Impulse 25                                     0x1235     0x0019
Impulse 49                                     0x1235     0x001A
Impulse 61                                     0x1235     0x001B
XIO Emergency OS Download Device               0x1235     0x1005
Nocturn Keyboard Emergency OS Download Device  0x1235     0x1012
Impulse bootloader                             0x1235     0x1019
nIO DFU                                        0x1235     0x3201
6USB DFU                                       0x1235     0x3202
VRM Box DFU                                    0x1235     0x3203
Twitch DFU                                     0x1235     0x3218


Every quarter we have a “Making Things Day” where each employee is invited to work for one day on something innovative, maybe in a personal area of interest or just something different to what they do every day.

Notwithstanding the member of R&D that misinterpreted the day as “Baking Things Day” and produced their first cake (very tasty I might add, thanks Andy), I thought I would write about a project a couple of us worked on using the Microsoft Kinect.

The Kinect of course is an excellent example of how a natural user interface, or NUI, can be implemented, and as such it makes it very easy to use gestures and body movement to control things. So we decided we would try using it to control sound.

As we could not obtain a Kinect for Windows sensor in time, we used the Kinect Xbox sensor. One key difference is that the Windows sensor supports “Near Mode”, allowing it to be used with objects as close as 40 cm, while the Xbox sensor requires a minimum distance of 80 cm.

We already have an app (called simply Automap and available in the App Store) that allows the iPhone to work as a controlling device for Automap Server. The app allows you to use your iPhone to control any parameter of an Automap client (such as an effect plugin, virtual instrument plugin, DAW mixer, external MIDI device…) simply by configuring appropriate mappings in the Automap Server application (which runs on your PC or Mac).

So we thought the easiest way to get up and running would be to create a Windows application using the Kinect C++ API, adapting it to speak the same protocol as the existing iPhone device so that it could connect to Automap Server without too many modifications.

The Kinect SDK supports a vast array of sensor information, including depth frames (where each pixel in the frame is given RGB values and its distance from the sensor), skeletal tracking, microphone, speech recognition etc. We decided to base our implementation on the skeleton tracking API. The returned skeletal data updates at 30 frames per second, and each frame contains the 3D positions of 20 skeletal vertices (head, shoulder left, elbow left etc.) for each of up to 2 people in the scene. Additionally, up to 4 other people can be tracked, but only in passive mode, where only the position of their centre of mass is reported rather than their full skeletal data.

Fortunately, the Kinect SDK includes a sample application called Skeletal Viewer which tracks the image from the Kinect’s camera and superimposes the interpreted skeleton on top of the frame. We adapted this application, adding the SDKs for Automap and Bonjour (for the network communication), and a console window to output debug information in real-time.

We decided to use the vertical positions of the left hand and right hand as continuous controllers, and the left and right foot positions as toggles. Then we used Automap Server to map these as follows:

  • LH vertical position → cutoff frequency
  • RH vertical position → resonance
  • Left foot tap 25 cm to left → next preset
  • Right foot tap 25 cm to right → toggle reverb on/off
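To give a flavour of how such mappings could be computed from skeleton data, here is a rough Python sketch (not the C++ code we actually wrote). The 0 to 127 controller range, the calibration heights, and the idea of measuring the foot offset from a rest position are all assumptions made for illustration.

```python
def height_to_controller(y, y_min, y_max):
    """Map a hand height in metres onto a 0..127 controller value, clamped."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    t = (y - y_min) / (y_max - y_min)
    t = min(1.0, max(0.0, t))
    return round(t * 127)


class FootTap:
    """Report a tap once each time the foot moves past the threshold."""

    def __init__(self, threshold_m=0.25):
        self.threshold_m = threshold_m
        self._out = False

    def update(self, offset_m):
        """offset_m is the sideways distance of the foot from its rest position."""
        was_out = self._out
        self._out = offset_m >= self.threshold_m
        # Fire only on the transition, not while the foot stays out.
        return self._out and not was_out
```

Each skeleton frame would feed the hand heights through `height_to_controller` and the foot offsets through one `FootTap` per foot, with a tap triggering the mapped action.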

After playing with this for a bit, we thought it would be nice to add some rhythm, with the ability to start and stop it. So we mapped a hand clap to start/stop the transport, and used it to control playback of a simple loop.

  • hand clap → start/stop transport

A hand-clap event was defined as the distance between the left hand and right hand vertices decreasing below 0.5m, provided this event has not occurred within the last second (to prevent spurious toggling).
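That rule translates fairly directly into code. Here is a minimal Python sketch of it (again, not our actual C++ implementation); the class and parameter names are hypothetical, and it assumes hand positions arrive as (x, y, z) tuples in metres with a timestamp in seconds.

```python
import math


class ClapDetector:
    """Detect the hands coming within a threshold distance, with a hold-off."""

    def __init__(self, threshold_m=0.5, holdoff_s=1.0):
        self.threshold_m = threshold_m
        self.holdoff_s = holdoff_s
        self._was_close = False
        self._last_clap = None

    def update(self, left, right, now_s):
        """Return True if this frame counts as a clap."""
        close = math.dist(left, right) < self.threshold_m
        crossed = close and not self._was_close  # distance just dropped below
        self._was_close = close
        if not crossed:
            return False
        if self._last_clap is not None and now_s - self._last_clap < self.holdoff_s:
            return False  # too soon after the previous clap
        self._last_clap = now_s
        return True
```

One choice made here is that a suppressed clap does not restart the one second hold-off; the description above leaves that open.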

Check out the video to see what we got up to!

Future directions could include:

  • choice of a particular scale for the cutoff frequency rather than a continuous value
  • enhancement of Automap Server to add a custom UI for the Kinect
  • support for additional gestures

What ideas do you have for how a NUI could be used in next generation music and audio production?