Monthly Archives: October 2012

Once upon a time, in the days when computers were mysterious and new, there was no simple way of making electronic musical instruments communicate with each other. Every manufacturer invented their own methods for co-ordinating the various bits and pieces that they sold. These would usually involve fragile bundles of wires passing analogue control voltages from one device to another. On reaching their intended devices, these voltages were amplified, manually scaled and offset in order to render them useful.

The pre-MIDI EMS VCS3 Cricklewood keyboard and Putney synthesiser in various stages of interconnectedness. In those days, it was considered acceptable to name one’s products after unprepossessing but well-to-do corners of Greater London. (Apologies for the suboptimal photography.)

The brutal-looking connector on the 1960s VCS3 is called a Jones connector. It supplies two power voltages and a ground to the keyboard. Two scaled control signals and an envelope trigger are generated and returned on separate terminals. Putney’s backplane has an array of jack sockets that allow control voltages to enter and leave.

Midi In

In response to this unhelpful situation, the MIDI Manufacturers Association [MMA], a consortium of mostly American and Japanese companies, agreed on a universal specification for a digital interface. This specification was driven entirely by two needs: to encourage interoperability between musical devices, and to keep cost to a minimum. The MMA settled on an asynchronous serial interface, because this reduced the complexity and cost of interconnection. It was specified to run at 31.25kbaud, a rate chosen because it is easily reached by dividing 1MHz by a power of two. At the time, this choice rendered it incompatible with RS-232 (which can usually provide nothing between 19.2kbaud and 38.4kbaud), preventing existing computers from transmitting or receiving MIDI without extra hardware. MIDI may have ended up on computers only as an afterthought.

Data was communicated in one direction only over 5-pin DIN connectors, which were ubiquitous in the home audio market, and were therefore about the cheapest multi-pin connectors available. (They were so cheap, in fact, that the MIDI specification wantonly squandered two of the connector’s pins by leaving them unconnected: a move that would not be countenanced today.)

The data that travels on the MIDI interface was elegantly designed to embrace the feature set of contemporary microprocessors. Only 8-bit data was employed and, to save memory, no standard message could exceed three bytes in length. One bit of every byte was reserved to frame the data, giving rise to the 7-bit data limitation that causes annoyance today.
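That framing rule is simple enough to sketch in a few lines of Python (an illustration of the convention, not production code): a set top bit marks a status byte, a clear top bit marks data, and that is precisely why data values are confined to 0–127.

```python
def is_status(byte):
    """A MIDI status byte has its top bit set (0x80-0xFF)."""
    return byte & 0x80 != 0

def is_data(byte):
    """Data bytes keep the top bit clear, limiting them to 0-127."""
    return byte & 0x80 == 0

# 0x90 (Note On) is a status byte; the note and velocity that follow are data.
message = [0x90, 60, 100]
assert is_status(message[0])
assert all(is_data(b) for b in message[1:])
```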

By design, MIDI embraced note gating information, key velocity, and standard controller messages for the pitch bend wheel, sustain pedal, and key aftertouch. A loosely-defined framework of controller messages was also provided so that other data could be conveyed besides. Provision was made for almost every command to be assigned one of 16 separate channels, intended to allow sixteen different sounds to be controlled independently over the same physical cable.
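The channel scheme costs nothing on the wire: the channel simply occupies the low nibble of the status byte. A minimal sketch of building a Note On message (the 0x90 status and three-byte layout are straight from the specification):

```python
NOTE_ON = 0x90  # Note On status, before the channel is merged in

def note_on(channel, note, velocity):
    """Build a three-byte Note On message.

    The channel (0-15) occupies the low nibble of the status byte,
    which is how sixteen voices share one physical cable.
    """
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([NOTE_ON | channel, note, velocity])

# Middle C at velocity 100 on the third channel (counting from zero):
assert note_on(2, 60, 100) == bytes([0x92, 60, 100])
```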

The first MIDI devices emerged in 1983. Some unintentionally very amusing episodes of Micro Live were created that demonstrated the technology. The rest is history. Synthpop was essentially democratised by this inexpensive serial protocol. Dizzy with the possibilities of MIDI, musicians ganged their synthesisers together and began controlling them from the same keyboard to layer more and more voices, creating fat digital sounds that were very distinctive and dated very quickly. Artists who did not have the resources to employ professional studios with all their pre-MIDI equipment connected their Casio keyboards to home computers, running new software that enabled them to build up note sequences, and then quantise, manipulate, and replay them in a manner that would have been unthinkably expensive by any other means.

Midi Thru

Here we are, nearly thirty years later. The processing power and capacity of a computer is around two million times as great as anything available for similar money in 1983. As a consequence, keyboard controllers, synthesisers, sequencers, and signal processing tools have advanced considerably. And yet, amidst all this change, MIDI still reigns supreme. As a basic protocol, it is just about fit for purpose. With our ten digits, two feet, and a culturally-motivated lack of interest in breath controllers, most of us are still trying to do the same things that we’ve always done in terms of musicianship. Although devices now produce much richer MIDI data at a faster rate, this is not a problem because MIDI is conveyed over faster physical interfaces (such as USB) so we can still capture it.

Aside from the musical data, MIDI has another weapon that has ensured its survival: it allows manufacturer-specific data transmissions. These System Exclusive messages opened a portal that allows modern devices to play with computers in ways that MIDI’s creators could not have imagined. To System Exclusive messages, we owe patch storage and editing software, remote software upgrades, and next-generation protocol extensions like Automap.
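The System Exclusive envelope itself is trivially simple, which is exactly why it has proved so durable: everything between the start and end markers is 7-bit data whose meaning is entirely the manufacturer's business. A sketch (the manufacturer ID and payload here are made up for illustration):

```python
SOX, EOX = 0xF0, 0xF7  # System Exclusive start and end markers

def sysex(manufacturer_id, payload):
    """Wrap a manufacturer-specific payload in SysEx framing.

    Everything between the markers must be 7-bit data; what it
    means is entirely up to the manufacturer.
    """
    assert all(b < 0x80 for b in payload)
    return bytes([SOX, manufacturer_id]) + bytes(payload) + bytes([EOX])

# A hypothetical one-byte manufacturer ID carrying three data bytes:
msg = sysex(0x29, [0x01, 0x02, 0x03])
assert msg == bytes([0xF0, 0x29, 0x01, 0x02, 0x03, 0xF7])
```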

And yet … and yet, the specification shows its age to anybody who wants to do more or to delve deeper. MIDI is inherently a single-direction protocol, and its 7-bit data limitation results in an obsession with the number 128 that is now painfully restrictive: 128 velocity gradations; 128 programs in a bank; 128 positions a controller can take. Certain aspects of MIDI were poorly defined at the beginning, and remain unresolved three decades later.

Q. Middle C is conveyed by MIDI note number 60. Should we display this note to the user as C3 or C4?

A. Just choose one at random and provide the other as a configuration option.

Q. How much data might I expect a System Exclusive message to convey?

A. Oh dear, you went and bought a finite amount of memory. Good luck designing that MIDI merge box / USB interface.

Q. I’ve got a MIDI Thru splitter that is supposed to be powered from the MIDI OUT port of my keyboard. Why doesn’t it work?

A. Your keyboard manufacturer and your Thru box manufacturer have both bent the specification. If they’ve bent it in opposite directions, then your box won’t work as advertised.

Q. If the user doesn’t understand MIDI channels, and is attempting to transmit MIDI data on one, and receive MIDI data on another, what will happen?

A. The device at one or other end of the cable will end up back at the shop.

Q. I’m designing a new keyboard. Should my device support Active Sensing?

A. I don’t know. Should it?
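Incidentally, the middle-C ambiguity really does boil down to one configurable constant, as a short sketch shows (the octave offset is the only assumption in play; both values are in common use):

```python
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def note_name(number, middle_c_octave=4):
    """Render a MIDI note number as a name.

    middle_c_octave is the configuration option recommended above:
    3 and 4 are both in common use for note number 60.
    """
    offset = middle_c_octave - (60 // 12)
    return NOTE_NAMES[number % 12] + str(number // 12 + offset)

assert note_name(60) == 'C4'
assert note_name(60, middle_c_octave=3) == 'C3'
```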

Apart from all that, a lack of per-note control data annoys the creators of more expressive instruments. The standard’s rigid genuflection to Western 12-tone chromaticism is an irksome limitation to some (particularly those who use terms such as ‘Western 12-tone chromaticism’). The note model cannot properly handle single-note pitch effects such as glissandi. For devices that must accept or transmit a wide variety of control data, including us, the NRPN system constitutes a fairly unpleasant prospect, loaded with parsing irregularities and a padding-to-payload ratio of 2:1.
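The NRPN overhead is easy to demonstrate: delivering one 14-bit value to one 14-bit parameter takes four complete Control Change messages, each carrying a single payload byte under two bytes of framing. A sketch using the standard CC numbers (99/98 select the parameter, 6/38 carry the value):

```python
def nrpn(channel, parameter, value):
    """Send a 14-bit value to a 14-bit NRPN parameter.

    Each 3-byte CC message carries one payload byte and two bytes
    of framing: the 2:1 padding-to-payload ratio complained about.
    """
    cc = 0xB0 | channel
    return bytes([
        cc, 99, parameter >> 7,    # NRPN MSB (CC 99)
        cc, 98, parameter & 0x7F,  # NRPN LSB (CC 98)
        cc, 6,  value >> 7,        # Data Entry MSB (CC 6)
        cc, 38, value & 0x7F,      # Data Entry LSB (CC 38)
    ])

# Twelve bytes on the wire to deliver four payload bytes:
msg = nrpn(0, parameter=260, value=1000)
assert len(msg) == 12
```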

In retrospect, dealing with MIDI could have been made somewhat easier. The size of a single MIDI instruction depends on the contents of the first byte in a way that is neither obvious nor easy to derive, and the first byte may not necessarily be repeated in subsequent messages, which leads to a fairly onerous parsing process.
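Here is a minimal sketch of that parsing process (Python, handling channel messages and running status only; a real parser must also cope with System messages and interleaved real-time bytes, which is where the true misery lies):

```python
# Data byte counts for channel messages, indexed by the status high nibble.
DATA_BYTES = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}

def parse(stream):
    """Yield complete channel messages, honouring running status."""
    status, data = None, []
    for byte in stream:
        if byte & 0x80:                 # a new status byte arrives
            status, data = byte, []
        elif status is not None:        # a data byte under the current status
            data.append(byte)
            if len(data) == DATA_BYTES[status >> 4]:
                yield (status, *data)
                data = []               # running status: the status byte persists

# Two Note Ons, the second relying on running status (no repeated 0x90):
msgs = list(parse([0x90, 60, 100, 64, 100]))
assert msgs == [(0x90, 60, 100), (0x90, 64, 100)]
```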

The authors of the USB MIDI specification went to the trouble of re-framing all the data into four-byte packages to simplify parsing. Unfortunately, they left a back door open to transmit an individual data byte where this was deemed essential. When is this essential? When you are deliberately trying to send malformed data that’s useless to the device at the other end. Or, to put it another way, never. The inevitable happened: one company now misframes even valid instructions, using this message capriciously to split up standard data into streams of single bytes. The USB MIDI parser thus becomes more, not less, complex, because it has to be able to support both the new four-byte frames and the old-fashioned variable length ones.
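The USB framing works roughly like this (a sketch based on the USB MIDI 1.0 class definition: each event is four bytes, with a virtual cable number and a Code Index Number packed into the first; for channel messages the CIN conveniently equals the status high nibble):

```python
def usb_midi_packet(cable, midi_bytes):
    """Frame a complete MIDI channel message as a 4-byte USB event.

    For channel messages the Code Index Number equals the status
    high nibble; shorter messages are zero-padded to four bytes.
    """
    cin = midi_bytes[0] >> 4
    padded = list(midi_bytes) + [0] * (3 - len(midi_bytes))
    return bytes([(cable << 4) | cin] + padded)

# A Note On on virtual cable 0 becomes a tidy four-byte frame:
assert usb_midi_packet(0, [0x90, 60, 100]) == bytes([0x09, 0x90, 60, 100])
```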

In honesty, it’s only slightly inconvenient. The MIDI parser that we embed into our hardware designs is about 640 bytes long. These are 640 very carefully arranged bytes that took several days and a lot of testing to prove, and all they do is allow a device to accept a music protocol invented in the early 1980s, but it might have been a lot worse. Indeed, it is worse once you start trying to respond to the data. Never mind: if even the pettiest problem stings us, we fix it properly. And if any fool could do MIDI properly, we’d all have to find alternative careers.

Midi Out

There have been attempts, and occasionally there still are, to supplant MIDI with an all-new data format, but these seem doomed to obscurity and ultimately to failure. About twenty years ago, there was ZIPI; today, it’s nothing more than a Wikipedia page. mLAN attempted to place MIDI into an inexpensive studio network. In spite of very wide industry support, it had few adopters. With hindsight, the futurologists were wrong and the world took a different turn. Latterly, there’s the HD-MIDI specification, and Open Sound Control [OSC], soon to be re-christened Open Media Control. We’ve looked into these. I cannot remember if we are prevented from discussing our draft of the HD-MIDI spec, but we probably are. My one-sentence review therefore contains nothing that isn’t already in the public domain.

HD-MIDI promises to be improved and more versatile, and does so by adding complexity in ways that not everybody will find useful. OSC suffers from a superset of this problem: it’s anarchy, and deliberately so. The owners of the specification have been so eager to avoid imposing constraints upon it that it has become increasingly difficult for hardware to cope with it. The most orthodox interpretation of the specification has the data payload transmitted via UDP somewhere in the middle of a TCP/IP stack. (You think that MIDI’s 7-bit limitation creates too many processing overheads and data bottlenecks? Wait until you try TCP/IP as a host-to-device protocol!)

Networking protocols are fine for computer software, phone apps, and for boutique home-brew products, but they are somewhat impractical for a mass-market music device. Most musicians are not IT specialists. Those whose savoir faire extends only as far as the concept of MIDI channels cannot be expected to prevail in a world of firewalls, MAC addresses, subnet masks, and socket pairing. Ethernet being the mess that it is, there are at least two simpler ways of interfacing with computers by using old serial modem protocols, but most new operating systems have all but given up supporting these and the burden of configuration is, again, upon the user.

More severely, there is an interoperability problem. OSC lacks a defined namespace for even the most common musical exchanges, to the extent that one cannot use it to send Middle C from a sequencer to a synthesiser in a standardised manner. There are many parties interested in commercialising OSC, and a few have succeeded in small ways, but it wouldn’t be possible to stabilise the specification and reach a wide audience without garnering a consortium of renegade manufacturers for a smash-and-grab raid. The ostensible cost of entry to the OSC club is currently far higher than MIDI, too. Producing a zero-configuration self-powered Ethernet device, as opposed to a bus-powered USB MIDI device of equivalent functionality, would price us out of the existing market, exclude us from the existing MIDI ecosystem, and require a great deal more support software, and to what advantage? For OSC to gain universal acceptance, it will need to be hybridised, its rich control data combined with more regular musical events, embedded together in a stream of – you’ve guessed it. If we’re going to go through all that palaver, and more or less re-invent OSC as a workable protocol in our own club, why would we start with its strictures at all? This brings us back to the MMA, and the original reason for its existence. HD-MIDI, at least, has industry consensus. If it is sufficiently more effective than MIDI 1.0, it may yet form part of a complete next-generation protocol.

For all its shortcomings, we musicians and manufacturers cannot abandon MIDI. We have had thirty years to invent a better protocol and we have singularly failed. Some of us have already lost sight of what makes MIDI great, and we must strive to remind ourselves how we can make it better. Meanwhile, the very simplicity, flexibility, and ubiquity of MIDI 1.0 make it certain to be an important protocol for some time to come. With this in mind, I confidently predict that, in 2023, MIDI will still be indispensable, unimpeachable, and utterly, utterly everywhere.

In my previous post, I touched on the problems of attempting to copy an acoustic (or electroacoustic) instrument via a MIDI controller keyboard. In conclusion, there are a lot of challenges. We must have the serenity to accept the things we cannot change, and the bloodymindedness to change, or at least to challenge, the things that we can.

It’s time to put this into action, and consider the controller keyboard in more depth. In this posting, I will focus on the piano for two reasons. Firstly, it’s a case study for most acoustic or electroacoustic keyboard instruments because it shares all of their vagaries. Secondly, it’s the instrument with which most people are most familiar, and for which the greatest amount of repertoire exists.

Generally speaking, a MIDI controller keyboard gets its sensitivity to nuance in a fairly unsophisticated way: we keep to trusted mechanical designs. Thus, the speed of finger impact is still measured in the same way it was forty years ago, by counting the time interval between two switches being closed, and this is the only information we have.

Top left: a key mechanism that we use. Top right: the C key has been removed to reveal the two levers and switch membranes for the neighbouring key. Bottom left: Just the circuit board and membranes from the keyboard. Bottom right: the bare circuit board showing each pair of switch contacts underneath.

A keypress on a piano or keyboard constitutes a movement of about half an inch (call it 12.5mm). The key switches on a European keyboard mechanism that I tested actuate at 4.5mm and 7.5mm down a white note’s travel, so they can indicate the average speed of note travel over 3mm.

Pairs of switches are read at high speed: they have to be. In our higher-end controller keyboards, we scan each set of key contacts at 10kHz so that we can detect inter-contact times to a worst-case accuracy of about 200 microseconds. That’s pretty much the state of the art because, although the technology can go quite a lot faster, there are certain inescapable design problems that prevent anyone from doing so economically. Our older synthesisers are a bit slower than this: nuance is less critical when you’re playing an acid house bassline or a fat string pad. Nevertheless, it turns out that 10kHz is just about enough to convey the dynamic range of speeds that a pianist produces from a semi-weighted keyboard. Although weighted and hammer-action keyboards feel more luxurious, their terminal velocities are considerably lower. Thus they can be scanned at a more leisurely pace, so it’s generally less expensive to read them effectively.

We spend a long time designing representative velocity curves that feel right. Here’s one from the semi-weighted Impulse keyboard shown in our curve-designing software (every manufacturer who is serious about their craft grows their own). A colleague laboured over this curve for several hours, using different third-party synthesiser modules to develop and prove it:

The graph shows MIDI velocity values on the Y-axis, and inter-contact timings (‘m’ being short for milliseconds) on the X-axis. To produce a white note of velocity 100 (64h) from this curve requires a 5.5ms interval between the top and bottom key contacts. Black notes have their sensors arranged in the same physical places, but the different key size makes them shorter levers, so it takes a 4ms interval to register a velocity of 100. This subtlety is a pain: the black and white curves are always designed separately and, because it’s a matter of subjective feel, no hard rules can be used to relate them.
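A curve of this kind boils down to interpolation through a hand-tuned breakpoint table. The sketch below uses invented breakpoints, anchored only to the two figures quoted above (5.5ms → velocity 100 for white notes, 4ms for black); real curves are shaped entirely by ear:

```python
# Invented (interval_ms, velocity) breakpoints; real curves are tuned by ear.
WHITE_CURVE = [(1.0, 127), (5.5, 100), (20.0, 40), (60.0, 1)]
BLACK_CURVE = [(0.8, 127), (4.0, 100), (16.0, 40), (50.0, 1)]

def velocity(interval_ms, curve):
    """Map an inter-contact interval to a MIDI velocity by
    linear interpolation between the curve's breakpoints."""
    if interval_ms <= curve[0][0]:
        return curve[0][1]
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if interval_ms <= t1:
            return round(v0 + (v1 - v0) * (interval_ms - t0) / (t1 - t0))
    return curve[-1][1]

assert velocity(5.5, WHITE_CURVE) == 100   # the quoted white-note anchor
assert velocity(4.0, BLACK_CURVE) == 100   # black notes get there sooner
```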

Advances …

At this stage, things should perhaps get more complicated. As I’ve discussed, real pianos possess a double escapement mechanism, meaning that there are two ways in which the hammer can be made to contact the string: one where the hammer gets a kick throughout the entire travel of the note, and another where the key nudges the hammer more gently over a much shorter distance. The piano deconstructed is a terrific resource with some fun animations of all this. The first form of attack is the most difficult to control: that’s why piano teachers tell their pupils that all the expression is to be found right at the bottom of the keys.

The initial speed of travel of a piano key being hit for the first time is more important than its later speed: you cannot decelerate the hammer once it’s been given a good shove. For a fast attack, the hammer would impact the string around the same time as the first key sensor would be triggered on an electronic keyboard. So, to get the timing and velocity more representative of a real instrument, having three key sensors would improve matters. An extra contact would be actuated just as the key is depressed, so an extra velocity curve would be generated at the top of the key. There would be some complicated interaction between the two velocity curves thus derived, involving an immediate response for fast initial attacks, and a simpler comparison of the two velocities for slower attacks.
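The combination logic might look something like this (pure speculation, matching the paragraph above: the third contact is hypothetical, the threshold is invented, and a real design would certainly be subtler):

```python
def combined_velocity(upper_v, lower_v, fast_threshold=110):
    """Speculative blend of two velocity estimates from a
    hypothetical three-contact key.

    A very fast initial attack (high upper-travel velocity) is
    reported immediately, mimicking a hammer flung at the string;
    slower attacks fall back to a simple blend of the two estimates.
    """
    if upper_v >= fast_threshold:
        return upper_v                  # the initial shove already decided matters
    return round((upper_v + lower_v) / 2)

assert combined_velocity(120, 90) == 120   # fast attack: upper curve wins
assert combined_velocity(60, 80) == 70     # slow attack: blend the two
```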

I have never seen this design in practice – not even on some of the fancier Italian key mechanisms we’ve tried. Some of those key mechanisms are so lovely that they make me want to retire, take classes in cabinet making, and learn the complete sonatas of Beethoven, but they’re still based on two-contact systems. However, I learned to play on acoustic pianos. After years of coaching, I now approach the keys gently, and exploit the last fraction of an inch of travel to convey my intentions at the right time. I fear for learners playing exclusively on digital instruments, as they may get a surprise when confronted with a real instrument one day, only to find that they cannot get an even tone from it.

A third sensor would make the key mechanism more expensive to build, harder to scan, and its data harder to process; it would render velocity curves and the scanning firmware more troublesome to design; and it puts us into the region of diminishing returns. My inner piano player finds it a bit of a shame that my inner critic can demolish the idea so readily, but perhaps one day I’ll be in a position to experiment. Although it’s too obvious to patent, it might turn out to be a missing link.

If you’ve ever tried to play a real harpsichord, you’ll know how disorientingly high the action is, and how there’s nothing else quite like it. If a keyboard player wants to emulate an organ, harpsichord or a similar Baroque-era mechanism without velocity sensitivity, it would be far more authentic if the actuation for the note happened when the upper key sensor triggered. And yet, I don’t know of any manufacturer that does this: the sound always triggers at the bottom of key travel. This is presumably because a player does not generally want to adjust his or her style just to try a different sound. Nevertheless, it’d be interesting to know if there’s any commercial demand for sensor settings that allow a player to practise as if playing an authentically old instrument. Does anybody out there need an 18th Century performance mode?

(Update: Apparently Clavia do allow triggering from either the top or bottom contact on their Nord keyboards. It also improves the feel of vintage synth emulations. Even more reason why Novation might be overdue an obligation-free firmware update or two. Many thanks to Matt Robertson for this correction, and for being successful enough to own a Nord.)

… Off the end of a plank

There are a few other key mechanisms about. A delightful company called Infinite Response places a Hall Effect sensor underneath every key, so that their instantaneous positions can be monitored throughout the keypress and release. There’s a mode on their controllers so you can see this happening: as a key travels downward it provides a continuous position readout. It’s beautiful to see, and it must take a lot of fast, parallel processing. Their keyboards are priced commensurately, which is one of many reasons why I don’t own one. The problems with this keyboard are the same as the problems with other novel performance interfaces. Firstly, one’s synthesiser or data processing has to be as sophisticated and rich as the keyboard’s data output to make the investment worthwhile; secondly, one has to relearn musicianship skills that have already taken two decades to bring to a modest level in order to exploit these features. There isn’t enough time to re-learn music unless somebody pays you to do it.

In theory, we could already measure the release speed of the key. We actually collect the appropriate data, and MIDI possesses a standard method whereby this could be conveyed to the synthesiser. And yet, we don’t supply this information: all velocities are conveyed homogeneously. Why is this? There are three reasons, locked in a circular argument. Firstly, although a slow release sounds a little different from a fast one on a real instrument, musicians tend not to use it as an effect because the ear is far less sensitive to offset details than to onsets. Secondly, as release velocity is not supported by most controller manufacturers, hardly any synthesisers support it. Thirdly, if synthesisers don’t generally support release velocity, how do we design a curve for it?
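The standard method in question is the Note Off velocity byte: a true 0x8n Note Off carries one, whereas the common shortcut of sending Note On at velocity zero throws it away. A small sketch (both message layouts are straight from the specification):

```python
def note_off(channel, note, release_velocity):
    """A true Note Off (0x8n) can carry a release velocity."""
    return bytes([0x80 | channel, note, release_velocity])

def note_off_cheap(channel, note):
    """The common shortcut: Note On at velocity zero, which is
    treated as a Note Off but conveys no release information."""
    return bytes([0x90 | channel, note, 0])

assert note_off(0, 60, 45)[2] == 45    # the release speed survives
assert note_off_cheap(0, 60)[2] == 0   # the release speed is lost
```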


Now I’ve given a glimpse of why our key mechanisms, and everyone else’s, are only precisely good enough for the job, I shall finish by turning my scattergun towards the next part of the signal chain: the latest piano synthesisers. There are still things I’ve never heard a piano synthesiser do. There are some wonderful keyboard mechanisms out there allied to cutting-edge, silicon-devouring modelling algorithms, but I haven’t yet heard a digital instrument that can seduce me away from the real thing. It’s not just sentimentality. Here’s an example of something that no digital piano can render properly: the last four bars of the piano part of Berg’s Vier Stücke for Clarinet and Piano Op.5.

The italic instructions to the pianist, for those whose German is as ropey as mine, are ‘strike inaudibly’ and ‘so quiet as to be barely heard’. The loud staccato clusters in the left hand set up a sympathetic resonance in the strings of the notes that the right hand is holding down. When the dampers finish their work, what remains is an ethereal, disembodied chord. Acoustic modelling just cannot render this yet. (He was a clever chap, Alban Berg. If there can be any silver lining to his tragic death in 1935, it’s that his works are now out of copyright.)

The reason a digital piano synthesiser can’t reproduce this fragment of Berg is that it cannot render anything correctly while the sustain pedal is being held down: there’s just not enough power to compute the resonances of every string interacting with every other. Those synthesisers that claim to model string resonances genuinely do so, but model only those strings that are being played, in mutual isolation. Real pianos aren’t so deterministic. This is why digital pianos still sound a little anaemic.

While we’re on the subject of the sustain pedal, it is an auditory event of its own on any real instrument. However, MIDI treats it as a control change message, so we never hear the warm fizz and the quiet wooden choonk as eighty-eight dampers disengage from their strings. We’re already modelling strings, a soundboard, and hammers, but a bit of mechanical noise and simulated felt adhesion are still too much to ask. Perhaps I haven’t researched this recently enough: it’s not so hard to blend a few samples. There seems to be a bit of an arms race going on in piano synthesiser verisimilitude, so things have probably changed recently. Can I download a Glenn Gould piano model yet, that hums along with the middle voice whenever I attempt to play Bach?

Let’s end positively. One thing I’ve heard some piano models begin to manage at last is the ability to flutter the sustain pedal carefully to mess about with the decay of notes. It’s an effect that has its place when used sparingly. It’s taken twenty years, but there may be hope for these algorithms yet.