"Correct" Music Note Structure?

8 comments, last by alvaro 11 years, 4 months ago
I am programming a basic MIDI sequencing tool. I'm starting with the core program and internal music representation, then moving on to the UI, and finally to interfacing with an audio output API (I don't yet know which API I'll use, but something MIDI-related). I have written a first draft of my internal representation of a musical note and its channel. I'm asking for advice on both my note structure and which API to use for MIDI audio development (real-time efficiency must also be considered). Here is my note structure as of now:
[source lang="cpp"]struct note {
char midiVal;
/* MIDI value, or note value */
char timeStmp [3];
/* Time value, sometimes called timestamp, basically when the note will play on the timeline.
It is organized by mm:ss:ms, or minutes, seconds, milliseconds, respectively. As you guess,
there is an inherent limit of 99 minutes per song. */
char channel;
/* The software channel. There are 0-255 channels. Channels organize groups of similar notes, like a
"nation of notes" if you will. Channels can be manipulated as one whole unit or as individual notes. */
float physChannel;
/* The actual physical channel, of no relation to the previous data. It is, of course, left and right audio streams, and each note will
contain data for how much it will blend on either side of the audio stream. A positive number is right stream, negative is left. */
char duration [3];
/* The duration of the note based on the same timing conventions used earlier for timestamp data. Used to specify the duration,
and yet again, the limit for a single note's duration is 99 minutes. pah. */
char instrument;
/* Specifies, out of a bank of a mere possible 256, which instrument a not belongs to. */
char volume;
/* Dictates what volume a note will be. based off of a scale of 0-255, zero being mute, and 255 being
the mortal enemy of your grandmother. */
ULINT name;
/* Not really a name, more like a serial number. Anyway, this puts a limit on possible notes as well (do you like my limits?).
Now, there is a ceiling of only 4,294,967,296 notes in a piece, and that piece can be 99 minutes long. */
};[/source]
The code comments can be a bit basic at times, but they were written so that someone with no idea of what I am doing could pick up at least a little of what I'm talking about and recognize similarities with other APIs. I didn't mean to offend anyone with dumb humor or elementary explanations of basic subjects.

EDIT: "ULINT" is an unsigned long integer, fyi.

C dominates the world of linear procedural computing, which won't advance. The future lies in MASSIVE parallelism.

There are several things I would do differently.

For starters, things will be much easier if you make your timestamps a more reasonable type. Say, a float expressing seconds from the beginning of the song, or perhaps an integer number of milliseconds. Similarly for the duration.

I would make the note value a float, to allow for microtonality.

I don't know if there is any reason why a note should know what channel it belongs to. Presumably there will be a struct "channel" which will contain the notes in that channel, and you don't need to repeat the information of what the channel is inside each note. It is possible that you have a good reason to do that, but I don't know what it is.

`physChannel' is not very descriptive. It is known as "pan" in MIDI, so perhaps you should consider changing its name to something like that.

I would probably have used a 16-bit integer for the instrument. I know there are synthesizers with more than 256 instruments, and it's probably not worth saving the extra byte, given that people have already bumped into this limit at some point.


Similarly to the channel, I don't think I would make the identifier a part of the note. If some container of notes wants to locate them using an identifier, that's great, but I think that should be part of the container, not the contained type. (In C++ I would perhaps store the notes in an object of type std::map<unsigned, note>, where the unsigned integers are the identifiers.)

[source lang="cpp"]struct note {
    float pitch;      // in semitones, with 60 being C4, like in MIDI
    float timestamp;
    float duration;
    short instrument;
    char volume;
    char pan;
};[/source]
Oh, one more thing. You should reorder the elements in your structs from largest to smallest, or you may end up with a bunch of padding (unused bytes to guarantee proper alignment of some types) that might make your struct larger than it needs to be.
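
For example, here's a minimal sketch of the effect (the exact sizes depend on the compiler and target, so treat the numbers in the comments as typical rather than guaranteed):

[source lang="cpp"]#include <cstdio>

// char, then float, then char: the float needs 4-byte alignment, so padding
// is inserted before it and again at the end -- typically 12 bytes total.
struct padded {
    char  volume;
    float timestamp;
    char  pan;
};

// Same members, largest first: typically 8 bytes total.
struct reordered {
    float timestamp;
    char  volume;
    char  pan;
};

int main() {
    std::printf("padded:    %zu bytes\n", sizeof(padded));
    std::printf("reordered: %zu bytes\n", sizeof(reordered));
}
[/source]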
Thank you, Álvaro, for your good advice and information; it is very much appreciated. By the way, do you have any suggestions on which API I should use?

C dominates the world of linear procedural computing, which won't advance. The future lies in MASSIVE parallelism.

I have worked with MIDI a ton and long ago I made a tool that generated valid MIDI files (the tool’s goal was to algorithmically generate music—it worked but the music it generated sucked ass).

Álvaro is correct about everything but the time. The times/durations need a resolution no coarser than microseconds; they should be stored as ULINTs.
Today's software has resolutions of up to around 960 PPQN (possibly more); at 250 BPM that gives a resolution as fine as 250 microseconds between events.

The variable-length time stamps inside the MIDI files store the number of ticks between each successive event. For efficient run-time performance you should convert all of these event time stamps into literal times, which is why you need to store raw microsecond values. Internally you will still need to maintain this tick-style format so that you can add/remove events reliably and change the tempo, etc., without losing precision, but before playing a song you should make a quick prepass to convert all those ticks into absolute times.
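
A rough sketch of that prepass, assuming a hypothetical event list with delta-time ticks and a single fixed tempo (real files can change tempo mid-song, so you would handle tempo meta events in the same pass):

[source lang="cpp"]#include <cstdint>
#include <vector>

struct TickEvent {
    uint32_t deltaTicks;   // ticks since the previous event, as stored in the file
    // ... other event data ...
};

struct TimedEvent {
    uint64_t microseconds; // absolute time from the start of the song
    // ... other event data ...
};

// Convert delta ticks to absolute microseconds.
// tempoUsPerQuarter is the MIDI tempo (microseconds per quarter note);
// ppqn is the file's pulses-per-quarter-note resolution.
std::vector<TimedEvent> ToAbsoluteTimes(const std::vector<TickEvent>& events,
                                        uint32_t tempoUsPerQuarter,
                                        uint32_t ppqn) {
    std::vector<TimedEvent> out;
    out.reserve(events.size());
    uint64_t ticks = 0;
    for (const TickEvent& e : events) {
        ticks += e.deltaTicks;
        TimedEvent t;
        t.microseconds = ticks * static_cast<uint64_t>(tempoUsPerQuarter) / ppqn;
        out.push_back(t);
    }
    return out;
}
[/source]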


Volume only ranges from 0 to 127, by the way. Most MIDI values do.

For anything you think will be in the range from 0-255, use an unsigned char, not a char. No reason to bite yourself in the ass with a useless sign bit, especially when shifting things.

For the instrument patch you are storing far too little information. My Yamaha MOTIF XF8 has 1,353 voices, and this is fairly common these days.
You need to look into the MSB/LSB system for selecting banks and patches.
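
As a rough sketch of what that looks like on the wire (standard MIDI bytes; the helper name and the idea of batching them into one buffer are just for illustration):

[source lang="cpp"]#include <cstdint>
#include <vector>

// Select a bank and patch on a MIDI channel (0-15).
// Bank Select is Control Change #0 (MSB) and #32 (LSB), followed by a
// Program Change; the new bank only takes effect at the Program Change.
std::vector<uint8_t> SelectPatch(uint8_t channel, uint8_t bankMsb,
                                 uint8_t bankLsb, uint8_t program) {
    return {
        static_cast<uint8_t>(0xB0 | channel), 0x00, bankMsb,  // CC 0:  Bank Select MSB
        static_cast<uint8_t>(0xB0 | channel), 0x20, bankLsb,  // CC 32: Bank Select LSB
        static_cast<uint8_t>(0xC0 | channel), program         // Program Change
    };
}
[/source]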


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


[quote]Álvaro is correct about everything but the time. The times/durations need a resolution no coarser than microseconds; they should be stored as ULINTs.
Today's software has resolutions of up to around 960 PPQN (possibly more); at 250 BPM that gives a resolution as fine as 250 microseconds between events.[/quote]


That makes sense. I picked milliseconds because that's what he was using, according to his comments (although he was trying to fit the milliseconds field in a single byte...).

[quote]For anything you think will be in the range from 0-255, use an unsigned char, not a char. No reason to bite yourself in the ass with a useless sign bit, especially when shifting things.[/quote]

Oh, yes. This is important, and using `char' by itself is even worse than that: Whether `char' is signed or not depends on the compiler, so you should definitely say explicitly which one you want. Sorry I missed that.
Bit of a brainfart here, but hopefully something useful:

As L. Spiro says, don't store your times as floats or something like that; store them as ticks.

Sequencers commonly work on a scale of PPQN (pulses per quarter note), so if you adjust the tempo (one-off, or gradually throughout a song) it just *works*. The PPQN values are usually things like 48, 96, 192, etc.

Bear in mind that if you are doing 4/4 music that's all good, but if you are using triplets, or groove, you'll want the PPQN divisible by 3, and with enough precision for your 'groove'.

You'll also probably want to store your note timings as offsets from the start of a pattern, rather than the start of the song. This way you can have several instances of the same pattern at different points in the song.

Also, instead of storing things as e.g. char[3] to save space, it's probably more sensible just to make them 4-byte unsigned ints / ints and keep your structures 4-byte aligned, so you (or the processor) aren't faffing about for no reason. You can always compress them on import / export if you really need to.

Another reason for PPQN is so you can easily change the output sample rate (assuming you are going to do some audio instead of purely MIDI).

I've done several audio / sequencing apps and don't think I stored anything as floats. PPQN can be used to calculate the exact sample for an instrument to start / end (and you might precache this kind of info). You could possibly use something more accurate to get within sample accuracy for timing, but I've never bothered myself.
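
For instance, a quick sketch of that calculation (the function name and the fixed tempo are just for illustration):

[source lang="cpp"]#include <cstdint>

// Convert a position in ticks to a sample index in the output stream.
// Seconds per tick = 60 / (bpm * ppqn); multiply by the sample rate.
uint64_t TickToSample(uint64_t tick, double bpm, uint32_t ppqn, uint32_t sampleRate) {
    const double secondsPerTick = 60.0 / (bpm * ppqn);
    return static_cast<uint64_t>(tick * secondsPerTick * sampleRate);
}

// e.g. at 120 BPM, 96 PPQN and 44100 Hz, one quarter note (96 ticks)
// spans 22050 samples.
[/source]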

It's really worth using a plugin architecture for the different components of a sequencing / audio app; I'd highly recommend it. You can make effects plugins (reverb, delay, chorus, etc.) and instrument plugins. You could potentially also use VST plugins or similar if you can work out their interface (you may find some open source apps that have managed this).
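
A bare-bones sketch of what such a plugin interface might look like (just one way to cut it, not how VST or any particular host does it):

[source lang="cpp"]// A minimal audio-plugin interface: effects and instruments both implement
// Process(); instruments additionally react to note events.
class AudioPlugin {
public:
    virtual ~AudioPlugin() {}

    virtual void SetSampleRate(double sampleRate) = 0;

    // Process one block of audio; left/right are non-interleaved stereo buffers.
    virtual void Process(float* left, float* right, int numSamples) = 0;
};

class InstrumentPlugin : public AudioPlugin {
public:
    virtual void NoteOn(int key, int velocity) = 0;
    virtual void NoteOff(int key) = 0;
};
[/source]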

I'm currently rewriting a sequencer / audio app I wrote a few years ago, and have actually moved to using plugins for things like quantization / legato / groove / arpeggios. Have a think about whether you want to be able to do stuff like 'undo' quantization, keep original values, or have a modification 'stack' applied to notes.

I don't think you'll get the exact structures bang on first time; it's the kind of thing where you write a first version, then realise there's a better way of doing it, redo it, and so on. But it is fairly easy to get something usable. You may also spend as much time on user interface / editing features as on the stuff 'under the hood'.

As for APIs, I have so far cheated and don't actually use MIDI input or output (although I have done that in the distant past and it wasn't that difficult, I don't think). I have just been writing a MIDI file importer though, so I'm refreshing my memory, lol.

If you want realtime MIDI input you'll have to pay much more attention to latency and the APIs you use. I was just getting by with the old Win32 audio functions for primary / secondary buffers, but the latency is awful, so using DirectSound (or I think there may be a newer API in Windows 7) would be better. Sorry I can't help with that yet, as I haven't researched it myself.

Also I'd add: consider using Direct3D or (in my case) OpenGL to accelerate the graphics side. This way you can easily show the position within a song without overloading the CPU, causing stalls and making your audio stutter.

Once you start doing the audio side a bit of SSE / SIMD stuff helps. And you have to think carefully about how you'll structure your tracks / sends to effects, to make it efficient but also customizable.
More stuff:

Note pitch: I'd stick with just a note number like MIDI for now, and the 12-note western scale. 99% of music is written like this, and handling other systems is a bit more advanced and something you can tack on later. Storing notes as float frequencies is something I wouldn't recommend, for several reasons: accuracy (say you transpose down, then up later), and the fact that frequencies don't have a linear relationship with the note number. You might want to do operations based on the relative pitches of notes, or detect chords, etc. All of this would be stupidly difficult if you were just storing wavelengths / frequencies. Besides, your source instruments may have different base frequencies anyway, and these would need to be compensated for.
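
For instance, with integer note numbers these operations are just integer arithmetic (a rough sketch; the frequency formula assumes standard equal temperament with A4 = 440 Hz):

[source lang="cpp"]#include <cmath>

// Transposition is just addition on note numbers.
int Transpose(int midiNote, int semitones) { return midiNote + semitones; }

// Interval detection (e.g. a major third is 4 semitones) is just subtraction.
bool IsMajorThird(int lower, int upper) { return upper - lower == 4; }

// Frequency only matters at synthesis time; convert at the last moment.
double NoteToFrequency(int midiNote) {
    return 440.0 * std::pow(2.0, (midiNote - 69) / 12.0);  // 69 = A4
}
[/source]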

Pan: Why limit yourself to stereo pan? What about surround sound?

Channels / instrument info on a note: would you want the note to determine this, or the track and/or pattern? Having a 'grouping' feature for notes can be useful, though. Remember you are going to want to be able to edit the instruments you are using quickly and easily, and not have to change this for every note.

What happens when, by accident, you set two bunches of notes to the same instrument ID (if it's stored on the notes)? You have then lost their 'individuality'. Better to store something else that then maps to the instrument.

Volume: This is usually key velocity rather than volume (there is MIDI volume as well, but you wouldn't store that per note; it's a separate event), which in MIDI is 0-127. There is also release velocity, which may or may not be used by the instrument.

There's also other stuff like pitch bend, aftertouch, etc., which you can store as separate events.

Note name / ID: why try to store this on the note? If your pattern has, e.g., an array or vector of 35 notes, then you know its ID as you access it.

An example to start with might be something like this:

[source lang="cpp"]class Note
{
public:
    int m_iStartTime;                 // in PPQN; this could be negative if you want some notes to start before the official start of the pattern
    unsigned int m_uiLength;          // in PPQN
    unsigned int m_uiKey;             // e.g. like MIDI, with middle C as 60
    unsigned int m_uiVelocity;        // 0-127?
    unsigned int m_uiReleaseVelocity; // 0-127?
};[/source]
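
And, following on from the point about not storing IDs and instrument info on the notes themselves, a pattern might then look something like this (again just a sketch; the names are made up):

[source lang="cpp"]#include <string>
#include <vector>

class Pattern
{
public:
    std::string m_sName;
    unsigned int m_uiInstrumentId;   // maps to an instrument elsewhere, not stored per note
    std::vector<Note> m_Notes;       // a note's index in this vector serves as its ID
};
[/source]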

Once you have a simple system working then it will become more obvious where to add things.

To reiterate on the notes side of things: don't worry so much about saving space; just concentrate on simplicity. Note data doesn't tend to be that large. It's more when you get to the audio side that you need to pay attention to the data structures / bottlenecks.

And rather than just having a struct-like class you can use accessor functions so the actual data underneath can be anything you want.
In MIDI, AFAIR, track voice info is not bound to notes but is controlled by channel events, and is composed of a bank and a voice.

Of course, since you're abstracting away the MIDI protocol, you could put that info on notes. But you'll have to cope with those differences when you're actually sending MIDI data to the MIDI devices.

E.g. sending the voice change event would require you to look ahead in the note stream to see if the next note has a different voice program, and actually send the voice change event a little earlier.
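
Roughly, that scheduling logic might look like this (a sketch over hypothetical note/event types, not a complete scheduler; the output may still need a stable sort by time afterwards):

[source lang="cpp"]#include <cstdint>
#include <vector>

struct SeqNote {
    uint64_t startUs;  // absolute start time in microseconds
    uint8_t  channel;  // MIDI channel 0-15
    uint8_t  program;  // desired voice/program for this note
    uint8_t  key;
    uint8_t  velocity;
};

struct MidiEvent {
    uint64_t timeUs;
    std::vector<uint8_t> bytes;
};

// Walk the (time-sorted) notes and emit a Program Change slightly before any
// note whose program differs from what the channel is currently set to.
std::vector<MidiEvent> BuildEvents(const std::vector<SeqNote>& notes, uint64_t leadUs) {
    uint8_t currentProgram[16] = {};
    bool    programKnown[16]   = {};
    std::vector<MidiEvent> out;
    for (const SeqNote& n : notes) {
        if (!programKnown[n.channel] || currentProgram[n.channel] != n.program) {
            const uint64_t t = (n.startUs > leadUs) ? n.startUs - leadUs : 0;
            out.push_back({ t, { static_cast<uint8_t>(0xC0 | n.channel), n.program } });
            currentProgram[n.channel] = n.program;
            programKnown[n.channel]   = true;
        }
        out.push_back({ n.startUs,
                        { static_cast<uint8_t>(0x90 | n.channel), n.key, n.velocity } });
    }
    return out;
}
[/source]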

[quote]Note pitch: I'd stick with just a note number like MIDI for now, and the 12-note western scale. 99% of music is written like this, and handling other systems is a bit more advanced and something you can tack on later. Storing notes as float frequencies is something I wouldn't recommend, for several reasons: accuracy (say you transpose down, then up later), and the fact that frequencies don't have a linear relationship with the note number. You might want to do operations based on the relative pitches of notes, or detect chords, etc. All of this would be stupidly difficult if you were just storing wavelengths / frequencies. Besides, your source instruments may have different base frequencies anyway, and these would need to be compensated for.[/quote]


I still think I would use a float for this, but instead of the frequency it would be a number of semitones (which is 12*log2(frequency) + some_constant). You don't magically lose precision when you add and subtract integer values, even if those integers are stored in floating-point variables.
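
Something like this, for instance (a sketch; the 69/440 constants assume standard MIDI numbering with A4 = 440 Hz):

[source lang="cpp"]#include <cmath>

// Pitch stored as a float number of semitones, MIDI-style (60.0 = C4, 69.0 = A4).
// Integer-valued pitches behave exactly like integers under + and -, and
// quarter-tones etc. are just fractional values (e.g. 60.5).
double SemitonesToFrequency(double semitones) {
    return 440.0 * std::pow(2.0, (semitones - 69.0) / 12.0);
}

double FrequencyToSemitones(double frequency) {
    return 69.0 + 12.0 * std::log2(frequency / 440.0);
}
[/source]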

This topic is closed to new replies.
