A perennial question on messageboards, BBSs, and mailing lists of a certain kind is “what music do you tend to listen to when coding?” I’m a fairly lazy listener, so my answer tends to be “whatever’s on”. But Apple’s Genius Playlist feature has revolutionised the way I listen to music.
I’ve never really bothered with manual playlists – they take too much time to create, and have a short half-life before you get bored with them. The Recently Added smart playlist in iTunes is useful, because I often want to listen to a new album the whole way through. But I reckon that’s a fairly uncommon desire. Unfortunately, because of the way the menus on your iPod are laid out, it’s quite often what people end up doing.
Genius playlists allow you to choose a track from your collection that suits your mood. It’ll then generate an ad-hoc playlist of twenty-five tracks from your collection that match the choice you made. The result is a couple of hours’ listening that matches your mood. It works on iTunes on your Mac or PC, and on your iPod. You can save generated playlists for later, and regenerate one if it’s not a very good choice.
I don’t know how Apple generate the track listing. When you first switch on the feature, iTunes spends some time analysing the tracks in your collection, and talks to Apple about what you’ve got. It doesn’t seem to do any ‘rights’ analysis to see if you actually own the music in your library.
The thing is, the technology isn’t new. Audio analysis has been around for ages in the form of voice commands for computer systems (extant, but clearly not ubiquitous). British readers can use Shazam – call 2580 from your mobile phone, and the computer on the other end will listen to the music in the background for a little while, hang up, and text you back with the name of the track and the artist within a couple of seconds. It’s great fun to figure out how it works.
Amazon.com have been suggesting books you might like for years – they generate links between products based on what other people have looked at and bought. It’s a fairly good metric. Spam filters analyse documents for similar characteristics in much the same way, although they generally only have two buckets for the results – spam, and not spam. Google Sets understands the links between collections of things – enter ‘John’, ‘Paul’, and ‘George’, and Google will tell you that the missing member of the set is ‘Ringo’ (the miserable sod).
The spam filter is the odd one out here. The cost of a false positive for Google Sets, Amazon’s recommendations, and Genius playlists is fairly low. Google don’t lose any money if they mistakenly think you were naming the apostles. If you don’t take Amazon’s advice it’s really no skin off their nose (and if you buy something else instead, guess what? They get another data point and do better for the next guy). You can regenerate a Genius playlist you didn’t like, and what’s more: there’s a way that you can improve the results.
As well as showing a list of matching tracks in your collection, iTunes will also download a list of similar songs that you don’t own, and show you how you can buy them from the iTunes Music Store. All of these systems work better with more data. Google has plenty, so does Amazon. If you don’t have enough music in your iTunes library, it’s not Apple’s fault!
So, how might Apple be generating those track listings? I’ve got a couple of ideas:
- They might know the contents of people’s playlists. Assuming that tracks in a single playlist are similar, they’d create a weighting between those tracks.
- Similarly, they could do the same with tracks people bought in a single iTunes Music Store session.
- They might put a lesser weighting on the sum total of all the tracks that a single person has bought in the iTunes Music Store, based on the assumption that generally, different people listen to different kinds of music.
- They might do audio analysis on your music (or buy the data off someone else) – BPM, key instruments, genre, style, timbre, etc.
- They clearly know what music a person has in their iTunes library. I’d expect that they’ve got dynamic statistics too, like how highly people have rated various tracks, and how many times they’ve listened to a track. All this can be aggregated together, too.
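To make the first and third ideas a bit more concrete, here’s a rough sketch in Python. It is only a guess at the general shape of such a system, not Apple’s actual method: the function names, the track names, and the two weights are all made up for illustration. It counts how often two tracks turn up in the same playlist (a strong hint) or the same purchase history (a weaker one), then picks the tracks in your library most strongly linked to the one you chose.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights (my assumption, not Apple's): sharing a playlist
# counts for more than merely sharing someone's purchase history.
PLAYLIST_WEIGHT = 1.0
HISTORY_WEIGHT = 0.25


def build_similarity(playlists, histories):
    """Return a symmetric map: track -> {other track -> accumulated weight}."""
    similarity = defaultdict(lambda: defaultdict(float))
    for groups, weight in ((playlists, PLAYLIST_WEIGHT), (histories, HISTORY_WEIGHT)):
        for group in groups:
            # Every pair of distinct tracks in the same group moves a little closer.
            for a, b in combinations(sorted(set(group)), 2):
                similarity[a][b] += weight
                similarity[b][a] += weight
    return similarity


def genius_playlist(seed, library, similarity, length=25):
    """Pick the tracks from `library` most strongly linked to `seed`."""
    scores = similarity.get(seed, {})
    candidates = [t for t in library if t != seed and scores.get(t, 0) > 0]
    # Strongest link first; ties are broken alphabetically to keep results stable.
    candidates.sort(key=lambda t: (-scores[t], t))
    return [seed] + candidates[: length - 1]


# Made-up aggregate data from other users.
playlists = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
histories = [["A", "D", "E"]]

similarity = build_similarity(playlists, histories)
# My library has no "E", and nobody has ever paired "F" with "A".
print(genius_playlist("A", ["A", "B", "C", "D", "F"], similarity, length=3))
# prints ['A', 'B', 'C']
```

Tracks nobody has ever paired with your chosen track never make the cut, which is one reason a system like this would get better as more people feed it data.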
Lots of those things seem ‘hand-wavey’, but bear in mind that iTunes has millions of users, and assumptions based on aggregations of huge data sets often work. Ask yourself: what constitutes a false positive entry in a playlist? And who decides?