YouTube already tracks videos with the most favourites, but I’m interested in exploring which videos have the highest favourite-to-view ratios. Are there videos that almost every viewer favourites?

Two caveats that will skew findings:

  1. You need to have an account to favourite a video on YouTube.
  2. Once a video becomes popular, this ratio will decrease dramatically (see the 1% rule).

I’m curious to see how useful this statistic is for discovering interesting videos. Similarly, I’m also interested in videos with the highest ratings-to-view ratio, comment-to-view ratio, and others. I wonder how much these rankings differ from the top favourites, top rated, most discussed, etc.
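As a rough sketch of the idea, here is how videos could be ranked by favourites per view. The `Video` class, the sample numbers, and the `min_views` cutoff are all made up for illustration; none of this touches the real YouTube API.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical record; a real version would be filled from API data.
    title: str
    views: int
    favourites: int


def favourite_rate(video):
    """Fraction of views that resulted in a favourite (0.0 if no views)."""
    return video.favourites / video.views if video.views else 0.0


def rank_by_favourite_rate(videos, min_views=100):
    """Sort by favourite rate, skipping videos with too few views to trust."""
    eligible = [v for v in videos if v.views >= min_views]
    return sorted(eligible, key=favourite_rate, reverse=True)


# Invented sample data.
videos = [
    Video("Viral clip", 1_000_000, 5_000),
    Video("Niche tutorial", 2_000, 300),
    Video("Brand new upload", 10, 9),
]

for v in rank_by_favourite_rate(videos):
    print(f"{v.title}: {favourite_rate(v):.1%}")
```

The `min_views` cutoff is one guess at handling the second caveat in reverse: a video with 10 views and 9 favourites has a great ratio but tells you very little.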

When I get time, I’ll have to take a dive into the YouTube API to find out. Just bookmarking this thought for now…

Also, I wonder what algorithm YouTube uses to determine the most popular videos. The concept reminds me of Flickr’s interestingness.

If you’ve read Part 1 and Part 2 of this series, you’ll recall that I said I was really passionate about analyzing my music listening habits. This is where Last.fm comes in.

In January 2005, I was completely addicted to creating intricate iTunes Playlists & Smart Playlists. I wanted to statistically analyze my play counts and other info, but there was no means to do so. So I began looking for Internet-based tools and discovered Last.fm (formerly Audioscrobbler).

Last.fm is often compared to Pandora. While Pandora recommends music for you based on whether or not you told it that you liked the previous song(s) it played, Last.fm automatically logs information about the music you’re listening to on your computer, finds people who’ve listened to similar music and uses this to make recommendations to you. As you’ve probably guessed, I like Last.fm better because recommendations are based on my natural listening habits. (For reference, here’s my Last.fm profile).

However, my primary reason for joining Last.fm was not to discover new music. Rather, it was because Last.fm made all my music listening data available to me in a format that I could write software to analyze statistically.

And that’s exactly what I did. In August 2005, I created the website How Do You Listen To Music?. It’s a very simple website that uses your Last.fm profile to generate statistics about your music listening habits. I’ve been using Last.fm for the last 3 years, and in that time I’ve listened to over 40,000 songs on my computer, so I’ve got quite a lot of data for analysis.

Highlights include the following:

  • My top 10 artists only make up 20% of the music I’ve listened to. (Very eclectic eh?)
  • My top 25 artists make up 35% of the music I’ve listened to.
  • My top 50 artists make up 50% of the music I’ve listened to.
  • My top 15 artists make up 50% of the ~22000 times I’ve listened to my top 50 artists.

In October 2004, I came across an article by Chris Anderson in Wired Magazine called The Long Tail. Since then I’ve watched the idea evolve and he’s actually turned it into a book by the same name. In a nutshell, the notion of the Long Tail is that if you have unlimited shelf space, you can collectively sell more niche items compared to only selling popular items with limited shelf space. A typical example is how Amazon collectively sells more non-bestseller books than brick-and-mortar bookstores that only carry bestsellers.

Anyway, the idea of the Long Tail was on my mind while creating this web application, so you’ll notice numerous references to it. Similarly, I also wanted to know if the 80-20 rule was in effect for my music listening habits (i.e. did 80% of the music I listen to come from the top 20% of my artists?).
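The 80-20 question boils down to a small calculation. Here’s a minimal sketch, assuming the play counts are already sitting in a dictionary keyed by artist; the function name and the numbers are invented for illustration and aren’t taken from the actual web application.

```python
def share_from_top_fraction(play_counts, fraction=0.2):
    """Share of all plays that come from the top `fraction` of artists."""
    counts = sorted(play_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always look at one artist or more, even for tiny collections.
    top_k = max(1, round(len(counts) * fraction))
    return sum(counts[:top_k]) / total


# Invented play counts for ten artists (100 plays in total).
plays = {"A": 50, "B": 20, "C": 10, "D": 8, "E": 5,
         "F": 3, "G": 2, "H": 1, "I": 1, "J": 0}

share = share_from_top_fraction(plays)
print(f"Top 20% of artists account for {share:.0%} of plays")
```

If the result comes out near 80%, the 80-20 rule holds for that listener; a much lower number suggests a fatter tail.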

So basically, Last.fm has provided me with a playground for accurate, detailed analysis of my music listening habits. And as I continue to use it, the data set grows richer.

Future Ideas

I’ve been really busy lately, but I would love to return to this project and evolve the statistics I generate. I would also like to create a version of this app for iTunes, which is where my primary music listening stats lie.

To answer the original question I posed – I added lots of music artists to Facebook since Facebook enables you to query how many people in your network listen to the same music you do. For example, 40 people in the University of Waterloo network also listen to music by Nobuo Uematsu – the mastermind behind music from video games like Chrono Trigger and Final Fantasy.

YouTube is also fascinating because you can find fan-made videos of artists/songs, including people performing cover versions.

So much analysis to be done, so little time!

Anyways, this is the end of my iTunes/Last.fm/Music Analysis series. I hope you’ve enjoyed this glimpse into my passion for music listening analysis.

If I write some new applications or discover something incredible about music listening habits, don’t worry, I’ll be sure to blog about it.

Recall in Part 1 of this series I said “more on this later” when it came to music genres in iTunes. Take a quick look through your digital music collection. You’ll probably notice that the genres are often conflicting (e.g. Hip-Hop vs. Hip Hop), wrong, or nonexistent.

I used to manually update my ‘Favourites’ playlists. But of course, I always fell behind or had a hard time deciding whether or not a song was my ‘favourite’ song. To fix this, my plan was to create smart playlists that automatically picked my favourite songs in each genre (A favourite song is defined as a song rated 4/5 or 5/5 that I’ve listened to at least once).
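That favourite rule is easy to express in code. Here’s a minimal sketch, assuming each track is a simple record with a star rating from 0 (unrated) to 5; the `Track` class and sample tracks are hypothetical, and in iTunes itself this is done with Smart Playlist rules rather than code.

```python
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    genre: str
    rating: int       # stars: 0 (unrated) to 5
    play_count: int


def is_favourite(track):
    """Rated 4/5 or 5/5 and listened to at least once."""
    return track.rating >= 4 and track.play_count >= 1


def favourites_by_genre(tracks):
    """Group the names of favourite tracks under their genre."""
    result = {}
    for track in tracks:
        if is_favourite(track):
            result.setdefault(track.genre, []).append(track.name)
    return result


# Invented sample library.
library = [
    Track("Song A", "Jazz", 5, 12),
    Track("Song B", "Jazz", 4, 0),      # rated highly but never played
    Track("Song C", "Hip Hop", 3, 40),  # played a lot but rated too low
    Track("Song D", "Hip Hop", 4, 1),
]

print(favourites_by_genre(library))
```

Notice that the grouping is only as good as the `genre` field, which is exactly the problem.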

But this was problematic for me since these playlists were dependent on correct genres. So, there were no two ways about it – my genres had to be accurate.

If you’ve got 1000’s of songs like I do, you’re likely aware that it would take a ridiculously long time to manually update the genre of every single song. Well, when I started this task I had 1000’s of tracks to update genres for. Now all my tracks are tagged with a series of genres and fit into at least one of the fifteen genres I’ve defined. Here’s my secret.

Wikipedia + FixMyGenres

If you peruse music artists on Wikipedia, you’ll notice that many artists have very rich music genres listed in the Background Information section (e.g. Billie Holiday). Honky Tonk? Baroque Pop? West-Coast Gangster Rap? Torch Songs? Celtic Rock? There’s no way I could come up with such detailed genres as these. It was clear – I wanted these rich genres in iTunes.

Meet FixMyGenres. Instead of spending an obscene number of months manually updating all the genres, I spent a few hours writing a piece of software that is now named FixMyGenres.

FixMyGenres simply asks for the name of an iTunes playlist and then proceeds to update the genre of each song in the playlist using the genres listed in the artist’s Wikipedia entry.

Why use every single listed genre for an artist? Well, artists change genres unpredictably. Nelly Furtado went from Folk/Pop (I’m Like a Bird) to Hip Hop/R&B (Promiscuous Girl).

So using FixMyGenres, all I had to do was create an iTunes playlist with all the music I wanted tagged from Wikipedia and the genre in iTunes was automagically updated right before my very eyes.

If you’ve got mostly mainstream artists in iTunes, I’d say you’d get an 85%-95% success rate with FixMyGenres. My music is fairly eclectic and I had about a 70% success rate. In any case, I went from 1000’s to 100’s of songs to update genres for.

Anyway, FixMyGenres was a fun little programming project. Furthermore, I got to refresh my regular expression skillz to parse out music genres from Wikipedia entries :)
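To give a flavour of the regular expression part, here’s a rough sketch in Python. It is not the actual FixMyGenres code. It assumes you already have the raw wiki markup of an artist’s article as a string, that the infobox has a `genre` field, and that genres appear as `[[wikilinks]]`; the sample markup below is made up for illustration.

```python
import re

# The "| genre = ..." infobox field, up to the next field or the end
# of the template. Assumes one field per line starting with "|".
GENRE_FIELD = re.compile(
    r"^\|[ \t]*genre[ \t]*=(.*?)(?=^\||^\}\}|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

# A wikilink: [[Target]] or [[Target|Label]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def extract_genres(wikitext):
    """Return the genres linked in the infobox genre field, in order."""
    field = GENRE_FIELD.search(wikitext)
    if field is None:
        return []
    genres = []
    for target, label in WIKILINK.findall(field.group(1)):
        # Prefer the displayed label when the link has one.
        genres.append((label or target).strip())
    return genres


# Invented sample markup, not copied from a real article.
sample = """{{Infobox musical artist
| name = Example Artist
| genre = [[Vocal jazz]], [[swing music|swing]], [[blues]]
| occupation = Singer
}}"""

print(extract_genres(sample))
```

Real articles are messier than this (templates inside the field, plain-text genres, references), which is presumably a big part of why the success rate isn’t 100%.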

Well, that’s my secret to tagging music genres in iTunes. If you’d like a copy of the FixMyGenres application, let me know and I’ll get it to you. Requirements are Windows + iTunes (Hopefully a Mac version in the future).

I haven’t had a chance to prepare a proper release for FixMyGenres, but when I do it will be made available as a free download somewhere.

If you’ve added me as a friend on Facebook, perhaps you’ve been wondering why I have so many artists listed in my Facebook profile. The answer is actually quite simple – I simply LOVE analyzing my music listening habits.

Set aside some time and let me take you through my music listening analysis journey…

In fall 2003, the same year I began my university career, I discovered iTunes. Prior to that, I had been following all the rave reviews for the Mac version and simply could not wait to try it. After my first encounter, I loved it. Some of my friends complained that it was too slow but I always defended it.

What is it about iTunes that I love so much, you ask? Playlists. Specifically, Smart Playlists. Smart Playlists are simply amazing because they enable you to create playlists based on a variety of criteria and they automatically keep themselves updated!

Over the past 5 years, I’ve accumulated over 325 playlists in iTunes. It began innocently enough – creating a playlist for every album. But slowly, I became addicted to creating playlists. These days I’ve got Smart Playlists that capture my top 25 most played songs every season of the year.

There’s something magical about naturally listening to music then proceeding to view a very accurate analysis of your music listening habits.

Here’s how I’ve categorized my iTunes playlists:

  • Full Albums (~150 Playlists): Each playlist contains a complete album with all the tracks arranged in the correct order.
  • Partial Albums (~50 Playlists): Each playlist contains a partial album. Either songs were missing or they were too crappy to be included.
  • Bookmarks (~10 Playlists): Ever listen to music on random and come across a song you completely forgot about? This happens to me all the time, so I ‘bookmark’ these tracks. Each bookmark playlist has 50 songs.
  • Compilations (~30 Playlists): These are the best of the best. Cream of the crop. There are 10 songs per compilation and once a song is placed in a compilation playlist, it does not get removed. These playlists have enabled me to accurately analyze how my music taste has changed over 5 years.
  • Genres (~15 Playlists): These Smart Playlists automatically aggregate all my iTunes music into the right genre(s). More on this later.
  • Favourites (~15 Playlists): For each genre, a favourite song is defined as a song I’ve rated 4/5 or 5/5 and that I’ve listened to at least once. As long as I rate my music, these playlists are automatically updated.
  • Top 10’s by Genre (~15 Playlists): For each genre, these are my handpicked top 10 favourite songs.
  • Reflection (~3 Playlists): These are playlists reserved for use during periods of immense thought. When I need to reflect on life, I like to listen to music from these playlists.
  • Smart Playlists (~20 Playlists): These playlists are updated in real-time based on my settings: Songs I’ve listened to but haven’t rated; Songs I haven’t listened to yet; Songs I’ve rated 4 or 5 star but listened to less than 10 times; Songs I’ve listened to over 25, 50, 100, 200, 300, 400 times.
  • Top 25’s by Season (~25 Playlists): Over the past 5 years, I’ve either been attending University or on co-op every four-month term. These Smart Playlists automatically define my Top 25 Most Played songs during each season of every year.

Sometimes you hear a song from your childhood on the radio and reminisce. By comparison, the Top 25’s by Season playlists are like reminiscence on steroids. There are top 25 songs for every season since fall 2003. I can walk around the same streets I used to years ago while listening to the exact same music I listened to in those times. How amazing is that? You haven’t experienced nostalgia until you’ve experienced intricate playlists like these.

And of course the “eureka” moment happened in summer of 2004 when I purchased my first iPod. Yup, that’s right – every single aforementioned playlist was automatically synchronized to my iPod. Now all these playlists follow me wherever I go.

In January 2005, I installed the Last.fm plugin (formerly known as Audioscrobbler) on my computer to track my music listening habits. I already knew that I listened to a wide variety of artists, but I wanted to know more about how I listen to music. According to Last.fm, I’ve since listened to over 30,000 songs on my computer. For reference, I have 4,333 songs on my primary computer.

A Brief Look at Radio

Although I rarely listen to radio, the consensus among my friends and family is that commercial radio stations generally play the same songs over and over again. It is likely that the music labels want to ingrain popular music in our brains so that we’ll remember to buy the artist’s CD when it is released. In contrast, if radio stations always played new music, we would always be listening to something different. A River of Music, if you will. With the latter approach, it would be harder for terrestrial radio stations to accurately track what music was most popular, as the popularity of a song would significantly depend on the number of people listening to the station at the time the song was played. However, a river-of-music style of radio seems more suitable for Internet-based radio stations (e.g. Pandora) since a system could be built for listeners to flag/bookmark songs they liked.

The bottom line is that commercial radio in its current form is hit-driven. Most commercial radio stations simply play the “hits”, leaving little room for discovering new music. Before the advent of digital music, almost all the music sold to us was in the form of entire albums by a single artist/band. The idea of singles never really received significant attention from music labels.

A Long Tail For Our Music Listening Habits?

In the past, some of us may have recorded music from the radio onto a cassette. However, we were still dependent on the hit-driven nature of radio. With the advent of digital music, we’ve gained full control of the music we listen to. From burning a CD of our favourite songs to creating playlists for our mp3 players, we’ve redefined how we listen to music. The question is, are our music listening habits still hit-driven or do we listen to music in a more diverse manner? Is there a long tail for our music listening habits? If so, how much value is in the tail?

Being curious about my music listening habits, I wrote a small application to calculate some statistics about how I listen to music, based on the data I’ve submitted to Last.fm. It is important to note that since Last.fm tracks all audio played on my computer, error is introduced by non-music audio like podcasts and audio books.

Here are some of my findings:

Of the 30,000+ songs I’ve listened to, 20% came from my top 10 artists, 35% came from my top 25 artists, and 50% came from my top 50 artists. Half the music I listen to lies outside of my top 50 artists!

Other interesting stats include:

  • My top 50 songs make up 13% of all the music I’ve listened to
  • My top 16 artists make up 50% of the 15713 times I’ve listened to music by my top 50 artists
  • My top 21 songs make up 50% of the 4032 times I’ve listened to my top 50 songs
  • My top 50 albums make up 10% of all the music I’ve listened to
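For the curious, both kinds of figures (“my top N make up X%” and “my top K make up half of my top 50”) come down to sorting play counts and summing. Here’s a minimal sketch of that arithmetic; the function names and sample counts are invented for illustration and this is not the application’s actual code.

```python
def top_n_share(counts, n):
    """Share of all plays that come from the n most-played items."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:n]) / total if total else 0.0


def items_needed_for_share(counts, share=0.5, within_top=50):
    """Smallest number of top items whose plays reach `share` of the
    plays of the `within_top` most-played items."""
    top = sorted(counts, reverse=True)[:within_top]
    target = share * sum(top)
    running = 0
    for k, count in enumerate(top, start=1):
        running += count
        if running >= target:
            return k
    return 0  # no items at all


# Invented play counts per artist (100 plays in total).
counts = [40, 30, 10, 10, 5, 5]

print(f"Top 2 share: {top_n_share(counts, 2):.0%}")
print("Artists needed for half of the top 4:",
      items_needed_for_share(counts, share=0.5, within_top=4))
```

The smaller the second number is relative to `within_top`, the more hit-driven the listening is.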

The tail of my music listening habits appears to be extremely valuable. I already knew I listened to a variety of music, but I had not anticipated that I listened to so much music outside of my top 50 artists.

As a contrast, consider the long tail of one of Britney Spears’ top fans on Last.fm, elguapo17.

His/her top 10 artists account for 74% of the 10000+ songs he/she has listened to (since December 2005). Interestingly, his/her top 50 artists make up a whopping 88% of the music he/she has listened to!

Other interesting stats include:

  • His/her top 50 songs make up 39% of all the music he/she has listened to
  • His/her top 2 artists make up 50% of the 8957 times he/she has listened to music by his/her top 50 artists
  • His/her top 19 songs make up 50% of the 3968 times he/she has listened to his/her top 50 songs
  • His/her top 50 albums make up 13% of all the music he/she has listened to

Comparatively, there is less value in this tail. The way elguapo17 listens to music is still largely hit-driven. I wonder, do all the top fans of an artist listen to music in a hit-driven manner? I’ve noticed that my Last.fm neighbours have music listening habits similar to mine (as expected, I suppose).

It is interesting to note that for both elguapo17 and me, our top 50 albums comprise very little of the total number of songs we have listened to. Perhaps this is a testament to the possibility that we rarely listen to music by albums? Or maybe our music collections are vastly mistagged.

Are you a Last.fm user? If so, visit How Do You Listen To Music to find out how hit-driven your music listening habits are.

The Future Long Tail of Music

As more diverse music becomes easily available to us, I imagine the value in the tail will significantly increase. Perhaps this is already happening – consider how successful independent bands like Wilco have become thanks to their Internet fans. But who knows, maybe our music listening habits are naturally hit-driven and I’m an outlier.