# Fun with iTunes Shuffle and Probabilities

I recently tagged and imported all my MP3 files into iTunes. I then noticed that there were lots of albums I had only partially listened to, so I decided to use the “Party Shuffle” feature to play my library randomly and eventually hear all the songs.

After a couple of weeks, I observed that some songs reappeared in the playlist, having been picked twice. Over the weeks these “re-entry” songs became more frequent, with the direct consequence that new music was played less and less. Even though I had already realized that it would not be possible to hear all the songs with this approach, I was still surprised by the “re-entry” rate, which I had intuitively expected to be much lower.

I turned to probability to better understand the situation.

Let n be the size of my library. After t songs played randomly, one might expect the probability that a given song was played at least once to be:

P( song played at least once ) = t / n.

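Absolutely not! For one thing, t / n exceeds 1 as soon as t > n. Assuming each song is picked uniformly at random and independently of earlier picks, a single pick misses a given song with probability (n-1)/n, so all t picks miss it with probability ((n-1)/n)^t. The probability we want is therefore 1 minus the probability that the song was never played: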

P( song played at least once ) = 1 – (( n-1 )/ n)  ^ t
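A quick way to check this formula is to compare it with a simulation. The sketch below assumes the same model as the formula: every pick is uniform over the library and independent of earlier picks (i.e. with replacement). How Party Shuffle actually picks songs is not known here, and the function names are hypothetical.

```python
import random


def p_played_at_least_once(n: int, t: int) -> float:
    """Exact probability that a given song was played at least once in t picks."""
    return 1 - ((n - 1) / n) ** t


def simulate_at_least_once(n: int, t: int, trials: int = 5_000, seed: int = 1) -> float:
    """Monte Carlo estimate: t uniform picks with replacement, repeated many times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Track song 0 as "the given song"; by symmetry any index gives the same answer.
        if any(rng.randrange(n) == 0 for _ in range(t)):
            hits += 1
    return hits / trials


n, t = 500, 300
print(f"naive t/n : {t / n:.3f}")
print(f"exact     : {p_played_at_least_once(n, t):.3f}")
print(f"simulated : {simulate_at_least_once(n, t):.3f}")
```

With n = 500 and t = 300 the naive guess gives 0.600, while the exact formula gives about 0.452; the simulated estimate should land close to the exact value, not the naive one.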

More generally, the probability that a given song was played exactly x times follows a binomial distribution:

P( x ) = (1/n)^x * ( (n-1) / n )^(t-x) * C( t, x )

where C(t, x) is the number of combinations, i.e. the number of ways to choose which x of the t picks landed on that song. Expanded:

P( x ) = (1/n)^x * ( (n-1) / n )^(t-x) * t! / ( (t-x)! x! )

Note that the probability that the song was never played (x=0) is still (( n-1 )/ n)  ^ t.
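The distribution can be sketched in a few lines with `math.comb`. This assumes, as above, that the t picks are independent and each lands on the song with probability 1/n, so the coefficient is C(t, x) = t! / ((t-x)! x!); `p_played_exactly` is a hypothetical helper name.

```python
from math import comb


def p_played_exactly(n: int, t: int, x: int) -> float:
    """Probability that a given song was played exactly x times in t uniform picks.

    Binomial distribution: t independent picks, each hitting the song with probability 1/n.
    """
    if not 0 <= x <= t:
        return 0.0  # impossible counts
    return comb(t, x) * (1 / n) ** x * ((n - 1) / n) ** (t - x)


n, t = 500, 300
for x in range(4):
    print(x, round(p_played_exactly(n, t, x), 4))
```

Setting x = 0 makes both `comb(t, 0)` and `(1 / n) ** 0` equal to 1, which leaves ((n-1)/n)^t, the never-played probability.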

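After t songs, the sum P(0) + P(1) + … + P(t) = 1 (a consequence of the binomial theorem), which is a good sanity check on the formula, although it does not by itself prove that the formula is correct.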
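Floating-point rounding makes a sum of many small terms only approximately 1, so a minimal sketch of this sanity check can use exact rational arithmetic instead. It assumes the binomial form with coefficient C(t, x); the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf_exact(n: int, t: int, x: int) -> Fraction:
    """Exact (rational) probability that a given song was played exactly x times."""
    p = Fraction(1, n)
    return comb(t, x) * p**x * (1 - p) ** (t - x)


def total_probability(n: int, t: int) -> Fraction:
    """P(0) + P(1) + ... + P(t), computed exactly; the binomial theorem says this is 1."""
    return sum(pmf_exact(n, t, x) for x in range(t + 1))


print(total_probability(500, 50))  # prints 1
```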

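The average number of distinct songs played after t songs can be computed by linearity of expectation, since each of the n songs contributes its probability of having been played at least once: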

Avg. played

= n * P( song played at least once )

= n * ( 1 – ((n-1)/n)^t ) = n – (n-1)^t  / n^(t-1)
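A minimal sketch comparing this expectation with a simulated shuffle, again assuming independent uniform picks with replacement (the function names are hypothetical):

```python
import random


def expected_distinct_played(n: int, t: int) -> float:
    """Expected number of distinct songs heard after t uniform picks from n songs."""
    return n * (1 - ((n - 1) / n) ** t)


def simulate_distinct_played(n: int, t: int, trials: int = 2_000, seed: int = 7) -> float:
    """Average number of distinct songs over many simulated shuffles (with replacement)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # A set keeps each song once, so its size is the number of distinct songs heard.
        total += len({rng.randrange(n) for _ in range(t)})
    return total / trials


n, t = 500, 300
print(round(expected_distinct_played(n, t), 1))
print(round(simulate_distinct_played(n, t), 1))
```

For n = 500 and t = 300 the formula gives about 225.8 distinct songs, noticeably fewer than the 300 songs actually played.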

The probability that the next song is a new one can be computed with (n - avg. played) / n, which equals the probability that a given song was never played, P(x=0) = ((n-1)/n)^t. The “re-entry” rate is its complement, 1 - P(x=0).
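The relationship can be sketched directly, assuming the next pick is uniform over the library and independent of earlier picks; the helper names are hypothetical.

```python
def p_next_is_new(n: int, t: int) -> float:
    """Probability that pick t+1 is a song not heard in the first t picks.

    The next pick is uniform, so this is the expected unheard fraction of the
    library: (n - avg. played) / n, which simplifies to ((n-1)/n)^t.
    """
    avg_played = n * (1 - ((n - 1) / n) ** t)
    return (n - avg_played) / n


def reentry_rate(n: int, t: int) -> float:
    """Probability that pick t+1 repeats an already-played song."""
    return 1 - p_next_is_new(n, t)


for t in (0, 100, 300, 500):
    print(t, round(p_next_is_new(500, t), 3), round(reentry_rate(500, t), 3))
```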

The graph below shows the probability that a song was never played for a library of 500 songs, after 0, 50, 100, etc. songs. It’s interesting to notice that the probability of hearing a new song falls below 50% after roughly 350 songs (precisely, after 347 songs, since 0.998^347 ≈ 0.499).
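The numbers behind such a graph can be reproduced with a short sketch, assuming independent uniform picks from a 500-song library. The crossing point comes from solving ((n-1)/n)^t < 0.5 for t with logarithms; the helper names are hypothetical.

```python
import math


def p_never_played(n: int, t: int) -> float:
    """Probability that a given song has not been played after t uniform picks."""
    return ((n - 1) / n) ** t


def first_t_below(n: int, threshold: float) -> int:
    """Smallest t for which P(never played) drops below the threshold."""
    # ((n-1)/n)^t < threshold  <=>  t > ln(threshold) / ln((n-1)/n)
    return math.floor(math.log(threshold) / math.log((n - 1) / n)) + 1


n = 500
for t in range(0, 501, 50):
    print(f"{t:4d}  {p_never_played(n, t):.3f}")
print("drops below 50% at t =", first_t_below(n, 0.5))
```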