I recently tagged and imported all my MP3s into iTunes. I then noticed that there were lots of albums I had only partially listened to, so I decided to use the “Party Shuffle” feature to play my library at random and eventually hear all the songs.
After a couple of weeks, I noticed that some songs were reappearing in the playlist, having been picked a second time. Over the weeks the frequency of “re-entry” songs increased, with the direct consequence that new music was played less and less. Even though I had already realized that it would not be possible to hear all the songs with this approach, I was still surprised by the “re-entry” rate, which I would intuitively have expected to be much lower.
I turned to probability to better understand the situation.
Let n be the size of my library. After t songs played at random, what is the probability that a given song was played at least once? A tempting first guess is:
P( song played at least once ) = t / n.
Absolutely not! This probability is better computed as 1 – the probability that the song was never played. Each pick misses the song with probability (n-1)/n, and the t picks are independent, which gives:
P( song played at least once ) = 1 – (( n-1 )/ n) ^ t
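To see how far the naive guess t / n drifts from the real value, here is a minimal Python sketch. It assumes, as the formula does, that every pick is uniform and independent of the previous ones; the function name is mine and purely illustrative, and the real Party Shuffle may not behave exactly like this.

```python
def prob_played_at_least_once(n: int, t: int) -> float:
    """Probability that one given song was picked at least once
    in t uniform, independent picks from a library of n songs."""
    return 1.0 - ((n - 1) / n) ** t


if __name__ == "__main__":
    n = 500  # library size used later in the post
    for t in (50, 100, 250, 500):
        naive = t / n  # the tempting but wrong guess
        exact = prob_played_at_least_once(n, t)
        print(f"t={t:4d}  naive={naive:.3f}  exact={exact:.3f}")
```

The two values are close while t is small compared to n, and drift apart as more songs are played, because the naive guess ignores repeats.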
More generally, the probability of a song having been played exactly x times is given by the binomial distribution:
P( x ) = (1/n)^x * ( (n-1) / n )^(t-x) * C ( t, x )
where C(t,x) is the number of combinations, i.e. the number of ways to choose which x of the t plays landed on the song. Expanded:
P( x ) = (1/n)^x * ( (n-1) / n )^(t-x) * t! / ( (t-x)! x! )
Note that the probability that the song was never played (x=0) is still (( n-1 )/ n) ^ t.
After t songs, the sum P(0) + P(1) + … + P(t) = 1, as it must for a probability distribution. This does not prove the formula, but it is a useful sanity check.
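The sum can also be checked numerically. The sketch below assumes the binomial form of P(x), with the coefficient counting the ways to choose which x of the t plays hit the song; the function name is hypothetical. It uses math.comb, which needs Python 3.8 or later, and it is only meant for moderate values of t, since very large coefficients no longer fit in a float.

```python
from math import comb


def prob_played_exactly(n: int, t: int, x: int) -> float:
    """Probability that one given song was picked exactly x times
    in t uniform, independent picks from a library of n songs."""
    return comb(t, x) * (1 / n) ** x * ((n - 1) / n) ** (t - x)


if __name__ == "__main__":
    n, t = 500, 300
    total = sum(prob_played_exactly(n, t, x) for x in range(t + 1))
    print("sum of P(0)..P(t):", total)
    print("P(0):", prob_played_exactly(n, t, 0))
    print("((n-1)/n)^t:", ((n - 1) / n) ** t)
```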
The average number of distinct songs from the library that have been played after t songs can be computed with
= n * P( song played at least once )
= n * ( 1 – ((n-1)/n)^t ) = n – (n-1)^t / n^(t-1)
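A quick simulation is a reasonable way to convince yourself of this average. The sketch below is only an illustration under the same assumption of uniform, independent picks; the function names, the number of runs and the seed are arbitrary choices of mine.

```python
import random


def expected_distinct(n: int, t: int) -> float:
    """Closed-form average number of distinct songs heard after t picks."""
    return n * (1 - ((n - 1) / n) ** t)


def simulate_distinct(n: int, t: int, runs: int = 2000, seed: int = 0) -> float:
    """Average number of distinct songs heard, estimated over many random runs."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    total = 0
    for _ in range(runs):
        heard = {rng.randrange(n) for _ in range(t)}  # set drops the repeats
        total += len(heard)
    return total / runs


if __name__ == "__main__":
    n, t = 500, 300
    print("formula:   ", round(expected_distinct(n, t), 1))
    print("simulation:", round(simulate_distinct(n, t), 1))
```

The two numbers should agree to within a song or so; the gap shrinks as the number of runs grows.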
The probability that the next song is a new one can be computed with (n - avg. played) / n, which is equivalent to the probability that a given song was never played, P(x=0). The “re-entry” rate is its complement, 1 – P(x=0).
The graph below shows the probability that a song was never played for a library of 500 songs, after 0, 50, 100, etc. songs. It’s interesting to notice that the probability of hearing a new song falls below 50% after about 350 songs.
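The values behind such a graph are easy to recompute. Here is a small sketch, again assuming uniform and independent picks, that tabulates the probability of a new song every 50 plays and finds the first play count at which it drops below one half. The function names and the 0.5 threshold parameter are illustrative choices, not anything taken from iTunes.

```python
def prob_never_played(n: int, t: int) -> float:
    """Probability that a given song was never picked in t plays,
    which is also the chance that the next pick is a new song."""
    return ((n - 1) / n) ** t


def first_t_below(n: int, threshold: float = 0.5) -> int:
    """Smallest number of plays after which prob_never_played drops below threshold."""
    t = 0
    p = 1.0
    while p >= threshold:
        p *= (n - 1) / n  # one more play that misses the song
        t += 1
    return t


if __name__ == "__main__":
    n = 500
    for t in range(0, 501, 50):
        print(f"after {t:3d} songs: {prob_never_played(n, t):.3f}")
    print("drops below 50% after", first_t_below(n), "songs")
```

For a library of 500 songs the crossover comes out at 347 plays, roughly n * ln(2).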