The Universe's favourite number
The universe has a favourite number and, what's more, the secret service use this fact to catch crooks! Don't believe it? Read on...
Let's run a little experiment. Take the first 50 articles on the Naked Scientists website. Write down every number that appears in an article and note its first digit. We won't include dates because they have been too heavily influenced by people, and not the universe.
After listing all of these numbers, what would you expect the distribution of first digits to look like? Perhaps you would expect to see an even spread of 1s, 2s, 3s and so on. In fact, the universe seems to fundamentally prefer the number 1. The articles featured more 1s than anything else, followed by 2s, decreasing in frequency all the way down to 9.
[Table: number of appearances of each first digit in our sample, compared with Benford's law*]
*Our sample is pretty close to the figures that Benford's law suggests. If we kept counting articles, we would have got closer and closer to the theoretical distribution.
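If you would like to repeat the experiment on your own text, the counting step can be sketched in a few lines of Python. This is only a rough illustration: the `first_digits` helper and the sample sentence are made up for the example, and the sketch makes no attempt to filter out dates.

```python
import re
from collections import Counter

def first_digits(text):
    """Return the first non-zero digit of every number found in the text."""
    digits = []
    # A number here is a run of digits that may contain commas and one decimal point.
    for token in re.findall(r"\d[\d,]*\.?\d*", text):
        for ch in token:
            if ch in "123456789":
                digits.append(int(ch))
                break
    return digits

# Hypothetical sample sentence, not taken from any real article.
sample = "The river is 1,200 km long, drains 34 lakes and falls 0.75 m per km."
counts = Counter(first_digits(sample))
for digit in range(1, 10):
    print(digit, counts[digit])
```

Numbers such as 0.75 are counted under their first non-zero digit, which is the usual convention when testing Benford's law.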
When Frank Benford came across this strange phenomenon, now known as Benford's law, he thought that he must have made a mistake. "How can the universe have a favourite number?" he thought, and so went on a mission to collect data. He collected lengths of rivers, population sizes, molecular weights, physical constants, baseball statistics and any set of numbers that he could get his hands on. Frank was so obsessed with understanding the universe's preference that, by the time he came to write his first paper describing the phenomenon, he had collected over 20 000 observations. Startlingly, nearly all of his samples followed the same pattern: around 30% of the numbers in each sample started with a 1, and around five percent started with a 9. This means that there were about six times as many 1s as 9s!
Fundamentally, the universe seems to prefer the number 1. We'll get to why later, but first, let's look at how this fact can be used to catch criminals.
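For the curious, the percentages above come from a simple formula. The article doesn't spell it out, but the standard statement of Benford's law is that a first digit d turns up with probability log10(1 + 1/d). A minimal sketch (the function name `benford_probability` is just a label chosen for this example):

```python
import math

def benford_probability(d):
    """Expected share of numbers whose first digit is d (1 to 9)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The nine shares add up to 100%, with 1 taking roughly 30% and 9 a little under 5%.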
Suppose that a group of gangsters are using a restaurant as a front for their criminal activities. To make their illegal income seem legitimate, every evening they create a series of fake customers that all pay different amounts of money to the restaurant. They then pay the fake customers' bills using money that was earned illegally. To an outsider, these transactions look like any ordinary cash payment between a customer and the restaurant. Unless, that is, you use Benford's law.
Bank accounts follow Benford's law almost exactly, but the gangsters aren't aware of this. Just like most people, when they choose "random" numbers to cook their books, they make sure that the first digits are spread evenly, with 1s appearing just as often as 9s. Unfortunately for the gangsters, when a forensic accountant comes along and analyses the restaurant's books, the accountant finds that the figures do not obey Benford's law, and this immediately flags up the gangsters' fraudulent activities.
Analyses based on Benford's law can be, and have been, used as evidence in court. In fact, this same technique can be used in all sorts of other areas where fraudulent activities are possible. This could include clinical trials that are suspected of being tampered with or even elections that are alleged to be rigged. It has even been suggested that before Greece joined the Euro, its finances were manipulated so that it would meet the criteria for being part of the monetary union, as the reported numbers didn't follow Benford's law. Whatever the situation, if the universe's preference for lower first digits is not upheld, foul play may be afoot.
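To get a feel for how such a check might work, here is a minimal sketch. It is not a description of what forensic accountants actually run: it assumes a Pearson chi-square statistic as the measure of mismatch, and both sets of "takings" are invented for the example. The faked amounts are spread evenly between 10 and 100, while the natural-looking ones are spread across several orders of magnitude.

```python
import math
import random
from collections import Counter

def leading_digit(x):
    """First non-zero digit of a positive number."""
    return int(f"{x:.10e}"[0])  # scientific notation, e.g. '4.2500000000e+02'

def benford_chi_square(amounts):
    """Pearson chi-square distance between observed first digits and Benford's law."""
    counts = Counter(leading_digit(a) for a in amounts)
    n = len(amounts)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

random.seed(0)
# Hypothetical cooked books: amounts chosen evenly between 10 and 100.
faked = [random.uniform(10, 100) for _ in range(1000)]
# Hypothetical natural-looking data: amounts spread evenly on a log scale from 1 to 10 000.
natural = [10 ** random.uniform(0, 4) for _ in range(1000)]

print("faked:  ", round(benford_chi_square(faked), 1))
print("natural:", round(benford_chi_square(natural), 1))
```

With nine possible digits there are eight degrees of freedom, for which the usual 5% cut-off is about 15.5. A much larger value is a reason to look more closely at the books, not proof of fraud on its own.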
But why does the universe have a preference at all?
Baffled by Benford
One of the simplest ways to understand Benford's law is by imagining a town with a population of 100 people. For the town to reach 200 people, and change the first digit of its population size, it would have to double, or in other words increase by 100%. But getting from 200 to 300 people only requires an increase of 50%, which is a lot easier. This means that populations tend to linger in the low hundreds, thousands or millions and move quickly past the higher first digits. So when someone like Frank Benford looks at a collection of town population sizes, he finds a preference for town sizes beginning with the number 1. What is so magical about Benford's law is that this same argument seems to apply to a huge range of data, not just populations.
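The town argument is easy to try out. The sketch below assumes a town that starts with 100 people and grows by a steady 3% every year, then records the first digit of its population each year. The growth rate, the time span and the function name are all arbitrary choices made for illustration.

```python
from collections import Counter

def leading_digit_shares(start=100.0, growth=0.03, years=2000):
    """Share of years in which the population starts with each digit 1 to 9."""
    population = start
    counts = Counter()
    for _ in range(years):
        first = int(f"{population:e}"[0])  # e.g. '1.030000e+02' gives 1
        counts[first] += 1
        population *= 1 + growth
    return {d: counts[d] / years for d in range(1, 10)}

shares = leading_digit_shares()
for d in range(1, 10):
    print(d, f"{shares[d]:.1%}")
```

Under these assumptions the town spends roughly 30% of its years with a population beginning with 1 and under 5% with one beginning with 9, simply because each step up in first digit needs a smaller percentage increase than the last.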
There are some other great explanations for Benford's law, and I have included one of my favourites as a special bonus at the end of this article.
Every Day I'm Shufflin'
Many people find Benford's law pretty surprising, which is exactly why it is useful for catching fraud. The reason it is so surprising is that people are just not very good at telling whether something is random or not. So, presented with the vast collection of the world's data, our intuition tells us to expect the same number of 1s as 9s, but this just isn't the case. To top it off, our intuition is so strong that, when shown otherwise, we often choose to ignore the evidence.
In the early days of the iPod, the Apple team introduced a mathematically perfect shuffle feature. Once shuffle was activated, you could be absolutely certain that your playlist would contain your favourite music in a truly random order. Unfortunately, the shuffle feature was mathematically random, not intuitively random, leading users around the globe to complain about the randomness of the shuffle function. Apple was forced to reprogram the feature, and Steve Jobs responded with the famous line: "we're making it less random to make it feel more random."
Benford's law tells us that people aren't so good at faking random chance, so our fictional gangsters should take this advice from Steve Jobs, and make their accounts look more random by making them feel less random.
For the Maths Enthusiasts...
Another explanation of Benford's law involves imagining a raffle. To win a prize in the raffle, you have to draw a ticket whose number has a leading digit of 1, i.e. 1, 10, 11, 12 and so on. Suppose that there are only two tickets in the raffle, labelled 1 and 2. The probability of drawing the ticket numbered 1, which is the only winning ticket, is 50%. If you increase the number of tickets to three, then the probability of winning drops to 33%, and if you increase the number of tickets to four, then the probability of winning drops to 25%, since the ticket labelled 1 is still the only winning ticket. Continuing in this manner, by the time you have nine tickets the probability of drawing a winning ticket has dropped to 11%, or 1 in 9. But when you add the tenth ticket to the raffle, the probability of drawing a winning ticket increases to 20%, or 2 in 10, because there are now two winning tickets: 1 and 10.
Adding ticket number 11 increases the probability of winning to 27% and adding ticket number 12 increases the probability of winning to 33%. The probability of winning then continues to increase until there are nineteen tickets in the raffle, and eleven winning tickets (the tickets labelled 1, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19) meaning that the probability of winning is 58%.
Unfortunately, this is where the probability of winning goes downhill. None of the tickets from 20 to 99 is a winning ticket. This means that increasing the raffle size to twenty decreases the probability of winning to 55%, and increasing the raffle size to twenty-one decreases it to 52%. This trend continues all the way to a raffle with ninety-nine tickets, where the probability of winning is just 11%.
But when you add the one hundredth ticket, the probability of winning goes up again. This pattern continues indefinitely, with the probability of choosing a winning ticket increasing and decreasing, as you add more tickets.
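The rise and fall of the raffle odds can be checked with a short sketch. The function names are simply labels chosen for this example.

```python
def winning_tickets(n):
    """Count the tickets numbered 1 to n whose number starts with the digit 1."""
    return sum(1 for ticket in range(1, n + 1) if str(ticket)[0] == "1")

def win_probability(n):
    """Chance of drawing a winning ticket from a raffle of n tickets."""
    return winning_tickets(n) / n

for n in (2, 3, 4, 9, 10, 11, 12, 19, 20, 21, 99, 100):
    print(n, f"{win_probability(n):.0%}")
```

Running it for other raffle sizes shows the same see-saw repeating with every extra digit: the odds climb through the tickets beginning with 1, then slide back towards 11% just before the next power of ten.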
Picking a random selection of data is like entering a raffle without knowing its total size, so we have to average the winning probability over the possible sizes. Making this calculation gives a winning probability of around 30%; in other words, around 30% of the numbers begin with a 1.
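The article doesn't say exactly how the average is taken, so the sketch below makes an assumption: every raffle size from 1000 to 9999 is considered, and a raffle of n tickets is given weight 1/n, which amounts to treating sizes as evenly spread on a logarithmic scale. The range and the function name are choices made purely for illustration.

```python
import math

def log_weighted_average(low, high):
    """Average win probability over raffle sizes low..high, weighting size n by 1/n."""
    # Winning tickets (numbers starting with 1) among tickets 1 to low - 1.
    winners = sum(1 for ticket in range(1, low) if str(ticket)[0] == "1")
    weighted_sum = 0.0
    total_weight = 0.0
    for n in range(low, high + 1):
        if str(n)[0] == "1":
            winners += 1  # ticket n itself is a winner
        weighted_sum += (winners / n) / n
        total_weight += 1 / n
    return weighted_sum / total_weight

average = log_weighted_average(1000, 9999)
print(f"average: {average:.4f}   log10(2): {math.log10(2):.4f}")
```

Under this weighting the average lands very close to log10(2), which is about 0.301, matching the 30% share of 1s described above.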