One number keeps appearing throughout the universe...
The universe has a favourite number, and what's more, the secret service use this fact to catch crooks! Don't believe it? Read on...
Let's run a little experiment. Take the first 50 articles on the Naked Scientists website. Write down every number that appears in an article and take the first digit. We won't include dates because they have been too heavily influenced by people, and not the universe.
After listing all of these numbers, what would you expect the distribution of first digits to look like? Perhaps you would expect to see an even spread of 1s, 2s, 3s and so on. In fact, the universe seems to fundamentally prefer the number 1. The articles featured more 1s than anything else, followed by 2s, with the frequency decreasing all the way down to 9.
*Our sample is pretty close to the figures that Benford's law suggests. If we had kept counting articles, we would have got closer and closer to the theoretical distribution.
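The counting experiment described above can be sketched in a few lines of Python. This is only an illustration and not the script used for the article: the function name `leading_digits`, the regular expression and the sample sentence are all assumptions made here, and the sketch makes no attempt to skip dates.

```python
import re
from collections import Counter

def leading_digits(text):
    """Return the first non-zero digit of every number found in text."""
    digits = []
    # Match integers and simple decimals such as 120 or 0.035.
    for token in re.findall(r"\d+(?:\.\d+)?", text):
        for ch in token:
            if ch in "123456789":
                digits.append(int(ch))
                break  # only the first non-zero digit counts
    return digits

# Hypothetical sample sentence, standing in for a real article.
sample = "The probe flew 120 km in 19 minutes and weighed 0.035 tonnes."
tally = Counter(leading_digits(sample))
print(tally)
```

Numbers written with thousands separators, such as 1,200, would be split into two tokens by this pattern, so a real tally would need a little more care.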
When Frank Benford first discovered this strange phenomenon, now known as Benford's law, he thought that he must have made a mistake. "How can the universe have a favourite number?" he thought, and so went on a mission to collect data. He collected lengths of rivers, population sizes, molecular weights, physical constants, baseball statistics and any set of numbers that he could get his hands on. Frank was so obsessed with understanding the universe's preference that by the time he came to write his first paper describing the phenomenon, he had collected over 20 000 observations. Startlingly, nearly all of them followed the same pattern: around 30% of the numbers in each sample started with a 1, and around five percent started with a 9. This means that there were roughly six times as many 1s as 9s!
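The article does not spell out the formula, but the usual statement of Benford's law is that a first digit d turns up with probability log10(1 + 1/d). A minimal sketch, with the function name `benford_probability` chosen here purely for illustration:

```python
import math

def benford_probability(d):
    """Expected share of numbers whose first digit is d (1 to 9)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

This gives about 30.1% for a leading 1 and about 4.6% for a leading 9, in line with the "around 30%" and "around five percent" quoted above.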
Fundamentally, the universe seems to prefer the number 1. We'll get to why later, but first, let's look at how this fact can be used to catch criminals.
Suppose that a group of gangsters are using a restaurant as a front for their criminal activities. To make their illegal income seem legitimate, every evening they create a series of fake customers that all pay different amounts of money to the restaurant. They then pay the fake customers' bills using money that was earned illegally. To an outsider, these transactions look like any ordinary cash payment between a customer and the restaurant. Unless, that is, you use Benford's law.
The figures in genuine bank accounts follow Benford's law almost exactly, but the gangsters aren't aware of this. Just like most people, when they choose "random" numbers to cook their books, they make sure that the first digits are spread evenly, with 1s appearing just as often as 9s. Unfortunately for the gangsters, when a forensic accountant comes along and analyses the restaurant's books, the accountant sees that the figures do not obey Benford's law, and this immediately flags up the gangsters' fraudulent activities.
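To make the accountant's check concrete, here is a hedged sketch of one way to do it: compare the first digits of a set of bills against Benford's law with a Pearson chi-square statistic. The article does not say which test forensic accountants actually use, so the choice of test, the function names and both sets of made-up bills are assumptions for illustration only.

```python
import math
import random

# Expected first-digit shares under Benford's law, for digits 1 to 9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """First significant digit of a positive number."""
    return int(f"{x:.15e}"[0])

def chi_square_vs_benford(amounts):
    """Pearson chi-square statistic of first digits against Benford's law."""
    counts = [0] * 9
    for x in amounts:
        counts[first_digit(x) - 1] += 1
    n = len(amounts)
    return sum((counts[i] - n * BENFORD[i]) ** 2 / (n * BENFORD[i])
               for i in range(9))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical "cooked" bills: every first digit is equally likely.
faked = [rng.uniform(10, 100) for _ in range(1000)]
# Hypothetical genuine bills: spread evenly on a logarithmic scale.
genuine = [10 ** rng.uniform(1, 3) for _ in range(1000)]

CRITICAL_5_PERCENT = 15.507  # chi-square critical value, 8 degrees of freedom
print("faked:", chi_square_vs_benford(faked))
print("genuine:", chi_square_vs_benford(genuine))
```

A statistic far above the critical value, as the evenly spread bills produce here, is the kind of signal that would prompt a closer look. On its own it is a red flag rather than proof.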
Results obtained using Benford's law can be, and have been, used as evidence in court. In fact, this same technique can be used in all sorts of other areas where fraudulent activities are possible. This could include clinical trials that are suspected of being tampered with or even elections that are alleged to be rigged. It has even been suggested that before Greece joined the Euro, its finances were manipulated so that it would meet the criteria for being part of the monetary union, because the reported figures didn't follow Benford's law. Whatever the situation, if the universe's preference for lower first digits is not upheld, foul play may be afoot.
But why does the universe have a preference at all?
Baffled by Benford
One of the simplest ways to understand Benford's law is by imagining a town with a population of 100 people. For the town to reach 200 people, and change the first digit of its population size, it would have to double, or in other words increase by 100%. But for the town to grow from 200 to 300 people requires an increase of only 50%, making it a lot easier. This means that populations tend to stick around in the low 100s, 1000s or millions and quickly move past the higher first digits. So when someone like Frank Benford looks at a collection of town population sizes, he finds a preference for town sizes beginning with the number 1. What is so magical about Benford's law is that this same argument seems to apply to all sorts of data, not just populations.
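The town argument can be tried out with a small simulation. The sketch below assumes a population that starts at 100 and grows by a fixed 1% per step. Both numbers, and the function names, are illustrative choices rather than anything from Benford's data. It records how often each first digit is seen along the way.

```python
def first_digit(x):
    """First significant digit of a positive number."""
    return int(f"{x:.15e}"[0])

def digit_shares(start=100.0, growth=0.01, steps=5000):
    """Grow a population by a fixed percentage per step and return
    the share of steps spent on each first digit (1 to 9)."""
    counts = [0] * 9
    population = start
    for _ in range(steps):
        counts[first_digit(population) - 1] += 1
        population *= 1 + growth
    return [c / steps for c in counts]

shares = digit_shares()
for d, share in enumerate(shares, start=1):
    print(d, f"{share:.1%}")
```

Because each doubling from 1 to 2 takes longer than any later step up in first digit, the population spends roughly 30% of its time with a leading 1 and far less with a leading 9.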
There are some other great explanations for Benford's law, and I have included one of my favourites as a special bonus at the end of this article.
Every Day I'm Shufflin'
Many people find Benford's law pretty surprising, which is exactly why it is useful for catching fraud. The reason it is so surprising is that people are just not very good at telling whether something is random or not. So, presented with the vast collection of the world's data, our intuition tells us to expect the same number of 1s as 9s, but this just isn't the case. To top it off, our intuition is so strong that when shown otherwise we often choose to ignore the evidence.
In the early days of the iPod, the Apple team introduced a mathematically perfect shuffle feature. Once shuffle was activated, you could be absolutely certain that your playlist would contain your favourite music in a truly random order. Unfortunately, the shuffle feature was mathematically random, not intuitively random, leading users around the globe to complain about the randomness of the shuffle function. Apple was forced to reprogram the feature, and Steve Jobs responded with the famous line: "we're making it less random to make it feel more random."
Benford's law tells us that people aren't so good at faking random chance, so our fictional gangsters should take this advice from Steve Jobs, and make their accounts look more random by making them feel less random.
For the Maths Enthusiasts...
Another explanation of Benford's law involves imagining a raffle. To win a prize in the raffle, you have to choose a ticket numbered with a leading digit of 1, i.e. 1, 10, 11, 12 and so on. Suppose that there are only two tickets in the raffle, labelled 1 and 2. The probability of choosing the ticket numbered 1, which is the only winning ticket, is 50%. If you increase the number of tickets to three, then the probability of winning drops to 33%, and if you increase the number of tickets to four, then the probability of winning drops to 25%, since the ticket labelled 1 is still the only winning ticket. Continuing in this manner, by the time you have nine tickets the probability of choosing a winning ticket has dropped to 11% or 1 in 9. But when you add the tenth ticket to the raffle the probability of choosing a winning ticket increases to 20% or 2 in 10, because there are now two winning tickets: 1 and 10.
Adding ticket number 11 increases the probability of winning to 27% and adding ticket number 12 increases the probability of winning to 33%. The probability of winning then continues to increase until there are nineteen tickets in the raffle, and eleven winning tickets (the tickets labelled 1, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19) meaning that the probability of winning is 58%.
Unfortunately, this is where the probability of winning goes downhill. None of the tickets between 20 and 99 are winning tickets. This means that increasing the raffle size to twenty decreases the probability of winning to 55%, and increasing the raffle size to twenty-one decreases the probability of winning to 52%. This trend continues all the way to a raffle with ninety-nine tickets, where the probability of winning is just 11%.
But when you add the one hundredth ticket, the probability of winning goes up again. This pattern continues indefinitely, with the probability of choosing a winning ticket increasing and decreasing, as you add more tickets.
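The raffle percentages quoted above are easy to check by brute force. In this sketch the function names `winning_tickets` and `win_probability` are just labels chosen here:

```python
def winning_tickets(n):
    """Count tickets 1..n whose number starts with the digit 1."""
    return sum(1 for ticket in range(1, n + 1) if str(ticket)[0] == "1")

def win_probability(n):
    """Chance of drawing a winning ticket from a raffle of n tickets."""
    return winning_tickets(n) / n

for n in (2, 3, 4, 9, 10, 11, 12, 19, 20, 21, 99, 100):
    print(n, f"{win_probability(n):.0%}")
```

The printed values rise and fall exactly as described: down to 11% at nine tickets, up to 58% at nineteen, back down to 11% at ninety-nine, and up again once ticket 100 is added.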
Picking a random selection of data is like entering a raffle without knowing how many tickets it contains, so we have to average the winning probability over the possible raffle sizes. Making this calculation gives a winning probability of about 30%; in other words, about 30% of the numbers begin with a 1.
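The article does not say exactly how this average is taken, and the winning probability itself never settles down as the raffle grows. One way to make the idea precise, offered here as an assumption rather than as the author's method, is to treat every order of magnitude of raffle size as equally likely, which means spacing the raffle sizes evenly on a logarithmic scale. The function names and the range of sizes below are illustrative.

```python
import math

def winning_tickets(n):
    """Count tickets 1..n that start with 1, one power of ten at a time."""
    count, power = 0, 1
    while power <= n:
        # Tickets power .. 2*power - 1 start with a 1 (capped at n).
        count += min(n, 2 * power - 1) - power + 1
        power *= 10
    return count

def log_scale_average(low_exp=3, high_exp=6, samples=3000):
    """Average the win probability over raffle sizes spaced evenly
    in log10(size), from 10**low_exp up to 10**high_exp."""
    total = 0.0
    for i in range(samples):
        exponent = low_exp + (high_exp - low_exp) * i / samples
        n = int(10 ** exponent)
        total += winning_tickets(n) / n
    return total / samples

print(f"{log_scale_average():.3f}", f"{math.log10(2):.3f}")
```

Under this assumption the average comes out very close to log10(2), which is about 0.301, the 30% figure for a leading 1.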
I don't see an article on that page. dlorde, Thu, 20th Nov 2014
There is, of course, a number on the page, or more precisely a date.
As we have ten fingers, and rarely use leading zeroes, the number 1 is likely to appear before the decimal point more often than any other, though the "anomaly" decreases with increasing numbers of significant digits.
Benford's Law and the explanations appear to be consistent with the principle that it is not the values themselves that occur with approximately equal chance, but rather their logarithms that do. That such would be generally the case across a broad range of phenomena is not unreasonable, inasmuch as it appears to be generally true that any phenomenon that occurs in various values will tend to have a range of values that is a small multiple of, or small divisor of, the mean. Thus, atomic radii will range from roughly 1 to 4 or 5 times that of the hydrogen atom, but not several million times. Likewise, the radii of galaxies might typically cover a range of 1 to 10 times that of the smallest, but will not be comparable to atomic radii. Therefore, a broad collection of data pulled randomly from many fields will tend to exhibit the property that the likely interval between any two readings will be proportional to their mean, resulting in a drop in probability density per linear unit when moving to higher values. Of course, the implication of this is that the density should continue to drop as we move from values beginning with 9 to values beginning with 10; however, the addition of the extra zero changes the scale by a factor of 10, putting us back in the "1s" mode but with a ten times wider sampling width, thereby returning us to the dominance of 1s due to their covering the lowest interval in any decade.
Benford's Law applies most precisely to measurements that span many orders of magnitude.