
Title: Sociophysics question - 2.3327 bits of entropy per individual
Post by: Jarek Duda on 18/01/2013 13:12:10
Imagine there is a population and we are interested in its complexity/entropy. Depending on the type of complexity, intuitively we can distinguish several cases:
1) indistinguishable specimens - entropy: S ~ log(n)
2) there is some order/hierarchy in the group: S ~ log(n!) ~ n log(n)
3) essential interactions within pairs: S ~ n^2
4) arbitrary interactions are essential: S ~ n^n

What is kind of surprising is an additional class between the first two: the specimens are distinguishable, but there is no hierarchy yet:
1.5) distinguishable specimens - entropy: S ~ n
and the minimal linear "distinguishability/individuality constant" here is about 2.332746 bits/element.

To see it, imagine attaching to each individual a random 0/1 sequence (an identifier representing its features). Now for each one take the shortest prefix distinguishing it from all the others - together these form a prefix tree. We need to calculate the entropy H_n of this family of random trees for n elements - for example, 3 bits for n=2:
(image: http://dl.dropbox.com/u/12405967/hash.jpg)
The asymptotic growth of this entropy is 2.77544n. It turns out it can be further reduced to 2.3327n, which is probably the ultimate limit(?)
Here are the arxiv paper (http://arxiv.org/abs/1206.4555) and a presentation (http://dl.dropbox.com/u/12405967/hashsem.pdf).
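
Not from the paper - just my own quick simulation sketch in Python (assuming 64-bit random identifiers are long enough to stand in for infinite 0/1 sequences) to see these numbers appear: it measures the average length D_n of the shortest distinguishing prefix and converts it into the tree entropy through the relation H_n + lg(n!) = n D_n quoted in a reply below.

import math
import random

W = 64   # bits per random identifier; long enough that collisions are negligible here

def avg_prefix_depth(n, trials=200):
    # Estimate D_n: the mean length of the shortest prefix distinguishing
    # each of n random W-bit identifiers from all the others.
    total = 0.0
    for _ in range(trials):
        ids = sorted(random.getrandbits(W) for _ in range(n))
        depth_sum = 0
        for i, x in enumerate(ids):
            lcp = 0
            for j in (i - 1, i + 1):        # longest shared prefix is with a sorted neighbour
                if 0 <= j < n:
                    lcp = max(lcp, W - (x ^ ids[j]).bit_length())
            depth_sum += lcp + 1            # one more bit past the longest shared prefix
        total += depth_sum / n
    return total / trials

for n in (64, 256, 1024):
    D = avg_prefix_depth(n)
    H = n * D - math.log2(math.factorial(n))   # H_n + lg(n!) = n * D_n
    print(f"n={n:5d}   D_n - lg(n) = {D - math.log2(n):.3f}   H_n / n = {H / n:.3f}")

As n grows, D_n - lg(n) should settle near 1.333 and H_n/n should slowly creep towards the 2.77544 above.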

While this distinguishability constant has a clear meaning in computer science, as a boundary for operating on hash values (databases), I would like to ask about using it in sociophysics.
Have you come across attempts to measure the entropy of societies?
How could such entropy be measured quantitatively?
I thought about using the dU = T dS relation in analogy to thermodynamics, where T would be some level of noise - have you heard of some kind of "phenomenological sociodynamics" approach?
Title: Re: Sociophysics question - 2.3327 bits of entropy per individual
Post by: RD on 18/01/2013 15:09:49
Imagine there is a population and we are interested in its complexity/entropy ...
Have you come across attempts to measure the entropy of societies?

the "distinguishability" of Internet browsers ? ... https://panopticlick.eff.org/

https://panopticlick.eff.org/browser-uniqueness.pdf
Title: Re: Sociophysics question - 2.3327 bits of entropy per individual
Post by: Jarek Duda on 18/01/2013 16:26:26
RD, thanks for the interesting link, but the theoretical distinguishability limit I'm talking about is much smaller - literally only 2.33 bits per element (a single letter is about 5 bits).
It is really difficult to get only linear entropy growth: in a population of size n, an individual requires a unique label of length about lg(n), so we would need about n lg(n) bits to store all of them directly ... but these labels could be stored in n! equivalent orderings, so we should be able to save lg(n!) ~ n lg(n) bits - after subtracting these two values we surprisingly get linear growth, asymptotically 2.77544n:
H_n + lg(n!) = n D_n
where
H_n is the entropy ~ 2.77544 n
D_n is the average length of the required unique identifier ~ 1.33275 + lg(n)   (it would be exactly lg(n) for a perfect tree; the extra constant comes from the randomness of the trees).

This 2.77544n is the entropy of the whole group considered together - the cost of encoding the prefix tree. Encoding the labels separately would cost n lg(n) bits.
It can be reduced to 2.33275n by not encoding the child direction of degree-1 nodes, which is not needed to distinguish individuals within the population.
Can it be reduced further?
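
Not an answer to that question, but a quick numeric consistency check of the constants above (my own sketch; it uses the standard random-trie facts that a trie over n random sequences has about n/ln(2) internal nodes, exactly n-1 of which have two children):

import math

lg_e = math.log2(math.e)                  # ~1.4427
depth_const = 1.33275                     # D_n ~ lg(n) + 1.33275, as above

# H_n/n from H_n = n*D_n - lg(n!), with Stirling: lg(n!) ~ n*lg(n) - n*lg(e)
tree_entropy = depth_const + lg_e
print(f"H_n/n           -> {tree_entropy:.4f} bits/element")      # ~2.7754

# A trie with n leaves has n-1 two-child nodes and ~n/ln(2) internal nodes in total,
# so ~(1/ln 2 - 1)*n ~ 0.4427*n degree-1 nodes; dropping their 1-bit child direction:
unary_per_element = 1 / math.log(2) - 1
print(f"reduced entropy -> {tree_entropy - unary_per_element:.4f} bits/element")   # ~2.3327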

To find such small entropies we would need a group of interacting objects that really exhibit nothing beyond distinguishability - maybe some microorganism colonies, or a human crowd...
Title: Re: Sociophysics question - 2.3327 bits of entropy per individual
Post by: evan_au on 19/01/2013 07:41:46
The Baudot code was used to encode English letters (after the invention of Morse Code, but before ASCII). It used 5 bits, regardless of the frequency of each letter. In contrast, Morse code (and the more recent Huffman coding) uses shorter sequences for the more common letters, and thus is more efficient than 5 bits per letter. 

The entropy of English letters is around 1 to 1.5 bits per letter. This can probably be further reduced if one considers the entropy of English words, rather than individual letters. http://en.wikipedia.org/wiki/Entropy_%28information_theory%29

In a more sociobiological setting, the entropy of forensic DNA samples is discussed here - they ask "is it more than 33 bits, i.e., can it uniquely identify individuals?": http://33bits.org/2009/12/02/the-entropy-of-a-dna-profile/
Title: Re: Sociophysics question - 2.3327 bits of entropy per individual
Post by: Jarek Duda on 19/01/2013 10:58:33
Evan, even focusing on English, the amount of information in a single letter is indeed not a simple question. While ASCII uses 8 bits, even with capital letters included the alphabet has fewer than 64 symbols, so 6 bits is enough to write them directly (Braille uses 6 dots; the Baudot code squeezes into 5 bits by using shift codes).
But optimal compression uses -lg(p) bits for an event of probability p, and Morse code is a step in this direction.
Treating letters independently, using the letter frequencies from http://en.wikipedia.org/wiki/Letter_frequency
(0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.0236, 0.0015, 0.01974, 0.00074)
the Shannon entropy is -sum_i p_i lg(p_i) ~ 4.176 bits/letter,
which is not much smaller than lg(26) ~ 4.7,
so Morse-like approaches cannot gain much more (we would need to group letters, as Chinese does with whole words).
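
For concreteness, a short Python sketch (mine, not taken from any of the links) that recomputes this entropy from the frequency vector above and, echoing Evan's point about Morse/Huffman, builds an actual Huffman code to compare its average length with a 5-bit fixed code:

import heapq
import math

# Wikipedia letter frequencies for a..z, as listed above
p = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094,
     0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929,
     0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150,
     0.01974, 0.00074]

print(f"Shannon entropy : {-sum(pi * math.log2(pi) for pi in p):.3f} bits/letter")  # ~4.18
print(f"lg(26)          : {math.log2(26):.3f} bits/letter")                         # ~4.70

# Huffman code lengths: repeatedly merge the two least probable groups;
# each merge adds one bit to the code of every letter inside the merged group.
heap = [(pi, i, [i]) for i, pi in enumerate(p)]
heapq.heapify(heap)
length = [0] * len(p)
tag = len(p)                 # tie-breaker so equal probabilities never compare the lists
while len(heap) > 1:
    pa, _, a = heapq.heappop(heap)
    pb, _, b = heapq.heappop(heap)
    for s in a + b:
        length[s] += 1
    heapq.heappush(heap, (pa + pb, tag, a + b))
    tag += 1

avg = sum(pi * li for pi, li in zip(p, length))
print(f"Huffman average : {avg:.3f} bits/letter (vs. 5 bits fixed, e.g. Baudot)")

The Huffman average lands a little above the entropy and clearly under 5 bits - and no letter-by-letter code can go below the ~4.18-bit entropy without exploiting correlations between letters.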
... but language contains a lot of redundancy - very complex correlations between letters. This redundancy lets us reconstruct damaged messages; removing it reduces the number of bits per letter further. The best known compressors ( http://mattmahoney.net/dc/text.html ) can squeeze 1GB of text into about 127MB, which is roughly 1 bit per letter, as you wrote.
But it is not simple or straightforward - here is a nice table for the novel Scruples, from page 17 of http://kuscholarworks.ku.edu/dspace/bitstream/1808/411/1/j42-hamid.pdf - entropy estimates in bits per letter when using letter groups of length 1 to 7:
Group length:          1     2     3     4     5     6     7
Upper bound entropy:  4.07  3.43  2.71  2.62  3.46  2.75  2.73
Lower bound entropy:  3.14  2.49  1.68  1.59  2.44  1.71  1.67

Going back to biology: the 33 bits from the article you mention is the absolute minimum needed to distinguish individuals within the world population (almost 2^33 people).
In practice we need more: because of randomness, the average length of a distinguishing identifier is about 1.33275 bits longer than lg(n) (the D_n above), and a DNA test additionally has to be general - it cannot know in advance how many bits will be needed to distinguish a given individual.
The article says that current tests use about 54 bits, which seems reasonable - the probability that, besides your own profile, it also fits someone else's is about 1/2^21, i.e. about 1 per 2 million (assuming the bits are used well (no twins etc.) and there are no errors).
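
To make that arithmetic explicit, a tiny check under the same assumptions (independent, uniformly random profile bits, no relatives, no errors):

import math

population = 7e9
print(f"bits needed just to tell everyone apart: {math.log2(population):.1f}")   # ~32.7, hence the 33 bits

profile_bits = 54                             # rough effective size of a current forensic profile, as quoted above
collision_rate = 2**33 / 2**profile_bits      # population rounded up to 2^33
print(f"expected other people matching a given profile: {collision_rate:.1e}")   # 2^-21 ~ 4.8e-07
print(f"i.e. about 1 per {1 / collision_rate / 1e6:.1f} million")                # ~2.1 million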
Our genetic individuality is on the scale of 0.2% of the whole genome ( http://en.wikipedia.org/wiki/Human_genetic_variation ), which is a few orders of magnitude more than that (plus epigenetics, variation between cells ...). And we have many more individual features beyond DNA...

The 2.33 bits/individual limit is a much stricter lower bound (or maybe there is an even lower one?) - we would need very simplistic models to reach it, or a specific point of observation - like in sociophysical models of crowds.
And it requires looking at the population as a whole - summing the lengths of distinguishing labels of the individuals separately would give faster-than-linear entropy growth (at least n lg(n)).
Title: Re: Sociophysics question - 2.3327 bits of entropy per individual
Post by: Jarek Duda on 19/01/2013 16:05:28
I asked someone about this socio/econophysical analogue of phenomenological thermodynamics and was pointed to prof. Mimkes ... and it turns out that, besides dozens of books, this "human thermodynamics" even has its own "Hmolpedia" with 2600+ articles :)
http://www.eoht.info/
http://www.eoht.info/page/Human+thermodynamics
http://www.eoht.info/page/Economic+thermodynamics