It is derived from statistical mechanics and the set-theoretic approach. Please follow the derivation:

Imagine overlapping sets A and B, so that there is an intersection A^B and a union AUB, and for each of these in a universe Omega there is a definite probability of occurring. These would be:

p(Omega) = 1, with Omega containing N items.

p(A) is the probability of set A, containing Na items, so p(A)=Na/N

p(B) is the probability of set B, containing Nb items, so p(B)=Nb/N

p(A^B) is the probability of the intersection of sets A and B.

So p(AUB) = p(A) + p(B) - p(A^B), or equivalently p(A^B) = p(A) + p(B) - p(AUB), and

Omega = Omega(A) + Omega(B) - Omega(A^B) + Omega(Not AUB), where Omega(A) = Na, Omega(B) = Nb, etc. count the items in each region.
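As a quick sanity check, the set identities above can be verified numerically. Here is a minimal Python sketch; the universe size and membership probabilities are illustrative assumptions, not values from the text:

```python
import random

random.seed(1)
N = 1000
universe = range(N)
A = {x for x in universe if random.random() < 0.4}  # assumed ~40% membership
B = {x for x in universe if random.random() < 0.3}  # assumed ~30% membership

pA = len(A) / N        # p(A) = Na/N
pB = len(B) / N        # p(B) = Nb/N
pAiB = len(A & B) / N  # p(A^B)
pAuB = len(A | B) / N  # p(AUB)

# Inclusion-exclusion: p(AUB) = p(A) + p(B) - p(A^B)
assert abs(pAuB - (pA + pB - pAiB)) < 1e-12

# Counting identity: N = Na + Nb - N(A^B) + N(Not AUB)
not_AuB = N - len(A | B)
assert N == len(A) + len(B) - len(A & B) + not_AuB
print("identities hold:", pAuB, pA + pB - pAiB)
```

Both assertions hold exactly, since the identities are pure counting facts.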

therefore from statistical mechanics:

1 = Sum over A: p(xi/A) + Sum over B: p(xi/B) - Sum over A^B: p(xi/A^B) + Sum over Not AUB: p(xi/Not AUB)

-k ln(Omega) = k ln(Sum over A: p(xi/A)) + k ln(Sum over B: p(xi/B)) - k ln(Sum over A^B: p(xi/A^B)) + k ln(Sum over Not AUB: p(xi/Not AUB))

if p(xi/A) = exp(beta*xi)*p(A) / Sum over A: exp(beta*xi), so that the A term reduces to k*Na + k*ln(p(A)),

therefore, simplifying:

Total entropy = RTot = -k ln(p(A)) - k ln(p(B)) + k ln(p(A) + p(B) - p(AUB)) - k ln(p(Not AUB))
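This total-entropy expression can be evaluated numerically. The probabilities below are illustrative assumptions (not values from the derivation), chosen only so that every logarithm is defined:

```python
import math

k = 1.0  # Boltzmann's constant set to 1 for illustration
pA, pB, pAuB = 0.4, 0.3, 0.6   # assumed example probabilities
pAiB = pA + pB - pAuB          # p(A^B) by inclusion-exclusion
p_not_AuB = 1.0 - pAuB         # p(Not AUB)

RTot = (-k * math.log(pA)
        - k * math.log(pB)
        + k * math.log(pA + pB - pAuB)   # = +k ln p(A^B)
        - k * math.log(p_not_AuB))
print("RTot =", RTot)
```

With these example numbers RTot comes out positive; each -k ln(p) term is non-negative since every probability is at most 1.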

If A and B are disjoint (mutually exclusive), so that A^B = NULL, then

RTot = -k ln(p(A)) - k ln(p(B)) - k ln(p(Not AUB))

If Na + Nb = N and Not AUB = NULL, then

p(B) = 1 - p(A) = 1 - p, and

R = -k ln(p(1-p))

Now imagine a statistical distribution over xi, with p = p(xi); then RTot = -k * Sum over xi: ln(p(1-p))

If xi is continuous, with variable x and p = p(x), then

RTot = -k * Integral from a to b: ln(p(1-p)) dx
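Both the discrete sum and the continuous integral can be sketched numerically. The probability curve p(x) below is an illustrative assumption (a logistic function, not from the text), and the integral is approximated by the trapezoid rule:

```python
import math

k = 1.0

def p(x):
    # Assumed example probability curve p(x); any p in (0,1) would do.
    return 1.0 / (1.0 + math.exp(-x))

# Discrete form: RTot = -k * Sum over xi: ln(p(1-p))
xs = [i / 10 for i in range(-20, 21)]
R_discrete = -k * sum(math.log(p(x) * (1.0 - p(x))) for x in xs)

# Continuous form: RTot = -k * Integral from a to b: ln(p(1-p)) dx,
# approximated by the trapezoid rule on [a, b] = [-2, 2]
a, b, n = -2.0, 2.0, 1000
h = (b - a) / n
f = lambda x: math.log(p(x) * (1.0 - p(x)))
R_integral = -k * h * (0.5 * f(a)
                       + sum(f(a + i * h) for i in range(1, n))
                       + 0.5 * f(b))
print(R_discrete, R_integral)
```

Since p(1-p) <= 1/4 for any p, every term ln(p(1-p)) is negative, so both RTot values are strictly positive.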

So this formula would replace Shannon's formula:

ShanSTot = -p*ln(p)

Shannon's formula implicitly states that if the probability of an event approaches zero, it carries infinite information entropy and is highly informative, but if it is certain to occur, with probability 1, it carries no information entropy.

This new law asserts that if the probability is zero or one, i.e. completely impossible or absolutely certain to occur, the event is highly informative and the information entropy is infinite. If the probability is 0.5, i.e. completely random in occurrence, then the information is at a minimum.
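To see the contrast between the two formulas, a small table can be computed (taking k = 1 in the proposed form):

```python
import math

def shannon_term(p):
    # Shannon's per-event term: -p*ln(p)
    return -p * math.log(p)

def proposed_R(p):
    # The proposed form: R = -k*ln(p*(1-p)), with k = 1
    return -math.log(p * (1.0 - p))

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p={p:5}: Shannon={shannon_term(p):.4f}  proposed={proposed_R(p):.4f}")

# The proposed form is minimal at p = 0.5 (value ln 4) and diverges
# as p -> 0 or p -> 1, as claimed in the text.
assert abs(proposed_R(0.5) - math.log(4)) < 1e-12
assert proposed_R(0.01) > proposed_R(0.5) < proposed_R(0.99)
```

The table makes the difference concrete: Shannon's term vanishes at both p -> 0 and p = 1, while the proposed R is symmetric about p = 0.5, where it takes its minimum value of ln 4.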

This latter formula makes more intuitive sense to me, but that is my personal opinion. We will have to let the community of information theorists decide.

Best wishes,

Richard.

Similarly, inspired by statistical mechanics, we can define an information temperature. Making use of the formula

L = exp(5/2) * (V/h^3) * (2*Pi*m*k*T)^(3/2) / N

where

R = N*k*ln(L) is the Sackur-Tetrode entropy of an ideal gas,

it follows, after some work, that:

5/2 + (3/2)*ln(2*Pi*k*Ta/Na) = Ta*ln(Na)

Solve for Ta, similarly for Tb.
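The equation for Ta is transcendental, so it has to be solved numerically. This is a sketch using simple bisection; k = 1 and Na = 2 are illustrative assumptions (the text leaves their values open), chosen so that a real root exists:

```python
import math

k = 1.0
Na = 2

def g(T):
    # Residual of: 5/2 + (3/2)*ln(2*Pi*k*Ta/Na) - Ta*ln(Na) = 0
    return 2.5 + 1.5 * math.log(2 * math.pi * k * T / Na) - T * math.log(Na)

# Bracketing interval: g(1) > 0 > g(50) for these parameters.
lo, hi = 1.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
Ta = 0.5 * (lo + hi)
print("Ta =", Ta, "residual =", g(Ta))
```

The same routine with Nb in place of Na gives Tb. For other choices of k and Na the bracket [lo, hi] would need to be re-checked, since the equation need not have a solution for every parameter pair.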

R = k*Ta*ln(Na) + k*Tb*ln(Nb) = approx. k*(p*Ta + (1-p)*Tb)*ln(N) =

k*ln(N^(Ta+Tb) * p^Ta * (1-p)^Tb) =

approx. k*ln(N^(p*Ta + (1-p)*Tb))

R = -k*ln(p*(1-p)) in its simplest form