Regarding the ideal compression of telecommunications transmissions:

For various types of data there are various ways to compress them. For sound there is the .MP3; for pictures, the .JPG. Presumably they are not interchangeable. To send a stream of assorted types of data efficiently through a telecommunications channel, therefore, one cannot use just one of these schemes, but must invoke the appropriate scheme for each message (file). But if the message traffic is general -- not limited to specific recognized data types, but potentially composed of arbitrary data types, some of which may not be known in advance -- this will not work. One needs a generalized compression algorithm applicable to arbitrary data streams.

But how?

The information value of any symbol (e.g., byte) in a message equals the logarithm of its unlikelihood (the reciprocal of its probability), and the information content of the whole message equals the sum of those values. Thus, a character in a long string of the same character may be regarded as having a probability of occurrence of 1, so that its information value (expressed in bits) is log_2(1) = 0; the whole message then has an information value of 0, and there is no point in sending it, because its content is already known. At the other extreme, in a string of random bytes any particular value occurs, in any one position, with probability 1/256; having an unlikelihood of 256, it has an information value in bits of log_2(256) = 8, as expected, and a file of n random bytes has an information content of 8n bits.
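The arithmetic above can be checked with a short program. This is only a sketch of my own (the function name and the use of empirical byte frequencies as probabilities are my choices, not a standard routine):

```python
import math
from collections import Counter

def information_bits(message: bytes) -> float:
    """Estimate the information content of a message in bits,
    taking each byte's empirical frequency as its probability
    and charging log2(1/p) bits per occurrence."""
    counts = Counter(message)
    n = len(message)
    total = 0.0
    for byte, count in counts.items():
        p = count / n                       # empirical probability of this byte value
        total += count * math.log2(1 / p)   # each occurrence carries log2(1/p) bits
    return total

# A run of one repeated character: its probability is 1, log2(1) = 0,
# so the message carries no information at all.
print(information_bits(b"aaaaaaaa"))       # 0.0

# Eight distinct bytes, each appearing once: each has probability 1/8
# and carries log2(8) = 3 bits, for 24 bits in total.
print(information_bits(bytes(range(8))))   # 24.0
```

For a long stream of uniformly random bytes the empirical frequencies approach 1/256 each, and this estimate approaches the 8 bits per byte stated above.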

All other messages fall somewhere between these extremes. In standard English text the letter "e" occurs quite frequently, and thus on average carries less information than, say, "x". The information value of each letter is therefore different. Compression would say: let us arrange all these letters in order of probability, and then assign short bit strings to the most probable letters and longer bit strings to the less probable (as in the Morse code). This does indeed achieve compression; however, the process is imperfect in that it takes the probability of occurrence of any letter to be simply the number of occurrences of it in an average string of such data, divided by the total number of characters in the string. It is equivalent to saying that we will construct a loaded die having as many faces as there are possible characters, such that upon tossing it, the probability of any character coming up matches that character's frequency in a long string of text. This approach is weak in two ways. First, the die has to be constructed from preconceived notions of what the data will typically look like, which amounts to assuming in advance that it will be one kind of data or another (pictures, sound, text, whatever) -- precisely what we want to avoid. Second, the scheme does not take into account conditional probability: the probability that a certain character occurs at a particular place in a message may depend upon other elements of the message.
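The "short codes for probable letters" idea described above is exactly what Huffman coding does. A minimal sketch (my own compact construction, not a production encoder -- it returns only the code table, and assumes the input has at least two distinct characters):

```python
import heapq
from collections import Counter

def huffman_code(text: str) -> dict:
    """Build a prefix code assigning short bit strings to frequent
    characters and longer ones to rare characters (Huffman's method)."""
    freq = Counter(text)
    # Each heap entry: (total frequency, unique tie-breaker, {char: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        # Prepend one bit to distinguish the two merged subtrees.
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_code("the essence of these sentences")
# The frequent letter "e" receives one of the shortest codes,
# and no code is a prefix of another, so decoding is unambiguous.
```

Note that this still has exactly the weakness described above: the code is built from raw letter frequencies alone, with no use of conditional probability.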

To take all this into account, it would appear necessary to seek a general compression algorithm which, without knowing in advance what will be coming through the channel, can analyze it, reduce it to its minimum size, and reconstruct it on the other end. In doing so it would have to take into consideration the fact that the probability of any particular character at any one position could depend, in general, upon exactly what other characters occur everywhere else in the message (or, more generally, in the entire traffic stream). This means we do not simply consider how frequently a given character occurs on average, but examine the probability of each specific occurrence of it in light of the entire content of the rest of the message -- and do so in a situation where the messages may include types not previously known or anticipated. One suspects that such a process would have to be capable of "learning". All this would seem to be a formidable mathematical problem.
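To illustrate the kind of "learning" suspected here: practical compressors (the PPM family, for instance) build conditional-probability tables from the stream as it arrives and feed the predictions to an arithmetic coder. A minimal sketch of the modeling half only, using just the single preceding byte as context (class and method names are my own, and real compressors use much longer contexts):

```python
import math
from collections import defaultdict, Counter

class Order1Model:
    """Adaptive order-1 context model: learns P(next byte | previous byte)
    from the stream itself, with no advance knowledge of the data type."""

    def __init__(self) -> None:
        # One frequency table per single-byte context.
        self.contexts = defaultdict(Counter)

    def predict(self, prev: int, nxt: int) -> float:
        """Estimated probability of `nxt` following `prev`, smoothed so
        that never-seen bytes keep a small nonzero probability."""
        seen = self.contexts[prev]
        return (seen[nxt] + 1) / (sum(seen.values()) + 256)

    def update(self, prev: int, nxt: int) -> None:
        self.contexts[prev][nxt] += 1

# The coding cost of a repetitive stream falls as the model learns its pattern.
model = Order1Model()
data = b"abababababababab"
total_bits = 0.0
for prev, nxt in zip(data, data[1:]):
    total_bits += math.log2(1 / model.predict(prev, nxt))  # cost under current model
    model.update(prev, nxt)                                # then learn from this symbol
# total_bits ends up well below the 8 bits/byte a random stream would require.
```

The first symbols cost nearly 8 bits each (the model knows nothing yet), but as the "ab" pattern is learned, each successive symbol costs less -- the adaptive analogue of the fixed loaded die described earlier.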

Does anyone know what is involved in solving it, and just how close to solving it practical science has been able to come?