Naked Science Forum

Non Life Sciences => Technology => Topic started by: Jarek Duda on 22/01/2017 07:12:46

Title: Huffman coding being replaced with ANS - ongoing revolution in data compression?
Post by: Jarek Duda on 22/01/2017 07:12:46
Beside imaginary data compression revolution in some HBO TV series, it turns out a lot has also actually changed in the real world in the last years - much better compressors you can just download and use.

One reason is more efficient coding.
Huffman coding (e.g. A -> 010) is fast but approximates probabilities with powers of 1/2, while e.g. symbol of probability 0.99 carries only ~0.014 bits of information. This inaccuracy was repaired by arithmetic coding, but it is an order magnitude slower (more costly).
Above compromise has been recently ended with ANS coding, which is both fast(cheap) and accurate:
wiki: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems (https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems)

Arithmetic coding in 2013 had ~50MB/s/core decoding ... now analogous task is made by ANS with ~1500MB/s decoding on the same processor - nearly 30x software boost for the bottleneck of data compressors.
benchmarks: https://sites.google.com/site/powturbo/entropy-coder (https://sites.google.com/site/powturbo/entropy-coder)

ANS is used in data compressors since 2014, like in the currently default Apple LZFSE or great and free Facebook Zstd - it is 2-5x faster than gzip and provides much better compression:
https://github.com/facebook/zstd/ (https://github.com/facebook/zstd/)
7-zip with zstd: https://mcmilk.de/projects/7-Zip-zstd/ (https://mcmilk.de/projects/7-Zip-zstd/)
Total Commander plugin: http://totalcmd.net/plugring/zstdwcx.html (http://totalcmd.net/plugring/zstdwcx.html)

(https://github.com/facebook/zstd/blob/dev/doc/images/DCspeed5.png?raw=true)
Title: Re: Ongoing revolution in data compression
Post by: evan_au on 22/01/2017 10:27:55
Please rephrase the title as a question, as per forum guidelines - Moderator.