
Thread: Entropy (?) question

  1. #1
    Member (Norway)

    Entropy (?) question

    I'm no math whiz. From what I've read, entropy has something to do with the amount of information contained in data.
    High entropy means more randomness.
    Low entropy means more compact structures/patterns, etc.?

    Then I heard about Kolmogorov complexity, Shannon entropy and so on...
    I also watched a video explaining entropy, but I wasn't quite sure whether there is a mathematical way of finding the entropy of a chunk of information.

    Say I have a file of 256 bytes. The data is computer opcodes, so it has some patterns; it looks random, but it isn't very random.
    Is it possible to compute an entropy value for these bytes (or 2048 bits)? Meaning it contains all the information, but the entropy value tells how much it could have been compressed?

    What is this mathematical formula called? Shannon, or something else? (Since I am no math whiz, I was wondering if anyone here could give me a hint or a direction, or tell me if there is no such formula.)

    Thanks.

  2. #2
    Cyan (France)
    Your question asks about entropy, but it subtly introduces "modeling".

    Entropy can be defined in the Shannon sense as long as the way you represent your data is simple enough.
    For example, in your 256-byte case: take all the bytes and make a histogram of their values. Each symbol then has an optimal cost of -log2(p) bits, where p = occurrences/total. Sum these costs over the message. The result is a nice figure, which is typically called the "entropy" of the message.

    That being said, this figure depends on an implicit model: in this case, each byte is considered an "independent" unit, with its probability of appearance decided only by the global histogram.
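
    A minimal sketch of this order-0 calculation (plain Python, with the 256 bytes read from a hypothetical file named opcodes.bin) could look like this:

    ```python
    import math
    from collections import Counter

    def zero_order_entropy(data: bytes) -> float:
        """Shannon entropy of the byte histogram, in bits per byte."""
        counts = Counter(data)
        total = len(data)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    data = open("opcodes.bin", "rb").read()   # hypothetical 256-byte input
    h = zero_order_entropy(data)
    print(f"{h:.3f} bits/byte, ~{h * len(data) / 8:.0f} bytes if coded optimally")
    ```

    The per-byte figure times the number of bytes is the best size any coder can reach under this particular model; it says nothing about what a better model could do.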

    And we all know this is a very poor model, which compresses badly. We can do much better.
    For example, you could condition all these probabilities on the previous byte. For human text this works well: in an English text, after a "w" you are more likely to get an "o" or an "e" than a "k".
    And then you start wondering: why not condition on the 2 previous bytes? Or on the previous full word?
    What about Arabic sentences? Chinese characters?
    What about binary data? Should we still consider that we have bytes, or lots of bits packed together (a JPEG image, for example)?
    Are there regular patterns, or some typical distances between them? Are some bytes special after a definable event?
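
    A minimal sketch of that previous-byte conditioning, in the same hypothetical Python setting as above:

    ```python
    import math
    from collections import Counter, defaultdict

    def order1_entropy(data: bytes) -> float:
        """Average cost in bits per byte when each byte is coded with the
        histogram of bytes seen after the same previous byte (order-1 model)."""
        contexts = defaultdict(Counter)
        for prev, cur in zip(data, data[1:]):
            contexts[prev][cur] += 1
        total_bits, coded = 0.0, 0
        for ctx in contexts.values():
            n = sum(ctx.values())
            total_bits += sum(c * -math.log2(c / n) for c in ctx.values())
            coded += n
        return total_bits / coded if coded else 0.0
    ```

    On structured data this figure is usually lower than the order-0 one, which is exactly the point: the "entropy" you measure depends on the model you picked. (A real coder would also have to pay for transmitting or adapting the context statistics, which this sketch ignores.)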

    Pretty soon, you understand that the ways to model your data are so numerous that you can't see the end of them. It's infinite.

    That's Kolmogorov complexity.

  3. The Following 2 Users Say Thank You to Cyan For This Useful Post:

    Bulat Ziganshin (2nd November 2015), Rudi (4th November 2015)

  4. #3
    thorfdbg (Stuttgart)
    Quote Originally Posted by Cyan
    Your question asks about entropy, but it subtly introduces "modeling".

    Entropy can be defined in the Shannon sense as long as the way you represent your data is simple enough.
    For example, in your 256-byte case: take all the bytes and make a histogram of their values. Each symbol then has an optimal cost of -log2(p) bits, where p = occurrences/total. Sum these costs over the message. The result is a nice figure, which is typically called the "entropy" of the message.
    To be precise, this is the zero-order entropy, or rather, the entropy of a very simple model in which you suppose that the data is generated by a black box, without memory, one character at a time. A poor model indeed.

    Quote Originally Posted by Cyan
    Pretty soon, you understand that the ways to model your data are so numerous that you can't see the end of them. It's infinite.
    That's Kolmogorov complexity.
    Well, not exactly. Kolmogorov complexity does not have much to do with entropy. Entropy is a number assigned to a random process, not to a particular sequence. Hence, one cannot point at a sequence and say "this sequence has entropy XY". You need to specify a model, and by that a random source, and by that probabilities. That is often implicitly understood as part of "computing the entropy", but it is not part of the definition.
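
    For reference, the formula the original question is after is Shannon's: if a source emits symbol $i$ with probability $p_i$, its entropy is

    $$H = -\sum_i p_i \log_2 p_i \quad \text{bits per symbol},$$

    and the model is what supplies the $p_i$; change the model and $H$ changes with it.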

    Kolmogorov complexity is an attribute assigned to a particular realization of a random process, i.e. a given message. It is defined as the length of the smallest computer program that regenerates the input message. This is not a very precise definition, unfortunately, but defining it precisely takes a lot of time...

    For example: why does the programming language not matter, given that computer programs can be written in various styles, and so on... In the end, one finds that it does indeed not matter (much) if one only considers very long messages, or rather, lets the size of the message go to infinity. In such a setting, the choice of the computer makes a vanishing contribution, i.e. the size of the message dominates, at least for "interesting" messages.

  5. The Following User Says Thank You to thorfdbg For This Useful Post:

    Jiwei Ke (3rd November 2015)

  6. #4
    Member (Norway)
    Interesting. Thanks for the replies. It seems I'd have to do some reading.

    Cyan: Conditioning the probabilities on neighbouring bytes sounds like a good idea. I'd have to write something other than LZW to compress using this method; I just recently made an LZW compressor, that is. Since a file has a finite number of bits, we don't involve infinite numbers in these equations. At least not me :P Also, let's not base our algorithms on English or any other language (even if that is a good way to test a compressor), since our data can contain information of any kind.

    Working with small amounts of data (to compress) is very good practice for me, and I also think that if one is able to compress small amounts of data to something very small, and the algorithm is adaptive enough to handle large amounts, better compression of larger data will follow. How such a scheme should be set up is another issue. For example, I don't want it to be too complex; I want the algorithm to do the job, but I also want to understand its underlying mechanics.

    I am looking into cellular automata: if a CA can do some compression work alongside, or independently of, any other algorithm, then I am fine with it, as long as I understand the underlying mechanics of the CA. I recently made a computer simulation of totalistic CAs, which I will research some more. I also like reversible CAs, because after some bits of information have been processed one can go backwards to the initial configuration (bit strings). Not many CA rules are reversible, but it is still interesting stuff.
    If anyone would like to see some screenshots of the TCAs, feel free to read a little about them on my homepage here: Totalistic CA

    The problem with these CA computations is that information often gets lost. I always work with periodic boundary conditions, because with some rules the edges also scramble the data.
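
    A minimal sketch of one standard way to get a reversible CA (the second-order construction: keep two consecutive rows and XOR, shown here with an arbitrary totalistic rule, periodic boundaries, and the same plain-Python style as the earlier snippets) could be:

    ```python
    # Second-order reversible CA: next = f(current) XOR previous.
    # Knowing (current, next) lets you recover previous with the same XOR,
    # so the evolution can be run backwards exactly.

    def rule_step(cells):
        """Arbitrary totalistic rule on a periodic binary row: a cell becomes 1
        iff exactly one cell of its three-cell neighbourhood is 1."""
        n = len(cells)
        return [1 if cells[i - 1] + cells[i] + cells[(i + 1) % n] == 1 else 0
                for i in range(n)]

    def forward(prev, cur):
        nxt = [f ^ p for f, p in zip(rule_step(cur), prev)]
        return cur, nxt                      # new (previous, current) pair

    def backward(cur, nxt):
        prev = [f ^ x for f, x in zip(rule_step(cur), nxt)]
        return prev, cur                     # recovered earlier pair

    row0 = [0] * 16
    row1 = [0] * 16
    row1[8] = 1                              # single seed cell
    a, b = row0, row1
    for _ in range(10):                      # run forwards
        a, b = forward(a, b)
    for _ in range(10):                      # run backwards to the start
        a, b = backward(a, b)
    assert (a, b) == (row0, row1)
    ```

    Because the update only XORs the rule output into the older row, the same recipe with the roles of the rows swapped walks the evolution backwards exactly, so no information is lost no matter which rule f is plugged in.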

  7. #5
    Matt Mahoney (Expert, Melbourne, Florida, USA)
    Entropy is a function of probability. Usually you don't know what the probability distribution is, so you have to guess. Kolmogorov complexity does not depend on guessing a probability distribution. Instead, it is defined as the length of the shortest program which outputs (or describes) your data.

    Kolmogorov (also Solomonoff and Levin, all working independently in the 1960s) proved that the choice of language matters only up to a constant that is independent of the data. For any description in language L1, you can describe the same data in L2 by appending a fixed-size translator from L1 to L2 written in L2.

    More importantly, they proved (again independently) that there is no algorithm that takes a string as input and tells you its Kolmogorov complexity. Suppose there were. Then I could describe "the first string (in some lexicographical order) that cannot be described in a million bits". But I just did: a short program could use that algorithm to test strings in order and print the first one that qualifies, describing in far fewer than a million bits a string that supposedly needs more.

    This is why data compression is an art, instead of a solved problem.
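
    In practice, then, one works with computable upper bounds: the output of any concrete compressor (plus the fixed size of its decompressor) is an upper bound on the Kolmogorov complexity of the data, and a better compressor just gives a tighter bound. A minimal illustration with Python's standard zlib module (file name hypothetical):

    ```python
    import zlib

    data = open("opcodes.bin", "rb").read()   # hypothetical input file
    packed = zlib.compress(data, 9)           # level 9 = strongest standard setting
    print(len(data), "->", len(packed), "bytes")
    # len(packed), plus the fixed size of a zlib decompressor, is a computable
    # upper bound on the Kolmogorov complexity of `data`; the true value is
    # uncomputable, so all one can ever do is search for shorter descriptions.
    ```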

  8. The Following 2 Users Say Thank You to Matt Mahoney For This Useful Post:

    biject.bwts (4th November 2015), Rudi (5th November 2015)
