
Thread: Hi Everybody!!!

  1. #1
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts

    Hi Everybody!!!

    Hi Everybody!
    I'm very glad to finally join you all in this forum.
    Let me introduce myself...
    My name is Gonzalo, and I live in Argentina.
    I've been reading encode.ru for years (probably 4 or 5) but only registered now. So I've tested by myself almost everything ending in .exe around here, starting with FA's PowerPack and its updates (except some *paq and cousins which set Deep Blue as a minimum requirement - sorry Matt).
    Sadly, I'm not a programmer, but I'd love to be. I think it's a matter of time. But data compression is such an addictive hobby.
    As you will soon notice, English is not my first language. So, BTW, feel free to correct me when needed. I will not get angry... I guess we're all here to learn as well. Thank you.
    ------------------------------------------------------

    Well... a first (contribution?) to the forum:
    Recently, randomly googling for data compression in Portuguese, I read something new. I googled deeper, and it seems you haven't heard about it either.
    So, here is the same article in English: http://www.technologyreview.com/view...ssion/?ref=rss
    At first sight, it looks like those attempts to find a given data stream at some offset of pi... theoretically possible but extremely impractical.
    Anyway... Let's talk about magic.
    Last edited by Gonzalo; 17th August 2014 at 01:33. Reason: Misspelling

  2. #2
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    My mathematical background isn't very deep. But AFAIK, the amount of RAM needed to store all combinations of 256...

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Hi Gonzalo,

    If I understood the paper correctly (at first glance), it's about improving the theoretical lower bounds - i.e. defining them more precisely, not shifting them lower. It can influence further theoretical considerations, but it's not magic, nor does it show any method for improving compression.
    Last edited by Piotr Tarsa; 17th August 2014 at 01:52. Reason: greeting :)

  4. #4
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    "The key is to arrange the deck in advance so that the sequence of the card colours follows a specific pattern called a binary De Bruijn cycle. A De Bruijn sequence is a set from an alphabet in which every possible subsequence appears exactly once. So when a deck of cards meets this criteria, it uniquely defines any sequences of six consecutive cards. All you have to do to perform the trick is memorise the sequences."

    http://en.wikipedia.org/wiki/De_Bruijn_sequence
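    For the curious, here is a minimal sketch (in Python, my own, not from the article or the paper) of the standard recursive De Bruijn construction; it also checks the property from the quote: every window of n consecutive symbols, read cyclically, appears exactly once.

    Code:
    # Minimal sketch (assumption: the usual FKM / Lyndon-word construction,
    # nothing from the paper): build a binary De Bruijn sequence and verify
    # that every length-n cyclic window appears exactly once.
    def de_bruijn(k, n):
        """De Bruijn sequence for alphabet {0..k-1} and window length n."""
        a = [0] * (k * n)
        seq = []

        def db(t, p):
            if t > n:
                if n % p == 0:
                    seq.extend(a[1:p + 1])
            else:
                a[t] = a[t - p]
                db(t + 1, p)
                for j in range(a[t - p] + 1, k):
                    a[t] = j
                    db(t + 1, t)

        db(1, 1)
        return seq

    n = 6                                    # like the 6-card windows in the trick
    seq = de_bruijn(2, n)                    # length 2^6 = 64
    windows = {tuple(seq[(i + j) % len(seq)] for j in range(n))
               for i in range(len(seq))}
    print(len(seq), len(windows))            # 64 64 -> each 6-bit pattern occurs exactly once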

    At first impression, it's a specific case. It has nothing to do with real-world data, since the cards must be deliberately arranged in a particular way...


    BTW... Thanks for the reply!

  5. #5
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Hi Gonzalo,

    If I understood the paper correctly (at first glance), it's about improving the theoretical lower bounds - i.e. defining them more precisely, not shifting them lower. It can influence further theoretical considerations, but it's not magic, nor does it show any method for improving compression.
    Mmmm. Let's see if I understood you. Is it about ratio bounds?

  6. #6
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Well... I just read more carefully. Actually he proposes a statistical method to predict the next card/byte.
    Maybe it really does have to do with data packing.

  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Abstract says it all:
    Abstract
    We describe a new variation of a mathematical card trick, whose analysis leads
    to new lower bounds for data compression and estimating the entropy of Markov
    sources.
    The paper is about providing better estimates of the compression ratio of data generated by Markov sources.


    Good theoretical bounds don't necessarily translate to good actual performance. They haven't presented any method of predicting data, not to mention any benchmark results.

  8. #8
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Indeed. It's just a comment without any significant result or algorithm to be tested.
    So... if some C++ genius would please...
    It might not be compatible with sliding-window algos, but *cm perhaps...

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    From my understanding of the paper, it does not seem to describe any new compression algorithm. What it describes is a bound on how much data you need to distinguish a k-order deterministic Markov process from a random data stream. A k-order deterministic Markov process outputs the next character by looking up the last k characters in a table of size s^k, where s is the size of the alphabet. What the theorem says is you need about s^(k/2) characters to distinguish the output from random, because by then you are likely to go to a previously used state and the cycle repeats.
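    A rough way to see that bound (my own sketch in Python, only loosely related to the paper's actual argument): fill the k-order table lazily at random, generate symbols, and count how many steps it takes before some k-symbol state recurs; the average comes out on the order of s^(k/2).

    Code:
    import random

    def steps_until_state_repeats(s, k, rng):
        """Generate from a random deterministic k-order table until a state recurs."""
        table = {}                                         # k-gram -> next symbol, fixed on first use
        state = tuple(rng.randrange(s) for _ in range(k))  # random starting context
        seen = {state}
        steps = 0
        while True:
            if state not in table:
                table[state] = rng.randrange(s)            # deterministic once chosen
            state = state[1:] + (table[state],)
            steps += 1
            if state in seen:                              # revisited state -> output is periodic from here on
                return steps
            seen.add(state)

    rng = random.Random(1)
    s, k = 4, 6                                            # s^k = 4096 possible states
    trials = [steps_until_state_repeats(s, k, rng) for _ in range(200)]
    print(sum(trials) / len(trials))                       # roughly on the order of s^(k/2) = 64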

  10. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Gonzalo (19th August 2014)

  11. #10
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    The author is familiar, so I'm sure it's valuable. But it looks like its value is mainly academic and theoretical rather than something you can apply directly.

  12. #11
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts

  13. The Following User Says Thank You to Sportman For This Useful Post:

    Gonzalo (19th August 2014)

  14. #12
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    So we finally got something to test.
    I wonder if that lower limit translates into better performance, since there are fewer operations to do than in the traditional approach, if I'm not wrong.
    Let's see...

  15. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Are you talking about using deBruijn sequences to benchmark compressors? It seems to me the best compressors will just generate the sequences, like "compressing" pi or prime numbers as discussed in some other threads.

  16. #14
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Are you talking about using deBruijn sequences to benchmark compressors? It seems to me the best compressors will just generate the sequences, like "compressing" pi or prime numbers as discussed in some other threads.
    The number of DeBruijn sequences is large. I had to stare at the formula for quite a while before it started to become meaningful to me. The number of bits needed to represent a binary DeBruijn sequence of length n bits (where n is a power of two) is:

    n/2 - lg(n)

    Since the number of bits is an integer, they are very amenable to enumerative compression.
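    A brute-force sanity check of that count for small n (my own sketch, assuming the usual convention that each cyclic sequence is represented by its unique rotation starting with lg(n) zeros): for n = 16 the formula gives 2^(16/2 - 4) = 16 sequences, and exhaustive search agrees.

    Code:
    from itertools import product

    def is_de_bruijn(bits, m):
        """True if every m-bit window of the cyclic sequence appears exactly once."""
        n = len(bits)
        windows = {tuple(bits[(i + j) % n] for j in range(m)) for i in range(n)}
        return len(windows) == n

    m = 4
    n = 2 ** m                                     # sequence length 16
    # Pinning the unique run of m zeros at the front picks one rotation per cycle.
    count = sum(1 for tail in product((0, 1), repeat=n - m)
                if is_de_bruijn((0,) * m + tail, m))
    print(count, 2 ** (n // 2 - m))                # both print 16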

  17. #15
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Are you talking about using deBruijn sequences to benchmark compressors? It seems to me the best compressors will just generate the sequences, like "compressing" pi or prime numbers as discussed in some other threads.
    I'm talking about testing the source stored on GitHub and seeing what it's useful for.

  18. #16
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Maybe I'm misunderstanding the paper, but it seems like something worked out starting from the De Bruijn sequence up, not the sequence itself, which is like those pseudo-random numbers regenerable from a function.
    Gagie achieves this new bound by considering a related trick. Instead of pre-arranging the cards, you shuffle the pack and then ask your friend to draw seven cards. He or she then lists the cards’ colours, replaces them in the pack and cuts the deck. You then examine the deck and say which cards were drawn.

    This time you’re relying on probability to get the right answer.
    “It is not hard to show that the probability of two septuples of cards having the same colours in the same order is at most 1/128,” say Gagie.
    He goes on to consider the probability of correctly predicting the sequence of cards pulled at random from a deck of a certain size and after a few extra steps, finds a lower bound on the probability of doing this correctly.
    The card trick relies on a pre-arranged deck, but this is about a "sequence of cards pulled at random" (like what you get from any real data: unknown content or entropy)... then performing a "few extra steps" to predict which cards are in the array.
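    (For what it's worth, the 1/128 in the quote is just a counting bound: seven cards, each red or black, give 2^7 = 128 possible colour sequences, so under a roughly uniform deck the chance that two septuples show the same colour pattern is at most about 1/128.)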
