
Thread: task1

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts

    task1

    Sportman's thread reminded me of something -
    one rather common idea among random-compression people is that it's possible to
    split random data into substrings without repeated symbols and then compress these.

    Obviously it doesn't actually work.
    But I kinda liked the idea, and sometimes use it as a "data compression problem" -
    given a file of length N, and knowing that each byte value occurs only once within each
    aligned 256-byte block, what's the guaranteed compression ratio for this file?

    I even made a demo coder for this - http://nishi.dreamhosters.com/u/task1.rar
    but it's a purely mathematical problem.
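A quick numeric check of the bound (a sketch, not the intended pencil-and-paper solution):

```python
import math

# Each aligned 256-byte block is a permutation of the 256 byte values,
# so in the worst case it needs log2(256!) bits.
bits_per_block = math.log2(math.factorial(256))
print(bits_per_block / 8)          # ~210.5 bytes out of 256
print(bits_per_block / (256 * 8))  # guaranteed ratio ~0.822
```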

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    it's 256!/2^256 and has an obvious implementation via arithmetic coding; why should it be interesting?

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1. It's supposed to be something like an exercise in a data compression textbook.
    In fact I'd like to find more reasonably easy problems like this - practical, but still solvable.

    2. It's not 256!/2^256 and not even 256!/256^256 - see http://wolframalpha.com

    3. The "obvious implementation" is not so obvious, if it has to work for files of arbitrary length.
    Sure, the implementation with bytewise rc and linear scans would be rather simple, but it would
    be slow, and likely stop working after 16M.
    And an implementation with a bitwise rc may be a little more interesting.

    4. The "random" files generated in this task can be used for compressor benchmarks - most
    compressors don't see any redundancy at all, while paq8 still manages to find some.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Maybe I misunderstand something, but
    1. you can compress each 256-byte block independently
    2. for the first byte in a block you have 256 variants, for the second 255, and so on

    so it's easy to count and easy to implement
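That counting argument maps directly onto ranking the block in the factorial number system, something like this (function names are mine, not from task1):

```python
def rank_block(block):
    # Map a 256-byte block with all-distinct values to an integer in
    # [0, 256!): 256 choices for the first byte, 255 for the second, ...
    remaining = list(range(256))
    r = 0
    for b in block:
        i = remaining.index(b)
        r = r * len(remaining) + i
        remaining.pop(i)
    return r

def unrank_block(r):
    # Recover the mixed-radix digits (last one first), then replay them.
    digits = []
    for base in range(1, 257):
        digits.append(r % base)
        r //= base
    remaining = list(range(256))
    return bytes(remaining.pop(i) for i in reversed(digits))
```

rank_block(bytes(range(256))) is 0, and unrank_block inverts it; the integer fits in ceil(log2(256!)) = 1684 bits, which is where the ~210.5 bytes per block come from.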

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > so it's easy to count and easy to implement

    Yes, and your answer is still wrong. Even if you'd eventually remember about missing logarithm(s),
    there's still more to it, because I didn't say that N%256==0.

    Also, if it's too easy for you, maybe you can suggest something better?
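For reference, the count for arbitrary N: a partial last block of m bytes still has 256*255*...*(256-m+1) variants. A sketch (my own function name):

```python
import math

def guaranteed_bits(n):
    # Worst-case code length in bits for an n-byte file where each byte
    # value occurs at most once within each aligned 256-byte block.
    full, rest = divmod(n, 256)
    bits = full * math.log2(math.factorial(256))
    for k in range(rest):  # partial last block: 256 * 255 * ...
        bits += math.log2(256 - k)
    return bits
```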

  6. #6
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Shelwien View Post
    > so it's easy to count and easy to implement

    Yes, and your answer is still wrong. Even if you'd eventually remember about missing logarithm(s),
    there's still more to it, because I didn't say that N%256==0.

    Also, if it's too easy for you, maybe you can suggest something better?

    (log(256!)/log(2))/8 ≈ 210.5 bytes, so you can compress this to 211 bytes per block.

    The easy way to do this is with a mod of ARB255: you have a tree of 255 binary cells with 256 leaves. You start with a weight of one on the path to each leaf, so the top cell is 128 128; each time you go down a level you halve the weights, so the last 128 binary cells each weigh 1 1. As you go through the tree you subtract the weights until done.

    I am pretty sure I did this mod years ago; it's not too hard. The only trick that most might miss is that each cell uses the given weights for its probabilities. That means when you get to a cell that has a weight like 20 0 (nonzero on one path and zero on the other) you have no output to the file, since it would always take the path with the nonzero weight.

    What I am talking about is a file made only of 256-byte blocks where each character is used exactly once.
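The tree bookkeeping above can be checked numerically: charging log2(parent/child) per branch on the leaf-to-root path (zero cost where the sibling weight is 0) telescopes to log2(remaining symbols), so a full block costs exactly log2(256!) bits. A sketch of that weight model (not ARB255 itself):

```python
import math, random

def block_cost_bits(perm):
    # counts[1] is the root; leaves are nodes 256..511 (leaf for byte s is 256+s)
    counts = [0] * 512
    for leaf in range(256, 512):
        counts[leaf] = 1
    for node in range(255, 0, -1):
        counts[node] = counts[2 * node] + counts[2 * node + 1]
    total = 0.0
    for s in perm:
        node = 256 + s
        path = []
        while node >= 1:            # leaf-to-root path
            path.append(node)
            node //= 2
        # ideal cost of a branch is log2(parent/child); cells where the
        # sibling weight is 0 cost nothing (parent count == child count)
        for child in path[:-1]:
            parent = child // 2
            total += math.log2(counts[parent] / counts[child])
        for n in path:              # subtract the weights along the path
            counts[n] -= 1
    return total

perm = list(range(256))
random.shuffle(perm)
print(block_cost_bits(perm))  # ~1684.0 bits = log2(256!), for any permutation
```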

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    OK, at first I was talking about files made only of 256-byte blocks. However, I see that you might have the last block shorter than 256 characters. That is actually no big deal. I was basically getting the last character for free, since all the binary cells would have one weight of zero in them. What you do to allow a short ending is call the update on those last cells as for every other character; you end up only updating the free ends. Overall this adds less than half a byte for the whole file. Also, ARB255 uses a 64-bit high and low; for this problem that's overkill, and a 32-bit high and low would still work. This mod would also make the output bijective to the set of all files of the form we are compressing. Just remember: if the file being compressed is not of this form, it fails.

    Let me make it clear: when you do the uncompress, it maps every possible bit pattern in a file to a file of the form we are talking about. It will not compress files of some other form to anything.
