Thread: durilca4linux_3 beats paq8hp12any

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    http://cs.fit.edu/~mmahoney/compression/text.html

    But it requires 64-bit Linux, 4 GB RAM + 1 GB swap. I only have 2 GB so I can't test it as benchmarked. It is tuned to LTCB and may not work with other files. Size includes a custom dictionary.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Any chance of getting a real paq9 with results better than paq8?

    Because this contest is getting really funny. On one side there is durilca,
    with a 2002 engine plus a preprocessor; on the other there is paq8...,
    redundant due to speed optimization but still slow as hell, plus a
    preprocessor despite its built-in text modelling (and I'd say there hasn't
    been much improvement since paq6 in 2003, for preprocessed text compression at least).

  3. #3
    Member
    Join Date
    May 2008
    Location
    Earth
    Posts
    115
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Nobody has tested paq8hp12any with 4 GB memory usage. It would be interesting to see...

  4. #4
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Great! Nice to see someone is able to squeeze this improvement from an engine that's 6 years old.

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Great! Nice to see someone is able to squeeze this improvement from an engine
    > that's 6 years old.

    Actually there just wasn't much development in these years.
    (Of course I mean compression quality, not speed or usability).

    And also durilca uses external preprocessing, so it should be
    possible to feed durilca's preprocessed data to paq8 and see
    what happens... I wonder if I should try disabling the compression
    part... 64-bit and Linux complicate things, though...

  6. #6
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    I wonder if I should try disabling the compression
    part... 64-bit and Linux complicate things, though...
    Tut-tut-tut!

  7. #7
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by IsName
    Nobody has tested paq8hp12any with 4 GB memory usage. It would be interesting to see...
    It is simpler and faster to run DURILCA at lower memory:

    ./DURILCA e -t2 -m1600 -o10 enwik9
    131839692 bytes 3708.41 sec.
    ./DURILCA e -t2 -m1800 -o10 enwik9
    131505803 bytes 3644.65 sec.
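
To compare these two runs with other benchmark entries, the sizes can be converted to bits per character. This is only a sketch: it assumes enwik9 is exactly 10^9 bytes, and the helper name `bits_per_char` is made up for illustration. The sizes and times are copied from the runs above.

```python
# Assumption: enwik9 is exactly 10**9 bytes (the first 10^9 bytes of the Wikipedia dump).
ENWIK9_SIZE = 10**9

def bits_per_char(compressed_bytes, original_bytes=ENWIK9_SIZE):
    """Compressed size expressed as bits per input byte."""
    return 8 * compressed_bytes / original_bytes

# (memory option, compressed size in bytes, time in seconds), copied from the post.
runs = [
    ("-m1600", 131839692, 3708.41),
    ("-m1800", 131505803, 3644.65),
]

for option, size, seconds in runs:
    print(f"{option}: {bits_per_char(size):.4f} bpc in {seconds:.0f} s")

# Bytes saved by going from -m1600 to -m1800.
saved = runs[0][1] - runs[1][1]
print("saved:", saved, "bytes")
```

Both runs land at roughly 1.05 bpc, so the extra 200 MB of model memory buys about 0.3 MB of output on this file.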

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Tut-tut-tut!

    Too lazy to do it anyway...
    Btw, why don't you just say what paq's performance is with your preprocessed data? I bet you've tested it already.

  9. #9
    Member
    Join Date
    Feb 2008
    Posts
    31
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien
    Btw, why don't you just say what paq's performance is with your preprocessed data? I bet you've tested it already.
    It is slightly worse. Results for enwik7 (I am too impatient for larger files):
    PAQ8n: 1786603 bytes
    DURILCA: 1777587 bytes
    For larger files, DURILCA will use statistics of higher orders and PAQ will not.
    PAQ8hp tries to preprocess the already preprocessed file, so it is not suitable for this test.
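
To put "slightly worse" into numbers, the gap between the two enwik7 results can be worked out directly. A small sketch using only the two sizes quoted above; the variable names are illustrative.

```python
# Sizes in bytes, copied from the post (both on the same preprocessed enwik7 data).
paq8n_size = 1786603
durilca_size = 1777587

gap = paq8n_size - durilca_size       # absolute difference in bytes
relative_gap = gap / paq8n_size       # fraction of the PAQ8n output

print(f"DURILCA is smaller by {gap} bytes ({relative_gap:.2%} of the PAQ8n size)")
```

The difference is about half a percent, which is small next to the change in input size between enwik7 and enwik9.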

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by Dmitry Shkarin
    It is slightly worse, results for enwik7
    Try paq8o9; it has a new APM.

    Quote Originally Posted by Dmitry Shkarin
    For larger files, DURILCA will use statistics of higher orders and PAQ will not.
    paq has a match model, which somewhat reduces the need for high-order statistics.
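
For readers who have not seen a match model, here is a rough byte-level sketch of the idea: remember where the last few bytes occurred before, and while that earlier occurrence keeps agreeing with the input, predict the byte that followed it. This is a simplified illustration and not paq's actual code; real paq works bit by bit and mixes this prediction with other models. The function name and the `min_len` parameter are assumptions made for the example.

```python
def match_model_predictions(data: bytes, min_len: int = 4):
    """Return, for each position, the byte predicted by a simple match model (or None)."""
    last_seen = {}     # context of min_len bytes -> position right after its latest occurrence
    match_ptr = None   # position in the history whose byte we expect to see next
    predictions = []

    for i, byte in enumerate(data):
        # Predict before looking at the current byte.
        predictions.append(data[match_ptr] if match_ptr is not None else None)

        # Keep following the match while it agrees with the input, otherwise drop it.
        if match_ptr is not None and data[match_ptr] == byte:
            match_ptr += 1
        else:
            match_ptr = None

        # Index the context that ends at the current byte, and look for a new match.
        if i + 1 >= min_len:
            context = data[i + 1 - min_len : i + 1]
            if match_ptr is None:
                match_ptr = last_seen.get(context)
            last_seen[context] = i + 1

    return predictions


sample = b"the cat sat. the cat sat."
preds = match_model_predictions(sample)
hits = sum(1 for p, b in zip(preds, sample) if p == b)
print(hits, "of", len(sample), "bytes predicted correctly")
```

Once a long repeat is found, a single pointer covers what would otherwise need very high-order context statistics, which is the trade-off described above.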

    The bit-based, hash-based approach used in paq makes it inefficient for stationary, low-noise data like text, but it's better for mixed data types such as hlp files:
    http://www.maximumcompression.com/data/hlp.php
    Also, paq does better when memory is constrained.
