Thread: Diqs

  1. #1
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts

    Diqs

    I made a new compressor similar to krc, with better compression for text.
    Code:
     to compress : Diqs c[blocksize][k|m] infile outfile
        blocksize: 1024 - 8388608. 1k is 1KB, 1m is 1MB. Default is 100k.
    
     to decompress : Diqs d infile outfile
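
    A minimal sketch, in Python, of how the `c[blocksize][k|m]` argument above could be interpreted. It is not taken from the Diqs source: the function name `parse_block_size`, the error handling, and the reading of `k` as 1024 bytes and `m` as 1024*1024 bytes are assumptions (the last one is suggested by the upper limit 8388608 = 8*1024*1024). Options added in later versions are not handled.

```python
import re

MIN_BLOCK = 1024            # lower limit from the usage text
MAX_BLOCK = 8388608         # upper limit from the usage text
DEFAULT_BLOCK = 100 * 1024  # "default is 100k", assuming k = 1024 bytes

# Assumed multipliers: the usage text only says "1k is 1KB, 1m is 1MB".
MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 * 1024}


def parse_block_size(arg: str) -> int:
    """Parse a hypothetical 'c[blocksize][k|m]' argument into a byte count."""
    match = re.fullmatch(r"c(?:(\d+)([km]?))?", arg)
    if match is None:
        raise ValueError(f"not a compress argument: {arg!r}")
    digits, suffix = match.groups()
    if digits is None:
        # Plain "c": fall back to the documented default.
        return DEFAULT_BLOCK
    size = int(digits) * MULTIPLIERS[suffix]
    if not MIN_BLOCK <= size <= MAX_BLOCK:
        raise ValueError(f"block size {size} outside {MIN_BLOCK}..{MAX_BLOCK}")
    return size
```

    Under these assumptions `c` alone gives 102400 bytes and `c8000k` gives 8192000 bytes, which is inside the allowed range.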
    Attached Files

  2. The Following User Says Thank You to xezz For This Useful Post:

    Sportman (19th May 2014)

  3. #2
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    I tried it, but it does not seem very effective on an ASCII SQL dump (411MB => 89MB with diqs, ~70MB with freearc/zpaq/7z in fastest mode).
    How does it work?

  4. #3
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Nice job, it's faster and better at small files:

    enwik3 383 bytes, diqs 1m
    enwik4 4,277 bytes, diqs 1m
    enwik5 57,347 bytes, diqs 1m
    enwik6 660,773 bytes, diqs 1m
    enwik7 6,628,626 bytes, diqs 1m

    enwik3 434 bytes, krc 1
    enwik3 412 bytes, krc 4d
    enwik4 4,733 bytes, krc 1
    enwik4 4,443 bytes, krc 4d
    enwik5 45,454 bytes, krc 1
    enwik5 43,318 bytes, krc 4d
    enwik6 375,784 bytes, krc 1
    enwik7 3,620,957 bytes, krc 1

  5. The Following User Says Thank You to Sportman For This Useful Post:

    xezz (19th May 2014)

  6. #4
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    When I look at the output file, it looks like you use a fixed word size of 4 bytes?

  7. #5
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Diqs crashed on a 0-byte file. That is fixed now, and incompressible blocks are a few bytes smaller.


    Quote Originally Posted by fcorbelli
    I tried it, but it does not seem very effective on an ASCII SQL dump (411MB => 89MB with diqs, ~70MB with freearc/zpaq/7z in fastest mode).
    How does it work?
    It is a dictionary-based compressor. It does not use an entropy coder, so it is usually worse than freearc/zpaq/7z.


    Quote Originally Posted by Sportman
    When I look at the output file, it looks like you use a fixed word size of 4 bytes?
    No, words are 2 to 16 bytes long, and can go up to 18.
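
    To illustrate what a dictionary-based compressor without an entropy coder can look like, here is a toy sketch in Python. It is a generic substitution scheme, not the Diqs algorithm: only the 2 to 16 byte word lengths come from this thread. The use of byte values absent from the input as codes, the gain formula, and the names `compress` and `decompress` are assumptions made for the example.

```python
MIN_WORD = 2   # shortest word length mentioned in the thread
MAX_WORD = 16  # usual longest word length mentioned in the thread


def compress(data: bytes):
    """Return (dictionary, payload); dictionary is a list of (code, word)."""
    # Assumption: byte values that never occur in the input serve as codes.
    free_codes = sorted(set(range(256)) - set(data))
    dictionary = []
    payload = data
    for code in free_codes:
        candidates = {
            payload[i:i + n]
            for n in range(MIN_WORD, MAX_WORD + 1)
            for i in range(len(payload) - n + 1)
        }
        best_word, best_gain = None, 0
        for word in candidates:
            # bytes.count counts non-overlapping occurrences, which is
            # exactly what bytes.replace will substitute below.
            saved = payload.count(word) * (len(word) - 1)
            cost = len(word) + 1  # assumed size of one dictionary entry
            if saved - cost > best_gain:
                best_word, best_gain = word, saved - cost
        if best_word is None:
            break  # no remaining word pays for its dictionary entry
        payload = payload.replace(best_word, bytes([code]))
        dictionary.append((code, best_word))
    return dictionary, payload


def decompress(dictionary, payload: bytes) -> bytes:
    """Undo the substitutions in reverse order, since words may nest."""
    for code, word in reversed(dictionary):
        payload = payload.replace(bytes([code]), word)
    return payload
```

    The search is quadratic and only practical for tiny inputs. An empty input simply yields an empty dictionary and an empty payload.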
    Attached Files
    Last edited by xezz; 19th May 2014 at 17:42.

  8. #6
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Decoding bug fixed, and some options added.
    Quote Originally Posted by Sportman
    enwik5 57,347 bytes, diqs 1m
    enwik6 660,773 bytes, diqs 1m
    enwik7 6,628,626 bytes, diqs 1m
    Was the command c1m? These results look far too poor.
    Attached Files
    Last edited by xezz; 25th May 2014 at 11:28. Reason: bug fixed

  9. #7
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by xezz
    Was the command c1m? These results look far too poor.
    I guess so. I re-tested with the new diqs version:

    enwik3 383 bytes, 0.015 sec. - 0.000 sec., diqs 8000k
    enwik4 4,242 bytes, 0.093 sec. - 0.000 sec., diqs 8000k
    enwik5 42,595 bytes, 4.968 sec. - 0.140 sec., diqs 8000k
    enwik6 358,480 bytes, 353.822 sec. - 9.500 sec., diqs 8000k
    enwik7 3,918,833 bytes, 4141.060 sec. - 67.142 sec., diqs 8000k

    (enwik7 compare failed: 9,999,984 bytes out)

    xml 670,236 bytes, 1100.626 sec. - 20.281 sec., diqs 8000k

    fp.log 939,981 bytes, 1261.396 sec. - 27.235 sec., diqs 8000k
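
    The enwik figures above can be turned into compression ratios and speeds with a few lines of Python. This is only a sketch under two assumptions: that each enwikN file is the usual 10^N-byte test file, and that the first timing on each line is the compression time. The `RESULTS` table and the `summarize` helper are names made up for this example. xml and fp.log are left out because their original sizes are not given.

```python
# (name, assumed original size in bytes, compressed size, compression seconds)
RESULTS = [
    ("enwik3", 10**3, 383, 0.015),
    ("enwik4", 10**4, 4_242, 0.093),
    ("enwik5", 10**5, 42_595, 4.968),
    ("enwik6", 10**6, 358_480, 353.822),
    ("enwik7", 10**7, 3_918_833, 4141.060),
]


def summarize(rows):
    """Return (name, compressed/original ratio, bytes per second or None)."""
    summary = []
    for name, original, compressed, seconds in rows:
        ratio = compressed / original
        # A reported time of 0.000 s is below timer resolution: no speed.
        speed = original / seconds if seconds > 0 else None
        summary.append((name, ratio, speed))
    return summary


if __name__ == "__main__":
    for name, ratio, speed in summarize(RESULTS):
        speed_text = "n/a" if speed is None else f"{speed:,.0f} B/s"
        print(f"{name}: {ratio:.1%} of original, {speed_text}")
```

    On these assumptions enwik7 ends up at about 39.2% of its original size, compressed at roughly 2.4 KB per second.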

  10. The Following 2 Users Say Thank You to Sportman For This Useful Post:

    Matt Mahoney (27th May 2014),xezz (25th May 2014)

  11. #8
    Member
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by fcorbelli
    I tried it, but it does not seem very effective on an ASCII SQL dump
    Database dumps and text are different data types with different redundancy. enwiki is not a text file either, although it contains quite a lot of text.

    I don't know whether this is the problem with Diqs, but a general problem with data compression methods is that each one is designed for a specific kind of redundancy. Not everything that can be opened with a text editor is of the data type "text".
    Last edited by just a worm; 27th May 2014 at 09:01.

  12. #9
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Some bugs fixed.
    Quote Originally Posted by just a worm
    I don't know whether this is the problem with Diqs, but a general problem with data compression methods is that each one is designed for a specific kind of redundancy. Not everything that can be opened with a text editor is of the data type "text".
    If the block size is too large, compression gets worse, because the maximum code size is 2.
    Attached Files
    Last edited by xezz; 31st May 2014 at 06:51.

  13. #10
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
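    I redid the test with Diqs v0.3.1 c8000k:

    enwik3 383 bytes, 0.007 sec. - 0.000 sec.
    enwik4 4242 bytes, 0.098 sec. - 0.000 sec.
    enwik5 42595 bytes, 5.431 sec. - 0.140 sec.
    enwik6 358458 bytes, 371.820 sec. - 9.328 sec.
    enwik7 3916754 bytes, 4234.082 sec. - 65.784 sec.
    enwik8 39712830 bytes, 40170.005 sec. - 585.202 sec.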

    Compare fail:
    enwik7 9999978 bytes
    enwik8 99999946 bytes
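
    A round-trip check like the one reported above can be scripted. The sketch below, in Python, compares an original file with the decompressed output and reports both sizes and the offset of the first difference. The name `compare_files` and its return format are assumptions made for this example, not part of Diqs or of the tool used in this test.

```python
import os


def compare_files(original, restored, chunk_size=1 << 16):
    """Return (equal, original size, restored size, first differing offset)."""
    size_a = os.path.getsize(original)
    size_b = os.path.getsize(restored)
    offset = 0
    with open(original, "rb") as fa, open(restored, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the first mismatching byte inside this chunk.
                common = min(len(a), len(b))
                for i in range(common):
                    if a[i] != b[i]:
                        return False, size_a, size_b, offset + i
                # One file ended early: the difference starts at its end.
                return False, size_a, size_b, offset + common
            if not a:
                return True, size_a, size_b, None
            offset += len(a)
```

    If enwik7 and enwik8 are the usual 10^7 and 10^8 byte files, the outputs reported here are 22 and 54 bytes short respectively.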

  14. #11
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Thank you for testing large files.
    Quote Originally Posted by Sportman
    Compare fail:
    enwik7 9999978 bytes
    enwik8 99999946 bytes
    I found the cause of the bug: it probably selected a bad replacement value. Please wait for the fix.

  15. #12
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    If v0.3.1 fails, use v0.3.2.
    The two versions do not produce the same compressed output.
    Attached Files

  16. #13
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Added 3-byte replacement values.
    Attached Files
    Last edited by xezz; 9th June 2014 at 15:36. Reason: bugfix

  17. #14
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Got a crash compressing a text file (The Sonnets by William Shakespeare, 1609):

    Last console output:
    diqs c8388608:e2046:l19 text.txt text.diq
    blocksize:5560734
    block:1 / 1
    cs3:1890951 bytes, 32 words 93.73%
    94 :1886398 bytes, 255 words 100.00%

  18. #15
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    149
    Thanks
    30
    Thanked 59 Times in 35 Posts
    I have been trying to fix it for a few days, but have not managed to yet.
    I tested some large files.
    If the block size is too large, v0.4 crashes.
