Thread: LZPM 0.09 is here!

  1. #31
    Tester

    Join Date
    May 2008
    Location
    St-Petersburg, Russia
    Posts
    182
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Thanks, Encode!

  2. #32
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Creating archive: d:a.arc using lzpm4k
    Compressed 1 file, 28.183.463 => 4.429.857 bytes. Ratio 15.7%
    Compression time 148.36 secs, speed 190 kb/s. Total 149.66 secs

    Creating archive: d:a.arc using lzpm8k
    Compressed 1 file, 28.183.463 => 4.397.292 bytes. Ratio 15.6%
    Compression time 246.85 secs, speed 114 kb/s. Total 249.92 secs

    Creating archive: d:a.arc using lzpm16k
    Compressed 1 file, 28.183.463 => 4.384.432 bytes. Ratio 15.5%
    Compression time 396.67 secs, speed 71 kb/s. Total 403.44 secs

    Creating archive: d:a.arc using 7z
    Compressed 1 file, 28.183.463 => 4.112.522 bytes. Ratio 14.5%
    Compression time 92.60 secs, speed 304 kb/s. Total 95.09 secs

    Creating archive: d:a.arc using ppmd:16:384mb
    Compressed 1 file, 28.183.463 => 3.939.418 bytes. Ratio 13.9%
    Compression time 27.05 secs, speed 1.042 kb/s. Total 31.67 secs
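
    The ratio and speed figures in the log above can be cross-checked from the byte counts and times. This is a minimal sketch, assuming "kb/s" means 1000 bytes per second; the archiver's own rounding rules are not stated in the thread, and the function names are illustrative only.

```python
# Cross-check of the figures reported in the log above.
# Assumption: "kb" means 1000 bytes; the archiver's rounding is unknown.
ORIGINAL_SIZE = 28_183_463  # bytes, from the log

# (method, compressed bytes, compression time in seconds), copied from the log
RUNS = [
    ("lzpm4k", 4_429_857, 148.36),
    ("lzpm8k", 4_397_292, 246.85),
    ("lzpm16k", 4_384_432, 396.67),
    ("7z", 4_112_522, 92.60),
    ("ppmd:16:384mb", 3_939_418, 27.05),
]


def ratio_percent(compressed, original=ORIGINAL_SIZE):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


def speed_kb_per_s(seconds, original=ORIGINAL_SIZE):
    """Input bytes processed per second, in units of 1000 bytes."""
    return original / seconds / 1000.0


def summarize(runs=RUNS):
    """Return (method, ratio in percent, speed in kb/s) for every run."""
    return [(name, ratio_percent(size), speed_kb_per_s(secs))
            for name, size, secs in runs]


if __name__ == "__main__":
    for name, ratio, speed in summarize():
        print(f"{name:14s} ratio {ratio:6.2f}%  speed {speed:7.1f} kb/s")
```

    The computed speeds agree with the logged ones to the nearest kb/s under that assumption.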

  3. #33
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Some timings for LZPM 0.09 on my old Intel P3 EB (Coppermine) 750 MHz, 512MB RAM, WinME machine.

    Test file is FP.log.

    LZPM v0.09

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 774.915 = 00:12:54.915 = 100%


    Decompression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 1.882 = 00:00:01.882 = 100%


    Compressed Size: 628 KB (643,807 bytes)






    QUAD v1.12

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 6.386 = 00:00:06.386 = 100%


    Decompression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 2.728 = 00:00:02.728 = 100%


    Compressed Size: 700 KB (717,207 bytes)


    I will test the various versions of LZPM (4K, 8K, 16K) on the same machine and post the results later.

  4. #34
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    fp.log is too small and too unusual a file.

  5. #35
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    fp.log represents the worst case for Flexible Parsing.

    How to calculate the minimum distance for ROLZ (how far the algorithm can look back in the worst case, i.e. on random data):

    256 * TABSIZE (if we use 1-byte context)

    256 * 4096 = 1048576 (1 MB) (current)
    256 * 8192 = 2097152 (2 MB)
    256 * 16384 = 4194304 (4 MB)

    In practice, these values should be multiplied by, say, 4 or 8. For highly redundant data the actual distance is far longer.
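
    The arithmetic above can be sketched in a few lines. The function name and the `context_bytes` parameter are illustrative assumptions, not part of LZPM; the sketch only reproduces the worst-case formula (number of contexts times offsets kept per context).

```python
def rolz_min_distance(tab_size, context_bytes=1):
    """Worst-case reach in bytes: contexts * offsets stored per context.

    Hypothetical helper for illustration; with a 1-byte context there
    are 256 contexts, each keeping `tab_size` recent offsets.
    """
    contexts = 256 ** context_bytes
    return contexts * tab_size


for tab_size in (4096, 8192, 16384):
    reach = rolz_min_distance(tab_size)
    print(f"TABSIZE {tab_size:5d}: {reach} bytes ({reach // (1 << 20)} MB)")
```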


  6. #36
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Try using something more sophisticated than linear search.

  7. #37
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    For example, suffix trees.

    Somewhere I read about 'sliding' suffix trees (i.e. ones where you can not only add new symbols but also remove old ones). With such a structure you can achieve linear time for building the suffix tree plus linear time for parsing.

    Such a thing would surely outperform all other LZ algorithms.

    (Additionally, with suffix trees you can use matches thousands of bytes long, up to the size of the sliding window, without a speed penalty.)

  8. #38
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Some timings for LZPM on my old Intel P3 EB (Coppermine) 750 MHz, 512MB RAM, WinME machine.

    Test file is FP.log.

    LZPM4K

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 581.403 = 00:09:41.403 = 100%

    Compressed Size: 635 KB (650,310 bytes)


    LZPM8K

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 869.416 = 00:14:29.416 = 100%

    Compressed Size: 631 KB (646,874 bytes)


    LZPM16K

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 1175.059 = 00:19:35.059 = 100%

    Compressed Size: 630 KB (645,864 bytes)

  9. #39
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Bulat Ziganshin
    fp.log is too small and too unusual a file.
    What file do you suggest?

    Which test file did you use for the results above?

  10. #40
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by LovePimple
    What file do you suggest?
    A text file of at least 10 MB: sources, natural text and so on. I used the sources of GHC. You can download something similar from http://www.haskell.org/ghc/dist/6.4/ghc-6.4-src.tar.bz2

  11. #41
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Bulat!

  12. #42
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    I have downloaded the archive (http://www.haskell.org/ghc/dist/6.4/ghc-6.4-src.tar.bz2) but still can't find the file of 28.183.463 bytes that you used in your test above. I'm sure you will understand that there is little point in posting results if we don't all have access to the same test files.

  13. #43
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    But you didn't ask me for my file. It's http://haskell.org/bz/ghc-src.7z

  14. #44
    Tester

    Join Date
    May 2008
    Location
    St-Petersburg, Russia
    Posts
    182
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Here are the results for my test set (45596394 bytes in 14 files):
    version / size (bytes) / comp speed (kb/s) / decomp speed (kb/s)
    0.09 / 19543628 / 280 / 6500
    0.09 4k / 19515844 / 297 / 6092
    0.09 8k / 19462255 / 224 / 6289
    0.09 16k / 19431576 / 165 / 5919
    7-zip 4.52b (ultra, word 64, lzma only) / 17901244 / 677 / 9119

  15. #45
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    I made a mistake in the new parsing scheme that causes some compression loss, sometimes a notable one (look at the fp.log results).

    Results with fixed scheme:

    world95.txt: 575,958 bytes
    fp.log: 631,821 bytes (617 KB!)

    In addition, I have an idea how to further improve parsing... Will make some experiments...

  16. #46
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Bulat Ziganshin
    But you didn't ask me for my file. It's http://haskell.org/bz/ghc-src.7z
    Thanks Bulat! I will download it asap.

  17. #47
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Some timings for LZPM on my old Intel P3 EB (Coppermine) 750 MHz, 512MB RAM, WinME machine.

    Test file is ghc-src.

    LZPM4K

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 102.605 = 00:01:42.605 = 100%

    Compressed Size: 4.22 MB (4,429,856 bytes)


    LZPM8K

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 174.498 = 00:02:54.498 = 100%

    Compressed Size: 4.19 MB (4,397,291 bytes)


    LZPM16K

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 288.412 = 00:04:48.412 = 100%

    Compressed Size: 4.18 MB (4,384,431 bytes)




    QUAD v1.12

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 21.573 = 00:00:21.573 = 100%

    Compressed Size: 4.46 MB (4,678,053 bytes)


    QUAD v1.12 (-x)

    Compression Time:

    Kernel Time = 0.000 = 00:00:00.000 = 0%
    User Time = 0.000 = 00:00:00.000 = 0%
    Process Time = 0.000 = 00:00:00.000 = 0%
    Global Time = 50.641 = 00:00:50.641 = 100%

    Compressed Size: 4.29 MB (4,505,182 bytes)


    Thanks Bulat! This is an excellent test file.

    If we can get several more people posting results from this same test file, it will give us some idea as to whether the 16K is worth it or not.
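
    One way to weigh that question is to compare the extra time against the bytes saved. This is a minimal sketch using only the ghc-src figures from this post; the `compare` helper is an illustrative assumption, and results on other machines or files may differ.

```python
# (compression time in seconds, compressed size in bytes),
# copied from the ghc-src results above (P3 at 750 MHz).
RESULTS = {
    "LZPM4K": (102.605, 4_429_856),
    "LZPM8K": (174.498, 4_397_291),
    "LZPM16K": (288.412, 4_384_431),
}


def compare(baseline, candidate, results=RESULTS):
    """Return (time factor, bytes saved, percent size reduction)
    of `candidate` relative to `baseline`."""
    base_time, base_size = results[baseline]
    cand_time, cand_size = results[candidate]
    saved = base_size - cand_size
    return cand_time / base_time, saved, 100.0 * saved / base_size


if __name__ == "__main__":
    for name in ("LZPM8K", "LZPM16K"):
        factor, saved, percent = compare("LZPM4K", name)
        print(f"{name}: {factor:.2f}x the time of LZPM4K, "
              f"{saved} bytes saved ({percent:.2f}%)")
```

    On these numbers, 16K takes roughly 2.8 times as long as 4K for a file about 1% smaller.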

  18. #48
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Much faster than my 1 GHz Duron, probably due to the 512 KB cache and better cache organization.

  19. #49
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Bulat Ziganshin
    Much faster than my 1 GHz Duron, probably due to the 512 KB cache and better cache organization.
    Here is my CPU spec (from CPU-Z v1.40.5):

    Processor 1 (ID = 0)
    Number of cores 1
    Number of threads 1 (max 1)
    Name Intel Pentium III EB
    Codename Coppermine
    Specification
    Package Socket 370 FC-PGA (platform ID = 4h)
    CPUID 6.8.6
    Extended CPUID 6.8
    Brand ID 2
    Core Stepping cC0
    Technology 0.18 um
    Core Speed 747.8 MHz (7.5 x 99.7 MHz)
    Stock frequency 1000 MHz
    Instructions sets MMX, SSE
    L1 Data cache 16 KBytes, 4-way set associative, 32-byte line size
    L1 Instruction cache 16 KBytes, 4-way set associative, 32-byte line size
    L2 cache 256 KBytes, 8-way set associative, 32-byte line size
    FID/VID Control no


    The machine originally came with a 600 MHz Celeron chip which gave performance that was little more than a joke. I replaced it with a second-hand 1 GHz Pentium III chip (7.5 x 133 MHz). The motherboard has a maximum bus speed of only 100 MHz so the CPU runs at 750 MHz (7.5 x 100 MHz). The performance is now MUCH FASTER and far, far more reliable.

    I'm sure it would make a BIG difference to performance if you were to install a 1 GHz (or better) Athlon chip in that machine of yours.

  20. #50
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Results on my machine, newer LZPM 0.10:

    ghc-src: 4,406,897 bytes, 17 sec

    Hm, will perform some optimizations...

    LZPM from test pack (LZPM 4K):

    ghc-src: 4,429,856 bytes, 15 sec


  21. #51
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by LovePimple
    L2 cache 256 KBytes
    Oh, yes, 256 KB. And AFAIR the EB series should have a 133 MHz bus?

  22. #52
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Bulat Ziganshin
    And AFAIR the EB series should have a 133 MHz bus?
    Correct! See my notes after the CPU spec.


    The "E" and "B" designators distinguish between Intel Pentium III processors with the same core frequency but different system bus frequencies and/or cache implementations.


    B = 133 MHz System Bus

    E = Processors with "Advanced Transfer Cache" (CPUID 068x and greater only if a frequency overlap exists)
    http://www.buildorbuy.org/p3-ram.html

  23. #53
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by LovePimple
    I'm sure it would make a BIG difference to performance if you were to install a 1 GHz (or better) Athlon chip in that machine of yours.
    The difference between Duron and Athlon is much smaller than that between Celeron and P3 processors (which is why Duron was so popular 5 years ago).
