Results 1 to 9 of 9

Thread: "Implementing ppmc with hash tables"

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts

    Angry "Implementing ppmc with hash tables"

    I've just read this article by Arturo San Emeterio Campos and made a working order-4-2-1-0-(-1) implementation (https://github.com/richox/slimox), but I found it even slower than high order PPMd. Does a hash table implementation always perform worse than one based on context trie, or do I misunderstand something?


  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Order-2 and lower ones could be implemented with direct lookup. Higher orders generally needs hashing, but you can hide some memory latency by using prefetching.

    Trees probably uses cache more effectively, especially when you do context mixing.

  3. #3
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Yes, large hash table typically cost a lot in term of cache misses. Likely culprit.

  4. #4
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Order-2 and lower ones could be implemented with direct lookup. Higher orders generally needs hashing, but you can hide some memory latency by using prefetching.

    Trees probably uses cache more effectively, especially when you do context mixing.
    I've read your TarsaLZP code and saw that trick (but I got SegmentFault when prefetching enabled ), now I'm thinking of running arithmetic coder in another thread

  5. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    What CPU do you have? Prefetching shouldn't cause SegFaults even if one tries to prefetch inaccessible piece of memory. But prefetching can cause bad opcode/ unrecognized instruction when CPU doesn't support prefetching (in this case, prefetching requires CPU with SSE2).

    Arithmetic coder in another thread could help at encoding, where you can delay encoding until you have a sufficiently large batch of symbols to encode. At decoding you would need the threads synchronized and synchronization is costly on common CPUs.

  6. #6
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    What CPU do you have? Prefetching shouldn't cause SegFaults even if one tries to prefetch inaccessible piece of memory. But prefetching can cause bad opcode/ unrecognized instruction when CPU doesn't support prefetching (in this case, prefetching requires CPU with SSE2).

    Arithmetic coder in another thread could help at encoding, where you can delay encoding until you have a sufficiently large batch of symbols to encode. At decoding you would need the threads synchronized and synchronization is costly on common CPUs.
    I downloaded this version: http://encode.ru/attachment.php?atta...5&d=1353267497

    compiled with these options:
    gcc -std=c99 -msse2 -O3 main.c -o tarsalzp (completed)
    gcc -std=c99 -msse2 -O3 main.c -DGCC_SSE -o tarsalzp (completed)
    gcc -std=c99 -msse2 -O3 main.c -DGCC_PREFETCH -o tarsalzp (completed)
    gcc -std=c99 -msse2 -O3 main.c -DGCC_PREFETCH -DGCC_SSE -o tarsalzp (segment fault)

    Running on archlinux-i686, gcc-4.7.2, CPU: AMD-E350 1.6GHz

    It's very strange that when I debug with valgrind no error happened and compression completed, but running it directly leads to segment fault.

  7. #7
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    156
    Thanks
    18
    Thanked 50 Times in 26 Posts
    gdb alsa complained with SEGSIGV at encodeSymbol (nextSymbol=nextSymbol@entry=127, mispredictedSymbol=mispredictedSymbol@entry=181) at encoder.h:157
    Code:
    b.m = __builtin_ia32_pand128(b.m, c.m);

  8. #8
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Very strange indeed. For now I don't have an idea on what could go wrong. I have a netbook with the same CPU so I may be able to debug it if it will occur to me.

  9. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I have tested both most recent version and the version you've linked and it seems that everything works correctly when I test on maximumcompression.com's test files.

    I'm running netbook with AMD E-350 1.6 GHz (the same as yours), Ubuntu 12.04 64-bit, 64-bit compiles (compiler was complaining about missing bits/predefs.h or smth like that when I chose 32-bit compilation) and both vector and prefetch optimizations turned on.


    The project is a NetBeans project, so you could download NetBeans (version 7.2 or newer) and Ubuntu 12.04 64-bit LiveCD to replicate my environment and then compile (from NetBeans) and test the program again. There are NetBeans configurations for either enabling or disabling of SSE optimizations.

Similar Threads

  1. Replies: 7
    Last Post: 4th January 2016, 15:06
  2. "Extreme" compression of DNS domains
    By nickety in forum Data Compression
    Replies: 20
    Last Post: 22nd October 2011, 01:20
  3. PAQ8 C++ precedence bug (or "-Wparentheses is annoying")
    By Rugxulo in forum Data Compression
    Replies: 13
    Last Post: 21st August 2009, 20:36
  4. LZ77 speed optimization, 2 mem accesses per "round"
    By Lasse Reinhold in forum Forum Archive
    Replies: 4
    Last Post: 11th June 2007, 21:53
  5. Freeware "Send To" interface for CCM and QUAD
    By LovePimple in forum Forum Archive
    Replies: 2
    Last Post: 20th March 2007, 17:22

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •