
Thread: A (new) Neural Network and XWRT implementation in C/C++

  1. #1
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts

    A (new) Neural Network and XWRT implementation in C/C++

    I have rewritten the neural network implementation, which was done in assembly, in C/C++.
    The advantage is that the compiler can inline the implementation, which results in faster code.
    Below is part of the disassembly of the C/C++ version; compared with the original
    assembler version, it is almost identical.


    000002df <__ZL11dot_productPKsS0_i>:
    2df: 0f 57 c9 xorps %xmm1,%xmm1
    2e2: eb 15 jmp 2f9 <__ZL11dot_productPKsS0_i+0x1a>
    2e4: 0f 28 04 48 movaps (%eax,%ecx,2),%xmm0
    2e8: 66 0f f5 04 4a pmaddwd (%edx,%ecx,2),%xmm0
    2ed: 66 0f 72 e0 08 psrad $0x8,%xmm0
    2f2: 66 0f fe c1 paddd %xmm1,%xmm0
    2f6: 0f 28 c8 movaps %xmm0,%xmm1
    2f9: 83 e9 08 sub $0x8,%ecx
    2fc: 79 e6 jns 2e4 <__ZL11dot_productPKsS0_i+0x5>
    2fe: 0f 28 c1 movaps %xmm1,%xmm0
    301: 66 0f 73 d8 08 psrldq $0x8,%xmm0
    306: 66 0f fe c8 paddd %xmm0,%xmm1
    30a: 0f 28 c1 movaps %xmm1,%xmm0
    30d: 66 0f 73 d8 04 psrldq $0x4,%xmm0
    312: 66 0f fe c8 paddd %xmm0,%xmm1
    316: 66 0f 7e c8 movd %xmm1,%eax
    31a: c3 ret
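
For readers who want to map the listing above back to source, here is a minimal sketch of C++ that could plausibly compile to such a loop. It is not the attached code: the function names, the scalar fallback, and the `SKETCH_HAVE_SSE2` macro are assumptions made for illustration. It uses the SSE2 intrinsics from `<emmintrin.h>` and assumes 16-byte aligned inputs whose length is a multiple of 8.

```cpp
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Portable reference: multiply adjacent pairs, add each pair, shift the pair
// sum right by 8 and accumulate (the scalar view of pmaddwd + psrad + paddd).
// Assumes '>>' on a negative int is an arithmetic shift, as on common compilers.
static int dot_product_scalar(const short* t, const short* w, int n) {
  int sum = 0;
  for (int i = 0; i + 1 < n; i += 2)
    sum += (t[i] * w[i] + t[i + 1] * w[i + 1]) >> 8;
  return sum;
}

// Hypothetical reconstruction of the vectorized loop shown in the listing.
static int dot_product(const short* t, const short* w, int n) {
  assert(n % 8 == 0);  // eight 16-bit values per 128-bit register
#if SKETCH_HAVE_SSE2
  __m128i sum = _mm_setzero_si128();                 // xorps
  while ((n -= 8) >= 0) {                            // sub $8 / jns
    // Aligned loads: t and w must be 16-byte aligned (movaps in the listing).
    __m128i a = _mm_load_si128(reinterpret_cast<const __m128i*>(t + n));
    __m128i b = _mm_load_si128(reinterpret_cast<const __m128i*>(w + n));
    __m128i prod = _mm_madd_epi16(a, b);             // pmaddwd
    prod = _mm_srai_epi32(prod, 8);                  // psrad $8
    sum = _mm_add_epi32(sum, prod);                  // paddd
  }
  // Horizontal add of the four 32-bit lanes (psrldq + paddd, twice).
  sum = _mm_add_epi32(sum, _mm_srli_si128(sum, 8));
  sum = _mm_add_epi32(sum, _mm_srli_si128(sum, 4));
  return _mm_cvtsi128_si32(sum);                     // movd
#else
  return dot_product_scalar(t, w, n);
#endif
}
```

Because the loop counts down in steps of 8, any trailing elements beyond a multiple of 8 would be skipped, which is why the sketch asserts on the length.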


    00000358 <__ZL5trainPKsPsii.part.1>:
    358: 66 0f 6e 44 24 04 movd 0x4(%esp),%xmm0
    35e: 66 0f 61 c0 punpcklwd %xmm0,%xmm0
    362: 66 0f 70 c8 00 pshufd $0x0,%xmm0,%xmm1
    367: 0f 28 15 10 0d 00 00 movaps 0xd10,%xmm2
    36e: eb 1e jmp 38e <__ZL5trainPKsPsii.part.1+0x36>
    370: 0f 28 04 48 movaps (%eax,%ecx,2),%xmm0
    374: 66 0f ed c0 paddsw %xmm0,%xmm0
    378: 66 0f e5 c1 pmulhw %xmm1,%xmm0
    37c: 66 0f ed c2 paddsw %xmm2,%xmm0
    380: 66 0f 71 e0 01 psraw $0x1,%xmm0
    385: 66 0f ed 04 4a paddsw (%edx,%ecx,2),%xmm0
    38a: 0f 29 04 4a movaps %xmm0,(%edx,%ecx,2)
    38e: 83 e9 08 sub $0x8,%ecx
    391: 79 dd jns 370 <__ZL5trainPKsPsii.part.1+0x18>
    393: c3 ret
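
In the same spirit, a sketch of source that could produce the training loop above. Again this is an illustration, not the attached code: the names are made up, the constant loaded from 0xd10 is assumed to be a vector of 16-bit ones (used for rounding), and the ".part.1" suffix is assumed to come from the compiler splitting off an early `err == 0` check.

```cpp
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Clamp to the signed 16-bit range, like the saturating paddsw instruction.
static int sat16(int v) { return v > 32767 ? 32767 : (v < -32768 ? -32768 : v); }

// Portable reference for one weight update per element.
// Assumes '>>' on a negative int is an arithmetic shift, as on common compilers.
static void train_scalar(const short* t, short* w, int n, int err) {
  for (int i = 0; i < n; ++i) {
    int x = sat16(2 * t[i]);                     // paddsw t,t
    x = (x * err) >> 16;                         // pmulhw: high half of product
    x = sat16(x + 1) >> 1;                       // paddsw one; psraw $1
    w[i] = static_cast<short>(sat16(w[i] + x));  // paddsw w
  }
}

// Hypothetical reconstruction of the vectorized update shown in the listing.
static void train(const short* t, short* w, int n, int err) {
  assert(n % 8 == 0);                     // eight weights per register
  assert(err >= -32768 && err <= 32767);  // err is broadcast as a 16-bit value
  if (err == 0) return;                   // nothing to update
#if SKETCH_HAVE_SSE2
  const __m128i one = _mm_set1_epi16(1);  // assumed content of the 0xd10 constant
  const __m128i e = _mm_set1_epi16(static_cast<short>(err));
  while ((n -= 8) >= 0) {
    __m128i x = _mm_load_si128(reinterpret_cast<const __m128i*>(t + n));
    x = _mm_adds_epi16(x, x);             // paddsw
    x = _mm_mulhi_epi16(x, e);            // pmulhw
    x = _mm_adds_epi16(x, one);           // paddsw
    x = _mm_srai_epi16(x, 1);             // psraw $1
    __m128i* wp = reinterpret_cast<__m128i*>(w + n);
    x = _mm_adds_epi16(x, _mm_load_si128(wp));
    _mm_store_si128(wp, x);               // movaps store
  }
#else
  train_scalar(t, w, n, err);
#endif
}
```

Both pointers must be 16-byte aligned, since the listing uses movaps for the loads and the store.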


    I have embedded the new NN implementation into 'paq8hp12any' and 'paq8pxd_v6'.
    That leaves a few questions:


    1) Is the handling of 'top' in the DMC model correct?
    The table is allocated as 'static Array<DMCNode> t(MEM*2)', while the handling is: 'if (top==MEM*2) threshold=512; if (top==MEM*3) threshold=768;'. With only MEM*2 nodes allocated, it looks as if top can never reach MEM*3.
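
The concern in question 1 can be illustrated with a small stand-alone simulation. This is only a sketch under a stated assumption: that 'top' counts allocated nodes and is only incremented while it is below the table capacity. The real model may reset or bound 'top' differently, and the struct and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Records which of the two threshold checks from the post were ever true.
struct ThresholdTrace {
  bool hit_2x = false;  // 'top == MEM*2' was observed
  bool hit_3x = false;  // 'top == MEM*3' was observed
};

// Hypothetical model: nodes are allocated one at a time until the table
// (capacity MEM*2, as in 'static Array<DMCNode> t(MEM*2)') is full.
static ThresholdTrace simulate_top(std::size_t mem) {
  assert(mem > 0);
  const std::size_t capacity = mem * 2;
  ThresholdTrace trace;
  std::size_t top = 0;
  while (top < capacity) {  // assumption: top never exceeds the capacity
    ++top;
    if (top == mem * 2) trace.hit_2x = true;  // would set threshold=512
    if (top == mem * 3) trace.hit_3x = true;  // would set threshold=768
  }
  return trace;
}
```

Under that assumption the first check fires exactly when the table becomes full and the second one never fires, which matches the suspicion that it is dead code.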


    2) Is the use of static dictionaries fair?
    'paq8hp12any' uses a static dictionary while 'paq8pxd' uses a dynamic one.
    But the benchmark does not seem to account for this, or am I mistaken?


    3) While resolving a few compiler warnings in both implementations, I became a
    little frustrated with the horrible implementation (excuse my French) of
    'textfilter.hpp' and 'wrtpre.cpp'. Both are difficult to read and contain a large
    collection of (tiny) mistakes. For example, the 'bounds' are calculated incorrectly
    before sorting and encoding.
    I decided to write a completely new implementation while preserving the main idea
    of the XWRT algorithm. I tried to implement the algorithm with as little code as
    possible, in a straightforward way.


    Can someone comment on this implementation?
    Are there new improvements in the XWRT algorithm that I missed?


    Kind regards,
    Marwijn
    Attached Files

  2. #2
    Member
    Join Date
    May 2007
    Location
    Poland
    Posts
    85
    Thanks
    8
    Thanked 3 Times in 3 Posts
    In case someone wants to test this PAQ but cannot find the correct dll versions, here are those which seem to work.
    Attached Files

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    IIRC you can compile most PAQ versions with -DNOASM and it will use pure C++ without the assembler code. The last time I tested it (a couple years ago since I don't maintain PAQ any more), g++ hadn't learned to generate SSE2 code so it was slower. Now it does, sometimes.

    paq8hp12any was a version of the Hutter prize winner adapted for general use (but especially enwik9). For both the Hutter prize and LTCB the size of the decompression program (and any external files it needs) is included in the decompressed size, so it is fair. Of course dynamic dictionaries like XWRT will compress other files better.

    Not sure about the DMC code. Could be a bug or something I forgot to take out.

  4. #4
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    The above build does not run properly on my system even if the DLLs are in the same directory.
    It keeps crashing right after start.

  5. #5
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Here are the executables built with VS2010.
    Beware: these only work on a system with SSE2 (or better). The previous executables were built with GNU (MinGW); I forgot that they have external dependencies.
    On older systems I recommend recompiling the given source files. The selection between SSE, SSE2, or no SSE at all should be done automatically.
    This implementation should be faster than the previous NN implementations, because the compiler can optimize a lot more than it can with hand-written assembler.
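
One common way to make such a selection automatic is to key off the compiler's predefined macros at build time. The sketch below shows that general technique only; it is an assumption about how the selection could work, not a description of the attached source, and the `SimdLevel` names are hypothetical.

```cpp
// Instruction-set level chosen at compile time.
enum class SimdLevel { None, SSE, SSE2 };

// GCC/Clang define __SSE__/__SSE2__ when the target supports them (e.g. with
// -msse2); MSVC defines _M_X64 for 64-bit builds and sets _M_IX86_FP to 1 or 2
// for 32-bit builds compiled with /arch:SSE or /arch:SSE2.
static SimdLevel simd_level() {
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  return SimdLevel::SSE2;
#elif defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP == 1)
  return SimdLevel::SSE;
#else
  return SimdLevel::None;
#endif
}

// Human-readable name, e.g. for a startup banner.
static const char* simd_name(SimdLevel level) {
  switch (level) {
    case SimdLevel::SSE2: return "SSE2";
    case SimdLevel::SSE:  return "SSE";
    default:              return "none";
  }
}
```

Note that this picks the level for the machine the code is compiled for, not the one it runs on, which is consistent with the advice to recompile on older systems.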
    Attached Files

  7. #6
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Did anybody look at the txtprep implementation?
    I'm dying to get some feedback on, or improvements to, the algorithm.

  8. #7
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    Dear Marwijn,

    thank you for paq8pxd_v6, which I tested today on SqueezeChart's Text Bible corpus, where we have the Bible in 32 languages (1 file per language).
    Each file is compressed separately. paq8pxd_v6 did no WRT transform on the Hebrew and Thai files and crashed on the Chinese file.
    Nevertheless it would rank first here because, summed up, it compresses slightly better than all other paq variants.
    And it is roughly twice as fast as paq8oSSE2 and paq8hp12, which is a really impressive result.

    If the Chinese text were compressed to no more than 698.000 bytes, the results would be:

    paq8pxd_v6 - 21.634.677 bytes in 10,761 seconds
    paq8pxd_v5 - 21.661.167 bytes (time not measured)
    paq8oSSE2 - 22.015.914 bytes in 18,561 seconds
    paq8hp12 - 22.722.510 bytes in 21,039 seconds

  9. #8
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    Originally posted by mahessel:
        Did somebody look at the txtprep implementation?
        I'm dying to get some feedback or improvements on the algorithm

    I'm glad that someone is interested in my work (XWRT).
    XWRT was written mainly for XML files and many different numerical formats (pages, dates) placed in them.
    Another reason why the code is so large is a separate optimization for LZ77, BWT, PPM, and PAQ.

    I've made experiments with your code and the world95.txt file. For comparison I've used paq8pxd_v5 (with store option (-0)) and XWRT 3.2 (the version used in paq8pxd is based on my code, but it's not mine).
    The XWRT options -0, -1, -2, -3 select optimization for subsequent LZ77, LZMA, PPM, or PAQ compression, respectively.

    The preprocessed files were compressed with PPMonstr J (default options):
    1996-04-20 15:00 2.988.578 world95.txt
    2013-06-13 10:58 412.559 world95.xwrt-0.pmm
    2013-06-13 10:58 403.267 world95.xwrt-1.pmm
    2013-06-13 10:58 398.383 world95.xwrt-2.pmm
    2013-06-13 10:35 398.459 world95.xwrt-3.pmm
    2013-06-13 11:29 410.039 world95.paq8pxd5-0.pmm
    2013-06-13 10:35 407.252 world95.Marwijn.pmm

    The above results show that your preprocessor and paq8pxd5 as expected don't work well with the PPM algorithm.
    The same files compressed with paq8n -7:

    2013-06-13 11:21 383.312 world95.xwrt-0.paq8n
    2013-06-13 11:01 365.678 world95.xwrt-1.paq8n
    2013-06-13 10:57 365.020 world95.xwrt-2.paq8n
    2013-06-13 10:49 363.915 world95.xwrt-3.paq8n
    2013-06-13 11:33 358.239 world95.paq8pxd5-0.paq8n
    2013-06-13 10:41 354.931 world95.Marwijn.paq8n

    These results show that Kaitz did a good job with optimizations of XWRT, and that your preprocessor works even better with paq8n.
    To confirm this experiment I've compared paq8pxd v5 with your preprocessor and with the original one:

    2013-06-13 11:44 335.606 world95.Marwijn.paq8pxd5
    2013-06-13 11:39 336.267 world95.paq8pxd5

    Good job, your preprocessor is slightly better. One still needs to check whether this advantage also applies to other files.
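
To put a number on "slightly better", the relative saving can be computed from the file sizes listed above. The helper below is a small illustrative sketch; its name is made up.

```cpp
#include <cassert>

// Relative size reduction of 'candidate' versus 'baseline', in percent.
static double percent_saved(long baseline, long candidate) {
  assert(baseline > 0);
  return 100.0 * static_cast<double>(baseline - candidate) /
         static_cast<double>(baseline);
}
```

For example, `percent_saved(336267, 335606)` gives roughly 0.2% for the paq8pxd5 pair, and `percent_saved(358239, 354931)` gives roughly 0.9% for the paq8n pair.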

    best regards,
    Przemyslaw
    Last edited by inikep; 13th June 2013 at 12:59.

  10. #9
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    377
    Thanks
    139
    Thanked 198 Times in 108 Posts
    Code:
    paq8pxd -8 enwik8
    Creating archive enwik8.paq8pxd with 1 file(s)...
    
    
    File list (18 bytes)
    Compressed from 18 to 18 bytes.
    
    
    1/1  Filename: enwik8 (100000000 bytes)
    Block segmentation:
     0           | utf-8     | 100000000 bytes [0 - 99999999] (wrt: 61949839)
    Compressed from 100000000 to 16672430 bytes.
    
    
    Total 100000000 bytes compressed to 16672458 bytes.
    Time 8872.79 sec, used 1767565157 bytes of memory
    KZo


  11. #10
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    872
    Thanks
    457
    Thanked 175 Times in 85 Posts
    I've got an error message on the TAR'ed Gutenberg test set of the SqueezeChart:

    E:\TESTSETS>paq8pxd -8 gb.tar
    Creating archive gb.tar.paq8pxd with 1 file(s)...

    File list (18 bytes)
    Compressed from 18 to 20 bytes.

    1/1 Filename: gb.tar (711358464 bytes)
    Block segmentation:
    0 | default | 287050 bytes [0 - 287049]
    1 | utf-8 | 25595020 bytes [287050 - 25882069]
    2 | text | 17189151 bytes [25882070 - 43071220]
    3 | default | 999648 bytes [43071221 - 44070868]
    4 | text | 5707872 bytes [44070869 - 49778740]Transform fails
    at 0, skipping...


    5 | default | 1336136 bytes [49778741 - 51114876]
    6 | text | 2218006 bytes [51114877 - 53332882]
    7 | default | 59085 bytes [53332883 - 53391967]
    8 | utf-8 | 16491637 bytes [53391968 - 69883604]
    SSE2-Compressing... 8.44%

    At 70% progress, paq8pxd has created 93 files named 't56ig' with extensions from .a to .4d;
    all 93 files together are 60.7 GB in size. It seems to have tried WRT again and again.

    On the camera raw test set it detects executable code (which previous versions don't),
    but there is no executable code. It detects only 3 of 44 JPEGs in the camera raw test set.
    Those JPEGs are previews that are stored inside the camera raw photos.
    All other paq8 variants and FP8 detect 7 JPEGs here.

    E:\TESTSETS>paq8pxd -8 c.tar
    Creating archive c.tar.paq8pxd with 1 file(s)...

    File list (17 bytes)
    Compressed from 17 to 19 bytes.

    1/1 Filename: c.tar (577854464 bytes)
    Block segmentation:
    0 | default | 78385812 bytes [0 - 78385811]
    1 | jpeg | 728279 bytes [78385812 - 79114090]
    2 | default | 169243 bytes [79114091 - 79283333]
    3 | default | 554419 bytes [79283334 - 79837752]
    4 | exe | 12805 bytes [79837753 - 79850557]
    5 | default | 72486865 bytes [79850558 - 152337422]
    6 | default | 53676017 bytes [152337423 - 206013439]
    7 | hdr | 131584 bytes [206013440 - 206145023]
    8 | 24b-image | 57600 bytes [206145024 - 206202623] (width: 480)
    9 | default | 45611807 bytes [206202624 - 251814430]
    10 | default | 33970145 bytes [251814431 - 285784575]
    11 | jpeg | 703899 bytes [285784576 - 286488474]
    12 | default | 10238372 bytes [286488475 - 296726846]
    13 | default | 8925134 bytes [296726847 - 305651980]
    14 | exe | 35709 bytes [305651981 - 305687689]
    15 | default | 1425059 bytes [305687690 - 307112748]
    16 | default | 44515027 bytes [307112749 - 351627775]
    17 | hdr | 125038 bytes [351627776 - 351752813]
    18 | 24b-image | 57600 bytes [351752814 - 351810413] (width: 480)
    19 | default | 43224027 bytes [351810414 - 395034440]
    20 | default | 19209293 bytes [395034441 - 414243733]
    21 | default | 18578753 bytes [414243734 - 432822486]
    22 | default | 537332 bytes [432822487 - 433359818]
    23 | default | 18587698 bytes [433359819 - 451947516]
    24 | default | 12673831 bytes [451947517 - 464621347]
    25 | jpeg | 9201425 bytes [464621348 - 473822772]
    26 | default | 5210956 bytes [473822773 - 479033728]
    27 | default | 98820735 bytes [479033729 - 577854463]
    SSE2-Compressing... 85.81%
    Last edited by Stephan Busch; 22nd June 2013 at 13:58.
