Thread: GCC mmx support

  1. #1
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    When looking at the assembly code produced by gcc, i was shocked that it wasn't translated to mmx or sse (compiler switches turned on!), for something like:

    a[0] += b[0]
    a[1] += b[1]
    a[2] += b[2]
    a[3] += b[3]

    The surrounding code is written in the same vectorized manner. Won't gcc compile it to mmx or something similar natively? (gcc 4.2)
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
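For comparison, here is a minimal sketch of the same element-wise addition written as a loop, which is the form GCC's auto-vectorizer is designed to recognize. The function name `add_arrays` and the `int` element type are assumptions for illustration; the thread does not say which types were used. As far as I know, GCC 4.x only vectorizes loops when `-ftree-vectorize` is enabled (later releases turn it on at `-O3`), together with a target switch such as `-msse2`, and hand-unrolled straight-line statements like the four above were generally not merged into SIMD instructions by compilers of that era.

```c
#include <stddef.h>

/* Element-wise a[i] += b[i].
   `restrict` promises that a and b do not overlap, which removes one
   common obstacle to auto-vectorization. Whether SIMD code is actually
   emitted depends on the compiler version and flags. */
void add_arrays(int *restrict a, const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] += b[i];
}
```

Calling `add_arrays(a, b, 4)` gives the same result as the four statements in the post. On newer GCC releases, `-fopt-info-vec` reports which loops were vectorized.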

  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    That's why I wrote paq7asm

  3. #3
    Programmer toffer's Avatar
    Ok, one more thing: is gcc's inline assembler stable? Compiling with optimization seems to mess up the code... gcc's inline assembler has normally worked for me, but this looks strange?!

    I've done a piecewise test of my new mixing code, here's a small piece of disassembly:

    Compiled with -O0:

    /APP

    /* Start */
    movq mm0, [ecx]
    movd mm2, edx
    punpcklwd mm2, mm2
    punpcklwd mm2, mm2
    pmulhw mm0, mm2
    pmulhw mm1, mm2
    movq [ecx], mm2

    /* End */

    /NO_APP
    mov ebx, 0
    mov esi, ecx

    which is ok and produces correct results, however with -O2 or 3:

    /APP

    /* Start */
    movq mm0, [ecx]
    movd mm2, edx
    punpcklwd mm2, mm2
    punpcklwd mm2, mm2
    pmulhw mm0, mm2
    pmulhw mm1, mm2
    movq [ecx], mm2

    /* End */

    /NO_APP
    mov DWORD PTR [esp+4], 32768

    The last line (mov DWORD PTR [esp+4], 32768) stores 1<<15 and OVERWRITES a mixing result. Why the hell does this happen?! Constraints and inputs/outputs are correct...

    Matt, is this the reason, why you used nasm?

    The version is 3.4.5

  4. #4
    Expert
    Matt Mahoney's Avatar
    I used nasm because I wanted the code to work with other compilers like Borland, Mars, VC++, etc., not just g++. nasm also works in Linux (as yasm), but masm and tasm don't.

    I didn't mess with mixing C++ and assembler in the same function. g++ has some strange syntax to tell the compiler which registers you are using, but I didn't bother. If every function is either pure C++ or pure assembler there is no problem.

    Also, isn't mm1 uninitialized?
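The "strange syntax" mentioned above is GCC's extended asm, where inputs, outputs and clobbered registers are listed after the instruction string. Below is a minimal sketch of the scaling step from the disassembly, written that way. The helper name `scale4` and the choice to store `mm0` back are assumptions for illustration, not the original mixer code, and the thread never establishes what actually caused the overwritten result. The sketch only shows how the `"memory"` clobber and the `mm0`/`mm2` clobbers tell the optimizer that the block writes memory and uses those registers, so it should not cache values or move stores across it. It uses GCC's default AT&T operand order and falls back to plain C when MMX is not available.

```c
#include <stdint.h>

/* Hypothetical helper: w[i] = (w[i] * s) >> 16 for four signed 16-bit values. */
void scale4(int16_t w[4], int16_t s)
{
#if defined(__GNUC__) && defined(__MMX__)
    __asm__ volatile(
        "movq      (%0), %%mm0\n\t"    /* load the four 16-bit words      */
        "movd      %1, %%mm2\n\t"      /* s into the low dword of mm2     */
        "punpcklwd %%mm2, %%mm2\n\t"   /* broadcast s to all ...          */
        "punpcklwd %%mm2, %%mm2\n\t"   /* ... four word lanes             */
        "pmulhw    %%mm2, %%mm0\n\t"   /* signed high 16 bits of w[i] * s */
        "movq      %%mm0, (%0)\n\t"    /* store the result back           */
        "emms"                         /* leave MMX state before returning */
        : /* no register outputs; the result goes through memory */
        : "r"(w), "r"((int32_t)s)
        : "mm0", "mm2", "memory");
#else
    /* Portable fallback with the same arithmetic (arithmetic shift assumed). */
    for (int i = 0; i < 4; ++i)
        w[i] = (int16_t)(((int32_t)w[i] * s) >> 16);
#endif
}
```

With `w = {16384, -16384, 32767, 0}` and `s = 16384`, both paths produce `{4096, -4096, 8191, 0}`. Building with `-masm=intel` would require rewriting the instruction string in Intel operand order.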

  5. #5
    Programmer toffer's Avatar
    Yes it is, but this is just a rough copy and paste; i left out the line that sets it. The original mixer update function is several times longer... As i said, i tested the code a few lines at a time, since i'm not familiar with mmx, only with 64 bit asm ^^. I usually use gcc's inline asm, but only with at&t syntax disabled - in my eyes it's redundant and horrible

    I managed to get a mixer update of 8 inputs twice as fast as my c++ version with only 4 inputs. However this bug disappeared, i don't know why?!

    But i figured out, that the main speed hit is a 32 bit / 32 bit division! Without this, the code would overall be about 4 times faster

    EDIT: good news for me - i managed to replace the division by a multiplication and a table lookup. This even improves compression! Maybe i can put everything together this weekend - there will be a massive improvement (i could improve sfc to 11.423.424 bytes without any higher order modelling)
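The post does not show how the division was replaced, so the following is only a sketch of one common way to do it: a table of fixed-point reciprocals, so that `x / d` becomes one multiplication and one shift. The names `recip_init` and `div_by_table`, the table size of 4096 and the 24-bit scale are all assumptions chosen for illustration. With these sizes the result equals the integer quotient for every `x` and `d` below 4096; a real mixer might accept a coarser approximation instead.

```c
#include <stdint.h>

enum { RECIP_MAX = 4096, RECIP_SHIFT = 24 };   /* illustrative sizes */

/* recip[d] holds a value just above 2^24 / d. */
static uint32_t recip[RECIP_MAX];

/* Fill the table once at startup; the slow divisions happen only here. */
void recip_init(void)
{
    recip[0] = 0;   /* unused: division by zero is not supported */
    for (uint32_t d = 1; d < RECIP_MAX; ++d)
        recip[d] = (1u << RECIP_SHIFT) / d + 1;
}

/* x / d as one multiply and one shift.
   Exact for 0 <= x < 4096 and 1 <= d < 4096 (so that x * d < 2^24). */
uint32_t div_by_table(uint32_t x, uint32_t d)
{
    return (uint32_t)(((uint64_t)x * recip[d]) >> RECIP_SHIFT);
}
```

After a single call to `recip_init()`, `div_by_table(1000, 7)` can stand in for `1000 / 7` in the hot loop. The table costs 16 KB here; a smaller divisor range would shrink it.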

  6. #6
    Expert
    Matt Mahoney's Avatar
    What's wrong with the MMX code in paq7/8?

    Yes, division is slow. paq doesn't use it anywhere except for initialization.

  7. #7
    Programmer toffer's Avatar
    I knew division is slow, but i didn't expect it to eat up more than 70% of the mixer adjustment time.

    There's nothing wrong with paq's MMX. I just wanted to use gcc's inline asm, since there is no function call overhead (stack frame, call, ret, etc...) when the function is inlined. This caused a 10% speed increase in my tests, so it is worth it.
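Another way to get inlined SIMD code without a call into a separate assembler routine is to use the MMX intrinsics from `<mmintrin.h>` inside a `static inline` function, leaving register allocation to the compiler. This is a sketch of that alternative, not what either poster used; the helper name `add4_i16` is hypothetical, and a plain C fallback is included for targets without MMX.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: a[i] += b[i] for four 16-bit values (wrapping add).
   Being static inline, it can be expanded at the call site, so no
   stack frame, call or ret is needed. */
static inline void add4_i16(int16_t *a, const int16_t *b)
{
#if defined(__MMX__)
    __m64 va, vb;
    memcpy(&va, a, sizeof va);      /* load without alignment assumptions */
    memcpy(&vb, b, sizeof vb);
    va = _mm_add_pi16(va, vb);      /* paddw: four 16-bit additions       */
    memcpy(a, &va, sizeof va);
    _mm_empty();                    /* emms: leave MMX state              */
#else
    for (int i = 0; i < 4; ++i)
        a[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

In a real loop the `_mm_empty()` call would normally be moved out of the helper and issued once after the last MMX operation, since it is not free.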
