Results 1 to 20 of 20

Thread: Java port of TarsaLZP

  1. #1
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts

    Java port of TarsaLZP

    Hi!

    Just for your information: I have started porting TarsaLZP into Java. Java code is much more readable than assembly code so it will be easier for you to figure out what is going on in my algorithm. As for now it's not working. There are only some common (for decoder and encoder) fuctions related to modelling and simple GUI.

    Archive with NetBeans project of TarsaLZP is here: http://www.ii.uj.edu.pl/~tarsa/jTarsaLZP.7z .

    Currently I have exams so I can't work on the code but I will be free at the beginning of next month.

    If you know some good (fast and efficient) arithmetic coder written in Java, please let me know.

    I will experiment with Java version of TarsaLZP and when big achievments are done I will port them to assembly.

    On my ToDo list are:
    - order-12 LZP (or another order higher than ,
    - skipping read from higher order models if symbol is not seen in lower order models (just write because reading in that case is unnecessary) - I don't know if it will speed things up as AFAIK memory controller writes and reads entire cache lines (maybe I'm wrong here and it can update only a few bytes),

    Cherrs,
    Piotr.


    EDIT:
    jTarsaLZP is working now. Read the rest of topic to know how it works and how it progressed.
    Last edited by Piotr Tarsa; 6th July 2009 at 22:52.

  2. #2
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Hi Piotr!

    Thanks for porting! C++ version is possible ?

  3. #3
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Help Please!

    void encodeFlag(int probability, boolean match)
    {
    if (match) {
    rcRange = rcRange * probability / 65536;
    } else {
    rcBuffer += rcRange * probability / 65536;
    rcRange -= rcRange * probability / 65536;
    }
    while (rcRange < 0x01000000) {
    //outputByte(rcBuffer >> 24);
    rcBuffer <<=8;
    rcRange <<=8;
    }
    }

    Int decodeFlag(int probability)
    {

    ...............????

    return match;
    }

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    It should look like that:
    Code:
    boolean decodeFlag(int probability)
    {
      int rcHelper = rcRange * probability / 65536;
      if (rcHelper > rcBuffer) {
        rcRange = rcHelper;
        return true;
      } else {
        rcRange -= rcHelper;
        rcBuffer -= rcHelper;
        return false;
      }
      // normalization is missing here
    }
    But I don't have time to check.

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    It should look like that:
    Code:
    boolean decodeFlag(int probability)
    {
      int rcHelper = rcRange * probability / 65536; // overflow!
      if (rcHelper > rcBuffer) {
        rcRange = rcHelper;
        return true;
      } else {
        rcRange -= rcHelper;
        rcBuffer -= rcHelper;
        return false;
      }
      // normalization is missing here
    }
    But I don't have time to check.

  6. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Thanks for pointing it out. There should be 'long', not 'int'.

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Thanks for pointing it out. There should be 'long', not 'int'.
    Not even long. Int64 or something - 64-bit integer. Because rcRange must be 32-bit integer. And instead of a division, better use right shift:
    Code:
      UInt32 rcHelper = ((UInt64(rcRange) * probability) >> 16;

  8. #8
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Java uses 64 bit longs. Every integral value is signed.

    BTW: Because of heavy use of longs 64 bit Java Virtual Machine is recommended.

  9. #9
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Hi Piotr!

    normalization ????

    rcRange<<=8;
    rcBuffer=(rcBuffer<<+getbyte(infile);
    Last edited by Nania Francesco; 14th June 2009 at 21:22.

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Yes. Something like that. Before returns:
    Code:
    while (rcRange < 0x01000000) {
      rcBuffer = (rcBuffer << 8) + inputByte();
      rcRange <<= 8;
    }

  11. #11
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Slightly updated the code.

    PS:
    I've read Agner Fog's tutorials and there is a evidence that writing to uncached memory causes fetching entire memory line to cache and then writing it back. But there are instructions like MOVNTxxx available in some SSE versions which causes writing to memory avoiding cache. So I'll probably use it in assembly version as I'm planning to rewrite that version to 64 bit.

    Some serious work should be done at beginning of July.

  12. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Encoding is almost working now. I have a small problem with infinite loop. I need to read more about Java I/O architecture. Maybe someone other is familiar with Java...

  13. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Encoding now makes reasonable results. I had a lot of problems with types casting (something not present when programming in assembly ).

    Edit:
    Sorry for posting many posts in a row .
    Last edited by Piotr Tarsa; 5th July 2009 at 13:11.

  14. #14
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    I'm expecting of your program. So excuse me, will your open source then?

  15. #15
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I provided full source code in the archive. See first post. I'm constantly updating it, but the link stays the same. You need to download NetBeans IDE from netbeans.org, unpack the archive and open it in NetBeans IDE.

  16. #16
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Encoder now works properly and is bit-for-bit compatible with http://asembler.republika.pl/bin/TarsaLZP.zip .

    Francesco should be happy

  17. #17
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Hi!

    Yes Piotr , Nice job !

  18. #18
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Decoding runs fine now.

    I've changed modelling a bit - I've changes updateSEEx to updateSEEHistoryx when LZP with order x isn't chosen at the moment. I feel that I'm doing many things in wrong way, especially the SEE.

    I think that I should not update LZP order 4 model when using LZP order 8 model, but when I skip it (ie. when LZP order 8 is chosen then don't update LZP order 4) compression becomes worse - compressed enwik8 becomes about 500,000 bytes larger, enwik9 becomes about 3,900,000 bytes larger. What do you think?

  19. #19
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I've tried adding order-12 LZP model, but nothing good happened. On enwik8 it made the result better by only 10,000,000 byte and it doubled the time. The gains are not worth the speed hit. I will continue by improving current order 4 and order 8 tandem.

  20. #20
    Member
    Join Date
    Jun 2008
    Location
    USA
    Posts
    111
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I always liked the fact that TarsaLZP was so small and written in FASM. But being Windows only as well as ever-increasing amounts of RAM limited its appeal a bit to me (no offense). Nevertheless, good job!

Similar Threads

  1. Test set: Java application
    By m^2 in forum Data Compression
    Replies: 4
    Last Post: 24th October 2008, 00:06
  2. need advice on low- order backend of tarsalzp
    By Piotr Tarsa in forum Forum Archive
    Replies: 4
    Last Post: 25th March 2008, 12:03
  3. JAVA vs C++
    By Nania Francesco in forum Forum Archive
    Replies: 5
    Last Post: 20th January 2008, 22:07
  4. TarsaLZP
    By Piotr Tarsa in forum Forum Archive
    Replies: 83
    Last Post: 31st October 2007, 19:58

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •