Results 1 to 8 of 8

Thread: M19 - beginning work on next generation BWT compressor

  1. #1
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts

    M19 - beginning work on next generation BWT compressor

    I've started work on my next M series of BWT compressors. M19 is the next generation of my previous M03 work and, like M03, M19 will be a context aware BWT compressor. However, unlike M03, M19 has a more advanced form of context parsing which can skip about 50% of the context transitions that are needed in M03. The result is a much faster context parsing time during the encoding phase, however, it is unclear if or how much this technique will improve compression. I suspect that it will result in improvements beyond M03 but only time will tell.

    So far I have completed the context parsing component only (so not yet a compressor) but this is by far the most complicated part of the project. One of the main goals is to achieve the full boundless context parsing (no limit on max order) while still maintaining a 5N operational space. This was to ensure that the coder demands no more memory than the lightweight suffix array construction stage requires. This goal has been achieved.

    As I have done with MSufSort 4, I am documenting the progress here as I have time to complete the work which, in this case, I suspect will not be completed until next year (hence M19). The code for the full context parser is comletely functional and is available on github at https://github.com/michaelmaniscalco/m19 for anyone who was ever curious about how M03 context parsing worked (this is similar but more complex).

    I will update this thread as more work is completed. I'll update this thread as soon as possible with a basic overview of how M19 context parsing works.

    - Michael

  2. The Following 11 Users Say Thank You to michael maniscalco For This Useful Post:

    Bulat Ziganshin (30th September 2018),comp1 (24th September 2018),Darek (24th September 2018),encode (24th September 2018),Gonzalo (24th September 2018),hexagone (24th September 2018),JamesB (24th September 2018),Mike (24th September 2018),oltjon (24th September 2018),Shelwien (24th September 2018),Stephan Busch (24th September 2018)

  3. #2
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts

    Post

    That is great news.
    Based on M03, hopefully M19 will be ready before 2034 (at which point it would be obsolete since we will all have nano quantum computers running infinite loops in less than 1 second).
    Last edited by hexagone; 3rd October 2018 at 01:13. Reason: add more nonsense

  4. #3
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    164
    Thanks
    115
    Thanked 10 Times in 9 Posts
    What compression / ratio speed comparing to 7z?

  5. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I guess its incomplete atm:
    Attached Files Attached Files

  6. #5
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts
    Quote Originally Posted by Shelwien View Post
    I guess its incomplete atm:
    Correct. As I wrote previously so far only the context parsing is complete. However, this is by far the most complicated component and getting it to work at this speed and with a fix 5N memory footprint is a big achievement.
    Although I get that it's difficult to appreciate what the current component is doing without a detailed description of the overall algorithm.

    Having said that the encoder only needs to encode symbol distribution from order k to order k+1 for each context parsed and the encoding side is complete. This part is identical to how M03 works. It's the more elaborate context parsing that will make m19 different from m03 for the most part.

  7. #6
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts
    Quote Originally Posted by michael maniscalco View Post
    Correct. As I wrote previously so far only the context parsing is complete. However, this is by far the most complicated component and getting it to work at this speed and with a fix 5N memory footprint is a big achievement.
    Although I get that it's difficult to appreciate what the current component is doing without a detailed description of the overall algorithm.

    Having said that the encoder only needs to encode symbol distribution from order k to order k+1 for each context parsed and the encoding side is complete. This part is identical to how M03 works. It's the more elaborate context parsing that will make m19 different from m03 for the most part.
    So true! From my website and in my own words:

    "So the rewrite is a more advanced form of the algorithm and I'm dubbing the work 'M19'. (Perhaps I should be realistic am name it 'M27'?!'

    Having children definitely places a drag on production!

  8. #7
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts
    Quote Originally Posted by hexagone View Post
    That is great news.
    Based on M03, hopefully M19 will be ready before 2034 (at which point it would be obsolete since we will all have nano quantum computers running infinite loops in less than 1 second).
    So true! From my website and in my own words:

    "So the rewrite is a more advanced form of the algorithm and I'm dubbing the work 'M19'. (Perhaps I should be realistic am name it 'M27'?!'

    Having children definitely places a drag on production!

  9. #8
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    109
    Thanks
    7
    Thanked 80 Times in 25 Posts
    I've pushed the latest updates which includes faster context parsing (about 25% faster) as well as support for both m19 parsing as well as the original m03 parsing. So far not a compressor but the project is progressing nicely. I believe that I will be able to make the encoder multithreaded even when encoding a single BWT block across cores. This will certainly be possible when parsing using the classic m03 parsing method. However, the m19 version is more complicated and would likely require an additional 1N of space to acheive the same level of encoding parallelism.

    Here are some measurements of the encoder modelling phase:

    machine:
    Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
    CPU(s): 4
    Thread(s) per core: 2
    Core(s) per socket: 2


    BWT+ M19 parsing:


    enwik8: max order = 2641, elapsed time: 13052 ms (7.30 MB/sec)
    book1: max order = 48, elapsed time: 90 ms (8.14 MB/sec)
    webster: max order = 5683, elapsed time: 5417 ms, (7.29 MB/sec)
    dickens: max order = 5207, elapsed time: 1306 ms, (7.44 MB/sec)



    BWT + M03 parsing:


    enwik8: max order = 5317, elapsed time: 12419 ms (7.67 MB/sec)
    book1: max order = 104, elapsed time: 94 ms (7.79 MB/sec)
    webster: max order = 11293, elapsed time: 5378 ms, (7.35 MB/sec)
    dickens: max order = 10263, elapsed time: 1327 ms, (7.32 MB/sec)


    - Michael
    Last edited by michael maniscalco; 12th October 2018 at 03:53. Reason: spelling

  10. The Following 3 Users Say Thank You to michael maniscalco For This Useful Post:

    Mike (12th October 2018),Shelwien (12th October 2018),Stephan Busch (12th October 2018)

Similar Threads

  1. BCM - The ultimate BWT-based file compressor
    By encode in forum Data Compression
    Replies: 110
    Last Post: 29th May 2019, 11:40
  2. Better PNG IDAT compression with BWT and MTF does not work, why?
    By McSodbrenner in forum Data Compression
    Replies: 6
    Last Post: 11th August 2016, 17:47
  3. BWTIL: a set of tools to work with BWT-based text indexes
    By Nicola Prezza in forum Data Compression
    Replies: 31
    Last Post: 29th August 2014, 14:58
  4. Why does BWT work on images?
    By m^2 in forum Data Compression
    Replies: 6
    Last Post: 21st September 2012, 03:46
  5. Blizzard - Fast BWT file compressor!!!
    By LovePimple in forum Data Compression
    Replies: 40
    Last Post: 6th July 2008, 14:48

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •