Thread: LSTM and cmix

  1. #1
    Member
    Join Date
    Jun 2019
    Location
    Poland
    Posts
    24
    Thanks
    0
    Thanked 0 Times in 0 Posts

    LSTM and cmix

What is LSTM? What is the main idea of cmix, and how does cmix achieve its results?
Is a faster algorithm with comparable compression possible?

  2. #2
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    909
    Thanks
    531
    Thanked 359 Times in 267 Posts
You can find some information about cmix and LSTM on Byron Knoll's official cmix page:
    http://www.byronknoll.com/cmix.html

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > What is LSTM?

    https://en.wikipedia.org/wiki/Long_short-term_memory
It's not the main part of cmix, though - just one of the models.
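For reference, the core of an LSTM is a cell whose forget/input/output gates control what is carried over between steps. A minimal sketch of a single cell step (plain numpy, illustrative only - real compressors like lstm-compress train the weights and feed the hidden state into a symbol predictor):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.
    x: input vector (D,), h_prev/c_prev: previous hidden/cell state (H,)
    W: stacked gate weights (4*H, D+H), b: stacked gate biases (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])        # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2*H])      # input gate: how much new content to write
    o = sigmoid(z[2*H:3*H])    # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])    # candidate cell content
    c = f * c_prev + i * g     # new cell state
    h = o * np.tanh(c)         # new hidden state
    return h, c
```

The cell state `c` is what lets the network remember context over long distances - the forget gate can hold it nearly unchanged across many steps.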

> What is the main idea of cmix, and how does cmix achieve its results?

By combining the best predictions from hundreds of different data models.
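The combining step is conceptually simple. A minimal sketch of logistic mixing in the paq/cmix style - per-model bit probabilities are stretched into the logit domain, combined with learned weights, and the weights are updated online from the actual coded bit (the learning rate and class shape here are illustrative, not cmix's actual code):

```python
import math

def stretch(p):
    """Map probability (0,1) to the logit domain."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch: logit back to probability."""
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Weighted sum of stretched model predictions, trained online."""
    def __init__(self, n_models, lr=0.02):
        self.w = [0.0] * n_models
        self.lr = lr
        self.st = [0.0] * n_models
        self.p = 0.5

    def mix(self, probs):
        self.st = [stretch(p) for p in probs]
        self.p = squash(sum(w * s for w, s in zip(self.w, self.st)))
        return self.p

    def update(self, bit):
        # Gradient step on coding loss: models that predicted the
        # actual bit well gain weight, bad ones lose it.
        err = bit - self.p
        self.w = [w + self.lr * err * s for w, s in zip(self.w, self.st)]
```

After enough updates the mixer automatically favors whichever models are accurate on the current data, which is why adding more models rarely hurts compression (only speed).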

> Is a faster algorithm with comparable compression possible?

    Yes, for example see Dmitry Shkarin's result (and its time) here:
    http://prize.hutter1.net/#contestants

    Also m03, glza here: http://www.mattmahoney.net/dc/text.html

  4. #4
    Member
    Join Date
    Jun 2019
    Location
    Poland
    Posts
    24
    Thanks
    0
    Thanked 0 Times in 0 Posts
Quote Originally Posted by Shelwien
> What is the main idea of cmix, and how does cmix achieve its results?

By combining the best predictions from hundreds of different data models.
For now I'm thinking about two models: random data, and text like a Wiki dump.
If there are about 100 models, is there any faster, statistical (for example Bayesian) method to determine the best model, rather than trying all 100 and slowing down 100 times?

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
> For now I'm thinking about two models

    You can see the models here:
    https://github.com/hxim/paq8px/blob/...8px.cpp#L10602

    > text like Wiki dump.

It's actually not really text: there's too much XML, wiki markup, HTML, and bibliography content.

> is there any faster method to determine the best model than trying all 100 and slowing down 100 times?

It's possible to reach similar results with optimal parsing and model switching (slower encoding, much faster decoding),
or by writing very specific data transformations for known syntax.
But the complexity is high enough even without that - mixing has low redundancy and is much simpler to implement.

Of course, there are speed-optimization tricks even for mixing - for example, NNCP splits data into multiple independent bit streams
(for multi-threaded compression), and paq8px detects data types and enables only the relevant models.
But rather than improving speed, these optimizations are commonly used to improve compression while keeping the same speed,
since prediction quality is still limited by hardware.
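The paq8px-style gating idea can be sketched like this: detect a coarse data type for a block, then run only the models registered for it. The `detect` heuristic and model lists below are hypothetical placeholders for illustration, not paq8px's actual detector:

```python
def detect(block: bytes) -> str:
    """Very rough type heuristic (hypothetical, illustration only)."""
    if not block:
        return "empty"
    # Fraction of printable ASCII (plus tab/newline/CR) decides text vs binary.
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in block)
    return "text" if printable / len(block) > 0.9 else "binary"

# Hypothetical model registry: which models are worth running per type.
MODELS = {
    "text":   ["order-N context model", "word model", "LSTM"],
    "binary": ["order-N context model", "match model"],
    "empty":  [],
}

def active_models(block: bytes):
    """Return only the models relevant to this block's detected type."""
    return MODELS[detect(block)]
```

Skipping irrelevant models saves time per bit, which in practice is usually spent on making the remaining models bigger rather than on finishing faster.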

