
Thread: Compression with recurrent neural networks

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

    Compression with recurrent neural networks

    Generating Sequences With Recurrent Neural Networks
    Alex Graves
    http://arxiv.org/pdf/1308.0850v2.pdf

    A recurrent neural network is trained on the first 96 MB of enwik8 and tested on the last 4 MB, achieving compression of 1.42 bpc on the training set and 1.33 bpc on the test set. This is quite a good result. I compared several compressors under the same conditions by compressing the first 96 MB, then the complete file and subtracting to get the ratio for the test data.

    Code:
    train  test   (bits per character)  compressor
    -----  -----
    1.668  1.537  bsc -b100
    1.626  1.528  zpaq -m 57
    1.528  1.427  ppmonstr -o16 -m1700
    1.531  1.426  zpaq -m 67
    1.494  1.384  nanozip -cc -m1.6g
    1.42   1.33   RNN (from paper)
    1.334  1.202  durilca'kingsize_4 -o32 -m3500 -t2
    For durilca'kingsize I included the size of the compressed dictionary EnWiki.dur and UnDur.exe from durilca4_decoder, as tested for the #1 spot on LTCB. Although it uses 13 GB memory for enwik9, it needs only 1.1 GB for enwik8. I did not include the decompresser size for the other programs since they don't use external dictionaries.
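
    To make the subtraction method concrete, here is a minimal sketch of the arithmetic. The two compressed sizes are back-computed from the zpaq -m 67 row above (rounded), not separately measured:
    Code:
    # Compress the first 96 MB, compress the full 100 MB, and charge the
    # difference to the last 4 MB. Sizes back-computed from zpaq -m 67.
    def bpc(compressed_bytes, original_bytes):
        return compressed_bytes * 8.0 / original_bytes

    train_size, test_size = 96_000_000, 4_000_000
    c_train = 18_372_000   # compressed size of the first 96 MB
    c_full  = 19_085_000   # compressed size of all of enwik8

    print("train bpc:", bpc(c_train, train_size))           # 1.531
    print("test  bpc:", bpc(c_full - c_train, test_size))   # 1.426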

    He uses a neural network with 204 input neurons (one for each character that appears at least once), 7 hidden layers with 100 long short-term memory (LSTM) cells each, and one output layer with 204 neurons. The input neuron representing the current character is set to 1 and all others to 0. The output, after normalization, is the probability distribution over the next character.
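
    As a rough illustration only (this is not the author's code, the skip-connection wiring is simplified, and all names are made up), the described topology could be sketched in PyTorch like this:
    Code:
    import torch
    import torch.nn as nn

    # Sketch of the topology described above: 204 one-hot inputs, 7 LSTM
    # layers of 100 cells, skip connections from the input to every hidden
    # layer and from every hidden layer to the 204-way output.
    class CharRNN(nn.Module):
        def __init__(self, alphabet=204, hidden=100, layers=7):
            super().__init__()
            self.lstms = nn.ModuleList()
            in_size = alphabet
            for _ in range(layers):
                self.lstms.append(nn.LSTM(in_size, hidden, batch_first=True))
                in_size = alphabet + hidden   # next layer sees input + layer below
            self.out = nn.Linear(hidden * layers, alphabet)

        def forward(self, x_onehot, states=None):
            states = states or [None] * len(self.lstms)
            hs, new_states, h = [], [], None
            for lstm, s in zip(self.lstms, states):
                inp = x_onehot if h is None else torch.cat([x_onehot, h], dim=-1)
                h, s = lstm(inp, s)
                hs.append(h)
                new_states.append(s)
            logits = self.out(torch.cat(hs, dim=-1))
            return logits, new_states   # softmax(logits) = P(next character)
    A softmax over the 204 output logits is the distribution an arithmetic coder would use for the next character.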

    LSTM cells have a gated feedback loop to retain state information, plus gated inputs and outputs, with each gate controlled by a separate weight matrix from the input layer and all previous hidden layers. The output layer also receives input from all 7 hidden layers. These extra connections speed up backpropagation. Weights are trained by gradient descent to minimize coding cost (like paq8 and zpaq) rather than RMSE. The weights are trained for 4 epochs with a learning rate of 0.0001 and momentum of 0.9, clamped to [-1, 1], and updated every 100 characters; the LSTM cell states are reset every 10K characters. Training continues on the test set in a single pass; without this adaptive training, the static model only compresses to 1.67 bpc.
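
    A hedged sketch of that training regime (again not the author's code; CharRNN is the illustrative class from the previous sketch) might look like this:
    Code:
    import math
    import torch
    import torch.nn.functional as F

    # Minimize coding cost (cross-entropy), SGD with lr 0.0001 and momentum 0.9,
    # weights clamped to [-1, 1], one update per 100 characters, cell states
    # reset every 10,000 characters. char_ids is a 1-D LongTensor of characters.
    def train_stream(model, opt, char_ids, alphabet=204, bptt=100, reset=10_000):
        states, total_nats = None, 0.0
        for start in range(0, len(char_ids) - 1, bptt):
            if start % reset == 0:
                states = None                       # forget LSTM cell states
            end = min(start + bptt, len(char_ids) - 1)
            x = F.one_hot(char_ids[start:end], alphabet).float().unsqueeze(0)
            y = char_ids[start + 1:end + 1].unsqueeze(0)
            logits, states = model(x, states)
            loss = F.cross_entropy(logits.squeeze(0), y.squeeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():                   # clamp weights to [-1, 1]
                for p in model.parameters():
                    p.clamp_(-1.0, 1.0)
            # detach so backpropagation is truncated at the 100-character chunk
            states = [(h.detach(), c.detach()) for h, c in states]
            total_nats += loss.item() * (end - start)
        return total_nats / math.log(2) / (len(char_ids) - 1)   # approx. bpc

    model = CharRNN()
    opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    Running the same loop over the test data, without resetting the weights, corresponds to the adaptive evaluation; keeping the weights frozen there gives the 1.67 bpc static figure.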

  2. The Following 2 Users Say Thank You to Matt Mahoney For This Useful Post:

    byronknoll (20th July 2017), Cyan (14th November 2013)

  3. #2
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    223
    Thanks
    106
    Thanked 102 Times in 63 Posts
    There are some more RNN results on enwik8 here: https://www.eff.org/files/AI-progres...-Comprehension


    It looks like these results are for "static" models, where the weights of the model are not updated when evaluating the test set. Usually, the first 90% of enwik8 is used for training, the next 5% for validation, and the final 5% for testing. Multiple training passes are typically made over the training set (which can take weeks on a GPU). The best result is currently 1.313 BPC.


    Here are some cmix v13 results (not including the size of the dictionary or source code):
    5% test set by itself: 849,279 bytes (1.359 BPC)
    90% training set by itself: 13,886,290 bytes (1.234 BPC)
    concatenation of training set and test set: 14,606,322 bytes (1.23 BPC)
    compressed(concatenation) - compressed(training set) = 720,032 bytes (1.152 BPC)
    Last edited by byronknoll; 20th July 2017 at 06:17.

  4. The Following 2 Users Say Thank You to byronknoll For This Useful Post:

    Shelwien (20th July 2017), willvarfar (20th July 2017)

  5. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I think the main problem with NN models is that they lack SSE/APM.
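    For readers who haven't seen the term: an SSE/APM stage takes a model's probability plus a small context and refines it through an adaptive, interpolated table. A minimal sketch in the spirit of paq-style coders (illustrative only, not copied from any particular program):
    Code:
    import math

    class APM:
        """Adaptive probability map: table[context][bin] refines a probability."""
        def __init__(self, contexts, bins=33, rate=0.02):
            self.bins, self.rate = bins, rate
            # initialize each context's row as the identity mapping
            self.t = [[self._squash(-8.0 + 16.0 * i / (bins - 1)) for i in range(bins)]
                      for _ in range(contexts)]

        @staticmethod
        def _stretch(p):
            return math.log(p / (1.0 - p))

        @staticmethod
        def _squash(x):
            return 1.0 / (1.0 + math.exp(-x))

        def refine(self, ctx, p):
            p = min(max(p, 1e-6), 1.0 - 1e-6)
            x = max(-8.0, min(8.0, self._stretch(p)))      # stretched probability
            f = (x + 8.0) * (self.bins - 1) / 16.0
            i = min(int(f), self.bins - 2)
            w = f - i
            self.last = (ctx, i, w)
            return (1.0 - w) * self.t[ctx][i] + w * self.t[ctx][i + 1]

        def update(self, bit):
            ctx, i, w = self.last                          # move touched bins toward bit
            self.t[ctx][i] += self.rate * (1.0 - w) * (bit - self.t[ctx][i])
            self.t[ctx][i + 1] += self.rate * w * (bit - self.t[ctx][i + 1])
    Usage would be: refine the model's bit probability with, say, the previous byte as context, code the bit with the refined probability, then call update with the bit that actually occurred.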

  6. #4
    Member
    Join Date
    Mar 2011
    Location
    USA
    Posts
    223
    Thanks
    106
    Thanked 102 Times in 63 Posts
    Here is a recent paper which I think currently has state of the art results: https://arxiv.org/abs/1705.08639

    "Fast-Slow Recurrent Neural Networks": 1.2 BPC using an ensemble of two static models.

  7. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Well, it's certainly nice if they work with unpreprocessed enwik8 and don't have any word models or such.
    On 5 MB it may not be that important that it's a static model.

  8. #6
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    275
    Thanks
    6
    Thanked 23 Times in 16 Posts
    I'm not really impressed with these results. Those 20 million parameters have to be trained somehow.
    You just can't sweep through the data like paq8 and expect good compression from those models.
    Paq8 uses many a priori context decisions, which speed up learning.

  9. #7
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    216
    Thanks
    97
    Thanked 128 Times in 92 Posts
    CM compressors select the weights of the NN mixer by context, typically based on the few previous bytes.
    Can this also be useful with a byte-wise RNN mixer? Has anyone tried it?
    I did a quick test and found nothing good.
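    For reference, context selection of mixing weights amounts to something like the following sketch (paq-style logistic mixing with one weight vector per context; illustrative only, not taken from any specific program):
    Code:
    import math

    class ContextMixer:
        """Each context owns its own weight vector over the model predictions."""
        def __init__(self, n_models, n_contexts, lr=0.002):
            self.w = [[0.0] * n_models for _ in range(n_contexts)]
            self.lr = lr

        @staticmethod
        def _stretch(p):
            p = min(max(p, 1e-6), 1.0 - 1e-6)
            return math.log(p / (1.0 - p))

        def mix(self, ctx, probs):
            # ctx would typically be derived from the previous byte or two
            self.ctx = ctx
            self.x = [self._stretch(p) for p in probs]
            dot = sum(w * x for w, x in zip(self.w[ctx], self.x))
            self.p = 1.0 / (1.0 + math.exp(-dot))          # squash back to a probability
            return self.p

        def update(self, bit):
            err = bit - self.p                             # gradient of coding cost
            for i, x in enumerate(self.x):
                self.w[self.ctx][i] += self.lr * err * x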
    -----
    Here is some info on the compression of Text8 (text from ENWIK8) with RNN: https://danijar.com/language-modelin...-norm-and-gru/

