
Thread: Machine learning based multimedia compression (GAN, VAE, ...?)

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts

    Machine learning based multimedia compression (GAN, VAE, ...?)

    The space of, let's say, images appearing in everyday applications is much smaller than the space of all bitmaps – restricting to such a subspace lets us specify one of its elements with far fewer bits, which allows for efficient data compression.
    However, this “space of everyday images” is extremely complex and in practice is defined only by huge datasets. There are now machine learning approaches, like GAN or VAE, that can work with such complex dataset-defined subspaces.

    Recently, data compressors based on them have started appearing – often claiming huge gains, which might become extremely important in the near future (unless patents make them impossible to use :/ ), as e.g. video makes up about half of internet traffic.
    We need a thread to gather and discuss them – please post your thoughts, papers, compressors, approaches etc. here. I will try to keep this top post updated.

    Probably the most promising are GANs (generative adversarial networks), especially the closed-source (patents?) WaveOne: https://arxiv.org/pdf/1705.05823.pdf
    Here is an open-source alternative: https://arxiv.org/pdf/1804.02958.pdf
    https://github.com/Justin-Tan/generative-compression
    It is a kind of wavelet-like transform learned on the dataset: blocks are pyramidally decomposed into their downscaled versions, and at each scale features are extracted with convolutions. The parameters are optimized (on the dataset) to minimize not only the error of the final encoding-quantization-decoding process, but also so that an adversary (a separate NN) cannot distinguish the reconstructions from the learned dataset. A rough code sketch of this kind of objective is below.
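    Here is a minimal, hedged sketch of such an adversarially trained autoencoder compressor in PyTorch - the layer sizes, loss weights and names are purely illustrative (this is not WaveOne's architecture), and the pyramidal multi-scale decomposition and the actual entropy coding of the quantized features are omitted:
    Code:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        """Two-scale convolutional feature extractor with a bounded bottleneck."""
        def __init__(self, ch=32):
            super().__init__()
            self.c1 = nn.Conv2d(3, ch, 4, stride=2, padding=1)   # 1/2 resolution
            self.c2 = nn.Conv2d(ch, ch, 4, stride=2, padding=1)  # 1/4 resolution
        def forward(self, x):
            return torch.tanh(self.c2(F.relu(self.c1(x))))       # features in [-1, 1]

    class Decoder(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.d1 = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)
            self.d2 = nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1)
        def forward(self, z):
            return torch.sigmoid(self.d2(F.relu(self.d1(z))))

    class Discriminator(nn.Module):
        """Adversary trying to tell reconstructions from real dataset images."""
        def __init__(self, ch=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(ch, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, 1))
        def forward(self, x):
            return self.net(x)

    def quantize(z, levels=8):
        """Uniform quantization of [-1, 1] with a straight-through gradient."""
        zq = torch.round((z + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
        return z + (zq - z).detach()

    enc, dec, disc = Encoder(), Decoder(), Discriminator()
    opt_g = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

    x = torch.rand(8, 3, 64, 64)          # stand-in for a batch of dataset images
    x_hat = dec(quantize(enc(x)))         # encode -> quantize -> decode

    # Encoder/decoder step: reconstruction error plus a "fool the adversary" term.
    ones = torch.ones(x.size(0), 1)
    g_loss = F.mse_loss(x_hat, x) + 0.01 * F.binary_cross_entropy_with_logits(disc(x_hat), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Discriminator step: real dataset images vs. (detached) reconstructions.
    d_loss = F.binary_cross_entropy_with_logits(disc(x), ones) + \
             F.binary_cross_entropy_with_logits(disc(x_hat.detach()), torch.zeros(x.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()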

    Another approach is the VAE (variational autoencoder) – a recent paper claims its superiority for MNIST: https://openreview.net/pdf?id=ryE98iR5tm
    https://github.com/bits-back/bits-back
    It tries to directly represent the “space of everyday images” as a lower-dimensional latent space – assuming e.g. a Gaussian distribution there. The encoder can use a pseudorandom value of the latent variable by traveling in both directions on a LIFO entropy coder (the “bits back” trick). A minimal sketch of the VAE objective itself is below.
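    For reference, a minimal, hedged sketch of the standard VAE objective (textbook form, in PyTorch, sizes illustrative): the encoder outputs a Gaussian over the latent space, and the loss is reconstruction plus a KL term pulling that Gaussian toward the assumed prior:
    Code:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, d_in=784, d_latent=20):
            super().__init__()
            self.enc = nn.Linear(d_in, 2 * d_latent)   # outputs mean and log-variance
            self.dec = nn.Linear(d_latent, d_in)
        def forward(self, x):
            mu, logvar = self.enc(x).chunk(2, dim=1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
            return torch.sigmoid(self.dec(z)), mu, logvar

    def vae_loss(x, x_hat, mu, logvar):
        rec = F.binary_cross_entropy(x_hat, x, reduction='sum')       # reconstruction term
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
        return rec + kl                                               # negative ELBO

    model = VAE()
    x = torch.rand(16, 784)                 # stand-in for a batch of flattened images
    x_hat, mu, logvar = model(x)
    print(vae_loss(x, x_hat, mu, logvar))
    # Bits-back then turns the KL term into actually saved bits by combining the
    # latent posterior with a LIFO entropy coder - more on that further down the thread.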

    What are other interesting ML-based approaches trying to encode within learned subspace of everyday data?
    Last edited by Jarek; 8th October 2018 at 11:04. Reason: specifying

  2. The Following User Says Thank You to Jarek For This Useful Post:

    JamesWasil (8th October 2018)

  3. #2
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    33
    Thanks
    29
    Thanked 6 Times in 6 Posts
    Although slightly dated (and you may have already seen it a few years ago), this paper has an interesting approach to using AI for data compression: it covers different image aspects and prediction models for still-frame video segments, human-guided (manual input) and automatic colorization of greyscale images based on similar shapes and shades, and a few other neat things: https://web.bii.a-star.edu.sg/~cheng...Vis_ICML07.pdf

  4. The Following User Says Thank You to JamesWasil For This Useful Post:

    Jarek (8th October 2018)

  5. #3
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Some updates: bits-back got into ICLR,
    ... and WaveOne now claims superior video compression:
    http://www.wave.one/video-compression/
    https://arxiv.org/pdf/1811.06981.pdf

    Their image compressor uses a multi-scale autoencoder evaluated by a discriminator (GAN) - I believe this is the right approach.
    However, there is room for optimization - a standard autoencoder (AE) encodes into values whose distribution is tough to store efficiently:

    It can be pushed toward a more compact distribution, e.g. a Gaussian that is easy to entropy-code (coordinate-wise, or with a pyramid vector quantizer), by simultaneously optimizing a regularizer that evaluates the distance from a Gaussian.
    Such regularizers were guessed in WAE/SWAE/CWAE - they don't really lead to a Gaussian.
    But we can do it right - optimize the empirical distribution to agree with the Gaussian CDF, especially for radii and pairwise distances:
    [Attached image: auto.png]
    Maybe we could also enforce a uniform distribution on the [0,1]^D cube or torus this way - which would let us get rid of the entropy coder ...
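    To make the "agree with the Gaussian CDF for radii" part concrete, here is a hedged sketch of one possible regularizer of this kind (my reading of the idea, not a formula from the linked papers): for z ~ N(0, I_D) the squared radius ||z||^2 is chi-square distributed with D degrees of freedom, so we can penalize the mismatch between the sorted squared radii of a batch and the corresponding chi-square quantiles:
    Code:
    import torch
    from scipy.stats import chi2

    def radius_regularizer(z):
        """z: (batch, D) latent codes. Quantile-quantile loss between empirical
        squared radii and the chi-square(D) distribution they should follow."""
        n, d = z.shape
        r2_sorted, _ = torch.sort((z ** 2).sum(dim=1))
        probs = (torch.arange(1, n + 1, dtype=torch.float32) - 0.5) / n
        targets = torch.tensor(chi2.ppf(probs.numpy(), df=d), dtype=z.dtype)  # fixed constants
        return ((r2_sorted - targets) ** 2).mean()

    # Usage: add lambda_reg * radius_regularizer(encoder(x)) to the training loss.
    # The same quantile-matching trick could be applied to pairwise distances;
    # this sketch covers the radii only.
    z = torch.randn(256, 16, requires_grad=True)
    print(radius_regularizer(z))   # small for Gaussian codes, larger otherwise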

  6. #4
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    I now know how to handle quantization in an optimal way while enforcing e.g. a uniform distribution on the [0,1]^D cube/torus in the latent space of an autoencoder - to minimize quantization error, we can simply increase the density near the codewords ("egg-carton") in the CDF that the distribution is attracted to. A toy numeric illustration is below.
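    Here is a hedged toy illustration of that "egg-carton" effect (my own numeric example, not a formula from the post): with N codewords per coordinate, concentrating the enforced density near the codeword centers keeps the codewords equiprobable but cuts the expected quantization error relative to a flat uniform density:
    Code:
    import numpy as np

    rng = np.random.default_rng(0)
    N = 8                                                 # codewords at (k + 0.5) / N
    centers = (np.arange(N) + 0.5) / N

    def quant_mse(samples):
        """Mean squared error of rounding samples to the nearest codeword."""
        q = centers[np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)]
        return np.mean((samples - q) ** 2)

    flat = rng.uniform(0, 1, 100_000)                     # plain uniform on [0, 1]
    k = rng.integers(0, N, 100_000)                       # equiprobable codeword choice
    peaked = centers[k] + rng.uniform(-0.5, 0.5, 100_000) / (2 * N)   # density peaks at codewords

    print(quant_mse(flat))     # ~ 1 / (12 N^2)   ~ 0.0013
    print(quant_mse(peaked))   # ~ 4x smaller: 1 / (12 (2N)^2) ~ 0.00033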

    And generally, given the current ML patent spree, there is a nonzero probability that G is currently trying to patent WAE - training an AE while minimizing some distance between the sample distribution in latent space and a chosen distribution, e.g. Gaussian - which I believe will be crucial in future image/video compressors ...
    In any case, I believe I have moved far enough away to avoid such a patent - e.g. based on the position of each coordinate in the coordinate-wise orderings, we can calculate a preferred position for a point (formula (15)) and shift it toward that position - a different philosophy from minimizing a distance between distributions.

  7. #5
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    AutoEncoder-based data compressors even allow increasing the resolution (!) of encoded images - upsampling according to statistics learned from a database of images.

    A standard autoencoder ( https://en.wikipedia.org/wiki/Autoencoder ) only optimizes the reconstruction loss - the distortion of the encoding-decoding process, where the intermediate "latent space" is usually essentially smaller - in data compression we encode a quantized value in this latent space.
    To increase resolution, we can instead train the decoder to directly produce a higher-resolution image and then downsample it - using the reconstruction loss on the downsampled image (diagram below).

    To make the upsampled image look reasonable - agree with the patterns in the dataset - a GAN discriminator can be used ( https://en.wikipedia.org/wiki/Genera...sarial_network ) - adding a discriminator loss to the optimized criteria enforces that the upsampled image resembles images from the dataset.

    As the latent space of an autoencoder is usually terrible to encode directly (the scatter plots above), we can also add a regularizer to the optimized criteria - enforcing some easy-to-encode distribution in the latent space, e.g. a multivariate Gaussian or uniform on [0,1]^D.

    Finally, the encoder and decoder should be trained to optimize the 3 criteria simultaneously - reconstruction loss, discriminator loss, and regularizer (diagram and code sketch below):

    [Attached image: Ja4Fge0.png]
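    Below is a hedged sketch of that combined objective in PyTorch (illustrative names and weights; it assumes an encoder, a decoder that outputs at 2x the input resolution, a discriminator, and a latent regularizer such as the radius one sketched earlier):
    Code:
    import torch
    import torch.nn.functional as F

    def total_loss(x, encoder, decoder, discriminator, latent_regularizer,
                   lam_disc=0.01, lam_reg=0.1):
        z = encoder(x)                              # latent codes to be quantized and stored
        x_up = decoder(z)                           # decoded at 2x the input resolution
        x_rec = F.avg_pool2d(x_up, kernel_size=2)   # downsample back to the input size
        rec = F.mse_loss(x_rec, x)                  # reconstruction loss on the downsampled image
        adv = F.binary_cross_entropy_with_logits(   # discriminator loss: "look like the dataset"
            discriminator(x_up), torch.ones(x.size(0), 1))
        reg = latent_regularizer(z.flatten(1))      # easy-to-encode latent distribution
        return rec + lam_disc * adv + lam_reg * reg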

  8. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    AutoEncoder-based data compressors
    The first question is: will it decompress fast enough? Some people consider WebP slow at 150 MB/s and would rather decode JPEG at 280 MB/s and take a hit in density. In lossless compression we are talking about 3x these speeds when compromises are made. These autoencoder-based compressors decode at around 1 MB/s. They are 100-1000x too slow.

    If decompression ever becomes fast enough through specialized acceleration hardware, the second question is whether the multiplications and other energy-consuming operations are used efficiently when decoding neural representations, i.e., will your future phone heat up when watching videos or listening to music?

  9. #7
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Jyrki, optimizing such criteria is a nightmare - training takes weeks on GPUs ...
    But then you get optimal weights for a few layers of convolutional neural networks (CNN) for the encoder and decoder - optimized on the dataset.
    It can be imagined as a generalized-wavelet-like transformation optimized for the dataset.

    Now the encoder/decoder just needs to process the image through these few CNN layers with fixed optimized weights - the WaveOne team now claims real-time video decompression on a GPU: http://www.wave.one/video-compression/
    It is a few multiplications per pixel - a hardware implementation could easily do it in a smartphone.

  10. #8
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    the WaveOne team now claims real-time video decompression on a GPU
    They claim 10 fps on 640x480 using a 100 teraflops graphics card. It still requires five orders of magnitude in performance improvement to be practical.

  11. #9
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    This field is only about 2 years old, and ML-based video compression only started in late 2018; while deep CNNs might be far from practical for video compression at this moment, there is room for gradual improvement with shallower networks ...

    ps. For starters, while engineers are great at evolving known methods ... it would help to complement them with us useless theoreticians, who would e.g. say that focusing on block-based motion vectors is not the best way, and that it is not a problem to do it right:
    Last edited by Jarek; 25th January 2019 at 11:07. Reason: added tractor :)

  12. #10
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    That's the best edit reason message I've seen in a while. All posts should come with added tractors!

  13. The Following User Says Thank You to JamesB For This Useful Post:

    Mike (26th January 2019)

  14. #11
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Perhaps the finer resolution of motion in tractors is something that has contributed to the 4-5 orders of magnitude of additional complexity. Likely the same kind of finer motion resolution can be had for 1.5 orders of magnitude of additional complexity with a conventional approach -- but even that is too expensive and impractical.

    Sometimes in image and video compression we avoid certain quality-improving techniques because they are 10% or even 2x slower to decompress; it doesn't have to be 100'000x slower.

  15. #12
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Quote Originally Posted by Jarek View Post
    Maybe we could also enforce a uniform distribution on the [0,1]^D cube or torus this way - which would let us get rid of the entropy coder ...
    I have remembered why enforcing a uniform distribution on [0,1]^D for the AE latent variable is not a good idea.
    Besides boundary (cube) or continuity (torus) problems, a square lattice in high dimension is not the best choice from the quantization perspective: a large fraction of points lies near the cell diagonals, where the quantization error is large.

    As with packing oranges, a triangular-type lattice is much better: its cells are closer to spheres, giving a lower quantization error ...
    However, realizing such a triangle-like lattice is nontrivial, especially in high dimensions.
    The only (?) practical way I know of (from Daala) is the pyramid vector quantizer (PVQ):
    https://ieeexplore.ieee.org/document/1057198
    https://people.xiph.org/~tterribe/daala/pvq201404.pdf
    It encodes points from the l2 sphere by first projecting them onto the l1 sphere, on which we can perform enumerative coding.
    We can reduce MSE a bit (~15%) by adding a deformation to the projection, like first taking a power (p) of all coordinates ( https://arxiv.org/pdf/1705.05285.pdf ):

    So I think the best way is to enforce a spherically symmetric distribution like a Gaussian in the latent space of the autoencoder, then encode the quantized radius with an entropy coder and the l2-normalized direction with PVQ. A rough sketch of the PVQ quantization step is below.
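    For concreteness, a hedged sketch of the PVQ quantization step (after the Fischer paper / Daala notes linked above; turning the resulting integer vector into an enumerative index is omitted) - it projects a vector onto the set of integer points with L1 norm K, greedily placing pulses to best match its direction:
    Code:
    import numpy as np

    def pvq_quantize(x, K):
        """Return an integer vector y with sum(|y|) == K approximating the direction of x."""
        x = np.asarray(x, dtype=float)
        s = np.where(x < 0, -1, 1)                      # remember signs, work with |x|
        ax = np.abs(x)
        y = np.floor(K * ax / ax.sum()).astype(int)     # initial pulses (total never exceeds K)
        corr = float(np.dot(ax, y))                     # running <|x|, y>
        norm2 = float(np.dot(y, y))                     # running ||y||^2
        for _ in range(K - int(y.sum())):               # distribute the remaining pulses
            # adding a pulse at i changes <|x|,y> by ax[i] and ||y||^2 by 2*y[i] + 1
            scores = (corr + ax) / np.sqrt(norm2 + 2 * y + 1)
            i = int(np.argmax(scores))
            y[i] += 1
            corr += ax[i]
            norm2 += 2 * (y[i] - 1) + 1
        return s * y                                    # restore signs

    # Example: quantize a random 8-dimensional direction with K = 16 pulses.
    x = np.random.randn(8)
    y = pvq_quantize(x, 16)
    print(y, np.abs(y).sum())                           # the L1 norm of y is exactly 16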

  16. #13
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Lecture from Berkeley about the bits-back paper ( https://arxiv.org/pdf/1901.04866 ):
    https://youtu.be/grsO57XMJMk?t=2396

  17. #14
    Member
    Join Date
    Apr 2012
    Location
    London
    Posts
    239
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Jarek View Post
    Lecture from Berkeley about the bits-back paper ( https://arxiv.org/pdf/1901.04866 ):
    https://youtu.be/grsO57XMJMk?t=2396
    In what general kind of scenarios would a 'multiple codewords' requirement (a one-to-many mapping) be the norm? It seems bits back could help wherever 'multiple codewords' are required.

  18. #15
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Quote Originally Posted by LawCounsels View Post
    In what general kind of scenarios would a 'multiple codewords' requirement (a one-to-many mapping) be the norm? It seems bits back could help wherever 'multiple codewords' are required.
    As I see it, its only purpose is handling a nondeterministic encoder: it makes the process deterministic by "borrowing" the randomness for a moment to encode the symbol, then returning these "bits back".
    Here is the original Frey, Hinton paper: http://www.cs.toronto.edu/~fritz/absps/freybitsback.pdf
    The diagram from arxiv:
    [Attached image: sMgd474.png]
    So y is random (the nondeterminism), drawn from a known distribution defined by q, and then the symbol comes from the p(s|y) distribution.
    The encoder determines y by borrowing this information from the stack. Then it encodes the symbol s, which required y. Finally it encodes y, returning these bits back.
    The cost of the randomness of y is compensated, and the decoder can reverse the whole process. The rate accounting is sketched below.
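    For the record, here is the standard rate accounting behind this picture (my summary, consistent with the Frey-Hinton and BB-ANS papers): per symbol the encoder pops y from the stack using q(y|s), then pushes s using p(s|y) and y using p(y), so the net number of bits added to the stack and its expectation over y ~ q(.|s) are:
    Code:
    \Delta \;=\; \log_2\frac{1}{p(s\mid y)} \;+\; \log_2\frac{1}{p(y)} \;-\; \log_2\frac{1}{q(y\mid s)}

    \mathbb{E}_{y\sim q(\cdot\mid s)}[\Delta]
      \;=\; -\log_2 p(s) \;+\; D_{\mathrm{KL}}\!\big(q(y\mid s)\,\big\|\,p(y\mid s)\big)
      \;\ge\; -\log_2 p(s)

    So when q is (an approximation of) the posterior p(y|s), the scheme pays essentially the ideal -log2 p(s) bits per symbol even though the encoder is randomized.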

  19. #16
    Member
    Join Date
    Apr 2012
    Location
    London
    Posts
    239
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Take a simple concrete example:
    . The encoder sometimes encodes an event as '110' (2/3 of the time) or as '101' (1/3 of the time). What does the encoder send in addition to '110' or '101'?

    . How does the decoder know it was really the same event that was encoded (other events may encode to the same codeword)? And how, in doing so, does it recover the extra bits?

    ANSWER: this bits-back scheme is obviously simple and works, provided each event's potential multiple codewords are distinct and exclusive across all events!
    Last edited by LawCounsels; 26th March 2019 at 11:08.

  20. #17
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    In the bits-back case they assume the encoder doesn't have just a single probability distribution over symbols s - it additionally depends on its random state y: we use p(s|y).

  21. #18
    Member
    Join Date
    Apr 2012
    Location
    London
    Posts
    239
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Even with bits back, where an event may encode into multiple codewords depending on y, it appears the resulting compression may be better than always choosing the 'shortest codeword', but it is unlikely to be anywhere near the optimum compared to a 1-to-1 mapping (no multiple codewords).

  22. #19
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Bits back only "borrows" a random y and then gives these "bits back" - allowing a random y to determine the process while still being uniquely decodable.
    The cost of proper encoding remains - it only allows using p(s|y) with a random y, instead of just p(s).

  23. #20
    Member
    Join Date
    Apr 2012
    Location
    London
    Posts
    239
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Why does ANS go hand in glove with bits back, compared to e.g. arithmetic coding? It was already said that bits back works with 'incremental encoders', and ANS is one.
    Probably it is just implementation convenience, with nothing to do with ANS giving better compression?

  24. #21
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    ANS is convenient because it is a stack (LIFO) of symbols, which is what bits back requires ... but it can also be realized with arithmetic coding, as was done originally.

  25. #22
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Bit-Swap for image compression (ICML 2019) - an extension of Bits-Back for hierarchical latent variables:
    https://arxiv.org/pdf/1905.06845
    https://github.com/fhkingma/bitswap

    Code:
    lossless bits/pixel:
                    MNIST   CIFAR-10   ImageNet (32x32)
    Uncompressed     8.00       8.00       8.00
    GNU gzip         1.65       7.37       7.31
    bzip2            1.59       6.98       7.00
    LZMA             1.49       6.09       6.15
    PNG              2.80       5.87       6.39
    WebP             2.10       4.61       5.29
    BB-ANS           1.48       4.19       4.66
    Bit-Swap         1.29       3.82       4.50
    Last edited by Jarek; 17th May 2019 at 18:09.

  26. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    paq8px179 for MNIST test set:
    Code:
    7,840,016 t10k-images.idx3-ubyte
    1,648,877 t10k-images-idx3-ubyte.gz
    1,529,459 1.gz // 7z a -mx=9 -tgzip 1 t10k-images.idx3-ubyte 
      976,599 t10k-images.idx3-ubyte.paq8px179
    
    976599*1.65/1648877 = 0.977 (presuming that original .gz size was used)
    976599*1.65/1529459 = 1.053 (presuming .gz with optimal parsing)

  27. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    Jarek (17th May 2019),Lucas (17th May 2019)

  28. #24
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    Bit-Swap for image compression (ICML 2019) - an extension of Bits-Back for hierarchical latent variables:
    https://arxiv.org/pdf/1905.06845
    https://github.com/fhkingma/bitswap

    Code:
    lossless bits/pixel:
                    MNIST   CIFAR-10   ImageNet (32x32)
    Uncompressed     8.00       8.00       8.00
    GNU gzip         1.65       7.37       7.31
    bzip2            1.59       6.98       7.00
    LZMA             1.49       6.09       6.15
    PNG              2.80       5.87       6.39
    WebP             2.10       4.61       5.29
    BB-ANS           1.48       4.19       4.66
    Bit-Swap         1.29       3.82       4.50
    I wonder how they dealt with header bloat in WebP and PNG. Removing the header bloat alone can give such differences.

  29. #25
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Quote Originally Posted by Shelwien View Post
    paq8px179 for MNIST test set:
    Code:
    7,840,016 t10k-images.idx3-ubyte
    1,648,877 t10k-images-idx3-ubyte.gz
    1,529,459 1.gz // 7z a -mx=9 -tgzip 1 t10k-images.idx3-ubyte 
      976,599 t10k-images.idx3-ubyte.paq8px179
    
    976599*1.65/1648877 = 0.977 (presuming that original .gz size was used)
    976599*1.65/1529459 = 1.053 (presuming .gz with optimal parsing)
    Here you compress 10k images together - using dependencies between them.
    If I understand properly, in the paper they compress them separately - which makes it much more difficult.
    They use trained models defining conditional probabilities between layers - tables 2, 3 and 4 seem to give the number of parameters of these models in millions, and it seems they are not included in the bits/pixel.
    These model parameters should not be seen as a header, but rather as part of the size of a compressor/decompressor specialized for a given type of data.

    ps. the Berkeley lecture ( https://youtu.be/grsO57XMJMk?t=2464 ) is by the authors of this paper.

  30. #26
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > If I understand properly, in the paper they compress them separately - which makes it much more difficult.

    paq can be easily switched to non-solid compression by including "training set" as a dictionary.
    I'd expect similar results with that anyway.

  31. #27
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    MNIST is very simple, mainly uniform areas ... what about ImageNet?

  32. #28
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Somebody can try - it's probably this: http://academictorrents.com/details/...879629a9b4b172

  33. #29
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Jarek View Post
    Here you compress 10k images together - using dependencies between them.
    paq8px doesn't recognize that format, so actually he's compressing a single file with the generic paq models, not using the proper image model.

    Quote Originally Posted by Shelwien View Post
    Somebody can try - it's probably this: http://academictorrents.com/details/...879629a9b4b172
    Methodology used:
    - Download the tarball and extract all 49.999 PNG images (122.666.571 bytes)
    - Batch convert all images to raw (hence headerless) format (153.596.928 bytes)
    - Modify v179 of paq8px, in function detect(...), line 10722:

    return info=32*3, detd=info*32, IMAGE24;
    - Compile and run on each image with option -9

    Result: 89.127.751 bytes, 4,64 bpc.

    Quote Originally Posted by Jyrki Alakuijala View Post
    I wonder how they dealt with header bloat in WebP and PNG. Removing the header bloat alone can give such differences.
    They didn't. 122.666.571*8/153.596.928 = 6,39, which is the listed result for PNG.

  34. The Following 4 Users Say Thank You to mpais For This Useful Post:

    Jarek (18th May 2019),Mike (18th May 2019),schnaader (18th May 2019),Shelwien (18th May 2019)

  35. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Result: 89.127.751 bytes, 4,64 bpc.

    I presume this includes paq headers?

    Also, what's the solid compression result?
    I think it's better to compare a pre-trained NN to solid compression.


