
Thread: LOCO-I (JPEG-LS) uses 365 parameters to choose width - any alternatives?

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts

    LOCO-I (JPEG-LS) uses 365 parameters to choose width - any alternatives?

    In lossless compression of image/video/audio/numerical data we often subtract a (context-based) predictor, then encode the difference (residue), usually assuming a Laplace distribution, e.g. through Golomb coding (~2% more bits/pixel than the Shannon limit for the used parameters).
    The tough question is how to choose the width of such a Laplace distribution / the Golomb parameter depending on the context?
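    For concreteness, a minimal sketch of this standard pipeline (illustrative only - not the JPEG-LS bitstream; the zigzag mapping and the choice of k from b are assumptions):

    Code:
    import math

    def zigzag(e):
        # signed residue ..., -2, -1, 0, 1, 2, ... -> 3, 1, 0, 2, 4, ...
        return 2 * e if e >= 0 else -2 * e - 1

    def rice_param_from_b(b):
        # heuristic: for Laplace residues of scale b, mean|e| ~ b, and a Rice
        # code with 2^k on the order of mean|e| is close to optimal
        return max(0, math.ceil(math.log2(max(b, 1e-9))))

    def rice_encode(u, k):
        # unary quotient, then k remainder bits
        q = u >> k
        code = "1" * q + "0"
        if k:
            code += format(u & ((1 << k) - 1), "0%db" % k)
        return code

    residues = [0, -1, 3, -2, 7, 1, 0, -5]
    b = sum(abs(e) for e in residues) / len(residues)   # Laplace MLE: b = mean|e|
    k = rice_param_from_b(b)
    bits = "".join(rice_encode(zigzag(e), k) for e in residues)
    print("b=%.2f  k=%d  total bits=%d" % (b, k, len(bits)))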

    LOCO-I ( https://en.wikipedia.org/wiki/Lossless_JPEG ) quantizes the context into 365 discrete possibilities treated independently.
    I didn't like such a brute-force approach - it does not exploit the dependencies between contexts.
    So recently, working on modelling conditional probability distributions, I decided to test low-parametric ARCH-like models - it turns out that a few-parameter model can give similar or even better compression ratios than the 365 parameters of LOCO-I: https://arxiv.org/pdf/1906.03238 ; files with Mathematica notebook:



    Any thoughts on this - approaches that are used or would be interesting for modelling conditional continuous distributions, like a context-dependent Laplace width for residue encoding?
    Last edited by Jarek; 10th June 2019 at 07:12. Reason: Added arxiv

  2. The Following User Says Thank You to Jarek For This Useful Post:

    Mike (30th May 2019)

  3. #2
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    Any thoughts on this - approaches that are used or would be interesting for modelling conditional continuous distributions, like a context-dependent Laplace width for residue encoding?
    http://qlic.altervista.org/LPCB.html suggests many better starting points for improvements than JPEG LS.

    To point out an open-source one, cwebp -q 100 -m 6 compresses ~7% more densely and is 5x faster to decode.

    Also, the author of qlic2, Alexander Rhatushnyak, has made an improved version of lossless coding (in pik/JPEG XL), which would likely be more interesting to study than JPEG LS.

  4. #3
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Sure, but JPEG LS is a standard, and a somewhat natural brute-force approach to the task I wanted to focus on in this thread: just quantize all possible contexts.

    It is tough to find details - so what approaches are used to choose context-dependent conditional probability distributions, like the width of the Laplace?

  5. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    just quantize all possible contexts.
    Not sure if you asked about jpeg xl/webp or about jpeg ls... but the better compressors (jpeg xl lossless and webp) do not use context modeling. Jpeg xl models several different predictors and uses the one that worked best in the recent past. WebP has no implicit context modeling and no adaptive predictor selection, but explicitly describes the 'meta entropy code' that is used in a locality. This number in turn chooses the entropy codes for RGBA, lengths and distances. For a large image there can be 777 different meta entropy codes, explicitly chosen by the meta entropy image.

    In my experience one loses very little (< 2%) by doing things explicitly, and can win a lot in density performance for short data and in decoding speed.

  6. #5
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    The predictor is one thing - it estimates the center of such a Laplace distribution (ideally the median, by L1 optimization).
    Besides choosing from a few predictors, least squares allows inexpensively optimizing the predictor's parameters as a linear combination of e.g. neighboring pixels: "mu = a1*A + a2*B + a3*C + a4*D" - for example to be stored in the header.
    The blue dots in the top-left plot above are for such (a1, a2, a3, a4) parameters optimized individually for each image - for some images giving much lower residues.

    But then, how to choose the width (b) of such a Laplace distribution, e.g. as the parameter of the Golomb code?
    This is a much more difficult question than the predictor:
    - LOCO-I quantizes the 3-dimensional context (|C-A|, |B-C|, |D-B|) into 365 possibilities and estimates the parameter independently for each of them,
    - the tests are for models of the "b = b0 + b1 |C-A| + b2 |B-C| + b3 |D-B|" type, e.g. with 4 parameters (b0, b1, b2, b3) optimized independently for each image (least squares - cheap; a minimal sketch of this fit follows below).
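    A minimal sketch of these two least-squares fits, assuming a grayscale image as a 2-D numpy array (variable names are illustrative):

    Code:
    import numpy as np

    def fit_predictor_and_width(img):
        """Fit mu = a1*A + a2*B + a3*C + a4*D on pixel values, then
        b = b0 + b1*|C-A| + b2*|B-C| + b3*|D-B| on the absolute residues."""
        X = img.astype(np.float64)
        cur = X[1:, 1:-1]        # pixels having all four neighbors
        A = X[1:, :-2]           # left
        B = X[:-1, 1:-1]         # above
        C = X[:-1, :-2]          # above-left
        D = X[:-1, 2:]           # above-right

        M = np.stack([A, B, C, D], axis=-1).reshape(-1, 4)
        y = cur.reshape(-1)
        a, *_ = np.linalg.lstsq(M, y, rcond=None)        # predictor (a1..a4)

        resid = y - M @ a
        G = np.stack([np.ones_like(A), np.abs(C - A), np.abs(B - C), np.abs(D - B)],
                     axis=-1).reshape(-1, 4)
        b, *_ = np.linalg.lstsq(G, np.abs(resid), rcond=None)   # width model (b0..b3)
        return a, b

    # a, b = fit_predictor_and_width(image)    # eight numbers for the header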

    If I understand properly, you prepare e.g. 777 entropy coding tables, chosen based on local context such as the channel?
    Are they for Laplace distributions of various widths (scale parameter)?
    What kind of model do you use to choose one of them?

  7. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    If I understand properly, you prepare e.g. 777 entropy coding tables, chosen based on local context such as the channel?
    Are they for Laplace distributions of various widths (scale parameter)?
    What kind of model do you use to choose one of them?
    I first choose a spatial resolution for the entropy image. See 5.2.1 Decoding of Meta Huffman Codes in https://developers.google.com/speed/..._specification

    It means that in the initial phase I'd use 4x4 to 512x512 squares where I'd compute the histograms of As, Rs, Gs, Bs, lengths (combined with green) and distances. A further phase would 'cluster' these entropies together in a way that reduces the total entropy (including the entropy coming from the entropy codes themselves).

    Once we have that, we have way too many histograms. If we had 4x4 entropy image subsampling, a 1024x1024 image would give 256x256 histograms (x5 for R(G+len)BA+dist). That is too many for efficient coding, so we merge histograms. The easiest way to understand the merging is HistogramCombineStochastic in https://github.com/webmproject/libwe...istogram_enc.c

    All of this is without parametric modeling, just minimizing Shannon entropy of symbol histograms. The clustering and relatively dense representation of the entropy codes makes it feasible without modeling.
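    A toy sketch of the clustering idea (greedy merging while total cost drops; this is not the actual HistogramCombineStochastic code, and the per-table cost is an assumed constant):

    Code:
    import math

    TABLE_COST = 200.0      # crude, assumed cost (bits) of storing one entropy code

    def bits(hist):
        # Shannon cost in bits of coding a histogram's samples with its own code
        n = sum(hist.values())
        return -sum(c * math.log2(c / n) for c in hist.values() if c)

    def merge(h1, h2):
        out = dict(h1)
        for s, c in h2.items():
            out[s] = out.get(s, 0) + c
        return out

    def cluster(hists):
        # greedily merge per-tile histograms while data bits + table bits decrease
        hists = list(hists)
        while len(hists) > 1:
            best = None
            for i in range(len(hists)):
                for j in range(i + 1, len(hists)):
                    gain = bits(hists[i]) + bits(hists[j]) + TABLE_COST \
                           - bits(merge(hists[i], hists[j]))
                    if gain > 0 and (best is None or gain > best[0]):
                        best = (gain, i, j)
            if best is None:
                break
            _, i, j = best
            hists[i] = merge(hists[i], hists[j])
            del hists[j]
        return hists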

  8. The Following User Says Thank You to Jyrki Alakuijala For This Useful Post:

    Jarek (30th May 2019)

  9. #7
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Thanks - so if I understand properly, you assume i.i.d. residues inside blocks, not using context dependence.

    From the tests I made for 512x512 grayscale images ( http://decsai.ugr.es/cvg/CG/base.htm ), using such context dependence allows a large improvement: see the green dots in the top-right plot above - they are ~0.5 bits/pixel higher than the remaining points, which use context dependence.

    It basically says that large local noise makes prediction less accurate (large Laplace width), while low local noise makes it more accurate (small Laplace width).
    With this context dependence, instead of a single Laplace width b, a whole distribution of b is used:

    Attached image: hSoZKiy.png

  10. #8
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jarek View Post
    LOCO-I ( https://en.wikipedia.org/wiki/Lossless_JPEG ) quantizes the context into 365 discrete possibilities treated independently. I didn't like such a brute-force approach - it does not exploit the dependencies between contexts. Any thoughts on this - approaches that are used or would be interesting for modelling conditional continuous distributions, like a context-dependent Laplace width for residue encoding?
    It's always a question of finding the right balance between adaptation and context dilution. If you have too many contexts, it takes too long to adapt to the statistics of the source because only a small number of samples are collected throughout the source; if you do not have enough, you do not fully exploit redundancy. For JPEG-LS, you do not need to go with the full 365 contexts - you can actually tune parameters to make it work with fewer contexts. Unfortunately, it is not "obvious" how to select the parameters (in advance). It would be interesting to see how it performs with fewer contexts, or with context model selections different from the default.

  11. #9
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    thorfdbg, I am suggesting trying a different philosophy: instead of increasing the number of contexts treated independently, try to exploit the statistical dependencies between them.
    The plot at the top compares 4- and 11-parameter models - they are already comparable with the 365-parameter LOCO-I model ... and they allow exploiting higher dimensional contexts, like 3 color channels.
    paper: https://www.dropbox.com/s/10hdh8x314...age%20comp.pdf

    Generally, compressors that do not exploit such context dependence - the fact that prediction for larger local gradients is less accurate - can get a large improvement by adding it.

  12. #10
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    Generally, compressors that do not exploit such context dependence - the fact that prediction for larger local gradients is less accurate - can get a large improvement by adding it.
    Compression of fractal images shows this difference at its largest: there one can have a mixture of very quickly changing localities and noiseless, slowly changing areas, and context modeling gives the most gains.
    Last edited by Jyrki Alakuijala; 9th June 2019 at 06:42.

  13. #11
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    I think this improvement opportunity concerns all types of image compression - exploiting the fact that a large gradient between neighboring pixels implies a larger prediction error, hence a wider (e.g. Laplace) distribution should be used for such pixels.

    I have just briefly tested using the more general exponential power distribution ( https://en.wikipedia.org/wiki/Genera...l_distribution ) instead of Laplace - it covers both the Laplace (kappa=1) and Gaussian (kappa=2) distributions, and estimation is still trivial:

    (scale parameter b)^kappa = average of |error|^kappa

    The improvement is usually small (~0.01 bits/pixel); however, for some images/regions it can be much larger - here are tests for the 48 images, with kappa = 0.9 .. 1.1 in steps of 0.01.
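    A small sketch of this estimation and cost comparison (idealized continuous costs; note that the exact MLE of the scale is b^kappa = kappa * mean(|error|^kappa), which reduces to b = mean|error| at kappa=1):

    Code:
    import numpy as np
    from math import gamma

    def epd_bits_per_sample(errors, kappa):
        # idealized bits/sample under an exponential power distribution;
        # kappa=1 is Laplace, kappa=2 is Gaussian
        e = np.abs(np.asarray(errors, dtype=float))
        b = (kappa * np.mean(e ** kappa)) ** (1.0 / kappa)          # MLE scale
        # density f(x) = kappa / (2 b Gamma(1/kappa)) * exp(-(|x|/b)^kappa)
        log2f = np.log2(kappa / (2 * b * gamma(1.0 / kappa))) - (e / b) ** kappa / np.log(2)
        return -np.mean(log2f)

    errors = np.random.laplace(scale=3.0, size=100_000)
    for kappa in (0.9, 1.0, 1.1, 2.0):
        print("kappa=%.1f: %.3f bits/sample" % (kappa, epd_bits_per_sample(errors, kappa)))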


  14. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > If you have too many contexts, it takes too long to adapt to the statistics of the source because only a small number of samples are collected throughout the source,
    > if you do not have enough, you do not fully exploit redundancy.

    In some codecs this is solved by secondary statistics (aka SSE).
    Interpolated SSE is "transparent" (can perfectly simulate primary model without SSE)
    and quite able to compensate for slow adaptation speed of high-order statistics.

    Alternatively it's possible to mix predictions of "small" and "large" models, like in paq.
    But if we have to choose one or the other, SSE seems to be better.
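    A minimal sketch of interpolated SSE for a bit model (illustrative, not any particular codec; the bucket count and update rate are assumptions):

    Code:
    # The primary probability p is refined by a secondary table indexed by
    # (small context, quantized p), interpolating between the two nearest
    # buckets; initialization makes it "transparent" (output == input p).
    class SSE:
        def __init__(self, n_ctx, n_buckets=33):
            self.n = n_buckets
            self.table = [[i / (n_buckets - 1) for i in range(n_buckets)]
                          for _ in range(n_ctx)]

        def refine(self, ctx, p):
            x = p * (self.n - 1)
            i = min(int(x), self.n - 2)
            w = x - i
            self._last = (ctx, i, w)
            row = self.table[ctx]
            return (1 - w) * row[i] + w * row[i + 1]

        def update(self, bit, rate=0.02):
            ctx, i, w = self._last
            row = self.table[ctx]
            # move both neighbouring buckets toward the observed bit
            row[i]     += (1 - w) * rate * (bit - row[i])
            row[i + 1] += w * rate * (bit - row[i + 1])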

  15. The Following User Says Thank You to Shelwien For This Useful Post:

    Jarek (8th June 2019)

  16. #13
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Shelwien, but is it still done through counting frequencies?
    Residues come from a nearly Laplace distribution - we only need to model its width/scale parameter based on the context.
    Are there smarter ways in use to choose it than just estimating it separately for each of e.g. 365 discretized contexts?

  17. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Shelwien, but is it still done through counting frequencies?

    Yes, but we'd need to count for finding optimal parameters of a distribution function anyway.

    I don't see much benefit in a purely static coder based on a purely parametric distribution -
    it would show good results only on very small images; otherwise encoding the actual statistics is better.

    And then, these days even adaptive models seem fast enough to compete with static ones in speed too - with your help even.

    > Are there smarter ways in use to choose it than just estimating it separately for each of e.g. 365 discretized contexts?

    Brute force is certainly the only way if you want to find the absolute best result.
    Otherwise it should be possible to apply the usual optimization methods, like gradient descent -
    it should normally provide near-optimal results in O(log(N)) steps instead of full search.
    There's also a choice between fitting the distribution function to collected statistics
    and actually estimating entropy of actual data compression with given parameters.

    As to smarter ways, why not optimize it for all 365 contexts, for a few samples,
    and then find correlations in the results?
    It should be possible to approximate optimal parameter values for some contexts from others,
    though I can't say what's the minimum number of contexts that would still require actual processing
    for good results.
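    As a sketch of the "few steps instead of full search" idea: the empirical Laplace codelength is unimodal in the width b, so e.g. a ternary search converges in logarithmically many steps (for pure Laplace the closed-form MLE exists; a search like this pays off when the cost is the actual coder's bit count):

    Code:
    import math

    def codelength_bits(errors, b):
        # idealized bits to code residues under a Laplace of scale b
        return sum(math.log2(2 * b) + abs(e) / (b * math.log(2)) for e in errors)

    def search_b(errors, lo=1e-3, hi=1e4, steps=40):
        # ternary search over the unimodal cost function
        for _ in range(steps):
            m1 = lo + (hi - lo) / 3
            m2 = hi - (hi - lo) / 3
            if codelength_bits(errors, m1) < codelength_bits(errors, m2):
                hi = m2
            else:
                lo = m1
        return 0.5 * (lo + hi)

    errors = [0, -1, 3, -2, 7, 1, 0, -5]
    print(search_b(errors), sum(abs(e) for e in errors) / len(errors))   # both ~2.4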

  18. #15
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    For a single distribution, the difference between Laplace and frequency counts is that the former requires one parameter (b = average |error|), while the latter requires as many parameters as the alphabet size.
    For context dependence this difference is even larger: a LOCO-I-like model using frequency counts (instead of Laplace) would require 365 x alphabet size frequencies.

    And no, brute-forcing is not the only way; we can exploit general trends instead.
    For example, replacing the 365-parameter model with models like

    b = b0 + b1 |C-A| + b2 |B-C| + b3 |D-B|

    already gives similar behavior for a 4-parameter model (b0, b1, b2, b3), and usually better for an 11-parameter model - see the plots and paper.
    Least-squares estimation of such parameters is really cheap - it can be performed by the compressor and stored in the header.

  19. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > For a single distribution, the difference between Laplace and frequency counts is that the former requires one parameter, while the latter requires as many parameters as the alphabet size.

    True, but with AC/ANS knowing the precise symbol counts would directly reduce the code size by the amount of information contained in counts,
    while approximated counts would add redundancy, since symbol probabilities would never become zero.

    Also, if there's really a good fit, the distribution function can be used to compress the frequency tables instead.

    > And no, brute-forcing is not the only way; we can exploit general trends instead.

    That's what I just talked about, yes. Though I'd prefer a more complex and non-linear approximation.
    For example, maybe I'd find the 365 optimal parameter sets for several samples,
    then build a CM compressor for these parameter sets,
    then use a trained context model to generate parameter sets from a given context,
    then see how far it can be simplified.

    But if we'd consider the worst-case scenario - like somebody generating data specifically to exploit your approximation -
    brute force is the only certain method to find the optimum.

  20. #17
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Indeed, frequency counts have the advantage of exploiting deviations from an idealized, e.g. Laplace, distribution.
    However, this comes at the cost of having to store the distribution in the header - we need to find the best compromise for minimum description length: size of the model + entropy when using this model.
    While for a single distribution storing counts is relatively cheap, for context dependence this cost grows exponentially with the dimension of the context - if we want to use a higher dimensional context, like all RGB color channels or further pixels, we need to go toward parametrized models.
    They also allow adaptivity, e.g. for Laplace using an exponential moving average in "b = average |error|".
    Regarding the use of AC/ANS, we can have prepared coding tables for quantized b.
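    A small sketch of that adaptive variant (the bucket mapping, rate and table count are assumptions; the per-bucket AC/ANS tables themselves are not built here):

    Code:
    import math

    class AdaptiveWidth:
        def __init__(self, n_tables=16, b_init=2.0, rate=0.05):
            self.b, self.rate, self.n_tables = b_init, rate, n_tables

        def table_index(self):
            # select one of a few prebuilt coding tables by quantizing log2(b)
            x = 2.0 * math.log2(1.0 + self.b)            # assumed bucket spacing
            return max(0, min(self.n_tables - 1, int(x)))

        def update(self, error):
            self.b += self.rate * (abs(error) - self.b)  # EMA of |error|

    aw = AdaptiveWidth()
    for e in [0, -1, 3, -2, 7, 1, 0, -5]:
        idx = aw.table_index()      # which prebuilt table would code this residue
        aw.update(e)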

    Regarding estimation, minimal bitlength is obtained by MLE estimation (exactly so if neglecting quantization); the MLE for Laplace (and generally for exponential power distributions) is given by an average.
    The average is the value minimizing the sum of squared differences, hence least squares seems an at least nearly optimal way of estimation (and really cheap).
    Sure, this is a heuristic, maybe it could be formalized; it also works great for more general models of conditional distributions ( https://arxiv.org/pdf/1812.08040 ) ... anyway, you can just perform a few gradient descent steps afterwards if it turns out not to be optimal.

    And generally I agree that having a few sets of parameters seems a better way - for example using some segmentation (each segment using separate parameters), or a classifier (whose value is used as part of the context).
    This way we could optimize, on a larger dataset, some default set of segments+parameters - stored in the compressor, maybe additionally allowing the creation of extra ones for a specific image.
    Anyway, such a philosophy requires going from frequency counts toward parametrized models.

  21. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > While for a single distribution storing counts is relatively cheap, for context dependence this cost grows exponentially with the dimension of the context -
    > if we want to use a higher dimensional context, like all RGB color channels or further pixels, we need to go toward parametrized models.

    I have a different opinion here.
    There are even compressors which only compress statistics instead of data (and BWT is kind of related).
    Yes, with more contexts the contribution of the stats code would always grow... but there'd be a corresponding size reduction for the actual data.
    Of course, this only applies if adaptive AC/ANS can be used - with Golomb codes and such, your approach may really be the best one.
    However, for static AC/ANS or even Huffman I think stored freqtables might still win
    (of course the freqtables should be compressed - even deflate does that).

    > And generally I agree that having a few sets of parameters seems a better way

    There's also the option of dynamic adaptation of the distribution parameters during data processing.
    I think it's more interesting than computing and storing the optimal parameter values.
    E.g. we can compute rolling codelength estimations for param+d, param, param-d and update the param value if the incremented/decremented version is consistently better.
    I actually tested this with CM - it showed better results than logistic mixing, but was much slower.
    Not sure that it's impossible to make an efficient implementation, though.
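    A minimal sketch of this rolling-codelength tuning (the decay, step d and switching margin are assumptions):

    Code:
    import math

    def laplace_bits(e, b):
        b = max(b, 1e-6)
        return math.log2(2 * b) + abs(e) / (b * math.log(2))

    class RollingTuner:
        def __init__(self, param, d=0.1, decay=0.99, margin=0.5):
            self.param, self.d, self.decay, self.margin = param, d, decay, margin
            self.cost = [0.0, 0.0, 0.0]       # rolling cost of param-d, param, param+d

        def observe(self, symbol):
            cands = (self.param - self.d, self.param, self.param + self.d)
            for i, p in enumerate(cands):
                self.cost[i] = self.decay * self.cost[i] + laplace_bits(symbol, p)
            best = min(range(3), key=lambda i: self.cost[i])
            if best != 1 and self.cost[best] < self.cost[1] - self.margin:
                self.param = cands[best]      # a neighbour is consistently cheaper
                self.cost = [0.0, 0.0, 0.0]

    tuner = RollingTuner(param=1.0)
    for e in [0, -1, 3, -2, 7, 1, 0, -5] * 50:
        tuner.observe(e)
    print(tuner.param)    # drifts toward roughly mean |e| ~ 2.4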

  22. #19
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    So imagine you would like to use RGB to choose the width of the Laplace, e.g. blue often corresponds to sky - better prediction, green to plants - worse prediction.
    This way, instead of 4 neighboring pixels, you have a 12-dimensional context - no problem for parametrized models, but frequency counts become impractical ...
    Even then, you need quantization of the context - while parametrized models smoothly interpolate between such discrete possibilities.
    Anyway, tests show that such an 11-parameter model usually gives a better compression ratio than the 365-parameter LOCO-I model - test images and Mathematica notebook: https://www.dropbox.com/sh/b6dsqzf4e...knpoVq5WT1u_Ya

    Regarding adaptation, there is no problem using it with parametrized models - e.g. just replace the average with an exponential moving average, or perform a likelihood gradient ascent step with each processed symbol.
    However, adaptation is usually for a sequence, while image data is usually scanned line by line - it would be good to optimize this ordering for adaptive models...

  23. #20
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jarek View Post
    Regarding adaptation, there is no problem using it with parametrized models - e.g. just replace the average with an exponential moving average, or perform a likelihood gradient ascent step with each processed symbol.
    Adaptation strategies are a bit more sophisticated in practical implementations. For example, look at the MQ coder adaptation. It first has a "fast attack" approach for guessing probabilities, and then adapts more slowly after the initial step. Unfortunately, I do not know how this precise model was obtained.
    Quote Originally Posted by Jarek View Post
    However, adaptation is usually for a sequence, while image data is usually scanned line by line - it would be good to optimize this ordering for adaptive models...
    For JPEG LS this is true, mostly due to the design constraints of the codec, but it does not hold in such generality. You typically scan after the transform (wavelet or DCT), so the game works a bit differently, and even then the scan pattern might be more complicated than linear. Look at the "four column scan" of JPEG 2000 - or the adaptive scan pattern of HDPhoto aka JPEG XR, which is essentially a "move to front" list within each PCT transform block. Granted, HDPhoto does not have any other form of model adaptation, so there might still be some headroom available for compression efficiency.

  24. The Following User Says Thank You to thorfdbg For This Useful Post:

    Jarek (9th June 2019)

  25. #21
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    I have never studied the MQ coder, but if I understand properly (also for CABAC), it only operates on the final bit sequence - it predicts the probability of the next bit with the context being the last few bits?
    If so, this is nice final polishing, but it has no chance of exploiting more complex or more distant dependencies, like the width of the Laplace distribution based on neighboring pixels discussed here - that requires additional dedicated mechanisms.

    Scanning after a transform is rather for lossy compressors, which are a different story; e.g. we could use a separate model for each position in the block, or include the position in the model's context to exploit general trends.
    As different positions in a block have different meanings, instead of adapting inside a block we should rather focus on adaptation between blocks.

  26. #22
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jarek View Post
    I have never studied the MQ coder, but if I understand properly (also for CABAC), it only operates on the final bit sequence - it predicts the probability of the next bit with the context being the last few bits?
    In a sense. It maintains a state per context, and this state is updated according to the last state and the bit being encoded. It is just a context-sensitive, table-driven state machine, except that the arithmetic coding is still "spelled out", even though multiplications are replaced by additions (sounds 'yuck', but works). For CABAC, everything is table-driven: adaptivity and encoding.
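    A toy illustration of such a table-driven, per-context state machine (the states and LPS probabilities below are made up - not the real MQ or CABAC tables):

    Code:
    N_STATES = 16
    P_LPS = [0.5 * (0.8 ** s) for s in range(N_STATES)]   # assumed LPS probabilities

    class BitModel:
        def __init__(self, n_ctx):
            self.state = [0] * n_ctx      # adaptation state per context
            self.mps = [0] * n_ctx        # most probable symbol per context

        def p_one(self, ctx):
            p_lps = P_LPS[self.state[ctx]]
            return 1.0 - p_lps if self.mps[ctx] == 1 else p_lps

        def update(self, ctx, bit):
            s = self.state[ctx]
            if bit == self.mps[ctx]:                  # MPS seen: grow confidence
                self.state[ctx] = min(s + 1, N_STATES - 1)
            else:                                     # LPS seen: back off, maybe swap MPS
                if s == 0:
                    self.mps[ctx] ^= 1
                self.state[ctx] = max(s - 2, 0)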
    Quote Originally Posted by Jarek View Post
    Scanning after transform is rather for lossy compressors, which are a different story, e.g. we could use a separate model for each position in the block, or include position in context of model to exploit general trends.
    That, of course, depends on the transformation. JPEG 2000 supports lossless, and only the transformation is different, not the encoder.
    Quote Originally Posted by Jarek View Post
    As different positions in block have different meaning, instead of adapting inside a block, we should rather focus on adaptation between blocks.
    Not quite sure what you mean. EBCOT is only intra-block coding, there is no inter-dependency (by design choice).

  27. #23
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Such state transitions based on single bits can catch only very simple dependencies, e.g. they have no chance of exploiting dependencies with pixels from the previous line - that requires separate dedicated mechanisms.

    The last quote referred to adaptive models - varying to optimize for the local situation.
    For block-based compressors such adaptation should be per block; however, in image compression I would rather use segmentation and separate models for each segment.

  28. #24
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by Jarek View Post
    Such state transitions based on single bits can catch only very simple dependencies, e.g. they have no chance of exploiting dependencies with pixels from the previous line - that requires separate dedicated mechanisms.
    Well, of course not. The adaptation mechanism only estimates conditional probabilities. The context captures such dependencies, and defines what the probabilities are conditioned on. For JPEG-LS, we have contexts that go across lines, and so do the EBCOT contexts - except that the latter has a more complex scanning pattern.
    Quote Originally Posted by Jarek View Post
    The last quote referred to adaptive models - varying to optimize for the local situation. For block-based compressors such adaptation should be per block; however, in image compression I would rather use segmentation and separate models for each segment.
    Sorry, what is a segment? A block looks like a segment to me - or a potential segment. Not saying that this is ideal since it means that the segment bounds are static, and not adaptive.

  29. #25
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    OK, I wrongly imagined that the context is a suffix of the encoded bit sequence, but if it works for LOCO-I, it means that the context sequence and the encoded sequence can be separate.

    Regarding the segmentation philosophy, it assumes that there is a classifier splitting an image into similar segments (e.g. sky, plants, etc.) - for each of them we can use a separate model for the given type of pattern, as experimentally there are large differences between models optimized for different images.

  30. #26
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Recently, reading the FLIF documentation ( https://flif.info/spec.html ), I thought that a crucial part of its improvement comes from its nonstandard pixel ordering.
    I have just found a paper about more options: https://arxiv.org/pdf/1812.01608.pdf

    Attached image: xX4X4ts.png

    So in multiscale scanning we first decode pixels on a sparser lattice, then progressively make it denser - using the already decoded pixels as context.
    Additionally, in "depth upscaling", we split the values e.g. into upper and lower 4 bits - first decode the upper bits, much later the lower ones - also using the higher bits as context.

    It should allow a large compression ratio improvement, and is not that expensive ...
    ... but it requires modelling many types of high dimensional contexts.
    Doing it LOCO-I style is completely impractical, but a parametric approach should have no problem - just a few optimized parameters per context for both the predictor and the width of the Laplace.
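    A sketch of the two orderings (coordinate generation and bit split only; the starting step of 8 is an assumption):

    Code:
    def multiscale_order(h, w, start_step=8):
        # coarse-to-fine lattice scan, so earlier pixels can serve as context
        seen = set()
        step = start_step
        while step >= 1:
            for y in range(0, h, step):
                for x in range(0, w, step):
                    if (y, x) not in seen:
                        seen.add((y, x))
                        yield (y, x)
            step //= 2

    def split_depth(value):
        # "depth upscaling": code the upper 4 bits first, the lower 4 much later
        return value >> 4, value & 0x0F

    order = list(multiscale_order(16, 16))
    assert len(order) == 256 and len(set(order)) == 256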

    Maybe somebody would like to collaborate on that?

  31. #27
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    667
    Thanks
    204
    Thanked 241 Times in 146 Posts
    Quote Originally Posted by Jarek View Post
    So in multiscale we first decode pixels in a sparser lattice, then progressively make it denser - using already decoded pixels as context.
    If you are willing to look for more inspiration on where to plug in your ideas, I consider Alex Rhatushnyak's (gralic and qlic author, 4x Hutter prize winner) lossless image encoding method to be the current state of the art - when looking for solutions that come with practical decoding speed.

    If you go for the one-million-pixels-per-second class, it may be fine for archiving or other esoteric uses, but not every human is relaxed/not-in-a-hurry/patient enough to wait 10 or 30 seconds (Intel or ARM, respectively) when going from one 20-million-pixel photo to the next.

    Alex's work is opensourced in pik's research directory. See the lossless16 (16 bits per channel) and lossless8 (8 bits per channel) here:

    https://github.com/google/pik/tree/master/pik/research

    If you want to build something even more practical, note that a codec that does very well on near-lossless is very different from a fully lossless codec. In typical use a near-lossless codec can deliver the same observable quality with 50% of the bytes, and often decodes faster than a fully lossless one.

    With lossless codecs you need to be very careful what corpora you design and benchmark them with, and different approaches are competitive for:
    1. graphics (like company logos)
    2. pixel graphics (for example UI graphics for a website)
    3. photos that are resampled for the internet (often also denoised and sharpened)
    4. photos that are in camera resolution (the artefacts/properties of the image reconstruction algorithm are a lot of entropy there, and dealing with it gives a ~10 % advantage)

  32. #28
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Thanks, I would gladly discuss concepts and maybe perform some simple tests ... but I don't have experience with fast implementations, and I work on a few very distant projects simultaneously (e.g. hierarchical correlation reconstruction -> neural networks, SGD optimizer, graph isomorphism ... and a few topics in physics, starting with Bell violation).
    The methods I am focusing on here are cheap - just calculate the predictor and Laplace width from a few neighboring pixels using inexpensive parametrized formulas.
    I see Alex's source is basically similar; the questions are details like the pixel order, context-dependent predictor choice, and the width of the Laplace - which require discussion - maybe you could suggest that he come here?

  33. #29
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    I have rethought this pixel scanning optimization: we can combine it with e.g. Haar wavelets to get the advantages of both - also scanning using local context (updated https://arxiv.org/pdf/1906.03238 ):

    Attached image: multi.png

    Also, there are the popular papers predicting the probability distribution of e.g. pixels using neural networks:
    Pixel Recurrent Neural Networks (666 citations): https://arxiv.org/pdf/1601.06759
    WaveNet (124 citations): https://arxiv.org/abs/1609.03499

    Instead of predicting entire discrete probability distributions, we can use the fact that in practice they are usually simple parametric distributions like Laplace - we can train neural networks to predict these few parameters instead and then discretize them (arxiv) - getting smaller networks (closer to practical application in data compression), and probably better trained - focused on proper distributions.
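    A sketch of the training objective for such a network: predict (mu, log b) per sample and minimize the Laplace negative log-likelihood, which (up to quantization) is the code length of the residues:

    Code:
    import numpy as np

    def laplace_nll_bits(x, mu, log_b):
        # mean bits/sample if x were coded with Laplace(mu, b = exp(log_b))
        b = np.exp(log_b)
        nll_nats = np.log(2.0 * b) + np.abs(x - mu) / b
        return np.mean(nll_nats) / np.log(2.0)

    # any regressor predicting (mu, log_b) from the context can be trained on this
    # (e.g. via automatic differentiation); here just an evaluation:
    x = np.random.laplace(loc=0.0, scale=3.0, size=10_000)
    print(laplace_nll_bits(x, mu=0.0, log_b=np.log(3.0)))   # ~4.03 bits/sample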

  34. The Following User Says Thank You to Jarek For This Useful Post:

    Sebastian (2nd July 2019)

