
Thread: Sac: (State-of-the-Art) Lossless Audio Compression

  1. #1 - Sebastian (Member, Germany)

    Hello,

    nice little forum you have here - first, let me introduce myself.
    I have been doing research in the field of data compression for about 15 years.
    It all started with Mark Nelson's book, the pages from Charles Bloom - you get the idea.

    I never went public, because I mostly focused on understanding and (re-)implementing proposed algorithms.

    I have many self-written programs: LZSS, PPM, lossless/lossy image/video/audio codecs and so on.

    Now I got stuck somewhere between the latest developments. Especially CM is very encouraging,
    but there are many things I currently don't fully understand, especially SSE, but that's another story.

    Currently I'm working on yet another LZSS variant with optimal parsing and PAQ-style literal encoding which seems promising (nearly as good as LZMA).

    -snip-

    Ok what's this post really about?

    Some years ago I started to write a lossless audio codec as a research project.
    Its aim was to beat all existing codecs, mainly OptimFROG.

    The current code dates back to ~2007, but since there has been little change in this field, I hope someone is able to give me some feedback on it.

    Codec internals
    ---------------

    The encoder works as follows:

    3 stages of prediction
    - 1st stage: a simple first-order predictor to remove static noise
    - 2nd stage: a linear dual-channel predictor with orders up to 32, updated every k samples with a Cholesky decomposition
    - 3rd stage: a merged 3-stage LMS predictor with orders up to 1280

    The output of the prediction stages is fed into a context coder similar to the one in Monkey's Audio (running average + Golomb coding) but with better modelling.
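    For anyone unfamiliar with this style of residual coder, here is a minimal sketch of the running-average/Golomb idea (my own illustration in plain C++, not Sac's actual code; the constants are arbitrary):

    Code:
    #include <cstdint>

    // Minimal sketch of an adaptive Rice/Golomb parameter driven by a running
    // average of residual magnitudes (illustration only, not Sac's coder).
    static uint32_t zigzag(int32_t e) {            // map signed residual to unsigned
      return e >= 0 ? 2u * uint32_t(e) : 2u * uint32_t(-(int64_t)e) - 1u;
    }

    struct RiceContext {
      uint32_t avg = 16;                           // running mean of mapped residuals

      int parameter() const {                      // k ~ log2 of the running mean
        int k = 0;
        while ((1u << k) < avg) ++k;
        return k;
      }
      int codeLength(int32_t e) const {            // unary quotient + k remainder bits
        int k = parameter();
        return int(zigzag(e) >> k) + 1 + k;
      }
      void update(int32_t e) {                     // slowly track the signal level
        avg = (avg * 15 + zigzag(e) + 8) / 16;
      }
    };
    Sac's coder additionally selects its statistics by context; the point here is only how a running average steers the Golomb parameter.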

    There are some special nifty things:

    For files with many holes in the range of their PCM values, mainly classical audio,
    Sac stores a "context map" of the used values, thereby reducing the range needed by the error coder.

    Some parameters are optimized on the fly on parts of the individual samples.

    Drawbacks
    ----------
    - it's slow
    - written with floating-point math
    - only supports 16-bit, 44100 Hz files
    - decoupling techniques are only used on stereo files
    - overtrained on a special test set
    - no multi-processor support

    That's it, I think.

    I would be really happy if someone could give me some feedback on this piece of software.

    It would be really interesting to see how Sac compares to compressors which rely purely on CM, like PAQ, which I have never tested.

    My newer experiments with mixing lead me to believe that there's still room for improvement, which is nice.

    best regards
    Sebastian

    The included .exe file is a Win32 command-line executable compiled with gcc 3.x; call it with no options for help.

    Example output

    Code:
    Open: 'ATrain.wav': ok (3377110 Bytes)
      WAVE  Codec: PCM (1411kbps)
      44100Hz 16 Bit  Stereo
      844266 Samples [00:00:19.144]
    Create: 'ATrain.sac': ok
      Mode:      Lossless High
      Framesize: 8 Sec (352800 Samples)
      Optimize:  Good  Lss 12\6 (k 32)  Lms 640
        844266/844266: 100.0% (0.0x):  633.2kbps
      Time:      [00:00:00]
      MD5:       [ed71eef0be48f6fbb09e8e2034e3b1b9]
      Frames:     3   Ms: 0.0%   L-R: 0.0%   Mono: 0.0%
      Ratio:     3377110 -> 1515188 Bytes = 44.866%
    Attached Files
    Last edited by Sebastian; 2nd October 2010 at 21:54.

  2. #2 - schnaader (Programmer, Hessen, Germany)
    Quick test with the 88 sound files from DoomRL - not the best testbed because all files are 8-bit mono and very short, but anyway, here are the results (I only tested sac and paq8p):

    Code:
    Original           959.528 bytes
    sac --fast         576.822 bytes   128 s
    sac --normal       570.716 bytes    90 s
    sac --high         567.880 bytes    92 s
    sac --best         567.463 bytes    92 s
    sac --insane       568.509 bytes    96 s, decompression 47 s
    paq8p -3           557.549 bytes   100 s
    paq8p -4           559.476 bytes   121 s, decompression 122 s
    No idea why --fast was actually slower; I rechecked it and the time was the same. Perhaps some optimization doesn't work on the crappy CPU here (AMD Duron 800 MHz).

    Also tested paq8p -4 on all the files stored in one UHArc archive, which results in 527,586 bytes in 90 s, but that's an unfair comparison because sac is a single-file compressor.
    http://schnaader.info
    Damn kids. They're all alike.

  3. #3 - m^2 (Member, Ślůnsk, PL)
    Sounds promising. I will have a few very busy days, so I'll have to delay testing it, but I'm VERY interested.
    2 quick questions:
    -Decompression speed?
    -Linux?

  4. #4 - Jan Ondrus (Programmer, Rychnov nad Kněžnou, Czech Republic)
    Sac v0.0.6a4 - Lossless Audio Coder 06-07 pest (May 12 2007)

    Open: 'ding.wav': ok (212900 Bytes)
    WAVE Codec: PCM (1411kbps)
    44100Hz 16 Bit Stereo
    53203 Samples [00:00:01.206]
    Create: 'ding.sac': ok
    Mode: Lossless Insane
    Framesize: 10 Sec (441000 Samples)
    Optimize: Best Lss 16\8 (k 4) Lms 1280


    ding.wav - 212900
    ding.zip - 204953
    ding.paq8f - 171575
    ding.flac - 168554
    ding.rar320 - 167239
    ding.paq8px - 154690
    ding.ofr - 151735
    ding.sac - 151106

  5. #5 - Sebastian (Member, Germany)
    Thanks for your evaluations. Don't be irritated by the name 'pest' - it's my name on other forums.

    Quote Originally Posted by schnaader View Post
    not the best testbed because all files are 8-bit mono and very short,
    Very short files are not a good case, and Sac is totally optimized for 16-bit. I never tested it on 8-bit - who uses that anyway?

    Sac shows its true power on normal 3-4 min stereo CD tracks, and as you can see, --insane is sometimes worse, which I would like to eliminate.

    Don't forget that you're unable to seek in a paq file. Sac uses individual blocks and resets the encoder after 4-8 s of samples.
    And by the way, I am not able to compress with paq8p on my Vista 64 machine, it always says "-> tmpfile: invalid argument", but paq seems very strong.

    Quote Originally Posted by m^2 View Post
    but I'm VERY interested.
    The source could be published.

    -Decompression speed?
    Asymmetric, by a factor of ~2 to ~10 depending on the optimization level.

    -Linux?
    No, not yet.

    Here's a comparison of my own - sorry for the German names, it's a bit older.

    Last edited by Sebastian; 2nd October 2010 at 22:21.

  6. #6 - m^2 (Member, Ślůnsk, PL)

    Oh, the table looks bad. IMO decompression is much too slow: ~2x realtime on a mid-end laptop with stereo 16/44.1, seek times 0-2 s. Netbooks are out. 24/96 stereo would be barely playable with multithreading. The Blu-ray top standard, 6 channels at 24/96, would take 4 FAST cores running at full load. I expected it to be slower than OptimFROG, but not by so much. Yes, I know it's 16-bit stereo for now, I'm just extrapolating.

    I was interested because I feel there's too little happening in the world of lossless audio compression, especially near the top. The strongest stable, cross-platform format is still Monkey's Audio...

    Nonetheless, storage formats are valuable too...

    As to the source code - judging by what I read on Hydrogen Audio, there was a huge number of people paranoid about the author dropping support. If that's representative, publishing the sources might be good.
    Last edited by m^2; 2nd October 2010 at 23:18.

  7. #7 - m^2 (Member, Ślůnsk, PL)
    I couldn't resist and ran a quick test: picked a random album and encoded it.
    Results:
    Code:
    sac normal     289 769 737
    ofr best ex    292 530 234
    tak 4e         293 956 332
    wav            433 372 508
    Obviously such results should be taken with a truckload of salt, but the size is great.
    Compression speed is not: 0.25x realtime. I have a Pentium D 2.66 GHz, so clock for clock it's 5 times slower than yours. It's overtrained for your CPU too.

  8. #8 - Sebastian (Member, Germany)
    Don't worry too much about speed. The code is pure C++ without any optimizations at all, and during development (which took only 3 months) I focused mainly on compression.
    You can already squeeze some more bytes out of it by choosing an update frequency of 1 sample and optimizing over the whole sample block, which should take forever.

    I experimented with SSE2/3 (which OptimFROG uses), but had to drop some of the precision. There's a factor of 3 to 10 in the code if you use MT and such stuff.

    The reason I'm here is mainly some questions about the algorithms used, and ways of improving the current state.

    Let me ask one question which bothers me.

    The three mentioned stages are applied one after another:
    err1 = src - pred1
    err2 = err1 - pred2
    err3 = err2 - pred3
    ->code err3

    but I would like to do something simple but better like
    err1 = src - a1*pred1
    err2 = err1 - a2*pred2
    err3 = err2 - pred3

    with a1,a2 ranging from 0 to 1.

    I wasn't able to come up with something that actually works well.
    If you damp the strength of the prediction at one stage, it might lead to better compression at a later stage, because you introduce less noise and keep the correlation at a higher level.
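    As a sketch of what such damping could look like (my own illustration, not Sac's or LA's actual method): adapt the scalar gains a1, a2 with a tiny normalized gradient step on the final residual, ignoring for simplicity that the later predictors themselves depend on the earlier residuals. The decoder would have to mirror the same updates.

    Code:
    #include <cmath>

    // Cascade of three predictors where the first two stages are damped by
    // adaptive gains a1, a2 in [0,1] (illustration only).
    struct DampedCascade {
      double a1 = 1.0, a2 = 1.0;     // per-stage damping factors
      double mu = 1e-4;              // adaptation rate (arbitrary value)

      double residual(double src, double pred1, double pred2, double pred3) {
        double err1 = src  - a1 * pred1;
        double err2 = err1 - a2 * pred2;
        double err3 = err2 - pred3;  // final residual handed to the entropy coder

        // gradient of err3^2 w.r.t. a1 is -2*err3*pred1 (later stages treated as fixed)
        a1 += mu * err3 * pred1 / (1.0 + pred1 * pred1);
        a2 += mu * err3 * pred2 / (1.0 + pred2 * pred2);
        a1 = std::fmin(1.0, std::fmax(0.0, a1));
        a2 = std::fmin(1.0, std::fmax(0.0, a2));
        return err3;
      }
    };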

    There are two things I want to try: introducing some mixing of orders in the error coder, and finding a good algorithm for the above problem.

    Are there any readable sources about SSE (secondary symbol estimation)?
    Last edited by Sebastian; 3rd October 2010 at 00:42.

  9. #9 - Surfer (Member, oren)
    Please add WavPack results

  10. #10 - Sebastian (Member, Germany)
    I could do it, but WavPack wouldn't be better than MAC at all, let alone LA or OptimFROG.

    The funny thing is that I found a Cholesky decomposition with 36/12 coupling in the paq8p code - Sac uses 16/8 plus higher-order predictors plus optimization of the learning rate.
    Last edited by Sebastian; 3rd October 2010 at 01:25.

  11. #11 - Matt Mahoney (Expert, Melbourne, Florida, USA)
    Quote Originally Posted by Sebastian View Post
    And btw, I am not able to compress with paq8p on my vista64-machine, it always says "-> tmpfile: invalid argument", but paq seems very strong
    It is a stupid bug in the g++ implementation of tmpfile() (and tmpnam()) in stdio.h. It tries to create a temporary file in the root directory of the current drive (instead of someplace sensible like %TEMP%), but in Vista and Win7 it does not have write permission on C:\. To get around it, run it on a different drive than C: (or wherever Windows is). I think the Intel compiled versions should also work but they don't take wildcards on the command line.

    I would fix it but I am no longer maintaining PAQ.

  12. #12 - Sebastian (Member, Germany)
    Thanks for the info.

    Another good example of Sac's strength, using the special "hole compression":

    Code:
    ATrain.wav        3.377.110 (Jazz)
    paq8px_v68 -6     1.944.016 
    Sac --best        1.498.795
    
    Bachpsichord.wav  4.410.078 (Classic)
    paq8px_v68 -6     3.016.350 (~8min to compress)
    Sac --best        2.159.785 (~2min to compress)
    So, keep on developing?

  13. #13 - Shelwien (Administrator, Kharkov, Ukraine)
    Thanks for posting your coder, it's always nice to have something new around.
    However, although it really does look like state-of-the-art, that's also what makes it
    somewhat boring - the same prediction method and similar results everywhere.
    But solutions for practical problems, like archiving multiple versions of the same song
    or diff coding (it's common to see the same song in .mp3 and .flac in the rips),
    or handling of audio decoded from lossy formats (they're not so uncommon among
    "HQ" releases in lossless formats), are nowhere to be seen.
    Also, mysteriously, afaik these state-of-the-art lossless audio codecs have nearly
    nothing in common either with (psycho)acoustics - they don't have solutions
    for fragment similarity detection or sound distribution/source location or
    sound source models or processing models - or with modern context modelling:
    any modelling of the prediction error can't replace proper mixing of the predictions
    of different submodels.

    > The included .exe file is a Win32-commandline-executable compiled with gcc 3.x

    I'd suggest recompiling with gcc 4.5, like from
    http://tdm-gcc.tdragon.net
    with -O3 and maybe -fprofile-use

    > Are there any readable sources about SSE (secondary symbol estimation)?

    http://cs.fit.edu/~mmahoney/compression/fpaq0f2.zip,
    maybe m1 (http://sites.google.com/site/toffer8..._0.6_100206.7z, codec.h:96)
    maybe a diff from http://compression.ru/sh/ppmy_3c.rar to http://compression.ru/sh/ppmy_3c_sse.rar

    Though actually the idea is pretty obvious, and you won't be able to just plug
    an existing implementation into your codec anyway - just write your own version.

    Basically, it works like this. The symbols (i.e. bits these days) appearing in a context can be
    written out in sequence (this is called the "context history").
    For example, after "00111101011101010110101110" the context history for "01"
    would be "10100101" (note that I just typed that randomly), and it's clear that
    context modelling can help to predict the next bit to appear in this context,
    the same as with the primary input.
    And then, we can use the same secondary statistics for a group of contexts,
    which can be a huge improvement if their histories are similar.

    That's basically all you have to know about SSE - actual implementations
    can be confusing, because they tend to use quantized probabilities
    as secondary contexts (note that probabilities are functions of context histories),
    and so the specific quantization function that works in some coder might
    not do any good when attached to your model - you have to find actual dependencies
    in your own context histories.

    Notable SSE-related tricks include interpolation of statistics based on secondary
    context quantization remainder and multidimensional SSE components which can work
    as advanced submodel mixers.
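    In case a concrete shape helps, here is a minimal bitwise SSE/APM-style sketch along those lines: the quantized primary probability plus a small extra context select an adaptive secondary probability. All names and constants here are illustrative, not taken from any particular coder.

    Code:
    #include <cstdint>

    // Minimal SSE sketch: quantized primary probability + small context select
    // an adaptive secondary probability (12-bit fixed point, paq-style scale).
    struct SSE {
      enum { kQuant = 33, kCtx = 256 };
      uint16_t t[kCtx * kQuant];
      int last;                                   // cell used by the last p() call

      SSE() : last(0) {
        for (int c = 0; c < kCtx; ++c)
          for (int q = 0; q < kQuant; ++q)
            t[c * kQuant + q] = uint16_t(q * 4095 / (kQuant - 1));  // identity map
      }
      int p(int p1, int ctx) {                    // p1 in [0,4096), ctx in [0,kCtx)
        int q = p1 * (kQuant - 1) / 4095;         // nearest bucket (no interpolation)
        last = ctx * kQuant + q;
        return t[last];
      }
      void update(int bit) {                      // call after coding the bit
        int v = t[last];
        v += ((bit << 12) - v) >> 5;              // adapt toward the observed bit
        t[last] = uint16_t(v);
      }
    };
    Real implementations usually interpolate between the two neighbouring buckets and update both, and, as noted above, the right secondary context is something you have to find in your own histories.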

  14. #14 - Sebastian (Member, Germany)
    Quote Originally Posted by Shelwien View Post
    same prediction method and similar results everywhere.
    Please remember that I first tried to catch up with existing programs, which took a very long research phase.

    But solutions for practical problems, like archiving multiple versions of the same song
    or diff coding (its common to see the same song in .mp3 and .flac in the rips),
    or handling of audio decoded from lossy formats (they're not so uncommon among
    "HQ" releases in lossless formats) are nowhere to see.
    That is not a problem of the actual compression algorithm in my opinion, but of programming.
    The core makes up perhaps 1% of the actual code, and the rest is hard work.
    Recompressing lossy formats is nothing special, it only introduces more noise.
    A so-called hybrid approach is not something that brings you near the top.

    One way to overcome some of the problems is specializing on different types of music.

    Direct similarity approaches like exact match finding are nearly useless even for highly repetitive music (at most about 1% matching).


    Also, mysteriously, afaik these state-of-art lossless audio codecs have nearly
    nothing common neither with (psycho)acoustics - they don't have solutions
    for fragment similarity detection or sound distribution/source location or
    sound source models or processing models
    I've experimented a lot. I've tried sound source models with wavelet decompositions and whatnot,
    but nothing gave better results than the standard predict + error-coding approach.

    any modelling of prediction error can't replace proper mixing of predictions
    of different submodels.
    I don't really see the difference from my current problems mentioned above.
    One predictor stage is itself a form of mixing, because you combine N samples and try to make a good prediction from them.
    A form of nonlinearity is achieved by cascading different predictors.

    LA uses some clever methods. It uses more stages but with very simple predictions of the form prediction=sample[-k],
    but there seem to be some clever heuristics for how strongly to apply each stage,
    which keeps the later stages from losing too much. That's where I'm standing now.

    The compression ratio obtained by lossless audio coding is generally very modest, and at some stage you're essentially trying to compress random noise.

    Don't forget that there's also a huge gap between, say, ZIP and LZMA, and it's all about finding these small little tricks.
    The introduction of the "hole compression" mentioned above improves compression by 5%,
    and as far as I know only the closed-source programs TAK and OptimFROG employ something like this.

    Though actually the idea is pretty obvious, and you won't be able to just plug
    an existing implementation into your codec anyway - just write your own version.
    All code is written by myself, and I am not a fan of this either.

    Basically, it works like this. Symbols (ie bits these days) appearing in a context can be
    written in sequence (called "context history").
    For example, after "00111101011101010110101110" the context history for "01"
    would be "10100101" (note that i just typed that randomly), and its clear that
    context modelling can help to predict the next bit to appear in this context,
    same as with primary input.
    Ok, now it appears clear to me. Thanks for the explanation and the links.

    I will try some stuff in the next weeks when I get time.
    Last edited by Sebastian; 3rd October 2010 at 20:55.

  15. #15 - Shelwien (Administrator, Kharkov, Ukraine)
    >> same prediction method and similar results everywhere.
    >
    > Please remember that I first tried to catch up with existing
    > programs, which took a very long research phase.

    Yes, I just want something new to learn from, don't mind it :)
    Also I'm spending too much time on attempts to reinvent the wheel
    (it's fun, though), while complete, working programs, like yours,
    usually get more attention.
    It's just that when you're trying to recreate somebody's method and
    then go further, it frequently fails, because the previous authors
    already tried all they could before you :)

    > That is no problem of the actual compression algorithm in my
    > opinion, but programming.
    > The core makes perhaps 1% of the actual code,
    > and the rest is hard work.

    In a way, but it's not that simple. I wanted to show that for practical tasks,
    like the ones I mentioned, you'd need multiple such "cores", besides all the maintenance work.

    > Recompression of lossy-formats is nothing special, it only introduces more noise.

    mp3 recompression and such is another thing. But when you have a 3M .mp3
    and decode it to 30M .wav, the lossless coders then compress it to 15M, and you
    can't get 3M again without further quality loss.
    And what I meant is that an algorithm for "mp3 reconstruction" would be completely
    different from linear prediction.

    > Direct similarity approaches like exact match finding are nearly
    > useless even for high repetitive music (ranging to maximal 1%
    > matching).

    One common application of lossless compression is backup of
    tracks and draft versions in sound editing.
    But even plain detection of exact matches requires some special
    research - eg. see Bulat's rep/srep.
    And for audio normally we'd need a method to detect matches
    despite volume changes and filtering.

    > I've experimented a lot. I've tried sound source models with
    > wavelet-decompositions and what not, but nothing gave better results
    > than the standard predict+error-coding approach.

    Well, it's a result too - it'd be interesting if you could write
    a detailed report about that.
    But do you really claim that what you do is the only right approach to
    lossless audio compression, and that the things I mentioned won't ever
    affect the compression ratio?

    >> any modelling of prediction error can't replace proper mixing of
    >> predictions of different submodels.

    > I don't really see the difference to my current problems mentioned above.
    > One predictor-stage itself is a mixing, because you combine
    > N-samples and try to make a good prediction from it.

    Sure, prediction and delta coding is a useful method. In fact it could be
    used more actively in "universal" compressors like paq - and I don't mean analog data.
    For example, it should be possible to find the most probable symbol (e.g. byte) from the
    primary model's probability distribution and use it as a context for the secondary model.

    But from my p.o.v., what you have is still different from submodels in the CM sense.
    For example, suppose you'd tune a few model instances for different types of music.
    Then, how would you combine them? By choosing (and encoding) the optimal model
    index for each block?
    Anyway, there's a method to really mix the predictions of multiple LPC models,
    I call it "probability translation" - http://nishi.dreamhosters.com/u/wav_00.rar
    There's also an implementation of a logistic mixer with brute-force update, which
    is the main reason for its (awful) speed, but mainly it demonstrates probability
    translation and mixing.

    > That's where I'am standing now. The compression ratio obtained by
    > Lossless Audio Coding is generally very small, and at some stage
    > you're rather trying to compress random noise.

    Yes, but your methods may be the main reason for that.
    I mean, it's always better to work with the original data, while modelling
    spectral coefficients or prediction errors in abstraction from the
    actual source is just an ugly tradeoff to gain speed.
    For example, I'd compare it to applying BWT to audio data and then
    tuning postcoders for it - there can be local improvements, but
    overall the transform just makes it hard to reach the data dependencies.
    Anyway, you're not concerned about speed, right?

    > Don't forget that there's also a huge gap between say ZIP and LZMA
    > and it's all about finding these small little tricks.

    I'd say it's a wrong example - the main difference there is the window/dictionary size.
    Then also optimal parsing and better entropy coding, but these are not
    so important and can't be called "little tricks" anyway.

    > The indroduction of the "hole-compression" mentioned above improves compression by 5%,

    It'd be nice if you could explain it in more detail.

    > and as far as I know only the closed source programs TAK and
    > OptimFROG employ something like this.

    Btw, do you know such compressors as rkau/winrk, sbc, nanozip, mma?
    There's actually more competition at this "state-of-the-art" level than you imply.

    > All code is written by myself, and I am not a fan of this either.

    Well, sure, but it's kinda hard to pinpoint.
    Do you use your own implementation of arithmetic coding, statistical counters,
    mixers (if you have any) etc?
    I meant that you can see something like p=map[p>>7] in a coder (it's already a real SSE),
    and it would improve compression there, but won't work at all in your circumstances.

    > Ok, now it appears clear to me. Thanks for the explanation and the links.

    It'd be nice if you could make some progress along that line.
    For the paq design it's an alien idea, not very compatible with its other components,
    while it can be much more impressive in other cases.

  16. #16 - DARcode (Member, Genoa, Italy)

    Quote Originally Posted by Sebastian View Post
    I could do it, but Wavpack wouldn't be better than MAC at all, let alone LA or OptimFROG.

    The funny thing is, that I found a cholesky-decomposition with 36/12 coupling in the PAQ8p code - Sac uses 16/8 + higher order predictors + optimization of learning rate
    Please give WavPack version 4.60 with -hhx6 a try - you won't be disappointed by either the compression ratio or the decoding speed. It'd definitely be a nice addition to these tests.

  17. #17 - PAQer (Member, Russia)
    A small benchmark on an ethnic track:
    Code:
    Codec             Size        Preset
    OptimFROG 4.6 ex  12 704 584  bestn, exp
    SAC               13 196 399  best
    OptimFROG 4.6 ex  13 855 085  bestn
    TAK 2.0           14 169 653  p5m
    WavPack 4.6       14 980 944  hh,x6


    Quote Originally Posted by Sebastian
    I experimented with SSE2/3 (which OptimFROG uses)
    I found info only about the "first" SSE. Why do you think OFR supports SSE2/3?
    Last edited by PAQer; 4th October 2010 at 12:33.

  18. #18 - Sebastian (Member, Germany)
    Quote Originally Posted by Shelwien View Post
    while working complete programs, like yours, usually get more attention.
    I would consider such a complete program the minimum before publishing.
    I have self-written libraries with APEv2 tag support and so on, but I'm too lazy to code a Foobar2000 plugin, for example.

    Just that when you're trying to recreate somebody's method and
    then go further, it frequently fails, because previous authors
    already tried all they could before you
    Let me take a look at the source of Sac.
    There are numerous variants of least-mean-square filters, such as VLMS (adaptive learning rate),
    CPVLMS (learning rate with speed and accuracy coefficients), dual-channel LMS, and I even
    tested variants with transform-domain LMS. In the end I stuck to a simple dual-channel variant.
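    For reference, a bare-bones single-channel NLMS predictor looks roughly like the following textbook sketch (not Sac's filter); the dual-channel and variable-step variants mentioned above add a second input history and an adapted step size, respectively.

    Code:
    #include <vector>
    #include <cstddef>

    // Textbook normalized LMS predictor: predict the next sample from the last
    // n samples, then adapt the weights toward the observed error.
    class NLMS {
      std::vector<double> w, hist;   // weights and input history (newest first)
      double mu;                     // step size, e.g. 0.001..0.01
    public:
      NLMS(std::size_t n, double mu_) : w(n, 0.0), hist(n, 0.0), mu(mu_) {}

      double predict() const {
        double p = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) p += w[i] * hist[i];
        return p;
      }
      // Call with the true sample after predict(); returns the residual.
      double update(double sample, double pred) {
        double err  = sample - pred;
        double norm = 1e-6;                        // avoid division by zero
        for (double h : hist) norm += h * h;
        for (std::size_t i = 0; i < w.size(); ++i)
          w[i] += mu * err * hist[i] / norm;       // normalized gradient step
        hist.pop_back();                           // shift history, newest in front
        hist.insert(hist.begin(), sample);
        return err;
      }
    };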

    But when you have a 3M .mp3 and decode it to 30M .wav, the lossless coders then compress it to 15M, and you can't get 3M again without further quality loss.
    That is nearly impossible without the original mp3, or it would be pure guessing about the prior encoder structure.


    But even plain detection of exact matches requires some special
    research - eg. see Bulat's rep/srep.
    I happened to test srep on a wav file, and I don't see anything special about it, except the use of really large dictionaries.


    Well, its a result too - it'd be interesting if you could write
    a detailed report about that.
    That's about 6 years in the past - sorry, I don't remember any good results.
    But it seems logical to me that a form of lossless subband decomposition should help.
    I only need a good idea for mixing that model with my main model.
    Linear mixing with p=a0*p1+...+an*pn and a least-squares approximation of the filter coefficients is not the way to go in my opinion.

    But do you really claim that what you do is the only right approach to
    lossless audio compression and things which I mentioned won't ever
    affect the compression ratio?
    Surely not, but mixing different models, for example, will only lead to minimal gains.
    An entirely different approach, however, could be very useful on special types of music and structure.

    Then, how would you combine them? By choosing (and encoding) the optimal model
    index for each block?
    That would be one way.

    Anyway, there's a method to really mix the predictions of multiple LPC models,
    I call it "probability translation" - http://nishi.dreamhosters.com/u/wav_00.rar
    There's also an implementation of logistic mixer with bruteforce update, which
    is the main reason for its (awful) speed, but mainly it demonstrates probability
    translation and mixing.
    Could you give me a brief explanation? It would be very helpful.
    The code is a mess in my eyes, but I am new to paq-like encoding.
    If I reinvent the wheel for myself, I can understand your code better.
    Do you mix on a bit level?

    The only thing I've done is some simple bit-mixing like in PAQ1.
    I don't know if something like this is actually stupid:
    - two counters, stationary and non-stationary.
    - mix the two predictions made by the counters in every model, and with a mixer working the same way, I mix every model.

    Yes, but your methods may be the main reason for that.
    I mean, its always better to work with original data,
    Now we can discuss what is "original" in PCM audio - the waveform or its spectral form?

    > The indroduction of the "hole-compression" mentioned above improves compression by 5%,
    It'd be nice if you could explain it in more detail.
    Nothing special, and it seems OptimFROG is sometimes really better at exploiting this.
    First, you have a defined range of used values in your PCM data.
    In some files specific PCM values never occur.
    So you know: when coding the error of a prediction, some values can never occur, which narrows the range needed by the coder.

    I wonder if using this in the actual prediction gives good results too? Or a transformation of the original input to form a continuous range of used values.
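    In code, my reading of the idea is roughly this (an illustration, not Sac's implementation): scan a frame for which 16-bit values actually occur and build a dense remapping, so the coder only ever sees a contiguous alphabet. The map itself has to be transmitted, e.g. as a (run-length coded) bitmap per frame.

    Code:
    #include <cstdint>
    #include <vector>

    // Sketch of "hole compression": build a dense map of the PCM values that
    // actually occur in a frame, so the coder works on a contiguous alphabet.
    struct ValueMap {
      std::vector<int32_t> fwd;   // pcm value + 32768 -> dense index, -1 if unused
      std::vector<int16_t> inv;   // dense index -> pcm value

      void build(const std::vector<int16_t>& frame) {
        std::vector<bool> used(65536, false);
        for (int16_t s : frame) used[s + 32768] = true;
        fwd.assign(65536, -1);
        inv.clear();
        for (int v = 0; v < 65536; ++v)
          if (used[v]) { fwd[v] = int32_t(inv.size()); inv.push_back(int16_t(v - 32768)); }
      }
      int32_t toDense(int16_t s) const { return fwd[s + 32768]; }   // narrow alphabet
      int16_t toPcm(int32_t i)  const { return inv[i]; }
    };
    Whether prediction should then run on the original values or on the dense indices is exactly the open question raised above; coding only the reduced residual range is the conservative use of the map.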

    So many things to test, and so little time.

    Btw, do you know such compressors as rkau/winrk,sbc,nanozip,mma?
    There's actually more competition at this "state-of-art" level than what you imply.
    I know them by name, but are they really better on average, except for loud rock songs where you barely hit 70%?

    Do you use your own implementation of arithmetic coding, statistical counters,
    mixers (if you have any) etc?
    You got me there. The arithmetic coding is the public-domain range coder by Dmitry Subbotin,
    but I give proper credit in the code. My working mixer code (very, very simple) is from paq1, to understand how it works. The fixed-point math often makes it hard to understand.

    It'd be nice if you could make some progress along that line.
    Too bad - I had a lot of spare time the last 2 months, and now university is starting again.
    My main focus for Sac is cleaning the source: reducing the amount of "wtf am I doing here and why does this actually work" and copying the simple mixer from my LZ encoder.

    Quote Originally Posted by PAQer
    I was found info only about "first" SSE. Why you thinking that's OFR supports SSE2/3?
    To avoid potential confusion with secondary symbol estimation. Nothing fishy about it.
    Last edited by Sebastian; 4th October 2010 at 16:35.

  19. #19 - Shelwien (Administrator, Kharkov, Ukraine)
    > There are numerous variants of least-mean-square filters,

    There's a belief that
    "MSE criterion yields an optimal solution only if the
    input signal is Gaussian"
    Did you ever try optimization metrics based on entropy?
    It was very helpful at least for derivation of logistic
    mixer update formula in paq.

    >> But when you have a 3M .mp3 and decode it to 30M .wav, the lossless
    >> coders then compress it to 15M, and you can't get 3M again without
    >> further quality loss.
    >
    > That is nearly impossible without the original mp3, or it would be
    > pure guessing about the prior encoder structure.

    It's hard to recover the precise original mp3 (and it would require fitting
    to one specific mp3 decoder executable, as they can produce slightly
    different results even with different compiler options), but a significant
    lossless compression improvement based on the known mp3 quantization method
    is very likely.

    >> But even plain detection of exact matches requires some special
    >> research - eg. see Bulat's rep/srep.
    > I accidentally tested srep on a wav-file, and I don't see something
    > special about it, except the usage of really large dictionaries.

    It would be special if you had a few copies of the same wav in the archive.
    Sure it's dumb, but it happens all the time, and audio compressors won't notice
    anything.
    But anyway, I was talking about the fact that even this rather simple
    task actually requires some research, or you won't be able to find
    matches in multiple GBs of data.

    > But it seems logical to me, that a form of lossless subband-decomposition should help.

    Depending on the data type, I guess - at least I tried making a lossless
    MDCT-based audio coder, with a reasonably good entropy part, and
    it was really bad (flac level or slightly worse, but much worse than ofr anyway).

    > I only need a good idea of mixing that model with my main model.

    Well, you can have multiple audio models in paq.

    >> Anyway, there's a method to really mix the predictions of multiple LPC models,
    >> I call it "probability translation"

    > Could you give me a brief explanation? It would be very helpful.

    Different LPC models do statistical modelling for different residuals,
    so you can't directly mix them.
    But you can translate the residual probability distributions into
    p.d.'s of input data (and its O(bits-per-sample) with appropriate data
    structures) and then mix these.
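    As a brute-force illustration of that description (my own toy version, with Laplacian residual models and linear mixing as arbitrary choices): shift each model's residual distribution by that model's prediction to get a distribution over the actual sample value, then mix in the value domain.

    Code:
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Toy "probability translation": each LPC model m has a distribution over
    // its own residual; shifting it by pred[m] yields a distribution over the
    // sample value itself, and those can be mixed directly.
    std::vector<double> mixedSampleDistribution(
        const std::vector<double>& pred,      // per-model predictions
        const std::vector<double>& scale,     // per-model Laplacian scale b_m
        const std::vector<double>& weight,    // mixing weights, summing to 1
        int minVal, int maxVal) {
      std::vector<double> p(maxVal - minVal + 1, 0.0);
      for (std::size_t m = 0; m < pred.size(); ++m)
        for (int x = minVal; x <= maxVal; ++x) {
          double r = x - pred[m];                              // residual under model m
          p[x - minVal] += weight[m] *
              std::exp(-std::fabs(r) / scale[m]) / (2.0 * scale[m]);
        }
      double s = 0.0;
      for (double v : p) s += v;
      for (double& v : p) v /= s;                              // renormalize over range
      return p;
    }
    The loop above is O(range) per sample; as noted above, the real trick is doing the translation in O(bits per sample) with appropriate data structures, e.g. along a binary decomposition of the value.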

    > Do you mix an a bit-level?

    Yes. It makes most sense for technical reasons - like rangecoding
    simplicity/speed, mixing precision etc.

    > I don't know if something like this is actually stupid:
    > - two counters, stationary and non-stationary.
    > - mix the two prediction made by the counters in every model, and
    > with a mixer working the same, i mix every model

    It's still kinda too general.
    Anyway, logistic mixing is good, but a 2-way mixer tree is better than a single N-way mixer;
    also note that mixers can be selected by context too, and you can tune their parameters.
    As to counters, it's possible to implement a whole simple model for a counter, and
    hide it within a state machine, like what Matt did.
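    For concreteness, a minimal logistic mixer with the usual paq-style update (weights nudged along the log-loss gradient) can be sketched like this; the learning rate and floating-point form are illustrative:

    Code:
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Minimal logistic mixer: combine bit probabilities in the logit ("stretch")
    // domain, then update the weights to reduce the coding cost (log loss).
    struct LogisticMixer {
      std::vector<double> w;
      double lr;                                   // learning rate, e.g. 0.01

      explicit LogisticMixer(std::size_t n, double lr_ = 0.01) : w(n, 0.0), lr(lr_) {}

      static double stretch(double p) { return std::log(p / (1.0 - p)); }
      static double squash(double x)  { return 1.0 / (1.0 + std::exp(-x)); }

      // p: input probabilities of bit==1; st receives their stretched values.
      double mix(const std::vector<double>& p, std::vector<double>& st) const {
        st.resize(p.size());
        double dot = 0.0;
        for (std::size_t i = 0; i < p.size(); ++i) {
          st[i] = stretch(p[i]);
          dot += w[i] * st[i];
        }
        return squash(dot);
      }
      // d(log loss)/d(w[i]) = (pm - bit) * st[i], hence the update below.
      void update(const std::vector<double>& st, double pm, int bit) {
        for (std::size_t i = 0; i < w.size(); ++i)
          w[i] += lr * (bit - pm) * st[i];
      }
    };
    In paq the mixer additionally runs in fixed point, is selected by context, and sits in a small tree, as mentioned above.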

    Btw, a simple test is to write down your data (residuals, coefs, whatever you have there)
    and compress it with paq8 - a properly tuned model should show better results than paq.

    > Now we can discuss, what is original in pcm-audio. The waveform or its spectral-form?

    I can't be sure either, because of all the digital mixing and filtering which is usually
    applied. But it's a fact that the spectral form is very bad for lossless compression.

    I'd even say that it's actually bad for lossy compression too - even if there was a
    perfect spectral quantization method, in theory it's possible to translate the spectral
    statistics into signal statistics, and then use a few alternate submodels and mixing.
    But that transform is too complex for it to happen anytime soon.

    > Now you know: If I code the error of a prediction some values can
    > never occure thus narrowing the range needed by the coder.
    > I wonder if using this in actual prediction gives good results too?
    > Or a transformation of the original input to form a continous range
    > of used values.

    I'd say the better way is to mask the missing values at the entropy coding
    stage - that kind of shrinking transformation would likely be a kind of
    nonlinear distortion.
    Although it's another matter if you can find and revert the function
    which originally caused it (like in a sound editor) - e.g. it can
    be caused by a simple rounding error in an integer multiplication
    (i.e. a volume change via a'=a*150/100).

    >> Btw, do you know such compressors as rkau/winrk,sbc,nanozip,mma?
    > I know them by name, but are they really better on average, except
    > for loud rock-songs where you merely hit 70%.

    Well, you can see some stats there -
    http://ctxmodel.net/files/MIX/lamix_v0.htm
    http://ctxmodel.net/test/mma00.htm
    They may not be quite as good compression-wise, but there's the matter of reasonable speed too.

    > The arithmetic coding is the public-domain rangecoder by Dmitry Subbotin,

    I'd suggest to test sh_v1m from http://ctxmodel.net/files/PPMd/ppmd_Jr1_sh8.rar
    (or maybe older version from http://compression.ru/sh/coders6c2.rar)
    That's public domain too, and has better decoding speed and precision
    (less redundancy compared to the precise entropy).

    Also do you have any comments about http://www.ctxmodel.net/rem.pl?-37 ?

    > My main focus for Sac is cleaning the source. Reducing the amount of
    > "wtf am I doing here and why does this works actually" and copying
    > the simple mixer from my LZ-Encoder.

    Well, yes, because for now it's just as closed-source as the others :)

  20. #20 - Sebastian (Member, Germany)
    Quote Originally Posted by Shelwien View Post
    There's a belief that
    "MSE criterion yields an optimal solution only if the
    input signal is Gaussian"
    I don't know if we both mean the same thing, but it is easy to prove that
    under the assumption of normally distributed errors the maximum-likelihood solution for a linear estimator leads to the least-squares criterion.
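    For completeness, the standard argument in one line: with i.i.d. Gaussian errors the negative log-likelihood is, up to constants, the sum of squared residuals, so maximizing the likelihood minimizes the squared error.

    Code:
    % residuals e_t = x_t - \hat{x}_t(\theta), assumed i.i.d. N(0, \sigma^2)
    L(\theta) = \prod_t \frac{1}{\sqrt{2\pi\sigma^2}}
                \exp\!\Big(-\frac{(x_t - \hat{x}_t(\theta))^2}{2\sigma^2}\Big)
    \quad\Rightarrow\quad
    -\log L(\theta) = \frac{1}{2\sigma^2} \sum_t (x_t - \hat{x}_t(\theta))^2 + \mathrm{const}
    \quad\Rightarrow\quad
    \arg\max_\theta L(\theta) = \arg\min_\theta \sum_t (x_t - \hat{x}_t(\theta))^2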

    Did you ever try optimization metrics based on entropy?
    No, but on the other hand that would lead to completely different predictors.


    Depending on data type, I guess - at least I tried making a lossless
    MDCT-based audio coder, with reasonably good entropy part, and
    it was really bad (flac level or slightly worse, but much worse than ofr anyway)
    Was the subband decomposition of the MDCT fixed? I used a wavelet-packet decomposition based on some metric - order-1 entropy, if I remember correctly.
    This helps a lot because you don't want to model critical subbands in a lossless approach.

    Its still kinda too general.
    Anyway, logistic mixing is good, but 2-way mixer tree is better than single N-way mixer,
    also note that mixers can be selected by context too, and you can tune their parameters.
    As to counters, its possible to implement a whole simple model for a counter, and
    hide it within a state machine, like what Matt did.
    The math behind the current paq structure is not too hard, and I think the many stages of refinement
    are only needed because of inappropriate modelling. I like the idea of optimality in some sense, so using
    a better algorithm for the minimum search instead of gradient descent could possibly lead to further improvements.
    I already do mixer selection via contexts but the improvements are very tiny.

    Perhaps, if you have the time and nerves, you can explain to me that thing called a "StateMap". This is what I understood:
    you map contexts with similar bit histories to a counter/state and use that. But why is this better than just using
    the shorter history itself as a context?

    Btw, a simple test is to write down your data (residuals, coefs, whatever you have there)
    and compress it with paq8 - a properly tuned model should show better results than paq.
    that's a good idea

    I'd suggest to test sh_v1m from http://ctxmodel.net/files/PPMd/ppmd_Jr1_sh8.rar
    (or maybe older version from http://compression.ru/sh/coders6c2.rar)
    That's public domain too, and has better decoding speed and precision
    (less redundancy comparing to precise entropy).
    The idea of using a 64-bit low is good and I fitted the coder into my classes. It gives a tiny improvement size- and speed-wise.
    Thank you.

    Well, yes, because for now its the same closed-source as others
    If you say "give me the code" I'll upload it immediately. I focus on clean, readable code like

    PCM.ReadData
    Encoder.PredictBlock
    Encoder.CodeBlock
    ...

    so yeah, if you like, just tell me, because I'm currently focusing on yet another LZ77 coder

    Don't forget, at the time I wrote this I barely had the math skills to understand what I was doing (now I have a degree in applied math).
    For example, I really don't know why I am adding an error estimate to the main diagonal of the covariance estimate, but it helps?!
    I really have to go deeper into this stuff again to improve something, but I have collected some
    simple ideas to improve it at least a tiny bit. For example, the multistage LMS predictor is in fact a simple linear neural network.
    The different algorithms are often not that different at all if you take a closer look.

    I also discovered that my optimization algorithm is dumb, because I never checked if the real output follows my assumptions.
    I'll try some form of simulated annealing in 2 weeks or so.
    Last edited by Sebastian; 11th October 2010 at 15:11.

  21. #21 - Matt Mahoney (Expert, Melbourne, Florida, USA)
    This explains a little about StateMaps. http://mattmahoney.net/dc/dce.html#Section_413

    They are very similar in PAQ and ZPAQ although the tables are not exactly the same. The idea is that if you have a history like 0000000001 in some context, you don't have enough information to estimate the probability that the next bit will be 1. For example, if the source is stationary then it is 0.1 since you have 1 out of 10 bits set. If it has changing statistics then the most recent bits should have more weight so it is higher than 0.1. If the data is random, the sequence might have occurred by chance and it is 0.5. The solution is to look at what happened when the same sequence occurred in other contexts. That's what an indirect model does.
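    A stripped-down indirect model in code, matching that description (all names and sizes here are hypothetical; paq/zpaq actually use nonlinear bit-history states and adaptive-rate counters rather than a raw shift register):

    Code:
    #include <cstdint>

    // Indirect model sketch: context -> 8-bit history of recent bits, and the
    // history value (shared across all contexts) -> adaptive probability.
    struct IndirectModel {
      uint8_t  hist[65536];        // per-context bit history (last 8 bits)
      uint16_t sm[256];            // StateMap-like table: history -> P(bit==1), 12-bit
      int cur;                     // history cell selected for the current bit

      IndirectModel() : cur(0) {
        for (int i = 0; i < 65536; ++i) hist[i] = 0;
        for (int i = 0; i < 256; ++i) sm[i] = 2048;     // start at p = 0.5
      }
      int p(int ctx16) {                                // ctx16: a 16-bit context (hash)
        cur = ctx16;
        return sm[hist[cur]];
      }
      void update(int bit) {
        int h = hist[cur];
        int v = sm[h];
        sm[h] = uint16_t(v + (((bit << 12) - v) >> 5)); // adapt the shared statistics
        hist[cur] = uint8_t((h << 1) | bit);            // append the bit to the history
      }
    };
    The point of the indirection is exactly what Matt describes: the probability is estimated from what the same history did in other contexts, rather than from the context's own (possibly tiny) sample.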

  22. #22 - Shelwien (Administrator, Kharkov, Ukraine)
    >> There's a belief that
    >> "MSE criterion yields an optimal solution only if the input signal is Gaussian"

    > I don't know if we both mean the same, but it is easy to proof that

    Sure, with a formal definition. But the quote I posted is relatively
    fuzzy in that sense.

    And my point there was that the actual signal is not really like that -
    it may be locally similar, but isn't stationary anyway.

    >> Did you ever try optimization metrics based on entropy?
    > No, but on the other side that would lead to completely different predictors.

    Yes, and likely better, because entropy is what you care about (=compressed size).
    Btw, afaik, Matt tested both MSE- (in http://mattmahoney.net/dc/paq.html#neural)
    and entropy-optimized update functions, and the latter was clearly superior.

    > Was the subband-decomposition of MDCT fixed?

    It does a 1024-point MDCT, then compresses the result as a 1024xN table
    of integers. Surely improvements are possible there too, but LPC seems much
    more promising compared to that.

    > The math behind the current paq-structure is not too hard and I
    > think the many stages of refinement are only needed because of
    > inappropriate modelling.

    Not really. Or, rather, you just can't implement "appropriate modelling",
    because it implies infinite memory/complexity.

    > I like the idea of optimality in some sense, so using a a better
    > algorithm for minimum search instead of gradient descent could
    > possibly lead to better improvements.

    It's not gradient descent. It's Bayesian inference, like
    Code:
    C[n,k] = n!/k!/(n-k)!
    p0 = 1/C[n0+n1+1,n0+1] // probability of string with (n0+1) 0's and n1 1's
    p1 = 1/C[n0+n1+1,n0]   // probability of string with n0 0's and (n1+1) 1's
    p = p0/(p0+p1) = (n0+1)/(n0+n1+2) // normalize because its either this or that
    Or a minimum entropy extrapolation, like
    Code:
    f[p] = n1*Log[1-p]+n0*Log[p] + Log[p]+Log[1-p] // n1 1's, n0 0's, then 01 for extrapolation
    f'[p] = (n0+1)/p - (n1+1)/(1-p)
    f'[p]==0 -> p=(n0+1)/(n0+n1+2)
    (In another thread here, I demonstrated derivation of paq8 mixer update in a similar way)

    > I already do mixer selection via contexts but the improvements are very tiny.

    How did you check that?
    By manually adding a context and seeing that the improvement is tiny?
    Actually such an approach won't always work even for simple counters
    (i.e. a higher-order context doesn't always mean better compression).
    At least, you have to tune the mixer parameters after changing its context,
    and ideally you have to retune the whole model, starting with the parameters
    of the mixed submodels (including the contexts used there).

    > Perhaps, if you have the time and nerves you can explain me that
    > thing called "StateMap".

    paq's StateMap is Matt's thing, so you can look at Matt's explanations
    (there are plenty - the description in the paq source, DCE, http://cs.fit.edu/~mmahoney/compression/stategen.cpp, etc).

    Alternatively, here's a version of ccm counter state table generator:

    Code:
    #include <string.h>             // for memset
    typedef unsigned char  byte;    // 8-bit state / probability type
    typedef unsigned short word;    // 16-bit frequency type

    byte freqmask[64*64];
    
    struct fsm {
      byte s[2]; // next state after bits 0,1
      byte p5;   // 5-bit prob quant
      byte p6;   // 6-bit prob quant
      word n1;   // bit==1 freq
      word n0;   // total freq (0+1)
    } cstate[256];
    
    int Init_FSM_States( int x=0, int y=0 ) {
      static int state = 0;
      if( state==0 ) memset( freqmask, 0xFF, sizeof(freqmask) );
      if( (x==57) || (y==57) ) x<=y ? y-- : x--;
      byte& r = freqmask[y*64+x];
      if( r==0xFF ) {
        r = state;
        int s = state++;
        cstate[s].n0 = y+x;
        cstate[s].n1 = y;
        cstate[s].p5 = (y*4+1)*32/(x*4+1 + y*4+1);
        cstate[s].p6 = (y*4+1)*64/(x*4+1 + y*4+1);
        cstate[s].s[0] = Init_FSM_States( x+1, (y+3)/4 );
        cstate[s].s[1] = Init_FSM_States( (x+3)/4, y+1 );
      }
      return r;
    }
    But it's ad hoc both in paq and ccm, and there's a better (afaik) approach.
    Suppose that we have a simple counter model with two 12-bit counters and static mixing:
    Code:
     unsigned r:12;
     unsigned q:12;
     [...]
     p = (r*5+q*3)>>3; // compute the probability
     rc.Encode( bit, p ); // encode bit
     // update the model
     if( bit==1 ) {
       r += (((1<<12)-r)>>5);
       q += (((1<<12)-q)>>4);
     } else {
       r -= (r>>5);
       q -= (q>>4);
     }
    Now, this thing already looks relatively complex, but it only has
    2^24 possible different states, so it's possible to implement it
    as a direct state machine:
    Code:
     unsigned state:24;
     [...]
     p = LUT[state].p; // compute the probability
     rc.Encode( bit, p ); // encode bit
     state = LUT[state].next[bit]; // update the model
    But such a plain version is huge, so it would make sense to look
    for similar states and merge them, to reduce the size of the state table.
    One possibility (on which afaik paq stategen is based) is to merge
    states with equal coding probabilities, which would already reduce
    the table size to 2^12.
    But overall, I don't think that some perfect universal solution
    exists for this task, and you don't have to do it in realtime anyway,
    so I think that bruteforce methods would be better -
    i.e. for all state pairs compute how much merging them
    hurts compression, then merge some (by the entropy metric)
    and repeat until you get under the state-count threshold.

    > That's what I understood:
    > You map contexts with similar bit-histories to a counter/state and
    > use that. But why is this better than just using the smaller history
    > as a context?

    I like to compare counters with lossily compressed codes of histories.
    In other words, counters actually store more (useful) information
    about the context history than the same number of most recent bits.
    Actually I experimented with this, and found that I needed 80 bits
    of history to simulate the counter in my coder (by allocating a new
    counter, then updating it with preserved history bits, then using its prediction).

    > The idea of using 64-bit low is good and I fitted the coder into my
    > classes. Gives a tiny improvment size/speed-wise

    Good. Btw, one very good rc-related speed optimization is to declare
    the rc state as a bunch of local variables in a function, instead of
    structure/class fields.
    You can see it here: http://ctxmodel.net/files/fpaq0pv4b3.rar
    The idea is like this: local vars can be mapped to cpu registers
    and separately preserved in random stack locations when necessary,
    without caring about object state consistency. But compilers don't
    bother to analyze structures in that sense and so allocate them
    in memory, with the fields in a specific order too. And when frequently-accessed
    values are stored in memory, it's really scary, because the compiler
    will store and reload the register values around any (non-"restrict")
    pointer accesses and (non-inlined) function calls.
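    To make the contrast concrete, a toy illustration of the two layouts (nothing fpaq-specific; the bit-coding arithmetic and renormalization are elided since only the state placement matters here):

    Code:
    #include <cstdint>

    // Field-based layout: every access to low/range may go through memory.
    struct RCState { uint64_t low = 0; uint32_t range = 0xFFFFFFFFu; };

    // Local-variable layout: hoist the coder state into locals for a whole
    // block, so it can live in registers; touch the struct only at the edges.
    void encodeBits(RCState& rc, const int* bits, const int* p0, int n) {
      uint64_t low   = rc.low;
      uint32_t range = rc.range;
      for (int i = 0; i < n; ++i) {
        uint32_t bound = (range >> 12) * uint32_t(p0[i]);   // p0[i] = P(bit==0), 12-bit
        if (bits[i] == 0) range = bound;
        else { low += bound; range -= bound; }
        // renormalization and byte output elided
      }
      rc.low = low;                  // write the state back once per block
      rc.range = range;
    }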

    But unfortunately C++ doesn't have local functions, so I'm still
    searching for a good solution for this - the one in fpaq above
    uses perl preprocessors to unroll template functions and convert them into macros,
    thus simulating the local functions, but its not easy to apply to complex coders.

    Another idea was to define a macro like "int& low, int& range, ..." and add it
    to parameters of all related functions, but the number of function arguments
    affects inlining heuristics, and compilers may not even see through it after
    enough nested calls.

    Do you have any workaround ideas for this?
    Actually the issue with scratch vars declared as class fields is imho very common.
    In some cases it can be solved by replacing iterators with coroutines, but that's
    a "heavy" method which is hardly applicable to arithmetic coding.

    >> Well, yes, because for now its the same closed-source as others

    > If you say: "give me the code" I'll upload it immediately.

    I can't say that I need it, and can't even promise to read it.
    But I think that overall it could be helpful to have it available -
    at least somebody would compile a faster executable for you.
    Anyway, people here won't troll you for the code quality, so
    why not attach the source, if you intend to open it at all?

    > I really have to go deeper into this stuff again to improve
    > something but I collected some simple ideas to improve it a tiny bit

    Afaik the common mistake in compression is when people implement
    complex high-level models (with MSE or at best approximated likelihood metrics),
    but don't care to spend time on actual entropy coding.
    For example, a few times I was asked to implement a postcoder for some data
    and given some delta-coded samples (they noticed that values are similar
    and decided to compress the deltas). So I had to guess that these were deltas
    (obviously nobody would tell me that right away) and sum them back to the actual values,
    to get better compression with context modelling of the actual values.

  23. #23 - m^2 (Member, Ślůnsk, PL)
    Quote Originally Posted by Shelwien View Post
    But unfortunately C++ doesn't have local functions, so I'm still
    searching for a good solution for this - the one in fpaq above
    uses perl preprocessors to unroll template functions and convert them into macros,
    thus simulating the local functions, but its not easy to apply to complex coders.
    C++ has local, callable objects which can be used to build local functions. Boost has a library to make it usable. And lambda expressions are a part of C++0x.

  24. #24 - Shelwien (Administrator, Kharkov, Ukraine)
    @m^2:
    I think you missed the point.
    There's a known optimization problem for which they even had to invent a keyword: http://en.wikipedia.org/wiki/Restrict
    And for the same reasons current compilers have problems with generation of fast code for access to structure fields -
    and it likely won't be improved anytime soon.
    C++0x lambda may be a solution, but C++0x itself doesn't seem to be reliable enough yet.
    As to that boost lambda, it's certainly cool, but it doesn't seem like a speed optimization at all.
    And as to local classes, their methods can't access local vars.

  25. #25 - Sebastian (Member, Germany)
    Thanks for all the input, especially Shelwien. I'm currently testing various things besides my little LZ codec.

  26. #26 - Member (L.E.)
    Any news? A new version?
