Thread: comparing file transfer speed with ccm and lzma compression

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts

    comparing file transfer speed with ccm and lzma compression

    Suppose that we have two codecs, LZMA with 50MB/s decoding speed,
    and CCM with 5MB/s decoding speed, and have to choose one of them
    for data transfer via a channel with 300KB/s speed.

    Also let's define some compression ratios, via enwik8:
    lzma: 0.24557177
    ccm: 0.22003958

    Now, suppose that decoding can be done in parallel with download, so
    block_decoding_time = max( blocksize/ratio/dec_speed, blocksize/down_speed )
    filetime = filesize*ratio/blocksize*block_decoding_time
    filetime = filesize*max( 1/dec_speed, ratio/down_speed )
    So for dec_speed>down_speed/ratio:
    filetime = filesize * ratio / down_speed

    Now let's compute the enwik9 transfer time:
    Code:
    In[1]:= filetime[filesize_, ratio_, decspeed_, downspeed_] := filesize*Max[1/decspeed, ratio/downspeed]
    In[2]:= filetime[1000000000, 0.24557177, 50000000, 300000]
    Out[2]= 818.573
    In[3]:= filetime[1000000000, 0.22003958,  5000000, 300000]
    Out[3]= 733.465
    Question: are LZ codecs really that good?
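    For reference, here's a C++ sketch of the same model, sweeping the channel speed to find the break-even point
    (the ratios and decoding speeds are just the numbers assumed above, so treat the result as illustrative only):
    Code:
    // Transfer-time model from above: filetime = filesize*max(1/dec_speed, ratio/down_speed),
    // swept over the channel speed to see where lzma overtakes ccm.
    #include <cstdio>
    #include <algorithm>

    double filetime(double filesize, double ratio, double dec_speed, double down_speed) {
        return filesize * std::max(1.0 / dec_speed, ratio / down_speed);
    }

    int main() {
        const double filesize = 1e9;                     // enwik9
        const double r_lzma = 0.24557177, d_lzma = 50e6; // ratio, decoding speed (bytes/s)
        const double r_ccm  = 0.22003958, d_ccm  = 5e6;
        for (double down = 100e3; down <= 20e6; down *= 2)
            printf("%8.0f KB/s: lzma %7.1f s, ccm %7.1f s\n", down / 1e3,
                   filetime(filesize, r_lzma, d_lzma, down),
                   filetime(filesize, r_ccm, d_ccm, down));
        // ccm wins below ~r_lzma*d_ccm = 1.23 MB/s; above that lzma's faster decoder pays off.
    }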

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    You missed the point of LZ.

  3. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Interesting... I did various estimations in the past based on the assumption that reading and decompression are not parallel - because that's how it usually works. I wonder why. I never thought about it, but many systems could be much improved this way, yet somehow it doesn't happen. Complication? A lot of it could be handled by the OS.
    The only cases where it works like this that come to my mind are multimedia streaming and various hardware like tape drives. It's a good place for improvement for *nix package managers.

    Though to be fair we should note that low CPU usage has some value too.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    @Piotr:

    Did I? Which compression algorithm would be commonly used for app updates, which are automatically downloaded over the net?

    @m^2:
    > yet somehow it doesn't happen

    For example, you can't exactly do that with the LZMA SDK. You'd have to process ~1MB blocks or so to get good speed with it;
    it's not designed to handle ~1500-byte packets. The same probably applies to zlib. In a sense, they're too speed-optimized,
    and there's no simple but flexible version which could be used in a different environment.
    Any PPM/CM, on the other hand, is inherently bytewise.

  5. #5
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I didn't mean that it doesn't happen with particular libraries. If there are no or few libraries capable of doing so, that's not the real cause but one of its effects. My first guess - that it's the high level of complication - still seems to me to be the most probable candidate.

  6. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    http://encode.ru/threads/140-Nosso-C...ion-technology..

    NOS Ltd could use something a little different.


    Anyway LZ has some good characteristics:
    1. When we use some preprocessors then LZMA isn't that far off from heavier formats.
    2. LZ has almost no memory overhead - when you download a 10 MiB file, then decompression only uses about 10 MiB RAM (plus maybe 1 MiB for some light statistics). CCM uses several MiBs even for small files.
    3. LZ uses far less CPU time, but as you're assuming a 300 kiB/s transfer it shouldn't be a real advantage here.

    You're assuming a rather fast CPU; imagine a user with a crappy netbook. CCM would be awfully slow on such a thing. Also, network speeds are getting better - many people are able to download at a few MiB/s.

  7. #7
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    In the original post you didn't really give a scenario but m^2 points to package managers.

    Most package manager systems for *nix don't even use delta updates. The inertia in the packaging world is depressing.

    They typically download then unpack, as that's the way they are built.

    Whilst they could unpack during download, an alternative direction they could be developed in is to use bittorrent or similar, where parallel download and unpack is not so easy.

    And also, adopting a heavyweight compressor would increase the packaging times, and the people doing the packaging are impatient volunteers (and not big 'compile farms' as you might hope).

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    @m^2:

    > it's not the real cause but one of it's effects.

    To be specific, it usually works like this

    - LZ is faster (though lzma compression is actually slower than many PPM/CM)
    - to make it even faster, it needs speed optimizations
    - speed optimizations imply working with large blocks (it reduces the API overhead)
    - parallel processing with large blocks is harder to set up when you receive
    the data in small packets (you need a large cache); also, thanks to the speed optimizations,
    it's impossible to understand what the codec actually does and modify it for different
    circumstances.

    > My first guess that it's high level of complication still seems to
    > me to be the most probable candidate.

    Not quite sure what you mean, but modern CMs are much simpler than LZs
    (which is only natural, because any LZ is actually LZ transform + CM).
    Also, decompression during download is really simple; you don't even need
    threading there, because tcp is already implemented in OS threads anyway.
    So it's just a matter of immediately passing each packet to the decoder after
    receiving it - for CM there won't be any speed impact at all, and for lzma
    there would be a considerable overhead due to multiple wrappers and
    buffer scraping, but it would still work.
    I think it's more a matter of high-level network APIs and such, and also of
    app modularity (you first implement file download, then add file decompression).
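    A rough sketch of what "immediately passing each packet to the decoder" looks like; the Decoder here is just a stand-in
    that copies bytes through (a bytewise CM/PPM maps onto such an interface naturally, lzma would need internal buffering),
    so don't read it as any real codec's API:
    Code:
    // Sketch only: decode each received packet as it arrives, instead of
    // downloading the whole file first and decompressing afterwards.
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstdio>

    struct Decoder {                                      // stand-in for a bytewise CM/PPM decoder
        FILE* out;
        int feed(const unsigned char* buf, size_t len) {  // returns 0 on success
            return fwrite(buf, 1, len, out) == len ? 0 : -1;
        }
    };

    int download_and_decode(int sock, FILE* out) {
        Decoder dec{out};
        unsigned char pkt[1500];                          // ~MTU-sized packets
        for (;;) {
            ssize_t n = recv(sock, pkt, sizeof(pkt), 0);
            if (n == 0) return 0;                         // connection closed: decoding already finished
            if (n < 0) return -1;                         // socket error
            if (dec.feed(pkt, (size_t)n) != 0) return -2; // decoder error
        }
    }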

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    @Piotr Tarsa:
    > NOS Ltd could use something a little different.

    Do you believe that they really do any compression research?
    We may be unable to losslessly compress a picture decoded from jpeg
    better than jpeg does, but does that mean jpeg is so great?

    > 1. When we use some preprocessors then LZMA isn't that far off from heavier formats.

    LZMA can't compress structured data well enough.
    So for preprocessors like mp3dump (or similar jpeg utils) it can't be used as postcoder.
    Also preprocessors that actually do improve lzma's compression usually help CMs much better.

    > 2. LZ has almost no memory overhead - when you download a 10 MiB
    > file, then decompression only uses about 10 MiB RAM (plus maybe 1
    > MiB for some light statistics). CCM uses several MiBs even for small files.

    Actually lzma's maximum model size is 6.3M.
    That's likely enough for ppmd to compress text better.

    > 3. LZ uses far less CPU time, but as you're assuming a 300 kiB/s
    > transfer it shouldn't be a real advantage here.

    That's the point. Somehow 100MB/s codecs are the most popular, while
    it's not strange to see 3MB/s hdd reads on a notebook
    (or on any machine when another task actively accesses files on the hdd).

    > You're assuming a rather fast CPU; imagine a user with a crappy
    > netbook. CCM would be awfully slow on such a thing.

    Not really, it doesn't scale linearly.
    Basically, RAM access speed hasn't changed that much, and
    that's the bottleneck for a hashtable-based CM.
    So I'd expect CM to be relatively faster on slower machines.

    > Also network speeds are getting better; many people are able to
    > download at a few MiB/s.

    Yes, I can do that too - from torrents or microsoft.com.
    Did you see that - http://encode.ru/threads/43-FreeArc?...ll=1#post23651 ?
    Actually in many cases I'm getting capped at 30KB/s, like when downloading
    from ctxmodel.net (you can try with http://www.ctxmodel.net/sh_samples_1.rar)

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    @willvarfar:
    > In the original post you didn't really give a scenario but m^2 points to package managers.

    My scenario is an app update, like what chrome does.
    Actually I already implemented it before - and with a separate lzma decoding pass too,
    because I only had the original decoder then and didn't want to debug the mess.

    > Most package manager systems for *nix don't even use delta updates.
    > The inertia in the packaging world is depressing.

    Yeah, that's actually what I implied when talking about ELF recompression
    vs windows exe recompression in the other thread.

    > Whilst they could unpack during download, an alternative direction
    > they could be developed in is to use bittorrent or similar, where
    > parallel download and unpack is not so easy.

    Blocks are large enough there though.

    > And also, adopting a heavyweight compressor would increase the
    packaging times, and the people doing the packaging are impatient
    > volunteers (and not big 'compile farms' as you might hope)

    lzma compression can be slower than ccm and ppmd.

  11. #11
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by willvarfar View Post
    Whilst they could unpack during download, an alternative direction they could be developed in is to use bittorrent or similar, where parallel download and unpack is not so easy.
    I've been thinking about a P2P package manager too.
    And in BT the client can request whichever chunks it wants. There's BT TV, so it works.

    Quote Originally Posted by willvarfar View Post
    And also, adopting a heavyweight compressor would increase the packaging times, and the people doing the packaging are impatient volunteers (and not big 'compile farms' as you might hope)
    Well, actually it favours symmetric compressors - they can give better strength with the same compression time.
    Quote Originally Posted by Shelwien View Post
    @m^2:

    > it's not the real cause but one of it's effects.

    To be specific, it usually works like this

    - LZ is faster (though lzma compression is actually slower than many PPM/CM)
    - to make it even faster, it needs speed optimizations
    - speed optimizations imply working with large blocks (it reduces the API overhead)
    - parallel processing with large blocks is harder to set up when you receive
    the data in small packets (you need a large cache); also, thanks to the speed optimizations,
    it's impossible to understand what the codec actually does and modify it for different
    circumstances.

    > My first guess that it's high level of complication still seems to
    > me to be the most probable candidate.

    Not quite sure what you mean, but modern CMs are much simpler than LZs
    (which is only natural, because any LZ is actually LZ transform + CM).
    Also decompression during download is really simple, you don't even need
    threading there, because tcp is already implemented in OS threads anyway.
    So it's just a matter of immediately passing each packet to the decoder after
    receiving it - for CM there won't be any speed impact at all, and for lzma
    there would be a considerable overhead due to multiple wrappers and
    buffer scraping, but it would still work.
    I think it's more a matter of high-level network APIs and such, and also of
    app modularity (you first implement file download, then add file decompression).
    Actually most of my thoughts went to something more generic - processing data that's not fully available; kind of extending memory-mapped files from being backed by files to being backed by anything, including a network stream (or a torrent). Allowing things like starting not-fully-downloaded exes (and stopping them when they try to access not-yet-downloaded parts - just like an MMF stops a program until it can read the accessed page from disk). Such things could significantly improve performance in many scenarios.

  12. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Another big problem with CM or PPM is that those algorithms don't have a popular, stable implementation. LZMA's format has been fixed for years; zlib or bzip2 are even more stable. If you compare a 15-year-old PPM to a 15-year-old LZ, it's not that easy for the PPM.

    Current PPMs or CMs aren't very suitable for archiving precious data; for example, Shkarin's PPMII variations had some bugs that could cause loss of data. On the other hand, LZ formats are very easy to understand and thus very easy to debug.


    Also I don't understand the API overhead for LZ thing. CMs usually use big data structures and initializing them can take a lot of time. On the other hand, decoding a small packet encoded with zlib incurs only a very small overhead for reconstructing the Huffman codes.
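    For what it's worth, zlib's streaming interface already accepts arbitrarily small input chunks (say, one network
    packet at a time), keeping its state in the z_stream between calls - a minimal sketch:
    Code:
    // Feed zlib one small chunk at a time; inflate() keeps its state in zs between calls.
    #include <zlib.h>
    #include <cstdio>

    int inflate_chunk(z_stream* zs, unsigned char* in, unsigned in_len, FILE* out) {
        unsigned char buf[1 << 16];
        zs->next_in  = in;
        zs->avail_in = in_len;
        do {
            zs->next_out  = buf;
            zs->avail_out = sizeof(buf);
            int ret = inflate(zs, Z_NO_FLUSH);             // decode whatever is decodable now
            fwrite(buf, 1, sizeof(buf) - zs->avail_out, out);
            if (ret == Z_STREAM_END) return Z_STREAM_END;  // whole stream decoded
            if (ret == Z_BUF_ERROR)  return Z_OK;          // needs more input: wait for the next packet
            if (ret != Z_OK)         return ret;           // real error (corrupt data etc.)
        } while (zs->avail_in > 0);
        return Z_OK;
    }
    // Usage: zero a z_stream, inflateInit() it, then call inflate_chunk() once per
    // received packet; inflateEnd() when done.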


    LZMA can't compress structured data well enough.
    So for preprocessors like mp3dump (or similar jpeg utils) it can't be used as postcoder.
    Also preprocessors that actually do improve lzma's compression usually help CMs much better.
    Once I made wav and bmp preprocessors for LZMA; they were very small and simple, yet they successfully competed with RAR's. With mp3s you won't achieve much additional compression anyway - 10% savings doesn't impress a lot of people.

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Another big problem with CM or PPM is that those algorithms don't
    > have a popular, stable implementation.

    It's not a problem of the algorithms.

    > LZMA's format is fixed for years,

    In that sense, ppmd vH is likely older,
    and the rar format hasn't changed since 2.9, so it's likely stable too.

    > Current PPMs or CMs aren't very suitable for archiving of precious
    > data, for example Shkarin's PPMII variations had some bugs that
    > could cause a loss of data.

    PPMd is 10 years old now - the vH date is "Apr 21, 2001".
    Yes, it's really pretty complicated, but that's mostly due to speed
    optimizations.
    Paq-like CMs are much simpler.

    > On the other hand, LZ formats are very easy to understand thus
    > very easy to debug.

    I don't see how this applies to deflate or lzma.
    Any LZ is more complex by definition - it has to encode at least 4 types
    of symbols (id,literal,length,distance), though actually in lzma there're _15_
    types (including masked/unmasked literals, length intervals, and distance subdivision).
    Also, sometimes there're block headers which require additional coding, and
    it's normal to have separate implementations of the encoder and decoder.
    Compare that to <10k of ccm source (well, ccm_sh, w/o filters), which includes
    both encoding and decoding.
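    Just to illustrate those symbol types, here's the generic shape of an LZ77 decoding loop (not lzma;
    the Coder is a made-up stand-in for whatever entropy coder sits underneath):
    Code:
    // Generic LZ77 decoding loop - every LZ decoder has at least these branch points.
    #include <vector>
    #include <cstdint>
    #include <cstddef>

    template <class Coder>                                // supplies the 4 symbol types
    std::vector<uint8_t> lz_decode(Coder& c, size_t out_size) {
        std::vector<uint8_t> out;
        out.reserve(out_size);
        while (out.size() < out_size) {
            if (c.get_match_flag() == 0) {                // type 1: literal/match id
                out.push_back(c.get_literal());           // type 2: literal byte
            } else {
                unsigned len  = c.get_length();           // type 3: match length
                unsigned dist = c.get_distance();         // type 4: match distance
                for (unsigned i = 0; i < len; i++)        // copy from the decoded window
                    out.push_back(out[out.size() - dist]);
            }
        }
        return out;                                       // compare with the single CM bit loop below
    }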

    > Also I don't understand the API overhead for LZ thing.

    I meant specifically lzma. There're a few layers of wrapper
    functions over the one that actually does decoding, for buffer management and such.
    Also there's a tricky optimization: it runs the main decoder
    until ~20 bytes are left in the buffer, then switches to the "debug" decoder
    (which has bounds checks on all memory operations) and scrapes
    the remaining LZ records. Imho that thing is not exactly fast...
    I'd expect lzma decoding to become at least 2x slower with small buffers.

    > CMs usually use big data structures and initializing them can take
    > a lot of time.

    Yes, though that can be done before the actual download (e.g. while connecting).
    Also, as I said, lzma has a 6M model, which is comparable to a CM/PPM.

    In contrast, the CM coder structure is very simple. Potentially there're no
    branches at all - just a bit-processing loop.
    Thus with CM it's very easy to return an error when there're no more input bytes,
    and resume when they appear.
    LZ, on the other hand, has a lot of branches and symbol types, so it's relatively hard to
    handle end-of-buffer correctly -
    which could be one of the main reasons for the blockwise structure of LZ-compressed files.
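    To make that concrete, here's a minimal sketch of such a resumable bit loop (not ccm or any particular codec -
    the probability update rule and constants are just illustration). All the state sits in a few integers, so when
    the input runs dry it can simply report that and be called again later:
    Code:
    // Bitwise range decoder that returns -1 ("need more input") instead of blocking,
    // and can be resumed once more packets have arrived. Nothing is modified before
    // the availability check, so bailing out never corrupts the state.
    #include <cstdint>
    #include <cstddef>

    struct BitDecoder {
        uint32_t range = 0xFFFFFFFFu, code = 0;
        const uint8_t* in = nullptr;                // current input window
        size_t pos = 0, len = 0;
        bool started = false;

        // Caller prepends any (len - pos) bytes left unconsumed from the previous buffer.
        void refill(const uint8_t* buf, size_t n) { in = buf; pos = 0; len = n; }

        // p = P(bit==0) scaled to [1..4095]; returns 0/1, or -1 = call again after refill().
        int decode_bit(uint16_t& p) {
            if (!started) {                         // preamble: 4 initial code bytes
                if (len - pos < 4) return -1;
                for (int i = 0; i < 4; i++) code = (code << 8) | in[pos++];
                started = true;
            }
            if (len - pos < 2) return -1;           // renormalization below pulls at most 2 bytes
            uint32_t bound = (range >> 12) * p;
            int bit;
            if (code < bound) { range  = bound;                p += (4096 - p) >> 5; bit = 0; }
            else              { code  -= bound; range -= bound; p -= p >> 5;         bit = 1; }
            while (range < (1u << 24)) {            // renormalize, pulling input bytes
                range <<= 8;
                code = (code << 8) | in[pos++];
            }
            return bit;
        }
    };
    The caller's loop is then just "feed a packet, call decode_bit() until it returns -1, repeat" - exactly the
    return-an-error-and-resume behaviour described above, with no special end-of-buffer cases inside the model.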

    > On the other hand decoding a small packet encoded with zlib
    > incurs only a very small overhead for reconstructing Huffman codes.

    The only benefit of LZ coders is speed
    (otherwise they're more complex and their compression is worse).
    But if we try to immediately decode each small packet of received data,
    LZ becomes much slower because of lots of additional checks.
    And if we cache the input instead and only decode larger blocks
    (which is how it's normally done), we still have a delay until the first
    buffer is filled.
    Either way, CM's advantage on a (relatively) slow channel would
    probably be even bigger than I calculated.

    > Once I made wav and bmp preprocessors for LZMA, they were very
    > small and simple, yet they successfully competed with RAR's ones.

    Rar is a bad target in that sense - almost anything is better.
    Why don't you instead compare to paq or mma or nz, which actually have CM models?

    > With mp3's you won't achieve much additional compression anyway.
    > 10 % savings doesn't impress a lot of people.

    For example, I just took a random song:
    Code:
     mp3 11165961
     7z   8997095
     rar  8959604
     zip  8942098
     mpz  8013256
    And it seems you're right - compared to rar/7z it's about a 10% gain.
    But imho it's still noticeable, at least when looking at the file size
    instead of the percentage.

    Anyway, are you implying that we should close the forum and go get a life? :)
    Life is overrated :)

  14. #14
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    Quote Originally Posted by Shelwien View Post
    @willvarfar:
    > In the original post you didn't really give a scenario but m^2 points to package managers.

    My scenario is app update, like what chrome does.
    So courgette-style disassembly and bsdiff?

    I've often wished there was a 'structured data' lib that knew how to divide up lots of regular formats for compressors to use. That'd be where exe transforms and other things belong.

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    >> My scenario is app update, like what chrome does.

    > So courgette-style disassembly and bsdiff?

    No, as you can see in http://encode.ru/threads/582-Executa...ration-methods
    in my case the available disasm filters (courgette, dispack, disasm32) did worse than
    a modified E8.

    > I've often wished there was a 'structured data' lib that knew how to
    > divide up lots of regular formats for compressors to use.

    For uncompressed table-of-struct formats (like mp3dump output) it's somewhat possible,
    although never perfect - for example, there're a few fields in the mp3 header which define
    the huffman tables used for spectral coef storage. Usually it's not necessary to actually
    encode these, because their values are the ones that provide the best coef compression.
    But it's clearly impossible to find that kind of dependency w/o additional information.
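    For the uncompressed table-of-struct case, the most basic transform is just splitting the records into per-field
    streams so the backend coder sees homogeneous data - a sketch (the fixed record size is an assumption here; real
    formats like mp3dump output need per-field models on top of this):
    Code:
    // Split n fixed-size records into one stream per byte position, so each stream
    // holds the same field across all records (compress the streams separately).
    #include <vector>
    #include <cstdint>
    #include <cstddef>

    std::vector<std::vector<uint8_t>> split_fields(const uint8_t* data, size_t n, size_t rec_size) {
        std::vector<std::vector<uint8_t>> cols(rec_size);
        for (size_t f = 0; f < rec_size; f++) {
            cols[f].reserve(n);
            for (size_t r = 0; r < n; r++)
                cols[f].push_back(data[r * rec_size + f]);
        }
        return cols;
    }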

    Then there're more difficult structures - ones that contain pointers (or sizes of
    variable-size records). Imho it's impossible to automatically analyze something like
    the COFF structure.

    And then there're compressed structures, where automated analysis is the same
    as automatically breaking unknown encryption.

    > That'd be where exe transforms and other things belong.

    Not really. As we can see from the dispack tests, it's not enough to transform the code
    into some uniform structure. It's also important to ensure pattern matching
    (like for copies of the same function inlined with different registers), efficient
    coding of various data types ("universal" compressors only support byte strings,
    but there're integers, pointers and floats in the code), etc.
    So imho a really good exe compressor would not be so different from a decompiler,
    which is clearly different from generic structure analysis.

  16. #16
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    Yes, the 'structured data' lib I wished for was one where a human had gone through and written the spec for commonly encountered mainstream file formats.

  17. #17
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Shelwien:
    Huh, I gave you possible reasons why LZ is the most popular algorithm. You're best at complaining about popular formats, and at the same time you're very experienced, yet you didn't make a complete compression suite - or did you?

    Take a look at FreeArc. It's increasingly popular - warez releases are often compressed with FreeArc, Precomp, etc. You can improve it freely, and it does use BWT, PPM and filters.


    If you study what the average user downloads, then it's mostly:
    - textual files from the Internet: HTML, JS, CSS, whatever. They're mostly gzipped. You can win a lot of savings here almost for free - just use, for example, PPMS.
    - mp3, mp4, flv, wmv, avi, whatever. Compressing them gives at most a 10-15% gain and isn't that easy for the end user. Those formats are very fast to seek; for example, if you want to go to moment 1:56 of a video you can go there instantly. If you recompress the file with a strong entropy coder, you can lose the ability to seek quickly. Also, you would then have to write plugins for various systems and additionally distribute the data with standard compression methods so people can use hardware decoding.
    - installers - usually compressed with LZMA. Due to the very different contents of various programs, a single CM algorithm probably won't suffice or won't be fast enough.
    - linux packages - Ubuntu ones are packed with gzip. Considering that there are a few megabytes of updates daily for each user, Canonical would save terabytes of bandwidth daily. But maybe there are complex toolchains that aren't very flexible and cannot accept any other compression format.


    BTW, work on PAQ has almost stopped lately. Maybe you, Shelwien, will continue it?
    Insert some of your counter magic and maybe there will be a 0.5% gain in compression.

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > the spec for commonly encountered mainstream file formats.

    I wonder how much it would cost to do that officially - http://www.iso.org/iso/iso_catalogue...csnumber=53943

    It could be good to at least get a usable binary parser generator for writing these specs.
    Unfortunately I still haven't gotten anywhere near the point of "convergence" - the parsers I've written so far don't have much in common,
    otherwise I could at least try to generalize them somehow.

    Also, it'd be really helpful to get a working tool for C++ refactoring, which could do stuff like merging a class method definition with its
    declaration, discarding forward declarations, wrapping global vars/functions into a class, etc.

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > Huh, I gave you possible reasons why LZ is the most popular algorithm.

    Yes, but of these only the speed reason is really true - and I explained why.
    Also, in the first post I tried to show how it can be misleading to compare
    CM vs LZ speed out of context.

    > You're best at complaining about popular formats and at the same
    > time you're very experienced yet you didn't make a complete
    > compression suite - or did you?

    Unfortunately my previous jobs didn't have much to do with general-purpose compression,
    and overall I'm not very productive.
    Also there're no people willing to discuss technical details,
    and that's bad for my motivation.
    Anyway, maybe I'll make some progress this year, because now I'm at least getting paid for that.

    > Take a look at FreeArc. It's increasingly popular, warez releases
    > are often compressed with FreeArc, Precomp etc You can improve it
    > freely and it does use BWT, PPM and filters.

    Sure, it's possible. It's mostly a question of Bulat's communication skills.

    Personally I'm not interested in any new archivers, especially if they're
    less convenient to use than rar.

    > If you study what average user downloads then it's mostly:
    > - textual files from Internet: HTML, JS, CSS, whatever.

    I have different stats, where the 3 most popular formats processed by an archiver are jpg, mp3, pdf.

    > They're mostly gzipped.

    Do you mean http encoding? Unfortunately that hardly means anything,
    because at best we can make a proxy service with good compression.

    > You can win a lot of saving here almost freely - just use for example PPMS.

    1. The main problem with ppms is that it's not a library, and it's hard to refactor into one.
    I already did that with ppmd though.
    2. In such cases it makes sense to find the best solution first, because it's hard to change formats.
    So at least html preprocessing etc. has to be considered.
    3. It's boring to use available libraries.

    > - mp3, mp4, flv, wmv, avi, whatever. Compressing them gives at most
    > 10 - 15 % gain and isn't that easy for end user.

    There's no difference whether I compress the file with lzma or mp3zip.
    In fact lzma has many more options.

    Also I still don't understand whether you mean absolute gain, or
    relative to deflate/lzma.
    Anyway, there's an important difference that a recompression
    format developed for some filetype can provide support for
    common usage of that filetype, while general-purpose formats
    don't have that option.

    > Those formats are very fast to seek, for example if you want to go
    > to moment 1:56 of video you can go there instantly. If you
    > recompress the file with strong entropy coder then you can lose the
    > ability for fast seeking.

    It's actually possible to provide seek support while still keeping
    most of the compression improvement. We did make that mistake with
    the soundslimmer format though, because the task spec said that it
    was for archiving.
    The later solution was to use blockwise compression (with ~10s blocks),
    and it seems to work ok now - in a different codec though.
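    Roughly what the blockwise scheme looks like (a sketch, not the actual container format): keep an index of compressed
    block offsets, so a seek to time T only needs one ~10s block decoded from its start:
    Code:
    // Seek index for blockwise-compressed audio: block i covers [i*block_seconds, (i+1)*block_seconds)
    // and its compressed bytes are offsets[i] .. offsets[i+1].
    #include <vector>
    #include <cstdint>
    #include <cstddef>

    struct SeekIndex {
        double block_seconds;                        // e.g. 10.0
        std::vector<uint64_t> offsets;               // per-block compressed offsets, plus end offset

        // Find the block holding time t and the compressed byte range to decode.
        bool locate(double t, size_t& block, uint64_t& begin, uint64_t& end) const {
            if (t < 0 || offsets.size() < 2) return false;
            block = (size_t)(t / block_seconds);
            if (block + 1 >= offsets.size()) return false;   // past the end of the stream
            begin = offsets[block];
            end   = offsets[block + 1];
            return true;                             // decode [begin, end), then skip within the block
        }
    };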

    > Also you would have then to write plugins for various systems and
    > distribute also data with standard compression methods so people can
    > use hardware decoding.

    Plugins are ok - windows players are mostly covered by a single directshow filter.
    Hardware decoding is hard though. I did write an iphone player for my codec,
    but I still don't know the real constraints for hardware support.

    > - installers - usually compressed with LZMA. Due to very different
    > contents of various programs a single CM algorithm probably won't
    > suffice or won't be fast enough.

    Currently their compression can be improved with a single lzmarec though.
    Anyway, it's usually much easier to compete with LZ on structured data,
    because CM doesn't really lose in speed there.

    > - linux packages - Ubuntu ones are packed with gzip. Considering
    > that there's a few megabytes of updates daily for each user then
    > Canonical would save terabytes of bandwidth daily. But maybe there
    > are complex toolchains that aren't very flexible and cannot accept
    > any other compression format.

    It's probably not so complex to implement, but there'd be a lot of
    testing to do, and still a lot of issues to fix.
    At that scale, it's not a matter of programming or the algorithm's properties at all.

    > BTW, work on PAQ almost stopped lately. Maybe you, Shelwien, will continue it?

    I won't, because paq is what killed most CM/PPM-related research.
    It's good, but it's a dead end.
    Although I certainly could improve it, at least by replacing the rc,
    optimizing the fsm and some parameters, improving precision at some points,
    tweaking contexts, etc.
    But how is that interesting?

    > Insert some of your counters magic and maybe there will be 0.5% gain in compression.

    And who needs that?

    For now I have to write this stronger-than-lzma LZ77 codec, based on the lzmarec backend;
    also there's an idea for a new fast CM (or PPM?) based on
    http://encode.ru/threads/1173-New-da...for-bitwise-CM

  20. #20
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    It's worth adding a bit of detail about linux package managers and delta compression:

    Red Hat has been doing delta RPMs for years, and Fedora has been using them.

    Debian has a delta .deb format, and there have been enthusiasts running delta repos for years.

    But Ubuntu doesn't use them, even though delta updates keep being repeatedly claimed as the biggest 'good idea' in the history of it all.

    Ubuntu will probably start doing delta .debs about the same time as others start with P2P hybrid downloads....

    (These delta formats are workable rather than squeezing the last drop out of redundancy. Plenty of room for more innovation in patching and partial updates.)

