Thread: Framework for a basic command-line compressor

  1. #1
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts

    Framework for a basic command-line compressor

    I've been thinking for a while that it would be useful to have a framework for a basic command-line compressor. This would both reduce the effort needed to write a new one and help improve overall quality by providing high-quality standard implementations of repetitive stuff, like I/O and command-line argument handling. The only framework I'm intimately familiar with is Drupal (web CMS), which is in PHP. I don't know if there are examples of frameworks that target native code, but it doesn't seem hard to create a system for plugging custom code into a C project.

  2. #2
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Depending on how much you want the "framework" to handle, you're basically describing either Squash (compression-specific, but handles everything except the core compression stuff), or glib and gio (general-purpose, so although they provide tons of helpers you'll still need to write more code, though it abstracts platform-specific bits away pretty well).

    For Squash, there is a section in the plugin guide explaining what you get from writing a plugin.

  3. #3
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by nemequ View Post
    Depending on how much you want the "framework" to handle, you're basically describing either Squash (compression-specific, but handles everything except the core compression stuff), or glib and gio (general-purpose, so although they provide tons of helpers you'll still need to write more code, though it abstracts platform-specific bits away pretty well).

    For Squash, there is a section in the plugin guide explaining what you get from writing a plugin.
    I'll have to take a look at Squash. My desire for a framework came from the observation that gzip has a perfectly good command line interface, and it would be nice if all compressors followed the conventions in that interface wherever possible.

    There could also be default implementations of things like standard models and entropy coders and you could replace one single piece of the chain and have a working compressor.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    I've moved this into a separate topic, since it is a large topic by itself. I'm also working on a "standard compression API" topic; it will be created today.

    That said, freearc was created back in 2004 with almost exactly this goal in mind. Moreover, my idea was to go beyond a "simple compressor" and provide an entire archiver, with commands/options compatible with RAR, so that anyone could drop in new methods they developed and get a complete solution. But this led to the development of a large archiver, which I now plan to sell rather than share.

    Later, I started fazip as a simple single-file compressor that can use any of my compression methods. So you can fetch the fazip sources, inherit from the COMPRESSION_METHOD class and implement your own method. For example, just a few days ago I added lz5 support by copying the lz4 sources and modifying them here and there. Unfortunately, fazip doesn't support the variety of gzip options, and it is not really portable outside of the POSIX/Windows world.

    The Squash command-line tool is very similar to fazip: it has limited options, but provides access to dozens of Squash methods, and you just need to implement the Squash API in order to be included in this utility. Unfortunately, it's Unix-only and doesn't seem to support method parameters on the command line (such as lzma:d1g:fb128 in fazip).
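
    To make the parameter-string style concrete, here is a minimal Python sketch of a parser for specs shaped like lzma:d1g:fb128. It is not fazip's actual parser: the grammar (colon-separated fields, a letter prefix followed by a number, and k/m/g suffixes read as binary multiples) is an assumption inferred from that single example, and parse_method is a hypothetical name.

```python
import re

# Binary size suffixes. Reading "g" as GiB is an assumption of this sketch.
SIZE_SUFFIXES = {"": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30}

# One parameter: alphabetic name, decimal number, optional size suffix.
PARAM_RE = re.compile(r"([a-z]+)(\d+)([kmg]?)")


def parse_method(spec):
    """Split a spec such as 'lzma:d1g:fb128' into (method, {param: value})."""
    name, *raw_params = spec.split(":")
    if not name:
        raise ValueError("missing method name in %r" % spec)
    params = {}
    for raw in raw_params:
        match = PARAM_RE.fullmatch(raw.lower())
        if match is None:
            raise ValueError("cannot parse parameter %r" % raw)
        key, digits, suffix = match.groups()
        params[key] = int(digits) * SIZE_SUFFIXES[suffix]
    return name, params
```

    Under these assumptions, parse_method("lzma:d1g:fb128") returns ("lzma", {"d": 1073741824, "fb": 128}). Parameters with non-numeric values are deliberately rejected to keep the sketch short.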



    So, I think there are two parts. 1) For the command line itself, we can adopt the gzip sources. Moreover, anyone who needs a framework rather than a compiled tool can already use the gzip sources in this role.

    The selling point, though, is that we need 2) a standard API for compression libraries, so that compressors implementing this API can be used with our framework automatically. I will make a separate topic to discuss the APIs themselves, so here we can discuss choosing an API for this project, or even implementing support for multiple APIs.

    I think this is how the project should be implemented: use gzip as a starting point and modify it to work with compression libraries through some standard API.

    The same may be done for a "standard archiver", i.e. use the zip/shar sources and add support for some compression APIs.
    Last edited by Bulat Ziganshin; 14th February 2017 at 12:50.

  6. #5
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    The Squash command-line tool is very similar to fazip: it has limited options, but provides access to dozens of Squash methods, and you just need to implement the Squash API in order to be included in this utility. Unfortunately, it's Unix-only
    No, it isn't. 0.7 didn't work on Windows, but 0.8 (unreleased, but quite usable) does. It also works on Linux, OS X, BSD, Solaris, and probably others.

    Quote Originally Posted by Bulat Ziganshin View Post
    and doesn't seem to support method parameters on the command line (such as lzma:d1g:fb128 in fazip).
    Yes, they are supported. Plugins can support whatever options they want; I'm not sure what the ones you provided above mean, but the currently supported options for lzma are level, dict-size, lc, lp, and pb on both the encoder and decoder, plus check on the encoder and mem-limit on the decoder. See https://quixdb.github.io/squash/api/c/md_plugins_lzma_lzma.html

    For example:

    Code:
    squash -k -o level=3 -o dict-size=8388608 foo foo.lzma
    There is actually a fairly expressive API which plugins can implement to accept different parameters and types. For example, LZMA's is in plugins/lzma/squash-lzma.c. There are several different potential types (dict-size is a size, level is an int, strings are also an option), you can require sizes and ints be a multiple of some value, fall within a specific range, allow zero (if it's outside the range), set the default value, etc. It's not perfect, but it takes a lot of the parsing load off of plugins.
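
    As a rough illustration of what such declarative option checking can look like, here is a minimal Python sketch. It is not Squash's API (that is C, in the file named above); OptionSpec, resolve and all limits and defaults below are hypothetical, chosen only to exercise the features listed: a type, a range, a required multiple, an allowed zero outside the range, and a default.

```python
from dataclasses import dataclass

SIZE_SUFFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical description of one numeric codec option."""
    name: str
    kind: str                 # "int" or "size"
    default: int
    minimum: int
    maximum: int
    multiple_of: int = 1
    allow_zero: bool = False  # accept 0 even when it lies outside the range


def parse_number(text, kind):
    """Parse a decimal integer; "size" values may carry a k/m/g suffix."""
    text = text.strip().lower()
    if kind == "size" and text[-1:] in SIZE_SUFFIXES:
        return int(text[:-1]) * SIZE_SUFFIXES[text[-1]]
    return int(text)


def resolve(spec, text=None):
    """Return the validated value of one option, or its default if unset."""
    if text is None:
        return spec.default
    value = parse_number(text, spec.kind)
    if value == 0 and spec.allow_zero:
        return 0
    if not spec.minimum <= value <= spec.maximum:
        raise ValueError("%s=%d is outside [%d, %d]"
                         % (spec.name, value, spec.minimum, spec.maximum))
    if value % spec.multiple_of != 0:
        raise ValueError("%s=%d is not a multiple of %d"
                         % (spec.name, value, spec.multiple_of))
    return value


# Illustrative specs only; these are not the real limits of any plugin.
LEVEL = OptionSpec("level", "int", default=6, minimum=1, maximum=9)
DICT_SIZE = OptionSpec("dict-size", "size", default=1 << 23,
                       minimum=4096, maximum=1 << 30, multiple_of=4096)
MEM_LIMIT = OptionSpec("mem-limit", "size", default=0,
                       minimum=1 << 20, maximum=1 << 40, allow_zero=True)
```

    With specs like these, the plugin author only declares the constraints, and the shared code rejects a value such as level=12 before the codec ever sees it.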

    The selling point, though, is that we need 2) a standard API for compression libraries, so that compressors implementing this API can be used with our framework automatically. I will make a separate topic to discuss the APIs themselves, so here we can discuss choosing an API for this project, or even implementing support for multiple APIs
    https://github.com/quixdb/squash/blob/master/squash/squash-codec.h#L63

    I do have one more planned addition to that, but for the most part that should be pretty stable.

    The difficult part is adding all the stuff on top to present that to the user nicely. You might want to take a look at https://github.com/quixdb/squash/blo...s/internals.md

  7. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Bug report: the lzma plugin code doesn't use SQUASH_LZMA_OPT_DICT_SIZE.

    I'm not sure what the ones you provided above mean
    These are the parameter names supported by the 7-zip LZMA method itself: https://sevenzip.osdn.jp/chm/cmdline...ethod.htm#LZMA Maybe these options aren't exposed by the xz code you are using?

    So, Squash now supports Windows, and I believe this by itself deserves a new release. You have a complex parameter-parsing API plus a converter between buffer/stream/ReadWrites-style APIs. This makes Squash the best candidate for the framework nburns wanted.

  9. #7
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by nburns View Post
    There could also be default implementations of things like standard models and entropy coders and you could replace one single piece of the chain and have a working compressor.
    You might want to take a look at issue #165, which describes something similar.

    Quote Originally Posted by Bulat Ziganshin View Post
    Bug report: the lzma plugin code doesn't use SQUASH_LZMA_OPT_DICT_SIZE.
    Oops. Fixed, thanks. I also added an "mf" parameter.

    These are the parameter names supported by the 7-zip LZMA method itself: https://sevenzip.osdn.jp/chm/cmdline...ethod.htm#LZMA Maybe these options aren't exposed by the xz code you are using?

    Ah, so d1g → dict-size=1g, and fp128 → fp=128. So `squash -k -o dict-size=1g -o fp=128 foo foo.lzma`.

    Quote Originally Posted by Bulat Ziganshin View Post
    So, Squash now supports Windows, and I believe this by itself deserves a new release. You have a complex parameter-parsing API plus a converter between buffer/stream/ReadWrites-style APIs. This makes Squash the best candidate for the framework nburns wanted.
    You're right, Squash is *long* overdue for a release. There are a few changes I'd really like to get in place before that happens, though, since they alter how some people will want to use the API. If anyone wants to help speed things up, I could really use it. Squash's issue tracker has tons of open items, and I'd be happy to help anyone understand what would be necessary to fix them.

    Another advantage that hasn't been mentioned on this thread is bindings for other languages. Instead of writing (at least) 1 binding per language *per codec*, people can just write bindings for each language for Squash, and all plugins will magically get bindings for that language. Right now there are really only Vala bindings, but someone is working on Rust, and I've just been informed that someone else is interested in working on Python. FWIW, I'm not sure how many people around here are looking to get started working with C, but IMHO this is an excellent way to do so.

  10. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    It's fb (fast bytes), not fp, and it's not supported by Squash. Please carefully compare https://sevenzip.osdn.jp/chm/cmdline...ethod.htm#LZMA to the list of parameters you support.

  11. #9
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by nemequ View Post
    You might want to take a look at issue #165, which describes something similar.
    I was thinking of gstreamer as I was reading that and then I saw you mention it.

    You are loading plugins into squash and composing pipelines at runtime. If squash became the kind of framework I imagined, then there would no longer be a program called 'squash'. What I imagined is like this:

    I want to create a compressor called newzip that implements my insane idea of the moment. I download the framework into my source tree and edit a config file to set the program name as 'newzip'. Then I use the extension facility to override the default *whatever* with my insane version. I invoke the build script and it outputs an executable file called 'newzip'.

    Doing things this way avoids things like complex command line options, because one program doesn't have to support every use case.
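
    The workflow described above can be sketched roughly as follows. This is only an illustration of the override structure: Framework, DEFAULT_HOOKS and the delta stage are hypothetical names, zlib stands in for a default back end, and Python resolves the overrides at run time, so it does not show the build step that would emit a standalone 'newzip' executable.

```python
import zlib


def identity(data):
    """Default pre-processing stage: pass the bytes through unchanged."""
    return data


# Every replaceable stage of the pipeline, with a working default.
DEFAULT_HOOKS = {
    "transform": identity,        # runs before the entropy coding stage
    "inverse": identity,          # undoes "transform" after decoding
    "encode": zlib.compress,      # stand-in for a default coder
    "decode": zlib.decompress,
}


class Framework:
    """A compressor assembled from default stages plus user overrides."""

    def __init__(self, program_name, **overrides):
        unknown = set(overrides) - set(DEFAULT_HOOKS)
        if unknown:
            raise KeyError("unknown hooks: %s" % sorted(unknown))
        self.program_name = program_name
        self.hooks = dict(DEFAULT_HOOKS, **overrides)

    def compress(self, data):
        return self.hooks["encode"](self.hooks["transform"](data))

    def decompress(self, data):
        return self.hooks["inverse"](self.hooks["decode"](data))


def delta(data):
    """The 'idea of the moment': store each byte as a difference mod 256."""
    out, prev = bytearray(), 0
    for byte in data:
        out.append((byte - prev) & 0xFF)
        prev = byte
    return bytes(out)


def undelta(data):
    """Inverse of delta(): running sum mod 256."""
    out, prev = bytearray(), 0
    for diff in data:
        prev = (prev + diff) & 0xFF
        out.append(prev)
    return bytes(out)


# Only one stage (and its inverse) is replaced; everything else is default.
newzip = Framework("newzip", transform=delta, inverse=undelta)
```

    The point of the sketch is that newzip supplies two small functions and inherits the rest, which is the "replace one single piece of the chain" idea from earlier in the thread.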

  12. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    And you can already do that with zip, gzip, fazip or squash.

  13. #11
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    And you can already do that with zip, gzip, fazip or squash.
    You can do that, but it isn't convenient. The difference between a piece of software and a framework (the way I'm using the term) is that the framework has explicit facilities for inserting custom code in specific places. You could call them hooks.

    Since the modifications get added before compile time, they could be inlined, and so they don't cost anything. You could have hooks in inner loops that modify one bit.

    lex and yacc are sort of in this category, but something like that would probably be overkill.

  14. #12
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I haven't tried programming zpaq. Matt would probably argue that zpaq is already good enough.

  15. #13
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    A good start would just be to take a well-written and simple compressor and document it and make it easy to modify. Something on the scale of tangelo.

    It could evolve into a big, fancy project, or not, depending on interest.

  16. #14
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by nburns View Post
    You can do that, but it isn't convenient. The difference between a piece of software and a framework (the way I'm using the term) is that the framework has explicit facilities for inserting custom code in specific places. You could call them hooks.
    You mean like the members of SquashCodecImpl, which plugins insert callbacks into in order to glue the compression library to Squash?

    I think what you're really after is something like Squash (the library, not the CLI), except with a single hard-coded plugin (for your library). Once that's done it would be trivial to generate a CLI for a single plugin, with everything linked together statically. It wouldn't be all that hard to create something like that using Squash, and I'd be willing to accept patches to make it easier.

    That said, I think focusing on the CLI is a mistake… IMHO one of the biggest problems with a lot of compression codecs is that people tend to limit the APIs too much because they have a specific use case (the CLI) in mind, and they just want to write the compression code and be done with it. Squash gives you a few different options for which API(s) you want to provide, then uses that to present easy-to-use APIs to people who want to embed compression in their software, with 0 cost for adding new codecs.

    Most uses of compression aren't people firing up a command line and backing up some files, they are through integration in other tools. A CLI is often more useful as a demo (of the least imaginative use case) than something for people to actually use.

    Squash is intended to be a lot more than a command line tool. Really, squash(1) is just a demo, as is the benchmark. The really cool potential uses are people using libsquash in their own applications, and Squash is designed in such a way that it's extremely easy to swap one codec out for another… so easy, in fact, that I expect a lot of applications will just make it a key in an ini file, or a registry setting, or even a menu in their application, etc. If we can get to that point, getting people to use your codec in the real world is *much* easier, since you don't have to convince developers that your codec is worth supporting.

    For example, my original use case for Squash was database compression (hence the "quixdb" project on GitHub). Imagine just being able to change one value, and suddenly a DB is compressed with a codec you just wrote. If nothing else, think about the opportunities for optimization… instead of writing a synthetic benchmark to try to convince a developer to write some code just to try your codec, you just tell their project to use your codec and get real-world performance data to help optimize it. If you're outperforming other codecs, why wouldn't they change their default codec to yours? After all, it's only a matter of changing one little string. Now think about how slowly people are migrating to brotli and zstd.
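
    A minimal Python sketch of the "codec is just a string in a config file" idea follows. It does not use Squash: codecs from the Python standard library stand in for plugins, and the [storage] section, the codec key and the load_codec helper are all hypothetical.

```python
import bz2
import configparser
import lzma
import zlib

# Registry: codec name -> (compress, decompress), all buffer-to-buffer.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def load_codec(ini_text, default="zlib"):
    """Pick the codec named in the [storage] section of an ini document."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    name = config.get("storage", "codec", fallback=default)
    if name not in CODECS:
        raise ValueError("unknown codec %r" % name)
    return name, CODECS[name]


# Switching the application to another codec is a one-word change here.
SETTINGS = """
[storage]
codec = xz
"""

name, (compress, decompress) = load_codec(SETTINGS)
record = b"row data " * 100
assert decompress(compress(record)) == record
```

    The application code never names a codec directly, so adding an entry to the registry is enough to make a new codec selectable.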

  17. #15
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    @nemequ It's a big world, and there are many different use-cases. Algorithms have lifecycles from early experimentation to production-ready. A lot of what you're talking about are production-ready use-cases. I was envisioning something more at the research end of the lifecycle, for tinkerers. You can't really do both in one tool.

  18. #16
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Having said that, Squash has probably solved problems that are highly relevant. I would definitely examine Squash if I was thinking of creating the hypothetical framework to avoid reinventing the wheel. I have no plans to create a framework as of now.

  19. #17
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by nburns View Post
    @nemequ It's a big world, and there are many different use-cases. Algorithms have lifecycles from early experimentation to production-ready. A lot of what you're talking about are production-ready use-cases. I was envisioning something more at the research end of the lifecycle, for tinkerers. You can't really do both in one tool.
    Now I'm confused. Researchers and tinkerers are exactly the group I would think would be interested in something like that. All they have to do is provide a single, simple API (there are even three to choose from). In return they get, for no additional cost:

    * A CLI (squash)
    * Additional APIs — for example, there is a stdio-like API (think gzread/gzwrite), a splicing API (to compress one FILE to another), etc.
    * Bindings for any language Squash supports
    * Integration with stuff which uses Squash (like the benchmark), directly or not

    One of Squash's major goals is to let people provide some compression code and have everything that isn't compression handled for them, so they can focus on just the compression. I would think that would be exactly what researchers and tinkerers would want.
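
    To illustrate how a file-to-file "splicing" call can be derived from nothing more than a buffer-to-buffer codec, here is a minimal Python sketch. It is not how Squash implements it: the length-prefixed block framing, the block size and the function names are all assumptions made for the example, and zlib is only a stand-in codec.

```python
import io
import struct
import zlib

BLOCK_SIZE = 64 * 1024  # arbitrary choice for this sketch


def splice_compress(compress, src, dst, block_size=BLOCK_SIZE):
    """Compress file object src into dst using only a buffer-to-buffer codec."""
    while True:
        block = src.read(block_size)
        if not block:
            break
        packed = compress(block)
        # Each block is stored as a 4-byte little-endian length plus payload.
        dst.write(struct.pack("<I", len(packed)))
        dst.write(packed)


def splice_decompress(decompress, src, dst):
    """Reverse splice_compress(): read framed blocks and decode each one."""
    while True:
        header = src.read(4)
        if not header:
            break
        if len(header) != 4:
            raise ValueError("truncated block header")
        (size,) = struct.unpack("<I", header)
        packed = src.read(size)
        if len(packed) != size:
            raise ValueError("truncated block")
        dst.write(decompress(packed))


# Round trip through in-memory files, with zlib as the buffer codec.
original = bytes(range(256)) * 1000
packed_file = io.BytesIO()
splice_compress(zlib.compress, io.BytesIO(original), packed_file, block_size=4096)
packed_file.seek(0)
restored = io.BytesIO()
splice_decompress(zlib.decompress, packed_file, restored)
assert restored.getvalue() == original
```

    The codec author writes only the two buffer functions; the file handling, framing and error checks live in shared code that every codec reuses.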

    Also, wouldn't it be valuable to have access to data about how your codec performs in those production use cases under real-world conditions? And about how other codecs compare under those exact same conditions? What better way to get that than to simply swap in whatever codec you want information on?

    If there is something Squash isn't doing, or something that should be easier, that's exactly the sort of thing I'd like to know about so I can fix it…
    Last edited by nemequ; 15th February 2017 at 05:16. Reason: add link

  21. #18
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by nburns View Post
    You can't really do both in one tool.
    Why?

  22. #19
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by m^2 View Post
    Why?
    I envisioned something small and minimalistic.

    I'm going to stop responding to this thread. I already said everything I wanted to say.
