Activity Stream

  • schnaader's Avatar
    Today, 10:09
    However, comparing WebP to FLIF on "normal" images, compression ratio is often worse, so I'll keep following the pre-processing trail for FLIF and look at the other intra-frame alternatives. Hints on how to get better ratios with WebP lossless are welcome.
    6 replies | 162 view(s)
  • Shelwien's Avatar
    Today, 10:07
    1. v2f_static is probably not the right one. We discussed it multiple times here; I think the one that matched your entropy estimation was the one with decrements. I originally posted it in the "freqtable_v0.rar" archive.
    2. It doesn't remember anything "internally". There's an explicit frequency table, stored in a separate file by the console coder.
    3. Read the .bat files. Yes, with normal sh_v2f that would work, while for static/frq the syntax is:
       sh_v2f_frq c SAMPLE1.bin SAMPLE1.ari SAMPLE1.frq
       sh_v2f_frq d SAMPLE1.ari SAMPLE1.rst SAMPLE1.frq
    4. Yes, the frequency table is not included in the compressed data and is passed around as-is instead. But that's what matches your entropy estimation.
    95 replies | 8588 view(s)
  • schnaader's Avatar
    Today, 09:07
    WebP lossless does very well, this is what I wanted:
       105.683 gen.png (generated, 64*64 pattern)
        12.980 gen.webp_lossless_m_1_q100
        12.750 gen.webp_lossless_m_6_q100
        12.288 theoretical optimum (64*64*3)
       210.768 glitch.webp_lossless_m_5_q_100 (time: 2 s)
       182.418 glitch.png
       133.693 glitch.pcf (time: 2 s)
       119.674 glitch.webp_lossless_m_6_q_100 (time: 7 s)
    The minor downside is that it needs method 6 (slowest) for the big spritesheet, but this is the expected time/size balance. Decoding is fast (~100 ms).
    6 replies | 162 view(s)
  • LawCounsels's Avatar
    Today, 01:50
    The remote developer spent quite some time with sh_v2f_static encode/decode, but decoding is not consistently correct yet. This C++ encoder produces a 'pure' output close to the optimum combinatorics-model entropy (+1 bit over) from SAMPLE.bin (the input is required to be .bin, convertible using your utility). But when the C++ is run again to decode this 'pure' output and reconstruct the original input.bin, presumably it remembers the original multiplicity values internally? (The remote developer assumes it attempts to initialize these internal values exactly as they originally were.) I assume that's how the sh_v2f C++ encode/decode works at present (i.e. C++ encode, then immediate decode while the original multiplicities are still recorded internally)?
    95 replies | 8588 view(s)
  • moisesmcardona's Avatar
    Today, 01:01
    Probably use a domain-catching service? I don't know if they work for .ru domains, but there's a chance of getting the domain, especially when backordering it from several services at the same time. You only get charged for the service that caught it. I used Snapnames.com to get back a .com domain I had also let expire; after a few years it was going to be dropped, and I successfully got it back. Not sure if it would work in this case.
    11 replies | 262 view(s)
  • Shelwien's Avatar
    Today, 00:53
    Normally an arithmetic coder works with streams rather than strings or arrays. In some cases it might require additional data besides input stream - like the frequency table in static/decrement model. It is easy enough to port a coder to work with arrays too - although you'd normally pass address and length rather than a "single" array/vector/string object. But to work with actual strings (presumably of printable characters) coders would have to be specially modified... it might be easier to just use the "bit" utility for byte-to-bit conversion (I posted it before in 013.rar). For example, this coder: http://nishi.dreamhosters.com/u/marc_v2a.rar can be modified to write/read bitstrings by running it with "01" alphabet.
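    For illustration, here is a minimal C++ sketch of such a byte-to-bit conversion (each byte becomes eight ASCII '0'/'1' characters, MSB first, and back); it is a hypothetical stand-in, not the actual "bit" utility from 013.rar:

    // byte-to-bit conversion sketch: expand each input byte into eight ASCII '0'/'1'
    // characters so a coder run with an "01" alphabet can process it, and reverse.
    #include <cstdio>

    int main(int argc, char** argv) {
      if (argc < 4) { printf("usage: bit c|d input output\n"); return 1; }
      FILE* in = fopen(argv[2], "rb");
      FILE* out = fopen(argv[3], "wb");
      if (!in || !out) return 2;
      int c;
      if (argv[1][0] == 'c') {            // bytes -> '0'/'1' chars, MSB first
        while ((c = getc(in)) != EOF)
          for (int i = 7; i >= 0; i--) putc('0' + ((c >> i) & 1), out);
      } else {                            // '0'/'1' chars -> bytes (input assumed valid)
        int acc = 0, n = 0;
        while ((c = getc(in)) != EOF) {
          acc = (acc << 1) | (c - '0');
          if (++n == 8) { putc(acc, out); acc = 0; n = 0; }
        }
      }
      fclose(in); fclose(out);
      return 0;
    }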
    95 replies | 8588 view(s)
  • LawCounsels's Avatar
    Today, 00:24
    I see now it's time for me to get to know this hands-on a little more... With the C++ encoder producing a 'pure' compressed bitstring, can this compressed 'pure' bitstring then simply be passed as a parameter to the C++ decoder, nothing more (original multiplicities etc.)? Not likely; one needs to pass on the original multiplicity parameters too (unless the C++ remembers these multiplicities).
    95 replies | 8588 view(s)
  • Shelwien's Avatar
    Yesterday, 22:03
    Shelwien replied to a thread HBA in Data Compression
    BMP is not very compatible with a BWT transform. A BMP is essentially a 3D object (pixel structure x width x height), so bytewise BWT doesn't fit it well. You can at least split the color planes into separate files. Or, ideally, apply some specialized 2D-to-1D transform like this: https://encode.ru/threads/2702-Adaptive-Lossless-Prediction-impl-in-C
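    As a minimal illustration of the plane-splitting idea (assuming raw interleaved B,G,R triples with no BMP header or row padding - a simplification of the real format), a hypothetical C++ sketch:

    // De-interleave raw 24-bit pixel data into three plane files,
    // so each plane can be fed to BWT on its own.
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
      if (argc < 2) { printf("usage: split input.raw\n"); return 1; }
      FILE* in = fopen(argv[1], "rb");
      if (!in) return 2;
      std::vector<FILE*> planes = { fopen("plane_b", "wb"), fopen("plane_g", "wb"), fopen("plane_r", "wb") };
      if (!planes[0] || !planes[1] || !planes[2]) return 3;
      int c, i = 0;
      while ((c = getc(in)) != EOF) {
        putc(c, planes[i]);       // route each byte to its color plane
        i = (i + 1) % 3;
      }
      fclose(in);
      for (FILE* f : planes) fclose(f);
      return 0;
    }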
    15 replies | 1406 view(s)
  • encode's Avatar
    Yesterday, 21:50
    encode replied to a thread Status report in The Off-Topic Lounge
    Well, it was not exactly that I forgot to pay for the domain many years ago - I tried to change the domain provider that way. I didn't expect that the domain would be available through the auction only for an insane price...
    11 replies | 262 view(s)
  • elit's Avatar
    Yesterday, 21:41
    elit replied to a thread HBA in Data Compression
    So I tried that binary coder (first link) on my BWT-only output. Silesia went to 46.4 MB, which actually beat 7zip. On that 3.3 MB exe it slightly beat bzip2 but not 7zip, and on the bmp image it was surprisingly still weak at 126k. That said, it does look like the ari encoder/model plays a large role, and as you said, if I also used MTF it hurt compression here, unlike in my own case. EDIT: and QLFC with my BWT output beat everything including bzip2 on the bmp image; only on the exe was 7zip better.
    15 replies | 1406 view(s)
  • Shelwien's Avatar
    Yesterday, 21:30
    Previous archives included perl scripts which I used to convert files, but okay, here's a C++ utility for the same thing; it's easier than explaining perl. As to the command-line syntax, open the .bat files in a text editor and look at how the utilities are used. Commonly it's "coder c input output" to compress a file, and "coder d input output" to decompress.
    95 replies | 8588 view(s)
  • Shelwien's Avatar
    Yesterday, 21:10
    Shelwien replied to a thread HBA in Data Compression
    Here're some reference sources:
    http://nishi.dreamhosters.com/u/rc_v5.rar - binary adaptive arithmetic coder tuned for BWT data
    http://nishi.dreamhosters.com/u/BWT16.rar - simple BWT/unBWT implementation for 8-bit and 16-bit alphabets
    http://nishi.dreamhosters.com/u/gmtf_v1.rar - MTF utility
    http://nishi.dreamhosters.com/u/bsc240_qlfc_v1.rar - QLFC postcoder from the bsc compressor
    In general:
    1) Sorting with a sub-byte alphabet is a bad idea for byte-aligned data (which is normal text, exes, 8bit+ images etc). It can still be good for other types of data (1-bit bitmap images). It's a bad idea because such sorting would mix up symbols from contexts with different alignment, and these would have different probability distributions. Basically the same reason why BWT compressors have problems with compression of sorted wordlists.
    2) Compression-wise it's best to directly encode BWT output with a bitwise arithmetic coder. MTF and other transforms actually hurt compression, but provide ways to improve coding speed.
    3) LZ preprocessing is possible (it has to be LZ without entropy coding though, like LZ4 maybe), and can even be helpful to work around some common BWT issues with very redundant data, but like (2) it's expected to hurt compression.
    15 replies | 1406 view(s)
  • LawCounsels's Avatar
    Yesterday, 21:02
    >> Update: I converted your file to binary and tested all 3 versions of sh_v2f coder (increment, static, decrement), all 3 seem to decode correctly. Attached Files 017.rar (29.7 KB, 0 views)
    Can you post your text-to-binary converter please? So far it's the remote developer doing the testing. Can you show me the commands to type to run your range coder's encode and decode, and the parameter format?
    95 replies | 8588 view(s)
  • LawCounsels's Avatar
    Yesterday, 20:55
    Total frequencies 26256; frequencies are now 0 to 14 (not 0 to 15 as before, every symbol moved down by 1), i.e.:
    11783 7936 3243 1658 794 409 213 116 48 33 12 7 2 1 1
    Attached SAMPLE2.rtf (symbols 0 to 14); it decoded differently.
    95 replies | 8588 view(s)
  • elit's Avatar
    Yesterday, 20:50
    elit replied to a thread HBA in Data Compression
    I would like to share the ideas that got me into HBA; maybe they will be of use to someone else, or perhaps I can get some insight on where I got it wrong. But first of all, after some research (including on GPUs) I decided to switch to integer arithmetic, and I basically got a working standard implementation of the original arithmetic coder concept (Witten ACM87). I properly implemented decoders for BWT and the arithmetic coder as well, and these now work correctly, so the encoded sizes I later measured against my preset of files (including tarred Silesia) I now know are correct (still without headers+CRC, as I have not yet bothered to implement a full file format, but that won't add much to the file size). That said, the results were rather disappointing (to me, or to my expectations at least), but I know it's because I have not yet explored things sufficiently - especially the arithmetic coder.
    But let's talk about that original idea. (NOTE: before continuing, it is very possible that any of these ideas already exist and were discovered before. I wouldn't know, but please keep it in mind. I got here on my own and did not steal anyone's idea, even if it already exists.) So, not counting dedup techniques like LZ, basically the two biggest factors that dictate compression are message size and individual unit (aka byte) variation. In the case of bytes, the variation is as high as 256 states, so to compress, say, a single byte, the encoding complexity (for lack of a better word) would be 1*256. So I thought one day: what if I break bytes into smaller units with fewer variations? In theory, this should give less complexity - even at the cost of an inflated message - but of course only if I can treat each "unit" as an individual symbol. For example, if you break bytes into base 16 and get something like FC F1 CF, that by itself would not be very useful, but if I can turn it into, say, F F F C C 1, now that would be something else. And of course, we do have a technique to sort symbols and revert them to the original state: it's called BWT. So what would be the complexity of a single byte broken into 2 individual, independent base-16 units? 2*16=32. Note this is significantly less than the previous 1*256=256. We inflated the original message (2x), but each individual unit now has only 16 states. This (in my original theory) should give better sorting (because of fewer variations throughout the message) and thus lower entropy (which was true in my measurements). The message is bigger, but we already know from tools like ztool or precomp that sometimes increasing the message size can be beneficial and yield a lower size in the end. This was my general idea.
    Now, what you do after this - whether you join the units back into bytes (FF FC C1) and then apply MTF, or do it already here with RLE+ari encoding without re-joining - is a matter of ideas. I tried all variations, and as you will see later, none gave any benefit over applying BWT+MTF+RLE+ARI directly to bytes. Anyway, after I had things working properly, I tried all combinations but was never able to reach even bzip2. For quick tests I used one 3.3 MB file (an exe). For reference, 7zip:mc32:d64m:lc4:32 was able to compress that file to 783 KB (BCJ disabled), bzip2 (900 KB block) to 995 KB. With HBA, I was never able to go under 1.1 MB; it was very close to 1.2 MB in fact.
    If I worked with base 16, RLE was able to dedup anything 16-255, simply: if (lit != 0) output lit; else count zeroes and, if over 15, output up to 255 as a single byte. So I had all that room to use because of the free byte values, since the data values never went over 15 (stored in uint8_t, in fact).
    Of course, I tried things with and without RLE, MTF, even BWT, you name it. Leaving any of them out was always worse; all 3 were useful and helped significantly to lower the size. Btw, I also tried base 4 and even base 2 - no better results. Then I tried to apply BWT+MTF+RLE+ARI directly to the data (bytes, without breaking them further). To my surprise, I got my best result of 1.1 MB (1.106 kb precisely). While I was happy, I was also disappointed, because there goes my idea out the window. And still not even beating bzip2. Quick note: RLE with bytes works differently, for obvious reasons: if (lit != 0) output lit; else count zeroes and: if < 2 just output 0; else output 0 + 0 + count - 2 (count up to 257). So for every run of two or more zeros there will be a 3rd byte representing the count, including 0 if there were only 2. Surprisingly, I found this to give me the best results, and the gain from RLE can be as much as 9%.
    So with all this implemented and the buffer size the same as bzip2's 900 KB (for a fair comparison), I got that file to 1.1 MB as I said, which is still ~10% worse than bzip2 (not even mentioning 7zip), and Silesia I got to 57,936,443 bytes = 55.2 MB, where bzip2: 51.7 MB and 7zip: 47.2 MB. However, Silesia actually hides a weakness of HBA. I also tried to compress a 5.9 MB bmp image, and while both 7zip and bzip2 got it to a similar 117.6k and 113.2k, HBA only got it to 151.2k, which is a *shame*, since this is where it should shine, as even bzip2 could overtake 7zip! That makes it ~26% worse than bzip2 and shows a clear weakness somewhere...
    Which brings me here. First, if you read all this way, thank you for your time. Now, I think HBA's weakness is the arithmetic coder, but I would like your opinion. After all the tests I tried, it seems like BWT, MTF and even RLE are "good enough", but you tell me. First, the ari encoder is the standard original implementation working on bytes: https://web.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf The probability model is also the same: simply increment the symbol and halve all counts when crossing max_frequency. However, I must say that I tried different computations, for example incrementing by more than 1 for zeroes (because of the MTF effect), and I tried non-even starting probabilities and so on. Not only did nothing I tried give me better compression, it got worse in every single case, even where I was sure it would help. One important thing: this simple model gave me the same level of compression as computing the precise ratio of each symbol by counting the whole message block - aka non-adaptive or static probabilities.
    Since then I read Mahoney's doc, and I learned there are binary ari coders that encode bits (or few-bit patterns) instead of bytes. So I wanted to know: would this be of any benefit in compression, or is it just for being universal (non-byte dependent)? Should I expect a significant enough difference to implement a bit (or 4-bit pattern?) ari encoder? Also, I am considering adding a quick LZ pass on the input before BWT, what do you think? But then again, bzip2 and others don't have it and are able to compress better, so I don't think that's it. I wasted months getting here and feel stuck. I know this is my very first compressor ever, and not long ago I didn't even have a clue about all these algorithms; it is beating zip at least, but my ego is still hurt :). I really expected to beat the whole 7zip stack and bring us to a new era of linear memory and GPU acceleration.
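    For reference, a small C++ sketch of the zero-run RLE scheme described above (a single zero passes through, a run of 2..257 zeros becomes the three bytes 0, 0, run-2); this is only an illustration of the described scheme, not the actual HBA code:

    // Zero-run RLE for post-MTF data: literals pass through, zero runs are shortened.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> rle_encode(const std::vector<uint8_t>& in) {
      std::vector<uint8_t> out;
      size_t i = 0;
      while (i < in.size()) {
        if (in[i] != 0) { out.push_back(in[i++]); continue; }
        size_t run = 0;
        while (i < in.size() && in[i] == 0 && run < 257) { run++; i++; }  // count zeros, max 257
        if (run == 1) out.push_back(0);                                   // single zero as-is
        else { out.push_back(0); out.push_back(0); out.push_back(uint8_t(run - 2)); }
      }
      return out;
    }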
    15 replies | 1406 view(s)
  • Shelwien's Avatar
    Yesterday, 20:32
    Yes, it's impossible to decode a symbol with 0 frequency - it shouldn't be encoded. Update: I converted your file to binary and tested all 3 versions of the sh_v2f coder (increment, static, decrement); all 3 seem to decode correctly.
    95 replies | 8588 view(s)
  • LawCounsels's Avatar
    Yesterday, 20:31
    Here is SAMPLE1.rtf; total frequencies 26256, and the frequencies of symbols 0 to 15 are:
    0 11783 7936 3243 1658 794 409 213 116 48 33 12 7 2 1 1
    Should the frequency of symbol 0 be made 1 instead?
    95 replies | 8588 view(s)
  • Shelwien's Avatar
    Yesterday, 20:22
    To get people to understand that they can't really compress random data. Many people don't understand logic or mathematics, but still make bold claims about data compression. So Mark Nelson proposed a technical method to verify these claims. His original message with the challenge was posted in 2002, and somehow nobody was able to demonstrate a working random compression scheme with it, although we get 2-3 such claims every year.
    7 replies | 195 view(s)
  • xinix's Avatar
    Yesterday, 20:17
    Because this is a compression contest.
    7 replies | 195 view(s)
  • xinix's Avatar
    Yesterday, 20:16
    Thanks. Interesting. Could you post this 847-byte file here? Only the compressed file. I think we can squeeze it again.
    44 replies | 1449 view(s)
  • Shelwien's Avatar
    Yesterday, 20:11
    The sh_v2f coder supports total_freq up to 2^31, i.e. about 2 billion. If you need more, there are also fp64 (2^40) and sh128 (2^56) versions. The requirements are:
    1) sum(freq) < max_total_freq
    2) freq > 0 for all i
    If these conditions are satisfied and the sum of freqs is less than 2 billion, it could mean that there's a bug in the coder - I'd need your data to reproduce it. Otherwise you can try the other coders with higher max_total_freq. Coding speed gets much slower with higher precision, so it's not so easy. Plus, there are portability issues - for example, the int128 type is only supported by compilers on 64-bit platforms, and floating-point arithmetic can produce incompatible results on different platforms.
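    As an illustration, a hypothetical C++ helper that checks these two conditions before encoding (not part of sh_v2f itself):

    // Validate a static frequency table against the coder's limits.
    #include <cstdint>
    #include <vector>

    bool freq_table_ok(const std::vector<uint32_t>& freq, uint64_t max_total = 1ull << 31) {
      uint64_t total = 0;
      for (uint32_t f : freq) {
        if (f == 0) return false;   // condition 2: freq > 0 for all symbols
        total += f;
      }
      return total < max_total;     // condition 1: sum(freq) < max_total_freq
    }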
    95 replies | 8588 view(s)
  • Shelwien's Avatar
    Yesterday, 19:52
    The domain was originally owned by encode, but at some point he forgot to pay for it and it expired, similar to this time. webmaster bought it and has hosted the forum for free since then. Since webmaster disappeared, this time we could have waited until the end of June and re-bought it, but it seemed better to get the forum working right away. If webmaster doesn't return, we might have problems again in a few weeks, since his server presumably can't keep working forever on its own.
    11 replies | 262 view(s)
  • Sportman's Avatar
    Yesterday, 19:47
    Sportman replied to a thread WinRAR in Data Compression
    Got this error 3 weeks ago:
    170 replies | 117505 view(s)
  • Obama's Avatar
    Yesterday, 19:46
    Why does the guy want us to compress and decompress this file?
    7 replies | 195 view(s)
  • Shelwien's Avatar
    Yesterday, 19:42
    http://nishi.dreamhosters.com/u/nelson_dec2bin_v0.rar
    1,440,000 digits.txt
    1,000,000 digits1.txt
      415,241 AMillionRandomDigits.bin
    digits.txt is the original rand.org file: https://www.rand.org/content/dam/rand/pubs/monograph_reports/MR1418/MR1418.digits.txt.zip
    digits1.txt is the stripped version without line numbers and spaces.
    AMillionRandomDigits.bin is the result of converting digits1.txt to binary as a single long decimal number.
    This dec2bin conversion always seemed trivial, but yesterday somebody reminded me that we actually don't have an easy-to-use utility to do it, and in fact we never even tried to verify the claim that digits.bin is a lossless representation of digits.txt. Well, now I made tools for the digits->digits1->digits.bin conversion (both forward and backward) and verified that it works.
    The point of this is that compressing digits.bin as is is a futile project, while digits.txt has 50 bits of _known_ redundancy (column parity), so there could be more.
    The archive includes sources of both utilities and binaries for both windows and linux. Sources can be compiled on linux by running "sh g" in the source directory. Unfortunately the dec2bin utility can only be built for x64 atm, because of __int128 usage.
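    To illustrate the dec2bin idea (not the actual utility from nelson_dec2bin_v0.rar), here is a schoolbook C++ sketch that treats the digit string as one big decimal number and repeatedly divides it by 256, collecting remainders as output bytes; it is O(n^2), so it would be slow for a million digits:

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    std::vector<uint8_t> dec2bin(const std::string& digits) {
      std::vector<uint8_t> out;
      std::vector<int> d;                      // digits[0] is the most significant digit
      for (char c : digits) d.push_back(c - '0');
      while (!d.empty()) {
        int rem = 0;
        std::vector<int> q;
        for (int x : d) {                      // long division of d by 256
          int cur = rem * 10 + x;
          q.push_back(cur / 256);
          rem = cur % 256;
        }
        out.push_back(uint8_t(rem));           // least significant byte first
        size_t nz = 0;                          // strip leading zeros of the quotient
        while (nz < q.size() && q[nz] == 0) nz++;
        d.assign(q.begin() + nz, q.end());
      }
      std::reverse(out.begin(), out.end());    // most significant byte first
      return out;
    }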
    0 replies | 51 view(s)
  • Obama's Avatar
    Yesterday, 19:39
    Done, just a few hours; the result is 847 bytes.
    44 replies | 1449 view(s)
  • LawCounsels's Avatar
    Yesterday, 18:27
    It appears sh_v2f.cpp uses finite arithmetic (carries etc.); it was tested and does not decode correctly for very large numbers of input symbols with extreme probabilities. Can you simply use BigInt / IntX to make it work correctly in all circumstances?
    95 replies | 8588 view(s)
  • Sportman's Avatar
    Yesterday, 16:55
    Open binary files with a HEX editor like: https://www.hhdsoftware.com/free-hex-editor
    7 replies | 195 view(s)
  • CompressMaster's Avatar
    Yesterday, 16:17
    Why isn't this domain owned by encode or Shelwien? Encode is the founder of this forum, so he should have FULL access to the domain, hosting and CMS.
    11 replies | 262 view(s)
  • CompressMaster's Avatar
    Yesterday, 16:02
    It's a binary file. Those aren't words. Try to open, for example, a JPG file in Notepad and you will see the same result - "garbage". It cannot be compressed at all in this mode, because if you alter even one character, save the file, insert the same character back into the corresponding position and save again, you wouldn't be able to open it at all, nor reconstruct it losslessly. As for this particular file, try to compress only AMillionRandomDigits.txt directly.
    7 replies | 195 view(s)
  • Obama's Avatar
    Yesterday, 13:50
    I mean, what are the words it contains? Why do they look like alien words?
    7 replies | 195 view(s)
  • Shelwien's Avatar
    Yesterday, 10:41
    @elit: free domain registration was never really an issue, unless there's also free php/mysql site hosting. Atm I don't see how to setup this forum to work forever without active maintenance, so disappearance of people doing that maintenance would always be a problem.
    11 replies | 262 view(s)
  • Mauro Vezzosi's Avatar
    Yesterday, 10:37
    Where did you find it? IMO, you are comparing an advanced LSTM implementation (NNCP) with a simpler version (lstm-compress).
    EDIT: Oops, sorry, I forgot that the lstm-compress results may include the preprocessor! I'm retesting lstm-compress without the preprocessor to compare with NNCP.
    Results for NNCP 2019-05-08 with about the same hyperparameters (= options) as lstm-compress.
    (1): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002 n_params=508k n_params_nie=232k mem=3.63MB
    (2): NNCP, cell=LSTM-C n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=adam lr=4.000e-003 beta1=0.000000 beta2=0.999900 eps=1.000e-005 n_params=508k n_params_nie=232k mem=5.66MB
    (3): lstm-compress v3 2019/03/30, original version, without preprocessor. gradient_clipping=-/+2.0 n_layer=3 hidden_size=90 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=none lr=5.000e-002.
    (4): lstm-compress v3 2019/03/30 with cmix v17 2019/05/26 lstm*.* source files (they added the adam optimizer), without preprocessor and with the vocabulary size fixed to 256 (it always outputs 256 predictions even if the file has fewer symbols, which disables an lstm-compress optimization trick). gradient_clipping=-/+2.0 n_layer=3 hidden_size=352 batch_size=10 time_steps=10 n_symb=256 ln=0 fc=0 sgd_opt=adam lr=5.000e-002 adam_lr=3.350e-003 adam_beta1=0.025000 adam_beta2=0.999900 adam_eps=1.000e-006 mem=62.412K.
    File       (1)         (2)         (3)        | (4)
    0.WAV      1.434.237   1.400.862   1.411.882  | 1.391.299
    1.BMP        366.975     348.118     356.764  |   334.957
    A.TIF      1.152.748   1.096.749   1.071.157  |
    B.TGA      1.104.139   1.039.702   1.038.606  |
    C.TIF        360.723     346.446     343.564  |
    D.TGA        333.875     317.526     313.724  |
    E.TIF        502.935     499.786     501.892  |
    F.JPG        111.953     111.539     111.831  |
    G.EXE      1.439.333   1.414.462   1.429.351  |
    H.EXE        691.345     653.221     647.638  |
    I.EXE        318.786     299.873     302.428  |
    J.EXE         44.865      44.707      44.364  |    44.187
    K.WAD      4.381.606   4.220.113   4.298.921  |
    L.PAK      3.477.008   3.372.390   3.358.974  |
    M.DBF        126.592     115.555     120.772  |
    N.ADX        156.401     139.289     157.563  |
    O.APR         10.824       8.945      10.190  |     8.569
    P.FM3          9.205       8.274       8.404  |
    Q.WK3        405.454     367.267     359.346  |
    R.DOC         46.719      42.745      46.377  |
    S.DOC         44.977      40.053      43.402  |
    T.DOC         31.331      28.316      30.812  |
    U.DOC         14.504      13.335      14.637  |
    V.DOC         31.727      28.482      30.584  |
    W.DOC         21.392      19.526      21.378  |    19.128
    X.DOC         17.056      15.730      17.016  |    15.266
    Y.CFG            725         801         573  |       576
    Z.MSG            358         382         235  |       237
    Total     16.637.793  15.994.194  16.092.385  |
    54 replies | 3989 view(s)
  • xinix's Avatar
    Yesterday, 07:07
    xinix replied to a thread Status report in The Off-Topic Lounge
    Thanks! But the problem is not the date. The problem is that people disappear!
    11 replies | 262 view(s)
  • elit's Avatar
    Yesterday, 02:23
    elit replied to a thread Status report in The Off-Topic Lounge
    Shelwien, I don't know if it is viable considering the long-established domain, but if something like encode.ru.eu.org would be okay - free, reliable and without spam or any shady practices - then I can recommend this: https://nic.eu.org/ I have already run our NIC through them for years and never had a single issue. As long as encode is non-profit (as I believe it is), it's a great free alternative (unless that multi-suffix is unacceptable). You wouldn't have to worry about the expiry date anymore (among other things).
    11 replies | 262 view(s)
  • Shelwien's Avatar
    25th May 2019, 21:10
    I did make a clone - http://encode.dreamhosters.com But it has some issues - try opening any sticky thread. Maybe it just failed to import the whole database, maybe a difference in php version, dunno...
    11 replies | 262 view(s)
  • load's Avatar
    25th May 2019, 21:08
    load replied to a thread Status report in The Off-Topic Lounge
    thanks for bringing it back!
    11 replies | 262 view(s)
  • Shelwien's Avatar
    25th May 2019, 20:54
    > these sprite images are not that much of the whole data volume that humanity has. They are pretty frequent in games. Then, there're also mipmaps.
    6 replies | 162 view(s)
  • dado023's Avatar
    25th May 2019, 20:45
    dado023 replied to a thread FileOptimizer in Data Compression
    Seems like a new version is out:
    13.80 - 2019/05/23
    - Updated spanish translation (Thanks Edson Pacompía Ortiz).
    - Updated all custom built plugins to Visual C++ 2019: gifsicle, gifsicle-lossy, jpegoptim, jsmin, mp4v2 and sqlite.
    - Updated SQLite to 3.28.0 x86 and x64 Visual C++ 2019 custom builds.
    - Updated gifsicle to 1.92.
    - Removed gifsicle-lossy management because it is now integrated in gifsicle.
    - Updated to mutool 1.15.
    - Updated mp4v2 x86 and x64 to 4.1.0 Visual C++ 2019 custom builds.
    - Updated ImageMagick to 7.0.8.37 with HDRI support.
    - Updated pingo to 0.99 beta 32 x64 version.
    - Updated libdeflate to 1.2.
    - Updated Ghostscript to 9.27.
    - Updated Leanify to 0.4.3.231 daily binaries.
    - Updated EXE compatibility to PatchPE 1.31.
    638 replies | 181354 view(s)
  • Jyrki Alakuijala's Avatar
    25th May 2019, 20:38
    If they end up comparing the header size of different technologies, the study is worth nothing. We can look up the header size from the spec, no need to have experimental study for it. If they look for optimized header size techniques, the first version of WebP lossless was pretty good on that - it used six bytes for one pixel images.
    33 replies | 2753 view(s)
  • Jyrki Alakuijala's Avatar
    25th May 2019, 20:27
    How did WebP lossless do?
    6 replies | 162 view(s)
  • Jyrki Alakuijala's Avatar
    25th May 2019, 20:26
    Jpeg xl has some block matching. Block matching is not much better than line matching in practice. Also, these sprite images are not that much of the whole data volume that humanity has.
    6 replies | 162 view(s)
  • Jyrki Alakuijala's Avatar
    25th May 2019, 19:49
    They failed to benchmark in a useful way. Brotli is run with a 4 MB window whereas zstd is run with 8 to 128 MB. If they use the same window length, brotli will compress 5% better than zstd on an average corpus.
    50 replies | 6916 view(s)
  • Darek's Avatar
    25th May 2019, 19:06
    I've tested my testset with NNCP to try to optimize the options. It takes some time. First I tested the batch size option, then the main proposed options (full connect, layer normalisation, hidden size 512, number of layers 5, and their combinations), then the number of layers. Scores are in the attached table. Test directions go from left to right across the table. Dark blue scores are the best NNCP scores overall, light blue are the best scores for the tested option. I've got 3.8% of gain (570KB), which is quite good. I still have the hidden size (in progress) and the learning rate options left to test. Then I tested the RC_1 version with optimized settings and it adds another 0.6% of gain (100KB) over the default original NNCP version. There is also a table comparing with LSTM at default settings. NNCP beat LSTM for almost all files by a good margin, except 24-bit uncompressed images, where NNCP lost about 30% to LSTM... For two files especially NNCP got impressive results: D.TGA - third place overall, just behind cmix and paq8px (which have a good parser for this file), and E.TIF - fourth place overall, just behind cmix, CMVe and paq8px (which also have a good parser for this file). For C.TIF, G.EXE, H.EXE and Q.WK3 it placed in the first 10 - which is also an impressive result for a compressor without any specialised models!
    54 replies | 3989 view(s)
  • Shelwien's Avatar
    25th May 2019, 18:49
    https://marknelson.us/assets/2012-10-09-the-random-compression-challenge-turns-ten/AMillionRandomDigits.bin works, just tested it.
    7 replies | 195 view(s)
  • Jarek's Avatar
    25th May 2019, 18:24
    This comparison clearly misses CRAM ( https://en.wikipedia.org/wiki/CRAM_(file_format) ); the winning NAF is from the same author, and seems to be preprocessing+zstd: http://kirill-kryukov.com/study/naf/ Regarding the texture compressor, I wonder how it compares with GST: http://gamma.cs.unc.edu/GST/
    50 replies | 6916 view(s)
  • Obama's Avatar
    25th May 2019, 18:11
    AMillionRandomDigits.bin from https://marknelson.us/posts/2012/10/09/the-random-compression-challenge-turns-ten.html - what is its content called? Why can't I read it?
    7 replies | 195 view(s)
  • Obama's Avatar
    25th May 2019, 17:53
    Good question, I never thought about it before. I think I need a supercomputer.
    44 replies | 1449 view(s)
  • encode's Avatar
    25th May 2019, 15:50
    encode replied to a thread Status report in The Off-Topic Lounge
    Yep, money talks...
    11 replies | 262 view(s)
  • Shelwien's Avatar
    25th May 2019, 15:43
    Ok, with evil hacks and power of money, we made it work.
    11 replies | 262 view(s)
  • Shelwien's Avatar
    24th May 2019, 18:59
    1. The encode.ru domain name is currently owned by "webmaster".
    2. The domain name expired on 2019-05-19.
    3. webmaster was last seen on 2019-04-19 on another forum, 15 here - https://encode.ru/members/554-webmaster
    4. Adding "78.46.93.39 encode.ru" to hosts works.
    5. I attempted to modify phproxy to access encode.ru by IP, but failed, since phproxy works with HTTP/1.0 and the server requires 1.1+, otherwise it returns 403.
    6. Asked the registrar (nic.ru) to pay for domain extension without change of ownership, but they don't allow this.
    7. On 2019-06-19 it will become possible to buy the domain again, but there's an auction and possible competition from squatters.
    8. Managed to download the forum scripts+attachments and a mysql dump.
    9. Now trying to make a clone at another location.
    11 replies | 262 view(s)
  • dnd's Avatar
    23rd May 2019, 01:27
    Google and Binomial Partner to Open-Source Basis Universal Texture Format
    The Basis Universal texture format is 6-8 times smaller than JPEG on the GPU... It creates compressed textures that work well in a variety of use cases - games, virtual & augmented reality, maps, photos, small-videos, and more!...
    50 replies | 6916 view(s)
  • Shelwien's Avatar
    22nd May 2019, 17:38
    Only video codecs have block matching, unfortunately. It may be possible to port an I-frame compressor from AV1 or something. I guess you can also test "official" new codecs, like JPEG2000+, webp. Also maybe pik. I also had an idea for fast detection of 2D block matches - it's possible to implement 2D anchor hashing (aka content-dependent chunking). That is, we can split the horizontal and vertical lines of the image into 1D fragments, then use the intersection points as a block grid. Then it should be enough to look up / insert one hash per block.
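    A rough C++ sketch of the 1D part of that anchoring idea: hash a small window at each position of a pixel line and place an anchor wherever the hash matches a pattern, so identical content always produces anchors at the same relative positions. Rows would give candidate vertical grid lines, columns horizontal ones, and their intersections the block grid. Window size and mask are arbitrary illustration values, not anything from an existing implementation:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<size_t> find_anchors(const std::vector<uint8_t>& line,
                                     size_t window = 8, uint32_t mask = 0x3F) {
      std::vector<size_t> anchors;
      if (line.size() < window) return anchors;
      for (size_t i = 0; i + window <= line.size(); i++) {
        uint32_t h = 0;
        for (size_t j = 0; j < window; j++) h = h * 33 + line[i + j];  // hash of the window
        if ((h & mask) == 0) anchors.push_back(i + window);            // content-defined cut point
      }
      return anchors;
    }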
    6 replies | 162 view(s)
  • dnd's Avatar
    22nd May 2019, 17:28
    Is it time to replace gzip? Comparison of modern compressors for molecular sequence databases
    50 replies | 6916 view(s)
  • Jarek's Avatar
    22nd May 2019, 11:18
    Oh, they were from 16th and 17th May... and here is another, from the Berkeley team, from the 21st (yesterday) - also using flows: Compression with Flows via Local Bits-Back Coding, https://arxiv.org/pdf/1905.08500
    3.318 on CIFAR10
    3.882 on ImageNet32
    3.703 on ImageNet64
    Much better, but I don't see an available implementation.
    33 replies | 2753 view(s)
  • Jarek's Avatar
    21st May 2019, 14:37
    Another fresh paper - this time using reversible discrete transformations (+rANS): Integer Discrete Flows and Lossless Compression, https://arxiv.org/pdf/1905.07376
    Claims a bit better:
    3.34 on CIFAR10
    4.18 on ImageNet32
    3.90 on ImageNet64
    but I don't see an available implementation.
    33 replies | 2753 view(s)
  • schnaader's Avatar
    20th May 2019, 19:03
    When compressing images, and especially game data, I often observed that some images compress worse using image compression algorithms (PNG and FLIF). Further analysis revealed that one of the cases where this happens is when the images are repetitive. Attached are two images, a real-world example (source) and a generated image (random noise pattern 64*64, total size 512*512). Both PNG and FLIF perform badly here, while general compression algorithms like LZMA or mixed ones (PAQ) can handle the situation.
    786.486 gen.bmp (generated, 64*64 pattern)
    300.366 gen.flif (-e)
    253.216 gen.flif (-e -N)
    230.139 gen.flif (-e -N -R5)
    105.683 gen.png
     13.647 gen.pcf (Precomp 0.4.7, uses LZMA)
     12.775 gen.paq8p (-3)
     12.288 theoretical optimum (64*64*3)
    786.486 gen2.bmp (generated, 64*21 pattern)
    142.530 gen2.flif (-e)
     50.982 gen2.flif (-e -N)
      8.791 gen2.png
      5.400 gen2.pcf
      4.428 gen2.paq8p (-3)
      4.032 theoretical optimum (64*21*3)
    226.338 glitch.flif (-e)
    203.995 glitch.flif (-e -N)
    182.418 glitch.png
    133.693 glitch.pcf
    With the second test file (64*21 pattern) we can see that the 32K deflate window size limits PNG here, because in the 64*64 case it takes 64 lines before a repetition and 512*64*3 = 98304 bytes falls outside the window, while in the 64*21 case the pattern can still be matched. So I just wanted to share these observations. Does somebody know a (lossless) image format that doesn't have this problem? I didn't try all the others (BPG, Gralic etc.) yet. Since I would prefer to use FLIF by default for image data in Precomp, this is a problem I'd like to solve, so a preprocessor that detects and replaces repetitive patterns in images would be the next step; I'll post further progress on this.
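    For reference, a hypothetical C++ sketch of how such a test image can be generated (a random 64*64 RGB tile repeated over 512*512, written as raw interleaved RGB without a BMP header), so that the only real information is the 64*64*3 = 12288 tile bytes:

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    int main() {
      const int T = 64, W = 512, H = 512;
      std::vector<uint8_t> tile(T * T * 3);
      for (auto& b : tile) b = uint8_t(rand() & 0xFF);   // the 12288 "real" bytes
      FILE* out = fopen("gen.raw", "wb");
      if (!out) return 1;
      for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
          const uint8_t* p = &tile[((y % T) * T + (x % T)) * 3];
          fwrite(p, 1, 3, out);                          // tile repeats every 64 pixels
        }
      fclose(out);
      return 0;
    }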
    6 replies | 162 view(s)
  • dnd's Avatar
    20th May 2019, 15:36
    Paper: Parallel decompression of gzip-compressed files and random access to DNA sequences Code: Pugz - Truly parallel gzip decompression
    50 replies | 6916 view(s)
  • Sportman's Avatar
    19th May 2019, 23:03
    If you take 100 different movies with a 4 GB file size each, convert them to characters so that you end up with a 10 GB file each, and compress each 10 GB file 900 times down to an 11 MB file each, how much has your dictionary grown after compressing those 100 movies?
    44 replies | 1449 view(s)
  • Obama's Avatar
    19th May 2019, 22:23
    "Increased with every input" - nope, you are right. Because I just split it and compress, that's why it completed so fast. Normally it would need a few days.
    44 replies | 1449 view(s)
  • Sportman's Avatar
    19th May 2019, 22:14
    Is "few GB" a static fixed dictionary or altered and/or increased with every input? How can it be a "Few day" while there was only 11:22 hour between the random text test file post and your answer or did you decompress it later?
    44 replies | 1449 view(s)
  • Obama's Avatar
    19th May 2019, 21:44
    Letters, numbers and some keyboard symbols only.
    44 replies | 1449 view(s)
  • xinix's Avatar
    19th May 2019, 21:42
    Don't worry :) He is talking about the text.
    44 replies | 1449 view(s)
  • compgt's Avatar
    19th May 2019, 21:35
    At first he spoke of digits and letters, then he quoted 1,000,000 to 1028, bytes to bytes. But that's implied when someone describes an algorithm by which all, or almost all, input files of sizes that large "can be compressed to that very small output size."
    44 replies | 1449 view(s)
  • Obama's Avatar
    19th May 2019, 21:14
    Yeah, I'm not an expert.
    44 replies | 1449 view(s)
  • xinix's Avatar
    19th May 2019, 21:10
    He did not talk about all the random files. He spoke only about texts in English.
    44 replies | 1449 view(s)
  • Obama's Avatar
    19th May 2019, 21:00
    Haiz... this is a real result, but I will try again.
    44 replies | 1449 view(s)
  • compgt's Avatar
    19th May 2019, 20:53
    Man, 1028 bytes? Seems to me like my output 1K frequency table plus the famed 32-bit filesize. You might have an algorithm for some tailored inputs, but surely you can't compress *all* the 1,000,000-byte random files into just 1028 bytes.
    44 replies | 1449 view(s)
  • xinix's Avatar
    19th May 2019, 20:33
    xinix replied to a thread paq8px in Data Compression
    paq8px stopped compressing files with a resolution of 16x995810 and the like. paq8px_v179 does not see the bmp file, although paq8p1 has no such problem and compresses the example perfectly.
    1586 replies | 459894 view(s)
  • Obama's Avatar
    19th May 2019, 20:14
    You never trust me? Just come to my home - do you need my home address?
    44 replies | 1449 view(s)