Christian,
the original S&S does cost optimization backward and path construction forward. Igor Pavlov uses the variation you described in 7-zip.
@donkey7: Of course. I was referring to the "bit optimal" description in the post I linked several pages ago.
I'm quite sure that LZX and Malcolm's ROLZ do it forward, too. Otherwise it doesn't make much sense (e.g. for extended syntax, ...).
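For readers following along, the "cost optimization backward, path construction forward" scheme can be sketched roughly as below. This is only an illustration under strong assumptions: the fixed `literal_cost` and `match_cost` prices (in bits) and the `matches` input format are hypothetical, and a real encoder would price tokens from its actual coder state.

```python
def optimal_parse(n, matches, literal_cost=9, match_cost=24):
    """Toy optimal parse over n input positions.

    matches maps a position to the match lengths available there
    (hypothetical input; a real match finder would supply it).
    Returns (total_cost, [(position, token_length), ...]).
    """
    inf = float("inf")
    cost = [inf] * (n + 1)  # cost[i] = cheapest encoding of the suffix starting at i
    step = [1] * (n + 1)    # step[i] = length of the token chosen at i
    cost[n] = 0

    # Backward pass: cost optimization.
    for i in range(n - 1, -1, -1):
        cost[i] = literal_cost + cost[i + 1]
        step[i] = 1
        for length in matches.get(i, ()):
            if i + length <= n and match_cost + cost[i + length] < cost[i]:
                cost[i] = match_cost + cost[i + length]
                step[i] = length

    # Forward pass: path construction.
    path, i = [], 0
    while i < n:
        path.append((i, step[i]))
        i += step[i]
    return cost[0], path
```

With `optimal_parse(10, {0: [4], 2: [8]})` the sketch skips the short match at position 0 in favour of two literals followed by the longer match at position 2, which a greedy parser would miss.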
@Christian: Would it be possible to make a multithreaded brute-force approach?
One thread would encode without the filter, the other thread with the filter. Then compare and use the smaller output.
That's kind of the idea I'm using my compression batch for: going through all the different ways in parallel, utilizing all my cores, and finding the smallest output.
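A minimal sketch of that brute-force idea, assuming zlib as a stand-in compressor and a simple byte-wise delta filter (neither is what RZM or the batch script actually uses; the one-byte tag format is also made up for the example):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def delta_filter(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    out = bytearray()
    prev = 0
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)

def compress_plain(data: bytes) -> bytes:
    # Tag byte 0 = no filter, so a decoder would know how to undo it.
    return b"\x00" + zlib.compress(data, 9)

def compress_delta(data: bytes) -> bytes:
    # Tag byte 1 = delta filter applied before compression.
    return b"\x01" + zlib.compress(delta_filter(data), 9)

def best_of(data: bytes) -> bytes:
    """Run both variants in parallel threads and keep the smaller output."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(f, data) for f in (compress_plain, compress_delta)]
        return min((f.result() for f in futures), key=len)
```

The cost is obvious: every extra variant multiplies the total CPU work, so this only pays off when spare cores are available.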
Just a small test I did:
Directory of the installed Office 2003 Danish version, stored inside a 7-zip container (store method)
Org - 133.600.881 bytes
7zip - 42.801.176 bytes
RZM - 40.182.244 bytes
CCMx - 38.672.064 bytes
Hmm, there seems to be a 2 GB file size limit with RZM.
Thanks, Christian, for another good compressor.
4Gb - CSS Game
Core2 T5500, 2 Gb Ram
Ratio     Comp.       Decomp.      Archiver
34.417%    341 kb/s    2097 kb/s   WinRK 3.0.3 Rolz3 Normal
34.427%    892 kb/s    9262 kb/s   WinArc 0.50a -mx -ld=1gb -mc-rep
35.043%   1425 kb/s   18095 kb/s   7-zip 4.56 Ultra -d110m
36.209%    674 kb/s    6364 kb/s   RLZ v0.6c
37.541%    919 kb/s    2115 kb/s   WinRK 3.0.3 Rolz3 Fastest
41.060%   1966 kb/s   15327 kb/s   WinRar v3.71 Best
Is it just me, or does RZM not work with big files (2 GB?)?
The files are corrupted after decompression.
The same seems to go for precomp.
I just store-split-7z my test set for RLZ.
Christian, why not support the usual features: 4 GB+ files, stdin/stdout, Linux? RZM seems to be really useful, and these features would help people.
# Bulat
You might want to look into REP. It doesn't seem to like 4 GB+ files.
@SvenBent: Thank you for the report. I will probably update all my programs to use a single cmdline/IO core that supports Linux, large files, stdin/stdout and a gzip-compatible command line.
# Bulat
Delta 1.0 seems to work fine with big files:
original: 5.946.890.240 bytes
delta: 5.947.099.436 bytes
So the size seems right, but I haven't tested for proper decoding yet.
@SvenBent: It should: its driver is newer, borrowed from tor0.3. I suspect that the only "incompatibility" of REP with large files is improper printing of their sizes and processing speeds.
#0
My test with REP and a 5.53 GB file resulted in a 0-byte file when it was decompressed.
I will try again right now.
SvenBent, thanks. Anyway, I will publish REP with the new driver. I got the idea of publishing a universal driver that any compression algorithm author can use for his own creation. Such a driver can implement all the features I mentioned in the lzturbo thread, allowing developers to focus on the compression core. Almost all current standalone compressors have poor drivers, and such an effort should help developers produce really useful utilities without switching to tiresome coding. This driver could also provide MT support to cut off lzturbo's base.
Actually, FreeArc was intended for this purpose, but it has now grown into a much larger project. Tornado already contains a rather sophisticated driver that may be used as a base for such an effort.
Couldn't you create a new thread to discuss things like this, which have nothing to do with RZM anymore?
Just tested two files.
REP seems OK. Must have been a fault in my prior test.
test1 (same as the first test)
Org: 5.946.890.240 bytes
REP: 5.909.111.554 bytes
CRC matches
test2:
Org: 4.300.154.880 bytes
REP: 4.257.603.507 bytes
CRC untested
I withdraw my claim regarding REP and the lack of 4 GB+ file support.
Hi everyone!
I'm back from my short trip to Sardinia - it was pretty nice, but quite cold. Sorry for the lack of answers. I'll try to catch up.
@SvenBent: Of course, but I prefer to do good detection. Otherwise nearly all string-searching data structures have to be doubled, which is ugly.
@SvenBent: Yes. I want to improve the naked compression core before adding all this unnecessary stuff. But stdin/stdout and big files will be added, of course.
@Zonder: Thanks!
There is a new version of RZM, too. It works quite well, but it is not very polished, leaving room for further improvements (speed and ratio). I did some major changes to the core and extended the syntax. It handles strange files like "valley_cmb, proteins, ..." in a better way now.
There are still some deficiencies in the syntax which I want to address - e.g. long distance matches (LDMs) are still missing. The good news is that I figured out several ways to merge them with ROLZ. The bad news is that LZ77 is better suited for LDMs. Filters will be added at a later point.
[removed]
Have fun with this new version.
@Christian: I don't understand - what is the problem with large files for a stream compressor? Also, what do you think about using the standard driver I plan to provide? Your code would just need to call read/write callbacks and process compression setting options; everything else would be handled there.
That's more interesting. I have an idea for making compression multithreaded without losing ratio. Do I understand correctly that RZM indexing is much faster than string searching?
@Christian: Why not just integrate a REP-like engine? It should allow finding 16+ byte matches with a small memory footprint.
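To make the "small memory footprint" point concrete, here is a rough sketch of the general sparse-index idea behind long-match finders of this kind. It is not REP's actual code: the function name, the `block` parameter and the use of a Python dict keyed by the block bytes (standing in for a compact table of checksums) are all assumptions for illustration.

```python
def find_long_matches(data: bytes, block: int = 16):
    """Find long repeats using one index entry per `block` input bytes.

    Only blocks starting at multiples of `block` are indexed, but every
    position is looked up. Returns a list of (position, source, length).
    """
    index = {}          # block content -> earliest aligned offset
    matches = []
    next_to_index = 0   # next aligned block that has not been indexed yet
    i = 0
    while i + block <= len(data):
        # Index aligned blocks that lie completely before position i.
        while next_to_index + block <= i:
            key = data[next_to_index:next_to_index + block]
            index.setdefault(key, next_to_index)
            next_to_index += block
        src = index.get(data[i:i + block])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as the bytes agree.
        length = block
        while i + length < len(data) and data[src + length] == data[i + length]:
            length += 1
        matches.append((i, src, length))
        i += length
    return matches
```

Because only one block in every `block` bytes is stored, the table grows with `len(data) / block`. The trade-off is that a repeat is only noticed once it covers a whole aligned block of the earlier data, so with `block = 16` some repeats shorter than 31 bytes go unseen; this sketch also does not extend matches backward.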
MOC Test ->150.036.048 comp. 285,394 sec. dec. 47,121 s.
Enwik8->24.342.076
@Christian
I wanted to know why you have not added a delta filter for images and audio.
@Bulat Ziganshin: It's just some overflows in the match-finder, ... But I'm changing all this stuff every now and then while altering the syntax. So, I do not want to do things twice.
@Bulat Ziganshin: Honestly, I'm not so fond of this. I think it's a great idea - but it depends on how much existing code has to be changed in order to make it work. Additionally, I'll most probably add precomp support - so, I don't know if the framework will fit.
@Bulat Ziganshin: Yep. String-searching eats most of the time - maybe 60-90% (heavily depending on the data).
@Bulat Ziganshin: I don't know, maybe. Btw., funny story: I wrote such a tool for a friend once. He wrote a BWT-based compressor whose string sorting stage had some serious worst-case behaviour - the tool was a workaround for this.
@Nania Francesco Antonio: Because I've been on a short vacation.
But still, I do content-based data detection - this needs more tuning and time. Please try the new version - maybe it's better on the MOC test set.
RZM 0.07c
Thanks Chris!
Mirror: Download
Thanks for the mirror, LovePimple!
@Christian: But this prohibits using RZM for real compression.
@Christian: Even more - why don't you just use your own driver from CCM? I don't understand why this driver should be program-specific - I personally just copy the same code from project to project.
@Christian: I thought that it eats even more time: it should be very easy to add a string to the match finder indexes. The idea is obvious: imagine that you have N cores. Split the data into N chunks and run two processes in parallel - the first compresses the first chunk of data as usual while the second just indexes it into a separate hash table. When the second process has finished, start two new threads - one compresses the second chunk of data while the other makes a copy of the hash table and continues to index, and so on.
Moreover, these processes may try to share indexing structures. For ROLZ-1, the main table (which stores 64k entries for each context byte) could probably be shared.
Also, you mentioned problems with too-long distances. Can't this be solved by using a "segmented" table, i.e. instead of saving exactly 64k entries for each byte you might have, say, 16k segments of 1024 entries each and reallocate them between chars dynamically, depending on current usage stats?
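The chunked scheme described above (an indexer running ahead, each compressor starting from a copy of the table) might look roughly like this. It is a toy sketch only: the "compressor" merely counts k-gram hits, `K`, `index_chunk`, `compress_chunk` and `pipelined` are made-up names, and Python threads will not give a real speedup here, so only the data flow is meaningful.

```python
from concurrent.futures import ThreadPoolExecutor

K = 4  # hypothetical k-gram length used as the index key

def index_chunk(table, data, start, end):
    """Indexing only: record the latest position of every k-gram in the chunk."""
    for i in range(start, min(end, len(data) - K + 1)):
        table[data[i:i + K]] = i

def compress_chunk(table, data, start, end):
    """Toy stand-in for a compressor: counts positions whose k-gram was seen
    earlier, updating its own private table as it goes."""
    hits = 0
    for i in range(start, min(end, len(data) - K + 1)):
        key = data[i:i + K]
        if key in table:
            hits += 1
        table[key] = i
    return hits

def pipelined(data, n_chunks):
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    shared = {}  # the indexer's table, always ahead of the compressors
    jobs = []
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for start in range(0, len(data), size):
            end = min(start + size, len(data))
            snapshot = dict(shared)  # copy covering everything before `start`
            jobs.append(pool.submit(compress_chunk, snapshot, data, start, end))
            index_chunk(shared, data, start, end)  # index ahead for the next chunk
        return sum(job.result() for job in jobs)
```

Each chunk sees exactly the history a single-threaded pass would have seen, which is why the hit count (and, by analogy, the ratio) is unchanged. The price is one table copy per chunk.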
Quick test...
A10.jpg > 836,117
AcroRd32.exe > 1,236,155
english.dic > 608,859
FlashMX.pdf > 3,678,457
FP.LOG > 505,744
MSO97.DLL > 1,646,797
ohs.doc > 784,108
rafale.bmp > 920,106
vcfiu.hlp > 579,879
world95.txt > 525,901
Total = 11,322,123 bytes
ENWIK8 > 24,334,580 bytes
MOC Test ->149.669.630 comp. 302,440 sec. dec. 53,634 s.
Thanks for all the suggestions, Bulat. Other thoughts are always good.
@Bulat Ziganshin: I know. Still, I'm doing this for fun. So, I do the fun things first - the actual algorithm. But I'll add the other stuff later.
@Bulat Ziganshin: Actually, I do. But as always, you find things which can be improved. And CCM was my first compressor - the whole filtering was half-assed - excuse the wording. This time I planned filtering from the beginning - I just didn't do it yet.
@Bulat Ziganshin: It really depends on the data. On already compressed data string-searching is pretty fast. Your idea seems to be good, but there is at least one better approach for my ROLZ. I use 16M binary trees. You can just distribute the trees by their context over several threads. This way, there'd be only some syncing with the parser. Since the threads work on different trees, you don't even need additional data structures (maybe ~64k for match results). Only a good distribution has to be selected for each block - assuming block-based optimal parsing. But this could be done by a fast data analysis. Still, I don't plan on adding threading anytime soon.
@Bulat Ziganshin: Actually, the problem is most prominent on already compressed data (2x the same file) because each context gets discarded nearly equally fast. In this case segmentation does not help. In other cases it might help. But it would double the memory requirements for the binary trees.
Anyway, I already figured out several workarounds for this, but I have to try them out.
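One way to picture the "fast data analysis" step is a per-block context histogram followed by a greedy balancing pass. The sketch below is an assumption-heavy illustration, not RZM's method: it treats the occurrence count of a context as a proxy for the work its tree will cause, and `distribute_contexts`, `order` and the greedy policy are all invented for the example.

```python
import heapq
from collections import Counter

def distribute_contexts(block: bytes, n_threads: int, order: int = 2):
    """Assign each context (an `order`-byte string) to one thread.

    Returns (assignment, loads): a dict context -> thread id, and the
    estimated load per thread.
    """
    counts = Counter(block[i:i + order] for i in range(len(block) - order + 1))
    heap = [(0, t) for t in range(n_threads)]  # (current load, thread id)
    assignment = {}
    # Heaviest contexts first, each going to the least-loaded thread so far.
    for ctx, cnt in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, t = heapq.heappop(heap)
        assignment[ctx] = t
        heapq.heappush(heap, (load + cnt, t))
    loads = [0] * n_threads
    for load, t in heap:
        loads[t] = load
    return assignment, loads
```

Since every context belongs to exactly one thread, no two threads ever touch the same tree, which matches the point that no extra data structures are needed beyond a buffer for match results.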
@Christian: Now that's just selfish.
@SvenBent: Hehe... You know, you have to set priorities.