
Thread: Data deduplication

  1. #1
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Data deduplication

    I (the QuickLZ dude) have developed a data deduplicating archiver, available at www.exdupe.com

    I think it's the first tool that does deduplication while being as simple to use as gzip and other command-line archivers.

    Data deduplication finds identical blocks across terabytes of input data, so the compression ratio increases dramatically if you compress things like .vmdk files that contain common application and system files.

    Also, if you compress, say, your Windows system drive, lots of .dll files contain common data areas. In a small test on the C drive of my own PC it outperformed WinRAR compression-ratio-wise, with a throughput of ~100 MB/s (disk I/O bound).

    It's in the last beta phase, so mail me any bugs you find!
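
    For readers new to the concept, here is a minimal sketch of the general idea behind block-level deduplication - an illustration only, not eXdupe's actual algorithm or on-disk format: split the input into blocks, remember each block the first time it appears, and replace later occurrences with references.

    Code:
    #include <algorithm>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Toy block-level deduplication: store a block's payload only the first
    // time it is seen; later occurrences become (offset, length) references.
    struct DedupStream {
        std::vector<char> store;                          // unique block payloads
        std::vector<std::pair<size_t, size_t>> refs;      // one (offset in store, length) per input block
        std::unordered_map<std::string, size_t> seen;     // block content -> offset of its stored copy

        void add(const char *data, size_t len) {
            const size_t BLOCK = 64 * 1024;               // fixed block size, purely illustrative
            for (size_t pos = 0; pos < len; pos += BLOCK) {
                size_t n = std::min(BLOCK, len - pos);
                std::string block(data + pos, n);         // toy: index by full content; real tools index by a strong hash
                auto it = seen.find(block);
                if (it == seen.end()) {                   // new block: append payload and remember it
                    size_t off = store.size();
                    store.insert(store.end(), block.begin(), block.end());
                    refs.push_back(std::make_pair(off, n));
                    seen.emplace(std::move(block), off);
                } else {                                  // duplicate: emit a back-reference only
                    refs.push_back(std::make_pair(it->second, n));
                }
            }
        }
    };

    A real deduplicator indexes blocks by a strong hash and keeps the index compact enough for terabyte inputs, then usually compresses the unique blocks afterwards - the sketch only shows where the ratio gain comes from.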

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    How does it compare to srep?

  3. #3
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    I haven't tested (s)rep. But the website says "rep finds repetitions at distances up to 1 GB", whereas eXdupe finds them terabytes/petabytes apart. Again, that's because eXdupe does deduplication and not regular dictionary compression.

    Also, the website says rep runs at 10-30 MB/s. eXdupe runs at around 100 MB/s per core and is multi-threaded.

  4. #4
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I guess there is no standard dedupe test corpus yet?
    I don't want to invent my own...

  5. #5
    Member zody's Avatar
    Join Date
    Aug 2009
    Location
    Germany
    Posts
    90
    Thanks
    0
    Thanked 1 Time in 1 Post
    Exdupe compresses much better than srep on its own - but after a second pass using e.g. LZMA, the srep-based archive ends up smaller than the other one.
    For backups at high speed exdupe seems to be the better choice, whereas srep allows the better compression ratio... but for a fair comparison exdupe's internal data compression needs to be turned off.

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    It's not necessarily internal compression. RZIP was able to compress sequences as short as 32 bytes and could find duplicates at any distance, although with decreasing efficiency. So far it looks like eXdupe implements a highly optimized variant of the RZIP algorithm.
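
    Roughly, the trick is a rolling hash: hash every 32-byte window of the input, remember where each hash was seen, and when a later window hits a known entry, verify it byte-for-byte and emit a long-distance match. Here is a small sketch of that idea - the window size, the hash and the "remember every position" table are illustrative choices, not necessarily what RZIP or eXdupe actually do:

    Code:
    #include <cstdint>
    #include <cstring>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Report 32-byte windows that repeat earlier content, at any distance.
    // The rolling hash is updated in O(1) when the window slides by one byte,
    // so the whole input is scanned in linear time. A real tool samples hash
    // positions instead of storing them all, to bound memory on huge inputs.
    std::vector<std::pair<size_t, size_t>> find_repeats(const unsigned char *buf, size_t len) {
        const size_t WIN = 32;                                   // minimum repeat length detected
        std::vector<std::pair<size_t, size_t>> matches;          // (earlier position, later position)
        if (len < WIN) return matches;

        const uint64_t BASE = 1099511628211ULL;                  // multiplier of the polynomial hash (mod 2^64)
        uint64_t pow_win = 1;                                    // BASE^WIN, used to drop the oldest byte
        for (size_t i = 0; i < WIN; i++) pow_win *= BASE;

        std::unordered_map<uint64_t, size_t> table;              // hash -> first window start with that hash
        uint64_t h = 0;
        for (size_t i = 0; i < WIN; i++) h = h * BASE + buf[i];  // hash of the first window

        for (size_t pos = 0; ; pos++) {
            auto it = table.find(h);
            if (it == table.end()) {
                table.emplace(h, pos);                           // remember the first occurrence
            } else if (std::memcmp(buf + it->second, buf + pos, WIN) == 0) {
                matches.push_back(std::make_pair(it->second, pos)); // verified repeat, however far apart
            }
            if (pos + WIN >= len) break;
            h = h * BASE + buf[pos + WIN] - pow_win * buf[pos];  // slide the window one byte to the right
        }
        return matches;
    }

    Turning the reported repeats into LZ-style (distance, length) references and extending them forwards/backwards is then straightforward; the point is that the hash table makes the distance between the two copies irrelevant.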

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    My own benchmark on a 98 GB VHD file (an image of my system drive):

    Code:
    Z:\vhd>read 7b97adb9-012e-11e0-b3cf-806e6f6e6963.vhd
    7b97adb9-012e-11e0-b3cf-806e6f6e6963.vhd 98gb: time 198.707619 seconds, speed 495.303245 mbytes/sec
    
    Z:\vhd>exdupe.exe 7b97adb9-012e-11e0-b3cf-806e6f6e6963.vhd vhd.exdupe
    
    COMPRESSED 98,420,528,640 bytes in 1 file(s) into 41,867,666,767 bytes
    
    Kernel Time  =    47.112 = 00:00:47.112 =  14%
    User Time    =   995.707 = 00:16:35.707 = 310%
    Process Time =  1042.819 = 00:17:22.819 = 325%
    Global Time  =   320.629 = 00:05:20.629 = 100%
    Global Time  =   218 seconds if output data are written to nul
    
    Memory stats:
            Page faults count       : [   351'460]
            Peak pagefile usage     : [1'384'370'176] bytes
            Peak virtual size       : [1'429'762'048] bytes
            Peak working set size   : [1'342'357'504] bytes
    I/O stats:
            Total reads : [          11'735]        Total read    : [  98'420'528'640] bytes
            Total writes: [          98'367]        Total written : [  41'867'666'767] bytes
            Total other : [              82]        Total other   : [           1'368] bytes
    
    
    Z:\vhd>exdupe.exe -R vhd.exdupe 1
    Kernel Time  =    82.696 = 00:01:22.696 =   4%
    User Time    =   377.085 = 00:06:17.085 =  22%
    Process Time =   459.781 = 00:07:39.781 =  26%
    Global Time  =  1709.974 = 00:28:29.974 = 100%
    
    Memory stats:
            Page faults count       : [    84'092]
            Peak pagefile usage     : [333'873'152] bytes
            Peak virtual size       : [375'717'888] bytes
            Peak working set size   : [218'316'800] bytes
    I/O stats:
            Total reads : [       3'259'533]        Total read    : [  71'342'667'432] bytes
            Total writes: [         750'891]        Total written : [  98'420'528'640] bytes
            Total other : [              81]        Total other   : [      29'232'354] bytes
    
    
    Z:\vhd>arc create a *.exdupe -t -m1 ; create a *.exdupe -t -m2 ; create a *.exdupe -t -m3 ; create a *.exdupe -t -m4
    Compressed 1 file, 41,867,284,269 => 37,974,401,582 bytes. Ratio 90.7%
    Compression time: cpu 830.39 secs, real 211.61 secs. Speed 197,849 kB/s
    Testing time: cpu 671.74 secs, real 93.61 secs. Speed 447,271 kB/s
    
    Compressed 1 file, 41,867,284,269 => 36,584,904,165 bytes. Ratio 87.3%
    Compression time: cpu 3876.77 secs, real 513.59 secs. Speed 81,519 kB/s
    Testing time: cpu 908.11 secs, real 122.52 secs. Speed 341,710 kB/s
    
    Compressed 1 file, 41,867,284,269 => 35,913,005,006 bytes. Ratio 85.7%
    Compression time: cpu 8596.70 secs, real 1112.21 secs. Speed 37,643 kB/s
    Testing time: cpu 2448.20 secs, real 315.24 secs. Speed 132,811 kB/s
    
    Compressed 1 file, 41,867,284,269 => 35,096,359,327 bytes. Ratio 83.8%
    Compression time: cpu 14347.33 secs, real 1841.05 secs. Speed 22,741 kB/s
    Testing time: cpu 2399.20 secs, real 309.43 secs. Speed 135,306 kB/s
    
    
    Z:\vhd>timer exdupe.exe -g4 7b97adb9-012e-11e0-b3cf-806e6f6e6963.vhd vhd.exdupe
    COMPRESSED 98,420,528,640 bytes in 1 file(s) into 41,194,031,439 bytes
    Global Time  =   321.939 = 00:05:21.939 = 100%

    SREP:

    Code:
    Z:\vhd>timer "C:\!\FreeArchiver\Compression\SREP\srep64i.exe" -m1f 7b97adb9-012e-11e0-b3cf-806e6f6e6963.vhd vhd.srep
    
    SREP 2.991 alpha (August 13, 2011): input 93861 mb, ram 6480 mb, -m1f -l512 -c512 -a4
    100%: 98,420,528,640 -> 55,214,609,228: 56.10%. Cpu 109 mb/s, real 103 mb/s
    
    Kernel Time  =   115.627 = 00:01:55.627 =   8%
    User Time    =  1408.907 = 00:23:28.907 = 109%
    Process Time =  1524.535 = 00:25:24.535 = 118%
    Global Time  =  1284.980 = 00:21:24.980 = 100%
    
    Z:\vhd>timer "C:\!\FreeArchiver\Compression\SREP\srep64i.exe" -d vhd.srep nul
    
    Cpu 429 mb/s, real 283 mb/s. Matches 0 976361 7616620, I/Os 0, RAM 0/9129, VM 0/0, R/W 0/0
    
    Kernel Time  =    21.715 = 00:00:21.715 =   6%
    User Time    =   218.573 = 00:03:38.573 =  65%
    Process Time =   240.288 = 00:04:00.288 =  72%
    Global Time  =   332.703 = 00:05:32.703 = 100%
    
    Z:\vhd>timer "C:\!\FreeArchiver\Compression\SREP\srep64i.exe" -d -mem256 vhd.srep nul
    
    Cpu 215 mb/s, real 140 mb/s. Matches 0 151862 16103900, I/Os 0, RAM 0/216, VM 0/8640, R/W 63040/63040
    
    Kernel Time  =    54.179 = 00:00:54.179 =   8%
    User Time    =   437.099 = 00:07:17.099 =  65%
    Process Time =   491.278 = 00:08:11.278 =  73%
    Global Time  =   671.725 = 00:11:11.725 = 100%
    
    Z:\vhd>arc a b1 *.srep -t -m1 ; a b2 *.srep -t -m2 ; a b3 *.srep -t -m3 ; a b4 *.srep -t -m4
    Compressed 1 file, 55,214,690,844 => 30,733,999,991 bytes. Ratio 55.6%
    Compression time: cpu 811.86 secs, real 201.86 secs. Speed 273,534 kB/s
    Testing time: cpu 602.74 secs, real 85.73 secs. Speed 644,069 kB/s
    
    Compressed 1 file, 55,214,690,844 => 28,932,018,682 bytes. Ratio 52.3%
    Compression time: cpu 3185.96 secs, real 460.34 secs. Speed 119,942 kB/s
    Testing time: cpu 782.35 secs, real 112.83 secs. Speed 489,382 kB/s
    
    Compressed 1 file, 55,214,690,844 => 27,607,687,070 bytes. Ratio 50.0%
    Compression time: cpu 6932.39 secs, real 997.61 secs. Speed 55,347 kB/s
    Testing time: cpu 1933.54 secs, real 263.33 secs. Speed 209,682 kB/s
    
    Compressed 1 file, 55,214,690,844 => 26,452,847,627 bytes. Ratio 47.9%
    Compression time: cpu 15606.14 secs, real 2014.77 secs. Speed 27,405 kB/s
    Testing time: cpu 1891.96 secs, real 248.56 secs. Speed 222,140 kB/s
    Last edited by Bulat Ziganshin; 9th September 2011 at 22:58.

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    On my 2600K @ 4.6 GHz it processed 450 MB/s (close to HDD speed) while utilizing the CPU at 60-70%. Given faster HDDs it should be able to reach 700 MB/s.

  9. #9
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Hi,

    How much memory is needed for decompression?
    Will there be support for incremental backups in the future?

  10. #10
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    539
    Thanks
    192
    Thanked 174 Times in 81 Posts
    Quote Originally Posted by thometal View Post
    Will there be support for incremental backups in the future?
    According to the website (I don't have a 64-bit OS/VM installed to test it myself), incremental backups (diffs) are possible using the -D switch.
    http://schnaader.info
    Damn kids. They're all alike.

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    also:

    Code:
    Z:\vhd>timer slug c vhd.exdupe vhd.exdupe.slug
    Slug 1.27b (May 12 2008) - Copyright (c) 2007-2008 by Christian Martelock
    
    Crunching [vhd.exdupe] - 41,868,477,698 -> 37,368,982,628 (89.25%)
    
    Kernel Time  =    23.088 = 00:00:23.088 =   4%
    User Time    =   452.886 = 00:07:32.886 =  92%
    Process Time =   475.974 = 00:07:55.974 =  97%
    Global Time  =   489.953 = 00:08:09.953 = 100%
    
    Z:\vhd>srep64i vhd.srep
    100%: 55,184,163,152 -> 98,420,528,640: 56.07%. Cpu 268 mb/s, real 20 mb/s
    
    Kernel Time  =   106.283 = 00:01:46.283 =   2%
    User Time    =   350.409 = 00:05:50.409 =   7%
    Process Time =   456.692 = 00:07:36.692 =   9%
    Global Time  =  4726.300 = 01:18:46.300 = 100%
    i.e. without future-lz, srep decompression is much slower than exdupe's
    Last edited by Bulat Ziganshin; 10th September 2011 at 13:36.

  12. #12
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Quote Originally Posted by schnaader View Post
    According to the website (I don't have a 64-bit OS/VM installed to test it myself), incremental backups (diffs) are possible using the -D switch.
    Incremental backups != differential backups.


    If you only do diffs, you need to restore from all the backup files.

    But if you back up incrementally, you only need to restore from the incremental backups.

    Incremental files are more than one merged diff file.


    Will it be possible to restore single files from a backup file?
    Last edited by thometal; 10th September 2011 at 11:23.

  13. #13
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by thometal View Post
    Incremental backups != differential backups.


    If you only do diffs, you need to restore from all the backup files, but if you back up incrementally, you only need to restore from the incremental backups. Incremental files are more than one merged diff file.


    Will it be possible to restore single files from a backup file?
    eXdupe supports differential backups, so a restore depends on just one diff file and one full file. Intermediate diff files aren't used for the restore.

    It also supports restoring individual files/directories, and you can specify several of them at once.

    And thanks for the benchmark, Bulat! Seems like there's not very much redundancy on that drive...

  14. #14
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by thometal View Post
    Hi,

    How much memory is needed for decompression?
    Will there be support for incremental backups in the future?
    Decompression needs around 256 MB, regardless of the data size and the memory flags used during compression. It already supports diff backups (again, with a byte-granularity sliding window, not some lame whole-file or fixed-offset block method). Not sure if I'll implement incremental backups. Also not sure about synthetic diff backups.

  15. #15
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Oh, and I'm looking for somebody to create a nice GUI + job scheduling + etc. for Windows that communicates with this console version. It's not making any money yet (still in beta), so it would be a cooperation/partnership.

  16. #16
    Member BetaTester's Avatar
    Join Date
    Dec 2010
    Location
    Brazil
    Posts
    43
    Thanks
    0
    Thanked 3 Times in 3 Posts
    [Attached image: Vssapi.png (16.4 KB)]
    WinXp Pro x64, version 2003 Sp2

    Exdupe uses the Microsoft Volume Shadow Copy Service API (Vssapi.dll)
    http://encode.ru/threads/1286-A-Microsoft-study-on-deduplication

  17. #17
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    It's written in its changelog:
    0.19 Supports snapshots through Volume Shadow Copy Service (Windows only).
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  18. #18
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    I did some manual tests:

    Input:
    55,173,527,965 bytes Windows Server 2008 R2 Std. boot disk (rar store)

    Output:
    26,672,880,413 bytes 8 min lzturbo -11 -m8
    25,539,531,344 bytes 8 min lzturbo -21 -m8
    24,261,130,529 bytes 41 min srep -m1
    21,828,783,130 bytes 8 min lzturbo -41 -m8
    21,177,509,823 bytes 12 min arc zip
    20,500,444,889 bytes 16 min slug
    20,466,885,179 bytes 40 min rar -m1
    20,338,770,203 bytes 9 min lzturbo -51 -m8
    19,294,373,373 bytes 18 min sx
    18,153,840,910 bytes 55 min zpaq -m1
    18,034,737,268 bytes 19 min lzturbo -53 -m8
    17,606,969,343 bytes 9 min arc -2
    17,275,940,088 bytes 21 min bsc -m3
    17,187,041,086 bytes 89 min zpaq -m2
    16,679,700,910 bytes 25 min bsc -m0 -f
    16,485,960,007 bytes 35 min bsc -m0
    16,412,222,232 bytes 5 min exdupe
    16,278,594,728 bytes 21 min bsc -m4
    16,167,142,667 bytes 29 min bsc -m6
    16,112,325,941 bytes 25 min bsc -m5
    15,900,789,088 bytes 22 min arc -3
    13,724,023,705 bytes 5+4=9 min exdupe & arc -m2
    9,385,075,289 bytes 41+5=46 min srep & arc -m2

    Exdupe does a very good and quick job, well done!

    Is this kind of software also behind that newly launched Bitcasa infinite storage service?
    http://blog.bitcasa.com/69884449
    Last edited by Sportman; 14th September 2011 at 17:58.

  19. #19
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Quote Originally Posted by Lasse Reinhold View Post
    Oh, and I'm looking for somebody to create a nice GUI + job scheduling + etc. for Windows that communicates with this console version. It's not making any money yet (still in beta), so it would be a cooperation/partnership.
    I could put some work into this, but it would be written in Java on the Eclipse RCP platform. Also, I cannot promise that I can work a fixed amount of time on it every week, because I currently don't have much spare time.
    Last edited by thometal; 14th September 2011 at 22:39.

  20. #20
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    37
    Thanked 168 Times in 84 Posts
    I also have a problem when trying to run exdupe on XP x64 SP2. Same problem as mentioned by BetaTester in message #16.

  21. #21
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    exdupe crashes under Windows on a file with the name "★ text für danksagung bei trauerfall............ Sonstiges (Plauderecke) im WWW.CHEFKOCH.DE Forum.url", maybe because of the star.

  22. #22
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by Sportman View Post
    Is this kind of software also behind that newly launched Bitcasa infinite storage service?
    http://blog.bitcasa.com/69884449
    Today I found a partial answer:

    "So if I upload a file and Marissa uploads the same file, do you store two different copies of that or one?"

    "TG: No, we do de-duplication on the server side. So we actually determine on the server side if it's there, and if it's already there, we don't have to upload it again."

    http://techcrunch.com/2011/09/18/bit...ns-encryption/

    Their explanation of how they can do this with encrypted files when different keys are used is not clear to me...

  23. #23
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    It's a simple trick: use a checksum as a key. This way different people will encrypt the same block in the same way.
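
    That trick is usually called convergent encryption: derive the key from the content itself, so identical plaintext always yields identical ciphertext and the server can dedupe blocks without seeing the data. A toy sketch of the idea - std::hash and an XOR keystream stand in for a real hash/cipher such as SHA-256/AES, and I'm not claiming this is what Bitcasa actually runs:

    Code:
    #include <cstdint>
    #include <functional>
    #include <string>

    // Toy convergent encryption: the key is derived from the block's content,
    // so two users encrypting the same block independently produce the same
    // ciphertext and the server can store it once. Demonstration only.
    std::string encrypt_convergent(const std::string &plaintext, uint64_t *key_out) {
        uint64_t key = std::hash<std::string>{}(plaintext);   // content-derived key (a real scheme would hash with SHA-256)
        *key_out = key;                                        // the uploader keeps the key to decrypt later
        std::string ct = plaintext;
        for (size_t i = 0; i < ct.size(); i++)                 // XOR "keystream" derived from the key
            ct[i] = static_cast<char>(ct[i] ^ ((key >> (8 * (i % 8))) & 0xff));
        return ct;                                             // identical plaintext -> identical ciphertext
    }

    Decryption is the same XOR with the saved key. And because the mapping is deterministic, anyone holding the same file can produce the same ciphertext - which is exactly the privacy drawback discussed a couple of posts further down.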

  24. #24
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Well, that somewhat sucks. If you have files A and B, where file B is file A with one additional byte prepended, then after encryption those files would not be deduplicable - at least if the block boundaries are content-independent. If block boundaries are data-dependent (like in Shelwien's variant of SREP), then inserting data at any place won't break the whole deduplication process. Hmmm, probably that's what they are doing. But then we also need to store and transmit the block lengths.
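
    A tiny sketch of such data-dependent boundaries (content-defined chunking; the hash and the cut-point mask below are arbitrary illustrative choices): a chunk ends whenever a rolling hash of the most recent bytes hits a fixed bit pattern, so an inserted byte only moves the boundaries in its immediate neighbourhood.

    Code:
    #include <cstdint>
    #include <vector>

    // Content-defined chunking sketch: cut whenever a hash of the recent bytes
    // matches a bit pattern. Cut points depend only on local content, so an
    // insertion near the start shifts just the nearby boundaries and the rest
    // of the chunks still deduplicate. The mask gives ~8 KB average chunks.
    std::vector<size_t> chunk_boundaries(const unsigned char *buf, size_t len) {
        const uint64_t MASK = (1u << 13) - 1;   // 13 one-bits -> a cut roughly every 8192 bytes
        std::vector<size_t> cuts;
        uint64_t h = 0;
        for (size_t i = 0; i < len; i++) {
            h = (h << 1) + buf[i];              // crude rolling hash; old bytes shift out of the 64-bit state
            if ((h & MASK) == MASK) {           // data-dependent cut point
                cuts.push_back(i + 1);
                h = 0;
            }
        }
        cuts.push_back(len);                    // end of the last (possibly short) chunk
        return cuts;
    }

    Real chunkers also enforce minimum and maximum chunk sizes, and as noted above, the per-chunk lengths do have to be stored alongside the hashes.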

  25. #25
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    The majority of dedup appliances use fixed 4k-128k blocks. Only the LZ-like ones do better, but those have scalability problems.

    Their scheme has a bigger drawback: if they have some file, they can check whether you have it too. So "we can't tell what you have" is only a half-truth. Don't put MP3s that you downloaded from the net on it.

  26. #26
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Well, if you applied additional encryption on top, you would be safe (is my grammar good?)

  27. #27
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by thometal View Post
    exdupe crashes under Windows on a file with the name "★ text für danksagung bei trauerfall............ Sonstiges (Plauderecke) im WWW.CHEFKOCH.DE Forum.url", maybe because of the star.
    Phew, eXdupe 0.30 now supports such Unicode filenames. That required a large rewrite on Windows. Tip: support Unicode from the beginning!

    On Linux/Mac/etc. it's much simpler because the API uses UTF-8, so you don't need any changes and can keep your char *, strlen, etc. as you are used to. On Windows you need to use the L prefix on constant strings, the wchar_t type and wstring, and change API names and tons of C string function names.

  28. #28
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Lasse, it may be simpler to provide a UTF-8 API on Windows:

    Code:
    /* filename is UTF-8 on all platforms */
    FILE *my_fopen(const char *filename, const char *mode)
    {
    #ifdef _WIN32
        wchar_t real_filename[MAX_PATH], real_mode[16];
        MultiByteToWideChar(CP_UTF8, 0, filename, -1, real_filename, MAX_PATH);
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, real_mode, 16);
        return _wfopen(real_filename, real_mode);
    #else
        return fopen(filename, mode);
    #endif
    }

  29. #29
    Programmer
    Join Date
    May 2008
    Location
    denmark
    Posts
    94
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    Lasse, it may be simpler to provide a UTF-8 API on Windows:

    Code:
    /* filename is UTF-8 on all platforms */
    FILE *my_fopen(const char *filename, const char *mode)
    {
    #ifdef _WIN32
        wchar_t real_filename[MAX_PATH], real_mode[16];
        MultiByteToWideChar(CP_UTF8, 0, filename, -1, real_filename, MAX_PATH);
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, real_mode, 16);
        return _wfopen(real_filename, real_mode);
    #else
        return fopen(filename, mode);
    #endif
    }
    I had tons of operations on filenames and pathnames and used std::string almost everywhere instead of char *. That worked fine with Japanese etc. on *nix, which uses UTF-8, but failed on Windows.

    So in VC++ I chose the "Unicode" project settings and wrote the attached unicode.h, which I include in every source file.

    Then I changed char into CHR, constant strings like "hello" into TT("hello"), std::string into std::STRING, strlen into STRLEN and so on.

    Yes, I know that VS provides macros for much of this, and also conditionally appends W to API names. But those macros are only intended for ANSI-vs-Unicode on Windows; they are not intended for Windows-vs-*nix and don't handle _wfopen, std::string, strlen, vfprintf and many others. So I might just as well re-invent the wheel as above.

    I then created two helper functions to read/write strings:

    Code:
    // Read a length-prefixed UTF-8 string from the archive and return it as the
    // platform string type (UTF-16 wstring on Windows, UTF-8 string elsewhere).
    STRING Cio::readstr(FILE *_File)
    {
        char tmp[4096];
        memset(tmp, 0, sizeof(tmp));
        size_t t = read32(_File);                // payload length in bytes, as written by writestr()
        try_read(tmp, t, _File);                 // read the UTF-8 payload

    #ifdef WINDOWS
        wchar_t tmp2[4096];
        memset(tmp2, 0, sizeof(tmp2));
        MultiByteToWideChar(CP_UTF8, 0, tmp, -1, tmp2, 4096);   // UTF-8 -> UTF-16
        return STRING(tmp2);
    #else
        return STRING(tmp);
    #endif
    }


    // Convert the platform string to UTF-8 and write it length-prefixed, so the
    // archive format is the same on Windows and *nix.
    size_t Cio::writestr(STRING str, FILE *_File)
    {
        char tmp2[4096];

    #ifdef WINDOWS
        memset(tmp2, 0, sizeof(tmp2));
        size_t t = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), -1, tmp2, 4096, 0, 0);  // UTF-16 -> UTF-8
    #else
        size_t t = str.length();
        memcpy(tmp2, str.c_str(), t);
    #endif

        size_t r = write32((unsigned int)t, _File);
        r += try_write(tmp2, t, _File);
        return r;
    }

    Phew. I might sum it all up in another thread and provide some source code if anybody is interested. Indeed a very time-consuming problem...

    To sum it up, you can now compress a Japanese filename on Windows and decompress it on Linux, and vice versa
    Attached Files: unicode.h
    Last edited by Lasse Reinhold; 4th October 2011 at 13:56.

  30. #30
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    372
    Thanks
    26
    Thanked 22 Times in 15 Posts
    Under Fedora 15 x64 I get a segfault using 0.30 with -g2 -t2, compressing .mp3, .jpg, .flv, .java and many documents. If I compress the directory on which the compression failed separately, it does not crash.

    Error opening source file '/home/thomas/Documents/Studium/Eclipse Workspace/.metadata/.plugins/org.eclipse.core.resources/.history/bf/108383195b03001e1a49f42aaf336ec1'
    *** glibc detected *** ./exdupe: double free or corruption (!prev): 0x0000000004054630 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x30ede7703a]
    /lib64/libc.so.6(fclose+0x155)[0x30ede66e45]
    ./exdupe[0x42ec85]
    ./exdupe[0x438f8b]
    ./exdupe[0x43a660]
    ./exdupe[0x43b633]
    ./exdupe[0x43b633]
    ./exdupe[0x43b633]
    ./exdupe[0x43b633]
    ./exdupe[0x43b633]
    ./exdupe[0x43b633]
    ./exdupe[0x43b633]
    ./exdupe[0x43bc79]
    ./exdupe[0x43d93a]
    /lib64/libc.so.6(__libc_start_main+0xed)[0x30ede2139d]
    ./exdupe[0x4041e9]
    ======= Memory map: ========
    Aborted (core dumped)
    With option -c I still get:

    Skipped '/home/thomas/Documents/Studium/Eclipse Workspace/.metadata/.plugins/org.eclipse.core.resources/.history/bf/108383195b03001e1a49f42aaf336ec1' (error opening)
    Skipped '/home/thomas/Documents/Studium/Eclipse Workspace/.metadata/.plugins/org.eclipse.core.resources/.history/bf/b0f140df1e08001e1738a1ee2f76ef2d' (error opening)
    Segmentation fault (core dumped)
    Last edited by thometal; 5th October 2011 at 02:12.
