Page 1 of 2
Results 1 to 30 of 32

Thread: New GUI tool

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,954
    Thanks
    359
    Thanked 332 Times in 131 Posts

    New GUI tool

    For quite a long time I have had the idea of creating a new GUI tool, essentially a sort of "Army Knife" for the data compression scene.

    For example, when testing my new compressors I compress and decompress a file, and to check the integrity of the decompressed file I ZIP both files using WinRAR and compare the two CRCs.

    To solve that problem, I could write a GUI program for computing CRCs - you would just drop both files onto a window to see the file sizes and CRCs.

    Currently I am just collecting ideas and a feature list. For example, apart from the standard CRC32 the program may have CRC16, ADLER32, MD5, its own hash function, etc.
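
    The compress/decompress check described above can be sketched without a GUI. This is only an illustration in Python using the standard zlib and hashlib modules; the function names file_checksums and same_content are made up for this example and are not part of any planned tool.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return size, CRC32, Adler-32 and MD5 of a file, reading it in chunks."""
    size = 0
    crc = 0
    adler = 1  # Adler-32 starts from 1 by definition
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)        # running CRC32
            adler = zlib.adler32(chunk, adler)  # running Adler-32
            md5.update(chunk)
    return {
        "size": size,
        "crc32": f"{crc & 0xFFFFFFFF:08X}",
        "adler32": f"{adler & 0xFFFFFFFF:08X}",
        "md5": md5.hexdigest().upper(),
    }


def same_content(path_a, path_b):
    """True if both files agree on size and on every checksum."""
    return file_checksums(path_a) == file_checksums(path_b)
```

    Calling same_content(original, decompressed) after a round trip gives the same answer as comparing the CRCs shown by an archiver, with the size and two more checksums as an extra safety margin.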

    The program may also have extra features such as data compression/encryption, including some sort of deduplication, a built-in hex editor, a file detector (including header reading and analysis), etc.

    Any ideas?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    It doesn't make sense to do that just for crcs - for that a .bat script calling md5sum would be enough,
    and there're GUI programs for hash calculation anyway.

    In a way, I have a similar idea about a compression IDE, but it's more like an extended shell than a real GUI.
    cmd.exe syntax is too simple - at the very least, I'd like stream redirection support that allows implementing something
    like the 7zip bcj2 filter.

    Then, it'd require a lot of archiver functionality,
    binary file parsing driven by a provided structure description
    (see http://www.sweetscape.com/010editor/, http://flavor.sourceforge.net),
    support for file editing and comparison, and various visualization features.

    The main problem with this is that many required features are not readily available
    as convenient modules (even plain archiving w/o compression), and once they're
    available, you'd be able to use them even without such an IDE.

  3. #3
    The Founder encode's Avatar
    Awesome feedback!

    Anyway, check this out:
    Attached thumbnail: chk.png (306.1 KB)

  4. #4
    Member VoLT's Avatar
    Join Date
    Mar 2010
    Location
    Moscow, Russia
    Posts
    20
    Thanks
    2
    Thanked 1 Time in 1 Post
    nice, very simple GUI

  5. #5
    The Founder encode's Avatar
    It's much more than meets the eye.

    Currently it has:
    • Full Unicode support
    • Full Drag&Drop support - you may drop multiple files and folders. Folders will be fully scanned. You may even scan full drives.
    • Duplicate files are marked with a new icon, and a file copy shows the full path to the original file (currently in the "Copy" column of the list)
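
    The folder scan and the "Copy" column from the list above could work roughly like this. It is a sketch, not the app's actual code: SHA-1 as the content key is an assumption (the post does not say which checksum is used for matching), and scan and sha1_of_file are hypothetical names.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(paths):
    """Scan files and folders; return rows of (path, digest, copy_of)."""
    first_seen = {}  # digest -> path of the first file with that content
    rows = []
    for root in paths:
        if os.path.isdir(root):
            # Folders are walked recursively, like dropping a folder or drive.
            files = sorted(
                os.path.join(d, name)
                for d, _subdirs, names in os.walk(root)
                for name in names
            )
        else:
            files = [root]
        for path in files:
            try:
                digest = sha1_of_file(path)
            except OSError:
                rows.append((path, None, None))  # unreadable file
                continue
            original = first_seen.setdefault(digest, path)
            copy_of = original if original != path else None
            rows.append((path, digest, copy_of))
    return rows
```

    Each row's third field plays the role of the "Copy" column: it is empty for the first file with a given content and holds the original's full path for every later copy.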


    Still working on it...

  6. #6
    Administrator Shelwien's Avatar
    Cool. I'm sure that the app is big enough to be used as a test sample.

    Anyway, here's a modular coder from 2003, something related to what that
    "compression IDE" could do: http://nishi.dreamhosters.com/u/dc-kit2.rar
    (beware of 16-bit executables)

    Here's a script which implements Distance Coding:

    Code:
    set a=book1
    
    if not exist "%1" exit
    rmdir /s /q temp
    mkdir temp
    copy /b %1 temp\%a%
    cd temp
    set path=..\bin
    
    BWT %a% %a%.bwt
    lng %a%.bwt %a%.lng
    frq %a%.lng %a%.frq
    rem fixed book1.lng -> book1.dc1 
    dc1_c
    lng-asc %a%.frq %a%.dc
    frq %a%.dc1 %a%.frq
    lng-asc %a%.frq %a%.asc
    addfile %a%.dc %a%.asc
    lng-ari %a%.dc1 %a%.ari
    addfile %a%.dc %a%.ari
    
    rem book1.dc is the result
    lng - convert file of bytes into array of uint32s
    frq - compute a frequency table for given array of uint32s
    lng-asc - compress an array of uint32s (freqtable model, but anything goes)
    lng-ari - combinatoric arithmetic encoder, needs a frequency table for decoding
    addfile - add a file (and its size) to archive
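
    To make the first two modules concrete, here is a rough Python sketch of what "lng" and "frq" do according to the descriptions above. The actual file layout of the dc-kit tools is not given in this post, so one little-endian uint32 per input byte is an assumption, as is the helper read_uint32s.

```python
import struct
from collections import Counter


def lng(data: bytes) -> bytes:
    """Widen each byte to a little-endian uint32 (assumed on-disk layout)."""
    return struct.pack(f"<{len(data)}I", *data)


def read_uint32s(blob: bytes) -> list:
    """Read a blob written by lng() back into a list of integers."""
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))


def frq(symbols) -> dict:
    """Frequency table: how often each uint32 symbol occurs."""
    return dict(Counter(symbols))
```

    Working on uint32 arrays is what lets later stages emit symbols outside the 0..255 range, at the cost of every module having to handle a 32-bit alphabet.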

    Then there're backward transforms for these and plenty of other modules, including "lng-ppm"
    and various bitcodes (Huffman etc.), most working with these 32-bit tables.

    Well, I had the same idea at that time basically - to make lots of components for
    various transforms, and make it possible to try out various configurations in .bat scripts.

    But then...
    - People couldn't even run my scripts for various reasons, and hated to read them.
    - It was hard to write algorithms for a 32-bit alphabet
    - The focus changed to other types of components which work at single symbol level (mixers etc)

    So I quit it (along with .asm programs which people seemed to hate as well)
    and started posting C/C++ coders like I still do.

    But maybe you have an idea what to change to make it useful?

  7. #7
    The Founder encode's Avatar
    Well, scripts are not for the masses for sure. GUI with tools or something...

    Anyway, I'm testing my new app. It began as a file comparer and now it looks like a duplicate file finder - I scanned my drives and surprisingly found many copies of the same files.

    Currently I'm thinking of adding a second checksum/hash. Along with CRC32, what kind of hash should the new program have?

    MD4, MD5, SHA1, TIGER, ... or my own hash (like a 64-bit FNV hash or something)
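
    For reference, a 64-bit FNV hash fits in a few lines. The sketch below uses the FNV-1a variant with the published 64-bit offset basis and prime; picking FNV-1a rather than FNV-1 is an assumption, since the post only says "64-bit FNV".

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h
```

    It is fast and simple, but it is not a cryptographic hash: it protects against accidental mismatches, not against deliberately crafted collisions.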

  8. #8
    Administrator Shelwien's Avatar
    That starts to sound like http://www.whereisit-soft.com/product.html

  9. #9
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I vote for MD5, and of course byte-by-byte comparison.

    Gosh, Win7 has blurry fonts.
    Attached thumbnail: chk+.png (188.4 KB)
    Last edited by chornobyl; 6th November 2010 at 03:23.

  10. #10
    The Founder encode's Avatar
    Looks like MD5 is not the best choice. MD5 has well-known flaws. MD6, TIGER, SHA256 and SHA1 are under testing...

    And this is what I've got now:
    Attached thumbnail: 1.jpg (97.2 KB)

  11. #11
    The Founder encode's Avatar
    CRC32 is much weaker than I expected. MD5 and the others are slow.

    So, I designed my own 64-bit hash. It's fast (close to CRC32) and looks great! Currently it's under heavy testing...

  12. #12
    The Founder encode's Avatar
    My idea for how to handle duplicate files:

    The file icon in the list indicates its state:

    Green icon with OK sign - File has no duplicates

    Black icon - File has one or more duplicates

    Grey icon - This file is a duplicate (a copy of a previously seen file)

    Red icon with the Cancel sign - An error occurred (unable to open file)
    Attached thumbnail: new.jpg (181.8 KB)

  13. #13
    The Founder encode's Avatar
    The screen shot looks blurry inside the forum. (At least under my Opera. Downloading this file solves that issue)

  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    on FF it looks ok

  15. #15
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    Looks blurred in both Opera and FF for me ;p after downloading it still looks blurred...i'm guessing encode is talking about the text inside the window? the only thing clear is the window borders and buttons (minimise/X etc) and the text in the title bar.

  16. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Regarding hashes, if your goal is to positively identify identical files by comparing the hashes only, then the fastest and shortest hash to do this is probably SHA-1, which is 160 bits. All well known 128 bit hashes (MD4, MD5, Tiger, RIPEMD-128) have known collisions. 128 bits should otherwise be enough. It's just that the well known algorithms have weaknesses. You could probably use 128 bits of SHA-1 or SHA-256, throw away the rest and still be safe.
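
    Keeping only 128 bits of a longer digest, as suggested in the last sentence, is a one-liner with Python's hashlib. The function name truncated_digest and its parameters are just for illustration.

```python
import hashlib


def truncated_digest(data: bytes, algorithm: str = "sha1", bits: int = 128) -> bytes:
    """Return the first `bits` bits of the chosen digest (bits must be a multiple of 8)."""
    full = hashlib.new(algorithm, data).digest()
    if bits % 8 != 0 or bits > len(full) * 8:
        raise ValueError("bits must be a multiple of 8 and fit in the digest")
    return full[: bits // 8]
```

    For example, truncated_digest(data) gives a 16-byte SHA-1 prefix and truncated_digest(data, "sha256") gives a 16-byte SHA-256 prefix; either can be stored and compared exactly like an MD5 value.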

  17. #17
    The Founder encode's Avatar
    Added SHA1.

    Changed icon behavior:

    Light gray icon - Regular file with no duplicates
    Black icon - This file has a matched copy.
    Green icon with OK sign - File copy. (Correctly decompressed file)
    Red with Cancel sign - Error

    Check out the screen shot:
    Attached thumbnail: chk.png (244.5 KB)

  18. #18
    Member Sanmayce's Avatar
    Join Date
    Apr 2010
    Location
    Sofia
    Posts
    57
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi Matt and Encode,

    I would like to read your opinion: 'Could 2 (or 3) very fast non-cryptographic hashes with many collisions compete (regarding speed) with a far, far stronger one?'
    I mean the case where the total time of 1st+2nd[+3rd] is less than the SHA1 time, for example.
    My basic idea is to calculate them in parallel (dedicated threads). I am interested not in files as keys but in strings (mainly alphabetic, non-numeric, up to 960 chars).
    Matt, could you please suggest an appropriate hash function for such a task?

    And also, could 2 (or 3) weak hash values compete (regarding collisions) with 1 strong hash value?
    I haven't benchmarked SHA1; the question is in fact a practical one (and at the same time a matter of principle).
    I disregard the obvious handicap: 160 bits vs 3x32 bits.
    Last edited by Sanmayce; 10th November 2010 at 18:33.

  19. #19
    The Founder encode's Avatar
    You are comparing apples to oranges. A Secure Hash Algorithm and a hash for string searching are two different things with two different goals. A hash algorithm can be selected and optimized for specific data. An example is multiplicative hashing with parameters, such as the multiplier constant, brute-force optimized for the given data.
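
    A minimal sketch of that kind of tuning, assuming a simple 32-bit multiplicative string hash and a fixed set of sample keys. The function names and the hash formula are illustrative choices, not a specific published algorithm.

```python
def mul_hash(key: bytes, multiplier: int, bits: int) -> int:
    """Multiplicative hash in 32-bit arithmetic, reduced to `bits` bits (1..32)."""
    h = 0
    for byte in key:
        h = ((h + byte) * multiplier) & 0xFFFFFFFF
    return h >> (32 - bits)  # keep the top bits of the 32-bit state


def count_collisions(keys, multiplier: int, bits: int) -> int:
    """Number of keys that land in an already occupied slot."""
    seen = set()
    collisions = 0
    for key in keys:
        slot = mul_hash(key, multiplier, bits)
        if slot in seen:
            collisions += 1
        seen.add(slot)
    return collisions


def best_multiplier(keys, candidates, bits: int) -> int:
    """Brute force: return the candidate with the fewest collisions on `keys`."""
    return min(candidates, key=lambda m: count_collisions(keys, m, bits))
```

    The multiplier found this way is only as good as the sample: it is tuned to the given keys and says nothing about resistance to deliberately constructed collisions, which is the job of a secure hash.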

  20. #20
    The Founder encode's Avatar
    As a note, 160 bits = 5 x 32 bit ints...

  21. #21
    Expert
    Matt Mahoney's Avatar
    Combining insecure hashes doesn't give you a secure hash. BTW here are some benchmarks for enwik9 using Slavasoft fsum on a 2 GHz T3200 (32 bit Vista). Secure hashes don't need to be slow.

    adler 1.8 sec (not secure)
    crc32 5.0 sec (not secure)
    md4 4.3 sec (broken)
    md5 5.1 sec (broken)
    ripemd 10.8 sec (broken)
    tiger 12.2 sec (broken)
    sha1 5.5 sec (weak, see http://en.wikipedia.org/wiki/SHA-1#SHA-1 )
    sha256 13.3 sec (secure)
    sha512 141.3 sec (secure)

    My libzpaq implementation of SHA1 is 18 sec for some reason
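
    A comparable timing run can be sketched with Python's standard library. This does not reproduce fsum or the numbers above: it only covers algorithms that zlib and hashlib are guaranteed to provide (so no MD4, RIPEMD or Tiger), and time_checksums is a made-up helper name.

```python
import hashlib
import time
import zlib


def time_checksums(data: bytes) -> dict:
    """Return the seconds each checksum takes over the same in-memory buffer."""
    candidates = {
        "adler32": lambda d: zlib.adler32(d),
        "crc32": lambda d: zlib.crc32(d),
        "md5": lambda d: hashlib.md5(d).digest(),
        "sha1": lambda d: hashlib.sha1(d).digest(),
        "sha256": lambda d: hashlib.sha256(d).digest(),
        "sha512": lambda d: hashlib.sha512(d).digest(),
    }
    timings = {}
    for name, func in candidates.items():
        start = time.perf_counter()
        func(data)
        timings[name] = time.perf_counter() - start
    return timings
```

    Reading the test file into memory first (for example, open("enwik9", "rb").read()) keeps disk speed out of the measurement; results still depend heavily on the CPU and on how each library implements the algorithm.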

  22. #22
    Programmer Bulat Ziganshin's Avatar
    on my 3.2 ghz core2, crc32 from 7-zip is 2gb/sec (!)

  23. #23
    The Founder encode's Avatar
    Broken or not, these hashes (MD5, SHA1, CRC) are too popular to ignore, and a file checker/validator should support them. Anyway, we should wait for SHA3...

    In any case, SHA1 is a really good choice - fast and secure enough to work in practice, at least for file verification.

    And since CHK compares all the known hashes (CRC, SHA1, MD5), it's secure enough for sure...

  24. #24
    Member Sanmayce's Avatar
    Thanks for the replies,

    Encode, I just wanted to draw attention to hashes stronger than the current fastest multiplicative ones. I thought my question was kind of silly, but while oranges and apples are from different families (there is no definitive answer on whether they can't mix), one of my favorite fruits is a mix between mandarin and lemon, the so-called minneola.

    Thanks Matt,
    "Secure hashes don't need to be slow."
    I don't know a thing about secure (or any other than multiplicative) hashes, which is why I am asking you. My goal is to combine ideas from different kinds of hashers into some hybrid/crossbreed (borrowing techniques not necessarily from secure hashes) and eventually use it as a strengthened fast multiplicative hash.
    Bulat, it would be interesting to compare 'CRC32' versus 'murmur' and my new FNV1A variants: Whiz/Smaragd/Peregrine/Nefertiti.

    A PDF booklet containing C sources (with the corresponding 32-bit instructions generated by VS2010):

    http://www.sanmayce.com/Downloads/Ha...13-HASHERS.pdf

    My new heavy hash test (written by Peter):

    http://www.sanmayce.com/Downloads/_KAZE_hash_test_r2.7z

    Just found something suitable for benchmarking, in C:

    http://rhash.anz.ru/
    http://homes.esat.kuleuven.be/~bosselae/ripemd160.html
    Last edited by Sanmayce; 15th November 2010 at 19:03. Reason: New revisions

  25. #25
    The Founder encode's Avatar
    Added a "Copy Info" feature. It's a really important feature, just what I was missing. Select a file from the list and press "Copy Info" or Ctrl+C, and the file info will be copied to the clipboard:

    Something like:

    File: en_windows_7_ultimate_x64_dvd_x15-65922.iso Size: xxx bytes SHA1: 326327CC2FF9F05379F5058C41BE6BC5E004BAA7 CRC: 1F1257CA

    I haven't decided on the layout yet (feel free to suggest one).

    Thus you will be able to post such info on the forum or save it somewhere. For example, if you compress a file and need the compressed size, just drop the file onto CHK's window and press Ctrl+C! Then paste that info into the benchmark table.

    Added a "Sort" ("Sort By SHA1" or "Compare") feature. All files will be sorted by SHA1, and identical files will be marked with the Black Square icon instead of the Light Grey one (making SHA1 matches visually easier to find).

    Added a "Delete" command for deleting the selected file.

  26. #26
    Member
    Join Date
    Nov 2010
    Location
    fr
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    hi
    It would be nice to:
    sort by the column you want (double click on the column, or right click column -> sort, or via a menu)
    save the content of the window in ASCII form

  27. #27
    The Founder encode's Avatar
    Added the column sort feature. Now if you click on a column header (Name, Size, SHA1, ...), the list will be sorted by Name, Size, SHA1, ... Also, files with a matching SHA1 will be marked with a special icon (a black icon instead of a light gray one).

    The "Copy Info" feature now copies the file info to the clipboard in the following manner:

    File Name: BOOK1
    Size: 768,771 bytes
    SHA1: 673C583D45544003EB0EDD57F32A683B3C414A18
    CRC: 24E19972

    Yet another example:

    File Name: world95.txt
    Size: 2,988,578 bytes
    SHA1: D840A0C10D84B536D91F7838F72F747BFB7011EE
    CRC: 35098A94
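
    The four-line layout shown in the two examples can be reproduced like this in Python. It is a sketch of the text format only, not CHK's code, and copy_info is a hypothetical name; the clipboard step is left out.

```python
import hashlib
import zlib


def copy_info(name: str, data: bytes) -> str:
    """Format file info in the 'File Name / Size / SHA1 / CRC' layout."""
    return "\n".join([
        f"File Name: {name}",
        f"Size: {len(data):,} bytes",  # thousands separated by commas
        f"SHA1: {hashlib.sha1(data).hexdigest().upper()}",
        f"CRC: {zlib.crc32(data) & 0xFFFFFFFF:08X}",
    ])
```

    The size uses comma grouping and both checksums are upper-case hex, matching the examples above.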


    In addition, I'm working on a "Save As..." command that will save the complete file list with info to a plain text file.

    Just two questions.

    Should that file be in Unicode plain text format? Or is an ASCII text file okay?

    The output layout. I think it should be compatible with checksum parsers. But which layout should I use?
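
    One layout that existing checksum tools already parse is the one printed by GNU sha1sum and md5sum: the hex digest, a space, a "*" marking binary mode, then the file name, one file per line. The sketch below writes such a list; choosing this layout and UTF-8 as the encoding are assumptions made for illustration, and the function names are made up.

```python
import hashlib


def checksum_lines(entries):
    """Yield 'digest *name' lines in the style printed by sha1sum."""
    for name, data in entries:
        yield f"{hashlib.sha1(data).hexdigest()} *{name}"


def save_list(path, entries):
    """Write the checksum list as UTF-8 plain text with LF line endings."""
    # UTF-8 keeps non-ASCII file names intact while staying plain text.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in checksum_lines(entries):
            f.write(line + "\n")
```

    For pure ASCII file names the resulting file is byte-for-byte identical to an ASCII one, so UTF-8 only makes a difference when a name needs it.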

  28. #28
    Administrator Shelwien's Avatar
    > Should that file will be in Unicode plain text format? Or ASCII text file is okay?

    utf8 plaintext?

  29. #29
    The Founder encode's Avatar
    Kept only MD5 and SHA1:
    Attached thumbnail: chk-md5.png (144.7 KB)

  30. #30
    The Founder encode's Avatar
    Should I keep MD5 only, or offer MD5 and SHA1 separately (as a hash option)? A second hash makes CHK two times slower! (Today I switched to VCL components, to avoid reinventing the wheel, and VCL is so slow...)
