Thread: Dict preprocessor

  1. #1
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    463

    Dict preprocessor

    Another interesting preprocessor for highly redundant text files is Bulat's DICT (http://www.haskell.org/bz/).
    FreeArc uses this DICT preprocessor for text-like files (in some modes).

    Tested on FP.LOG (20.617.071 bytes)

    timer dict -pt fp.log 1.dict -> 5.736.001 bytes (time 0.733s)

    paq8o10

    timer paq8o10 -7 fplog_paq8o10_7 fp.log
    20617071 -> 264985 (1984.70 sec)

    timer paq8o10 -7 1.dict fplog_dict_pre_paq8o10_7
    5736001 -> 223619 (516.85 sec)

    Even in paq8o10 -8 mode, the dict-preprocessed input gives a smaller file
    in about 25% of the time compared to running without the dict preprocessor !!
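As a quick check of the "about 25%" figure, here is a small Python sketch that uses only the sizes and times reported in this post. The helper name `compare` is purely illustrative, and adding the 0.733 s DICT pass to the total time is an assumption about how to count the whole pipeline.

```python
# Sizes (bytes) and times (seconds) copied from the paq8o10 runs in this post.
DICT_TIME = 0.733  # seconds reported for "dict -pt fp.log"

runs = {
    "-7": {"plain": (264985, 1984.70), "dict": (223619, 516.85)},
    "-8": {"plain": (263139, 2040.63), "dict": (223267, 518.68)},
}

def compare(plain, with_dict, dict_time=DICT_TIME):
    """Return (size ratio, time ratio) of dict+paq8o10 versus paq8o10 alone."""
    plain_size, plain_time = plain
    dict_size, paq_time = with_dict
    total_time = paq_time + dict_time  # preprocessing pass + compression pass
    return dict_size / plain_size, total_time / plain_time

for mode, r in runs.items():
    size_ratio, time_ratio = compare(r["plain"], r["dict"])
    print(f"paq8o10 {mode}: size {size_ratio:.1%}, time {time_ratio:.1%} of the plain run")
```

With these numbers the preprocessed runs land at roughly 84-85% of the size in roughly 25-26% of the time, even when the DICT pass itself is counted.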

    timer paq8o10 -8 fplog_paq8o10_8 fp.log
    20617071 -> 263139 (2040.63 sec)

    timer paq8o10 -8 fplog_dict_paq8o10_8 fp.dict
    5736001 -> 223267 (518.68 sec)

    Paq8o10t :
    timer paq8o10t -8 fplog_paq8o10t_8 fp.log
    20617071 -> 258587

    timer paq8o10t -8 fplog_dict_paq8o10t_8 fp.dict
    5736001 -> 236272 (416.76 sec)

    dict + paq8o10 outputs a smaller file than the new paq8o10t !
    Paq8o10t has some room for improvement !
    Maybe paq8o10t could use this preprocessor ;-)

    CMM4 v0.1f :
    timer cmm4 57 fp.log fplog_cmm4_1f_57.cmm4f
    Ratio: 426815/20617071 bytes (0.17 bpc) (Time: 16.07 s)

    timer cmm4 57 fp.dict fplog_dict_cmm4_1f_57.cmm4f
    Ratio: 357406/5736001 bytes (0.50 bpc) (Time: 5.33 s)
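Note that the two bpc values above are not directly comparable: the 0.50 bpc is computed against the 5,736,001-byte DICT output, not against the original fp.log. A minimal sketch that re-expresses both results relative to the original file (the helper name `bpc` is hypothetical, the byte counts are the ones reported above):

```python
ORIGINAL = 20617071  # fp.log, bytes
DICT_OUT = 5736001   # fp.dict, bytes

def bpc(compressed_bytes, input_bytes):
    """Bits of compressed output per input byte (character)."""
    return 8 * compressed_bytes / input_bytes

print(f"cmm4 on fp.log : {bpc(426815, ORIGINAL):.3f} bpc")
print(f"cmm4 on fp.dict: {bpc(357406, DICT_OUT):.3f} bpc (relative to fp.dict)")
print(f"dict + cmm4    : {bpc(357406, ORIGINAL):.3f} bpc (relative to fp.log)")
```

Measured against fp.log, the dict + cmm4 result is the lower of the two, which matches the smaller output file.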

    RZM 0.07h
    timer rzm c fp.log fplog_rzm007h_nr2.rzm
    20133kb -> 494kb (506420b, 2.46%), done.
    --> 506.427 bytes (rzm reports 7 bytes less !!)
    time = 19.235s

    timer rzm c fp.dict fplog_dict_rzm007h.rzm
    5601kb -> 421kb (431458b, 7.52%), done.
    -> 431.465 bytes (rzm reports 7 bytes less)
    time = 5.148s (4x faster !!)

    BALZ v1.13
    timer balz ex fp.log fplog_balz113.balz
    -> 551055 bytes in 20.904 sec

    timer balz ex fp.dict fplog_dict_balz113.balz
    -> 514806 bytes in 6.615 sec

    timer balz e fp.log fplog_balz113_e_nr2.balz
    -> 662.941 bytes in 2.777s

    timer balz e fp.dict fplog_dict_balz113_e.balz
    -> 543.768 bytes in 1.982s

    Brute CM v0.1d2
    timer bcm e fp.log fplog_bcm01d2.bcm
    -> 786.749 bytes in 1.388s

    timer bcm e fp.dict fplog_dict_bcm01d2.bcm
    -> 580.857 bytes in 1.186s

    BIT v0.2b
    timer bit02b a fplog_bit02b_mem9 -m lwcx -mem 9 -files fp.log
    -> 576.533 bytes in 26.498s

    timer bit02b a fplog_dict_bit02b_mem9_nr1 -m lwcx -mem 9 -files fp.dict
    -> 481.710 bytes in 9.110s

    7z 4.59 a2 -mx9
    fp.log -> 839.075 bytes
    fp.dict -> 606.485 bytes

    LPAQ 8e
    lpaq8e 7 -> 358267 bytes
    dict + lpaq8e -> 322625 bytes

    lpaq8e 7 fp.log fplog_lpaq8e_7.lpaq8e
    20617071 -> 358267 in 22.287 sec. using 390 MB memory

    lpaq8e 7 fp.dict fplog_dict_lpaq8e_7.lpaq8e
    5736001 -> 322625 in 7.426 sec. using 390 MB memory


    Slim 023d
    fplog_slim_o40_m1024.fb -> 343.637 bytes
    (the smallest file I could get for different orders)
    fplog_dict_prepr_slim_o9_m512.fb -> 319.601 bytes

    PPMonstr J
    fplog_ppmonstr_m900_o64.ppmm -> 355.722 bytes
    fplog_dict_preproc_ppmd_o10_m256.ppmm -> 321.429 bytes

    FP.LOG -order 10
    timer ppmonstr e -ffplog_ppmonstrJ_o10_m900.ppmdm -o10 -m900 fp.log
    Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
    fp.log:20617071 > 472084, 0.183 bpb, used: 13.6MB, speed: 1793 KB/sec
    Global Time = 11.232 = 00:00:11.232 = 100%

    FP.LOG -order 64 gives smaller output :
    timer ppmonstr e -ffplog_ppmonstrJ_o64_m1200.ppmdm -o64 -m1200 fp.log
    Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
    fp.log:20617071 > 355700, 0.138 bpb, used:621.0MB, speed: 913 KB/sec
    Global Time = 22.106 = 00:00:22.106 = 100%
    (PPMonstr reports 355700 bytes although the created file is 355722 bytes)

    FP.DICT -o10
    timer ppmonstr e -ffplog_dict_ppmonstrJ_o10_m500.ppmdm -o10 -m256 fp.dict
    Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
    fp.dict:5736001 > 321406, 0.448 bpb, used: 26.2MB, speed: 1061 KB/sec
    Global Time = 5.304 = 00:00:05.304 = 100%
    (PPMonstr reports 321406 bytes although the created file is 321.429 bytes)

    PPMonstr J outputs a smaller file using dict preprocessor; both compression time and memory usage decrease.

    FreeArc 0.50a (June 9 2008)

    timer arc a -mx fplog_arc5a5_mx_nr4 fp.log
    FreeArc 0.50 alpha (June 9 2008) updating archive: fplog_arc5a5_mx.arc
    Compressed 1 file, 20.617.071 => 527.623 bytes. Ratio 2.5%
    Compression time 1.76 secs, speed 11.696 kB/s. Total 2.11 secs
    All OK
    Global Time = 2.146 = 00:00:02.146 = 100%

    or a bit optimized :
    timer arc a -mdict:30m+ppmd:9:900mb fplog_arc5a5_opt fp.log
    FreeArc 0.50 alpha (June 9 2008) creating archive: fplog_arc5a5_opt
    Compressed 1 file, 20.617.071 => 489.353 bytes. Ratio 2.3%
    Compression time 1.12 secs, speed 18.356 kB/s. Total 1.22 secs
    All OK
    Global Time = 1.358 = 00:00:01.358 = 100%
    -> 489590 bytes in 1.35s !!

    Conclusion (only for FP.LOG !!)
    All tested compressors/archivers output a smaller file with DICT preprocessing. Speed is also increased, up to 4-5x faster for some compressors.
    Last edited by pat357; 22nd June 2008 at 01:39.

  2. #2
    Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    649
    Quote: Originally Posted by pat357
    BIT v0.2b
    timer bit02b a fplog_bit02b_mem9 -m lwcx -mem 9 -files fp.log
    -> 576.533 bytes in 26.498s

    timer bit02b a fplog_dict_bit02b_mem9_nr1 -m lwcx -mem 9 -files fp.dict
    -> 481.710 bytes in 9.110s
    Hmm... It seems BIT needs text filtering like that. Good job, Bulat. You always write very good preprocessors. Thanks, pat357 and Bulat.

  3. #3
    Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    649
    I want to share some other tests with ENWIK8.

    BIT 0.2b (-mem 8) 22,190,736 bytes (271.289 seconds)

    DICT 52,812,125 bytes (11.316 seconds)
    BIT 0.2b (-mem 8) + DICT 21,798,670 bytes (156.977+11.316 seconds)

    DICT (-P) 53,897,083 bytes (9.042 seconds)
    BIT 0.2b (-mem 8) + DICT (-P) 21,522,217 bytes (159.678+9.042 seconds)

    Tested on AMD Athlon 64 X2 Dual 4200+ (2.2 GHz), 1 GB RAM, WinXP SP3.
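To make the ENWIK8 numbers above easier to compare, here is a rough Python sketch that adds the DICT and BIT times together and expresses each run relative to plain BIT. The function name `summarize` and the table layout are just for illustration; the figures are the ones listed in this post.

```python
# ENWIK8 figures from this post: (output bytes, BIT seconds, DICT seconds)
results = {
    "BIT 0.2b":             (22190736, 271.289, 0.0),
    "DICT + BIT 0.2b":      (21798670, 156.977, 11.316),
    "DICT (-P) + BIT 0.2b": (21522217, 159.678, 9.042),
}

def summarize(results, baseline):
    """Map each run to (total seconds, speedup vs baseline, fraction of size saved)."""
    base_size, base_bit, base_dict = results[baseline]
    base_time = base_bit + base_dict
    summary = {}
    for name, (size, bit_time, dict_time) in results.items():
        total = bit_time + dict_time  # preprocessing + compression
        summary[name] = (total, base_time / total, 1 - size / base_size)
    return summary

for name, (total, speedup, saved) in summarize(results, "BIT 0.2b").items():
    print(f"{name:22s} {total:8.3f} s  {speedup:.2f}x  {saved:.2%} smaller")
```

Both DICT variants come out around 1.6x faster overall than plain BIT, with DICT (-P) giving the smaller final file.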
