
Thread: bijective fpaq0p_sh

  1. #1
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Talking bijective fpaq0p_sh

    I get in these working moods. I have thought about this a lot but have often started and stopped, so today I worked on it. First of all, it's not really about the code: I modified arb255 to become the bijective version. It will compress 99.999% of all files smaller than fpaq0p_sh does. I left all the bells and whistles of arb255 alone and used a 64-bit state instead of the 32 bits the original used. The output looks nothing like fpaq0p_sh's, but it should compress the vast majority of files smaller than fpaq0p_sh does. The method I used was to calculate P the way the original fpaq0p_sh does and then convert it to a ONE/ZERO count, at which point it just feeds into arb255. The arb255 code is also included. It may not be the best at handling errors, since it was only slightly cleaned up for MinGW.
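
    Roughly, that glue looks like the sketch below; the names and constants are hypothetical stand-ins, not the attached code, and the exact fpaq0p scale and update rate may differ.

    Code:
    #include <cstdint>

    // fpaq0p-style bit model: p is P(bit == 1), scaled so that 4096 means certainty.
    struct Fpaq0pModel {
        uint32_t p = 2048;                     // start at P(1) = 0.5
        void Update(int bit) {                 // shift-based adaptation in the spirit of fpaq0p
            if (bit) p += (4096 - p) >> 5;     // (exact scale/rate in the real code may differ)
            else     p -= p >> 5;
        }
    };

    // Stand-in for arb255's coder interface; the real coder takes ONE/ZERO counts.
    struct BijectiveCoder {
        void EncodeBit(int bit, uint32_t ones, uint32_t zeros);
    };

    // The bijective coder only ever sees counts, so its machinery is untouched;
    // the counts simply mirror the ratio given by p.
    void CodeBit(Fpaq0pModel& m, int bit, BijectiveCoder& coder) {
        uint32_t ones  = m.p;                  // proportional to P(1)
        uint32_t zeros = 4096 - m.p;           // proportional to P(0)
        coder.EncodeBit(bit, ones, zeros);
        m.Update(bit);
    }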

    book1, book1bwts, book1unbwts and book1bwt are all 768,771 bytes long.
    The first three files all carry the same information and can be transformed into each other.
    The last file, book1bwt, is missing the information needed to get back to the others: it needs an index.
    So even though it sometimes compresses smaller than the others, one must not forget the file space needed for that index.

    That said, arb255 compresses all these files to 435,125 bytes.

    The other two coders are nonstationary and compress to various lengths,
    sometimes smaller than what arb255 produces and sometimes larger. However,
    it is thought by most that for useful files fpaq0p_sh will compress better
    than arb255, and I have to agree that seems true.

    First the old fpaq0p_sh length and then the arb0p_sh_32 length:

    file          fpaq0p_sh   arb0p_sh_32
    book1           441,139       441,131
    book1bwts       241,458       241,450
    book1unbwts     443,487       443,479
    book1bwt        241,455       241,448

    Note that arb255 beats fpaq0p_sh on book1unbwts.
    Note also that although book1bwt looks better than book1bwts, it really is not, since you still need the index for book1bwt. For various files they alternate as to which is best.


    However, arb0p_sh_32 will most likely beat fpaq0p_sh on any file normally encountered. If you can find a counterexample, let me know. Also, this was quick and dirty, so if there are any errors let me know.
    Attached Files

  2. #2
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    How does bijective AC compare to the classic algorithm speed-wise, from a theoretical and practical standpoint?

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Well, any non-carryless bitwise rc/ac can be fixed up to be bijective.
    It mainly affects the handling of head/tail, also maybe some random permutation of probability intervals
    to stabilize the mapping (otherwise you're likely to get a few GBs by decoding a 100-byte zero file).
    So I guess it would be the same or a little slower.

    It's also possible, but a little harder for bytewise rcs, because bijectivity isn't compatible with little
    quirks like "+1" in "(qword(cumFreq)*range)/totFreq + 1", which makes it more troublesome
    to correctly implement "GetFreq" ( cumFreq <= (qword(code)*totFreq)/range < cumFreq+freq ).
    Btw, the usual version with rounding ( (range/totFreq)*cumFreq ) is also no good because it
    wastes a small interval near 1.0.
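
    For illustration, a minimal sketch of the two interval mappings being contrasted (hypothetical helper names, renormalization and carry handling omitted):

    Code:
    #include <cstdint>
    typedef uint64_t qword;

    // Exact mapping: the per-symbol intervals tile [0, range) with no gaps, so every
    // code value decodes to exactly one symbol, which is what bijectivity needs.
    void EncodeExact(qword& low, qword& range, uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
        qword lo = (qword(cumFreq) * range) / totFreq;
        qword hi = (qword(cumFreq + freq) * range) / totFreq;
        low  += lo;
        range = hi - lo;
    }

    // Rounded mapping: simpler and common, but (range/totFreq)*totFreq is usually less
    // than range, so a small slice near 1.0 is never produced by the encoder; those
    // unreachable code values are exactly what breaks the one-to-one property.
    void EncodeRounded(qword& low, qword& range, uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
        qword r = range / totFreq;
        low  += r * cumFreq;
        range = r * freq;
    }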

    There're actually some applications for this - coding of short records, coding with restricted alphabet,
    steganography, error correction.
    I can't say that David's work is related to these, though ;)

  4. #4
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by willvarfar View Post
    How does bijective AC compare to the classic algorithm speed-wise, from a theoretical and practical standpoint?
    What I coded is much slower than it needs to be; however, even if I optimized it for speed it could never theoretically match the underlying AC method. From a practical standpoint, there is no free lunch: squeezing out those extra few bytes is going to cost complexity and time.

    However, to better match the underlying compressor, which uses 32 bits for high and low, I could have used 32 bits; but I hate going smaller than 64 bits.

  5. #5
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Shelwien View Post
    Well, any non-carryless bitwise rc/ac can be fixed up to be bijective.
    It mainly affects the handling of head/tail, also maybe some random permutation of probability intervals
    to stabilize the mapping (otherwise you're likely to get a few GBs by decoding a 100-byte zero file).
    So I guess it would be the same or a little slower.

    It's also possible, but a little harder for bytewise rcs, because bijectivity isn't compatible with little
    quirks like "+1" in "(qword(cumFreq)*range)/totFreq + 1", which makes it more troublesome
    to correctly implement "GetFreq" ( cumFreq <= (qword(code)*totFreq)/range < cumFreq+freq ).
    Btw, the usual version with rounding ( (range/totFreq)*cumFreq ) is also no good because it
    wastes a small interval near 1.0.

    There're actually some applications for this - coding of short records, coding with restricted alphabet,
    steganography, error correction.
    I can't say that David's work is related to these, though
    Short files and uses in encryption have always been my main driving force.
    In fact, years ago I asked Matt if he wanted to use a bijective coder for one
    of his fpaq programs. What it would do on decompression is check, when it ends,
    whether the ending matches his coded length. Or, if it starts to decompress
    past the coded length, you could stop and write an error message.

    One does not normally try to decompress plain text or files of all zero bytes. I have not tested this with my version of bijective fpaq0p_sh. However, I did test "decompressing" book1 with it: I think it grew to a little over 6 million bytes, and compressing that got back to book1. The fact that it can decompress any file and then compress it back is simply what bijective compressors can do; it shows that no space is wasted in the compressed file. It's not something you would do every day. One could always limit the size of a decompressed file and have the program abort or give an error message.

    Go ahead and decompress a file of 100 zero or one bytes; it will likely not blow up too large, even though I have not tested this. I used what I call my stability output so that normal-looking text does not cause this blow-up. However, if you hit it with the right random-looking stream, it will decompress to a massive file.

  6. #6
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I don't want to encourage crazy thinking but here's a crazy thought anyway:

    1) if you can decompress any data and recompress it and get the same input because it's bijective, then...

    2) you could 'decompress' bijectively a source document; you get a lot of bytes

    3) then reverse it

    4) and then apply some encryption to it

    5) and then compress it; you likely get something a bit bigger than the original input

    6) for all its inefficiencies it would be extremely slow to brute force any password on it

    7) although perhaps not as good a tradeoff as encrypting normally then streaming the output backwards through it

    encryption should obviously only be invented by experts
    Last edited by willvarfar; 14th November 2011 at 11:25.

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by willvarfar View Post
    I don't want to encourage crazy thinking but here's a crazy thought anyway:

    1) if you can decompress any data and recompress it and get the same input because it's bijective, then...

    2) you could 'decompress' bijectively a source document; you get a lot of bytes

    3) then reverse it

    4) and then apply some encryption to it

    5) and then compress it; you likely get something a bit bigger than the original input

    6) for all its inefficiencies it would be extremely slow to brute force any password on it

    7) although perhaps not as good a tradeoff as encrypting normally then streaming the output backwards through it

    encryption should obviously only be invented by experts

    Of course you're right: see how well it worked for the Germans by having only their elite experts working on encryption. Enigma worked so well for them by only using EXPERTS.

    Also, I'm glad you seem to understand better than Claude Shannon that unicity distance and low-entropy-per-byte coding really mean so little in encryption. Thanks for the education.

    Bijective compression of long files, and repeated whole-file use of BWTS or UNBWTS with intervening whole-file passes of bijective encryption, would be something the learned experts should stay away from.

  8. #8
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Is it just me, or is David's English hard to understand?

    David, do you mean that willvarfar's method is good or bad?

  9. #9
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Well, I usually understand what I wrote if it's fresh, but after a day or two it becomes less clear. I don't think I know enough about willvarfar's method to like it; it's not complete enough. I don't like step 4 followed by step 5. As a general rule: compress, then encrypt. Of course, I am sure one could find exceptions. I don't trust so-called encryption experts.

    I think of most encryption experts as full of hot air, since it's hard to test what they do until a big war or the like. I trust compression experts more, since you can check their work with your own files. Besides, with encryption, governments have a vested interest in misleading and lying to the public to keep them in the dark.

  10. #10
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by willvarfar View Post
    I don't want to encourage crazy thinking but here's a crazy thought anyway:

    1) if you can decompress any data and recompress it and get the same input because it's bijective, then...

    2) you could 'decompress' bijectively a source document; you get a lot of bytes

    3) then reverse it

    4) and then apply some encryption to it

    5) and then compress it; you likely get something a bit bigger than the original input

    6) for all its inefficiencies it would be extremely slow to brute force any password on it

    7) although perhaps not as good a tradeoff as encrypting normally then streaming the output backwards through it

    encryption should obviously only be invented by experts
    Wouldn't it be the same if you removed steps 2 and 3? Or do you mean to encrypt the "decompressed" file? In that case, the decompressed file could be impractically huge. I mean, with any decent model a 1 TB file of zero bytes would compress down to a few bytes.

    Besides being impractical, it would weaken the encryption because the attacker has more data to work with and knows that a correct decryption should have low entropy (even if the original data was random). The correct way to make a brute force key search hard is to compress (preferably bijectively to reduce known plaintext) and then encrypt.

    As for "experts", the reason for using standard encryption algorithms like AES is that they are well tested, because there is no other way to "prove" that an algorithm is secure. The only way to know is for lots of people to attack it and fail. If you design your own algorithm, that doesn't happen. There is a long history of home brewed crypto that looked bulletproof to the developer, but was quickly broken once released. Many brains are smarter than one.

    As for conspiracy theories about the NSA or CIA having secret backdoors and such, AES was designed by Belgian cryptographers (it was originally called Rijndael) and analyzed by lots of crypto experts who tend to be libertarian, distrust governments (especially the U.S.), and have a strong incentive to publish any weaknesses they find, because it would enhance their reputations.

  11. #11
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Yup, adding redundancy is a bad idea unless that redundancy is pure noise. I think we can make cipher breakage harder by applying lossy text compression. For example, we can omit or introduce some letters, or sometimes use wrong letters. We can also permute the letter order a little. It has been shown that the human brain doesn't care much about the order of the inner letters of words if the text is being read quickly, at least if the positions of the letters don't change much (say, not more than 2-3 positions).

  12. #12
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I was trolling a bit, I apologise

    it would be exceedingly slow to brute force though

  13. #13
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Yup, adding redundancy is a bad idea unless that redundancy is pure noise.
    I think you meant "extra information"

    Quote Originally Posted by Piotr Tarsa View Post
    I think we can make cipher breakage harder by applying lossy text compression. For example, we can omit or introduce some letters, or sometimes use wrong letters. We can also permute the letter order a little. It has been shown that the human brain doesn't care much about the order of the inner letters of words if the text is being read quickly, at least if the positions of the letters don't change much (say, not more than 2-3 positions).
    Lossless compression breaks such properties too. And encryption algorithms are designed to be robust against known plaintext attacks anyway, so removing basic information this way is meaningless.

  14. #14
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I think you meant "extra information"
    I meant: regular structure.

    Loosening the constraints on valid-looking text will cause many more false positives on single blocks during a brute-force attack.

    How do known-plaintext attacks relate to my proposal? In my proposal the attacker doesn't know in advance where losses were introduced by the lossy compression algorithm (assuming, of course, that the lossy compressor is nondeterministic; otherwise using lossy compression would make less sense).

  15. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,612
    Thanks
    30
    Thanked 65 Times in 47 Posts
    How do known-plaintext attacks relate to my proposal?
    The reasoning is that if knowing the full plaintext doesn't help, then knowing the language's vocabulary and grammar doesn't help either.

    Loosening the constraints on valid-looking text will cause many more false positives on single blocks during a brute-force attack.
    But this... well, it's just security through obscurity. If the cracker knows that you use such techniques, they are useless. But if they don't... well, I think that's an extremely rare case, but it might be somewhat helpful. Anyway, lossless compression would help just as well, if not better, in such a case.

  16. #16
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I haven't assumed obscurity anywhere. The cracker can know the nondeterministic lossy compression algorithm (but of course can't predict the random decisions made by that algorithm). My algorithm is based on the fact that brute-force crackers try all possible keys and then decompress/decrypt/de-whatever the data. If the recovered data looks like "#$^G3wg#q3#$" then the key is assumed to be wrong and rejected (so the brute-force attack must continue). I think that currently even strings like "Ti&hs_is a* e7axplme tex(t" are rejected, but a human can certainly retrieve the message "This is an example text" from that lossily compressed source.
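
    For illustration only, such a rejection step might be as crude as the following (hypothetical name, arbitrary threshold):

    Code:
    #include <cctype>
    #include <string>

    // Toy "looks like text" filter a brute-force cracker might apply to a candidate
    // decryption: keep only blocks that are mostly letters and spaces.
    bool LooksLikeText(const std::string& s) {
        if (s.empty()) return false;
        int good = 0;
        for (unsigned char c : s)
            if (std::isalpha(c) || c == ' ') ++good;
        return good * 10 >= static_cast<int>(s.size()) * 8;   // at least ~80% letters/spaces
    }
    // "#$^G3wg#q3#$" fails such a test and the key is discarded; a lossily mangled but
    // still human-readable string may or may not pass, which is the point being argued.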

  17. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by m^2 View Post
    I think you meant "extra information"


    Lossless compression breaks such properties too. And encryption algorithms are designed to be robust against known plaintext attacks anyway, so removing basic information this way is meaningless.
    Of course, bijective compression, or any compression, only helps against weak ciphers. But there is no reason to use weak ciphers when we have strong ciphers that are resistant to chosen plaintext attacks. If you are worried that the cipher you are using might be broken in the future (it happens), then encrypt it twice with two different algorithms. But really, there are much more practical risks like choosing an easy to guess key or a poor random number generator, or having the key stolen, or leaving a plaintext copy around somewhere, or many other things that have nothing to do with the strength of the algorithm.

  18. #18
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I find the concept that bijective fpaq0 can take any input, decompress it into some gibberish, and then compress that gibberish back into the original input fascinating. Obvious only in hindsight. I haven't thought of a thing to use it for, however.

    Until I thought a little about password cracking. Password hashes are designed to be slow to compute simply to slow crackers down; they use layer upon layer simply as a speed-break, rather than as stronger encryption as such.

    And I imagined that an attacker having to bijectively decompress every brute-force password attempt, only to get encrypted text to brute-force a second time, would be a fun speed-break of a similar nature; it could use a very large amount of disk space, at least!

    Although the idea of simply running back through an encrypted message a second time - rather more like password hashing speed breaks - is probably a far saner scheme.

    I'm just an armchair reader of popular-science accounts of cryptography, but even I can debunk my own ideas. However, I'll leave that entertainment as an exercise for the readers.
    Last edited by willvarfar; 15th November 2011 at 23:31.

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I haven't thought of a thing to use it for, however.

    I already said how to use it in this thread.
    One example is this: http://nishi.dreamhosters.com/u/marc_v1.rar
    It essentially runs rc _decoding_ on given input data to produce its restricted alphabet output,
    then rc encoding is used to restore the original data.
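
    Conceptually the round trip looks like the sketch below; the Model/Decoder/Encoder interfaces are hypothetical stand-ins, not marc_v1's actual code.

    Code:
    #include <cstdint>
    #include <vector>

    struct Model;                               // stand-in for a restricted-alphabet model
    struct Decoder {                            // stand-in rc decoder interface
        Decoder(const std::vector<uint8_t>& data, Model& m);
        bool Exhausted() const;
        int  DecodeSymbol();                    // returns a symbol of the restricted alphabet
    };
    struct Encoder {                            // stand-in rc encoder interface
        explicit Encoder(Model& m);
        void EncodeSymbol(int s);
        std::vector<uint8_t> Finish();
    };

    // "Decode" arbitrary bytes with a model over the restricted alphabet:
    // the output consists only of allowed symbols.
    std::vector<int> ToRestricted(const std::vector<uint8_t>& data, Model& m) {
        Decoder d(data, m);
        std::vector<int> out;
        while (!d.Exhausted()) out.push_back(d.DecodeSymbol());
        return out;
    }

    // Re-encode those symbols with the same model: because the coder is bijective,
    // this reproduces the original bytes exactly.
    std::vector<uint8_t> FromRestricted(const std::vector<int>& syms, Model& m) {
        Encoder e(m);
        for (int s : syms) e.EncodeSymbol(s);
        return e.Finish();
    }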

    > Until I thought a little about password cracking

    In case of password cracking, the same approach can be used for smart password bruteforcing,
    based on statistics of known passwords.

    In general, the same approach can be used for any kind of data generation.

    > having to bijectively decompress every brute force password attempt

    That doesn't really make any sense, because normally to slow down the bruteforcing you
    just add more iterations to the key function.

    Bijectivity is useful in another place - it's necessary to compress the data before encryption
    to prevent plaintext attacks based on data redundancy.
    But all normal (not bijective) compression methods add even easier options for plaintext attacks -
    for example, there're lots of constraints in a deflate block header which can be used to check
    whether the password is valid.
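
    A toy illustration of that last point, with decrypt() and Key as hypothetical stand-ins for whatever cipher is used, and the gzip container magic as the concrete example of a constrained header:

    Code:
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Key;                                 // hypothetical key type
    std::vector<uint8_t> decrypt(const std::vector<uint8_t>& ct,
                                 const Key& k, std::size_t nbytes);   // hypothetical cipher call

    // With a conventional format the attacker gets a cheap validity test: gzip members
    // start with 0x1f 0x8b 0x08 (deflate), so almost every wrong key is rejected after
    // decrypting just three bytes.
    bool KeyLooksRight(const std::vector<uint8_t>& ciphertext, const Key& k) {
        std::vector<uint8_t> head = decrypt(ciphertext, k, 3);
        return head.size() >= 3 && head[0] == 0x1f && head[1] == 0x8b && head[2] == 0x08;
    }
    // A bijective coder leaves no such fixed structure: every bit string is a valid
    // compressed stream, so the attacker has to model the decompressed data instead.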

    Still, the optimal use of compression with encryption implies that we'd have to decode
    the whole file to test a password, thus the bijectivity of entropy coding becomes kinda irrelevant,
    if we'd be able to check the plaintext after extracting a few 100s of bytes anyway.

    So BWT compression is probably the strongest in encryption sense (while LZ is weakest),
    because it forces you to undo the postcoding and BWT to access the plaintext.

  20. #20
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Shelwien View Post
    >
    ...

    Still, the optimal use of compression with encryption implies that we'd have to decode
    the whole file to test a password, thus the bijectivity of entropy coding becomes kinda irrelevant,
    if we'd be able to check the plaintext after extracting a few 100s of bytes anyway.

    So BWT compression is probably the strongest in encryption sense (while LZ is weakest),
    because it forces you to undo the postcoding and BWT to access the plaintext.
    Yes, I think we can almost agree. The idea of a longer unicity distance implies that the attacker is forced to deal with almost the whole file before there is even enough information for a solution. Part of hiding the info is that, when compression is used, the attacker cannot gain information by eliminating possible solutions after noting that certain passwords are impossible given the compression used; why give the attacker that info? Likewise, BWT is the obvious answer except for two problems. First, for most files even 100 bytes long it's impossible to do an UNBWT on the data, since most data files could not be the result of a BWT. Then there is the obvious index problem: it has to be a number in the range of the length of the file. Luckily, BWTS does not suffer from these two problems.

  21. #21
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    I may not be clear. I feel it's best to have different methods for different file types.

    But clearly, for a file of from a few thousand to a million bytes: suppose one pruned the tree in the bijective version of fpaq0p_sh and preset the values of P(,) from sample files. With the compression done, and excluding chosen-plaintext attacks (you defeat a chosen plaintext by injecting random data at the front and back of the file), then after the compression do the Scott Transform. The data will look about the same before and after this transform, but the information will be honestly spread through the entire file, which does not happen in CBC and other similarly advertised modes of encryption. At this point use AES if you wish, or just use Matt Timmermans' BICOM; I trust it.

    Actually, in the old days I used BICOM a lot; it is a bijective PPM plus full-block Rijndael compressor.

    I am still surprised it's on the web; it was good.

    http://www3.sympatico.ca/mt0000/bicom/


    His code does 4 things:

    1) compress with bijective PPM
    2) compress with bijective PPM and then encrypt using bijective full-block Rijndael with a password of your choice
    3) reverses 1 by decompressing
    4) reverses 2 by decrypting and then decompressing

    Now, if I were doing that today, I would want to compress and then do a BWTS or Scott Transform.

    You could do a 1),
    followed by a Scott Transform or BWTS or UNBWTS or similar,
    then do a 3),
    and now do a 2).

    The result, due to the bijective nature of the code, is the same as if you did the bijective PPM followed by the Scott Transform and then full-blown Rijndael (sketched below).
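
    The sketch (the function names are stand-ins for the BICOM steps listed above, not its real interface):

    Code:
    #include <cstdint>
    #include <string>
    #include <vector>

    using Bytes = std::vector<uint8_t>;

    Bytes ppm_c(const Bytes& x);                        // step 1: bijective PPM compress
    Bytes ppm_d(const Bytes& x);                        // step 3: bijective PPM decompress
    Bytes enc(const Bytes& x, const std::string& pw);   // step 2: PPM compress, then full-block Rijndael
    Bytes bwts(const Bytes& x);                         // whole-file BWTS / Scott Transform

    Bytes pipeline(const Bytes& plain, const std::string& pw) {
        Bytes a = ppm_c(plain);     // 1)
        Bytes b = bwts(a);          // Scott Transform of the compressed data
        Bytes c = ppm_d(b);         // 3) lossless on any input because the coder is bijective
        return enc(c, pw);          // 2)
    }
    // Since ppm_c(ppm_d(y)) == y for every y, the net effect equals
    // Rijndael(bwts(ppm_c(plain))): bijective PPM, then Scott Transform, then Rijndael.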

    Of course, since you're doing 3 calls of BICOM, it might be stronger just to pick 3 separate passwords
    and do 2) bwts 4) 2).

    This should be good enough for most casual encryptions in my view.
