
Thread: Random Data Compression - Why people are claiming or trying to develop?

  1. #1
    Member
    Join Date
    Jun 2014
    Location
    Bangalore
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Random Data Compression - Why people are claiming or trying to develop?

    What would be the applications and implications of such an algorithm if anyone found one?

    This is to ask why people so often claim magical compression, as if they had found it, without showing solid proof or publishing an executable.

    What would they gain if they solved it?

    regards,

    Chandrasekaran B

  2. #2
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Nothing

    Quote Originally Posted by bcs2k3 View Post
    What would be the applications and implications of such an algorithm if anyone found one? This is to ask why people so often claim magical compression, as if they had found it, without showing solid proof or publishing an executable. What would they gain if they solved it? regards, Chandrasekaran B
    They will get nothing, since it is impossible. And forget all your excuses about some super genius solving it. It can't be done, period. And if you want to claim otherwise .... PROVE IT!

  3. #3
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by bcs2k3 View Post
    What would they gain if they solved it?
    The fountain of youth?
    A perpetual motion machine?

    I dunno.

  4. #4
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by bcs2k3 View Post
    What would be the applications and implications of such an algorithm if anyone found one?
    Nothing; it would be stopped, because the (economic/security) impact would be too big.

    Quote Originally Posted by bcs2k3 View Post
    What would they gain if they solved it?
    Regret. If they did not stop voluntarily, they would be forced to, until death. In the best case, a government (via a multinational) buys the patent/invention (including a 'do not talk' hand sign) to make it disappear.
    Last edited by Sportman; 26th June 2014 at 12:53.

  5. #5
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by Kennon Conrad View Post
    A perpetual motion machine?
    I dunno.
    http://www.dailymotion.com/video/x1z...-to-clear_tech

  6. #6
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by bcs2k3 View Post
    What would be the applications and implications of such an algorithm if anyone found one? This is to ask why people so often claim magical compression, as if they had found it, without showing solid proof or publishing an executable. What would they gain if they solved it? regards, Chandrasekaran B
    Well, first of all, they may get sued, since I think there are enough patents that the corporations that own them would sue. So they stand to lose a lot of money fighting in court. That is, if they claim to have such a compressor. I don't think the lawyers involved could care less whether such a thing is possible or not. In fact, in modern-day law, truth and validity really have no meaning. Perception is all.

  7. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by bcs2k3 View Post
    What would be the applications and implications of such an algorithm if anyone found one? This is to ask why people so often claim magical compression, as if they had found it, without showing solid proof or publishing an executable. What would they gain if they solved it? regards, Chandrasekaran B
    Nothing, because the universe would instantly destroy itself.

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Most people that believe random compression is possible don't have enough math background to understand the counting argument. So they ignore or don't believe the proof. Understanding the proof requires some high school algebra.

    Nevertheless I urge the OP to learn the basic principles of data compression to avoid being booted from the forum for posting nonsense. http://mattmahoney.net/dc/dce.html
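
    For anyone who wants to see the arithmetic, here is a tiny Python sketch of the counting argument (purely illustrative; it just counts strings, it is not anybody's compressor):

    Code:
    # Counting argument: there are 2^N bit strings of length N, but only
    # 2^N - 1 bit strings of length 0..N-1. So no lossless compressor can
    # map every N-bit input to a strictly shorter output without collisions.
    def counting_argument(n: int) -> None:
        inputs = 2 ** n                          # all bit strings of length n
        shorter = sum(2 ** k for k in range(n))  # all strings of length < n, equals 2^n - 1
        print(f"N={n}: {inputs} inputs, only {shorter} possible shorter outputs "
              f"-> at least {inputs - shorter} input(s) cannot shrink")

    for n in (1, 8, 16):
        counting_argument(n)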

  9. #9
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Most people that believe random compression is possible don't have enough math background to understand the counting argument. So they ignore or don't believe the proof. Understanding the proof requires some high school algebra.
    How about a game to practice random compression skills?

    N-1 Questions

    It's like 20 questions. One person picks N binary digits, such as by flipping a coin. These are secret. The other person is allowed to ask N-1 yes-or-no questions, then guess all N secret digits.

    The game for N=1 is especially simple, because you skip the questions altogether.

    Anyone that can win consistently will receive a check for $1 billion USD.
    Last edited by nburns; 27th June 2014 at 07:41.
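
    A throwaway Python simulation of the game (illustrative only; the strategy shown, asking for the first N-1 bits and coin-flipping the last, is just one example, and no strategy can distinguish more than 2^(N-1) outcomes):

    Code:
    import random

    def play_round(n: int) -> bool:
        """One round of 'N-1 Questions': the guesser learns bits 0..n-2
        by asking n-1 yes/no questions, then must guess bit n-1 blindly."""
        secret = [random.randint(0, 1) for _ in range(n)]
        answers = secret[:n - 1]                   # n-1 questions reveal n-1 bits
        guess = answers + [random.randint(0, 1)]   # the last bit is a coin flip
        return guess == secret

    n, rounds = 8, 100_000
    wins = sum(play_round(n) for _ in range(rounds))
    print(f"win rate with {n - 1} questions for {n} bits: {wins / rounds:.3f}")  # ~0.5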

  10. #10
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    I guess it all depends on the meaning of compression. If data is truly random, it contains no useful information other than the file size, so it should be highly compressible. You could regenerate random data knowing only the file size. Yeah, it would be different, but if both are random, who really cares except lottery players?
    Last edited by Kennon Conrad; 27th June 2014 at 08:26.

  11. #11
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    who really cares?
    The person whose encryption key that was?

    Random data has its uses.

    One angle is that in the field of lossy compression, approximations using a PRNG are perhaps under-explored. There are lots of genetic-algorithm image compression schemes that give startlingly good artistic impressions of images in almost no space, but in rather large quantities of time. For hobby programmers with a lot of computer time, spending weeks making a slightly smaller Lena may be worth a gripping blog post.

  12. #12
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Kennon Conrad View Post
    I guess it all depends on the meaning of compression. If data is truly random, it contains no useful information other than the file size, so it should be highly compressible. You could regenerate random data knowing only the file size. Yeah, it would be different, but if both are random, who really cares except lottery players?
    I advanced a proposal for a lossy white noise compressor on an older thread that was based on the same principle: the compressor deletes the original, the decompressor generates white noise. If a person can tell the difference, then it wasn't white noise.

  13. #13
    Member
    Join Date
    Jul 2014
    Location
    Kenya
    Posts
    59
    Thanks
    0
    Thanked 1 Time in 1 Post
    http://encode.ru/threads/2006-Hello-...e-an-algorithm
    I wrote a second edit in the first post about 'random data' and 'uncompressible' data.

  14. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Are you talking about how to detect random data or about how to compress it?

  15. #15
    Member
    Join Date
    Jun 2014
    Location
    Ro
    Posts
    19
    Thanks
    4
    Thanked 3 Times in 3 Posts
    What those people claim is a contradiction, an extremely simplified example of "true is false". Claiming that data is random has its implications. One of those implications is that you shouldn't be able to describe that data in any shorter way than actually outputting the data. If you manage to compress the data, the previous statement about the data being random is nullified. The data merely "looks random".

  16. #16
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by AlexDoro View Post
    What those people claim is a contradiction, an extremely simplified example of "true is false". Claiming that data is random has its implications. One of those implications is that you shouldn't be able to describe that data in any shorter way than actually outputting the data. If you manage to compress the data, the previous statement about the data being random is nullified. The data merely "looks random".

    I think random data does not exist. In my opinion, such data cannot be compressed by any known pattern, but that does not make it uncompressible. Computer-generated random data is produced from clock frequencies and some complex factors. If you can find those factors, you can compress the data. If there is an algorithm to create random data, there is probably an algorithm to compress that random data.


    Believing in randomness is believing in coincidences. I don't think coincidences exist.
    Last edited by lunaris; 16th July 2014 at 19:32.

  17. #17
    Member
    Join Date
    Jun 2014
    Location
    Ro
    Posts
    19
    Thanks
    4
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by lunaris View Post
    I think random data does not exist.
    Here: http://www.ciphersbyritter.com/RES/RANDTEST.HTM
    Read about some of the attempts to define random data, about randomness tests, and about the difference between pseudorandom data and data that merely looks random.

    And let's not forget random.org. They post how they test their numbers. http://www.random.org/analysis/
    Last edited by AlexDoro; 16th July 2014 at 19:50. Reason: added random.org

  18. #18
    Member
    Join Date
    Jul 2014
    Location
    Kenya
    Posts
    59
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Matt Mahoney View Post
    Are you talking about how to detect random data or about how to compress it?
    I'm saying that 50% of the data in the data spectrum is 'compressible' and 50% is 'uncompressible'.
    That post describes in general what lies in the 'compressible' and the 'uncompressible' halves.

    'Random' connotes that any file with some characteristic, such as its length, can be generated by some method such as hardware info, and of all the potential files, in general 50% are compressible and 50% uncompressible, depending on whether the data has enough repetition or not (considering the potential file and its compressed version). This specifically takes into account skipping areas which are 'uncompressible' and having 'compressed' data concurrently within the result.

    (EDIT)
    Realistic randomness implies a 'random' sequence of a set length, if one is decided, or under some other constraint. A computer requires some kind of information, like hardware information used with an algorithm, to which it applies some math to produce a 'random' generated string.

    Compressing random data is 'doable' provided the string is 'compressible'.
    Last edited by SoraK05; 16th July 2014 at 20:09.

  19. #19
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    In the context of information theory, a good way to think of randomness, IMO, is as things that you don't know. In other situations in math and CS, randomness may be significant for other reasons, but in information theory and compression, I think of it like a placeholder for information not yet revealed. Sort of like a playing card that's face down. Since you don't know anything about it, you can't compress it.

  20. #20
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    What's always funny to me is that all these people claiming it show absolutely no understanding of compression (well, excluding BARF, of course).

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The vast majority of strings are random, meaning that for a chosen programming language, there is no program shorter than the string that outputs it. This is easy to see because for any length N, there are 2^N possible strings and only a small (relative to 2^N) number of valid shorter programs producing N bits of output.

    There is also no algorithm that tells you in general which of those 2^N strings are non random. If such an algorithm existed, then you could write a small program that output the first random string of length N, where N is some big number greater than the length of your program. That would contradict your assumption that the algorithm exists.

    A practical example would be an encrypted string of all zero bits. It would look random, and the only way you could tell the difference would be to guess the key.

    In practice, a lot of strings such as text and video are not random because they are generated by (in theory) computable processes. This is why compression works, when in theory, it shouldn't.

    It is sometimes still useful to have fast tests that usually detect random data. zpaq uses this test to store the data without bothering to try to compress it. The test is to look at the order 1 statistics left over from the fragmentation algorithm to see if they differ from random. The test is not perfect (it can't be), but it is usually an acceptable tradeoff of compression for speed.
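
    As a rough illustration of that kind of fast test, here is a generic order-0 byte-frequency heuristic in Python (simpler than the order-1 statistics described above, and not zpaq's actual code; the 7.9 bits/byte threshold is an arbitrary example):

    Code:
    import math, os
    from collections import Counter

    def looks_random(block: bytes, threshold: float = 7.9) -> bool:
        """Crude 'store, don't compress' heuristic: estimate the order-0
        entropy in bits per byte and treat anything close to 8 as
        (probably) incompressible."""
        if not block:
            return False
        counts = Counter(block)
        n = len(block)
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        return entropy >= threshold

    print(looks_random(bytes(65536)))           # all zero bytes -> False
    print(looks_random(os.urandom(65536)))      # random bytes -> almost always True
    print(looks_random(b"abababab" * 8192))     # repetitive data -> False (entropy = 1 bit/byte)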

  22. #22
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by SvenBent View Post
    What's always funny to me is that all these people claiming it show absolutely no understanding of compression (well, excluding BARF, of course).
    BARF, in my opinion, converts random data into a text format, because it adds an extra (text) extension to the file.
    Text formats have a lot of possibilities and are very compressible.


    Well, I don't like random-related things, because they are almost useless in data compression and computer science. Randomness is against synchronicity. Randomness depends on probability (and probability is not an absolute concept; it is an estimation of possibilities. Probability does not define the solution. There is no probability in the real world, because in the real world you have complete information. With complete information you have 100% predictability, although predictability is not perfect either).
    Last edited by lunaris; 17th July 2014 at 02:51.

  23. #23
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by lunaris View Post
    BARF, in my opinion, converts random data into a text format, because it adds an extra (text) extension to the file.
    Text formats have a lot of possibilities and are very compressible.
    I view BARF as more of a joke attempt that was used to point out a flaw in badly written rules for a compression contest. But basically it's just moving data from the file to the filesystem (I might, as usual, be totally wrong).

    Data-hiding tricks have been used in a few other "compressors" to cheat on compression results.

  24. #24
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by SvenBent View Post
    I view BARF as more of a joke attempt that was used to point out a flaw in badly written rules for a compression contest. But basically it's just moving data from the file to the filesystem (I might, as usual, be totally wrong).

    Data-hiding tricks have been used in a few other "compressors" to cheat on compression results.
    Well, I don't know. The extension on a pure random file must not exceed 1 byte per extension (I don't know the extension size on Windows filesystems, maybe 3 bytes?). And/or use a powerful text compressor. Still, I don't know if it works.
    Last edited by lunaris; 17th July 2014 at 03:05.

  25. #25
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    The vast majority of strings are random, meaning that for a chosen programming language, there is no program shorter than the string that outputs it. This is easy to see because for any length N, there are 2^N possible strings and only a small (relative to 2^N) number of valid shorter programs producing N bits of output.

    There is also no algorithm that tells you in general which of those 2^N strings are non random. If such an algorithm existed, then you could write a small program that output the first random string of length N, where N is some big number greater than the length of your program. That would contradict your assumption that the algorithm exists.

    A practical example would be an encrypted string of all zero bits. It would look random, and the only way you could tell the difference would be to guess the key.

    In practice, a lot of strings such as text and video are not random because they are generated by (in theory) computable processes. This is why compression works, when in theory, it shouldn't.

    It is sometimes still useful to have fast tests that usually detect random data. zpaq uses this test to store the data without bothering to try to compress it. The test is to look at the order 1 statistics left over from the fragmentation algorithm to see if they differ from random. The test is not perfect (it can't be), but it is usually an acceptable tradeoff of compression for speed.
    The concept of random that you describe here seems to center around Kolmogorov complexity and computability. It would require some amount of faith to believe that random numbers really exist, because you can't see them (seeing one seems to trigger the contradiction). To me, it seems simpler if you don't propose that some numbers are intrinsically random; I'm unconvinced that there is any construction that would make that concept well-defined and describe a consistent subset.

    It's simpler just to define random as unknown, for purposes of information theory.

    I think random describes other concepts in other contexts, too. One concept is the idea that a random function contains all other functions within it. That's what makes it good for things like quicksort.
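
    For example, a quicksort with a random pivot (a standard textbook sketch, not tied to any particular library):

    Code:
    import random

    def quicksort(a):
        """Quicksort with a random pivot: no fixed input ordering can
        consistently force the O(n^2) worst case, because the pivot
        choice is independent of how the data happens to be arranged."""
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        return quicksort(less) + equal + quicksort(greater)

    print(quicksort([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]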

  26. #26
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Algorithmic (Kolmogorov) randomness is nice because it is absolute and doesn't depend on what you know.

    BARF is a joke program intended to debunk claims of random compression. But it is also a real compressor. It uses 3 different algorithms:

    1. It tests if the input is one of the Calgary corpus files. If so, it assigns a 1 byte code to indicate which one.
    2. If not, it uses LZ77 with one byte codes indicating either a 1..32 byte literal or a match offset of 1..224. The match length is always 2. Some files can be compressed more than once like this (for example enwik9: http://mattmahoney.net/dc/text.html#7594 ).
    3. If LZ77 does not reduce the file size, then it removes the first byte of input and encodes it in the file name, making the name 4 characters longer.

    barf.exe has to contain a copy of the Calgary corpus in order to decompress. It is about 1 MB compressed with UPX.

    The source code contains a utility to convert a set of files into C++ source code strings that are then compiled with the rest of the program. I used this to make another version (barfest.exe) that compresses the million random digits file to 1 byte (or 0 bytes in 2 passes).
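
    A toy Python sketch of the idea behind the third trick (an illustration of moving information into the file name, not the actual BARF code; this version uses 2 hex characters per removed byte):

    Code:
    import os

    def fake_compress(path: str) -> str:
        """Move the first byte of the file into the file name. The file
        shrinks by one byte, but the 'saved' byte now lives in the name,
        so no information has actually been removed."""
        with open(path, "rb") as f:
            data = f.read()
        if not data:
            return path
        new_path = path + (".%02x" % data[0])   # first byte encoded in the name
        with open(new_path, "wb") as f:
            f.write(data[1:])
        os.remove(path)
        return new_path

    def fake_decompress(path: str) -> str:
        """Inverse: strip the hex suffix from the name and put the byte back."""
        base, suffix = path.rsplit(".", 1)
        with open(path, "rb") as f:
            data = f.read()
        with open(base, "wb") as f:
            f.write(bytes([int(suffix, 16)]) + data)
        os.remove(path)
        return base

    Repeating fake_compress enough times 'compresses' any file down to zero bytes, with all of the information sitting in an ever-longer file name.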

  27. #27
    Member
    Join Date
    Jul 2014
    Location
    Kenya
    Posts
    59
    Thanks
    0
    Thanked 1 Time in 1 Post
    The post I made with the algorithm shows a strategy for determining compressible/uncompressible areas.

    There are 2 possibilities for data, being 0 and 1.
    Each can store 1 piece of data. Where you have repetition of only one of them, you can simply use one of the two possibilities.
    As the data expands, the more repetition is found, the fewer potential combinations of bits are needed to store each piece of recognized string info. This can be static, shrinking each string to one of a series of static-length strings, or done in a variable manner using prefixes that will not be confused when reading left to right (like a code tree).


    There are 4 iterations of strings/lengths of data.
    They are mentioned in the initial area scans as degrees of combinations of the 4 possibilities of having 2 possible bits next to each other.
    The idea is that where both 0 and 1 are present in the data, one looks for prevalence of the 4 possibilities of combining them when looking at adjacent bits.

    A specific amount of prevalence is required in order to represent that data in less space and reduce the overall size of the data, giving the less prevalent data more info to be used, for example, and also taking into account the minimal database/code-tree info.

    An iteration having only 1 or 2 of the combinations in sequence will always give 100% compression, since either case will always be represented by less data than the current form (each can fit into 1 bit of the 2 possible).
    Iterations 3 and 4 are possible in a ratio of about 75/25, requiring less/more prevalence to be in the range of being compressible.
    That also takes into account, for 4, that it is data which is not classified at all as identifiable by iterations 1/2/3 for their presence.


    When one compresses data, in general with something like variable code-tree prefixes, the result will normally end up with the 4 combinations of 2 bits balanced at about 25% each, with no prevalence of one over another, and thus with no data left to shrink and compensate with the remainder.

    Across the spectrum of data, in principle, the split between data with enough prevalence to store a code tree, shrink the more repetitive/prevalent data, and gain compression, and data without it, is 50/50.

    The 100% from iterations 1 and 2, the 75% from 3, and the 25% from 4 together constitute exactly the 50% of the file spectrum with enough prevalence to be 'compressed'. The other data, the 25% of 3 and the 75% of 4, has balanced combinations and not enough prevalence/repetition to compress.


    When you compress a file in general with a tool like zip using a code tree, the shrunk result will fall into the 'uncompressible' data, where the data is balanced and dispersed. However, from a source file, one can have a 'looser' or 'tighter' compression, meaning that for the exact same data you can make the result smaller by capturing more repetition and making more use of the available prevalence of data.

    If you have a tool working on chunks of data, for example, it may capture the repetition but not be as effective as working on the file as a whole. The result from the chunks will be a larger 'uncompressible' output than the one from the whole file, which will be a smaller 'uncompressible' output.
    If you have a tool using static bit widths, like 8-bit combinations, and checking their frequency, this can be less effective for the file than 4-bit combinations, for example. The 8-bit one will have a larger 'uncompressible' result than the one using 4-bit combinations, which will be smaller.


    Tools have different compression strengths based on their string-detection capacity, for example, and in general on their ability to make use of all the available prevalence.

    When you compress a file, depending on its makeup, it is possible from the reshuffled result and the content of the data that it can be recompressed.
    You can find that zipping a file multiple times reduces its size at first. However, once this extra redundancy is exhausted, the file will only get larger and larger, as the sketch below illustrates.

    If one uses a stronger zip level, the result of this repeated process will be a smaller file, even with repeats, before it starts growing.
    Zip doesn't necessarily have the capacity to skip writing when no compression is achieved.
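
    A quick Python experiment along these lines, using zlib as a stand-in for zip (the exact sizes depend on the data and the level, so the numbers are only indicative):

    Code:
    import os, zlib

    def repeated_compress(data: bytes, passes: int = 5, level: int = 9):
        """Apply zlib repeatedly and report the size after each pass.
        Typically only structured data shrinks on the first pass; once
        the output looks random, every further pass adds overhead."""
        sizes = [len(data)]
        for _ in range(passes):
            data = zlib.compress(data, level)
            sizes.append(len(data))
        return sizes

    print(repeated_compress(b"the quick brown fox " * 10_000))  # shrinks, then grows
    print(repeated_compress(os.urandom(100_000)))               # grows from pass 1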


    The capacity to compress from the available prevalence is what allows capturing the entirety of that prevalence, ensuring capture of the full 50% possible in the file spectrum.
    (The percentages below are not exact; they are examples.)
    If a compression algorithm only looks at 8-bit patterns and their frequency in a code tree, it can only capture something like 10% of the possible 50% of compressible files in the spectrum, if looking to pair the 50% on either side.
    Where it is dynamic, finding exact variable-bit strings to suit the file's data, like 5-bit, 3-bit, 3-bit, 2-bit, 1-bit, 1-bit strings for example, it is much more specific to the file and its alignment of available prevalent data. This can increase the files the tool can capture to something like 20%, from 10%, of the possible 50% range of compressible files.

    Where repetition in a row is captured as a number instead of as a string of repeating variable code-tree prefix values, in conjunction with variable string detection it can get up to something like 35%.

    The more specific your algorithm is in capturing prevalence, the more of the 50% of possibly compressible data strings your tool is capable of turning into the 50% of possibly uncompressible data strings.
    You can use cloned data, repeat a process to layer and encapsulate, or use inversion/mirroring, for example.


    That part of repeating a process like rezipping, where the data can be compressed again after realigning, can be bypassed by moving the data in advance.

    The algorithm I have, with all its bells and whistles, should be able to accurately capture only specific areas in the file and layer/encapsulate those parts, so that only those specific parts are temporarily extracted and decoded, and so that 'compressed (one layer)' + 'layered (any amount)' + 'raw' can all coexist and be recognized.
    The plus of this is immediate extraction of the data, with some temporary decoding.

    Moving all the data in advance bypasses the internal encapsulation of having the data all compressed in one go, and can make up a minute amount of compression from smaller code tree/header/database data, at the cost of more processing to put the data in a form readable from left to right.


    Taking this into account, and considering what the algorithm in the post is capable of, such as being specific to areas, layering and concurrency (compressed + raw together within), variable strings, cloned data, repetition in a row as a number, and inversion/mirroring, I would say that it is technically capable of 49% of the compressible files in the spectrum while being readable left to right.
    It is missing the code to move data in advance to skip/reduce the layering part and make up for this with smaller header/database info and a slightly smaller file as well.



    'Random' connotes any file, and such a file can fall into either half of the 50% compressible/uncompressible spectrum, based on the prevalence of data to find and compress and on the capability of the algorithm used.
    Where there is an algorithm capable of the full 50%, a random file has a chance of being in either half of the spectrum, and so a 50% chance of compressing.
    It comes down to whether the file can be compressed at all, that is, whether it falls in the appropriate half of the spectrum from its prevalence/repetition, and whether the algorithm can capture it.


    If you have 'uncompressible' data, like the result of a compressed file, adding data to it is counterproductive, and calculating alterations of the data will also be counterproductive, since having that additional data and compressing with a tool capable of all 50% of the files will give a larger result.

    Prevalence/repetition is required for the transformation/shrinking, with the source in the compressible 50% range.


    EDIT:
    The only thing you can do with 'uncompressible' data, which can be obtained at random, is to perform a calculation to 'extract' it as though it had been compressed with looser constraints, restoring it to an original form, and then compress it with a thorough constraint setting, like one capable of the full 50% of possible options.
    This is like taking a PNG, extracting it, and then recompressing it to become smaller.

    With all the settings and adjustable options in the algorithm, this is possible, and it does require multiple steps to 'fake' extracting this data and then 'fake' recompressing it with a stronger algorithm.

    This notion is limited but doable, as a 'means to compress random/uncompressible data', relying on pretending, per se, that it was compressed with a poorer algorithm and you are recompressing it with a stronger one.
    Last edited by SoraK05; 17th July 2014 at 13:30.

  28. #28
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    It appears that you are suggesting the use of an iterative partial O2 bit context model with explicit enables and disables of the model. I think your compression method will not compress as well as established programs that use higher order context models. I also think there has been significant research into detecting changing data characteristics, but others are more qualified to speak about that than me.

    If you have not read this online book, you might want to take a look. It has helped me a lot. http://mattmahoney.net/dc/dce.html
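
    For reference, a toy Python sketch of what an order-2 bit context model can look like (just counters, no arithmetic coder, and not the proposal being discussed):

    Code:
    from collections import defaultdict

    class Order2BitModel:
        """Toy order-2 bit context model: predict the next bit from the
        previous two bits using simple counts."""
        def __init__(self):
            self.counts = defaultdict(lambda: [1, 1])   # context -> [zeros, ones]
            self.context = 0                            # last two bits, as 0..3

        def predict(self) -> float:
            zeros, ones = self.counts[self.context]
            return ones / (zeros + ones)                # P(next bit = 1)

        def update(self, bit: int) -> None:
            self.counts[self.context][bit] += 1
            self.context = ((self.context << 1) | bit) & 3

    # Feed it a predictable bit stream: the prediction climbs well above 0.5.
    m = Order2BitModel()
    for bit in [1, 1, 1, 1, 0] * 20:
        m.update(bit)
    print(round(m.predict(), 2))   # high: after '1 0' this stream always continues with 1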

  29. #29
    Member
    Join Date
    Jul 2014
    Location
    Kenya
    Posts
    59
    Thanks
    0
    Thanked 1 Time in 1 Post
    I know it can internally layer data where multiple compression attempts are done, and compared to results from RAR and 7z, it should be smaller than even DEFLATE, which came much closer than RAR/7z. I am aware RAR/7z use looser constraints and chunks, but using variable bit strings rather than something fixed like byte+frequency, being specific to areas in the file, piecing the result together after the intense process, getting repetition in a row as a number, and so on, should make this very compact.

    Using this as-is on a file of just 0 bits for GBs of data should give something much smaller than most tools out there (an immediate string and the repetition directly as a number), and it has an approach of preprocessing and assembling to produce a compact file for other data as well.

    Like I mentioned, it should be able to pick out the specific strips in a file which are 'uncompressible', handle the strips which can be compressed, and work on those rather than on the file as a whole, unlike looking for byte patterns and having the distribution affected overall, where once one area is done it can be accounted for. An example is a specific area consisting of just 0 bits while the rest of the file is generally 'random'/'uncompressible'; a scan for byte+frequency will have this section overshadowed. This should capture that section accurately and distinguish raw/compressed down to specific offsets, rather than using a method based on chunks.

    Anyway, this isn't necessarily the thread to discuss this compression.
    Thanks for the link also.

  30. #30
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Are random numbers/data equivalent to irrational numbers (like pi)?

    "Irrational" numbers are compressible. Random numbers can probably be compressed too.


