Thread: How much compress text worth ?

  1. #1
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts

    How much compress text worth ?

    I developed an algorithm that can compress text from 10,000 letters to 9 letters, and it can decompress it too. But it is useless to me, so I want to sell it.

  2. #2
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    80
    Thanks
    22
    Thanked 3 Times in 3 Posts
    What kind of text? Are there many repetitions? If so, it's easily compressible.
    If your file is random, then you are out of luck. Random content is compressible by at least a few percent, but not by as much as you mentioned. But never say never: I am working on my own data compression software that will be able to handle ANY file type and compress it losslessly to at least 90%, but it will be terribly slow.

    Could you post some screenshots of your algorithm, or at least a compressed sample? Maybe then we can tell you more about it and also help you compress it much better.

    Thanks.
    Last edited by CompressMaster; 9th May 2019 at 21:05. Reason: small typo

  3. #3
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    You don't need to sell it directly.
    Just apply to http://prize.hutter1.net/ or http://mailcom.com/challenge/ or https://marknelson.us/posts/2012/10/...turns-ten.html .
    There're also plenty of other contests where you can advertise your work.

  4. #4
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Sorry, I can't show you. Mine is just an algorithm, not software, so it is lacking in some ways. It can compress random letters, and the 10k letters were random.

  5. #5
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Thanks. But my algorithm is not that advanced; it can't reach that level.

  6. #6
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Just split it into blocks. If you can compress 10,000 letters to 9 bytes, you can split enwik8 (100,000,000 bytes) into 10k blocks of 10k letters each and compress them to 10k*9 = 90k bytes total.
    That means you can claim the whole 50k euro prize.
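For anyone who wants to check the block-splitting estimate, here is a minimal Python sketch. It is not from the post: the function name is made up for illustration, and the only outside fact used is that enwik8 is 100,000,000 bytes long.

```python
ENWIK8_SIZE = 100_000_000  # enwik8 is the first 10^8 bytes of an English Wikipedia dump


def total_compressed_size(input_size: int, block_size: int, bytes_per_block: int) -> int:
    """Total output size if every block (including a partial last one)
    shrank to a fixed number of bytes. Hypothetical helper for illustration."""
    blocks = -(-input_size // block_size)  # integer ceiling division
    return blocks * bytes_per_block


if __name__ == "__main__":
    # 10,000 blocks of 10,000 letters, 9 bytes each
    print(total_compressed_size(ENWIK8_SIZE, 10_000, 9))
```

The result is 90,000 bytes, which is the "90k" figure in the post.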

  7. #7
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    But decompression needs a huge database or a supercomputer, and it can't compress Chinese words. That's why I want to let it go.

  8. #8
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    80
    Thanks
    22
    Thanked 3 Times in 3 Posts
    Try to compress these pure random files and post the compressed results as an attachment.

    Database size is not a problem for me, and Chinese strings can be filtered out completely, so that's not a problem. I don't need to decompress your results back to the original files; I only want the compressed archive of the original files. Thanks.
    Attached Files

  9. #9
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    It's not a problem even if it can only compress valid English... Just type out the data as text, i.e. 0xFF = 255 = "two five five".
    Even if enwik8 grows to 1G that way, it should still be compressible to 900k, so you'd still get the full prize.
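The "type out the data as text" idea can be sketched as follows. This is only an illustration in Python: the function names and the one-byte-per-line layout are assumptions, not something specified in the post.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
WORD_TO_DIGIT = {word: str(i) for i, word in enumerate(DIGIT_WORDS)}


def bytes_to_words(data: bytes) -> str:
    """Spell each byte as its decimal digits in English, one byte per line."""
    return "\n".join(" ".join(DIGIT_WORDS[int(d)] for d in str(b)) for b in data)


def words_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_words."""
    if not text:
        return b""
    return bytes(int("".join(WORD_TO_DIGIT[w] for w in line.split()))
                 for line in text.split("\n"))


if __name__ == "__main__":
    print(bytes_to_words(b"\xff"))  # two five five
```

The mapping is reversible, so any binary file can be turned into plain English words and back, at the cost of a much larger intermediate file.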

    Btw, hashing is not a solution for compression, not because it needs "huge database or super computer"
    to restore input data from hash value, but because of collisions.
    Even with assumed charset [\x20a-z] of 27 letters, you'd still start having collisions with 16 input symbols
    and 9 bytes of output:
    Code:
    16 letters = 27^16 = 79766443076872509863361
    9 bytes = 256^9 =     4722366482869645213696
    Also, you can't sell software rights that easily - the trade has to be officially registered in some way,
    usually you'd get a patent for your algorithm, then sell it.
    Otherwise you can always claim that your software was stolen, once the buyer starts making money from it.
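The collision count in this post is easy to reproduce. Below is a small Python sketch of the counting argument; the function name is hypothetical, and it only checks when the number of possible inputs exceeds the number of possible outputs.

```python
def first_collision_length(alphabet_size: int, output_bytes: int) -> int:
    """Smallest input length n for which there are more possible inputs
    (alphabet_size**n) than possible outputs (256**output_bytes), so that
    at least two inputs must share an output."""
    outputs = 256 ** output_bytes
    n = 0
    while alphabet_size ** n <= outputs:
        n += 1
    return n


if __name__ == "__main__":
    print(27 ** 16)                       # possible 16-letter inputs
    print(256 ** 9)                       # possible 9-byte outputs
    print(first_collision_length(27, 9))  # 16
```

With 27 letters, 15 symbols still fit into 9 bytes, but at 16 symbols there are more inputs than outputs, which matches the figures in the post.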

  10. The Following User Says Thank You to Shelwien For This Useful Post:

    Obama (10th May 2019)

  11. #10
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Result for 1 million random alphabet letters.txt: 1028 bytes
    Last edited by Obama; 10th May 2019 at 13:46.

  12. #11
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    You are so nice to point a stranger in the right direction. It's good that you are here.

  13. #12
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Input:
    1,000,000 bytes - 1 million random alphabet letters.txt

    Output:
    588,286 bytes - paq8px v178
    588,001 bytes - cmix v17

    -------------------------------------------------------

    Input:
    1,000,000 bytes - 1 million pure random data.txt

    Output:
    749,400 bytes - paq8px v178
    748,956 bytes - cmix v17

  14. #13
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    1 million random alphabet letters.txt
    charset = [a-z], size 26
    1000000*Log[256.,26] = 587555

    1 million pure random data.txt
    charset = [\x0C\x1E07-9;=?ABD-FHIKMO-QTY\x5D\x5Ea-z\x7F\x83\x8D\x9E\xAF\xB0\xC6\xC7\xCE\xD3\xD5\xD8\xDF\xE0\xE5\xE7\xEC-\xF0\xF3\xF6\xF8\xFA\xFC], size 79
    1000000*Log[256.,79] = 787973
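The two estimates above are n * log256(charset size), i.e. the expected minimum size in bytes when every symbol is drawn independently and uniformly from the charset. A minimal Python sketch of the same calculation (the function name is made up for illustration):

```python
import math


def uniform_bound_bytes(n_symbols: int, charset_size: int) -> float:
    """n * log256(charset_size): bytes needed for n_symbols drawn uniformly
    and independently from a charset of the given size."""
    return n_symbols * math.log(charset_size, 256)


if __name__ == "__main__":
    print(round(uniform_bound_bytes(1_000_000, 26)))  # letters file
    print(round(uniform_bound_bytes(1_000_000, 79)))  # "pure random" file
```

This reproduces 587555 and 787973. The bound assumes equally likely symbols; a file whose symbols are not equally frequent can compress below it, which may be why the paq8px and cmix results for the second file are smaller than 787973.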

  15. #14
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    How do you know the charset?


  16. #15
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    This script prints it. It's in Perl.
    Attached Files
    • 1.zip (497 Bytes, 20 views)
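The attached Perl script is not reproduced in the thread. As a rough idea of what such a tool does, here is a Python sketch that collects the distinct byte values of some data and prints them as a regex-like class. It is an assumption about the general approach, not a port of the attachment, and its escaping rules are simplified.

```python
def charset_of(data: bytes) -> list[int]:
    """Sorted list of the distinct byte values that occur in data."""
    return sorted(set(data))


def format_charset(values: list[int]) -> str:
    """Render byte values as a character class, collapsing runs of three
    or more consecutive values into ranges such as a-z."""
    def show(b: int) -> str:
        # ASCII letters and digits are printed literally, everything else as \xNN
        return chr(b) if b < 128 and chr(b).isalnum() else f"\\x{b:02X}"

    parts = []
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{show(values[i])}-{show(values[j])}")
        else:
            parts.extend(show(v) for v in values[i:j + 1])
        i = j + 1
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    sample = b" abcdefghijklmnopqrstuvwxyz"
    values = charset_of(sample)
    print(format_charset(values), "size", len(values))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `charset_of`.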

  17. The Following User Says Thank You to Shelwien For This Useful Post:

    Obama (10th May 2019)

  18. #16
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Applying for a patent costs around RM15k. My algorithm can compress data without limit (I think; I just tried compressing 100k to 11 letters). Is my algorithm worth patenting or not?

  19. #17
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    46
    Thanks
    202
    Thanked 10 Times in 9 Posts
    Quote Originally Posted by Obama View Post
    Applying for a patent costs around RM15k. My algorithm can compress data without limit (I think; I just tried compressing 100k to 11 letters). Is my algorithm worth patenting or not?
    We can't say whether it is worth patenting,
    since that would take more information about the algorithm.
    ___
    By the way, how much can you compress book1?
    ___
    Have you tried decoding the archive and checking the MD5 of the files?
    Attached Files
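The MD5 check asked about here is a plain round-trip test: compress, decompress, and compare the hash of the result with the hash of the original. A minimal Python sketch, using `zlib` only as a stand-in compressor (the algorithm discussed in the thread is not available) and hypothetical helper names:

```python
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def round_trip_ok(original: bytes, compress, decompress) -> bool:
    """True if decompress(compress(original)) reproduces the original exactly.
    The MD5 values are compared first, then the bytes themselves."""
    restored = decompress(compress(original))
    return md5_hex(restored) == md5_hex(original) and restored == original


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog\n" * 100
    print(round_trip_ok(sample, zlib.compress, zlib.decompress))
```

Any pair of compress and decompress functions can be passed in; a claimed compressor that cannot pass this test on the full input has not demonstrated lossless compression.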

  20. #18
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by Obama View Post
    1 million random alphabet letters.txt -1028 bytes
    You mean 1,000,000 bytes input and 1,028 bytes output, i.e. 973 times smaller?

    Did decompression also work, and does a file compare show the output equal to the input?
    What is the file size of your compression and decompression software, and does it use a database? If yes, how big is the database?
    How long did it take to compress and decompress?
    Which programming language did you use?
    Any idea what price you want to sell your algorithm for?

  21. #19
    Member
    Join Date
    Oct 2009
    Location
    usa
    Posts
    56
    Thanks
    1
    Thanked 9 Times in 6 Posts
    It is obvious that this fellow is pulling our chains and pressing our buttons. Let's make a graceful exit from his nonsense.

    1,000,000 random digits to 1028 bytes? Absolute rubbish, and no way even given 10^9 years of time...

  22. #20
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    It depends on how you define "random" really: https://encode.ru/threads/3099-Compr...ll=1#post59940

  23. The Following User Says Thank You to Shelwien For This Useful Post:

    xinix (14th May 2019)

  24. #21
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    29
    Thanks
    12
    Thanked 0 Times in 0 Posts
    > How much compress text worth ?

    I had heard of your algorithm in high places during the Cold War. But your algorithm can address, or compress to, only 2^(8,224) distinct files (1,028 bytes = 8,224 bits), which is of course not enough to cover all the files in your 2^(8,000,000) source file space.
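The counting argument in this post can be put into numbers. The sketch below works in log2 so that the huge powers never have to be computed; the function and variable names are illustrative only.

```python
import math


def max_coverage_log2(input_bits: float, output_bits: int) -> float:
    """log2 of the largest fraction of possible inputs that can each get
    their own distinct output, given the sizes of both spaces in bits."""
    return min(0.0, output_bits - input_bits)


output_bits = 1028 * 8                    # 8,224 bits in a 1,028-byte output
any_byte_bits = 1_000_000 * 8             # 8,000,000 bits for 1,000,000 arbitrary bytes
letters_bits = 1_000_000 * math.log2(26)  # about 4.7 million bits if only a-z occur

if __name__ == "__main__":
    print(max_coverage_log2(any_byte_bits, output_bits))
    print(max_coverage_log2(letters_bits, output_bits))
```

Even when the input is restricted to the 26 letters, the fraction of files that could be represented in 1,028 bytes is at most about 2 to the power of minus 4.69 million.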

    But how much is a compression algorithm worth? I guess it must be more than the $120 million paid by Microsoft to Stac for the DoubleSpace infringement. Maybe a compression algorithm must be worth more than $210 million or $220 million, considering that DeepMind was bought by Google for only $400 million.
    Last edited by compgt; 18th May 2019 at 08:59.

  25. #22
    Member
    Join Date
    Jan 2017
    Location
    Selo Bliny-S'edeny
    Posts
    14
    Thanks
    6
    Thanked 3 Times in 2 Posts
    I am skimming through this thread for the third time, and it strikes me again that this is a "Nigerian Prince" kind of thing: bogus claims for a gullible audience. But what if this could be true? Well, it reminds me that patent offices in some countries are prohibited by law from granting patents related to any kind of perpetual motion machine. The laws must be amended to prohibit the possibility of compression below the entropy.

  26. #23
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    29
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by svpv View Post
    I am skimming through this thread for the third time, and it strikes me again that this is a "Nigerian Prince" kind of thing: bogus claims for a gullible audience. But what if this could be true? Well, it reminds me that patent offices in some countries are prohibited by law from granting patents related to any kind of perpetual motion machine. The laws must be amended to prohibit the possibility of compression below the entropy.

    Disallowing perpetual motion machines was, I think, at the height of the thermodynamics theory debates. Heat is entropy. Heat is disorder, or causes disorder, in a system. As such, we don't want perpetual machines that generate heat and clog the universe with unnecessary motion.

    If he's the real Barack Obama, then he's from the early information theory days, more like the Claude Shannon adherents, who continue to revise the Shannon papers to suit the times.

    It could be true for those who don't write actual code for their compression ideas. But since I was remembering the Cold War: highly classified geniuses lurking in the background might actually have their own breakthrough fast compressors but be limited by an NDA. Think men in black suits from the early days of information theory.

    And I recall that, back in the early days, some compilers were actually "rigged" or bugged for lesser compression, as was MS-DOS etc., maybe by corrupting the file_length() function or the length printed on the console. Thanks to open source nowadays, expert programmers can scrutinize how the file_length() function is implemented or how the file length is actually printed on screen. Or have they? Were any anomalies found? If you were a crooked Windows or Linux systems programmer, how would you do it? [Beware, this is of the conspiracy theory kind. I was deciding for the computing industry before; we were definitely concerned about these things.]
    Last edited by compgt; 18th May 2019 at 11:41.

  27. #24
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    772
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by svpv View Post
    that patent offices in some countries are prohibited by law to grant patents related to any kind of perpetual motion machines.
    All patents that are a threat to a country or its allies (militarily, or economically against big companies) are wiped out after payment of a compensation agreed by both sides:

    "If the office considers that the secrecy of the contents of a patent application may be in the interests of the defense of the country or its allies, it shall make this as soon as possible, but no later than three months after the submission of the application is known. Our defense minister may give instructions to the agency regarding the assessment of the question or such interest."

  28. #25
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by zyzzle View Post
    It is obvious that this fellow is pulling our chains and pressing our buttons. Let's make a graceful exit from his nonsense.

    1,000,000 random digits to 1028 bytes? Absolute rubbish, and no way even given 10^9 years of time...
    Why is it rubbish? Please explain it to me.

  29. #26
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by xinix View Post
    We can't say whether it is worth patenting,
    since that would take more information about the algorithm.
    ___
    By the way, how much can you compress book1?
    ___
    Have you tried decoding the archive and checking the MD5 of the files?
    Compression is no problem for me, but for decompression I need a huge database or a supercomputer. That's the reason I want to sell it. I can't afford it.

  30. #27
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    46
    Thanks
    202
    Thanked 10 Times in 9 Posts
    Quote Originally Posted by Obama View Post
    Compression is no problem for me, but for decompression I need a huge database or a supercomputer. That's the reason I want to sell it. I can't afford it.
    How long will it take to unpack a 1MB file?

  31. #28
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Sportman View Post
    You mean 1,000,000 bytes input and 1,028 bytes output, i.e. 973 times smaller?

    Did decompression also work, and does a file compare show the output equal to the input?
    What is the file size of your compression and decompression software, and does it use a database? If yes, how big is the database?
    How long did it take to compress and decompress?
    Which programming language did you use?
    Any idea what price you want to sell your algorithm for?
    Yes, 1,000,000 bytes to 1,028 bytes.
    Yes, the output is equal when I try it on a short input.
    A few GB.
    A few days.
    Python, but I can use any language because I have the algorithm.
    No idea, just make me an offer.

  32. #29
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    32
    Thanks
    8
    Thanked 0 Times in 0 Posts
    It depends on whether you want to use a database or just calculate.
    With a database it will surely be faster; with calculation only, it needs a supercomputer.

  33. #30
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    46
    Thanks
    202
    Thanked 10 Times in 9 Posts
    Quote Originally Posted by Obama View Post
    It depends on whether you want to use a database or just calculate.
    With a database it will surely be faster; with calculation only, it needs a supercomputer.
    What is the unpack speed using the database?
    And what is its size?
