Thread: Universal data detector?

  1. #1
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts

    Universal data detector?

    Is it possible to create a universal data detector?


    For example, the detector would scan the entire file for compressible/incompressible patterns: file-type detection (zip and jpeg files, which are already compressed, or wav-type data embedded in the file) and data patterns (long-range repetitions, as srep finds, except that it would not compress the file).


    After the scan, the detector would generate an instruction file (or a preprocessed file) telling the compressor how to apply compression in the most efficient way.
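    The scanning idea above can be sketched as a simple heuristic: estimate the Shannon entropy of each chunk and mark high-entropy chunks as "store" (likely already compressed) and the rest as "compress". This is only an illustration of the concept, not any real tool's algorithm; the names `chunk_entropy`, `scan` and the 7.5 bits/byte cutoff are my own assumptions.

    ```python
    import math
    from collections import Counter

    def chunk_entropy(data: bytes) -> float:
        """Shannon entropy in bits per byte (8.0 = looks incompressible)."""
        counts = Counter(data)
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def scan(data: bytes, chunk_size: int = 4096, threshold: float = 7.5):
        """Return a naive 'instruction list': (offset, action) per chunk."""
        plan = []
        for off in range(0, len(data), chunk_size):
            e = chunk_entropy(data[off:off + chunk_size])
            plan.append((off, "store" if e >= threshold else "compress"))
        return plan
    ```

    A real detector would need far more than entropy (magic bytes, stream parsing, repetition search), but the output shape, a list of regions plus suggested actions, matches what the post describes.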

  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    No (proved by Kolmogorov).

  3. #3
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    No (proved by Kolmogorov).
    You can't create an optimal universal data detector, as Matt said. If you can tolerate a heuristic-based one, then of course it's possible. There are archive utilities and compressors that attempt to recognize compressible versus non-compressible components and act accordingly.
    Last edited by nburns; 10th July 2014 at 04:32.

  4. #4
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by lunaris View Post
    zip and jpeg files which are uncompressible
    Many .zip files are compressible. I think the simplest method is unzip->zpaq.

  5. #5
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by nburns View Post
    You can't create an optimal universal data detector, as Matt said. If you can tolerate a heuristic-based one, then of course it's possible. There are archive utilities and compressors that attempt to recognize compressible versus non-compressible components and act accordingly.
    Precomp, for example, detects a lot of zipped file types and some deflate streams. Srep detects long-distance repetitions. So I think Precomp could detect other file types as well (like wav, and apply flac or wavpack to them, or do nothing, just detect). Precomp could be extended to a lot of formats.

    I think data detection is, in a sense, the opposite of compression. Data detection tries to make the file more compressible (simpler to understand), while compression makes the data less compressible (harder to understand).

    But it depends on the compressor used and on the user's kind of intelligence in analysing the data. Some people think multiplications are simpler than additions.
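    Detection of the kind Precomp does usually starts from magic bytes at known offsets. A toy sketch of that idea (the signature table and function name are my own illustration, not Precomp's actual code):

    ```python
    # A few well-known signatures; RIFF containers need a second check
    # at offset 8 to distinguish WAV from other RIFF formats.
    SIGNATURES = {
        b"PK\x03\x04": "zip",
        b"\xff\xd8\xff": "jpeg",
        b"RIFF": "riff",
    }

    def detect_type(data: bytes) -> str:
        """Guess a file type from its leading magic bytes."""
        for magic, name in SIGNATURES.items():
            if data.startswith(magic):
                if name == "riff":
                    return "wav" if data[8:12] == b"WAVE" else "riff"
                return name
        return "unknown"
    ```

    Scanning for such signatures at every offset (not just offset 0) is how embedded streams inside a larger file get found, at the cost of false positives that then need validation by actually parsing the candidate stream.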
    Last edited by lunaris; 10th July 2014 at 11:38.

  6. #6
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    "Many .zip files are compressible. I think the simplest method is unzip->zpaq. "



    That's what I was trying to explain.They are compressible(or better compressible) only on uncompressed forms.

  7. #7
    Member
    Join Date
    Jul 2014
    Location
    Kenya
    Posts
    59
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by lunaris View Post
    Is it possible to create a universal data detector?


    For example, the detector would scan the entire file for compressible/incompressible patterns: file-type detection (zip and jpeg files, which are already compressed, or wav-type data embedded in the file) and data patterns (long-range repetitions, as srep finds, except that it would not compress the file).


    After the scan, the detector would generate an instruction file (or a preprocessed file) telling the compressor how to apply compression in the most efficient way.
    http://encode.ru/threads/2006-Hello-...e-an-algorithm
    This algorithm includes a preprocessing step that scans a file for specific areas to work on in advance, rather than processing the whole file in one go, so it can adapt to the file's distribution as well as to its compressible/incompressible areas.

    A bit more info in this:
    http://encode.ru/threads/1987-Random...ll=1#post39529

