
Thread: Memory Limit?

  1. #1
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Memory Limit?

    Hello,
    I have a few questions. In a number of threads I keep seeing references to problems with, and tricks for, getting dictionaries larger than 256MB, and this is on machines with GBs of memory.

    What am I missing? Do the authors leave out the size of support structures? If so, why? Or do you need to support multiple copies of the dictionaries, i.e. maybe copying and reordering the contents of one dictionary into a revised version?

    I have to ask because, on my one-GB machine, I can easily allocate over 760MB in a single block (not using Windows, mind you).

    Is this a limit in Windows, or is there something else I am missing?

    PS. Since compression seems to improve with dictionary size, has anyone tried using very large virtual memory or memory-mapped files to test compression for large-memory systems that are not available yet?
    Last edited by Earl Colby Pottinger; 7th April 2010 at 21:52.

  2. #2
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I don't understand where an artificial limit like 256MB would come from. It is the largest size you can address with a 28-bit number, of course, so that could be it. In which case, people might explore using aligned allocations to get a few more entries in that way.
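    Just to put numbers on that (a back-of-the-envelope sketch of my own, not something any particular compressor is known to do):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* A 28-bit offset can address 2^28 bytes = 256 MiB exactly. */
        uint32_t max_bytes = 1u << 28;
        printf("28-bit byte offsets reach %u MiB\n", max_bytes >> 20);

        /* Hypothetical variant of the alignment idea: if entries are stored
           16-byte aligned and the 28 bits hold (offset / 16), the same field
           reaches 16 * 256 MiB = 4 GiB. */
        uint64_t reach = (uint64_t)max_bytes * 16;
        printf("28-bit offsets in 16-byte units reach %llu MiB\n",
               (unsigned long long)(reach >> 20));
        return 0;
    }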

    There's of course a distinction between address space and physical RAM. And virtual address space can be backed by disk, whether explicitly by the programmer (e.g. mmap) or by the OS (e.g. swap).

    A 32-bit x86 binary has basically between 2 and 3 GB of address space (depending on the OS), and that sets the upper limit on the possible dictionary even if the machine has more RAM (e.g. via PAE).

    I use 64-bit Linux, so I have 256TB of addressable memory. But I only have 4GB of RAM, and roughly 1GB is spoken for by other stuff.

    So I've learnt that if my programs use more than ~3GB of RAM, they automatically spill over into swap. As large dictionaries are typically random access, and random access is precisely the worst case for hard disks (not for SSDs, of course), once I overflow my ~3GB of real RAM the performance of compressors drops like a stone. In fact, like a very heavy, needle-shaped stone falling through very thin air from a great height. It's always that way when you try to draw comparisons between computing and the real world. But swap is that bad, believe me.

    So I'd recommend avoiding disk for random-access structures, e.g. dictionaries.

    It might be that some data structures are more disk-friendly than the hashmaps and trees that I've used, just as there are more cache-conscious versions of these structures; I am not aware of anyone trying to utilise them. In fact, putting disk aside for a bit, cache-conscious data structure choices might help people chasing speed.

    But disk? Naw, avoid it.

    The current LTCB king, durilca'kingsize, used 13GB of real RAM to mount the summit. I've seen machines with considerably more real RAM than that: 96GB and beyond is available in ordinary single-system-image boxes these days, and you can always back it up with an SSD instead of disk.

  3. #3
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    But

    Look at this thread: http://encode.dreamhosters.com/showthread.php?t=562 where he has 4GB of memory but can't even create a 500MB dictionary. Why?

  4. #4
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by willvarfar View Post
    As large dictionaries are typically random access, and random access is precisely the worst case for hard disks (not for SSDs, of course), once I overflow my ~3GB of real RAM the performance of compressors drops like a stone. In fact, like a very heavy, needle-shaped stone falling through very thin air from a great height. It's always that way when you try to draw comparisons between computing and the real world. But swap is that bad, believe me.

    So I'd recommend avoiding disk for random-access structures, e.g. dictionaries.

    It might be that some data structures are more disk-friendly than the hashmaps and trees that I've used, just as there are more cache-conscious versions of these structures; I am not aware of anyone trying to utilise them. In fact, putting disk aside for a bit, cache-conscious data structure choices might help people chasing speed.

    But disk? Naw, avoid it.

    The current LTCB king, durilca'kingsize, used 13GB of real RAM to mount the summit. I've seen machines with considerably more real RAM than that: 96GB and beyond is available in ordinary single-system-image boxes these days, and you can always back it up with an SSD instead of disk.
    I do have an 80GB SSD; Windows 7 takes up 40GB. The rest is available for direct access if needed.

  5. #5
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Well, on Windows 7 64-bit with 4GB of RAM I can use dictionary sizes up to 300MB, with Windows itself using around 500MB of RAM.

    Quote Originally Posted by Earl Colby Pottinger View Post
    Look at this thread: http://encode.dreamhosters.com/showthread.php?t=562 where he has 4GB of memory but can't even create a 500MB dictionary. Why?
    Quick answer is a 500MB dictionary uses up a lot more than 500MB of memory.
    Just the 300MB dictionary I can use takes up around 3.4GB of RAM.

  6. #6
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Why then?

    Quote Originally Posted by ForPosterity View Post
    Well, on Windows 7 64-bit with 4GB of RAM I can use dictionary sizes up to 300MB, with Windows itself using around 500MB of RAM.

    Quick answer is a 500MB dictionary uses up a lot more than 500MB of memory.
    Just the 300MB dictionary I can use takes up around 3.4GB of RAM.
    That does not make sense; if it takes up 3.4GB, then why not call it a 3.4GB dictionary?

    If you are talking about an array of pointers to the strings that make up the entries in the dictionary, why not talk in terms of the number of pointers? After all, pointers could be 2, 3 or 4 bytes long, and in some special cases only one-byte pointers are needed.

    It seems a strange way to talk about dictionary size to me. Is there a reason it is talked about this way?

  7. #7
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Because it's only working on 300MB of data.
    The dictionary size (the size of the block being compressed) is 300MB,
    but handling all the work on that 300MB block takes 3.4GB of RAM.

    The memory usage differs for the same dictionary size depending on how your compressor works,
    but you still only have 300MB of input that you work on.


    Let's say you have lots of identical 100MB data blocks: a 300MB dictionary would easily see this, as it works on 300MB "at a time". But if your complete identical block is 301MB in size, your program will not see the identical data, as it only works on 300MB at a time.


    You should have tried just reading the forum; you would have discovered this answer.

    I have no programming or other expert knowledge around this, and I could figure this out by reading the forums.


    I even asked directly about this, so your answer was already given to me by Bulat in the forum.

    My advice is to read before posting questions.
    Last edited by SvenBent; 8th April 2010 at 10:21.

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by Earl Colby Pottinger View Post
    I do have an 80GB SSD; Windows 7 takes up 40GB. The rest is available for direct access if needed.
    don't forget to take into account internet resources!

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by willvarfar View Post
    The current LTCB king, durilca'kingsize, used 13GB of real RAM to mount the summit. I've seen machines with considerably more real RAM than that: 96GB and beyond is available in ordinary single-system-image boxes these days, and you can always back it up with an SSD instead of disk.
    try it! reduce your ram to 64mb with ssd-backed swap and try to load vista

  10. #10
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Well

    First, I have been reading the posts; that is why I asked the question. To me, a "dictionary" would be the separate working data needed to compress the data block, so the posts were getting very confusing.

    Second, if it is so clear, why did you have to ask the same question yourself? You stated yourself that you had to ask.

    OK, I get it now: the terms are inverted from the way I expected them to be used. That is why I had to ask, so I could understand how the terms are being used here.

    ===================================================


    Now, going back to my second question: my SSD transfers data at 115 MB/s, and I understand newer drives will easily handle 250MB/s, so does anyone consider that fast enough for compression purposes? Remember, compared to regular hard drives there is nearly zero latency to data access.

  11. #11
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Loaded Test

    Quote Originally Posted by Bulat Ziganshin View Post
    try it! reduce your ram to 64mb with ssd-backed swap and try to load vista
    That is a loaded test; Vista handles ALL low-resource situations poorly.

    Haiku-OS (which I am programming in) will run in 32MB without even touching the swap file, and I bet that Windows 7 would probably run fine in under 64MB if the swap file is on an SSD. I don't have a memory module that small for the machines that have Windows 7 installed, otherwise I would confirm it myself.

  12. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by Earl Colby Pottinger View Post
    That does not make sense; if it takes up 3.4GB, then why not call it a 3.4GB dictionary?

    If you are talking about an array of pointers to the strings that make up the entries in the dictionary, why not talk in terms of the number of pointers? After all, pointers could be 2, 3 or 4 bytes long, and in some special cases only one-byte pointers are needed.

    It seems a strange way to talk about dictionary size to me. Is there a reason it is talked about this way?
    300 MiB of that 3.4 GiB is a buffer containing the literal data. The rest is structures for speeding up calculations. It would be possible to use only those 300 MiB, but then compression would take years.

    Another thing is that one program can use 5 GiB for a 500 MiB dictionary and another program can use 20 GiB for the same 500 MiB dictionary and output identical files, and both need only 500 MiB of free memory to decompress.
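    For a rough feel of where the memory goes, here is a sketch in C with purely illustrative multipliers (my own assumptions, not the figures of any specific program):

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical memory estimate for an LZ77-style compressor whose match
       finder keeps a hash table plus two 4-byte tree links per input byte.
       The ratios are assumptions for illustration only. */
    static uint64_t estimate_bytes(uint64_t dict_size)
    {
        uint64_t window    = dict_size;          /* the literal data itself            */
        uint64_t hash_head = dict_size;          /* hash heads, assumed ~1x dictionary */
        uint64_t tree_sons = 2 * 4 * dict_size;  /* two 4-byte tree links per position */
        return window + hash_head + tree_sons;
    }

    int main(void)
    {
        uint64_t dict = 300u << 20;              /* a 300 MiB dictionary */
        printf("~%llu MiB total for a 300 MiB dictionary\n",
               (unsigned long long)(estimate_bytes(dict) >> 20));
        return 0;
    }

    With these made-up ratios the total comes out around 3 GiB for a 300 MiB dictionary, which is in the same ballpark as the numbers quoted above.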

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    there is nearly zero latency to data access.
    try to find exact numbers for ram and ssd

  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by Earl Colby Pottinger View Post
    That is a loaded test, Vista handles ALL low resource problems poorly.
    compare this with what you said before. you should either believe that you can replace ram with cheaper ssd, or not believe it - or do you think that vista has "bad handling" only when it comes to my arguments?

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I don't have a memory module that small

    http://msdn.microsoft.com/en-us/libr...02(VS.85).aspx

    You can use the "truncatememory" option or something.

  16. #16
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Okay

    Quote Originally Posted by Piotr Tarsa View Post
    Another thing is that one program can use 5 GiB for a 500 MiB dictionary and another program can use 20 GiB for the same 500 MiB dictionary and output identical files, and both need only 500 MiB of free memory to decompress.
    Ah! Now it makes sense to me why you use the terms that way.

  17. #17
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    ???????

    Quote Originally Posted by Bulat Ziganshin View Post
    compare this with what you said before. you should either believe that you can replace ram with cheaper ssd, or not believe it - or do you think that vista has "bad handling" only when it comes to my arguments?
    No, SSDs are slow compared to RAM, true. But compare the speed of present-day SSDs to the RAM speed of machines even just ten years ago. And people were doing compression then.

    Additionally, not all compression schemes need or use a sliding window over the data.

    For example, a multi-pass compressor may stream the data file through once to collect statistics on it, then read the file again from the beginning and apply the collected information to compress it.

    Static Huffman would be a simple example, but now the Huffman table can be larger because the original data is not kept in memory; additionally, you decouple the size of the data block from how much memory the computer has. And unless I misunderstand how it works, LZ could use collections of strings assigned to the tokens in memory instead of pointers into the original data.

    Yes, I know the above is slower because of all the extra memory moves, but the question I am asking is: aside from speed, will this allow you to get better compression?
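    To make the multi-pass idea concrete, here is a minimal sketch of what the first (statistics) pass could look like; it is my own illustration, not code from any existing compressor. Its memory use is a fixed 2KB table no matter how big the file is:

    #include <stdio.h>
    #include <stdint.h>

    /* First pass of a hypothetical two-pass static Huffman coder: stream the
       whole file once and count byte frequencies. The second pass would
       rewind, build the code from freq[], and encode. */
    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        uint64_t freq[256] = {0};      /* fixed 2 KB of statistics */
        unsigned char buf[65536];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            for (size_t i = 0; i < n; i++)
                freq[buf[i]]++;
        fclose(f);

        for (int b = 0; b < 256; b++)
            if (freq[b])
                printf("%3d: %llu\n", b, (unsigned long long)freq[b]);
        return 0;
    }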

    I understand why it was not done with hard drives, given their very high latency compared to RAM, but SSDs are a lot faster.

    Still, I have not seen figures for memory transfer lately, so how fast is memcpy on a modern machine now?

    I will also write a test in a few minutes, but I am presently on a 1.6 GHz netbook.

    PS. What am I not seeing?

  18. #18
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    RAM (Random Access Memory) is meant to be solicited at any time, for instantly reading and writing anywhere, and to be responsive.
    In contrast, SSDs are meant to store files.

    So you are right, SSDs are vastly superior to HDDs when it comes to access time; however, they are not meant for reading and writing bytes anywhere, at any time. A quick look at Google should give you some clues on how they work.
    Writing, in particular, is an operation which requires re-initializing a whole "sector", typically 4KB. So it is just a plain waste for updating a single 64-bit pointer.

    You can, however, imagine an SSD as a kind of "sequential storage area", for writing and reading sizable amounts of data (well, say at least 64KB, so it's not huge). This is only useful if you organize your algorithm in a way that does not trigger a 64KB transfer just to read a single field, hence the advisable "sequential" property.
    Alas, most compression algorithms are not meant for that.


    Now things can evolve, and maybe in the future we'll get SSDs acting like RAM in their capability to read and write bytes anywhere, at any time. I have read a few claims about that from various tech companies. We are just not there yet.

  19. #19
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    RAM vs SSDs

    Quote Originally Posted by Bulat Ziganshin View Post
    try to find exact numbers for ram and ssd
    Well, I tested my SSD in a Toshiba NB205, not exactly the world's fastest machine, but it gets the job done for me.

    I already tested the SSD as transferring 115MB/s when reading from the drive as a block device under Haiku-OS with a 1MB buffer.

    I then just quickly wrote this program:

    #include <SupportDefs.h>   /* uchar, bigtime_t */
    #include <OS.h>            /* system_time(), which returns microseconds */
    #include <stdio.h>
    #include <string.h>

    /* Two static 1 MB buffers; copying between them 1000 times moves ~1 GB. */
    static uchar Buffer1[1000000];
    static uchar Buffer2[1000000];

    int main(void)
    {
        long index, count;     /* general purpose variables */
        bigtime_t stimer;

        stimer = system_time();

        for (count = 0L; count < 1000L; count++) {
            memcpy(Buffer1, Buffer2, 1000000);
        }

        /* bytes copied / elapsed microseconds == MB/s (decimal megabytes) */
        index = (long)(count * 1000000L / (system_time() - stimer));

        printf("Copy speed in MB/s: %ld\n", index);
        printf("Compared to the SSD: %ld%%\n", index * 100L / 115L);
        return 0;
    }

    It shows memory copy running at 965 MB/s on my 1.6GHz Atom machine, or just over 8 times faster than the SSD.

  20. #20
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Compression methods

    Quote Originally Posted by Cyan View Post
    RAM (Random Access Memory) is meant to be solicited at any time, for instantly reading and writing anywhere, and to be responsive. In contrast, SSDs are meant to store files.
    And do all the compression methods in use today need that random access to process files?

    I am asking for real: has the main thrust of research really found this is the best approach?

    For example, why would static Huffman or static arithmetic coding need random access to a file's content? Again, I am asking; just like my original question about dictionaries, I feel like I am missing a point here.

    I am not trying to cause problems; it is just that I stopped doing serious work on compression methods in the late 1990s, and what I learnt then got the job done for me for the last decade or so.

    But now I need to relearn the art of compression because it is a key element of the RAM drive I want to write for Haiku-OS, and that is why you find me here asking questions.

    I really want to learn, not annoy.

    Sorry.

  21. #21
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    I already tested the SSD as transferring 115MB/s when reading from the drive as a block device under Haiku-OS with a 1MB buffer.
    This is a "sequential streaming" comparison. Even a mechanical HDD will give you fair performance in such a scenario (one of mine, for instance, provides 70MB/s sequentially).
    It just misses the point of how RAM is being used.

    And do all the compression methods in use today need that random access to process files?
    In a word, yes.

    You have a point with static Huffman, although this is not really within the scope of your own question. I think your idea was about data structures bigger than physical RAM, so static Huffman, which just needs a few kilobytes, does not qualify.

    RAM usage is not about reading the source file and outputting a bit sequence. That part is indeed sequential and can be done from/to hard disk, as you say (and is done this way).

    It is about creating data references, pointers, tables and trees for searching, comparing, or building multiple context-dependent statistics. By essence, these are random access.

    For example, keeping the Huffman example, you could imagine an order-3 Huffman compressor, using the previous 3 bytes of context as an index into a table. This would mean 16 million sets of Huffman statistics, a scenario in which you are likely to need a few gigabytes of memory (if not using any "trick"). So you could imagine mapping all these statistics into a big file. (Note: this is just an example; remove the word "Huffman", use the more generic "entropy coder" instead, and you have the basics of PPM.)

    Now, if you read your source stream to collect statistics, how do you imagine you will access your "statistics file" and write the statistics? Is it sequential access?
    If unclear, just start the exercise manually on the content of this post. You'll get the idea.
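    If it helps, here is a tiny sketch of that order-3 access pattern (an illustration only, with a simple hit counter per context instead of full per-symbol statistics); notice how the table index jumps around with every byte:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Illustration only: the previous 3 bytes select one of 2^24 = 16,777,216
       contexts. Even a single 4-byte counter per context is already 64 MB,
       and the slot touched changes wildly from byte to byte, i.e. the
       accesses are effectively random. */
    int main(void)
    {
        uint32_t *counts = calloc(1u << 24, sizeof *counts); /* 64 MB of counters */
        if (!counts) return 1;

        const unsigned char msg[] = "example input stream";
        uint32_t ctx = 0;                             /* last 3 bytes, packed */

        for (size_t i = 0; msg[i]; i++) {
            counts[ctx]++;                            /* touch a "random" slot */
            ctx = ((ctx << 8) | msg[i]) & 0xFFFFFFu;  /* slide the context     */
        }

        printf("contexts: %u, table size: %u MB\n",
               1u << 24, (unsigned)((1u << 24) * sizeof(uint32_t) >> 20));
        free(counts);
        return 0;
    }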

    After writing your own compressor, if you happen to design one that needs a lot of memory, all this will look much clearer to you.
    Last edited by Cyan; 9th April 2010 at 02:00.

  22. #22
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Compression vs. memory: see the second graph (LTCB size vs. memory) in http://mattmahoney.net/dc/dce.html#Section_22 . Notice that compression ratio depends on memory size much more than on which algorithm you use.

    The reason for this effect is that there is mutual information between all points in the file. As you scan the file, each byte or bit to be predicted depends on statistics that are scattered throughout the history. It requires random access.

    Not all data has this property. Small files can be compressed with small memory. So can large files that depend only on local context, such as images or audio or some simple data types.

    Random array access time is O(log n) when you consider address decoding time and data multiplexing time. Random access is slower than sequential access for all types of memory or storage.
    Last edited by Matt Mahoney; 9th April 2010 at 03:13.

  23. #23
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by Earl Colby Pottinger View Post
    It shows memory copy running at 965 MB/s on my 1.6GHz Atom machine, or just over 8 times faster than the SSD.
    you are fooled by hardware reviewers. actually, the main difference between ssd and hdd is access time, not sequential transfer speed. and since compressors read 1-8 bytes at a time, only access time is important:
    hdd: 10 ms
    ssd: 100 us
    ram: 50 ns

    btw, srep is the only algo i know that is tolerant of high access times. but its usage is limited - it only finds repetitions of large strings (e.g. >512 bytes)
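    For anyone who wants to feel those numbers, here is a rough pointer-chasing probe of RAM random-access latency (a generic sketch of my own, nothing to do with srep; results vary a lot by machine):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <time.h>

    /* Walk a shuffled cycle so each load depends on the previous one and
       cannot be prefetched. Compare the ns/access measured here with the
       ~100 us (ssd) and ~10 ms (hdd) seek times listed above. */

    #define N (1u << 24)                       /* 16M entries = 64 MB table */

    static uint64_t state = 88172645463325252ull;
    static uint64_t xorshift64(void)           /* small PRNG, good enough here */
    {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    int main(void)
    {
        uint32_t *next = malloc((size_t)N * sizeof *next);
        if (!next) return 1;

        /* Sattolo's algorithm: build one random cycle through all N slots. */
        for (uint32_t i = 0; i < N; i++) next[i] = i;
        for (uint32_t i = N - 1; i > 0; i--) {
            uint32_t j = (uint32_t)(xorshift64() % i);
            uint32_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        clock_t start = clock();
        uint32_t pos = 0;
        for (uint32_t i = 0; i < N; i++) pos = next[pos];
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("end=%u, ~%.0f ns per dependent access\n", pos, secs * 1e9 / N);
        free(next);
        return 0;
    }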

  24. #24
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    I came across this blog post and thought it obliquely useful to anybody a bit fuzzy on how things work who might later find this thread, so here's a link:

    http://igoro.com/archive/gallery-of-...cache-effects/

    Now, when I read it I have a hundred reactions of the "what about?... does he not know about?... ah, but I think that's mischaracterised..." kind, but as a general introduction I think it's excellently readable.

  25. #25
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    856
    Thanks
    447
    Thanked 254 Times in 103 Posts
    Very interesting reading. Though not related to SSDs or HDDs, it's a useful read for performance-minded coders. The author has documented testing tools and readable charts to back up his words; that's excellent methodology.

  26. #26
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    btw, about optimizing memory access: the ht matchfinder, especially the cached ht, is a good optimization technique. in srep, i have found another one that seems to be unique: bit maps. the bitmap answers one question - "is this string present in the large hash?". since the bitmap stores only one bit per element, it's much smaller than the hash itself. of course, it only has meaning if the probability of presence is rather small, say <=50%, and the total number of elements isn't very large (otherwise it can't fit into L2 cache). the second limitation probably makes it useful only for srep-like algos
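    Here is a toy sketch of such a presence bitmap as I understand it (my own illustrative code, not srep's implementation); a set bit can be a false positive, a clear bit is never a false negative:

    #include <stdio.h>
    #include <stdint.h>

    /* One bit per bucket in front of a large hash table: if the bit is clear,
       the string is definitely not in the table and the expensive lookup is
       skipped; if set, it might be there and the full lookup proceeds. */

    #define BITS (1u << 20)                       /* 1M bits = 128 KB, cache-friendly */

    static uint8_t bitmap[BITS / 8];

    static uint32_t hash_str(const char *s)       /* simple FNV-1a, for illustration */
    {
        uint32_t h = 2166136261u;
        while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
        return h;
    }

    static void bitmap_set(uint32_t h)   { bitmap[(h % BITS) >> 3] |= 1u << ((h % BITS) & 7); }
    static int  bitmap_maybe(uint32_t h) { return (bitmap[(h % BITS) >> 3] >> ((h % BITS) & 7)) & 1; }

    int main(void)
    {
        bitmap_set(hash_str("some large block would be hashed here"));

        const char *probe = "another block";
        if (!bitmap_maybe(hash_str(probe)))
            printf("definitely absent - skip the big hash lookup\n");
        else
            printf("maybe present - do the full (slow) lookup\n");
        return 0;
    }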

  27. #27
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    856
    Thanks
    45
    Thanked 104 Times in 82 Posts
    Quote Originally Posted by Earl Colby Pottinger View Post

    Second, if it is so clear, why did you have to ask the same question yourself? You stated yourself that you had to ask.
    Ermm. Let's just never mention that again.

    Sorry if I got a little offensive. I think I read the "tone" in your prior messages wrong.

  28. #28
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    ... bit maps. the bitmap answers one question - "is this string present in the large hash?". since the bitmap stores only one bit per element, it's much smaller than the hash itself...
    As in a Bloom filter?

  29. #29
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Error in math

    Working on my own compression code, I found a number of assumptions resting on a simple math error.

    In my code, 112 / 4 = 23! Don't ask how I made that mistake, but now I have to go through my entire code very carefully to see what else may be wrong!

    I think I may have cost my best friend 6 months of his life as I yelled so loud when I discovered that mistake.
