Thread: stack shrinking in Linux and Windows

  1. #1
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts

    stack shrinking in Linux and Windows

    Hello,
    I would like to use the stack to temporarily store some data.

    I just became aware that in Windows you need to hit (write to) the guard page to grow the stack. So growing only works in 4-kilobyte steps, and not the way I tried:
    esp -= 10 kilobyte
    RAM[esp] = 123
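    Touching each page in order, from the already committed end downward, does work; roughly like this in C (a sketch of mine, assuming 4-kilobyte pages):

        #include <stddef.h>

        void use_big_buffer(void)
        {
            char buf[10 * 1024];   /* spans more than two pages */
            /* walk down one page at a time, so every write hits the current
               guard page and commits the next page below it */
            for (long i = (long)sizeof buf - 1; i >= 0; i -= 4096)
                ((volatile char *)buf)[i] = 0;
            /* ... use buf ... */
        }

    (C compilers apparently emit such page-by-page probes for large stack frames automatically, e.g. MSVC's _chkstk; in an assembly-like language you have to do it yourself.)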
    But when I am finished I would like to free the used memory again. So I tried something like this:
    esp += 10 kilobyte
    And hoped that a task switch would trigger the deallocation of the pages. But in Windows it doesn't. I haven't tested it in Linux.

    Does someone know how to shrink the stack in Linux and Windows?

    Btw: does growing the stack in Linux also have to happen in 4-kilobyte steps, like it does in Windows?
    Last edited by just a worm; 10th December 2014 at 06:05.

  2. #2
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Are you writing this in assembly language? If you are using C or C++, it's probably a much better idea to do the allocation from inside the language rather than directly manipulating registers. Newer compilers should let you allocate dynamically sized local arrays; failing that, there's alloca. I'm not surprised that just incrementing esp doesn't deallocate stack pages. Faulting in pages is expensive, so if the OS deallocated pages instantly as soon as the stack shrank, it would tend to carry an excessive risk of thrashing. If memory were to get extremely tight, Windows might reclaim that page, but my guess is that it would get swapped and not just discarded, since now that you've touched it, it's considered dirty.
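    For example, a quick sketch (C99 variable-length arrays are standard; alloca is non-standard but widely available, and on Windows it's _alloca in <malloc.h>):

        #include <stdio.h>
        #include <alloca.h>   /* glibc; on Windows use <malloc.h> and _alloca */

        void demo(size_t n)
        {
            char vla[n];            /* C99 VLA: released when the block ends */
            char *tmp = alloca(n);  /* released when demo() returns */
            vla[0] = 0;
            tmp[0] = 0;
            printf("buffers at %p and %p\n", (void *)vla, (void *)tmp);
        }

    Either way the cleanup is automatic, which is the point: you never have to shrink the stack by hand.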

    My sense is that if you believe you need to shrink the stack, you are probably doing something wrong.

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I also think that the stack can't be shrunk, as the OS doesn't know whether the data will be used again or not. The only exception is when a thread ends. So you could spawn a new thread, wait for it to end, receive the data and then continue. But would that be faster than allocation on the heap? I doubt it. You can preallocate chunks of memory, use custom allocators and placement new (in-place constructors).
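    For example, a minimal bump-pointer arena along those lines (a sketch of the idea with made-up names, not any particular library's API):

        #include <stdlib.h>
        #include <stddef.h>

        typedef struct { char *base, *top, *end; } Arena;

        int arena_init(Arena *a, size_t cap) {
            a->base = a->top = malloc(cap);        /* one big preallocated chunk */
            a->end = a->base ? a->base + cap : a->base;
            return a->base != NULL;
        }

        void *arena_alloc(Arena *a, size_t n) {
            n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
            if ((size_t)(a->end - a->top) < n) return NULL;
            void *p = a->top;
            a->top += n;                           /* allocation = pointer bump */
            return p;
        }

        /* stack-like free: everything allocated at or after `mark` is gone */
        void arena_pop_to(Arena *a, void *mark) { a->top = mark; }

        void arena_destroy(Arena *a) { free(a->base); }

    In C++ you would construct objects into arena_alloc'ed memory with placement new.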

  4. #4
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by nburns
    Are you writing this in assembly language? If you are using C or C++ ...
    An assembly-like language. No C(++) is involved.

    Quote Originally Posted by nburns
    I'm not surprised that just incrementing esp doesn't deallocate stack pages. Faulting in pages is expensive, so if the OS deallocated pages instantly as soon as the stack shrank, it would tend to carry an excessive risk of thrashing.
    So how do you think stack shrinking works? As a one-way trip: once it has grown, it never shrinks again? It looks like this is actually the way it is implemented in Windows. I hope that the folks who programmed Linux implemented something less memory-consuming.

    Well, I am surprised that incrementing esp doesn't deallocate stack pages. The deallocation wouldn't be instant anyway, because the OS doesn't intervene after every single executed instruction. On the x86 architecture the register is defined to point to the top of the stack, so in my opinion the operating system is free to assume by default that the programmer isn't using this register for a different purpose. Especially because, for example, you can't use the push and pop instructions anymore once the stored address isn't even divisible by 4. But of course an operating system can't strictly assume that this is always the case, so I would expect a fallback option to deactivate the automatic stack shrinking (which would determine the top of the stack from the value stored in esp). Regrowing the stack also becomes less expensive if you add a threshold, e.g. assuming that the data down to the address "esp - 8 kilobytes" shouldn't be discarded. You would need a threshold anyway, because quite a few assembly programmers use RAM[esp - 4] and maybe even RAM[esp - 8] as temporary storage.

    Quote Originally Posted by nburns
    but my guess is that it would get swapped and not just discarded
    Yes, you are right about that. Windows doesn't discard it but swaps it out to disk.

    Quote Originally Posted by Piotr Tarsa
    I also think that stack can't be shrunk as OS doesn't know whether data will be used or not.
    The OS would need to define something it can rely on, something like "The operating system is free to discard the application's stack data beyond esp - 8 kilobytes." But I haven't found anything like this.

    Well, OK. I will try to reduce the memory consumption so that the stack doesn't need to grow at all. If this fails, then I will do the brute-force stack shrinking I read about somewhere else: using the function VirtualFree from kernel32.dll to get rid of the unnecessary pages and VirtualAlloc to set the guard page to the correct place. I don't like having memory leaks, and allocating memory without ever deallocating it isn't clean in my opinion either. But I agree that this whole issue is somewhat dirty programming. In my opinion, though, it's due to a design bug in the memory management of the stack in both Linux and Windows.
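    As a sketch of that brute-force approach (untested, Windows only; the address math is illustrative and error handling is omitted):

        #include <windows.h>

        /* `low` = lowest committed stack address to give back,
           `bytes` = how much to release; both page-aligned. */
        void shrink_stack(char *low, SIZE_T bytes)
        {
            /* release the physical/pagefile backing, but keep the
               address range reserved for the stack */
            VirtualFree(low, bytes, MEM_DECOMMIT);

            /* re-commit one page as the new guard page just below the
               still-committed part, so normal guard-page growth keeps working */
            VirtualAlloc(low + bytes - 4096, 4096,
                         MEM_COMMIT, PAGE_READWRITE | PAGE_GUARD);
        }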
    Last edited by just a worm; 14th December 2014 at 19:23.

  5. #5
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by just a worm View Post
    So how do you think stack shrinking works? As a one-way trip: once it has grown, it never shrinks again? It looks like this is actually the way it is implemented in Windows. I hope that the folks who programmed Linux implemented something less memory-consuming.
    It used to be very hard to deallocate pages even in the heap. I remember reading internet discussions about this a while back. I think that may have changed. As you know, virtual pages don't correspond 1:1 to physical pages, so how this relates to physical memory usage is not straightforward.

    The OS would need to define something it can rely on, something like "The operating system is free to discard the application's stack data beyond esp - 8 kilobytes." But I haven't found anything like this.

    Well, OK. I will try to reduce the memory consumption so that the stack doesn't need to grow at all. If this fails, then I will do the brute-force stack shrinking I read about somewhere else: using the function VirtualFree from kernel32.dll to get rid of the unnecessary pages and VirtualAlloc to set the guard page to the correct place. I don't like having memory leaks, and allocating memory without ever deallocating it isn't clean in my opinion either. But I agree that this whole issue is somewhat dirty programming. In my opinion, though, it's due to a design bug in the memory management of the stack in both Linux and Windows.
    The system in place works pretty well for the vast majority of applications. The way to force different behavior would probably be through system calls. It sounds like you may have found the ones you want.

  6. #6
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by nburns
    It used to be very hard to deallocate pages even in the heap.
    The heap is more expensive speed-wise than the stack is, especially because the memory management of Windows is pretty sophisticated. The system calls of Unix are more direct and faster, but the standard C library adds an additional speed penalty, too. So I agree with your strategy in principle: being more relaxed about changing the heap size is a good way to go.

    Quote Originally Posted by nburns
    As you know, virtual pages don't correspond 1:1 to physical pages, so how this relates to physical memory usage is not straightforward.
    Well, yes. But if a once-allocated, non-discardable page is not in physical memory, then it is in the page file. And the page file is usually smaller than physical memory. So the number of available pages is limited to physical memory plus the extra you get from the page file.

    Quote Originally Posted by nburns
    It sounds like you may have found the ones you want.
    For those who are interested: a similar solution hopefully also works in Linux. But I haven't looked in enough detail at which system calls are responsible for deallocating pages etc. and what else needs to be considered.
    Last edited by just a worm; 22nd December 2014 at 23:57.

  7. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by just a worm View Post
    The heap is more expensive speed-wise than the stack is, especially because the memory management of Windows is pretty sophisticated. The system calls of Unix are more direct and faster, but the standard C library adds an additional speed penalty, too. So I agree with your strategy in principle: being more relaxed about changing the heap size is a good way to go.
    You might be right that the stack is somehow faster, but I don't really see why that must be so (maybe it's a quirk of Windows?). The heap is clearly the place where dynamic allocation was meant to be done. The heap lets you set the size explicitly (with, e.g., brk()/sbrk() or equivalent) -- actually this is a requirement.

    Well, yes. But if a once-allocated, non-discardable page is not in physical memory, then it is in the page file. And the page file is usually smaller than physical memory. So the number of available pages is limited to physical memory plus the extra you get from the page file.
    If you are running low on space in the page file, you are generally headed for trouble. I'm fairly sure that most of the time there is not much in the page file (in Windows), and it's not normal for running apps to be competing for space there. If that situation occurs, I'd expect a system crash soon.

    For those who are interested: a similar solution hopefully also works in Linux. But I haven't looked in enough detail at which system calls are responsible for deallocating pages etc. and what else needs to be considered.

  8. #8
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    just a worm:

    Have you looked at GNU obstacks? With obstacks you have very fast stacklike allocation and deallocation, where allocation basically just bumps a pointer and does a comparison to a bounds pointer with a branch rarely taken, and deallocation is similar.

    obstacks manage memory in big chunks, and you can supply a routine to allocate pages for a new chunk, and deallocate pages for a chunk that's been emptied. You can use mmap() to allocate and munmap() to deallocate, ensuring that you immediately reclaim RAM and swap (pagefile) pages. That may be the default, but I don't know. (I used to know, but it may have changed.)
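    A minimal usage sketch, from memory (so treat the details as approximate; the obstack_chunk_* macros are how you supply your own chunk allocator, which could just as well be mmap/munmap wrappers instead of malloc/free):

        #include <obstack.h>
        #include <stdlib.h>

        /* tell obstack how to get and release its big chunks */
        #define obstack_chunk_alloc malloc
        #define obstack_chunk_free  free

        int main(void)
        {
            struct obstack ob;
            obstack_init(&ob);

            char *a = obstack_alloc(&ob, 100);   /* usually just a pointer bump */
            char *b = obstack_alloc(&ob, 200);
            b[0] = 'x';

            obstack_free(&ob, a);     /* frees a and everything allocated after it */
            obstack_free(&ob, NULL);  /* frees everything; the obstack stays usable */
            return 0;
        }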

    I'm curious whether you could get the performance you want just by linking in a different allocator, such as Doug Lea's, which does deferred coalescing, so the usual allocation and deallocation for stacklike patterns is very fast. (Essentially just popping a free block from the front of a linked list for the size in question to allocate, and pushing it onto the front of that list when you're done with it.)

    If your allocator is much slower than that, I'm curious which allocator it is and why it's so slow. It might be locking overhead for thread safety, which you may be able to do without if you're not running multithreaded, or which you could reduce by handling locking yourself in a cheaper way that works for your app.

  9. #9
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Paul W. View Post
    just a worm:

    Have you looked at GNU obstacks? With obstacks you have very fast stacklike allocation and deallocation, where allocation basically just bumps a pointer and does a comparison to a bounds pointer with a branch rarely taken, and deallocation is similar.

    obstacks manage memory in big chunks, and you can supply a routine to allocate pages for a new chunk, and deallocate pages for a chunk that's been emptied. You can use mmap() to allocate and munmap() to deallocate, ensuring that you immediately reclaim RAM and swap (pagefile) pages. That may be the default, but I don't know. (I used to know, but it may have changed.)
    I love the idea of GNU obstacks, but I don't use them due to their lack of portability. They're included with glibc and, IIRC, on Ubuntu they look just like any normal part of glibc and they can be used just by including a header file (at most). I'm guessing that this is the case on pretty much all Linux systems, or at least all the major distros.

    BSD-based systems, like OSX, will generally have no trace of them in their default configuration. Code using obstacks that builds fine on Linux will not compile here.

    At least, it won't compile unless you have already taken additional steps to make it BSD-portable. GNU has also put obstacks into libiberty (http://gcc.gnu.org/onlinedocs/libiberty/index.html), and libiberty seems to be the solution GNU provides for this problem. I have never actually tried libiberty, though, because it seems like a weird idea to me and I don't trust it; and I'm not clear on what the ramifications are for licensing. Everything about libiberty smells fishy to me. I believe that GNU has good intentions and that generally their code is of the highest quality, so if I spent enough time researching libiberty, I expect that my fears may all be unfounded. But for the functionality libiberty gives you, I'd rather just avoid obstacks and recreate whatever piece I need as necessary.

    What I'd really like is to see someone reimplement the useful parts of obstacks with a very permissive license (preferably public domain). I've thought about taking on such a project, but I've been getting along so easily in the meantime without obstacks that I guess I'm not sure exactly what usage scenarios they are really good for. If I had used them more for real work, gaining experience along the way, that problem might disappear. But I always hesitate to buy into things like this just because they sound like a good idea. A lot of things sound like good ideas but aren't, and a lot of times it can be very difficult to understand why they failed. They just fail because no one uses them, and IMO this crowd judgment is virtually never wrong (you can tell, because when the unpopular thing that seemed like a good idea goes away, you will never miss it). The indications are that GNU obstacks are not very popular.
    Last edited by nburns; 25th December 2014 at 23:24.

  10. The Following User Says Thank You to nburns For This Useful Post:

    just a worm (27th December 2014)

  11. #10
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by just a worm
    Quote Originally Posted by nburns
    It used to be very hard to deallocate pages even in the heap.
    The heap is more expensive speed-wise than the stack is, especially because the memory management of Windows is pretty sophisticated. The system calls of Unix are more direct and faster, but the standard C library adds an additional speed penalty, too. So I agree with your strategy in principle: being more relaxed about changing the heap size is a good way to go.
    I am sorry if my response is a little weird. I read "I used to be ...", while you were actually talking about how Linux does it.

    Quote Originally Posted by Paul W.
    Have you looked at GNU obstacks?
    No, I haven't. Until now I have avoided the whole "libc" library and used the system calls directly, mainly because it's easier and I had no use for it. For you C programmers it's probably easier to use a library than system calls.

    Now, from your description, nburns' additional comments and the documentation I found on the GNU website at http://www.gnu.org/software/libc/man.../Obstacks.html, I still have just a rough idea of the whole issue. I will try to summarize it in the hope that someone can confirm or contradict my understanding:
    ---
    Obstacks are a concept for managing memory and also an implementation of it, which exists in the "glibc" library. They do not exist by default in Unix-based operating systems other than Linux, because their existence is not mentioned in the original description of Unix.

    Managing the memory works by using the functions from the library. The library itself uses functions that need to be defined (given) by the caller to allocate and deallocate the necessary memory. The caller can give functions which allocate memory from the stack or the heap, depending on what he prefers. So obstacks are not there to do the allocation and deallocation themselves, but rather to manage small pieces of memory within big ones.

    The concept intends to use more memory than would be necessary to serve a single allocation request by the caller. The hope is to be able to serve several allocation requests of the caller with a single allocation request from the heap or the stack.

    The concept adds the restriction that you have to free the last allocated part of the memory first.
    ---
    Now if this understanding is correct, then obstacks wouldn't help with the allocation and deallocation difficulties that come from my lack of a good understanding of how Linux manages the stack. But they would be a replacement for what I am currently doing with the heap instead.

    My current understanding of Linux's memory management of the stack is that you are free to write anywhere in the reserved stack space. As soon as you do so, Linux creates all pages from the last existing one (given the current stack size) down to the accessed page, so it's not possible to create "holes". If you don't need something from the stack anymore, it doesn't matter which value is stored in esp: the stack just stays the same size and no automatic shrinking is done by Linux - just as Windows doesn't do anything.

    But really ... folks ... the Linux community needs to write better documentation. It is just awful.

    Quote Originally Posted by nburns
    I love the idea of GNU obstacks, but I don't use them due to their lack of portability.
    Portability is a good point. But it's also a trade-off between how much work you accept doing yourself (or save by using others' work like obstacks or malloc) and how many operating systems you would like to reach. If you want to reach Linux only, then you can go with obstacks. If you want to reach more Unix-based operating systems, then you can still use malloc, because it's defined to be part of the "libc" library. Just out of curiosity: what do you C programmers do when you are programming something for Windows? If malloc is part of "libc", then it's not available on Windows, right!? Or is the library's code included in the machine code of every Windows program compiled by a C compiler?

    Anyway, my fallback would be to use the "brk" system call on Unix-based operating systems and the "VirtualAlloc" function on Windows, and on top of this I would write my own memory management facilities, which would then be operating-system-independent. That is, if my program should run on Linux and Windows and does a lot of allocation and deallocation. Otherwise I wouldn't take the time to write memory management facilities, at least no sophisticated ones like the heap management of Windows or obstacks.

    Quote Originally Posted by nburns
    What I'd really like is to see someone reimplement the useful parts of obstacks with a very permissive license (preferably public domain).
    I think that one major drawback of the GNU GPL is that you cannot write code that is in the public domain but depends on GPLed code, because you would need to place your code under the GPL too (because it depends on GPLed code). And GPLed code cannot make use of any code which is not placed under the GPL, except if this code is not part of what you are distributing (because it would be distributed together with GPLed code). So you would need to force the user to install something extra which he needs to download from somewhere else and probably also compile first.

    If someone reimplements the useful parts of obstacks and places them in the public domain, then it would be a step forward in getting rid of the GPL infection. But the Linux software world is so heavily overrun by GPLed software that quite a lot of software couldn't make use of the public domain version. So the public domain version would mainly be for the non-GPLed software, right?
    Last edited by just a worm; 27th December 2014 at 03:24.

  12. #11
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    just a worm,

    I wasn't necessarily suggesting you use GNU obstacks per se, mostly just wondering if that sort of thing would do what you need.

    I infer from things that you've said that

    (1) you have an app that allocates and deallocates a whole lot of objects in a mostly stacklike way, so many of them that a few instructions of extra overhead for heap allocation/deallocation is too much, AND

    (2) the volume of allocated and deallocated objects is large relative to the amount of objects that are live (allocated but not deallocated yet) at a given time (most objects are short-lived), AND

    (3) it nonetheless at least sometimes has a whole lot of objects on that conceptual stack, so that they require a bunch of RAM to hold without paging, AND

    (4) your app has high memory demands overall---lots of data resident in memory, and you're trying to avoid paging it out, or paging too much of it out and running out of swap (pagefile) space, AND

    (5) you think that the data with stacklike extent (strictly nested lifetimes) is likely mostly garbage (already freed) when your memory demands are highest, AND

    (6) you therefore want to reclaim pages used for the stacklike data relatively promptly, after a whole bunch of objects is deallocated, so that the OS knows not to keep those pages in RAM, gets the space back pretty quickly, and can avoid paging something else out.

    If any of these things are NOT true, I don't understand your problem or why you're looking at optimizing the time AND space used for your data that have stacklike extent.

    For many programs, there's a lot of short-lived data with stacklike extent (nested lifetimes), but a good general purpose allocator with deferred coalescing (using "quick lists") will usually do the trick---you can allocate and deallocate a few hundred or a few thousand objects, and all the allocator does is push and pop the lists for the relevant size classes. (It usually won't trigger the allocator's general purpose and more costly features that merge adjacent free areas together, only to carve them up again later when more objects like that are created.)

    The quick list scheme acts as a buffer in front of the general-purpose allocator---you can allocate a small or moderate amount of stuff and free it, and the allocator will mostly just pop and push free lists, but if you allocate a huge volume of stuff that's all live at once, and then free it all, you'll pay a higher price. (You may be able to adjust the allocator to avoid that, setting a high threshold for coalescing adjacent free blocks by letting the quick lists get pretty long.)
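    To make that concrete, here's a toy sketch of the quick-list fast path (not any real allocator's code; to keep it short the caller passes the size to the free routine, and there's no locking):

        #include <stdlib.h>

        #define NCLASSES 32   /* size classes in 16-byte steps, up to ~500 bytes */

        typedef struct Block { struct Block *next; } Block;
        static Block *quick[NCLASSES];   /* one free list per size class */

        static size_t class_of(size_t n) { return (n + 15) / 16; }

        void *qalloc(size_t n)
        {
            size_t c = class_of(n);
            if (c < NCLASSES && quick[c]) {   /* fast path: pop the free list */
                Block *b = quick[c];
                quick[c] = b->next;
                return b;
            }
            return malloc(c * 16);            /* slow path: general allocator */
        }

        void qfree(void *p, size_t n)
        {
            size_t c = class_of(n);
            if (c > 0 && c < NCLASSES) {      /* fast path: push, defer coalescing */
                Block *b = p;
                b->next = quick[c];
                quick[c] = b;
            } else {
                free(p);
            }
        }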

    --

    It's not particularly difficult to implement a simple obstack-like scheme yourself, using mmap and munmap (or Windows rough equivalents).

    It's pretty easy if you can memory-map big enough contiguous chunks of address space to hold your maximum volume of live stacklike data.

    There's a subtlety to this, though. In general you don't want to unmap the pages holding the top of your stack as soon as the data are popped. You want to hang onto those pages for a while, in case the stack immediately grows right back up into that area. You want some hysteresis (delayed response) to avoid uselessly unmapping pages only to have to map them again right away, because the mmap operations likely take thousands of instruction cycles.
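    Something like this, roughly (a Linux-flavored sketch with made-up names; error handling omitted):

        #include <sys/mman.h>
        #include <stddef.h>

        #define PAGE  4096
        #define SLACK (64 * 1024)   /* hysteresis: keep this much popped space mapped */

        typedef struct {
            char  *base;      /* start of the big reserved region */
            size_t top;       /* current top of the data stack (offset) */
            size_t committed; /* offset up to which pages are still backed */
        } Stk;

        void stk_pop_to(Stk *s, size_t new_top)
        {
            s->top = new_top;
            /* only give pages back once the top has retreated a whole
               SLACK below the highest point still backed */
            if (s->committed > s->top + SLACK) {
                size_t keep = (s->top + SLACK + PAGE - 1) & ~(size_t)(PAGE - 1);
                /* MADV_DONTNEED discards page contents but keeps the mapping;
                   munmap would work too if you re-mmap on growth */
                madvise(s->base + keep, s->committed - keep, MADV_DONTNEED);
                s->committed = keep;
            }
        }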

    Unless you have the right combination of features in your program, it may not be worth mapping and unmapping pages. You may just need more memory, and if so, you may be better off just linking to a good allocator like Doug's.

    Are all of your objects with stacklike extent of the same size, or a few sizes? Are the sizes strongly correlated (e.g., always allocating two objects of size 24 bytes and then one of size 16 bytes)? Or are they a whole bunch of sizes, or unpredictably interleaved sizes, or what?

    That sort of thing can be important in determining the easiest way to optimize allocation, beyond linking with a well-behaved allocator.

  13. #12
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    BTW, anybody interested in these kinds of issues should probably read our 1995 survey on memory allocation, which is still the standard reference for more or less single-threaded allocation and deallocation:

    ftp://ftp.cs.utexas.edu/pub/garbage/allocsurv.ps

    It covers qualitatively different lifetime patterns, why a good general-purpose allocator usually beats custom allocation and when it doesn't, and the insufficiency of Markov models for studying the intensely regular allocation/deallocation patterns most real programs exhibit.

    Here's a later critique of custom memory allocators, showing why most of them are a mistake vs. a well-designed general-purpose allocator, by a former student and some other folks:

    http://people.cs.umass.edu/~emery/pu...oopsla2002.pdf

    If people are interested in issues around multithreaded/multiprocessor allocation, check out Emery Berger's Hoard allocator. (On which the default Windows allocator is (or was) based, but I haven't kept up with recent developments.)

  14. #13
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by Paul W.
    I infer from things that you've said that
    (1) ...
    (2) ...
    Then I gave a wrong impression. I am working on an algorithm which needs the following memory buffers:
    (a) (4 or 6) * size of the payload
    (b) 2 * 256 kilobytes
    (c) (3 or 4) * 1 kilobyte
    The algorithm is implemented in an operating-system-independent library. That is a *.dll file for Windows and a *.so file for Linux, but always containing exactly the same machine code, exactly the same function names, the same parameters etc.; only the file format is different. So the caller has to provide all buffers (= parts of memory where the function is allowed to write), because the library can't call kernel32.dll.HeapAlloc or another operating-system-specific function if the same machine code is also executed on Linux. But I would like to keep the parameter count low, so my intention was to only request buffer "(a)". Buffer "(c)" is not necessary because I just use the stack for the data. But buffer "(b)" was the trigger for my wish to shrink the stack again.

    By now I have found a solution that gets the memory consumption down, so buffer "(b)" is not necessary anymore.

    But since the question had arisen, I was still interested in how the memory management of the stack works. It seems that nowadays most programmers aren't familiar with the low-level parts of programming anymore. That knowledge was probably more common back when operating systems like Windows 95 had just come on the market, people were still using DOS, and Linux was just learning to walk.

    Anyway, I think that I have gathered most of the necessary information by now.

    Thanks for your contributions.
    Last edited by just a worm; 29th December 2014 at 03:46.

  15. #14
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Paul W. View Post
    It's not particularly difficult to implement a simple obstack-like scheme yourself, using mmap and munmap (or Windows rough equivalents).

    It's pretty easy if you can memory-map big enough contiguous chunks of address space to hold your maximum volume of live stacklike data.

    There's a subtlety to this, though. In general you don't want to unmap the pages holding the top of your stack as soon as the data are popped. You want to hang onto those pages for a while, in case the stack immediately grows right back up into that area. You want some hysteresis (delayed response) to avoid uselessly unmapping pages only to have to map them again right away, because the mmap operations likely take thousands of instruction cycles
    That would be the same reason the OS kernel doesn't immediately deallocate pages when the program stack shrinks.

    Unless you have the right combination of features in your program, it may not be worth mapping and unmapping pages. You may just need more memory, and if so, you may be better off just linking to a good allocator like Doug's.
    The malloc implementation in glibc uses mmap automatically when you request large chunks. So you can probably trust malloc to get the memory in a sensible way.
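    If you want to steer that behavior, glibc even exposes the threshold as a knob (glibc-specific; a sketch):

        #include <malloc.h>   /* mallopt, M_MMAP_THRESHOLD (glibc) */
        #include <stdlib.h>

        int main(void)
        {
            /* requests at or above 256 KiB go straight to mmap ... */
            mallopt(M_MMAP_THRESHOLD, 256 * 1024);

            void *big = malloc(1 << 20);  /* likely served by mmap */
            free(big);                    /* ... and munmap'ed here, returning
                                             the pages to the OS right away */
            return 0;
        }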

    Are all of your objects with stacklike extent of the same size, or a few sizes? Are the sizes strongly correlated (e.g., always allocating two objects of size 24 bytes and then one of size 16 bytes)? Or are they a whole bunch of sizes, or unpredictably interleaved sizes, or what?

    That sort of thing can be important in determining the easiest way to optimize allocation, beyond linking with a well-behaved allocator.

  16. #15
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Quote Originally Posted by nburns View Post
    That would be the same reason the OS kernel doesn't immediately deallocate pages when the program stack shrinks.
    It's one reason, but last I checked (many years ago now), some OS's didn't handle mmapping and unmapping or mprotecting stack regions reliably, if at all. Interrupt handling code may not work right if you try to save state into a protected stack page, or that might work but saving into an unmapped page may not. (Especially on SPARCs, where the register window hardware makes getting it right very tricky. You're likely to crash the whole OS, not just your process, and really annoy other users if it's a shared machine... which it was for me, not having my own SPARCs at the time.)

    OS maintainers often ignore bug reports about such things---I've submitted a few---because very few programs actually do this. And of course, very few programs actually do this, because OS maintainers won't fix the bugs that make it NOT WORK.

    My information may be way out of date now, but a decade or so ago, I got disgusted with the bugs in Linux and Mach that made interesting uses of mmap not work as they were supposed to. Maybe mmap and munmap work right on stack segments now, but I wouldn't count on it.

    For a long time, you couldn't even count on fine-grained uses of mprotect not being stupidly slow, in both Linux and Mach, even for non-stack pages. Both kernels used linear lists to manage contiguous ranges of address space with the same mapping and protections, which of course gets slow if you have thousands of unmapped or protected pages scattered around.

    Again, the OS maintainers didn't generally care much---most programs have at most a dozen contiguous ranges of same-mapping and same-protection pages, so it doesn't matter much.

    (Eventually Linux got AVL trees for mappings, but then Linus took them out because they didn't matter to benchmarks, and replaced them with linear lists, but eventually enough people squealed that he put them back in, so people could use lots of guard pages to detect buffer overruns, etc. I think they've stayed in since then, but I haven't checked lately...)

    That sort of thing made me give up on developing a library that could virtualize virtual memory at the user level. (So that programs could cleanly do things like collecting page reference traces of themselves, or manage distributed paging and process migration, or whatever.) Bummer.

  17. #16
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Paul W. View Post
    That sort of thing made me give up on developing a library that could virtualize virtual memory at the user level. (So that programs could cleanly do things like collecting page reference traces of themselves, or manage distributed paging and process migration, or whatever.) Bummer.
    I'm not sure, but that may be what libsigsegv is for.

  18. #17
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    libsigsegv does some similar stuff. I hadn't looked at that project in a long time---thanks for the link.

    We were doing some similar stuff, but more cleanly stackable (unless libsigsegv is doing more than I think), so that you could have multiple layers of memory abstractions, e.g. a checkpointing layer stacked on top of a self-tracing layer, or vice versa (depending on whether you wanted to checkpoint a traced process or trace a checkpointed process).

    We were also doing some stupid linker tricks to avoid most common gotchas---e.g., system calls like read() writing to protected pages and having that Just Not Work. (The libsigsegv page mentions that problem, which is a biggie, and it doesn't sound like they've fixed it.) We'd patch system calls to call our own wrappers, which played nice with the underlying virtual memory and signal-handling stuff. (E.g., making sure that buffers about to be written to were not protected when actually written to.)

    The idea was to provide layerable abstractions of memory (and related low-level library behavior), like "mixin layers" in generative programming. (Mixin layers are a structured and better-behaved alternative to multiple inheritance, using parameterization and single inheritance to make clear what transformations are being applied in what order.)

    These days it'd be somewhat easier---just target x86 Linux and its usual linkers and libraries, and you can go a long way.



    Eventually it got too complicated and frustrating, given the different kernels and compilers and libraries and linkers we were dealing with, and all their unadvertised limitations and bugs, and we gave up. Like Sartre said, Hell is other people. (At least if you're dealing with mostly other people's somewhat broken code, and they won't fix it.)

    IMO that's the worst thing about systems research. You can invent something new and different and better, but it doesn't matter if you can't interface it to other people's crufty, broken, old-school messes, at least enough to run a few benchmarks with Real Programs, and preferably well enough that people can really use your stuff and demonstrate its real-world value. So you spend ten percent of your time developing something new and cool and better, and 90 percent of your time making it compatible with ugly, obsolete, and/or broken stuff.

