“Doctor, my stack is overflowing! What does it mean?! Is there a cure?!” The Doctor frequently sees questions like these, and he realizes it’s time for a general article on the subject of stack allocation, as well as the other memory allocation types, static and dynamic.
“Everybody’s got to be somewhere,” the saying goes. And so it is with your program’s data – it has to live somewhere in memory while it is being referenced (registers are a special kind of memory we won’t get into here.) The compiler, linker and operating system work together to determine exactly where in memory a piece of data is to reside. The simplest method of assigning locations is “static allocation”, where the data is assigned a fixed (static) address by the compiler and linker in the executable image (EXE). For example, if variable X is statically allocated at address 4000, it is always at address 4000 when that EXE is run, no matter what else is going on in the system. (DLLs can also have static data – it is allocated at a fixed offset from the base address where the DLL gets loaded.)
Static allocation is simple from the compiler’s perspective because all that is needed is to create a list of variables that need allocation, and lay them down in memory one after the other. A run-time advantage of static allocation is that it is usually easy and fast to access a fixed address and statically allocated data can be used from anywhere in the program. But static allocation has disadvantages too. First, if you have any reentrant or parallel code, the multiple code streams are both trying to use the same data, which may not be wanted. Second, if you have many routines which need a lot of memory just while they’re executing, the available address space can fill up quickly (for example, ten routines each of which declares a 1000×1000 REAL(8) array need a total of 80,000,000 bytes just for those arrays.) And perhaps most important, with static allocation you must know at compile-time how much memory you will want.
Up through Fortran 77, the Fortran standard was carefully written in a way so that static allocation was the only method needed. Even today, static allocation is the most widely used method – in Visual Fortran, COMMON blocks and most variables with the SAVE attribute are allocated statically. (Note that Compaq Visual Fortran, by default, implies SAVE for local routine variables unless it can see that the variable is always written before it is read.) [Intel Visual Fortran does not imply SAVE for local variables – you must specify that with a SAVE statement or use the /Qsave option if you want that.]
Dynamic allocation is the complete opposite of static allocation. With dynamic allocation, the running application must call a system routine to request a particular amount of memory (for example, 1000 bytes). The system routine looks to see if that request size is available in the collection (“heap”) of memory segments it has available. If the request can be satisfied, a range of memory addresses is marked as used and the starting address is returned to the program. If the heap is empty, the operating system expands the virtual address space of the process to replenish the heap, stopping only if there is no more virtual memory available. The program stores the base address in a pointer variable and then can access the memory. When the program no longer needs the memory, another system routine is called to “free” it – return it to the heap so that it can be used again by a future allocate call. You can think of this as similar to borrowing money from a bank, and then later paying it back (except that there’s no interest!)
The big advantage of dynamic allocation is that the program can decide at run-time how much memory to get, making it possible to create programs that can accommodate problems of any size. You are limited only by the total amount of virtual memory available to your process (a little less than 2GB in 32-bit Windows) and, as long as you keep your pointers separate, your allocation is separate from others in the application. However, if your program “forgets” to free the allocated memory, and no longer has the pointer through which it is referenced, the allocated memory becomes unusable until the program exits – a “memory leak”. Also, the allocate/free process can be slow, and accessing data through pointers can itself reduce run-time performance somewhat.
In Fortran, the ALLOCATE statement performs dynamic allocation, with DEALLOCATE being the “free” operation. In Visual
Fortran, one can use dynamic allocation in other ways, such as the C-style malloc/free routines, or by calling Win32 API routines to allocate memory.
Stack allocation appears to be the least understood of the three models. The “stack” is a contiguous section of memory assigned by the linker. The “stack pointer” is a register (ESP in the X86 architecture) which holds the current position in the stack. When a program starts executing, the stack pointer points to the top of the stack (just above the highest-addressed location in the stack. As routines are called, the stack pointer is decremented (subtracted from) to point to a section of the stack that the routine can use for temporary storage. (The previous value of the stack pointer is saved.) The routine can call other routines, which in turn create stack space for themselves by decrementing the stack pointer. When a routine returns to its caller, it cleans up by simply restoring the saved stack pointer value.
The stack is an extremely efficient way of creating “scratch space” for a routine, and the stack plays a prominent role in the mechanism of calling and passing arguments to routines. Visual Fortran uses the stack to create space for automatic arrays (local arrays whose size is based on a routine argument) and for temporary copies of arrays used in array expressions or when a contiguous copy of an array section must be passed to another routine. The problem is, however, that the total amount of stack space is fixed by the linker [on Windows], and if a routine tries to allocate more space than the stack can hold, the dreaded “stack overflow” error occurs.
On some other operating systems, OpenVMS for example, the OS can extend the stack as needed, limited only by the total
amount of virtual address space available. On Windows, however, the stack allocation is determined by the linker and defaults to a paltry 1MB in the Microsoft linker. You can change the allocation – for details, see the on-disk documentation topic “Stack, linker option setting size of” – but this works only for executable images (EXEs.) If you are building a DLL, it
doesn’t matter what you set the stack size to – it is the size specified by the EXE that calls your DLL, (for example, VB.EXE), which is used.
So, what can you do if changing the stack size is not an option? Reduce your code’s use of the stack. Replace automatic
arrays with allocatable arrays and ALLOCATE them to the desired size at the start of the routine (they will be automatically
deallocated on routine exit unless marked SAVE.) If passing a noncontiguous array section to another routine, have the called routine accept it as a deferred-shape array (an explicit interface is required). Future versions of Visual Fortran may allocate large temporary values dynamically rather than using the stack, but for now, being aware of the limits of
stack allocation is important.
[Edit January 11, 2008] An update to Intel Visual Fortran 9.1 added the /heap-arrays option which tells the compiler to use the heap (dynamic allocation) for arrays that it would otherwise put on the stack. This can be handy if you can’t avoid the stack arrays otherwise. It does add a slight performance penalty for the allocate and deallocate, but applications processing large arrays probably would not notice. See the documentation for more details.]
(From Intel Developer Zone, copied with permission)
Thanks Steve for the valuable post. You should consider turning these posts into a book with a title like “Dr. Fortran’s recommendations for code health and longevity”.
Also a question: Suppose there is a procedure that has to do a fairly large amount (100-1000MB) of derived-type component allocations and this procedure is to be called on the order of 100-1000 times. Is it more sensible to write a class with a constructor to allocate the components once for the lifetime of the program and use the allocated components inside the class methods for any future calculations? or keep the OOP out and implement a function that returns each time a derived type object with all the allocations being done inside the function every time it is called?
I tried and compared both approaches with Intel ifort but did not notice any major differences between the two (functional approach was faster a bit). I am curious to know your thoughts or any recommendations you may have. Thanks in advance.
Which way is clearer to you? Do it that way. As my longtime readers know, I am not a fan of contorting a program design because one THINKS it may be faster. Code it the way that makes sense, then test the performance. Is it adequate? You’re done.
If not, use a profiling tool such as Intel VTune and see where the program is spending a lot of time – it is probably not where you think.
Of course, you should keep in mind basic tenets, such as keeping memory accesses close together (column-major indexing, Structure-Of-Arrays or Array-Of-Structures), avoid the need for temporary copies, etc. For small numbers of calls such as you describe, allocation overhead is noise.
As you gain experience, you’ll develop a sense of what tends to work best, but always keep maintaibility in mind.