I have always wanted to know this: how does directory recursion work without overflowing the stack, given that a directory tree can be n levels deep?
Because we have 8GB of RAM on most machines these days, the stack won't really run out of memory until you fill that up.
The OS does a bit of virtual-memory magic to move the stack around physically, while the programmer only has to think in terms of logical addresses. The OS can even push parts of the stack out to the hard drive temporarily ("swap"), and whether it's Linux or Windows, it is very good at moving memory around and using it efficiently.
Well, yes and no. Maximum stack size is typically capped much lower, but a user with sufficient permissions can raise the cap (e.g. with ulimit -s on Linux). On the Debian system I am posting this from, it appears the default stack size limit is 8MB.
You would need a very deep directory hierarchy to exceed 8MB in a recursive traversal. Each additional level is probably going to add only a few hundred bytes to your stack at most, so an 8MB stack is good for tens of thousands of levels. Good C/C++ programmers will avoid creating large data structures on the stack; if you need to allocate something big, it should go on the heap.
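To make that concrete, here is a minimal sketch of a recursive directory walk in the spirit of the advice above (the function name `count_entries` is made up for illustration). Each stack frame holds only a few pointers and integers; the child-path string is allocated on the heap, so per-level stack cost stays tiny:

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively count entries under `path`. Each recursive frame holds
   only a handful of pointers and integers -- the path buffer is
   malloc'd on the heap -- so even a deeply nested tree consumes only
   a modest amount of stack. */
static long count_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return 0;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        count++;

        /* Build the child path on the heap, not in a big stack array. */
        size_t len = strlen(path) + strlen(ent->d_name) + 2;
        char *child = malloc(len);
        snprintf(child, len, "%s/%s", path, ent->d_name);

        struct stat st;
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
            count += count_entries(child);  /* recurse into subdirectory */
        free(child);
    }
    closedir(dir);
    return count;
}
```

If the frame is, say, ~100 bytes, an 8MB stack allows on the order of 80,000 nested directories before you'd be in trouble.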
The stack in a function is the register space and the stack outside the function (global variables) is the heap, correct?
No, there is A register (the stack pointer) which keeps track of your current location in the stack, but the stack itself is stored in RAM. The heap is also ordinary RAM, but it is distinct from your global variables: a global variable has a name and a fixed location, while heap memory is handed out dynamically at runtime.
The difference between the stack and the heap is that the stack is accessed last-in, first-out (via the above-mentioned stack pointer), whereas items on the heap are created and destroyed dynamically as the program runs and are accessed in no particular order via ad-hoc pointers. Non-heap global variables typically have a fixed location in memory and exist for the entire lifetime of the program.
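A tiny sketch of all three lifetimes side by side (names like `boxed_value` are invented for illustration):

```c
#include <stdlib.h>

int lifetime_counter = 0;        /* global: fixed address, exists for the
                                    entire run of the program */

int *boxed_value(int v)
{
    int local = v;               /* stack: created on call, gone on return */
    int *p = malloc(sizeof *p);  /* heap: lives until free() is called */
    *p = local;
    lifetime_counter++;
    /* Returning &local would be a bug (the frame is destroyed on return);
       returning the heap pointer is fine. */
    return p;
}
```

The caller eventually has to `free()` the returned pointer; nothing destroys heap memory automatically the way returning from a function destroys the stack frame.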
It depends on the language also. In C that's close: variables declared within a function are stored on the stack, but globals live in a static data area rather than on the heap; additional memory you request at runtime (e.g. with malloc) is what comes from the heap. On what device that data is physically stored (main memory vs. hard drives vs. toaster ovens) is an implementation detail. Compare that to a language like Python, which does memory management for you, and things may be different. In all of the Python apps I've written over the past few months, my data could be on the stack, or the heap, or some combination of both; I've got no idea.
That's not to say the inner workings of a high-level language like Python are unimportant. You may need to know that stuff for your application to behave as expected, but it's not required by the language just to get things up and running.
While you are not accessing the "bare metal" stack and heap directly in Python, the interpreter does have its own stack and heap abstractions. So most of the same concepts still apply, but the implementation is "virtual" instead of mapping directly to hardware. By doing this, the Python interpreter can protect you from common C/C++ coding errors like uninitialized pointers, heap memory leaks and double-frees, etc... and you get helpful diagnostics when things do go wrong, instead of cryptic UAEs/segfaults.
I was under the impression that local variables meant faster access to data and global variables meant higher latencies. I've made a lot of global variables in my C programs for the ease of accessing them from any function without having to pass values, but if my code were reviewed it would be declared a big heap of stink, because you are not supposed to go global unless absolutely necessary.
It is somewhat CPU-architecture dependent, but in general, no, there should not be a significant performance penalty for global variables. Assuming we're talking about a compiled language like C/C++, a local variable is typically accessed at a constant offset from the current stack pointer, while a global variable is accessed at a constant offset from the start of the program's data segment. Either way, the CPU adds an offset to a base address to get the address of the variable. A global might be SLIGHTLY slower if the offset value is wider (i.e. 64-bit vs. 32-bit), or if it results in a cache miss (unless the global is a frequently accessed one; items near the top of the stack are more likely to be in cache), but the difference is going to be negligible in most cases.
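A sketch of what that looks like in practice (the function name is made up, and the instruction comments describe typical x86-64 output; the exact code varies by compiler and optimization level):

```c
int g = 5;                /* global: typically loaded RIP-relative, e.g.
                             mov eax, DWORD PTR g[rip] -- constant offset
                             from a known base address */

int add_local_and_global(int a)
{
    int l = a * 2;        /* local: addressed off the stack pointer, e.g.
                             mov DWORD PTR [rsp-4], eax */
    return l + g;         /* either way: one base register plus a
                             constant offset */
}
```

Both accesses are a single load with base-plus-offset addressing, which is why there's no inherent latency penalty for the global.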
The reason use of globals is discouraged is not efficiency; it is because in larger programs they result in code which is harder to debug, maintain, and re-use. Think of it this way: every global variable is a potential input or output to every function. Hidden side effects that are unrelated to a function's arguments or return value make program behavior much harder to analyze.
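A minimal sketch of that hidden-input/output problem (both function names are invented for illustration):

```c
int running_total = 0;                /* hidden input AND output of
                                         add_hidden -- invisible in its
                                         signature */

/* Every call depends on, and changes, global state, so the behavior
   can't be understood or tested from the arguments alone. */
void add_hidden(int x) { running_total += x; }

/* The explicit version makes the data flow visible in the signature,
   so it is trivially testable and reusable. */
int add_explicit(int total, int x) { return total + x; }
```

Two calls to `add_hidden(2)` give different observable results depending on what ran before; `add_explicit(0, 2)` gives the same answer every time.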
Edit: In a multi-threaded program, lock contention for globals can potentially become a serious performance killer, but this isn't directly related to how the variable is accessed at the hardware level.