Memory Regions - Heap
#memory | #lowlevel | #cache
The heap is used for dynamic memory allocation. This is where memory is allocated at runtime through a language's allocation mechanisms (e.g., `new` in Java; `malloc()`, `calloc()`, and `realloc()` in C). Unlike the stack, which manages its plates automatically, the heap makes you request and release memory yourself. If you allocate more than you need or forget to clean up, you'll end up learning firsthand what a memory leak is.
Key characteristics
- Manual allocation and deallocation if a garbage collector is not available;
- Larger and more flexible than the stack, but slower to allocate from and often slower to access;
- Memory is allocated dynamically and can grow as needed (within system limits);
- Objects or data allocated on the heap stay there until they are explicitly freed;
- When a program exits, the operating system automatically reclaims all allocated heap memory.
Limitations
- The objects in the heap are not automatically freed if a garbage collector is not available;
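- Risk of out-of-memory failures if usage exceeds available memory (e.g., `malloc` returning `NULL` in C, or an `OutOfMemoryError` in Java);
- Possibly the slowest memory region to work with, since every allocation goes through the allocator's bookkeeping (not counting memory that has been swapped out to disk);
- High pressure on a garbage collector (when available) may also be a problem.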
Example 1
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *ptr = (int *)malloc(sizeof(int)); // Allocate memory for an int
    if (ptr == NULL) {
        printf("Memory allocation failed!\n");
        return 1;
    }
    *ptr = 42; // Assign a value
    printf("Heap value: %d\n", *ptr);
    free(ptr); // Free memory
    return 0;
}
How to prove?
To demonstrate how memory is allocated, we can use gdb. While we could simply set breakpoints at lines 5 and 12 and inspect the value of the `ptr` pointer at those points, let's take a different approach. Instead of just observing values, we'll dig a bit deeper and have some fun exploring lower-level concepts.
That said, we won't be breaking into `malloc` or `free` to investigate language internals; those are beyond the scope of this blog post. For now, we'll focus on application-level memory management concepts.
Compile the program with debug symbols (`-g`) and no optimizations (`-O0`):
gcc -g -O0 heap.c -o heap
Now, run the program under gdb with `gdb heap` and set a breakpoint at the `main` function. You can use the `break main` or `b main` command after starting the gdb session. Then, run the program with the `run` command.
After the program stops at the breakpoint, you can disassemble the `main` function using the `disassemble` or `disas` command. This shows the assembly code for `main`, including the calls to `malloc` and `free`.
(gdb) disas
Dump of assembler code for function main:
0x0000aaaad8d50854 <+0>: stp x29, x30, [sp, #-32]!
0x0000aaaad8d50858 <+4>: mov x29, sp
=> 0x0000aaaad8d5085c <+8>: mov x0, #0x4 // #4
0x0000aaaad8d50860 <+12>: bl 0xaaaad8d506c0 <malloc@plt>
0x0000aaaad8d50864 <+16>: str x0, [sp, #24]
0x0000aaaad8d50868 <+20>: ldr x0, [sp, #24]
0x0000aaaad8d5086c <+24>: cmp x0, #0x0
0x0000aaaad8d50870 <+28>: b.ne 0xaaaad8d50888 <main+52> // b.any
0x0000aaaad8d50874 <+32>: adrp x0, 0xaaaad8d50000
0x0000aaaad8d50878 <+36>: add x0, x0, #0x8e0
0x0000aaaad8d5087c <+40>: bl 0xaaaad8d506f0 <puts@plt>
0x0000aaaad8d50880 <+44>: mov w0, #0x1 // #1
0x0000aaaad8d50884 <+48>: b 0xaaaad8d508b8 <main+100>
0x0000aaaad8d50888 <+52>: ldr x0, [sp, #24]
0x0000aaaad8d5088c <+56>: mov w1, #0x2a // #42
0x0000aaaad8d50890 <+60>: str w1, [x0]
0x0000aaaad8d50894 <+64>: ldr x0, [sp, #24]
0x0000aaaad8d50898 <+68>: ldr w0, [x0]
0x0000aaaad8d5089c <+72>: mov w1, w0
0x0000aaaad8d508a0 <+76>: adrp x0, 0xaaaad8d50000
0x0000aaaad8d508a4 <+80>: add x0, x0, #0x900
0x0000aaaad8d508a8 <+84>: bl 0xaaaad8d50710 <printf@plt>
0x0000aaaad8d508ac <+88>: ldr x0, [sp, #24]
0x0000aaaad8d508b0 <+92>: bl 0xaaaad8d50700 <free@plt>
0x0000aaaad8d508b4 <+96>: mov w0, #0x0 // #0
0x0000aaaad8d508b8 <+100>: ldp x29, x30, [sp], #32
0x0000aaaad8d508bc <+104>: ret
End of assembler dump.
Before we begin analyzing the disassembly of this program, it's important to clarify that the memory regions used to store these variables are dynamic. These regions are part of what's known as user space. For a visual reference, please consult the ASCII memory layout diagram in the first post of this series.
Dynamic memory regions are allocated at runtime, meaning they do not exist at compile time. These variables are not stored in the typical static memory sections such as `.bss`, `.data`, or `.rodata`. Instead, they are created and managed dynamically during the program's execution, often via heap or stack allocation.
Additionally, in many cases, especially for temporary or frequently accessed data, the CPU's registers (which we'll explore in more detail in upcoming posts) are used to store these values. Registers provide the fastest access and are heavily utilized during function calls, arithmetic operations, and memory address calculations.
In contrast, static memory regions (which we'll also explore in more detail in upcoming posts) are laid out at build time and reside in a different part of the process memory. Their size cannot change during execution, and they serve different purposes: `.data` holds initialized globals, `.bss` holds uninitialized (zero-filled) globals, and `.rodata` holds constants. Of these, only the read-only sections such as `.rodata` reject writes at runtime.
Understanding the distinction between dynamic and static memory spaces is crucial for interpreting program behavior, debugging, and writing efficient code.
Disassembly overview
Notice that I'm producing this disassembly on an Apple Silicon Mac M1, which uses the ARM64 (AArch64) architecture. The instructions may vary slightly on different architectures, but the general idea remains the same.
Let's break the disassembly down and walk through the relevant parts with the `*ptr` variable and its values in mind. "Values" is plural on purpose: we're talking about dynamic memory allocation, which means that the value of `*ptr` can change during the program's execution.
0x0000aaaad8d50854 <+0>: stp x29, x30, [sp, #-32]!
0x0000aaaad8d50858 <+4>: mov x29, sp
This is the function prologue, standard in ARM64. The `stp x29, x30, [sp, #-32]!` instruction allocates 32 bytes of stack space and stores the old frame pointer and return address there. The `mov x29, sp` instruction sets up the new frame pointer. This is when stack space is allocated for local variables, including the `int *ptr`.
0x0000aaaad8d5085c <+8>: mov x0, #0x4
0x0000aaaad8d50860 <+12>: bl 0xaaaad8d506c0 <malloc@plt>
0x0000aaaad8d50864 <+16>: str x0, [sp, #24]
Here, the instruction `mov x0, #0x4` prepares the argument for the `malloc` function, indicating that we want to allocate 4 bytes of memory (the size of an `int` on this platform). The call to `malloc` is made via the `bl` instruction (branch with link), which saves the return address in `x30` and jumps to the function.
The return value of `malloc` (a pointer to the newly allocated memory) comes back in register `x0` and is then saved onto the stack at `[sp + 24]`. This marks the moment the pointer is stored, but the memory is already reserved before this point, right when `malloc` returns.
So, the space for the `int` is reserved at runtime during the `malloc` call, which dynamically allocates a 4-byte block on the heap. The pointer to this memory is then stored in a local variable (`ptr`), backed by the stack frame at `[sp + 24]`.
Utilized regions
Memory Region | Purpose | Example |
---|---|---|
Heap | Dynamic memory | `malloc(4)` result returned in `x0` |
Stack | Local pointer storage | `str x0, [sp, #24]` |
Registers | Temporary values | `mov x0, #4`, `mov w1, #0x2a`, etc. |
In this example, the heap is used to dynamically allocate memory for an `int` variable. The stack stores the pointer to this allocated memory, while registers hold temporary values and function arguments. This demonstrates how different memory regions are used together in a typical C program.
The address held in `x0` points to the heap block, which must be managed manually. Heap memory is not automatically released and can persist beyond the function scope unless explicitly freed. Registers, on the other hand, are used for fast, short-lived data access. They are the fastest form of storage, but are limited in number and their contents are volatile.
Value assignment
Now, let's look at the part of the disassembly where the value is assigned to the dynamically allocated memory. This happens after the `malloc` call, when we have a valid pointer to the allocated memory block.
0x0000aaaad8d50888 <+52>: ldr x0, [sp, #24]
0x0000aaaad8d5088c <+56>: mov w1, #0x2a // #42
0x0000aaaad8d50890 <+60>: str w1, [x0]
The value `42` is written into the heap-allocated memory through a series of instructions:
Step | Code / Instruction | What happens |
---|---|---|
1 | `ldr x0, [sp, #24]` | Loads the pointer from the stack into `x0`. This pointer references the memory block previously allocated by `malloc`. |
2 | `mov w1, #0x2a` | Loads the integer constant 42 into register `w1`. This value is intended to be stored in the heap block. |
3 | `str w1, [x0]` | Stores the value 42 from `w1` into the memory location pointed to by `x0`, completing the heap write. |
At this point, the dynamically allocated heap memory contains the value `42`, and `*ptr` (the dereferenced pointer) reflects this assignment. So the assignment of 42 to the allocated int occurs explicitly in this `str` instruction. The table below summarizes the whole sequence, from allocation to assignment:
Step | Code / Instruction | What happens |
---|---|---|
1 | `mov x0, #0x4` | Prepares the size for allocation |
2 | `bl malloc@plt` | Reserves 4 bytes on the heap, returns pointer in `x0` |
3 | `str x0, [sp, #24]` | Stores pointer to stack (as `ptr`) |
4 | `ldr x0, [sp, #24]` | Loads pointer back into register |
5 | `mov w1, #0x2a` | Loads 42 into register |
6 | `str w1, [x0]` | Writes 42 to the heap-allocated memory |
Cleaning up the heap
After using dynamically allocated memory, it's essential to free it to avoid memory leaks. In languages like C, memory management is manual: you must explicitly release heap memory using `free`. In contrast, languages with a garbage collector (like Java, Go, or Python) automatically reclaim unused memory, making manual cleanup unnecessary in most cases.
0x0000aaaad8d508ac <+88>: ldr x0, [sp, #24]
0x0000aaaad8d508b0 <+92>: bl 0xaaaad8d50700 <free@plt>
0x0000aaaad8d508b4 <+96>: mov w0, #0x0 // #0
0x0000aaaad8d508b8 <+100>: ldp x29, x30, [sp], #32
0x0000aaaad8d508bc <+104>: ret
The following table walks through the cleanup of the heap memory allocated earlier in the program, followed by the function epilogue.
Step | Code / Instruction | What happens |
---|---|---|
1 | `ldr x0, [sp, #24]` | Loads the pointer to the heap-allocated memory (previously stored) from the stack into register `x0`. |
2 | `bl free@plt` | Calls the `free` function, passing the pointer in `x0` to deallocate the heap memory. |
3 | `mov w0, #0` | Prepares the return value of the `main` function. Returning 0 indicates successful execution. |
4 | `ldp x29, x30, [sp], #32` | Restores the frame pointer (`x29`) and return address (`x30`) from the stack frame created during the function prologue at the beginning, and adjusts the stack pointer to unwind the stack frame. |
5 | `ret` | Returns control to the caller using the return address stored in `x30`. |
The `free` function deallocates the memory previously allocated by `malloc`. Forgetting this call for memory that is no longer needed is exactly how memory leaks happen.
Warn
In manual memory-managed languages like C, every block obtained from `malloc` (or `calloc`, `realloc`) must eventually be released with exactly one corresponding `free`. Failing to do so results in memory leaks, which can degrade performance or crash long-running programs.
The cost of convenience
In managed languages like Java, C#, Go, and Python, memory management is largely handled by an automatic Garbage Collector (GC). This system tracks which objects are still being used and automatically reclaims memory from those that are no longer reachable. This removes the need for manual memory deallocation, making programming much safer and more convenient—especially in large or complex projects.
Automatic memory management dramatically reduces the risk of memory leaks and pointer-related bugs, which are common pitfalls in manual environments like C. Developers can focus on business logic rather than wrestling with memory lifecycles. However, this convenience comes at a cost: garbage collectors introduce runtime overhead. Many of them periodically pause the program, at least briefly, to identify unused memory, which can impact performance, especially in systems with real-time constraints or limited resources.
Similarly, object-oriented programming (OOP), which is often used in these managed languages, introduces a layer of abstraction that simplifies design and code reuse. Concepts like inheritance, polymorphism, and dynamic dispatch enable powerful software architecture patterns. But these features also add small runtime costs, such as virtual table lookups and slightly higher memory usage per object. Compared to C's imperative style—which maps directly to the hardware and has minimal abstraction—OOP can be slightly heavier in both memory and CPU cycles.
For general-purpose computing, the trade-off is well worth it. But in memory-constrained or performance-critical systems, such as embedded devices, this overhead matters. That's why low-level languages like C (and newer alternatives like Rust) are preferred in those environments—they offer precise control with minimal runtime cost, though they require the developer to manage memory and structure more manually.
Example 2
// WARNING: Bad code - segmentation fault
#include <stdio.h>

int main() {
    char *ptr = "hello";
    ptr[0] = 'H'; // Attempt to modify a string literal
    printf("%s\n", ptr);
    return 0;
}
This is a classic "gotcha" for beginners messing around with strings in C. At first glance, it might look like we're working with the heap here, but that's not the case. The line `char *ptr = "hello";` makes it seem like `ptr` points to a regular, writable string. But under the hood, that string literal is actually stored in a read-only section of memory, not on the heap.
So when the next line tries to change `'h'` to `'H'`, the program typically crashes with a segmentation fault (formally, modifying a string literal is undefined behavior in C). Why? Because you're trying to write to memory that was never meant to be changed. It's like trying to scribble on a laminated sign: you just can't.
This kind of bug can be super confusing, especially when you're still wrapping your head around how memory works in C. Heap, stack, data, text… it's a lot! And if you don't know how the program's memory layout is organized, it's easy to assume all pointers are pointing to similar types of memory.
If you've ever been bitten by this, don't worry—it happens to almost everyone learning C. And once you understand what's going on under the hood, it starts to make a lot more sense.
What's even cooler? There's a slick little trick you can do with `gdb` to make this exact program work without touching a single line of the source code. I shared the full details in this tweet, but here's the gist: by manually editing the pointer on the stack before the program tries to write to it, you can redirect it to a `malloc`-allocated (and writable!) buffer instead.
Fun fact
This is a classic example of how powerful low-level debugging can be. By understanding the memory layout and using tools like `gdb`, you can fix bugs on the fly without needing to recompile or change the source code. Very useful for quick fixes in embedded systems or when working with legacy code where you can't easily change the source.
In Brazilian Portuguese, there is a popular slang term for "stolen" called "malocado", which sounds a lot like "malloc-allocated". I often use it to make a joke about a bug that was fixed by a low-level hack like this one. So, you could say that this bug was "malocado": fixed with a little heap magic :)
Instead of crashing when trying to modify a string literal in read-only memory, the program runs just fine. It thinks it's editing the original string, but it's actually writing to safe, heap-allocated memory. All done live at runtime with the help of `gdb`. No rebuilds. No source code changes. Just clean debugger magic.
It's a neat little hack that not only fixes the bug but also shows how powerful low-level debugging can be when you really understand how memory works. Definitely check it out if you want to learn something cool today.
Final thoughts
Programming is a bit like cooking—knowing your ingredients (stack, heap, registers) and when to use them makes all the difference. Whether you're juggling pointers in C, trusting the garbage collector in a managed language, or hacking your way around memory for fun (yes, we saw that!), understanding how things work under the hood gives you a superpower most devs only dream of.
So keep exploring, keep breaking (and fixing!) things. And if you do end up with a segfault, just remember: it's not a bug, it's a feature! A feature that teaches you more about memory management, pointers, and the inner workings of your programming language. Embrace it, learn from it, and move on. Happy coding!