Feb 21, 2025

Memory Regions - Heap

The heap is used for dynamic memory allocation. This is where memory is allocated at runtime through a language's allocation mechanisms (e.g., new in Java; malloc(), calloc(), and realloc() in C). Unlike the stack, which manages its plates automatically, the heap requires you to request and release memory yourself (unless a garbage collector does it for you). If you allocate more than you need or forget to clean up, you'll end up learning firsthand what a memory leak is.

Key characteristics

Manual allocation and deallocation if a garbage collector is not available;
Larger and more flexible than the stack, but slower to allocate and typically slower to access;
Memory is allocated dynamically and can grow as needed (within system limits);
Objects or data allocated on the heap stay there until they are explicitly freed;
When a program exits, the operating system automatically reclaims all allocated heap memory;

Limitations

The objects in the heap are not automatically freed if a garbage collector is not available;
Risk of out-of-memory errors if the requested size exceeds the available memory (malloc returns NULL in C; Java throws OutOfMemoryError);
Possibly the slowest memory region to work with, not counting memory that has been swapped out to disk;
High pressure on a garbage collector (when available) may also be a problem;

Example 1


          
      
#include <stdio.h>
#include <stdlib.h>

int main() {
    int *ptr = (int *)malloc(sizeof(int)); // Allocate memory for an int

    if (ptr == NULL) {
        printf("Memory allocation failed!\n");
        return 1;
    }
    
    *ptr = 42; // Assign a value
    printf("Heap value: %d\n", *ptr);

    free(ptr); // Free memory

    return 0;
}

How to prove?

To demonstrate how memory is allocated, we can use gdb. While we could simply set breakpoints at lines 5 and 12 and inspect the value of the ptr pointer at those points, let's take a different approach. Instead of just observing values, we'll dig a bit deeper and have some fun exploring lower-level concepts.

That said, we won't be breaking into malloc or free to investigate language internals—those are beyond the scope of this blog post. For now, we'll focus on application-level memory management concepts.

Compile the program with debug symbols (-g) and optimizations disabled (-O0):


    
gcc -g -O0 heap.c -o heap

Now start a debugging session with gdb heap and set a breakpoint at the main function with break main (or b main for short). Then start the program with the run command.

After the program stops at the breakpoint, you can disassemble the main function using the disassemble or disas command. This will show you the assembly code for the main function, including the calls to malloc and free.


          
      
(gdb) disas

Dump of assembler code for function main:
   0x0000aaaad8d50854 <+0>:     stp     x29, x30, [sp, #-32]!
   0x0000aaaad8d50858 <+4>:     mov     x29, sp
=> 0x0000aaaad8d5085c <+8>:     mov     x0, #0x4                        // #4
   0x0000aaaad8d50860 <+12>:    bl      0xaaaad8d506c0 <malloc@plt>
   0x0000aaaad8d50864 <+16>:    str     x0, [sp, #24]
   0x0000aaaad8d50868 <+20>:    ldr     x0, [sp, #24]
   0x0000aaaad8d5086c <+24>:    cmp     x0, #0x0
   0x0000aaaad8d50870 <+28>:    b.ne    0xaaaad8d50888 <main+52>  // b.any
   0x0000aaaad8d50874 <+32>:    adrp    x0, 0xaaaad8d50000
   0x0000aaaad8d50878 <+36>:    add     x0, x0, #0x8e0
   0x0000aaaad8d5087c <+40>:    bl      0xaaaad8d506f0 <puts@plt>
   0x0000aaaad8d50880 <+44>:    mov     w0, #0x1                        // #1
   0x0000aaaad8d50884 <+48>:    b       0xaaaad8d508b8 <main+100>
   0x0000aaaad8d50888 <+52>:    ldr     x0, [sp, #24]
   0x0000aaaad8d5088c <+56>:    mov     w1, #0x2a                       // #42
   0x0000aaaad8d50890 <+60>:    str     w1, [x0]
   0x0000aaaad8d50894 <+64>:    ldr     x0, [sp, #24]
   0x0000aaaad8d50898 <+68>:    ldr     w0, [x0]
   0x0000aaaad8d5089c <+72>:    mov     w1, w0
   0x0000aaaad8d508a0 <+76>:    adrp    x0, 0xaaaad8d50000
   0x0000aaaad8d508a4 <+80>:    add     x0, x0, #0x900
   0x0000aaaad8d508a8 <+84>:    bl      0xaaaad8d50710 <printf@plt>
   0x0000aaaad8d508ac <+88>:    ldr     x0, [sp, #24]
   0x0000aaaad8d508b0 <+92>:    bl      0xaaaad8d50700 <free@plt>
   0x0000aaaad8d508b4 <+96>:    mov     w0, #0x0                        // #0
   0x0000aaaad8d508b8 <+100>:   ldp     x29, x30, [sp], #32
   0x0000aaaad8d508bc <+104>:   ret
End of assembler dump.

Before we begin analyzing the disassembly of this program, it's important to clarify that the memory regions used to store these variables are dynamic. These regions are part of what's known as user space. For a visual reference, see the ASCII memory layout diagram in the first post of this series.

Dynamic memory regions are allocated at runtime, meaning they do not exist at compile time. These variables are not stored in the typical static memory sections such as .bss, .data, or .rodata. Instead, they are created and managed dynamically during the program's execution, often via heap or stack allocation.

Additionally, in many cases, especially for temporary or frequently accessed data, the CPU's registers (which we'll explore in more detail in upcoming posts) are used to store these values. Registers provide the fastest access and are heavily utilized during function calls, arithmetic operations, and memory address calculations.

In contrast, static memory regions (which we'll also explore in more detail in upcoming posts) are laid out at compile time and reside in a different part of the process memory. Their size is fixed for the lifetime of the program, and some of them (such as .rodata) cannot be modified during execution. They serve different purposes, such as storing initialized global variables (.data), uninitialized data (.bss), or constants (.rodata).

Understanding the distinction between dynamic and static memory spaces is crucial for interpreting program behavior, debugging, and writing efficient code.

Disassembly overview

Note that I produced this disassembly on an Apple Silicon M1 Mac, which uses the ARM64 (AArch64) architecture. The instructions will differ on other architectures, but the general idea remains the same.

Let's walk through the relevant parts of the disassembly with the ptr variable and the values of *ptr in mind. "Values" is plural on purpose: we're dealing with dynamic memory, so the value of *ptr can change during the program's execution.

0x0000aaaad8d50854 <+0>:     stp     x29, x30, [sp, #-32]!
0x0000aaaad8d50858 <+4>:     mov     x29, sp

This is the function prologue, standard on ARM64. The stp x29, x30, [sp, #-32]! instruction allocates 32 bytes of stack space and stores the old frame pointer (x29) and the return address (x30) there. The mov x29, sp instruction sets up the new frame pointer. This is where stack space is reserved for local variables, including int *ptr.

0x0000aaaad8d5085c <+8>:    mov     x0, #0x4
0x0000aaaad8d50860 <+12>:   bl      0xaaaad8d506c0 <malloc@plt>
0x0000aaaad8d50864 <+16>:   str     x0, [sp, #24]

Here, the instruction mov x0, #0x4 prepares the argument for malloc, indicating that we want to allocate 4 bytes of memory (sizeof(int) on this platform). The call to malloc is made via the bl instruction (branch with link), which saves the return address in x30 and jumps to the function.

The return value of malloc (a pointer to the newly allocated memory) is stored in register x0, then saved onto the stack at [sp + 24]. This marks the moment the pointer is stored, but the memory is already reserved before this point, right when malloc returns.

So, the space for the int is reserved at runtime during the malloc call, which dynamically allocates a 4-byte block on the heap. The pointer to this memory is then stored in a local variable (ptr), backed by the stack frame at [sp + 24].

Utilized regions

Memory Region	Purpose	Example
Heap	Dynamic memory	`malloc(4)` result stored in `x0`
Stack	Local pointer storage	`str x0, [sp, #24]`
Registers	Temporary values	`mov x0, #4`, `mov w1, #0x2a`, etc.

In this example, the heap is used to dynamically allocate memory for an int variable. The stack stores the pointer to this allocated memory, while registers hold temporary values and function arguments. This demonstrates how different memory regions are used together in a typical C program.

The memory referenced by x0 points to the heap block, which must be managed manually. Heap memory is not automatically released and can persist beyond the function scope unless explicitly freed. Registers, on the other hand, are used for fast, short-lived data access. They are the fastest form of storage, but are limited in number and their contents are volatile.

Value assignment

Now, let's look at the part of the disassembly where the value is assigned to the dynamically allocated memory. This happens after the malloc call, when we have a valid pointer to the allocated memory block.

0x0000aaaad8d50888 <+52>:    ldr     x0, [sp, #24]
0x0000aaaad8d5088c <+56>:    mov     w1, #0x2a                       // #42
0x0000aaaad8d50890 <+60>:    str     w1, [x0]

The value 42 is written into the heap-allocated memory through a series of instructions:

Step	Code / Instruction	What happens
1	`ldr x0, [sp, #24]`	Loads the pointer from the stack into `x0`. This pointer references the memory block previously allocated by `malloc`.
2	`mov w1, #0x2a`	Loads the integer constant `42` into register `w1`. This value is intended to be stored in the heap block.
3	`str w1, [x0]`	Stores the value `42` from `w1` into the memory location pointed to by `x0`, completing the heap write.

At this point, the dynamically allocated heap memory contains the value 42, and *ptr (the dereferenced pointer) reflects this assignment. So the assignment of 42 to the allocated int occurs explicitly in this str instruction. The table that follows summarizes the whole sequence, from allocation to assignment.

Step	Code / Instruction	What happens
1	`mov x0, #0x4`	Prepares the size for allocation
2	`bl malloc@plt`	Reserves 4 bytes on the heap, returns pointer in `x0`
3	`str x0, [sp, #24]`	Stores pointer to stack (as `ptr`)
4	`ldr x0, [sp, #24]`	Loads pointer back into register
5	`mov w1, #0x2a`	Loads `42` into register
6	`str w1, [x0]`	Writes `42` to the heap-allocated memory

Cleaning up the heap

After using dynamically allocated memory, it's essential to free it to avoid memory leaks. In languages like C, memory management is manual—you must explicitly release heap memory using free. In contrast, languages with a garbage collector (like Java, Go, or Python) automatically reclaim unused memory, making manual cleanup unnecessary in most cases.

0x0000aaaad8d508ac <+88>:    ldr     x0, [sp, #24]
0x0000aaaad8d508b0 <+92>:    bl      0xaaaad8d50700 <free@plt>
0x0000aaaad8d508b4 <+96>:    mov     w0, #0x0                        // #0
0x0000aaaad8d508b8 <+100>:   ldp     x29, x30, [sp], #32
0x0000aaaad8d508bc <+104>:   ret

The following table walks through the cleanup of the heap memory allocated earlier in the program, followed by the function epilogue that unwinds the stack frame and returns from main.

Step	Code / Instruction	What happens
1	`ldr x0, [sp, #24]`	Loads the pointer to the heap-allocated memory (previously stored) from the stack into register `x0`.
2	`bl free@plt`	Calls the `free` function, passing the pointer in `x0` to deallocate the heap memory.
3	`mov w0, #0`	Prepares the return value of the `main` function. Returning `0` indicates successful execution.
4	`ldp x29, x30, [sp], #32`	Restores the frame pointer (`x29`) and return address (`x30`) from the stack frame created during the function prologue at the beginning, and adjusts the stack pointer to unwind the stack frame.
5	`ret`	Returns control to the caller using the return address stored in `x30`.

The free function deallocates the memory previously allocated by malloc. This is crucial to prevent memory leaks, which can occur if you forget to free memory that is no longer needed.

Warning

In languages with manual memory management like C, every block obtained from malloc, calloc, or realloc must eventually be released with exactly one call to free. Failing to do so results in memory leaks, which can degrade performance or crash long-running programs.

The cost of convenience

In managed languages like Java, C#, Go, and Python, memory management is largely handled by an automatic Garbage Collector (GC). This system tracks which objects are still being used and automatically reclaims memory from those that are no longer reachable. This removes the need for manual memory deallocation, making programming much safer and more convenient—especially in large or complex projects.

Automatic memory management dramatically reduces the risk of memory leaks and pointer-related bugs, which are common pitfalls in manual environments like C. Developers can focus on business logic rather than wrestling with memory lifecycles. However, this convenience comes at a cost: garbage collectors introduce runtime overhead. Many of them periodically pause the program to identify unused memory, which can impact performance, especially in systems with real-time constraints or limited resources.

Similarly, object-oriented programming (OOP), which is often used in these managed languages, introduces a layer of abstraction that simplifies design and code reuse. Concepts like inheritance, polymorphism, and dynamic dispatch enable powerful software architecture patterns. But these features also add small runtime costs, such as virtual table lookups and slightly higher memory usage per object. Compared to C's imperative style—which maps directly to the hardware and has minimal abstraction—OOP can be slightly heavier in both memory and CPU cycles.

For general-purpose computing, the trade-off is well worth it. But in memory-constrained or performance-critical systems, such as embedded devices, this overhead matters. That's why low-level languages like C (and newer alternatives like Rust) are preferred in those environments—they offer precise control with minimal runtime cost, though they require the developer to manage memory and structure more manually.

Example 2


          
      
// WARNING: Bad code - segmentation fault
#include <stdio.h>

int main() {
    char *ptr = "hello";
    ptr[0] = 'H'; // Attempt to modify a string literal
    printf("%s\n", ptr);
    return 0;
}

This is a classic "gotcha" for beginners messing around with strings in C. At first glance, it might look like we're working with the heap here—but that's not the case. The line char *ptr = "hello"; makes it seem like ptr points to a regular, writable string. But under the hood, that string literal is actually stored in a read-only section of memory, not on the heap.

So when the next line tries to change 'h' to 'H', the program typically crashes with a segmentation fault. Why? Because you're trying to write to memory that was never meant to be changed (the C standard calls modifying a string literal undefined behavior). It's like trying to scribble on a laminated sign: you just can't.

This kind of bug can be super confusing, especially when you're still wrapping your head around how memory works in C. Heap, stack, data, text… it's a lot! And if you don't know how the program's memory layout is organized, it's easy to assume all pointers are pointing to similar types of memory.

If you've ever been bitten by this, don't worry—it happens to almost everyone learning C. And once you understand what's going on under the hood, it starts to make a lot more sense.

What's even cooler? There's a slick little trick you can do with gdb to make this exact program work—without touching a single line of the source code. I shared the full details in this tweet, but here's the gist: by manually editing the pointer on the stack before the program tries to write to it, you can redirect it to a malloc-allocated (and writable!) buffer instead.

Fun fact

This is a classic example of how powerful low-level debugging can be. By understanding the memory layout and using tools like gdb, you can fix bugs on the fly without needing to recompile or change the source code. This is very useful for quick fixes in embedded systems or when working with legacy code where you can't easily change the source.

In Brazilian Portuguese, there is a popular slang term for "stolen" called "malocado", which sounds very similar to "malloc-allocated". I often use it to make a joke about a bug that was fixed by a low-level hack like this one. So, you could say that this bug was "malocado": fixed with a little heap magic :)

Instead of crashing when trying to modify a string literal in read-only memory, the program runs just fine. It thinks it's editing the original string, but it's actually writing to safe, heap-allocated memory. All done live at runtime with the help of gdb. No rebuilds. No source code changes. Just clean debugger magic.

It's a neat little hack that not only fixes the bug but also shows how powerful low-level debugging can be when you really understand how memory works. Definitely check it out if you want to learn something cool today.

Final thoughts

Programming is a bit like cooking—knowing your ingredients (stack, heap, registers) and when to use them makes all the difference. Whether you're juggling pointers in C, trusting the garbage collector in a managed language, or hacking your way around memory for fun (yes, we saw that!), understanding how things work under the hood gives you a superpower most devs only dream of.

So keep exploring, keep breaking (and fixing!) things. And if you do end up with a segfault, just remember: it's not a bug, it's a feature! A feature that teaches you more about memory management, pointers, and the inner workings of your programming language. Embrace it, learn from it, and move on. Happy coding!
