Memory Regions - Code Segment
#memory | #lowlevel | #cache
The code segment (also called the .text segment) is a read-only region of memory that holds a program's compiled machine code, i.e., the executable instructions generated from its source code. These instructions are loaded into memory when the program starts and are executed by the CPU to carry out the program's logic.
Because it is read-only, the .text segment cannot be modified at runtime, which protects against accidental or malicious changes to the program's behavior. In a typical ELF executable it sits near the start of the loaded image, after the headers and small code sections such as .init and .plt.
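WebAssembly has no .text segment as such. The closest equivalent is the Code section of a .wasm module, which holds the compiled function bodies as WebAssembly bytecode. This bytecode is a low-level representation of the original source code, designed for efficient execution in a virtual machine or browser environment. Unlike native code, it is not stored in the module's linear memory, so a WebAssembly program cannot read or overwrite its own instructions.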
Tip
Reducing the number of instructions the CPU must execute can directly lower energy consumption. This matters in embedded systems, where power is limited. For simple operations like addition, it's often better to precompute values at compile time. Instead of calculating 1 + 1 at runtime, encode the result 2 directly in the instruction stored in the .text segment. This saves power and also speeds up execution. In most cases, the compiler does this for you through an optimization called constant folding.
Key characteristics
- Generally mapped read-only (and executable) to prevent modification of the executable instructions;
- Holds the program's code exactly as it was compiled and linked: the binary machine instructions that make up every function in your program;
Limitations
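- The contents of the .text segment cannot be changed at runtime;
- Once the program is loaded into memory, the segment stays mapped for the lifetime of the process and cannot be freed at runtime;
- Cannot be modified dynamically, except through special techniques: JIT compilers, for example, generate new code in separately allocated executable memory instead of editing .text;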
Example
#include <stdio.h>
int main() {
printf("Hello, .text segment!\n");
return 0;
}
How to prove it?
This is a simple 81-byte C program that prints a message to the console. When it is compiled, the machine code generated from it is stored in the .text segment of the executable file.
This time, I'm gonna use the objdump command rather than gdb to inspect the contents of the .text segment in the compiled binary.
Info
objdump is part of the GNU Binutils package and displays information about object files, including ELF (Executable and Linkable Format) files. It can examine many aspects of an ELF file, such as its headers, sections, segments, and symbols. This makes it particularly useful for developers and system administrators who need to analyze or debug ELF binaries.
objdump is available on most Linux distributions, usually through the binutils package.
objdump --version
Raw output
GNU objdump (GNU Binutils for Ubuntu) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
Let's compile the C program into an executable. The -o option sets the output file name, in this case code_segment. The -g option adds debug information to the executable. The -O0 option disables optimization, so the generated code stays close to the original source. The -no-pie option builds a non-position-independent executable. A position-independent executable (PIE) can be loaded at a different, randomized address on every run, which makes vulnerabilities harder to exploit. Disabling PIE is not strictly necessary here. It does make the executable load at a fixed address, so the addresses that objdump and nm print below are the same ones the code uses at runtime.
gcc -g -O0 -no-pie code_segment.c -o code_segment
Compiling produces the executable code_segment. To inspect its code, run objdump with the -d option. This disassembles every section that contains executable code and prints the resulting assembly. The .text section holds the machine instructions generated from the C program.
objdump -d code_segment
Raw output
code_segment:     file format elf64-x86-64
[...]
Disassembly of section .text:
0000000000401050 <_start>:
  401050: f3 0f 1e fa             endbr64
  401054: 31 ed                   xor    %ebp,%ebp
  401056: 49 89 d1                mov    %rdx,%r9
  401059: 5e                      pop    %rsi
  40105a: 48 89 e2                mov    %rsp,%rdx
  40105d: 48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
[...]
objdump disassembles every executable section of the binary, showing each machine instruction next to its address. Note that the output includes not only the .text section but also other code sections such as .init, .plt, and .fini. The linker places all of these in the same executable segment. To restrict the output to .text alone, add -j .text (for example, objdump -d -j .text code_segment).
What is in .text?
The .text segment contains:
- The _start function (entry point from libc/startup code);
- Possibly inlined or compiler-generated helpers;
- Your main() function;
- Any other functions you wrote;
It does not contain:
- String literals (the "Hello, .text segment!" message lives in .rodata);
- Global variables (they live in .data or .bss);
- Debug symbols (in the .debug_* sections);
The disassembly
Let's decode the first few instructions of the .text segment:
| Address | Bytes | Assembly | Description |
|---|---|---|---|
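| 0x401050 | f3 0f 1e fa | endbr64 | Indirect branch target for CET |
| 0x401054 | 31 ed | xor %ebp, %ebp | Zero the frame pointer to mark the outermost frame |
| 0x401056 | 49 89 d1 | mov %rdx, %r9 | Pass the dynamic linker's termination function (in rdx) as the 6th argument |
| 0x401059 | 5e | pop %rsi | Pop argc into rsi (2nd argument) |
| 0x40105a | 48 89 e2 | mov %rsp, %rdx | argv starts at the stack top; pass it in rdx (3rd argument) |
| 0x40105d | 48 83 e4 f0 | and $0xfffffffffffffff0, %rsp | Align the stack to 16 bytes |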
As you can see, the first instruction is endbr64. It marks a valid target for indirect branches under CET (Control-flow Enforcement Technology). This helps prevent control-flow hijacking attacks: indirect jumps and calls may only land on such markers. On CPUs without CET, endbr64 executes as a no-op. The next instruction, xor %ebp, %ebp, zeroes the frame pointer; writing ebp also clears the upper half of rbp. This marks the current frame as the outermost one. The remaining instructions collect argc and argv from the stack and align the stack to 16 bytes. They also prepare the arguments for __libc_start_main, the libc routine that eventually calls main.
While this is a crucial part of your program, it is still not the code you wrote. That code is in the main function, which is also part of the .text segment. To find it, search for <main> in the disassembled output. With binutils 2.32 or later, you can also disassemble only that function with objdump --disassemble=main code_segment.
Filtering symbols
If you want to filter just what is inside the .text segment, you can also use the nm command to list the symbols in the binary file.
The nm command is used to display the symbol table of an object file or executable. It provides information about the symbols defined and referenced in the file, including their addresses and types.
The -n option sorts the symbols by address in ascending order. The grep ' T ' filter keeps only symbols of type T, i.e., global symbols defined in a code section. That is mostly .text, but as the output shows, it also includes _init and _fini, which live in .init and .fini. Functions declared static appear with a lowercase t instead. If you have more functions in your code, they will also be listed here.
nm -n code_segment | grep ' T '
Raw output
0000000000401000 T _init
0000000000401050 T _start
0000000000401080 T _dl_relocate_static_pie
0000000000401136 T main
0000000000401154 T _fini
Main function
Last but not least, let's look at the disassembly of the main function, which holds the code you actually wrote. It is also part of the .text segment.
objdump -d code_segment
Raw output
code_segment:     file format elf64-x86-64
[...]
0000000000401136 <main>:
  401136: f3 0f 1e fa             endbr64
  40113a: 55                      push   %rbp
  40113b: 48 89 e5                mov    %rsp,%rbp
  40113e: 48 8d 05 bf 0e 00 00    lea    0xebf(%rip),%rax        # 402004 <_IO_stdin_used+0x4>
  401145: 48 89 c7                mov    %rax,%rdi
  401148: e8 f3 fe ff ff          call   401040 <puts@plt>
  40114d: b8 00 00 00 00          mov    $0x0,%eax
  401152: 5d                      pop    %rbp
  401153: c3                      ret
Here is a simplified explanation of the main function's disassembly:
| Address | Bytes | Assembly | Description |
|---|---|---|---|
| 0x401136 | f3 0f 1e fa | endbr64 | Indirect branch target for CET |
| 0x40113a | 55 | push %rbp | Save base pointer |
| 0x40113b | 48 89 e5 | mov %rsp, %rbp | Set up new stack frame |
| 0x40113e | 48 8d 05 bf 0e 00 00 | lea 0xebf(%rip), %rax | Load address of string into %rax |
| 0x401145 | 48 89 c7 | mov %rax, %rdi | Set up argument for puts() |
| 0x401148 | e8 f3 fe ff ff | call 0x401040 | Call puts function (via PLT) |
| 0x40114d | b8 00 00 00 00 | mov $0x0, %eax | Return 0 (success) |
| 0x401152 | 5d | pop %rbp | Restore base pointer |
| 0x401153 | c3 | ret | Return from function |
And here is a more detailed explanation of the disassembly:
- endbr64: Intel CET (Control-flow Enforcement Technology)
- Function Prologue (Setting up the stack frame)
- Load address of string into register
- Move string address into argument register
- Call the puts function
- Set return value to 0 (normal exit)
- Function Epilogue
401136: f3 0f 1e fa endbr64
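Marks a valid target for indirect calls and jumps (Intel CET); GCC on Ubuntu emits it because -fcf-protection is enabled by default;
On CPUs without CET it executes as a no-op, so it can usually be ignored when analyzing the program's logic: it is there for security, not logic;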
40113a: 55 push %rbp
40113b: 48 89 e5 mov %rsp, %rbp
push %rbp saves the old base pointer;
mov %rsp, %rbp sets up a new stack frame;
Standard frame-pointer prologue; GCC keeps it at -O0, although the x86-64 System V ABI does not require a frame pointer;
40113e: 48 8d 05 bf 0e 00 00 lea 0xebf(%rip), %rax # => 0x402004
LEA (Load Effective Address) is used to load the address of the string;
%rip-relative addressing: while the instruction executes, %rip holds the address of the next instruction (0x401145), so the target is 0x401145 + 0xebf = 0x402004;
This points to a string constant in .rodata, "Hello, .text segment!", stored without the trailing \n because puts() appends one (you can check with objdump -s -j .rodata code_segment);
401145: 48 89 c7 mov %rax, %rdi
First argument in the x86_64 System V calling convention goes into %rdi;
So, you’re setting up puts(<string_addr>);
401148: e8 f3 fe ff ff call 401040 <puts@plt>
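This calls the PLT (Procedure Linkage Table) entry for puts. The source called printf(), but the format string has no conversion specifiers and ends with a newline, so GCC replaced the call with the simpler puts(), even at -O0;
The PLT manages dynamic linking (i.e., runtime resolution of external symbols);
Eventually jumps into libc's puts;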
40114d: b8 00 00 00 00 mov $0x0, %eax
Return value from main() goes in %eax (lower 32 bits of %rax);
0 typically means "success";
401152: 5d pop %rbp
401153: c3 ret
Restores the old base pointer;
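ret returns from the function; control goes back to the libc code that called main (__libc_start_call_main in glibc 2.34 and later, __libc_start_main in older versions), which passes the return value to exit(). Control never returns to _start;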