Feb 23, 2025

Memory Regions - Code Segment

The code segment—or .text segment, as you prefer—is a read-only region of memory that contains the compiled machine code of a program—i.e., the executable instructions generated from source code. These instructions are loaded into memory at program startup and executed by the CPU to perform the program's logic.

As a read-only section, the .text segment cannot be modified at runtime, which helps protect against accidental or malicious changes to the program's behavior. It typically appears near the beginning of a binary file.

WebAssembly has an analogous region: the code section of a module holds the compiled function bodies as bytecode. This bytecode is a low-level representation of the original source code, designed for execution in a virtual machine or browser environment.

Tip

Reducing the number of instructions the CPU must execute can directly lower energy consumption—an important consideration in embedded systems where power is limited. For simple operations like addition, it's often better to precompute values at compile time. Instead of calculating 1 + 1 at runtime, just store 2 directly in the .text segment. This not only saves power but also speeds up execution. In most cases, the compiler will take care of this for you.

Key characteristics

Generally read-only, to prevent modification of executable instructions;
Holds the program's code exactly as it was compiled and linked;
Stores the binary machine instructions that the CPU fetches and executes;

Limitations

Anything stored in the .text segment cannot be changed at runtime;
The segment stays mapped for the lifetime of the process, so its memory is not freed at runtime;
Code cannot be added or modified dynamically, except through special techniques like JIT compilation;

Example

#include <stdio.h>

int main() {
    printf("Hello, .text segment!\n");
    return 0;
}

How to prove?

This is a simple 81-byte C program that prints a message to the console. When it is compiled, the generated machine code is stored in the .text segment of the executable file.

This time, I'm gonna use the objdump command rather than gdb to inspect the contents of the .text segment in the compiled binary.

Info

objdump is part of the GNU Binutils package and displays information about object files and executables, such as ELF (Executable and Linkable Format) files on Linux. It can show headers, sections, symbols, and disassembled code, which makes it particularly useful for developers and system administrators who need to analyze or debug binaries.

The objdump command is available on most Linux distributions and can be installed as part of the GNU Binutils package.

objdump --version

Raw output:

GNU objdump (GNU Binutils for Ubuntu) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.

Let's compile the C program to an executable binary file. The -o option specifies the output file name, which in this case is code_segment. The -g option adds debug information to the executable. The -O0 option disables optimization, so the generated code closely follows the original source. The -no-pie option disables position-independent executables (PIE), a security feature that lets the loader place the executable's code at a randomized address, making vulnerabilities harder to exploit. This option is not strictly necessary for this example, but without PIE the addresses printed by objdump are the fixed addresses the program is loaded at, which makes the output easier to follow.

gcc -g -O0 -no-pie code_segment.c -o code_segment

Once the program is compiled, the executable binary file code_segment is created. To inspect the .text section, run objdump with the -d option, which disassembles the executable sections of the binary and prints the assembly code. The .text section contains the machine code instructions generated from the C program.

objdump -d code_segment

Raw output:

code_segment:     file format elf64-x86-64

[...]

Disassembly of section .text:

0000000000401050 <_start>:
  401050:       f3 0f 1e fa             endbr64
  401054:       31 ed                   xor    %ebp,%ebp
  401056:       49 89 d1                mov    %rdx,%r9
  401059:       5e                      pop    %rsi
  40105a:       48 89 e2                mov    %rsp,%rdx
  40105d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp

[...]

The objdump command disassembles the binary file and prints each machine code instruction along with its address. Note that the output covers not only the .text section but also other executable sections such as .init, .fini, and .plt, which are part of the executable file.

What is in `.text`?

The `.text` segment contains:

The _start function (entry point from libc/startup code);
Possibly inlined or compiler-generated helpers;
Your main() function;
Any other functions you wrote;

It does not contain:

Strings ("Hello, .text segment!\n" lives in .rodata);
Global variables (live in .data or .bss);
Debug symbols (in .debug_* sections);

The disassembly

Let's decode the first few instructions of the .text segment:

Address	Bytes	Assembly	Description
`0x401050`	`f3 0f 1e fa`	`endbr64`	Indirect branch target marker for CET
`0x401054`	`31 ed`	`xor %ebp, %ebp`	Zero out `ebp`
`0x401056`	`49 89 d1`	`mov %rdx, %r9`	Move `rdx` into `r9`
`0x401059`	`5e`	`pop %rsi`	Pop the top of the stack (`argc`) into `rsi`
`0x40105a`	`48 89 e2`	`mov %rsp, %rdx`	Move `rsp` into `rdx`
`0x40105d`	`48 83 e4 f0`	`and $0xfffffffffffffff0, %rsp`	Align the stack to 16 bytes

As you can see, the first instruction is endbr64, which is an indirect branch target for CET (Control-flow Enforcement Technology). This instruction is used to help prevent control-flow hijacking attacks by ensuring that indirect branches target valid locations in the code. The next instruction, xor %ebp, %ebp, zeroes out the ebp register, which is commonly used as a frame pointer in function calls. The subsequent instructions set up the stack and prepare for the execution of the main function.

While this is a crucial part of your program, this is still not the code you wrote. The code you wrote is in the main function, which is also part of the .text segment. To find the code you wrote, you can search for the main function in the disassembled output.

Filtering symbols

If you want to filter just what is inside the .text segment, you can also use the nm command to list the symbols in the binary file.

The nm command is used to display the symbol table of an object file or executable. It provides information about the symbols defined and referenced in the file, including their addresses and types.

The -n option sorts the symbols by address in ascending order. The grep ' T ' command keeps only the global symbols defined in a code section (indicated by the letter T), which covers .text as well as neighbors such as .init and .fini. If your code defines more non-static functions, they will also be listed here.

nm -n code_segment | grep ' T '

Raw output:

0000000000401000 T _init
0000000000401050 T _start
0000000000401080 T _dl_relocate_static_pie
0000000000401136 T main
0000000000401154 T _fini

Main function

Last but not least, let's take a look at the disassembly of the main function. It holds the code you actually wrote and is also part of the .text segment.

objdump -d code_segment

Raw output:

code_segment:     file format elf64-x86-64

[...]
0000000000401136 <main>:
401136:       f3 0f 1e fa             endbr64
40113a:       55                      push   %rbp
40113b:       48 89 e5                mov    %rsp,%rbp
40113e:       48 8d 05 bf 0e 00 00    lea    0xebf(%rip),%rax  # 402004 <_IO_stdin_used+0x4>
401145:       48 89 c7                mov    %rax,%rdi
401148:       e8 f3 fe ff ff          call   401040 <puts@plt>
40114d:       b8 00 00 00 00          mov    $0x0,%eax
401152:       5d                      pop    %rbp
401153:       c3                      ret

Here is a simplified explanation of the main function disassembly:

Address	Bytes	Assembly	Description
`0x401136`	`f3 0f 1e fa`	`endbr64`	Indirect branch target marker for CET
`0x40113a`	`55`	`push %rbp`	Save base pointer
`0x40113b`	`48 89 e5`	`mov %rsp, %rbp`	Set up new stack frame
`0x40113e`	`48 8d 05 bf 0e 00 00`	`lea 0xebf(%rip), %rax`	Load address of string into %rax
`0x401145`	`48 89 c7`	`mov %rax, %rdi`	Set up argument for puts()
`0x401148`	`e8 f3 fe ff ff`	`call 401040 <puts@plt>`	Call puts function (via PLT)
`0x40114d`	`b8 00 00 00 00`	`mov $0x0, %eax`	Return 0 (success)
`0x401152`	`5d`	`pop %rbp`	Restore base pointer
`0x401153`	`c3`	`ret`	Return from function

And here is a more detailed explanation of the disassembly:

endbr64 — Intel CET (Control-flow Enforcement Technology)

401136: f3 0f 1e fa endbr64

A special instruction inserted for indirect branch validation (security feature);

Can usually be ignored when analyzing logic — it's for security, not logic;

Function Prologue (Setting up the stack frame)

40113a: 55 push %rbp

40113b: 48 89 e5 mov %rsp, %rbp

push %rbp saves the old base pointer;

mov %rsp, %rbp sets up a new stack frame;

Standard function prologue in x86_64 calling convention;

Load address of string into register

40113e: 48 8d 05 bf 0e 00 00 lea 0xebf(%rip), %rax # => 0x402004

LEA (Load Effective Address) is used to load the address of the string;

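%rip-relative addressing: %rip holds the address of the next instruction (0x401145), so 0x401145 + 0xebf gives address 0x402004;

This points to the string constant in .rodata, here "Hello, .text segment!". The trailing newline is gone because the compiler replaced printf with puts, which appends a newline itself;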

Move string address into argument register

401145: 48 89 c7 mov %rax, %rdi

First argument in the x86_64 System V calling convention goes into %rdi;

So, you’re setting up puts(<string_addr>);

Call the puts function

401148: e8 f3 fe ff ff call 401040 <puts@plt>

This calls the PLT (Procedure Linkage Table) entry for puts;

The PLT manages dynamic linking (i.e., runtime resolution of external symbols);

Eventually jumps into libc's puts;

Set return value to 0 (normal exit)

40114d: b8 00 00 00 00 mov $0x0, %eax

Return value from main() goes in %eax (lower 32 bits of %rax);

0 typically means "success";

Function Epilogue

401152: 5d pop %rbp

401153: c3 ret

Restores the old base pointer;

ret returns from the function, and control goes back to the libc startup code (__libc_start_main) that _start handed off to;
