When I hear hello-world I imagine a trivial one-line program. However, what actually happens under the hood is far from trivial: memory allocation, register and stack manipulation, and interop with an OS kernel, among others. So I figured that going a couple of layers down the stack to bare assembly to build a minimal working program (and a couple of more complex ones) might be a fun exercise.
This is the first post about assembly on MacOS X; it covers “getting started”, calling conventions and system calls. I plan to write another post with more practical examples later on.
First things first. I’ll be using nasm and hence Intel’s assembly dialect. The code will be written for 64-bit [1] MacOS X. Porting the code to Linux should be quite straightforward, as the main difference is the system call numbers.
nasm can be installed easily with a package manager like
The second part of the toolchain is the linker, ld, which becomes available after installing the Xcode command line tools:
To make sure everything’s set, the following commands should produce a legible output:
NASM version 2.14 compiled on Nov 8 2018
@(#)PROGRAM:ld PROJECT:ld64-409.12 BUILD 17:47:51 Sep 25 2018
An awkward no-op in assembly, or how to trigger a bus error
Let’s start with a program that literally does nothing. As a first cut, we could try to only define a code segment with a global symbol that will be an entry point to our program and a single “do-nothing” instruction:
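A minimal sketch of such a program, assuming NASM syntax and the _main entry symbol that the linker flags in this post refer to (the file name is an assumption based on the binary name that shows up in the debugger session):

```nasm
; do-nothing-incomplete.asm  (file name assumed)
global _main        ; make the entry point visible to the linker

section .text       ; code goes into the text section
_main:
    nop             ; the only instruction; note that nothing ends the program
```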
We can use the following script to compile and link this program:
nasm -f macho64 tells NASM to produce a 64-bit Mach-O object file, which is then turned into a statically linked executable by the ld -static stanza; -e _main tells the linker the name of the symbol that will serve as the entry point to the program.
During execution, however, the program fails with an error:
 70339 bus error bin/do-nothing-incomplete
This may seem odd as we don’t access memory. So where did this bus error come from?
A short session in the debugger reveals the following:
(lldb) r
Process 70526 launched: 'bin/do-nothing-incomplete' (x86_64)
Process 70526 stopped
* thread #1, stop reason = EXC_BAD_ACCESS (code=2, address=0x2000)
    frame #0: 0x0000000000002000 do-nothing-incomplete
->  0x2000: add al, byte ptr [rax]
    0x2002: add byte ptr [rax], al
    0x2004: add eax, dword ptr [rcx]
    0x2006: adc byte ptr [rax], al
Target 0: (do-nothing-incomplete) stopped.
(lldb) dis -s 0x1fff
do-nothing-incomplete`main:
    0x1fff <+0>: nop
    0x2000: add al, byte ptr [rax]
    0x2002: add byte ptr [rax], al
    0x2004: add eax, dword ptr [rcx]
    0x2006: adc byte ptr [rax], al
    0x2008: add byte ptr [rax], dl
    0x200a: add byte ptr [rax], al
    0x200c: add byte ptr [rax], al
    0x200e: add byte ptr [rax], al
(lldb) p/x $rax
(unsigned long) $0 = 0x0000000000000000
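dis -s 0x1fff shows our nop instruction followed by some other instructions. Since we didn’t explicitly signal to the OS that the program is ready to terminate with an exit system call, the processor went on to execute whatever was next in memory. The bytes at 0x2000 decode as add al, byte ptr [rax], which with rax equal to zero would try to read the byte at address zero. Note, however, that the debugger reports the faulting address as 0x2000, the address of the instruction itself, so it was most likely the attempt to execute from that address that failed with a bus error.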
The cause of the problem is clear. But where did those other instructions come from? Let’s peek into the structure of our executable with otool.
otool -l prints the load commands, which are pretty much a table of contents of our binary:
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __LINKEDIT
   vmaddr 0x0000000000002000
   vmsize 0x0000000000001000
  fileoff 4096
 filesize 64
  maxprot 0x00000007
We can see that the __LINKEDIT segment starts at address 0x2000, and it is something that should not be interpreted and executed as code. Thus a well-behaved program needs to signal its exit to the OS, and we, in turn, need to talk a bit about system calls and calling conventions.
System calls and SysV ABI
As long as a program is doing its own thing, it can use whatever registers it wants and do whatever it wants with the stack. But collaboration with the OS and other libraries requires a common set of rules that all parties need to adhere to. Such a set of rules is called an “Application Binary Interface”, or ABI for short.
Modern 64-bit Linux and MacOS X systems follow the System V ABI [2]. The set of rules is very extensive, but for this post we’ll only need the following:
- For regular calls, integer arguments are passed in registers rdi, rsi, rdx, rcx, r8, r9 in the specified order.
- For system calls, register r10 is used instead of rcx, while values in registers rcx and r11 are clobbered by the kernel. The number of the system call to invoke is passed in rax.
- An integer result of the call is returned in registers rax + rdx depending on the size of the returned value.
- Registers rbx, rbp, rsp, r12-r15 are preserved across function calls. In other words, when you want to use these registers in your subroutine, you need to save them on the stack and restore their values before returning to the caller.
- The stack should be 16-byte aligned before the call.
The last rule is not strictly necessary for all calls. You can write and call your own subroutine and most likely things will be fine without stack alignment. However, the situation is different when calling external subroutines, which may in turn call operations that require memory alignment of their operands; we’ll get back to it later.
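To make the register rules concrete, here is a small sketch of a subroutine and a caller that follow them. The names sum3 and caller_example are made up for illustration, and rbx is used as scratch space only to show the save/restore pattern for a preserved register.

```nasm
section .text

; sum3(a, b, c): hypothetical helper, returns a + b + c
;   arguments: rdi = a, rsi = b, rdx = c
;   result:    rax
sum3:
    push rbx            ; rbx must be preserved, so save it before using it
    mov  rbx, rdi       ; rbx = a
    add  rbx, rsi       ; rbx = a + b
    add  rbx, rdx       ; rbx = a + b + c
    mov  rax, rbx       ; integer result is returned in rax
    pop  rbx            ; restore the caller's rbx
    ret

; hypothetical caller, assumed to have been reached via a call instruction,
; so rsp is 8 bytes off a 16-byte boundary on entry
caller_example:
    sub  rsp, 8         ; realign the stack to 16 bytes before the call
    mov  rdi, 1         ; first argument
    mov  rsi, 2         ; second argument
    mov  rdx, 3         ; third argument
    call sum3           ; rax now holds 6
    add  rsp, 8         ; undo the alignment adjustment
    ret
```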
We know how to pass arguments and invoke a system call with a particular number [3]. What is missing is a list of system calls for the OS X kernel.
I couldn’t find such a list in the official documentation. There is a list for 32-bit [4], but unfortunately it doesn’t work out of the box. Luckily someone on the internet wrote that a syscall number for 64-bit assembly is the number from that list plus 0x0200_0000. Thus the exit syscall, which is number 1 in the list, will have number 0x0200_0001 in assembly.
A proper noop in assembly
Now we know how to make system calls, so we can properly trigger an exit and return a meaningful exit code.
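A minimal sketch of such a program, assuming NASM syntax, the linker's default entry symbol start, and an exit status of 0 picked for illustration:

```nasm
global start                ; default entry symbol, so no -e flag is needed

section .text
start:
    mov rax, 0x02000001     ; exit: number 1 from the list plus 0x02000000
    mov rdi, 0              ; first argument: exit status (0 is an assumption here)
    syscall                 ; hand control to the kernel; exit never returns
```

After running the resulting binary, echo $? in the shell should print the status that was placed in rdi.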
This time around we don’t need to specify the name of the entry point to the linker, as we went with the default. The code compiles and links without issues:
and the resulting binary produces the expected result:
Actually, we could omit explicitly providing an exit code and just invoke a system call with whatever is in rdi at the moment. This way it’s possible to make a “do-nothing” program even smaller.
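A sketch of that smaller variant, assuming NASM syntax and the linker's default entry symbol start; it leaves rdi untouched, so the exit status is whatever the register holds when the program starts:

```nasm
global start

section .text
start:
    mov rax, 0x02000001     ; exit syscall number for 64-bit MacOS X
    syscall                 ; exit status is taken from rdi as it was at entry
```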
And although the SysV ABI states in Section 3.4.1 that the state of rdi is undefined, the kernel zeroes it out, so we get a proper 0 exit code:
So far, I’ve covered only statically linked self-contained executables. Although interesting, they are not very practical (to the extent one can consider programming in assembly practical). I plan to write one more post and cover the following topics:
- dynamic linking with C runtime,
- using external symbols from linked libraries,
- 16-byte stack alignment, SSE instructions and segfaults,
- jumps, loops, etc.
But that’s for 2019.
[1] The fact that it’s 64-bit is important, as the calling conventions for 32-bit code on MacOS are quite different.
[2] The latest SysV ABI PDFs can be found on this page on GitHub.
[3] In 64-bit assembly a syscall is triggered with the syscall instruction.
[4] Relevant lines from this source file.