Modifying Machine Code in Executables
Requires gcc, objdump, and xxd.
We have a very simple program written in C that prints “ab” followed by a newline:
#include <stdio.h>
int main() {
    putchar('a');
    putchar('b');
    putchar('\n');
}
Compile it:
gcc -o main main.c
Now, let’s look at the machine code of the compiled executable using objdump:
λ workspace $ objdump -d main
main: file format elf64-x86-64
Disassembly of section .init:
0000000000001000 <_init>:
1000: f3 0f 1e fa endbr64
1004: 48 83 ec 08 sub $0x8,%rsp
1008: 48 8b 05 c1 2f 00 00 mov 0x2fc1(%rip),%rax # 3fd0 <__gmon_start__@Base>
100f: 48 85 c0 test %rax,%rax
1012: 74 02 je 1016 <_init+0x16>
1014: ff d0 call *%rax
1016: 48 83 c4 08 add $0x8,%rsp
101a: c3 ret
Disassembly of section .plt:
0000000000001020 <putchar@plt-0x10>:
1020: ff 35 ca 2f 00 00 push 0x2fca(%rip) # 3ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
1026: ff 25 cc 2f 00 00 jmp *0x2fcc(%rip) # 3ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
102c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000001030 <putchar@plt>:
1030: ff 25 ca 2f 00 00 jmp *0x2fca(%rip) # 4000 <putchar@GLIBC_2.2.5>
1036: 68 00 00 00 00 push $0x0
103b: e9 e0 ff ff ff jmp 1020 <_init+0x20>
Disassembly of section .text:
0000000000001040 <_start>:
1040: f3 0f 1e fa endbr64
1044: 31 ed xor %ebp,%ebp
1046: 49 89 d1 mov %rdx,%r9
1049: 5e pop %rsi
104a: 48 89 e2 mov %rsp,%rdx
104d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1051: 50 push %rax
1052: 54 push %rsp
1053: 45 31 c0 xor %r8d,%r8d
1056: 31 c9 xor %ecx,%ecx
1058: 48 8d 3d da 00 00 00 lea 0xda(%rip),%rdi # 1139 <main>
105f: ff 15 5b 2f 00 00 call *0x2f5b(%rip) # 3fc0 <__libc_start_main@GLIBC_2.34>
1065: f4 hlt
1066: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
106d: 00 00 00
1070: 48 8d 3d a1 2f 00 00 lea 0x2fa1(%rip),%rdi # 4018 <__TMC_END__>
1077: 48 8d 05 9a 2f 00 00 lea 0x2f9a(%rip),%rax # 4018 <__TMC_END__>
107e: 48 39 f8 cmp %rdi,%rax
1081: 74 15 je 1098 <_start+0x58>
1083: 48 8b 05 3e 2f 00 00 mov 0x2f3e(%rip),%rax # 3fc8 <_ITM_deregisterTMCloneTable@Base>
108a: 48 85 c0 test %rax,%rax
108d: 74 09 je 1098 <_start+0x58>
108f: ff e0 jmp *%rax
1091: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
1098: c3 ret
1099: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
10a0: 48 8d 3d 71 2f 00 00 lea 0x2f71(%rip),%rdi # 4018 <__TMC_END__>
10a7: 48 8d 35 6a 2f 00 00 lea 0x2f6a(%rip),%rsi # 4018 <__TMC_END__>
10ae: 48 29 fe sub %rdi,%rsi
10b1: 48 89 f0 mov %rsi,%rax
10b4: 48 c1 ee 3f shr $0x3f,%rsi
10b8: 48 c1 f8 03 sar $0x3,%rax
10bc: 48 01 c6 add %rax,%rsi
10bf: 48 d1 fe sar $1,%rsi
10c2: 74 14 je 10d8 <_start+0x98>
10c4: 48 8b 05 0d 2f 00 00 mov 0x2f0d(%rip),%rax # 3fd8 <_ITM_registerTMCloneTable@Base>
10cb: 48 85 c0 test %rax,%rax
10ce: 74 08 je 10d8 <_start+0x98>
10d0: ff e0 jmp *%rax
10d2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
10d8: c3 ret
10d9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
10e0: f3 0f 1e fa endbr64
10e4: 80 3d 2d 2f 00 00 00 cmpb $0x0,0x2f2d(%rip) # 4018 <__TMC_END__>
10eb: 75 33 jne 1120 <_start+0xe0>
10ed: 55 push %rbp
10ee: 48 83 3d ea 2e 00 00 cmpq $0x0,0x2eea(%rip) # 3fe0 <__cxa_finalize@GLIBC_2.2.5>
10f5: 00
10f6: 48 89 e5 mov %rsp,%rbp
10f9: 74 0d je 1108 <_start+0xc8>
10fb: 48 8b 3d 0e 2f 00 00 mov 0x2f0e(%rip),%rdi # 4010 <__dso_handle>
1102: ff 15 d8 2e 00 00 call *0x2ed8(%rip) # 3fe0 <__cxa_finalize@GLIBC_2.2.5>
1108: e8 63 ff ff ff call 1070 <_start+0x30>
110d: c6 05 04 2f 00 00 01 movb $0x1,0x2f04(%rip) # 4018 <__TMC_END__>
1114: 5d pop %rbp
1115: c3 ret
1116: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
111d: 00 00 00
1120: c3 ret
1121: 0f 1f 40 00 nopl 0x0(%rax)
1125: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
112c: 00 00 00 00
1130: f3 0f 1e fa endbr64
1134: e9 67 ff ff ff jmp 10a0 <_start+0x60>
0000000000001139 <main>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: bf 61 00 00 00 mov $0x61,%edi
1142: e8 e9 fe ff ff call 1030 <putchar@plt>
1147: bf 62 00 00 00 mov $0x62,%edi
114c: e8 df fe ff ff call 1030 <putchar@plt>
1151: bf 0a 00 00 00 mov $0xa,%edi
1156: e8 d5 fe ff ff call 1030 <putchar@plt>
115b: b8 00 00 00 00 mov $0x0,%eax
1160: 5d pop %rbp
1161: c3 ret
Disassembly of section .fini:
0000000000001164 <_fini>:
1164: f3 0f 1e fa endbr64
1168: 48 83 ec 08 sub $0x8,%rsp
116c: 48 83 c4 08 add $0x8,%rsp
1170: c3 ret
That’s a lot of code! The part we’re interested in is the main function starting at address 0x1139. We can focus this a bit by telling objdump to only dump the specific symbol we’re interested in (main). We also pass the -f flag to get some additional information about the file:
λ workspace $ objdump --disassemble=main -f main
main: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001040
Disassembly of section .init:
Disassembly of section .plt:
Disassembly of section .text:
0000000000001139 <main>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: bf 61 00 00 00 mov $0x61,%edi
1142: e8 e9 fe ff ff call 1030 <putchar@plt>
1147: bf 62 00 00 00 mov $0x62,%edi
114c: e8 df fe ff ff call 1030 <putchar@plt>
1151: bf 0a 00 00 00 mov $0xa,%edi
1156: e8 d5 fe ff ff call 1030 <putchar@plt>
115b: b8 00 00 00 00 mov $0x0,%eax
1160: 5d pop %rbp
1161: c3 ret
Disassembly of section .fini:
The call at address 0x1142 is what prints the character ‘a’. The instruction just before it, at 0x113d, puts the value 0x61 (ASCII for ‘a’) into the edi register, which carries the first integer argument to a function under the x86-64 System V calling convention used on Linux. So we first load ‘a’ into edi, then call putchar, which reads edi, sees 0x61, and prints ‘a’.
The next two calls follow the same pattern: at address 0x1147 we load 0x62 (‘b’) into edi, and at address 0x1151 we load 0x0a (newline).
So, if we wanted the program to print “ac” instead of “ab”, we would change the instruction at address 0x1147 to load 0x63 (‘c’) instead of 0x62. That instruction is encoded as bf 62 00 00 00: the one-byte opcode bf followed by the 32-bit value in little-endian order, so the only byte that has to change is the 62 at address 0x1148.
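To make the encoding concrete, here is a minimal sketch in C that decodes the five bytes objdump shows at 0x1147 (bf 62 00 00 00). The helper name decode_mov_edi_imm32 is made up for this illustration. It assumes the instruction is the 5-byte form seen in the disassembly above: opcode bf followed by a 32-bit little-endian immediate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode 0xB8+r is "mov imm32, r32"; register number 7 is edi, giving 0xBF. */
#define MOV_EDI_IMM32 0xBF

/*
 * Hypothetical helper (not part of any library): decode a 5-byte
 * "mov $imm32, %edi" instruction.
 * Returns 0 and stores the immediate on success, -1 if the opcode differs.
 */
int decode_mov_edi_imm32(const uint8_t insn[5], uint32_t *imm)
{
    assert(insn != NULL && imm != NULL);

    if (insn[0] != MOV_EDI_IMM32)
        return -1;

    /* The immediate is stored little-endian: least significant byte first. */
    *imm = (uint32_t)insn[1]
         | ((uint32_t)insn[2] << 8)
         | ((uint32_t)insn[3] << 16)
         | ((uint32_t)insn[4] << 24);
    return 0;
}
```

Passing the bytes bf 62 00 00 00 yields the immediate 0x62 (‘b’). With the second byte changed to 63 it yields 0x63 (‘c’), which is the one-byte edit we are after.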
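To turn the executable into an editable hex dump, we can use xxd (the .asm extension below is only a file name; the file contains a hex dump, not assembly):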
λ workspace $ xxd main > main.asm
λ workspace $ cat main.asm
The dump is rather lengthy, so I'll only print the relevant portion:
.... redacted ....
00001050: f050 5445 31c0 31c9 488d 3dda 0000 00ff .PTE1.1.H.=.....
00001060: 155b 2f00 00f4 662e 0f1f 8400 0000 0000 .[/...f.........
00001070: 488d 3da1 2f00 0048 8d05 9a2f 0000 4839 H.=./..H.../..H9
00001080: f874 1548 8b05 3e2f 0000 4885 c074 09ff .t.H..>/..H..t..
00001090: e00f 1f80 0000 0000 c30f 1f80 0000 0000 ................
000010a0: 488d 3d71 2f00 0048 8d35 6a2f 0000 4829 H.=q/..H.5j/..H)
000010b0: fe48 89f0 48c1 ee3f 48c1 f803 4801 c648 .H..H..?H...H..H
000010c0: d1fe 7414 488b 050d 2f00 0048 85c0 7408 ..t.H.../..H..t.
000010d0: ffe0 660f 1f44 0000 c30f 1f80 0000 0000 ..f..D..........
000010e0: f30f 1efa 803d 2d2f 0000 0075 3355 4883 .....=-/...u3UH.
000010f0: 3dea 2e00 0000 4889 e574 0d48 8b3d 0e2f =.....H..t.H.=./
00001100: 0000 ff15 d82e 0000 e863 ffff ffc6 0504 .........c......
00001110: 2f00 0001 5dc3 662e 0f1f 8400 0000 0000 /...].f.........
00001120: c30f 1f40 0066 662e 0f1f 8400 0000 0000 ...@.ff.........
00001130: f30f 1efa e967 ffff ff55 4889 e5bf 6100 .....g...UH...a.
00001140: 0000 e8e9 feff ffbf 6200 0000 e8df feff ........b....... < HERE
00001150: ffbf 0a00 0000 e8d5 feff ffb8 0000 0000 ................
00001160: 5dc3 0000 f30f 1efa 4883 ec08 4883 c408 ].......H...H...
00001170: c300 0000 0000 0000 0000 0000 0000 0000 ................
00001180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
.... redacted ....
In the above hex dump, each line starts with the file offset of its first byte (e.g., 00001050), followed by 16 bytes in hex, and finally the ASCII representation on the right. To find the instruction at address 0x1147, we need its offset in the file. In this executable the two coincide: the dump shows the first bytes of main (55 48 89 e5) at offset 0x1139, the same address objdump reports. That is a property of how this particular binary is laid out, not a general rule. The instruction at 0x1147 is bf 62 00 00 00, so the opcode bf sits at offset 0x1147 and the byte we want, 62, sits at offset 0x1148. That byte is on the line starting with offset 00001140, at position 0x1148 - 0x1140 = 8 counting from zero, i.e. it is the 9th byte on that line. I have marked the line with < HERE in the above dump. To load 0x63 instead of 0x62, we need to change that byte from 62 to 63.
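The arithmetic for locating a byte in the dump can be sketched in C as follows. The function names are made up for this illustration, and the sketch assumes two things: xxd's default of 16 bytes per line, and that the file offset equals the address objdump prints, which holds for this binary (the dump shows 55 48 89 e5 at 0x1139) but is not guaranteed in general.

```c
#include <assert.h>
#include <stdint.h>

#define XXD_BYTES_PER_LINE 16u  /* xxd's default number of bytes per line */

/* Offset printed at the start of the xxd line that contains file_offset. */
uint64_t xxd_line_offset(uint64_t file_offset)
{
    return file_offset - (file_offset % XXD_BYTES_PER_LINE);
}

/* Zero-based position of the byte within that line (0..15). */
unsigned xxd_byte_index(uint64_t file_offset)
{
    return (unsigned)(file_offset % XXD_BYTES_PER_LINE);
}

/*
 * Offset of the immediate byte of a 5-byte "mov $imm32, %edi" instruction.
 * Assumption: file offset == address shown by objdump, as in this binary.
 */
uint64_t mov_imm_offset(uint64_t insn_address)
{
    assert(insn_address != UINT64_MAX);  /* guard against wrap-around */
    return insn_address + 1;             /* skip the one-byte opcode (bf) */
}
```

For the instruction at 0x1147, mov_imm_offset gives 0x1148, xxd_line_offset gives 0x1140, and xxd_byte_index gives 8, so the byte to edit is the 9th one on the line that starts with 00001140.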
This is the line we’re interested in:
00001140: 0000 e8e9 feff ffbf 6200 0000 e8df feff ........b....... < HERE
And this is what we want to change it to (xxd -r reads only the hex columns, so the ASCII column on the right can be left as it is):
00001140: 0000 e8e9 feff ffbf 6300 0000 e8df feff ........b....... < Notice the 63 (0x63, i.e. 'c')
To do this, we can open the hex dump in a text editor, make the change, and then write it back to a binary file using xxd:
λ workspace $ xxd -r main.asm modified_main
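The file written by xxd -r is a new file without the execute permission, so we set that first and then run the modified executable to see the results:
λ workspace $ chmod +x modified_main
λ workspace $ ./modified_main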
ac
As you can see, the program now prints “ac” instead of “ab”. By modifying the machine code directly, we were able to change the behavior of the program without recompiling the source code.
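As an alternative to the hex dump round trip, the same byte could be overwritten in place. The following is a rough sketch in C, not the method used above: patch_byte is a hypothetical helper, and the offset 0x1148 and the values 0x62 and 0x63 are the ones found in this particular binary. It checks the current byte before writing, so it refuses to touch a file that does not look as expected.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: overwrite one byte at `offset` in the file at `path`,
 * but only if the byte currently stored there equals `expected`.
 * Returns 0 on success, -1 on any error or mismatch.
 */
int patch_byte(const char *path, long offset, unsigned char expected,
               unsigned char replacement)
{
    assert(path != NULL && offset >= 0);

    FILE *f = fopen(path, "r+b");  /* open for update without truncating */
    if (f == NULL)
        return -1;

    int status = -1;
    if (fseek(f, offset, SEEK_SET) == 0) {
        int current = fgetc(f);
        /* Patch only if the byte matches what we expect to find. */
        if (current == (int)expected &&
            fseek(f, offset, SEEK_SET) == 0 &&  /* reposition before writing */
            fputc(replacement, f) != EOF)
            status = 0;
    }

    if (fclose(f) != 0)
        status = -1;
    return status;
}
```

Called as patch_byte("modified_main", 0x1148, 0x62, 0x63) on a copy of the executable, this would make the same one-byte change. Because the file is edited in place rather than recreated, its permissions are left as they were.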