Dynamic Linking
Version History
Date | Description |
---|---|
Mar 19, 2021 | Add a note about LD_PRELOAD |
Jan 01, 2021 | Add kernel module loading part |
Dec 24, 2020 | Adapted from my previous note |
This post looks at parts of the user-space dynamic linker, how the kernel loads user programs, and how the kernel loads kernel modules.
The related code: glibc, kernel execve loader, kernel module loader.
C Start Up (csu)
For code pointers, see the glibc code here.
In glibc:
- In csu/libc-start.c, __libc_start_main() is the entry point (the ELF-level _start stub calls it). Inside, it calls __libc_csu_init() and then the user's main().
- Great reference: Linux x86 Program Start Up. I saved a printed PDF copy in this repo.
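The ordering described above can be observed from user code. The following is a minimal sketch, assuming GCC or Clang (the constructor attribute is a compiler extension); the names early_init, constructor_has_run, and check_startup_order are made up for illustration. A constructor is registered in the init array, which the start-up code walks before it hands control to main().

```c
#include <assert.h>

/* Flipped by the constructor below; main() can inspect it later. */
static int ctor_ran = 0;

/* GCC/Clang extension: the function is registered in the init array,
 * which the start-up code walks before it calls main(). */
__attribute__((constructor))
static void early_init(void)
{
    ctor_ran = 1;
}

/* Hypothetical accessor: 1 if early_init() has already run, else 0. */
int constructor_has_run(void)
{
    return ctor_ran;
}

/* Hypothetical check for the top of main(): aborts if the constructor
 * did not run before main() was entered. */
void check_startup_order(void)
{
    assert(ctor_ran == 1);
}
```

Calling check_startup_order() as the first statement of main() should pass silently, since the constructor has already run by then.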
Dynamic Linking in User Space
The dynamic linker/loader ld.so is part of glibc.
I was particularly interested in how it resolves dynamic symbols at runtime. I took a brief read of the source code and found the relevant parts.
ld.so
ld.so is the dynamic linker/loader:
The programs ld.so and ld-linux.so* find and load the shared objects (shared libraries) needed by a program, prepare the program to run, and then run it.
You can run man ld.so to see more details.
ld.so is a program, after all. It is part of the glibc library.
- ELF's .interp section points to the dynamic linker. During execve(), the kernel jumps to ld.so instead of the user code entry point.
- Related code: elf/rtld.c, sysdeps/generic/, sysdeps/x86_64/, and more.
- Inside dl_main(), you can see how LD_PRELOAD is handled.
- GOT[1] contains the address of the link_map data structure.
- GOT[2] points to _dl_runtime_resolve()! This is the runtime dynamic linker entry point.
File sysdeps/generic/dl-machine.h populates GOT[1] and GOT[2].
_dl_runtime_resolve() is architecture specific and has a mix of assembly and C code.
The flow is similar to syscall handling: it first saves the registers, then calls the actual resolver, then restores all saved registers.
For 64-bit x86, the source code is in sysdeps/x86_64/dl-trampoline.h.
Bingo, _dl_fixup() is the final piece of the runtime dynamic linker resolver. We can find it in elf/dl-runtime.c, a file for on-demand PLT fixup.
Understanding this piece of code requires some effort. Happy hacking!
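To make the lazy-binding idea concrete, here is a toy model in plain C. It is not glibc code: toy_got, toy_fixup, toy_plt_call, and the two "library" functions are invented for this sketch. In a real binary an unresolved GOT slot points back into the PLT, which pushes a relocation index and jumps through GOT[2]; the toy uses a NULL slot to mean "not resolved yet". What it does keep is the essential behavior: the first call through a slot pays for a symbol lookup and patches the slot, and later calls go straight to the target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_t)(int);

/* The "library" functions the program wants to call. */
static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* Toy symbol table standing in for the library's exported symbols. */
struct toy_symbol { const char *name; func_t addr; };
static const struct toy_symbol toy_symtab[] = {
    { "twice", twice },
    { "square", square },
};

/* Toy relocation table: GOT slot i belongs to import_names[i]. */
static const char *const import_names[] = { "twice", "square" };
#define NSLOTS (sizeof(import_names) / sizeof(import_names[0]))

static func_t toy_got[NSLOTS];  /* NULL means "not resolved yet" */
static unsigned resolve_count;  /* how many times the slow path ran */

/* Plays the role of _dl_fixup(): look the symbol up by name,
 * patch the GOT slot, and return the resolved address. */
static func_t toy_fixup(size_t slot)
{
    assert(slot < NSLOTS);
    for (size_t i = 0; i < sizeof(toy_symtab) / sizeof(toy_symtab[0]); i++) {
        if (strcmp(toy_symtab[i].name, import_names[slot]) == 0) {
            toy_got[slot] = toy_symtab[i].addr;
            resolve_count++;
            return toy_got[slot];
        }
    }
    return NULL;  /* symbol not found */
}

/* Plays the role of a PLT entry: jump through the GOT slot,
 * falling back to the resolver on the first call only. */
int toy_plt_call(size_t slot, int arg)
{
    assert(slot < NSLOTS);
    func_t target = toy_got[slot];
    if (target == NULL)
        target = toy_fixup(slot);
    assert(target != NULL);
    return target(arg);
}

/* Accessor so callers can observe how often resolution happened. */
unsigned toy_resolve_count(void)
{
    return resolve_count;
}
```

Calling toy_plt_call(0, 21) twice resolves "twice" only once; the second call takes the fast path through the already patched slot.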
Fun fact about LD_PRELOAD
If you use LD_PRELOAD to run a program, it also affects popen(), because the child process inherits the environment variables and the library gets preloaded into it as well.
Hence, if you do some one-time initialization within your LD_PRELOAD library via, say, a constructor-marked function, you should call unsetenv("LD_PRELOAD") before the popen() call.
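A minimal sketch of that advice, assuming GCC or Clang for the constructor attribute. The function name preload_init and the placeholder for the one-time work are hypothetical; only unsetenv() and the LD_PRELOAD variable come from the text above.

```c
#define _POSIX_C_SOURCE 200809L  /* for unsetenv() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical one-time setup of a preloaded library. The constructor
 * attribute (a GCC/Clang extension) runs it when the library is loaded. */
__attribute__((constructor))
void preload_init(void)
{
    /* ... one-time initialization would go here ... */

    /* Remove LD_PRELOAD from our own environment so that children created
     * later (for example by popen()) do not load this library again. */
    int rc = unsetenv("LD_PRELOAD");

    /* unsetenv() only fails for an invalid name, which would be a bug here. */
    assert(rc == 0);
    (void)rc;  /* keep NDEBUG builds free of unused-variable warnings */
}
```

Built as a shared object (for example with gcc -shared -fPIC) and loaded through LD_PRELOAD, the constructor runs once in the parent, and later popen() children start without the variable set.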
Understanding
The most recent ELF files produced by GCC are slightly different from the ones described in older textbooks or papers.
The difference is small, though. You should use man elf to check the latest format.
- When a program imports a certain function or variable, the linker will include a string with the function or variable's name in the .dynstr section.
- It also adds a symbol (Elf_Sym) that refers to that name in the .dynsym section, and a relocation (Elf_Rel, or Elf_Rela on x86_64) pointing to that symbol in the .rela.plt section.
- .rela.dyn and .rela.plt are for imported variables and functions, respectively.
- .plt is the usual procedure linkage table; it contains instructions.
- .got and .got.plt: I believe the first is for variables and the latter is for functions, but both provide essentially the same global offset table functionality.
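The chain from a relocation to a symbol to a name can be written down in a few lines. This is a sketch using the types and macros from the Linux elf.h header; the helper name rela_symbol_name is made up, and the caller is assumed to have already located the three tables in memory.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Follow one .rela.plt (or .rela.dyn) entry to the name it imports:
 * relocation -> symbol index -> .dynsym entry -> offset into .dynstr. */
const char *rela_symbol_name(const Elf64_Rela *rela,
                             const Elf64_Sym *dynsym,
                             const char *dynstr)
{
    assert(rela != NULL && dynsym != NULL && dynstr != NULL);

    /* The upper 32 bits of r_info hold the index into .dynsym. */
    size_t sym_index = ELF64_R_SYM(rela->r_info);
    const Elf64_Sym *sym = &dynsym[sym_index];

    /* st_name is a byte offset into the string table, not a pointer. */
    return dynstr + sym->st_name;
}
```

The lower 32 bits of r_info hold the relocation type (ELF64_R_TYPE), which tells the linker how to patch the slot once the symbol's address is known.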
Figure: Relationship among .dynstr, .dynsym, and .rela.dyn or .rela.plt. Credit: link.
Figure: PIC lazy binding. Credit: link.
Note
GOT and PLT were invented for shared libraries, so that those libraries can be used by arbitrary processes without changing any of the library text.
However, nowadays even a non-PIC binary always has GOT and PLT sections. In theory, it could use basic load-time relocation to resolve dynamic symbols (see CSAPP chapter 7 if you are not familiar with this).
I think GOT/PLT is used over load-time relocation for two reasons: a) load-time relocation needs to modify code, which is not good, especially considering that the code section is probably not writable; b) GOT/PLT's lazy binding is a performance win at start-up time. However, keep in mind that lazy binding pays an extra cost at runtime!
How Kernel Loads User Program
The kernel loads a user program via exec() or one of its variants.
This post explained the flow in great detail.
Note that the kernel recognizes a dynamically linked binary via the .interp section (described to the loader by the PT_INTERP program header) and then invokes the dynamic linker ld.so instead of jumping to the user ELF binary directly.
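The lookup itself is simple enough to sketch in user space. The function below is not kernel code; find_interp is a hypothetical helper that scans the program headers of an in-memory ELF64 image for PT_INTERP, roughly the check the kernel's ELF loader performs before deciding to map the interpreter. It assumes a 64-bit ELF in host byte order.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the interpreter path stored in an in-memory ELF64
 * image, or NULL if the image is malformed or has no PT_INTERP segment
 * (for example, a statically linked binary). */
const char *find_interp(const unsigned char *image, size_t len)
{
    assert(image != NULL);

    Elf64_Ehdr eh;
    if (len < sizeof(eh))
        return NULL;
    memcpy(&eh, image, sizeof(eh));

    /* Only accept 64-bit ELF with the program header size we expect. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = (size_t)eh.e_phoff + i * sizeof(ph);
        if (off > len || len - off < sizeof(ph))
            return NULL;  /* program header table runs past the image */
        memcpy(&ph, image + off, sizeof(ph));

        if (ph.p_type != PT_INTERP)
            continue;

        /* The segment must lie inside the image and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;
        const char *path = (const char *)image + ph.p_offset;
        if (path[ph.p_filesz - 1] != '\0')
            return NULL;
        return path;
    }
    return NULL;
}
```

On a typical x86_64 Linux system, running this over a dynamically linked executable would be expected to yield a path such as /lib64/ld-linux-x86-64.so.2, which is the ld.so the kernel starts first.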
How Kernel Loads Kernel Module
The kernel can load modules at runtime. Those modules are ELF binaries. Let's first examine those binaries and see how the kernel parses them.
Suppose we have a simple kernel module written in C.
Once you compile it into a kernel module, you can examine the binary by using objdump -dx hello.ko.
The highlighted lines in the dump mark some of the dynamic linking slots.
They will be patched by basic load-time relocation.
It is also worth checking out the .symtab section.
If your module uses kernel functions or variables, the compiler does not know their precise addresses at compile time.
The compiler will add several entries into .symtab with properties marked as GLOBAL, UND.
For example, you can run readelf -s hello.ko to check that.
The kernel's simplify_symbols() finds the kernel virtual addresses for the UNDEF symbols in the .symtab section.
Those addresses are then used to patch the relocation entries.
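The resolution step can be mimicked in user space. The sketch below is a toy, not the kernel's simplify_symbols(): struct toy_ksym and toy_simplify_symbols are invented names, the exported-symbol table is a plain array instead of the kernel's symbol lookup, and special cases such as common or absolute symbols are ignored. It only shows the core idea: every symbol whose section index is SHN_UNDEF gets its st_value filled in from a table of known addresses.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the kernel's table of exported symbols. */
struct toy_ksym {
    const char *name;
    Elf64_Addr addr;
};

/* Fill in st_value for every undefined symbol that the toy table knows.
 * Returns the number of undefined symbols that could not be resolved. */
int toy_simplify_symbols(Elf64_Sym *symtab, size_t nsyms, const char *strtab,
                         const struct toy_ksym *ksyms, size_t nksyms)
{
    assert(symtab != NULL && strtab != NULL && ksyms != NULL);

    int unresolved = 0;
    /* Entry 0 of an ELF symbol table is the reserved null symbol. */
    for (size_t i = 1; i < nsyms; i++) {
        Elf64_Sym *sym = &symtab[i];
        if (sym->st_shndx != SHN_UNDEF)
            continue;  /* defined inside the module itself */

        const char *name = strtab + sym->st_name;
        size_t k;
        for (k = 0; k < nksyms; k++) {
            if (strcmp(ksyms[k].name, name) == 0) {
                sym->st_value = ksyms[k].addr;  /* now an absolute address */
                break;
            }
        }
        if (k == nksyms)
            unresolved++;
    }
    return unresolved;
}
```

A non-zero return value corresponds to the situation where loading a real module fails because it references a symbol the kernel does not export.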
Now let us dive into the kernel implementation.
The kernel has several system calls for modules.
The loading part uses SYSCALL_DEFINE3(init_module).
Within that, it calls the big function load_module().
At the beginning of load_module(), there are some usual tasks: examining ELF headers, allocating memory, etc.
After that, the kernel tries to find the addresses for UNDEF symbols.
After resolving symbols to the real kernel virtual addresses, the next step is to patch the code to update all the relocation entries.
It will do so for sections of these two types: SHT_REL and SHT_RELA.
It looks like x86_64 only uses apply_relocate_add(), the one with explicit addends.
Zoom into apply_relocate_add(), a very interesting function.
It is similar to the userspace linker ld.so.
It can be summarized as follows:
- Find the start of the relocation entry section in the ELF.
- Walk through the relocation entries. For each entry:
  - Get the location where we need to patch the code (e.g., the assembly instructions dumped above).
  - Find the symbol the entry is using. The symbol was already resolved to a kernel virtual address.
  - Update the location by applying a certain computation on top of the resolved symbol address. The computation is dictated by the entry type (e.g., R_X86_64_PLT32).
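The steps above can be sketched as a small user-space function. This is a toy, not the kernel's apply_relocate_add(): the name toy_apply_relocate_add and its parameters are made up, the section is an ordinary byte buffer with a pretend load address (section_addr), and only three relocation types are handled. The arithmetic follows the x86_64 ABI: R_X86_64_64 stores S + A, while R_X86_64_PC32 and R_X86_64_PLT32 store S + A - P as a signed 32-bit value, where S is the symbol address, A the addend, and P the address being patched.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Patch `section` (assumed to be loaded at `section_addr`) according to
 * the given RELA entries. Symbols must already hold resolved addresses.
 * Returns 0 on success, -1 on an unsupported or out-of-range entry. */
int toy_apply_relocate_add(uint8_t *section, size_t section_len,
                           uint64_t section_addr,
                           const Elf64_Rela *rela, size_t nrela,
                           const Elf64_Sym *symtab, size_t nsyms)
{
    assert(section != NULL && rela != NULL && symtab != NULL);

    for (size_t i = 0; i < nrela; i++) {
        /* Where to patch: offset inside the section, and its address. */
        uint64_t off = rela[i].r_offset;
        uint64_t loc_addr = section_addr + off;

        /* Which symbol: the index lives in the upper half of r_info. */
        size_t symidx = ELF64_R_SYM(rela[i].r_info);
        if (symidx >= nsyms)
            return -1;

        /* S + A: resolved symbol address plus the explicit addend. */
        uint64_t val = symtab[symidx].st_value + (uint64_t)rela[i].r_addend;

        switch (ELF64_R_TYPE(rela[i].r_info)) {
        case R_X86_64_64:  /* absolute 64-bit address */
            if (off > section_len || section_len - off < 8)
                return -1;
            memcpy(section + off, &val, 8);
            break;
        case R_X86_64_PC32:
        case R_X86_64_PLT32: {  /* 32-bit PC-relative displacement */
            if (off > section_len || section_len - off < 4)
                return -1;
            int64_t rel = (int64_t)(val - loc_addr);
            if (rel < INT32_MIN || rel > INT32_MAX)
                return -1;  /* target too far for a 32-bit displacement */
            int32_t rel32 = (int32_t)rel;
            memcpy(section + off, &rel32, 4);
            break;
        }
        default:
            return -1;  /* relocation type not handled by this sketch */
        }
    }
    return 0;
}
```

For a call instruction, the addend is typically -4, because the CPU computes the target relative to the end of the 4-byte displacement field rather than its start.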
See the full kernel code here.
Summary
There you have it. We walked through how the kernel loads user programs, how the kernel loads kernel modules, and how the dynamic linker resolves dynamic symbols. The kernel and ld.so share a lot of similarities in dealing with the linking process.
We have not covered static linking in this post, but its process is similar to how basic load-time relocation patches instructions.
The essence of linking and loading is to do lazy information binding and pass information along the toolchain. The whole concept involves many parties: the compiler, the linker, and the kernel. Each takes its own part in the process.
As always, hope you enjoyed this blog. Happy Hacking!