Kernel Analysis - Week 7
劉文學 + Original work; please credit the source when reposting: http://blog.csdn.net/wdxz6547/article/details/51112486 + 《Linux內核分析》 (Linux Kernel Analysis) MOOC course: http://mooc.study.163.com/course/USTC-1000029000
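The questions this article sets out to answer:
Core questions
- How does a source file (.c, .cpp, .java, .go) become a binary file?
- How is a binary file loaded and run?
Supporting questions
- What does a binary file's format look like? Does the format differ from language to language? The focus here is the ELF format.
- The difference between static linking and dynamic linking
- The mapping between an executable file and the process address space
How a source file (.c, .cpp, .java, .go) becomes a binary file
C file -> preprocess -> compile to assembly code (.s) -> assemble to object code (.o) -> link into an executable
- Preprocessing: pull in the #include'd files and expand macro definitions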
gcc -E -o hello.cpp hello.c
- Compilation: translate the preprocessed source into assembly code
gcc -x cpp-output -S -o hello.s hello.cpp
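- Assembly: produce a binary file (everything before this step was human-readable text; this step
emits a binary object file that contains machine instructions but is not yet executable)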
gcc -x assembler -c hello.s -o hello.o
- Linking (produces an ELF executable)
gcc -o hello hello.o                  # dynamically linked by default
gcc -o hello.static hello.o -static   # statically linked
$ readelf -h hello.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 320 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 13
Section header string table index: 10
$ readelf -h hello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400440
Start of program headers: 64 (bytes into file)
Start of section headers: 4504 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 30
Section header string table index: 27
$ readelf -h hello.static
ELF Header:
Magic: 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - GNU
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x400f4e
Start of program headers: 64 (bytes into file)
Start of section headers: 789968 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 6
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 28
$ ldd hello
linux-vdso.so.1 => (0x00007fff06ffe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc6c2d40000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc6c3125000)
Executable file formats
See the linked reference for details.
A.out --> COFF --> PE (Windows)
--> ELF (Linux)
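Relationship between the ABI and the object file format: an object file is often also called an ABI file, because it is already in a binary-compatible format (that is, the binary already contains machine instructions for one specific CPU architecture).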
ELF
Object files take part in both program linking (building a program) and program execution (running a program).
Linking View                       Execution View
============                       ==============
ELF header                         ELF header
Program header table (optional)    Program header table
Section 1                          Segment 1
...                                Segment 2
Section n                          ...
Section header table               Section header table (optional)
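The ELF header sits at the start of the file and holds a road map that describes how the file is organized.
The program header table tells the system how to create a process's memory image.
The section header table describes the file's sections. Every section has an entry in this table,
and each entry gives information such as the section's name and size.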
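Mapping between an executable file and the process address space
When the system creates or augments a process image, it logically copies a segment of the file into a virtual memory segment.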
File Offset   File                    Virtual Address
===========   ====                    ===============
0             ELF header
              Program header table
              Other information
0x100         Text segment            0x8048100
              ...
              0x2be00 bytes           0x8073eff   // 0x8048100 + 0x2be00 - 1
0x2bf00       Data segment            0x8074f00
              ...
              0x4e00 bytes            0x8079cff   // 0x8074f00 + 0x4e00 - 1
0x30d00       Other information
              ...
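Relationship between a statically linked ELF executable and the process address space
Static linking generally places all of the code in a single code segment.
A dynamically linked process has several code segments (one for the executable itself and one for each shared library it maps).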
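How a binary file is loaded and run
Drawing on the earlier chapters, we can guess the basic approach to running a binary file:
Start a new process whose main job is to load and run the executable, so the work splits into loading and running. When the code reaches the point where the executable must be loaded, it calls the execve system call. That call should load the executable's contents into memory and reset the stack along with key registers such as sp and ip, and then run the code designated by the executable. Register manipulation is necessarily involved here.
We use bash as the example to explain how a program gets run (other shells are similar).
- The shell passes the command-line arguments and environment variables to Bash's main function, which parses the command line and hands the result to the execve system call.
First, we type a command in bash:
$ ./hello
Because bash is itself a C program, it must have a main function. How the shell gets from there to execve is omitted here.
If you want to see how the program you run goes through execve, trace it with strace. The prototype of execve is:
int execve(const char *filename, char *const argv[], char *const envp[]);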
$ strace ./hello
execve("./hello", ["./hello"], [/* 78 vars */]) = 0
brk(0) = 0xacd000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182cc000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=122541, ...}) = 0
mmap(NULL, 122541, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f08182ae000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\37\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0817ce7000
mprotect(0x7f0817ea2000, 2093056, PROT_NONE) = 0
mmap(0x7f08180a1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f08180a1000
mmap(0x7f08180a7000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f08180a7000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182ad000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182ab000
arch_prctl(ARCH_SET_FS, 0x7f08182ab740) = 0
mprotect(0x7f08180a1000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ) = 0
mprotect(0x7f08182ce000, 4096, PROT_READ) = 0
munmap(0x7f08182ae000, 122541) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 10), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f08182cb000
write(1, "hello kernel", 12hello kernel) = 12
exit_group(0) = ?
+++ exited with 0 +++
In the end, main invokes the execve system call to carry out the command.
http://code.woboq.org/linux/linux/fs/exec.c.html#1628
SYSCALL_DEFINE3(execve,
const char __user *, filename,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
{
return do_execve(getname(filename), argv, envp);
}
http://code.woboq.org/linux/linux/fs/exec.c.html#do_execve
int do_execve(struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp)
{
struct user_arg_ptr argv = { .ptr.native = __argv }; // wrap the user-space argument and environment pointers (copied later by copy_strings)
struct user_arg_ptr envp = { .ptr.native = __envp };
return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
}
/*
* sys_execve() executes a new program.
*/
static int do_execveat_common(int fd, struct filename *filename,
struct user_arg_ptr argv,
struct user_arg_ptr envp,
int flags)
{
file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
sched_exec();
...
retval = copy_strings(bprm->envc, envp, bprm);
if (retval < 0)
goto out;
retval = copy_strings(bprm->argc, argv, bprm);
if (retval < 0)
goto out;
retval = exec_binprm(bprm);
if (retval < 0)
goto out;
...
}
static int exec_binprm(struct linux_binprm *bprm)
{
pid_t old_pid, old_vpid;
int ret;
/* Need to fetch pid before load_binary changes it */
old_pid = current->pid;
rcu_read_lock();
old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
rcu_read_unlock();
ret = search_binary_handler(bprm);
if (ret >= 0) {
audit_bprm(bprm);
trace_sched_process_exec(current, old_pid, bprm);
ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
proc_exec_connector(current);
}
return ret;
}
int search_binary_handler(struct linux_binprm *bprm) {
...
list_for_each_entry(fmt, &formats, lh) {
if (!try_module_get(fmt->module))
continue;
read_unlock(&binfmt_lock);
bprm->recursion_depth++;
retval = fmt->load_binary(bprm);
read_lock(&binfmt_lock);
put_binfmt(fmt);
bprm->recursion_depth--;
if (retval < 0 && !bprm->mm) {
/* we got to flush_old_exec() and failed after it */
read_unlock(&binfmt_lock);
force_sigsegv(SIGSEGV, current);
return retval;
}
if (retval != -ENOEXEC || !bprm->file) {
read_unlock(&binfmt_lock);
return retval;
}
}
...
}
http://code.woboq.org/linux/linux/include/linux/binfmts.h.html#linux_binfmt
/*
* This structure defines the functions that are used to load the binary formats that
* linux accepts.
*/
struct linux_binfmt {
struct list_head lh;
struct module *module;
int (*load_binary)(struct linux_binprm *);
int (*load_shlib)(struct file *);
int (*core_dump)(struct coredump_params *cprm);
unsigned long min_coredump; /* minimal dump size */
};
http://code.woboq.org/linux/linux/fs/binfmt_elf.c.html#1089
static struct linux_binfmt elf_format = {
.module = THIS_MODULE,
.load_binary = load_elf_binary,
.load_shlib = load_elf_library,
.core_dump = elf_core_dump,
.min_coredump = ELF_EXEC_PAGESIZE,
};
static int load_elf_binary(struct linux_binprm *bprm)
{
...
start_thread(regs, elf_entry, bprm->p);
retval = 0;
...
}
void start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
{
start_thread_common(regs, new_ip, new_sp,
__USER_CS, __USER_DS, 0);
}
http://code.woboq.org/linux/linux/arch/x86/kernel/process_64.c.html#start_thread_common
static void
start_thread_common(struct pt_regs *regs, unsigned long new_ip,
unsigned long new_sp,
unsigned int _cs, unsigned int _ss, unsigned int _ds)
{
loadsegment(fs, 0);
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
regs->ip = new_ip;
regs->sp = new_sp;
regs->cs = _cs;
regs->ss = _ss;
regs->flags = X86_EFLAGS_IF;
force_iret();
}
Binary formats currently supported by Linux
binfmt_script - support for interpreted scripts that start with a #! line;
static struct linux_binfmt script_format = {
.module = THIS_MODULE,
.load_binary = load_script,
};
binfmt_misc - support different binary formats, according to runtime configuration of the Linux kernel;
binfmt_misc detects binaries via a magic or filename extension and invokes a specified wrapper. This
should obsolete binfmt_java, binfmt_em86 and binfmt_mz.
static struct linux_binfmt misc_format = {
.module = THIS_MODULE,
.load_binary = load_misc_binary,
};
binfmt_elf - support elf format;
binfmt_aout - support a.out format;
binfmt_flat - support for flat format;
binfmt_elf_fdpic - Support for elf FDPIC binaries;
som_format - support for the SOM format used by HP-UX;
static struct linux_binfmt som_format = {
.module = THIS_MODULE,
.load_binary = load_som_binary,
.load_shlib = load_som_library,
.core_dump = som_core_dump,
.min_coredump = SOM_PAGESIZE
};
flat_format - support for the flat format;
static struct linux_binfmt flat_format = {
.module = THIS_MODULE,
.load_binary = load_flat_binary,
.core_dump = flat_core_dump,
.min_coredump = PAGE_SIZE
};
binfmt_em86 - support for Intel elf binaries running on Alpha machines.
static struct linux_binfmt em86_format = {
.module = THIS_MODULE,
.load_binary = load_em86,
};
elf_fdpic_format - support for ELF FDPIC binaries;
static struct linux_binfmt elf_fdpic_format = {
.module = THIS_MODULE,
.load_binary = load_elf_fdpic_binary,
#ifdef CONFIG_ELF_CORE
.core_dump = elf_fdpic_core_dump,
#endif
.min_coredump = ELF_EXEC_PAGESIZE,
};
Each format is registered through register_binfmt.
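execve -> do_execve -> do_execveat_common -> exec_binprm -> search_binary_handler
-> load_elf_binary -> start_thread -> start_thread_common
Here start_thread_common sets the new program's starting point by overwriting the saved user-mode instruction pointer (regs->ip; RIP on x86-64, EIP on 32-bit x86) with the new entry address (elf_entry), so the return to user mode resumes there.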
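Mapping between an executable file and the process address space
For ELF files, see the load_elf_library function (and load_elf_binary, shown above, for the executable itself).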
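Static linking versus dynamic linking (shared libraries)
Linking is the process of collecting the various pieces of code and data and combining them into a single file
that can be loaded (or copied) into memory and run.
Today linking is performed automatically by a program called the linker.
Linking can happen at three points: 1. at compile time, which is what we usually call static linking; 2. at load time; 3. at run time. Load-time and run-time linking together are called dynamic linking.
Static linking
The static linker takes a set of relocatable object files and command-line arguments as input and produces a fully linked executable object file, ready to be loaded and run, as output.
The linker has two tasks:
a) Symbol resolution: object files define and reference symbols, and the linker associates each symbol reference with exactly one symbol definition.
b) Relocation: the compiler and assembler generate code and data sections that start at address 0. Once linking is done, the virtual address of every segment in the executable is fixed, and the linker rewrites all references to these symbols accordingly, thereby relocating the sections.
Object files
a) Relocatable object file: contains binary code and data (named like name.o).
b) Executable object file: contains binary code and data, and can be copied into memory and run (named like name.out).
c) Shared object file: a special kind of relocatable object file that can be loaded into memory and linked dynamically, either at load time or at run time.
Is the kernel responsible for loading the shared libraries that an executable depends on?
No. That is done in user space by the dynamic linker (ld.so; /lib64/ld-linux-x86-64.so.2 in the ldd output above).
Dynamic linking
Dynamic linking comes in two forms: linking when the executable is loaded, and linking while it runs. The following session demonstrates both.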
$ ls
dllibexample.c dllibexample.h main.c shlibexample.c shlibexample.h
$ gcc -fPIC -shared shlibexample.c -o libshlibexample.so
$ gcc -fPIC -shared dllibexample.c -o libdllibexample.so
$ gcc main.c -o main -L . -lshlibexample -ldl
$ ./main
This is a Main program!
Calling SharedLibApi() function of libshlibexample.so!
This is a shared libary!
Calling DynamicalLoadingLibApi() function of libdllibexample.so!
This is a Dynamical Loading libary!
Debugging
Tracing a statically linked program
qemu-system-x86_64 -kernel ../linux-3.18.6/arch/x86/boot/bzImage -initrd ../rootfs.img -S -s
(gdb) file ../linux-3.18.6/vmlinux
Reading symbols from ../linux-3.18.6/vmlinux...done.
(gdb) remote target:1234
Undefined remote command: "target:1234". Try "help remote".
(gdb) target remote:1234
Remote debugging using :1234
0x0000000000000000 in irq_stack_union ()
(gdb) b sys_execve
Breakpoint 1 at 0xffffffff811626f0: file fs/exec.c, line 1604.
(gdb) b load_elf_binary
Breakpoint 2 at 0xffffffff811aa260: load_elf_binary. (2 locations)
(gdb) b start_thread
Breakpoint 3 at 0xffffffff810013b0: file arch/x86/kernel/process_64.c, line 249.
Tracing a dynamically linked program
TODO
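Summary:
Loading an executable is done through a system call.
When an executable is run, the execve system call traps into kernel mode. The kernel then loads the executable file and overwrites the program image of the current process. When execve returns, control goes to the new program rather than back to the old one.
The new program keeps the same PID and inherits every file descriptor that was open when execve was called (except those marked close-on-exec).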