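I simplified this down to a short piece of code, which I expected to be an infinite loop:
#include <stddef.h>
int main(int argc, char* argv[]) {
    for(int i = 0; i < argc; i++) {
        main(argc, NULL);
    }
}
(Whether or not argv is passed doesn't matter. The compiler tends to optimize it away either way.)
However, under both clang 9.0.1 and gcc 9.2.0, the code above fails with an address boundary error.
Looking at the asm (which I dumped here), I still don't see anything that would make this go wrong.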
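Posted on 2020-03-29 01:38:56
If you are using gdb, it is easy to miss the fact that every call uses a new stack frame. By default, gdb shows only a single stack frame for main, no matter how many recursive calls have been made: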
$ cat recursive_main.c
#include <stddef.h>
int main(int argc, char* argv[]) {
    for(int i = 0; i < argc; i++) {
        main(argc, NULL);
    }
}
$ clang-9 -o recursive_main -Wall -g recursive_main.c
$ ./recursive_main
Segmentation fault (core dumped)
$ gdb -q ./recursive_main
Reading symbols from ./recursive_main...done.
(gdb) break main
Breakpoint 1 at 0x4004b6: file recursive_main.c, line 4.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>bt
>end
(gdb) r
Starting program: /home/rici/src/tmp/recursive_main
Breakpoint 1, main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:4
(gdb) c
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x0) at recursive_main.c:4
(gdb)
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x0) at recursive_main.c:4
(gdb)
Continuing.
But if we print the stack pointer on each entry, we can see that it is decremented every time:
$ gdb -q ./recursive_main
Reading symbols from ./recursive_main...done.
(gdb) break main
Breakpoint 1 at 0x4004b6: file recursive_main.c, line 4.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>info r esp
>end
(gdb) r
Starting program: /home/rici/src/tmp/recursive_main
Breakpoint 1, main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
esp 0xffffddc0 -8768
(gdb) c
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
esp 0xffffdd90 -8816
(gdb)
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
esp 0xffffdd60 -8864
(gdb)
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
esp 0xffffdd30 -8912
(gdb)
So each recursive call extends the stack by 0x30 (48) bytes.
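The same growth can be observed without a debugger. This is a minimal sketch, assuming a bounded recursion so that it terminates; `record_frames`, `frame_stride`, and `MAX_DEPTH` are hypothetical names, and the measured stride depends on the compiler, flags, and platform, so it need not equal 48.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8

/* Address of one local variable per recursion level. */
static uintptr_t frame_addr[MAX_DEPTH];

/* Recurse MAX_DEPTH levels, recording where each level's local lives. */
static void record_frames(int depth) {
    volatile char marker = 0;
    if (depth >= MAX_DEPTH) return;
    frame_addr[depth] = (uintptr_t)&marker;
    record_frames(depth + 1);
    marker = 1; /* the local is still in use after the call returns */
}

/* Absolute distance in bytes between the locals at levels d and d + 1. */
static size_t frame_stride(int d) {
    uintptr_t a = frame_addr[d];
    uintptr_t b = frame_addr[d + 1];
    return (size_t)(a > b ? a - b : b - a);
}
```

Calling `record_frames(0)` and then printing `frame_stride(0)` gives the per-call stack cost for that particular build; a nonzero value confirms that each level of recursion gets its own frame.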
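The reason gdb shows only one frame is that it deliberately ends the backtrace when it reaches main. It does this because the executable's real entry point is not main but some platform-dependent code that sets everything up so that main can be called, and then calls main. So gdb doesn't really know where the stack "starts". Or rather, it knows where the executable's stack starts, but not where the program's stack starts. Including functions from the executable's setup code in every backtrace would be a bit confusing, so by default gdb stops walking the stack when it reaches a frame whose entry point is main. You can control this, if you know about the option: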
(gdb) help set backtrace past-main
Set whether backtraces should continue past "main".
Normally the caller of "main" is not of interest, so GDB will terminate
the backtrace at "main". Set this variable if you need to see the rest
of the stack trace.
With that option set, you can see the individual stack frames corresponding to the recursive calls to main:
$ gdb -q ./recursive_main
Reading symbols from ./recursive_main...done.
(gdb) set backtrace past-main 1
(gdb) break main
Breakpoint 1 at 0x4004b6: file recursive_main.c, line 4.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>bt
>end
(gdb) r
Starting program: /home/rici/src/tmp/recursive_main
Breakpoint 1, main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:4
#1 0x00007ffff7a05b97 in __libc_start_main (main=0x4004a0 <main>, argc=1, argv=0x7fffffffdec8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdeb8) at ../csu/libc-start.c:310
#2 0x00000000004003da in _start ()
(gdb) c
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x0) at recursive_main.c:4
#1 0x00000000004004d5 in main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:5
#2 0x00007ffff7a05b97 in __libc_start_main (main=0x4004a0 <main>, argc=1, argv=0x7fffffffdec8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdeb8) at ../csu/libc-start.c:310
#3 0x00000000004003da in _start ()
(gdb)
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x0) at recursive_main.c:4
#1 0x00000000004004d5 in main (argc=1, argv=0x0) at recursive_main.c:5
#2 0x00000000004004d5 in main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:5
#3 0x00007ffff7a05b97 in __libc_start_main (main=0x4004a0 <main>, argc=1, argv=0x7fffffffdec8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdeb8) at ../csu/libc-start.c:310
#4 0x00000000004003da in _start ()
(gdb)
Continuing.
Breakpoint 1, main (argc=1, argv=0x0) at recursive_main.c:4
4 for(int i = 0; i < argc; i++) {
#0 main (argc=1, argv=0x0) at recursive_main.c:4
#1 0x00000000004004d5 in main (argc=1, argv=0x0) at recursive_main.c:5
#2 0x00000000004004d5 in main (argc=1, argv=0x0) at recursive_main.c:5
#3 0x00000000004004d5 in main (argc=1, argv=0x7fffffffdec8) at recursive_main.c:5
#4 0x00007ffff7a05b97 in __libc_start_main (main=0x4004a0 <main>, argc=1, argv=0x7fffffffdec8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdeb8) at ../csu/libc-start.c:310
#5 0x00000000004003da in _start ()
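(gdb)
But while it may be nice to know about this gdb option (I didn't know about it until 15 minutes ago), it isn't really necessary. You can see the code that creates the stack frame at offset 1120 in the linked disassembly, although it is easier to see in the -S output (or by using the handy service at http://gcc.godbolt):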
0000000000001120 <main>:
1120: 55 push %rbp
1121: 48 89 e5 mov %rsp,%rbp
1124: 48 83 ec 20 sub $0x20,%rsp
1128: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
112f: 89 7d f8 mov %edi,-0x8(%rbp)
1132: 48 89 75 f0 mov %rsi,-0x10(%rbp)
1136: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
113d: 8b 45 ec mov -0x14(%rbp),%eax
1140: 3b 45 f8 cmp -0x8(%rbp),%eax
1143: 0f 8d 1a 00 00 00 jge 1163
1149: 31 c0 xor %eax,%eax
114b: 89 c6 mov %eax,%esi
114d: 8b 7d f8 mov -0x8(%rbp),%edi
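1150: e8 cb ff ff ff callq 1120 <main>
As you can see, on entry to main (at offset 1120, as shown), %rbp is first pushed onto the stack, which decrements %rsp by 8 (in 64-bit mode). The stack pointer is then decremented by a further 0x20 (32) to make room for saving the registers that will be used (including the registers used to pass arguments to the called function, and the one holding the value of i). Finally (after a bit of work), the callq instruction at offset 1150 is executed; it pushes the address of the next instruction onto the stack, taking up another 8 bytes.
So a 48-byte stack frame is pushed on every recursive call. Since the recursion never terminates, it must eventually hit the protected page preceding the stack, at which point a segfault is signalled.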
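The arithmetic above can be written out as a small C sketch. The three byte counts come from the disassembly; `frame_bytes` and `frames_until_overflow` are hypothetical helper names, and the 8 MiB stack size in the note below is an assumption (a common Linux default), not something stated in the answer.

```c
#include <stddef.h>

/* Per-call stack cost read off the disassembly (x86-64, unoptimized clang). */
enum {
    RETURN_ADDRESS_BYTES = 8,    /* pushed by callq */
    SAVED_RBP_BYTES      = 8,    /* push %rbp */
    LOCALS_BYTES         = 0x20  /* sub $0x20,%rsp */
};

/* Total stack consumed by one recursive call to main. */
static size_t frame_bytes(void) {
    return RETURN_ADDRESS_BYTES + SAVED_RBP_BYTES + LOCALS_BYTES;
}

/* Hypothetical helper: number of complete frames that fit in stack_bytes. */
static size_t frames_until_overflow(size_t stack_bytes) {
    return stack_bytes / frame_bytes();
}
```

With an assumed 8 MiB stack, `frames_until_overflow((size_t)8 * 1024 * 1024)` evaluates to 174762, so the crash would arrive after roughly 175,000 nested calls, ignoring the stack already used by the startup code.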
Note that this does not happen with clang at any nonzero optimization level:
$ clang-9 -o recursive_main -Wall -g -O1 recursive_main.c
$ ./recursive_main
$ gdb -q ./recursive_main
Reading symbols from ./recursive_main...done.
(gdb) disass main
Dump of assembler code for function main:
0x00000000004004a0 <+0>: xor %eax,%eax
0x00000000004004a2 <+2>: retq
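End of assembler dump.
Here, the compiler is taking advantage of the standard's provision (in §6.8.5/6, see below) that a loop with no observable effects may be assumed to terminate; in this case the compiler assumes it terminates immediately, which is legitimate because nothing would change between now and whenever the loop eventually terminated.
Incidentally, GCC doesn't seem to perform this optimization, so it segfaults regardless of optimization level. At least, that is what happened in my tests.
Standard C, §6.8.5: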
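Posted on 2020-03-28 15:42:13
Since there is no forward progress, the program invokes undefined behavior. C11 6.8.5/6:
"An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate."
So the compiler may assume that the for loop terminates. Since execution of the loop never actually terminates, the behavior is undefined by omission, so anything can happen.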
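If the goal is a loop that really does run forever, an iterative loop avoids the per-call stack growth, and a constant controlling expression puts it outside the scope of 6.8.5/6. The following is only a sketch: `spin`, `stop_requested`, and the `limit` parameter are hypothetical and not from the original post, and `limit` exists only so the example can terminate when tested.

```c
#include <signal.h>

/* May be set from a signal handler or elsewhere in the program to stop the loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Loop until stop_requested is set; a nonzero limit caps the iteration count. */
static unsigned long spin(unsigned long limit) {
    unsigned long iterations = 0;
    while (1) {                       /* constant controlling expression */
        if (stop_requested) break;    /* volatile read on every pass */
        if (limit != 0 && iterations == limit) break;
        iterations++;
    }
    return iterations;
}
```

Calling `spin(0)` loops until `stop_requested` becomes nonzero and uses a fixed amount of stack however long it runs, while `spin(1000)` returns 1000.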
https://stackoverflow.com/questions/60897024