I am running httperf 0.9.0 (downloaded from Google Code) on 64-bit Ubuntu 12.04.1 LTS with 2 CPUs and 4 GB of RAM. I am trying to benchmark a web server, but I run into the following buffer overflow problem.
Terminal command:
httperf --timeout=5 --client=0/1 --server=localhost --port=9090 --uri=/?value=benchmarks --rate=1200 --send-buffer=4096 --recv-buffer=16384 --num-conns=5000 --num-calls=10
After running for a few seconds, it crashes:
*** buffer overflow detected ***: httperf terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f1f5efa1007]
/lib/x86_64-linux-gnu/libc.so.6(+0x107f00)[0x7f1f5ef9ff00]
/lib/x86_64-linux-gnu/libc.so.6(+0x108fbe)[0x7f1f5efa0fbe]
httperf[0x404054]
httperf[0x404e9f]
httperf[0x406953]
httperf[0x406bd1]
httperf[0x40639f]
httperf[0x4054d5]
httperf[0x40285e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f1f5eeb976d]
httperf[0x4038f1]
======= Memory map: ========
...
...
7f1f5fd74000-7f1f5fd79000 rw-p 00000000 00:00 0
7f1f5fd91000-7f1f5fd95000 rw-p 00000000 00:00 0
7f1f5fd95000-7f1f5fd96000 r--p 00022000 08:03 4849686 /lib/x86_64-linux-gnu/ld-2.15.so
7f1f5fd96000-7f1f5fd98000 rw-p 00023000 08:03 4849686 /lib/x86_64-linux-gnu/ld-2.15.so
7fff10452000-7fff10473000 rw-p 00000000 00:00 0 [stack]
7fff1054f000-7fff10550000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted
I inspected the core dump file with gdb, as shown below:
(gdb) list
198 event_signal (EV_PERF_SAMPLE, 0, callarg);
199
200 /* prepare for next sample interval: */
201 perf_sample_start = timer_now ();
202 timer_schedule (perf_sample, regarg, RATE_INTERVAL);
203 }
204
205 int
206 main (int argc, char **argv)
207 {
(gdb) bt
#0 0x00007f33d4643445 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f33d4646bab in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f33d4680e2e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f33d4716007 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f33d4714f00 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f33d4715fbe in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000404054 in set_active (s=<optimized out>, fdset=0x612bc0) at core.c:367
#7 0x0000000000404e9f in core_connect (s=0x17e7100) at core.c:980
#8 0x0000000000406953 in make_conn (arg=...) at conn_rate.c:64
#9 0x0000000000406bd1 in tick (t=<optimized out>, arg=...) at rate.c:94
#10 0x000000000040639f in timer_tick () at timer.c:104
#11 0x00000000004054d5 in core_loop () at core.c:1255
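#12 0x000000000040285e in main (argc=11, argv=<optimized out>) at httperf.c:971
I traced through the source code and found that FD_SET seems to be the cause.
Finally, at lower rates (e.g. --rate=100 or --rate=500), httperf works fine. I am benchmarking different web servers, and the rate that triggers the crash differs from server to server. The rates I use range from 100 to 1200.
For more detail: I am actually trying to repeat the experiments done by Roberto Ostinelli, and I have already tuned the TCP settings and applied the patch mentioned in his blog post.
Do you have any idea what is causing this problem? Thanks!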
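Posted on 2012-11-03 12:26:54
You are trying to use an fd numbered 1024 or higher. Under low load you don't need (or use) that many fds. Under high load you need more fds and eventually reach 1024, the FD_SETSIZE limit of select(), which causes the problem.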
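To illustrate the limit described above, here is a minimal sketch of a guard that refuses descriptors that do not fit in an fd_set. The helper name safe_fd_set is made up for this example and is not part of httperf; it only assumes the standard FD_SET macro and the FD_SETSIZE constant from <sys/select.h>.

```cpp
#include <cassert>
#include <sys/select.h>

// Hypothetical helper (not part of httperf): add fd to the set only if it
// fits in an fd_set, i.e. 0 <= fd < FD_SETSIZE.
bool safe_fd_set(int fd, fd_set* set) {
    assert(set != nullptr);
    if (fd < 0 || fd >= FD_SETSIZE) {
        // The caller has to handle this fd some other way, for example
        // with poll() or epoll, or refuse the connection.
        return false;
    }
    FD_SET(fd, set);
    return true;
}
```

With a check like this, a program that hits the limit can log an error or close the connection instead of being aborted inside glibc.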
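Even when I increase __FD_SETSIZE I still hit this problem, so I think there is actually a bug in whatever code performs the bounds check (gcc/llvm?).
Posted on 2014-10-16 07:11:07
Newer versions of glibc perform their own check inside FD_SET (which httperf calls), and that check fails, causing the abort. Even though httperf is built with a different __FD_SETSIZE, glibc still uses the original value it was compiled with.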
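The __fdelt_warn and __chk_fail frames in the backtrace come from that check. As a rough sketch of what it does (this is an illustrative stand-in, not glibc's actual source; the function name is made up, and the word size is assumed to be that of a long), the fortified FD_SET computes the index of the word holding the fd's bit and bails out if the fd is out of range. Here the failure is reported as -1 instead of aborting, so the behaviour can be observed safely:

```cpp
#include <cassert>
#include <climits>
#include <sys/select.h>

// Illustrative stand-in for the fortified index computation. Returns the
// index of the word that holds fd's bit, or -1 where glibc would abort.
long checked_fd_word_index(long fd) {
    const long bits_per_word = static_cast<long>(sizeof(long) * CHAR_BIT);
    // Sanity check for this sketch: an fd_set is a whole number of words.
    assert(FD_SETSIZE % bits_per_word == 0);
    if (fd < 0 || fd >= FD_SETSIZE) {
        return -1;  // glibc reports "buffer overflow detected" and aborts here
    }
    return fd / bits_per_word;
}
```

Because the comparison uses the FD_SETSIZE that glibc itself was built with, redefining the macro in the application does not move the limit.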
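To work around this, I dug out older versions of sys/select.h and bits/select.h, from before __FD_ELT performed the check, and dropped them into httperf's src/ directory (under sys/ and bits/). This way httperf uses the old FD_SET macros, which do not perform the check that causes the abort. I used glibc-2.14.1, but any version that does not have bits/select2.h should work.
A friend of mine is collecting this and other httperf patches for our own use (and yours!) at https://github.com/klueska/httperf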
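Posted on 2012-12-09 00:10:41
I ran into a similar crash, and in my case replacing select() with epoll() solved it.
I can only reproduce the problem in release builds (I use Eclipse as my development environment and let Eclipse set the compiler options for debug and release).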
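A likely explanation for the release-only behaviour is that the FD_SET bounds check belongs to glibc's _FORTIFY_SOURCE hardening, which only takes effect in optimized builds (and Ubuntu's GCC enables it by default when optimizing). A small sketch to see which level a given build uses is below; __USE_FORTIFY_LEVEL is a glibc-internal macro, so treat this as a diagnostic aid under that assumption rather than a portable API.

```cpp
#include <cassert>
#include <sys/select.h>  // any glibc header pulls in features.h

// Reports the fortify level this translation unit was compiled with.
// 0 means the FD_SET bounds check is not compiled in.
int fortify_level() {
#ifdef __USE_FORTIFY_LEVEL
    const int level = __USE_FORTIFY_LEVEL;
#else
    const int level = 0;  // non-glibc C library or very old glibc
#endif
    assert(level >= 0);
    return level;
}
```

Printing this value from both the debug and the release build should show whether the two configurations differ in this respect.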
Here is the crash output:
1354813976/SBNotificationServer terminated
06/12 21:13:54 > ======= Backtrace: =========
06/12 21:13:54 > /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f4e10f90807]
06/12 21:13:54 > /lib/x86_64-linux-gnu/libc.so.6(+0x109700)[0x7f4e10f8f700]
06/12 21:13:54 > /lib/x86_64-linux-gnu/libc.so.6(+0x10a7be)[0x7f4e10f907be]
06/12 21:13:54 > 1354813976/SBNotificationServer[0x49db90]
06/12 21:13:54 > 1354813976/SBNotificationServer[0x49de05]
06/12 21:13:54 > 1354813976/SBNotificationServer[0x4a4b07]
06/12 21:13:54 > 1354813976/SBNotificationServer[0x4a5318]
06/12 21:13:54 > 1354813976/SBNotificationServer[0x4a2628]
06/12 21:13:54 > /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f4e10c70e9a]
06/12 21:13:54 > /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f4e10f79cbd]
06/12 21:13:54 > ======= Memory map: ========
06/12 21:13:54 > 00400000-00507000 r-xp 00000000 ca:01 141328 /usr/share/spotbros/SBNotServer/1354813976/SBNotificationServer
06/12 21:13:54 > 00706000-00707000 r--p 00106000 ca:01 141328 /usr/share/spotbros/SBNotServer/1354813976/SBNotificationServer
06/12 21:13:54 > 00707000-00708000 rw-p 00107000 ca:01 141328 /usr/share/spotbros/SBNotServer/1354813976/SBNotificationServer
06/12 21:13:54 > 00708000-0070d000 rw-p 00000000 00:00 0
06/12 21:13:54 > 0120d000-01314000 rw-p 00000000 00:00 0 [heap]
06/12 21:13:54 > 7f49f8000000-7f49f8021000 rw-p 00000000 00:00 0
.......
.......
After this, I was able to generate a core dump, and I got:
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./SBNotificationServer ../config.xml'.
Program terminated with signal 6, Aborted.
#0 0x00007feb90d67425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)
(gdb) bt
#0 0x00007feb90d67425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007feb90d6ab10 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007feb90da539e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007feb90e3b807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007feb90e3a700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007feb90e3b7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x000000000049e290 in CPhpResponseReader::WaitUntilReadable(int, int&, bool&) ()
#7 0x000000000049e505 in CPhpResponseReader::Read(CReallocableBuffer&, int) ()
#8 0x00000000004a5207 in CHttpPostBufferInterface::Flush() ()
#9 0x00000000004a5a18 in CPhpRequestJob::Execute() ()
#10 0x00000000004a2d28 in CThreadPool::Worker(void*) ()
#11 0x00007feb90b1be9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007feb90e24cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
More details:
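WaitUntilReadable (see the backtrace) is a function that basically uses select() to wait until some data can be read from a backend server. I rewrote that function using epoll().
My program is a server that keeps connections open with thousands of clients. The clients send requests to the server, which passes them on to the backend server and sends the responses back to the clients. My server uses a thread pool for the requests to the backend server. So here is another important detail: the crash happens when I set the number of threads in the pool to a high value, which means that many select() calls are in progress concurrently (I set it to 1024 threads). I then started the client simulator I use for testing, with 1000 client threads issuing requests as fast as possible. With select() and a release build this crashed quickly, but after switching to epoll() there were no problems.
Note that each thread in the pool calls select() with a single fd.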
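Since each call waits on only one descriptor, what matters is the numeric value of that fd, not how many fds are in the set. For a single-fd wait like this, poll() is another option that has no FD_SETSIZE limit. The sketch below is an alternative to the epoll() approach in this answer, not the code that was actually used; the function names are invented for the example, and a pipe stands in for the socket in the self-check.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Hypothetical replacement for a single-fd select() wait.
// Returns 1 if fd is readable, 0 on timeout, -1 on error or hang-up.
int wait_readable_poll(int fd, int timeout_ms) {
    assert(fd >= 0);
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    const int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0) return -1;               // poll() itself failed; errno is set
    if (rc == 0) return 0;               // timed out, nothing to read
    if (pfd.revents & POLLIN) return 1;  // data (or EOF) can be read
    return -1;                           // POLLERR, POLLHUP or POLLNVAL
}

// Small self-check that uses a pipe in place of a socket.
bool demo_wait_readable_poll() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    const bool idle = wait_readable_poll(fds[0], 0) == 0;  // nothing written yet
    const char byte = 'x';
    const bool wrote = write(fds[1], &byte, 1) == 1;
    const bool ready = wait_readable_poll(fds[0], 100) == 1;
    close(fds[0]);
    close(fds[1]);
    return idle && wrote && ready;
}
```

Unlike a per-call epoll instance, poll() does not need an extra descriptor of its own, which may matter when the process is already close to its fd limit.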
Here is the function I changed to solve the problem, the old version followed by the new one:
bool WaitUntilReadableDeprecated(int timeoutMSecs, int& elapsedMSecs, bool& closed)
{
    CTimeLapsus t;
    fd_set fileDescriptorSet;
    struct timeval timeStruct;
    closed = false;
    // Split the timeout into seconds and microseconds so that sub-second
    // timeouts are not truncated to zero.
    timeStruct.tv_sec = timeoutMSecs / 1000;
    timeStruct.tv_usec = (timeoutMSecs % 1000) * 1000;
    FD_ZERO(&fileDescriptorSet);
    // In fortified builds FD_SET aborts when m_socket >= FD_SETSIZE (1024).
    FD_SET(m_socket, &fileDescriptorSet);
    int sel = select(m_socket + 1, &fileDescriptorSet, NULL, NULL, &timeStruct);
    if(sel == 0)
    {
        LogDebug("[CPhpResponseReader::WaitUntilReadable] select() returned 0, no data available");
        elapsedMSecs = t.GetElapsedMilis();
        return false;
    }
    else if(sel == -1)
    {
        if(errno == EBADF)
        {
            LogDebug("[CPhpResponseReader::WaitUntilReadable] select() returned -1, errno is EBADF, connection reset by host?");
            closed = true;
            elapsedMSecs = t.GetElapsedMilis();
            return true;
        }
        throw "CPhpResponseReader::WaitUntilReadable select error";
    }
    elapsedMSecs = t.GetElapsedMilis();
    return true;
}
bool WaitUntilReadableEpoll(int timeoutMSecs, int& elapsedMSecs, bool& closed)
{
    CIoPoller poller(8);
    CTimeLapsus t;
    closed = false;
    if(poller.Add(m_socket, EPOLLIN) == -1)
        LogError("[CPhpResponseReader::WaitUntilReadableEpoll] poller.Add(%d, EPOLLIN) failed", m_socket);
    int nfds = poller.Wait(timeoutMSecs);
    if (nfds > 0)
    {
        // Only one socket is registered, so only the first event matters.
        int theSocket = poller.GetEvents()[0].data.fd;
        uint32_t event = poller.GetEvents()[0].events;
        if(theSocket != m_socket)
        {
            LogError("[CPhpResponseReader::WaitUntilReadableEpoll] socket is different than expected (%d)", m_socket);
            elapsedMSecs = t.GetElapsedMilis();
            return false;
        }
        if((event & EPOLLERR) || (event & EPOLLHUP))
        {
            LogWarning("[CPhpResponseReader::WaitUntilReadableEpoll] Disconnected socket %d (event %u)", m_socket, event);
            elapsedMSecs = t.GetElapsedMilis();
            closed = true;
            return false;
        }
        if(event & EPOLLIN)
        {
            // ok
        }
    }
    else if (nfds == -1)
    {
        if(errno == EBADF)
        {
            LogWarning("[CPhpResponseReader::WaitUntilReadableEpoll] poller.Wait() returned -1, errno is EBADF, maybe connection reset by host");
            closed = true;
            elapsedMSecs = t.GetElapsedMilis();
            return true;
        }
        LogError("[CPhpResponseReader::WaitUntilReadableEpoll] poller.Wait() failed");
        elapsedMSecs = t.GetElapsedMilis();
        closed = true;
        return false;
    }
    else
    {
        LogDebug("[CPhpResponseReader::WaitUntilReadableEpoll] poller.Wait() returned 0, no data available");
        elapsedMSecs = t.GetElapsedMilis();
        return false;
    }
    elapsedMSecs = t.GetElapsedMilis();
    return true;
}
CIoPoller is just a C++ wrapper around epoll.
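The CIoPoller class itself is not shown in this answer. For readers who want something to start from, here is a hypothetical sketch of what such a wrapper could look like, with Add, Wait and GetEvents guessed from the calls in the function above. The class name IoPollerSketch and all of its internals are assumptions, and a pipe stands in for the socket in the self-check.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

// Hypothetical RAII wrapper around an epoll instance (not the real CIoPoller).
class IoPollerSketch {
public:
    explicit IoPollerSketch(int maxEvents)
        : m_epfd(epoll_create1(0)), m_events(maxEvents > 0 ? maxEvents : 1) {}
    ~IoPollerSketch() { if (m_epfd != -1) close(m_epfd); }
    IoPollerSketch(const IoPollerSketch&) = delete;
    IoPollerSketch& operator=(const IoPollerSketch&) = delete;

    // Registers fd for the given event mask; returns 0 on success, -1 on error.
    int Add(int fd, uint32_t events) {
        assert(fd >= 0);
        struct epoll_event ev = {};
        ev.events = events;
        ev.data.fd = fd;
        return epoll_ctl(m_epfd, EPOLL_CTL_ADD, fd, &ev);
    }

    // Returns the number of ready fds, 0 on timeout, -1 on error.
    int Wait(int timeoutMSecs) {
        return epoll_wait(m_epfd, m_events.data(),
                          static_cast<int>(m_events.size()), timeoutMSecs);
    }

    // Events filled in by the most recent Wait() call.
    const struct epoll_event* GetEvents() const { return m_events.data(); }

private:
    int m_epfd;                               // epoll descriptor, -1 on failure
    std::vector<struct epoll_event> m_events; // buffer for ready events
};

// Small self-check that uses a pipe in place of a socket.
bool demo_io_poller_sketch() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    bool ok = false;
    {
        IoPollerSketch poller(8);
        ok = poller.Add(fds[0], EPOLLIN) == 0 && poller.Wait(0) == 0;
        const char byte = 'x';
        ok = ok && write(fds[1], &byte, 1) == 1;
        ok = ok && poller.Wait(100) == 1;
        ok = ok && poller.GetEvents()[0].data.fd == fds[0];
        ok = ok && (poller.GetEvents()[0].events & EPOLLIN) != 0;
    }
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Note that constructing a poller inside every wait call, as WaitUntilReadableEpoll does, creates and closes one epoll descriptor per call; keeping one poller per connection or per thread would avoid that, at the cost of a little more bookkeeping.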
Ubuntu version:
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
https://stackoverflow.com/questions/12583955