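I'm trying to learn C sockets by writing a simple web scraper, doing the socket programming and the HTTP requests myself; for now I'm using the sockets library. I've written a function that successfully sends a non-SSL request to http://mirror.vcu.edu and stores the output in a variable called response.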
char *noSSLRequest(REQUEST_HEADER_INFO *request_header_info) {
    struct sockaddr_in serverAddress;
    char *requestHeader;
    unsigned short serverPort;
    char serverIP[13];
    domainToIP(request_header_info->host, serverIP);
    char *response = calloc(0, 0);
    ssize_t bytesReceived = 0;
    int sockFD; //Only supporting IPV4 right now, returns file descriptor for socket
    if ((sockFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < -1) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not open socket in http.c getHTMLBody(). Reason for error %s", strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "LOG: Socket file descriptor is %d" ANSI_COLOR_RESET, sockFD);
    serverPort = 80;
    memset(&serverAddress, 0, sizeof(serverAddress));
    serverAddress.sin_family = AF_INET;
    serverAddress.sin_port = htons(80);
    inet_aton(serverIP, &serverAddress.sin_addr);
    if (connect(sockFD, (const struct sockaddr *) &serverAddress, sizeof(serverAddress)) < 0) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not connect socket in http.c getHTMLBody(). Reason for error %s",
                strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "\nLOG: Connected socket at descriptor %d to IP %s and port %d" ANSI_COLOR_RESET, sockFD,
           serverIP, serverPort);
    requestHeader = craftRequestHeader(request_header_info);
    if (send(sockFD, requestHeader, strlen(requestHeader), 0) < 0) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not send request. Reason for error %s",
                strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "\nLOG: Sent HTTP request from socket at descriptor %d to IP %s and port %d." ANSI_COLOR_RESET,
           sockFD,
           serverIP, serverPort);
    free(requestHeader);
    printf(ANSI_COLOR_GREEN "\nLOG: Starting receive operation" ANSI_COLOR_RESET);
    ssize_t bytesReceivedPrevious = -1;
    char buffer[RESPONSE_BUFFER_SIZE];
    while (bytesReceived < (RESPONSE_MAX_LEN * sizeof(char)) && bytesReceived > bytesReceivedPrevious) {
        bytesReceivedPrevious = bytesReceived;
        bytesReceived = recv(sockFD, buffer, RESPONSE_BUFFER_SIZE, 0);
        response = realloc(response, sizeof(*response) + RESPONSE_BUFFER_SIZE);
        strcat(response, buffer); //Append to the end, safe because recv takes care of limiting buffer size
    }
    response = realloc(response, sizeof(*response) + sizeof(char));
    response[strlen(response)] = '\0';
    printf(ANSI_COLOR_GREEN "\nLOG: Received HTTP response from socket at descriptor %d to IP %s and port %d.\n\n\n\n\n" ANSI_COLOR_RESET,
           sockFD,
           serverIP, serverPort);
    if (close(sockFD) < 0) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not close socket in http.c getHTMLBody(). Reason for error %s", strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "\nLOG: Closed socket at descriptor %d" ANSI_COLOR_RESET, sockFD);
    freeRequestHeaderInfo(request_header_info);
    return response;
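}
Everything works, response has a null terminator, and life is good, except that for some reason the contents of response get printed to my console. I suspect something is leaking, because that output is also green, even though I reset the color to the default after every log message. I know some of the macros and other pieces used above are not shown; I couldn't fit all the information and code here, so I have a GitHub repo and a more detailed issue.
A picture of the log is here and on the issue. I couldn't capture the full output in the picture, so a non-colored text version is on the issue as well.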

Posted on 2018-05-16 04:40:47
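This code
response = realloc(response, sizeof(response) + RESPONSE_BUFFER_SIZE);
and this code
response = realloc(response, sizeof(response) + sizeof(char));
both lead to undefined behavior: the requested size is the same on every call, so the buffer never grows with the data, and the reads and writes that follow run past its end.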
response is a char *, that is, a pointer. sizeof applied to a pointer gives the size of the pointer itself, not the length of the string it points to.
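To make the difference concrete, here is a minimal sketch. The helper names pointer_size and check_sizeof_vs_strlen are hypothetical and exist only for this illustration; the exact pointer size depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Size of the pointer object itself; it does not depend on what s points to. */
size_t pointer_size(const char *s) {
    return sizeof(s);
}

/* Hypothetical helper: checks the claims on a heap-allocated string. */
void check_sizeof_vs_strlen(void) {
    char *response = malloc(64);
    assert(response != NULL);
    strcpy(response, "HTTP/1.1 200 OK");

    assert(strlen(response) == 15);                   /* characters stored */
    assert(pointer_size(response) == sizeof(char *)); /* platform-dependent, often 8 */
    assert(sizeof(*response) == 1);                   /* one char, not the string */

    free(response);
}
```

Neither sizeof(response) nor sizeof(*response) says anything about how much data has been stored, so the code has to keep track of that length in a variable of its own.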
Note also that sizeof(char) is, by definition, one.
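One way to avoid the problem is to track the number of bytes stored and grow the buffer by exactly what each chunk needs. The following is only a sketch under that assumption; ResponseBuffer and response_append are hypothetical names, not part of the asker's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer: data is NUL-terminated after every append. */
typedef struct {
    char *data;
    size_t len; /* bytes stored, not counting the terminator */
} ResponseBuffer;

/* Appends n bytes from chunk. Returns 0 on success, or -1 if allocation
   fails, in which case the existing contents are left intact. */
int response_append(ResponseBuffer *rb, const char *chunk, size_t n) {
    assert(rb != NULL && (chunk != NULL || n == 0));

    /* Ask for the real total: old length + new bytes + 1 for '\0'. */
    char *grown = realloc(rb->data, rb->len + n + 1);
    if (grown == NULL) {
        return -1;
    }
    if (n > 0) {
        memcpy(grown + rb->len, chunk, n); /* chunk need not be NUL-terminated */
    }
    rb->data = grown;
    rb->len += n;
    rb->data[rb->len] = '\0';
    return 0;
}
```

In a receive loop, this would be called as response_append(&rb, buffer, (size_t) bytesReceived) only when recv returns a positive count, starting from a ResponseBuffer initialized to {NULL, 0}. recv does not NUL-terminate the buffer it fills, which is why the sketch copies an explicit byte count with memcpy instead of using strcat.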
https://stackoverflow.com/questions/50358774