
Random 3-4 character strings in an HTTP response in Python

Stack Overflow user
Asked 2021-03-03 02:37:58
1 answer, 42 views, 0 followers, score 0

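I'm trying to make a request using Python's socket module. It successfully sends the request, receives the response, and decodes it. When I look at the HTML document, everything is correct except for some random strings, 3-4 characters long, scattered through it. I think my code is correct, but I'm not 100% sure. Here is my code: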

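```python
import select
import socket
import ssl

from bs4 import BeautifulSoup

newline = "\r\n"  # HTTP separates header lines with CRLF


def receive_data(get, timeout):
    # An SSL socket may already hold decrypted bytes that select() cannot see
    if isinstance(get, ssl.SSLSocket) and get.pending():
        return get.recv(4096)
    # Otherwise wait up to `timeout` seconds for more data to arrive
    ready = select.select([get], [], [], timeout)
    if ready[0]:
        return get.recv(4096)
    return b""


def get_file(website, port, file, https=False):
    data = []

    if https:
        get = ssl.create_default_context().wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM), server_hostname=website)
    else:
        get = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    get.connect((website, port))
    get.sendall(f"GET {file} HTTP/1.1\r\nHost: {website}:{port}\r\n\r\n".encode())

    # The server keeps the connection open, so stop once nothing arrives for 5 seconds
    while True:
        new_data = receive_data(get, 5)
        if not new_data:
            break
        data.append(new_data)

    # Decode once at the end so multi-byte UTF-8 characters split across recv() calls stay intact
    data = b"".join(data).decode()
    header = data[0:data.find(newline + newline)]
    data = data[data.find(newline + newline):data.rfind(f"{newline}0{newline}{newline}")]

    data = BeautifulSoup(data, "html.parser").prettify()

    get.close()
    return (header, data)
```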

If I request https://stackoverflow.com, it outputs:

```
30d
<!DOCTYPE html>
<html class="html__responsive html__unpinned-leftnav">
 <head>
  <title>
   Stack Overflow - Where Developers Learn, Share, &amp; Build Careers
  </title>
  <link href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196" rel="shortcut icon"/>
  <link href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a" rel="apple-touch-icon"/>
  <link href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a" rel="image_src"/>
  <link href="/opensearch.xml" rel="search" title="Stack Overflow" type="application/opensearchdescription+xml"/>
  <meta content="Stack Overflow is the largest, most trusted online communi
20d0
ty for developers to learn, share​ ​their programming ​knowledge, and build their careers." name="description"/>
  <meta content="width=device-width, height=device-height, initial-scale=1.0, minimum-scale=1.0" name="viewport"/>
  <meta content="website" property="og:type">
```

And so on. Some websites produce more of these strings than others, and I can't figure out why. Any help is much appreciated!

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-03-03 03:20:07

The last line of the response headers gives you a clue:

```
HTTP/1.1 200 OK
Connection: keep-alive
cache-control: private
...
transfer-encoding: chunked
```
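"transfer-encoding: chunked" means that the content after the headers is not plain HTML. From the HTTP/1.1 specification (RFC 2616, section 3.6.1):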

```
   The chunked encoding modifies the body of a message in order to
   transfer it as a series of chunks, each with its own size indicator,
   followed by an OPTIONAL trailer containing entity-header fields
...
   The chunk-size field is a string of hex digits indicating the size of
   the chunk. The chunked encoding is ended by any chunk whose size is
   zero, followed by the trailer, which is terminated by an empty line.
```
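The rules quoted above translate into a short loop: read a hex size line, take that many bytes, skip the CRLF after them, and stop at a size of zero. Below is a minimal sketch; `decode_chunked` is a hypothetical helper name, and it assumes the whole body has already been received (no streaming). It works on `bytes` rather than `str` because chunk sizes count bytes, so decoding to text first, as the question's code does, would miscount any multi-byte UTF-8 character.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a message body sent with Transfer-Encoding: chunked.

    `body` is everything after the blank line that ends the headers, and it
    must hold the complete message: this sketch does not stream.
    """
    out = bytearray()
    pos = 0
    while True:
        # Chunk-size line: hex digits, optionally followed by ";extension".
        line_end = body.index(b"\r\n", pos)  # ValueError if truncated
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk. Whatever follows is the optional trailer, ended by
            # an empty line; this sketch simply ignores it.
            break
        chunk = body[pos:pos + size]
        if len(chunk) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        out += chunk
        pos += size + 2  # skip the chunk data and the CRLF after it
    return bytes(out)


sample = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(sample))  # b'Hello, world'
```

To use it with the question's code, keep the received data as bytes, split it at the first b"\r\n\r\n", run the helper on the part after it when the header says the body is chunked, and only then call .decode() and hand the result to BeautifulSoup.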

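In other words, what you're seeing is a hexadecimal number giving the size in bytes of the chunk that follows: 30d is 781 bytes and 20d0 is 8400 bytes. There may be more than one chunk, and a page split into more chunks shows more of these size lines, which is why some sites have more of them than others. You need to check for that HTTP header and, if it's present, find all the chunks and concatenate them before parsing the page as HTML.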
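If you would rather not write the header check and chunk loop by hand, one alternative (a different technique from the manual approach described above) is to let the standard library's `http.client.HTTPResponse` parse the raw bytes you already received; it reads the status line and headers and removes chunked coding in `read()`. This relies on the assumption that `HTTPResponse` reads through `sock.makefile("rb")`, so `BytesSocket` and `parse_raw_response` below are hypothetical helpers that feed it an in-memory buffer instead of a live socket.

```python
import http.client
import io


class BytesSocket:
    """Hypothetical adapter: exposes already-received bytes through the
    makefile() method that http.client.HTTPResponse reads from."""

    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, mode="rb", *args, **kwargs):
        return self._file


def parse_raw_response(raw: bytes):
    """Parse a complete raw HTTP response held in memory.

    Returns (status code, headers, body); a chunked body comes back
    with the chunk-size lines already removed.
    """
    response = http.client.HTTPResponse(BytesSocket(raw))
    response.begin()        # parses the status line and header fields
    body = response.read()  # handles Transfer-Encoding: chunked
    return response.status, response.headers, body


raw = (b"HTTP/1.1 200 OK\r\ntransfer-encoding: chunked\r\n\r\n"
       b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n")
status, headers, body = parse_raw_response(raw)
print(status, body)  # 200 b'Hello, world'
```

In the question's code, this would be called on b"".join(data) before any .decode(); the returned body can then be decoded and passed to BeautifulSoup.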

Score: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/66449988
