Q: Scrape an HTML page and return all strings with len greater than 8

Stack Overflow user

Asked 2021-11-24 20:53:20

Answers: 2 | Views: 48 | Followers: 0 | Votes: 1

I am scraping a page and want to return all the strings found on it. I am using Python.

My code:

import requests
from bs4 import BeautifulSoup as bs

doc = "https://www.kite.com/"
res = requests.get(doc)

soup = bs(res.content, "html.parser")

tag = soup.body

for string in tag.strings:
    stringsOut = string
    print(stringsOut)

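What I get so far is of type element.NavigableString. I want a list of strings containing all the text from the page, keeping only strings longer than 8 characters. For example: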

result = ['superpowers','languages']

python

beautifulsoup

2 Answers

Stack Overflow user

Accepted answer

Posted 2021-11-24 21:10:11

Does this work?

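tag = soup.body
my_list = []
for string in tag.strings:
    # split() with no argument splits on any whitespace, so newlines
    # and tabs from the HTML do not stay attached to the words
    for word in string.split():
        if len(word) > 8:
            my_list.append(word)

print(my_list)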
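
As a rough sketch of the same idea, the loop can be wrapped in a helper that also trims surrounding punctuation, so that "superpowers," counts as "superpowers". The name long_words, the min_len parameter, and the sample list are made up for illustration; the list stands in for tag.strings so the snippet runs without fetching a page.

```python
import string


def long_words(strings, min_len=8):
    """Collect words longer than min_len characters from an iterable of strings."""
    words = []
    for text in strings:
        # split() with no argument splits on any run of whitespace
        for word in text.split():
            # Trim punctuation at both ends, e.g. "languages." -> "languages"
            word = word.strip(string.punctuation)
            if len(word) > min_len:
                words.append(word)
    return words


# Hypothetical stand-in for tag.strings
sample = ["Code faster with\n  superpowers,", "16 languages.", "\n"]
print(long_words(sample))  # ['superpowers', 'languages']
```

With BeautifulSoup you would presumably call it as long_words(soup.body.strings).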

Votes: 1

Stack Overflow user

Posted 2021-11-24 21:42:25

This is where .stripped_strings comes in handy, since you probably also want to strip the surrounding whitespace:

tag = soup.body
print([i for i in tag.stripped_strings if len(i) > 8])
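
One thing to keep in mind: this filters whole text nodes, not individual words, so a node such as "Code faster" passes the check even though neither word is longer than 8 characters. The sketch below mimics that behavior on a plain list so it runs without a page; long_text_nodes and the sample data are hypothetical names, not part of BeautifulSoup.

```python
def long_text_nodes(strings, min_len=8):
    """Strip each string and keep those longer than min_len characters."""
    # Roughly what iterating .stripped_strings gives: each string stripped;
    # whitespace-only strings become empty and fail the length check
    stripped = (s.strip() for s in strings)
    return [s for s in stripped if len(s) > min_len]


# Hypothetical stand-in for tag.strings
sample = ["\n  Code faster  \n", "superpowers", "   ", "16 languages"]
print(long_text_nodes(sample))  # ['Code faster', 'superpowers', '16 languages']
```

If you need single words as in the question's example, split each stripped string first and filter the words instead.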

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/70102792
