I am trying to search for a specific string in an HTML page that I scraped. I used the find_all() method in bs4 with the string argument, but it doesn't work.
from bs4 import BeautifulSoup
import requests
def search(soup):
    results = soup.find_all(string="Union", recursive=True)
    print(len(results))
web_url = "https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pKVGlnQVAB?hl=en-IN&gl=IN&ceid=IN%3Aen"
r = requests.get(web_url)
soup = BeautifulSoup(r.text,'html.parser')
search(soup)

The output of len(results) is zero. Is something wrong with my search function?
Posted on 2020-03-22 02:27:18
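When you search with the string argument, find_all() only returns text nodes (NavigableStrings) whose entire content matches the given string exactly, so a node such as "Union Budget 2020" does not match "Union". A regular expression makes it easy to match a substring instead.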
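To see the difference without depending on the live Google News page (whose content changes over time), here is a minimal offline sketch. The HTML snippet is made up for illustration; only `find_all(string=...)` and `re.compile` come from the answer.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical static HTML standing in for the scraped page (no network needed).
html = """
<div>
  <h3>Union Budget 2020 highlights</h3>
  <p>Union</p>
  <p>Trade union talks resume</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# string="Union" only matches text nodes whose entire content is exactly "Union".
exact = soup.find_all(string="Union")

# A compiled regex is applied with .search(), so it matches anywhere inside a text node.
partial = soup.find_all(string=re.compile("Union"))

print(len(exact))    # 1: only <p>Union</p>
print(len(partial))  # 2: "Union Budget ..." and "Union"; lowercase "union" is not matched
```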
from bs4 import BeautifulSoup
import requests
import re
def search(soup):
    results = soup.find_all(string=re.compile("Union"), recursive=True)
    print(len(results))
web_url = "https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGx6TVdZU0FtVnVHZ0pKVGlnQVAB?hl=en-IN&gl=IN&ceid=IN%3Aen"
r = requests.get(web_url)
soup = BeautifulSoup(r.text,'html.parser')
search(soup)

With this I get 7 matches.
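Two refinements that are often needed next, shown as a sketch on made-up HTML (the links and texts are hypothetical): a case-insensitive, whole-word pattern, and `.parent` to get from each matched string back to its enclosing tag, since `find_all(string=...)` returns strings rather than tags.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample HTML; in the question this would come from requests.get(...).text.
html = (
    "<ul><li><a href='/a'>Union Budget</a></li>"
    "<li><a href='/b'>Trade union talks</a></li>"
    "<li>Reunion</li></ul>"
)
soup = BeautifulSoup(html, "html.parser")

# re.IGNORECASE also catches "union"; \b restricts matches to whole words, so "Reunion" is skipped.
pattern = re.compile(r"\bunion\b", re.IGNORECASE)
matches = soup.find_all(string=pattern)

# Each match is a NavigableString; .parent is the tag that contains it.
for text in matches:
    print(text.parent.name, text.parent.get("href"), text)
# a /a Union Budget
# a /b Trade union talks
```

If you would rather avoid regular expressions, bs4 also accepts a function as the filter, e.g. `soup.find_all(string=lambda s: "union" in s.lower())`, though that version would also match "Reunion".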
https://stackoverflow.com/questions/60791594