I'm trying to get URLs from a website based on keywords. I only want to print the first 10 results (to avoid errors from making too many requests).
import urllib
import requests
from bs4 import BeautifulSoup

queries = ["ner", "spacy", "bert", "lda"]
for i in queries:
    reqs = requests.get("https://github.com/search?q=" + str(i))
    soup = BeautifulSoup(reqs.text, 'html.parser')
    for links in soup.select('a'):
        print(links.get('href'))
My output:
https://github.com/
/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2Fsearch&source=header
/features/actions
/features/packages
/features/security
/features/codespaces
/features/copilot
/features/code-review
/features/issues
/features/discussions
/features
https://docs.github.com
https://skills.github.com/
I'm looking for a list of links that contain these keywords.
Posted on 2022-11-03 15:02:21
Assuming you only want the links to the search results, simply select the first link in each list item:
for e in soup.select('.codesearch-results li'):
    print(e.a.get('href'))
Example
import requests
from bs4 import BeautifulSoup

queries = ["ner", "spacy", "bert", "lda"]
for i in queries:
    reqs = requests.get(f"https://github.com/search?q={i}")
    soup = BeautifulSoup(reqs.text, 'html.parser')
    for e in soup.select('.codesearch-results li'):
        print(e.a.get('href'))
Output
/shiyybua/NER
/ryanoasis/nerd-fonts
/preservim/nerdtree
/bmild/nerf
/wavewangyue/ner
/synalp/NER
/preservim/nerdcommenter
/containerd/nerdctl
/NervJS/nerv
/deeppavlov/ner
/explosion/spaCy
/explosion/spacy-course
/explosion/spacy-models
/explosion/spacy-transformers
/chartbeat-labs/textacy
/susanli2016/NLP-with-Python
/explosion/spacy-streamlit
...
https://stackoverflow.com/questions/74304979
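The question also asked to cap the output at the first 10 results. One way to do that with the same selector is to slice the result list before printing. A minimal sketch, tested here against a small static HTML snippet standing in for a live GitHub search response (the `first_n_result_links` helper and the inline markup are hypothetical, for illustration only; GitHub's real markup may differ):

```python
from bs4 import BeautifulSoup

def first_n_result_links(html: str, n: int = 10):
    """Return the href of the first link in each result list item, capped at n."""
    soup = BeautifulSoup(html, "html.parser")
    # [:n] slices the selected list items, so at most n are processed
    return [e.a.get("href") for e in soup.select(".codesearch-results li")[:n]]

# Tiny static page imitating the answer's '.codesearch-results li' structure
html = (
    '<div class="codesearch-results"><ul>'
    + "".join(f'<li><a href="/repo{i}">repo {i}</a></li>' for i in range(15))
    + "</ul></div>"
)
print(first_n_result_links(html, 10))  # only 10 of the 15 links
```

For the live loop in the answer, the same slice (`soup.select('.codesearch-results li')[:10]`) plus a short `time.sleep()` between queries would keep the request count and rate down.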