Q: Scraping the URL from an <a href> in Python

Stack Overflow user

Asked 2021-01-08 10:04:09

2 answers, 93 views, 0 followers, 0 votes

I am trying to scrape a link from a page's source.

Here is the relevant part of the page source:

<a class="Fl(end) Mt(3px) Cur(p)" href="https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1578447785&amp;period2=1610070185&amp;interval=1d&amp;events=history&amp;includeAdjustedClose=true" download="GOOG.csv"><svg class="Va(m)! Mend(5px) Stk($linkColor)! Fill($linkColor)! Cur(p)" width="15" height="15" viewBox="0 0 48 48" data-icon="download" style="fill: rgb(0, 129, 242); stroke: rgb(0, 129, 242); stroke-width: 0; vertical-align: bottom;"><path d="M43.002 43.002h-38c-1.106 0-2.002-.896-2.002-2v-11c0-1.105.896-2 2.002-2 1.103 0 1.998.895 1.998 2v9h34.002v-9c0-1.105.896-2 2-2s2 .895 2 2v11c0 1.103-.896 2-2 2m-19-8L11.57 23.307c-.75-.748-.75-1.965 0-2.715.75-.75 1.965-.75 2.715 0l7.717 7.716V2h4v26.308l7.717-7.716c.75-.75 1.964-.75 2.714 0s.75 1.967 0 2.715L24.002 35.002z"></path></svg><span>Download</span></a>

What I want to do is extract the URL that follows "href".

This is what I tried, without success. The output is just []. I would appreciate any help.

hist = BeautifulSoup(requests.get(urlHist).text, 'lxml')
stockHist = hist.find_all('a',{'class': 'Fl(end) Mt(3px) Cur(p)'})

I am using BeautifulSoup, and urlHist is the URL of the page in question.

urlHist = "https://ca.finance.yahoo.com/quote/GOOG/history?p=GOOG&.tsrc=fin-srch"

python

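2 Answers

Stack Overflow user

Answered 2021-01-08 10:19:34

find_all returns a list of tags: every "a" tag whose class is "Fl(end) Mt(3px) Cur(p)". From that list you need some other criterion to pick the one you are looking for (or, if there is always exactly one, replace find_all with find), and then read "href" by indexing the tag. For example: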

for hist in stockHist:
    if hist.get_text() == "Download":
        url = hist["href"]
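For readers who want to try the loop above without fetching the live page, here is a minimal sketch that runs it against an inline snippet. The HTML below is a trimmed, hypothetical stand-in for the real page (the second anchor and its example.com URL are invented for illustration), and it uses Python's built-in "html.parser" instead of lxml so nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page: two anchors share the same
# class, but only one of them has the text "Download".
html = """
<div>
  <a class="Fl(end) Mt(3px) Cur(p)" href="https://example.com/other"><span>Apply</span></a>
  <a class="Fl(end) Mt(3px) Cur(p)"
     href="https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1578447785&amp;period2=1610070185&amp;interval=1d"
     download="GOOG.csv"><span>Download</span></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <a> whose class attribute is exactly this string.
candidates = soup.find_all("a", {"class": "Fl(end) Mt(3px) Cur(p)"})

url = None
for link in candidates:
    # get_text() includes the text of the nested <span>.
    if link.get_text(strip=True) == "Download":
        url = link["href"]  # tag attributes are read by indexing
        break

print(url)
```

Note that BeautifulSoup decodes the "&amp;" entities in the attribute, so the resulting string contains plain "&" separators and can be passed to requests as is.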

Votes: 0

Stack Overflow user

Answered 2021-01-08 10:24:29

Using a CSS selector, try:

import requests
from bs4 import BeautifulSoup

# assumes lxml is installed and urlHist is set elsewhere
soup = BeautifulSoup(requests.get(urlHist).content, 'lxml')
# select the a tag whose text contains the word Download
t = soup.select_one('a:-soup-contains("Download")')
# get the href attribute
desired_url = t.get('href')

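If you are using an older version of bs4/SoupSieve (before Soup Sieve 2.1, which introduced :-soup-contains), you may need the following instead: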

soup.select_one('a:contains("Download")')

See the Soup Sieve docs for the selector I used.
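As a possible variation that does not depend on the Soup Sieve version, the anchor in the question also carries a download attribute, so a plain attribute selector can find it. The sketch below is an assumption-laden illustration, not part of the original answer: find_download_url is a hypothetical helper name, the sample HTML and its example.com URL are invented, and it also guards against select_one returning None when nothing matches.

```python
from bs4 import BeautifulSoup


def find_download_url(html):
    """Return the href of the first <a> that has both download and href
    attributes, or None if the document contains no such tag."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("a[download][href]")
    # select_one returns None on no match, so check before indexing.
    return link["href"] if link is not None else None


# Hypothetical sample input, shaped like the anchor in the question.
sample = (
    '<a href="https://example.com/GOOG.csv?x=1&amp;y=2" download="GOOG.csv">'
    "<span>Download</span></a>"
)

print(find_download_url(sample))
print(find_download_url("<p>no links here</p>"))
```

The None check matters in the situation described in the question: if the fetched HTML does not contain the link at all, calling .get('href') on the result would raise an AttributeError rather than return an empty value.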

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/65622458
