To get all the tags on a website, I have this code:
results = []
all_links = soup.find_all('article')
for link in all_links:
    print(link.find('div', class_="cb-category cb-byline-element"))
This gives me data displayed like this (the <a> tags are separated by ','):
<div class="cb-category cb-byline-element"><i class="fa fa-folder-o"></i> <a href="http://ridethetempo.com/category/canadian/" title="View all posts in Canadian">Canadian</a>, <a href="http://ridethetempo.com/category/music/garage-rock/" title="View all posts in Garage">Garage</a>, <a href="http://ridethetempo.com/category/listen-2/" title="View all posts in Listen">Listen</a>, <a href="http://ridethetempo.com/category/music/" title="View all posts in Music">Music</a>, <a href="http://ridethetempo.com/category/music/psychedelic/" title="View all posts in Psychedelic">Psychedelic</a>, <a href="http://ridethetempo.com/category/under-2000/" title="View all posts in Under 2000">Under 2000</a></div>
However, if I do this instead:
for link in all_links:
    results.append(link.find('div', class_="cb-category cb-byline-element"))
for link in results:
    print(link.find('a', href=True)['href'])
I only get the first <a> of each <div> block, like this:
http://ridethetempo.com/category/canadian/
How can I retrieve all the <a> tags recursively so that I end up with this result?
http://ridethetempo.com/category/canadian/
http://ridethetempo.com/category/music/garage-rock/
http://ridethetempo.com/category/listen-2/
http://ridethetempo.com/category/music/
http://ridethetempo.com/category/music/psychedelic/
http://ridethetempo.com/category/under-2000/
Posted on 2017-04-20 04:37:50
for link in soup.find_all('a'):
    print(link.get('href'))
This prints the href of every <a> element in the document.
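Note that a page-wide find_all('a') also returns links outside the category blocks. If only the links inside each "cb-category cb-byline-element" <div> are wanted, one possible approach is to call find_all on each <div> rather than find, since find stops at the first match. The following is a minimal sketch, not a tested solution for the real site: the `html` string is a shortened, hypothetical sample modeled on the output in the question, and the `hrefs` name is only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample modeled on the markup shown in the question.
html = """
<article>
  <div class="cb-category cb-byline-element"><i class="fa fa-folder-o"></i>
    <a href="http://ridethetempo.com/category/canadian/">Canadian</a>,
    <a href="http://ridethetempo.com/category/music/garage-rock/">Garage</a>,
    <a href="http://ridethetempo.com/category/listen-2/">Listen</a>
  </div>
</article>
<article><p>No category block in this one.</p></article>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for article in soup.find_all("article"):
    div = article.find("div", class_="cb-category cb-byline-element")
    if div is None:
        # Some articles may have no category block; skip them.
        continue
    # find_all returns every matching descendant, not just the first one.
    for a in div.find_all("a", href=True):
        hrefs.append(a["href"])

for href in hrefs:
    print(href)
```

The `if div is None` check is there because find returns None when nothing matches, which would otherwise raise an error on the next call. A CSS selector passed to soup.select would probably work as well, but I have not checked it against the actual page.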
https://stackoverflow.com/questions/43510031