I am trying to parse some data from this URL: https://apptopia.com/store-insights/top-charts/google-play/comics/united-states
I can extract the text and href from a bs4.element.Tag, but the output comes back concatenated.
Here is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://apptopia.com/store-insights/top-charts/google-play/comics/united-states").read()
soup = BeautifulSoup(html, 'xml')
app_info_lst = soup.find_all("div", {"class": "media-object app-link-block"})
###############################################################
# print first element in this tag:
print(app_info_lst[0])
>>><div class="media-object app-link-block" href="https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence"><div class="media-figure"><img alt="" class="img-rounded img-fluid" src="https://d1nxzqpcg2bym0.cloudfront.net/google_play/com.naver.linewebtoon/5739f736-9f84-11e9-9bdb-4f6f6db47610/64x64"/></div><div class="media-body"><p class="text-truncate app-name m-b-0 l-h-md"><a href="https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence" title="WEBTOON">WEBTOON</a></p><p class="text-truncate app-publisher text-xxxs m-b-0 l-h-md"><a class="text-muted" href="/publishers/google_play/2457079" title="WEBTOON ENTERTAINMENT">WEBTOON ENTERTAINMENT</a></p></div></div>
###############################################################
# My actual output:
print(app_info_lst[0].get_text(strip=True))
>>>'WEBTOONWEBTOON ENTERTAINMENT'
print(app_info_lst[0].get('href'))
>>>'https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence'
However, my expected output is:
print(app_info_lst[0].get_text(strip=True))
>>>['WEBTOON', 'WEBTOON ENTERTAINMENT']
print(app_info_lst[0].get('href'))
>>>['https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence', '/publishers/google_play/2457079']
How can I do this? Any suggestions or help would be greatly appreciated. Thanks!
Posted on 2021-11-22 21:06:58
To generate two lists containing the information, you can use list comprehensions.
Links:
[x['href'] for x in soup.select('table a')]
['https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence', '/publishers/google_play/2457079', 'https://apptopia.com/google-play/app/com.progdigy.cdisplay/intelligence', '/publishers/google_play/1643949',...]
Text:
[x.text for x in soup.select('table a')]
['WEBTOON','WEBTOON ENTERTAINMENT','CDisplayEx Comic Reader','Progdigy Software',...]
In my opinion, a list of dictionaries is much better:
[{'href':x['href'],'title':x.text} for x in soup.select('table a')]
[{'href': 'https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence', 'title': 'WEBTOON'}, {'href': '/publishers/google_play/2457079', 'title': 'WEBTOON ENTERTAINMENT'}, {'href': 'https://apptopia.com/google-play/app/com.progdigy.cdisplay/intelligence', 'title': 'CDisplayEx Comic Reader'}, {'href': '/publishers/google_play/1643949', 'title': 'Progdigy Software'}, {'href': 'https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence', 'title': 'WEBTOON'},...]
https://stackoverflow.com/questions/70072057
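The lists above are flat across the whole table. If the goal is the per-app output shown in the question (one list of texts and one list of hrefs for each block), a minimal sketch could iterate over the child strings and links of a single block instead of calling get_text() on the whole tag. The sample markup is a trimmed copy of the block printed in the question, parsed with 'html.parser' so the example runs offline; extract_block is a hypothetical helper name, not part of bs4, and the live page may no longer use this structure.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first block printed in the question (image src omitted).
SAMPLE_HTML = """
<div class="media-object app-link-block" href="https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence">
  <div class="media-figure"><img alt="" class="img-rounded img-fluid"/></div>
  <div class="media-body">
    <p class="text-truncate app-name m-b-0 l-h-md"><a href="https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence" title="WEBTOON">WEBTOON</a></p>
    <p class="text-truncate app-publisher text-xxxs m-b-0 l-h-md"><a class="text-muted" href="/publishers/google_play/2457079" title="WEBTOON ENTERTAINMENT">WEBTOON ENTERTAINMENT</a></p>
  </div>
</div>
"""


def extract_block(block):
    """Return (texts, hrefs) for one app block (hypothetical helper)."""
    # stripped_strings yields each text node separately, skipping whitespace,
    # so the app name and publisher are not glued together.
    texts = list(block.stripped_strings)
    # Collect the href of every <a> inside the block, in document order.
    hrefs = [a["href"] for a in block.find_all("a", href=True)]
    return texts, hrefs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
blocks = soup.select("div.app-link-block")

texts, hrefs = extract_block(blocks[0])
print(texts)  # ['WEBTOON', 'WEBTOON ENTERTAINMENT']
print(hrefs)  # ['https://apptopia.com/google-play/app/com.naver.linewebtoon/intelligence', '/publishers/google_play/2457079']
```

On the real page, the same helper could be applied to every element of app_info_lst, for example [extract_block(b) for b in app_info_lst], which keeps each app's name, publisher, and links grouped together.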