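I have been scraping company data from an AngelList CSV file, and I want to get each founder's first name, last name, and role title. To do this I am using requests with Beautiful Soup. I think I am doing something wrong with soup.select.
This is the current nested class tree.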
-founders section
--section with_filler with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm
---dsr31 startup_roles fsp87 startup_profile_group _a _jm
----ul.larger roles
-----li.role
------<<dynamic div>>
-------g-lockup top larger
--------photo
--------text
---------name
---------role_title
---------bio

Below is the sample page URL: https://angel.co/dealflicks
import requests
from bs4 import BeautifulSoup, element
req = requests.get('https://angel.co/dealflicks', headers={'User-Agent': 'Mozilla/5.0'})
print(req.status_code)
soup = BeautifulSoup(req.text,"lxml")
founders = soup.select('.founders section .section with_filler with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm .dsr31 startup_roles fsp87 startup_profile_group _a _jm .larger roles role')
print (founders)

It throws this error:
Traceback (most recent call last):
File "hello.py", line 11, in <module>
founders = soup.select('.founders section .section with_filler
with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm
.dsr31 startup_roles fsp87 startup_profile_group _a _jm .larger roles
role')
File "C:\Users\nandi\Anaconda3\lib\site-packages\bs4\element.py", line
1477, in select
'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "_a"

Answered on 2019-01-08 15:09:29
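This happens because _a is a class, not a tag name such as <_a>. In a CSS selector, space-separated tokens are read as nested elements, so a bare token like _a is treated as a tag name. Every class name needs a leading dot (.), and classes on the same element are chained without spaces. Alternatively, pass the class string to findAll() (class is a reserved word in Python, so Beautiful Soup uses the keyword class_):
soup.findAll('div', class_='section with_filler with_editable_regions dsss17 startups-show-sections ffs70 founders _a _jm')

But all you need is this simple selector: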
founders = soup.select('ul.larger.roles li')
print (founders)

Or copy the selector from the Elements panel of the web developer tools:
founders = soup.select('#root > div.page.flush_bottom.dl85.layouts.fhr17.header._a._jm > div > div.content.s-grid.s-grid--outer.u-maxWidthLayout.s-vgBottom2 > div > div.s-flexgrid0.s-flexgrid0--fixed.panes_grid > div.main.pane.s-flexgrid0--footer.s-flexgrid0-colMdW.s-vgPadRight1 > div > div.founders.section > div > div > ul > li')

https://stackoverflow.com/questions/54086306
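To get from the selected li elements to the name and role title the question asks for, something like the following may work. It is only a sketch: SAMPLE_HTML is made-up markup modelled on the class tree in the question (the live page may differ), extract_founders is a hypothetical helper name, and splitting the name on the first space is an assumption that will not suit every name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the class tree in the question.
SAMPLE_HTML = """
<div class="founders section">
  <div class="startup_roles startup_profile_group _a _jm">
    <ul class="larger roles">
      <li class="role">
        <div>
          <div class="g-lockup top larger">
            <div class="photo"></div>
            <div class="text">
              <div class="name">Jane Doe</div>
              <div class="role_title">Founder</div>
              <div class="bio">Example bio.</div>
            </div>
          </div>
        </div>
      </li>
      <li class="role">
        <div>
          <div class="g-lockup top larger">
            <div class="text">
              <div class="name">John Smith</div>
              <div class="role_title">Co-Founder</div>
            </div>
          </div>
        </div>
      </li>
    </ul>
  </div>
</div>
"""


def extract_founders(html):
    """Return a list of dicts with first_name, last_name and role_title."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Classes on the same element are chained with dots and no spaces;
    # a space means "descendant of".
    for li in soup.select("div.founders.section ul.larger.roles li.role"):
        name_tag = li.select_one(".name")
        title_tag = li.select_one(".role_title")
        if name_tag is None:
            continue  # skip entries without a name
        full_name = name_tag.get_text(strip=True)
        # Assumption: the first space separates first name from last name.
        first, _, last = full_name.partition(" ")
        results.append({
            "first_name": first,
            "last_name": last,
            "role_title": title_tag.get_text(strip=True) if title_tag else "",
        })
    return results


founders = extract_founders(SAMPLE_HTML)
print(founders)
```

With a real page, req.text from the question's requests.get call would be passed in place of SAMPLE_HTML. If the page builds the founders list with JavaScript, the selector may return nothing even though it is valid.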