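I'm trying to build a web scraper for a class project. I'm using Beautiful Soup.
I want to scrape the values of the
data-bathroom-value and
data-bedroom-value attributes from the following element:
<td class="floorplan-bed-bath" data-bathroom-value="1" data-bedroom-value="0">Studio / 1 bath</td>
Basically, I want to get the number of bedrooms and the number of bathrooms.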
Answered 2017-05-07 03:17:38
You can parse the HTML with BeautifulSoup and then read the tag's attributes.
Demo:
>>> html_doc = '<td class="floorplan-bed-bath" data-bathroom-value="1" data-bedroom-value="0">Studio / 1 bath</td>'
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_doc, 'html.parser')
>>> attrs = soup.td.attrs
>>> attrs
{u'data-bathroom-value': u'1', u'data-bedroom-value': u'0', u'class': [u'floorplan-bed-bath']}
>>> attrs.get('data-bedroom-value')
u'0'
Answered 2017-05-07 03:18:03
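from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

# Placeholder URL; substitute the page you want to scrape.
page = urlopen("http://example.com/path/to/page")
soup = BeautifulSoup(page.read(), "html.parser")

for td in soup.find_all("td"):
    # Not every cell carries the data-* attributes, so check before indexing.
    if "data-bathroom-value" in td.attrs:
        print("Bathrooms:", td["data-bathroom-value"])
    if "data-bedroom-value" in td.attrs:
        print("Bedrooms:", td["data-bedroom-value"])

https://stackoverflow.com/questions/43827392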
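
As a variation on the answers above, here is a minimal sketch that picks out the relevant cells with a CSS attribute selector and converts the values to integers. It assumes Python 3 and that both attributes always hold whole numbers (a listing with "1.5" bathrooms would need float instead). The sample markup and the function name extract_bed_bath are made up for illustration; only the first <td> comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first <td> is taken from the question.
HTML = """
<table>
  <tr><td class="floorplan-bed-bath" data-bathroom-value="1" data-bedroom-value="0">Studio / 1 bath</td></tr>
  <tr><td class="floorplan-bed-bath" data-bathroom-value="2" data-bedroom-value="3">3 bed / 2 bath</td></tr>
  <tr><td class="floorplan-price">$1,200</td></tr>
</table>
"""


def extract_bed_bath(html):
    """Return a list of (bedrooms, bathrooms) pairs, one per matching cell."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector matches only <td> cells that carry both data-* attributes,
    # so no separate "in td.attrs" check is needed.
    cells = soup.select("td[data-bedroom-value][data-bathroom-value]")
    # Assumption: both values are whole numbers stored as strings.
    return [
        (int(td["data-bedroom-value"]), int(td["data-bathroom-value"]))
        for td in cells
    ]


if __name__ == "__main__":
    for bedrooms, bathrooms in extract_bed_bath(HTML):
        print("Bedrooms:", bedrooms, "Bathrooms:", bathrooms)
```

Selecting on the attributes themselves, not on the class name, keeps the sketch working even if the page styles the cell differently, at the cost of also matching any other <td> that happens to carry both attributes.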