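(Environment: Python 2.7 + BeautifulSoup 4.3.2)
Goal: extract the text "Jan 23, 2009 12:05 pm" from the markup below.
The pages on my company's website require a login and go through redirects, so I copied the source of the target page into a file and saved it as "example.html" in C:\ to practice on.
Here is part of the original source: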
<tr class="ghj">
<td>
<span class="city-sh">
<sh src="./citys/1.jpg" alt="boy" title="boy" />
</span>
<a href="./membercity.php?mode=view&u=12563">port_new_cape</a>
</td>
<td class="position">
<a href="./search.php?id=12563&sr=positions"
title="Search positions">452</a>
</td>
<td class="details">
<div>South</div>
</td>
<td>May 09, 1997</td>
<td>Jan 23, 2009 12:05 pm </td>
</tr>
The code I have come up with so far is:
url = r"C:\example.html"
page = open(url)
soup = BeautifulSoup(page.read())
cities = soup.find_all('td', {'class' : 'details'})
sis = cities.find_next_siblings('td')
for s in sis:
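    print s
I don't know how to pick out the target sibling directly, so I loop over all the siblings instead. However, when I run it, I get the error message below; it seems the siblings are not recognized.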
Traceback (most recent call last):
  File "C:/Python27/Last Activity mydyingbride.py", line 17, in <module>
    sis = cities.find_next_siblings('td')
AttributeError: 'ResultSet' object has no attribute 'find_next_siblings'
Is there a way I can practice this using a local file?
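Posted on 2014-02-05 09:20:17
I suggest using the Python debugger to inspect the current values of your variables. find_all returns a ResultSet (a list of tags), so find_next_siblings has to be called on each element rather than on the ResultSet itself. In any case, here is a fix: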
soup = BeautifulSoup(page.read())
cities = soup.find_all('td', {'class' : 'details'})
counter = 0
# cities is a list-like ResultSet, so visit each matching cell in turn.
while len(cities) > counter:
    sis = cities[counter].find_next_siblings('td')
    for s in sis:
        print s
    counter += 1
The output is:
<td>May 09, 1997</td>
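<td>Jan 23, 2009 12:05 pm </td>
(The last cell ends with a non-breaking space, which a Windows console may render as stray characters.)
To answer your next question, see the following example: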
from bs4 import BeautifulSoup
html_doc = '''
<tr class="ghj">
<td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&u=12563">port_new_cape</a></td>
<td class="position"><a href="./search.php?id=12563&sr=positions" title="Search positions">452</a></td>
<td class="details"><div>South</div></td>
<td>May 09, 1997</td>
<td>Jan 23, 2009 12:05 pm </td>
</tr>
<tr class="ghj">
<td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&u=12563">port_new_cape</a></td>
<td class="position"><a href="./search.php?id=12563&sr=positions" title="Search positions">452</a></td>
<td class="details"><div>South</div></td>
<td>May 09, 1997</td>
<td>Jan 24, 2009 12:05 pm </td>
</tr>
<tr class="ghj">
<td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&u=12563">port_new_cape</a></td>
<td class="position"><a href="./search.php?id=12563&sr=positions" title="Search positions">452</a></td>
<td class="details"><div>South</div></td>
<td>May 09, 1997</td>
<td>Jan 25, 2009 12:05 pm </td>
</tr>
'''
soup = BeautifulSoup(html_doc)
cities = soup.find_all('td', {'class' : 'details'})
counter = 0
while len(cities) > counter:
    datesColumn = cities[counter].find_next_siblings('td')
    # Assuming you are interested in the second date column
    if len(datesColumn) == 2:
        print datesColumn[1].string
    counter += 1
The output is:
Jan 23, 2009 12:05 pm
Jan 24, 2009 12:05 pm
Jan 25, 2009 12:05 pm 
https://stackoverflow.com/questions/21572812
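The counter-based loop in the answer can also be written as a plain for loop over the ResultSet, taking the last sibling directly. The following is only a sketch under a few assumptions: it uses Python 3 print syntax rather than the Python 2.7 of the question, it names the "html.parser" backend explicitly, and the sample row is a trimmed copy of the markup above wrapped in a table element for illustration. The names last_cells and details_cell are made up for this example.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of one row from the question (illustrative sample only).
html_doc = '''
<table>
<tr class="ghj">
<td class="position"><a href="./search.php?id=12563">452</a></td>
<td class="details"><div>South</div></td>
<td>May 09, 1997</td>
<td>Jan 23, 2009 12:05 pm </td>
</tr>
</table>
'''

soup = BeautifulSoup(html_doc, "html.parser")

last_cells = []
# find_all returns a list-like ResultSet; iterate over its Tag elements.
for details_cell in soup.find_all("td", class_="details"):
    # find_next_siblings is a Tag method, so it is called per element.
    siblings = details_cell.find_next_siblings("td")
    if siblings:
        # The last following <td> in the row holds the date of interest.
        last_cells.append(siblings[-1].get_text().strip())

for text in last_cells:
    print(text)
```

Indexing with [-1] avoids hard-coding how many cells follow the "details" cell, at the cost of assuming the wanted date is always the last cell in the row.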
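Once the cell text is extracted, it may still carry trailing whitespace. Assuming that whitespace is a non-breaking space (U+00A0) left over from the original page, a small helper like the hypothetical parse_cell_text below could clean it up and, if a real date object is wanted, parse it with the standard library. This is a sketch in Python 3, not part of the original answer; the format string is inferred from the sample value "Jan 23, 2009 12:05 pm", and %b and %p depend on the locale (an English or C locale is assumed).

```python
from datetime import datetime


def parse_cell_text(raw_text):
    """Clean a scraped date cell and parse it (hypothetical helper)."""
    # Replace non-breaking spaces with ordinary spaces, then trim the ends.
    cleaned = raw_text.replace("\xa0", " ").strip()
    # Format inferred from the sample: abbreviated month, day, year, 12-hour time.
    return datetime.strptime(cleaned, "%b %d, %Y %I:%M %p")


parsed = parse_cell_text("Jan 23, 2009 12:05 pm\xa0")
print(parsed.isoformat())  # 2009-01-23T12:05:00
```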