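I've written a script that parses the names and prices of different items on craigslist. Normally, whenever the script comes across an item whose name or price is missing (None), it throws an error. I've fixed that, and it now fetches the results successfully. I hope I've done it flawlessly.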
```python
import requests
from lxml import html

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = html.fromstring(page)
rows = tree.xpath('//li[@class="result-row"]')
for row in rows:
    # Take the first match if there is one; otherwise fall back to "".
    link = row.xpath('.//a[contains(@class,"hdrlnk")]/text()')[0] if len(row.xpath('.//a[contains(@class,"hdrlnk")]/text()')) > 0 else ""
    price = row.xpath('.//span[@class="result-price"]/text()')[0] if len(row.xpath('.//span[@class="result-price"]/text()')) > 0 else ""
    print(link, price)
```

Posted on 2017-05-26 10:28:27
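The question's script evaluates every XPath expression twice per row, once for the length check and once for the indexing. Below is a minimal sketch of the same guard that evaluates each expression only once. It runs offline against a made-up HTML snippet (`SAMPLE`, not real craigslist markup) that only mimics the `result-row` / `hdrlnk` / `result-price` structure the script assumes. The function name `parse_rows` is hypothetical.

```python
from lxml import html

# Made-up markup that only mimics the structure the script expects;
# the second row deliberately has no price.
SAMPLE = """<ul>
<li class="result-row"><a class="result-title hdrlnk" href="#">2BHK flat</a><span class="result-price">15000</span></li>
<li class="result-row"><a class="result-title hdrlnk" href="#">Studio</a></li>
</ul>"""


def parse_rows(page):
    """Return (link, price) pairs, using "" for anything that is missing."""
    tree = html.fromstring(page)
    results = []
    for row in tree.xpath('//li[@class="result-row"]'):
        # Evaluate each XPath once and reuse the resulting list.
        titles = row.xpath('.//a[contains(@class,"hdrlnk")]/text()')
        prices = row.xpath('.//span[@class="result-price"]/text()')
        # An empty list is falsy, so this replaces the len(...) > 0 check.
        link = titles[0] if titles else ""
        price = prices[0] if prices else ""
        results.append((link, price))
    return results


for link, price in parse_rows(SAMPLE):
    print(link, price)
```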
It's often easier to ask forgiveness than permission. You can wrap the statements in `try..except` blocks:
```python
import requests
from lxml import html

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = html.fromstring(page)
for row in tree.xpath('//li[@class="result-row"]'):
    try:
        link = row.xpath('.//a[contains(@class,"hdrlnk")]/text()')[0]
    except IndexError:
        # No title link in this row.
        link = ""
    try:
        price = row.xpath('.//span[@class="result-price"]/text()')[0]
    except IndexError:
        # No price in this row.
        price = ""
    print(link, price)
```

If you have many such lookups, you can move this into a function:
```python
def get_if_exists(row, path, index=0, default=""):
    """
    Gets the object at `index` from the xpath `path` from `row`.
    Returns the `default` if it does not exist.
    """
    try:
        return row.xpath(path)[index]
    except IndexError:
        return default
```

which you can then use like this:
```python
for row in tree.xpath('//li[@class="result-row"]'):
    # Using the defined default values for index and default:
    link = get_if_exists(row, './/a[contains(@class,"hdrlnk")]/text()')
    # Manually setting them instead:
    price = get_if_exists(row, './/span[@class="result-price"]/text()', 0, "")
    print(link, price)
```

Posted on 2017-07-10 20:11:33
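For a quick offline check of the `get_if_exists` helper from the answer above, the sketch below repeats the helper so it runs on its own and applies it to a made-up HTML snippet in which one row has no price. The `scrape_rows` name, the `SAMPLE` markup and the `"n/a"` placeholder are hypothetical choices for illustration.

```python
from lxml import html


def get_if_exists(row, path, index=0, default=""):
    """Return row.xpath(path)[index], or `default` if there is no such item."""
    try:
        return row.xpath(path)[index]
    except IndexError:
        return default


# Made-up markup; the second row has no price.
SAMPLE = """<ul>
<li class="result-row"><a class="result-title hdrlnk" href="#">2BHK flat</a><span class="result-price">15000</span></li>
<li class="result-row"><a class="result-title hdrlnk" href="#">Studio</a></li>
</ul>"""


def scrape_rows(page, missing="n/a"):
    """Collect (link, price) pairs, substituting `missing` for absent prices."""
    tree = html.fromstring(page)
    return [
        (
            get_if_exists(row, './/a[contains(@class,"hdrlnk")]/text()'),
            get_if_exists(row, './/span[@class="result-price"]/text()', default=missing),
        )
        for row in tree.xpath('//li[@class="result-row"]')
    ]


for link, price in scrape_rows(SAMPLE):
    print(link, price)
```

Because the helper catches only `IndexError`, a malformed XPath expression still raises `lxml.etree.XPathEvalError` instead of being silently turned into the default.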
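I recently learned about the `findtext` method, which makes it easy to get the text content of an element matched by a path expression without going through the convoluted index-and-check routine. Its most appealing feature is that when the expected element does not exist, it returns `None` (the default) instead of raising an error. It also keeps the code short and clean. Anyone who stumbles upon the problem above might want to give it a try. One caveat: `findtext` takes an ElementPath expression, a limited subset of XPath, so functions such as `contains()` cannot be used with it; in the code below the link is matched on its `data-id` attribute instead.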
```python
import requests
from lxml import html

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = html.fromstring(page)
for row in tree.xpath('//li[@class="result-row"]'):
    # findtext returns None when nothing matches the ElementPath expression.
    link = row.findtext('.//a[@data-id]')
    price = row.findtext('.//span[@class="result-price"]')
    print(link, price)
```

https://codereview.stackexchange.com/questions/164236
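Here is a small offline check of the `findtext` behaviour described in the last answer. lxml elements follow the ElementTree `findtext(path, default=None)` API, so the sketch below uses the standard library's `xml.etree.ElementTree` on a made-up, well-formed snippet. `SAMPLE` and `extract` are hypothetical. It covers three cases: the text is found; the element is present but empty (returns `""`, not `None` and not `default`); and the element is missing (returns `None`, or the `default` you pass).

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup: a normal row, a row whose price span is
# empty, and a row with no price span at all.
SAMPLE = """<ul>
<li class="result-row"><a data-id="1">2BHK flat</a><span class="result-price">15000</span></li>
<li class="result-row"><a data-id="2">Studio</a><span class="result-price"></span></li>
<li class="result-row"><a data-id="3">Plot</a></li>
</ul>"""


def extract(page):
    """Return (link, price, price_or_placeholder) for each row."""
    root = ET.fromstring(page)
    results = []
    # ElementTree rejects absolute paths on an element, hence ".//".
    for row in root.findall('.//li[@class="result-row"]'):
        link = row.findtext('.//a[@data-id]')
        # None if no matching span exists; "" if the span has no text.
        price = row.findtext('.//span[@class="result-price"]')
        # `default` is used only when no element matches at all.
        labelled = row.findtext('.//span[@class="result-price"]', default="n/a")
        results.append((link, price, labelled))
    return results


for item in extract(SAMPLE):
    print(item)
```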