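I keep getting this error across multiple scripts. I do a lot of scraping, and I have a loop that iterates over hundreds of pages; at some point the script stops because of this error. Here is an example script.

Example 2:
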
def scrape(urls):
    for url in urls:
        session = HTMLSession()
        resp = session.get(url)
        resp.html.render()
        try:
            phone = resp.html.find('span.phone')[0].text
        except IndexError:
            phone = None
        biz_name = resp.html.find('h1')[0].text
        try:
            biz_desc = resp.html.find('p.biz-description-text')[0].text
        except IndexError:
            biz_desc = None
        biz_location = resp.html.find('span.title-address-text')[0].text
        city = biz_location.split(',')[-1]
        print(
            f'phone is: {phone}\nthe business name is: {biz_name}\nthe description is: {biz_desc}\nthe city is: {city}')
        import_data(biz_name, phone, biz_desc, city)

def import_data(name, phone, desc, city):
    global keyword
    wp_title_box = driver.find_element_by_xpath('//*[@id="title"]')
    wp_title_box.send_keys(name)
    time.sleep(1)
    wp_desc_box = driver.find_element_by_xpath('//*[@id="content_ifr"]')
    wp_desc_box.send_keys(desc)
    time.sleep(1)
    new_field_button = driver.find_element_by_xpath('//*[@id="newmeta-submit"]')
    select_box = Select(driver.find_element_by_xpath('//*[@id="metakeyselect"]'))
    select_box.select_by_value("ad_city")
    wp_city_fill = driver.find_element_by_xpath('//*[@id="metavalue"]')
    wp_city_fill.send_keys(city)
    new_field_button.click()
    time.sleep(2)
    select_box.select_by_value("ad_phone")
    wp_city_fill = driver.find_element_by_xpath('//*[@id="metavalue"]')
    wp_city_fill.send_keys(phone)
    new_field_button.click()
    time.sleep(2)
    select_box.select_by_value("ad_promote")
    wp_city_fill = driver.find_element_by_xpath('//*[@id="metavalue"]')
    wp_city_fill.send_keys('1')
    new_field_button.click()
    time.sleep(2)
    save_btn = driver.find_element_by_xpath('//*[@id="save-post"]')
    driver.execute_script("window.scrollTo(0,0);")
    time.sleep(1)
    save_btn.click()
    time.sleep(2)
    driver.find_element_by_xpath('//*[@id="menu-posts"]/ul/li[3]/a').click()
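    time.sleep(2)

I added example 2 because example 1 was solved by the loop provided below. In the second example the script should finish, since I am using a for loop: once it has gone through all the URLs and imported them, it should end. Am I missing something?
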
Posted on 2020-01-28 03:59:07

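Your program never terminates. If you are going to use recursion, you need a terminating condition, i.e. a base case.

One suggestion is to use a counter to track the recursion depth, incrementing it at each step until it reaches a specified depth.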
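
The depth-counter suggestion could look roughly like the following minimal sketch. The names `try_codes`, `check`, and `max_depth` are hypothetical and do not come from the original scripts; `check` stands in for whatever test the scraper runs on each code, so no network access is needed to try it.

```python
import random


def try_codes(check, depth=0, max_depth=50):
    """Recursively try random codes, giving up after max_depth attempts.

    `check` is a hypothetical callable that returns True for a winning code.
    """
    # Base case: the counter has reached the limit, so stop recursing.
    if depth >= max_depth:
        return None
    code = random.randint(0, 99999999)
    if check(code):
        # Second base case: a winning code was found.
        return code
    # Recursive case: pass the incremented counter down to the next call.
    return try_codes(check, depth + 1, max_depth)


if __name__ == "__main__":
    # Stand-in check for illustration only: "wins" when the code is divisible by 7.
    result = try_codes(lambda code: code % 7 == 0, max_depth=20)
    print(result)
```

Keeping `max_depth` well below Python's recursion limit (1000 by default) means the function returns `None` instead of raising `RecursionError` when nothing is found.
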
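
That said, I don't think you need recursion at all for what you are doing; recursion is expensive because of the overhead of function calls. A simple loop will do:
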
import random
import urllib3
from requests_html import HTMLSession

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def scrape(rand_num):
    session = HTMLSession()
    resp = session.get("https://www.example.com/prize/?d=" + '92' + str(rand_num))
    resp.html.render()
    print(f'trying coupon code 92{rand_num}')
    prize = resp.html.find(containing="You've won a prize")
    print(prize)
    if prize:
        print("https://www.example.com/prize/?d=" + '92' + str(rand_num))


def number():
    for i in range(99999999):
        x = random.randint(0, 99999999)
        scrape(x)


number()

https://stackoverflow.com/questions/59937892