I'm trying to build a web-scraper project, and what I want to do is implement a smart retry mechanism using urllib3, requests, and Beautiful Soup.
When I set timeout=1 so the request fails and I can check whether the retry kicks in, the code below raises an exception:
import requests
import re
from bs4 import BeautifulSoup
import json
import time
import sys
import logging  # needed for the logging.basicConfig call below
from requests.adapters import HTTPAdapter
from urllib3.util import Retry
# get_items takes a dict mapping keys to URL paths and scrapes the items on each page
def get_items(self, link_dict):  # renamed from `dict` to avoid shadowing the builtin
    itemdict = {}
    for k, v in link_dict.items():
        succeeded = False
        # fetch the content from the url with the requests library, retrying on failure
        while not succeeded:
            try:
                session = requests.Session()
                retries = Retry(total=3, backoff_factor=0.1,
                                status_forcelist=[301, 500, 502, 503, 504])
                session.mount('https://', HTTPAdapter(max_retries=retries))
                page_response = session.get('https://www.XXXXXXX.il' + v, timeout=1)
            except requests.exceptions.Timeout:
                print("Timeout occurred")
                logging.basicConfig(level=logging.DEBUG)
            else:
                succeeded = True
        # parse the url content with the html parser and store it in a variable
        page_content = BeautifulSoup(page_response.content, "html.parser")
        for i in page_content.find_all('div', attrs={'class': 'prodPrice'}):
            parent = i.parent.parent.contents[0]
            getparentfunc = parent.find("a", attrs={"href": "javascript:void(0)"})
            itemid = re.search(r".*'(\d+)'.*", getparentfunc.attrs['onclick']).groups()[0]
            itemName = re.sub(r'\W+', ' ', i.parent.contents[0].text)
            priceitem = re.sub(r'[\D.]+ ', ' ', i.text)
            itemdict[itemid] = [itemName, priceitem]
Thanks in advance for any advice on efficiency, the retry mechanism, workarounds, or any other simpler approach.
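For reference, the Retry/HTTPAdapter setup used above can be pulled out into a small helper so the session is built once instead of on every loop iteration. This is only a sketch: `make_session` is a hypothetical name, and the status list mirrors the one in the question.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

def make_session(total=3, backoff=0.1):
    # Build a session whose transport adapter retries requests that
    # fail with one of the listed status codes, backing off between tries.
    retries = Retry(
        total=total,
        backoff_factor=backoff,
        status_forcelist=[301, 500, 502, 503, 504],
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retries)
    # Mount the adapter for both schemes so plain-http URLs
    # (or redirects to them) are covered as well.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Reusing one session also keeps the underlying connection pool alive between requests, which is usually faster when scraping many pages from the same host.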
Posted on 2019-02-26 00:26:07
I usually do something like this:
def get(url, retries=3):
    try:
        r = requests.get(url)
        return r
    except requests.exceptions.RequestException as err:  # catches timeouts too
        print(err)
        if retries < 1:
            raise ValueError('No more retries!')
        return get(url, retries - 1)  # was `href` in the original, which is undefined
https://stackoverflow.com/questions/54871840
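The recursive pattern above can also be written as a plain loop, which avoids growing the call stack and makes it easy to add a backoff sleep between attempts. A minimal sketch, assuming a generic callable rather than requests specifically (`retry_call` is a hypothetical helper name):

```python
import time

def retry_call(fn, retries=3, backoff=0.1, exceptions=(Exception,)):
    # Call fn(); on one of the listed exceptions, sleep with a
    # linearly growing backoff and try again, up to `retries` extra times.
    for attempt in range(retries + 1):
        try:
            return fn()
        except exceptions as err:
            if attempt == retries:
                raise  # out of retries: re-raise the last error
            time.sleep(backoff * (attempt + 1))
```

You would call it as `retry_call(lambda: requests.get(url, timeout=1), exceptions=(requests.exceptions.RequestException,))`, keeping the retry policy in one place instead of inside each scraping function.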