Q: Why does requests raise ConnectionError when I GET gz data?

Stack Overflow user

Asked 2017-12-28 20:52:46

1 answer, 684 views, 0 followers, 1 vote

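I am trying to request batch log-level data from AppNexus. According to the official data service guide, there are four main steps:

1. Authenticate the account with JSON -> returns a token

2. Get the list of available data feeds and find the download parameters in the JSON -> returns the parameters

3. Request the file's download location code by passing the download parameters -> extract the location code from the response header

4. Download the log data file by passing the location code -> returns a gz data file

These steps work perfectly in a terminal using curl: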

curl -b cookies -c cookies -X POST -d @auth 'https://api.appnexus.com/auth'
curl -b cookies -c cookies 'https://api.appnexus.com/siphon?siphon_name=standard_feed'
curl --verbose -b cookies -c cookies 'https://api.appnexus.com/siphon-download?siphon_name=standard_feed&hour=2017_12_28_09&timestamp=20171228111358&member_id=311&split_part=0'
curl -b cookies -c cookies 'http://data-api-gslb.adnxs.net/siphon-download/[location code]' > ./data_download/log_level_feed.gz

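In Python, I tried the same approach, but it keeps giving me "ConnectionError". Steps 1-2 still work fine, so I successfully obtained the parameters from the JSON response to build the URL for step 3, where I need to request the location code and extract it from the response headers.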

Step1:

# Step 1
############ Authentication ###########################
import json
import re

import requests

# Select End-Point
auth_endpoint = 'https://api.appnexus.com/auth'

# API Key
auth_app = json.dumps({'auth':{'username':'xxxxxxx','password':'xxxxxxx'}})

# Proxy
proxy = {'https':'https://proxy.xxxxxx.net:xxxxx'}
r = requests.post(auth_endpoint, proxies=proxy, data=auth_app)
data = json.loads(r.text)
token = data['response']['token']

Step2:

# Step 2
########### Check report list ###################################
check_list_endpoint = 'https://api.appnexus.com/siphon?siphon_name=standard_feed'
report_list = requests.get(check_list_endpoint, proxies=proxy, headers={"Authorization":token})
data = json.loads(report_list.text)
print(str(len(data['response']['siphons'])) + ' previous hours available for download')

# Build url for single report - extract para
download_endpoint = 'https://api.appnexus.com/siphon-download'
siphon_name = 'siphon_name=standard_feed' 
hour = 'hour=' + data['response']['siphons'][400]['hour']
timestamp = 'timestamp=' + data['response']['siphons'][400]['timestamp'] 
member_id = 'member_id=311' 
split_part = 'split_part=' + data['response']['siphons'][400]['splits'][0]['part']

# Build url
download_endpoint_url = download_endpoint + '?' + \
siphon_name + '&' + \
hour + '&' + \
timestamp + '&' + \
member_id + '&' + \
split_part
# Check
print(download_endpoint_url)
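
As an aside on the URL construction in Step 2, here is a minimal sketch of building the same query string with the standard library's urllib.parse.urlencode instead of manual concatenation. The parameter values are copied from the curl command earlier in this question rather than read from the API response, and the helper name build_download_url is made up for illustration.

```python
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = 'https://api.appnexus.com/siphon-download'


def build_download_url(hour, timestamp, split_part, member_id='311',
                       siphon_name='standard_feed'):
    """Assemble the siphon-download URL (hypothetical helper, not part of any API client)."""
    params = {
        'siphon_name': siphon_name,
        'hour': hour,
        'timestamp': timestamp,
        'member_id': member_id,
        'split_part': split_part,
    }
    # urlencode escapes each value and joins the key=value pairs with '&'
    return DOWNLOAD_ENDPOINT + '?' + urlencode(params)


# Sample values taken from the curl command above
url = build_download_url('2017_12_28_09', '20171228111358', '0')
print(url)
```

With requests, the same dictionary can instead be passed as the params= argument of get(), which performs the encoding itself.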

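However, the "requests.get" call in Step 3 below never runs to completion and keeps raising "ConnectionError". In addition, I noticed that the "location code" actually appears in the error message, right after "/siphon-download/". So I used "try..except" to extract it from the error message and keep the code running.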

Step3:

# Step 3
######### Extract location code for target report ####################
try:
    TT = requests.get(download_endpoint_url, proxies=proxy, headers={"Authorization":token}, timeout=1)
except requests.exceptions.ConnectionError as e:
    # The error text contains the redirect target, including the location code
    text = e.args[0].args[0]
    m = re.search('/siphon-download/(.+?) ', text)
    if m:
        location = m.group(1)
print('Successfully Extracting location: ' + location)

Original error message in Step 3 without "try..except":

ConnectionError: HTTPConnectionPool(host='data-api-gslb.adnxs.net', port=80): Max retries exceeded with url: 
/siphon-download/dbvjhadfaslkdfa346583 
(Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000007CBC7B8>: 
Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not 
properly respond after a period of time, or established connection failed because connected host has failed to respond',))

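Then I tried to make the final GET request with the location code extracted from the previous error message, to download the gz data file just as I did with "curl" in the terminal. However, I got the same error: ConnectionError.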

Step4:

# Step 4
######## Download data file #######################
extraction_location = 'http://data-api-gslb.adnxs.net/siphon-download/' + location
LLD = requests.get(extraction_location, proxies=proxy, headers={"Authorization":token}, timeout=1)

Original error message in Step 4:

ConnectionError: HTTPConnectionPool(host='data-api-gslb.adnxs.net', port=80): Max retries exceeded with url: 
/siphon-download/dbvjhadfaslkdfa346583 
(Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000007BE15C0>: 
Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not 
properly respond after a period of time, or established connection failed because connected host has failed to respond',))

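To double-check, I tested all the endpoints, parameters, and location codes generated by the Python script using curl in the terminal. They all work, and the downloaded data is correct. Can anyone help me solve this in Python, or point me in the right direction to find out why this happens? Thanks a lot!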

Tags: python, api, curl, python-requests, urllib2

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-01-05 23:58:38

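1) In curl, you are reading and writing cookies (-b cookies, -c cookies). With requests, you are not using a session object (http://docs.python-requests.org/en/master/user/advanced/#session-objects), so the cookie data is lost.

2) You defined an https proxy and then tried to connect over http, with no proxy (to data-api-gslb.adnxs.net). Define both http and https, but only once, on the session object. See http://docs.python-requests.org/en/master/user/advanced/#proxies. (This is probably the root cause of the error message you are seeing.)

3) Requests handles redirects automatically, so there is no need to extract the Location header and use it in the next request; the request will be redirected for you. So once the other errors are fixed, there are 3 steps instead of 4. (This also answers the question Hetzroni raised in the comments above.)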
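
To illustrate point 2, here is a deliberately simplified model of how a scheme-keyed proxies dictionary is consulted. It is not the actual implementation in requests (which also honors more specific keys and environment settings); pick_proxy, proxy.example.net:8080, and LOCATION_CODE are placeholders invented for this sketch.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Return the proxy registered for the URL's scheme, or None if there is none."""
    scheme = urlsplit(url).scheme
    return proxies.get(scheme)


# Hypothetical proxy address, standing in for the redacted one in the question
https_only = {'https': 'https://proxy.example.net:8080'}
both = {
    'http': 'http://proxy.example.net:8080',
    'https': 'https://proxy.example.net:8080',
}

api_url = 'https://api.appnexus.com/siphon-download'
data_url = 'http://data-api-gslb.adnxs.net/siphon-download/LOCATION_CODE'

print(pick_proxy(api_url, https_only))   # the https proxy is selected
print(pick_proxy(data_url, https_only))  # None: no entry for plain http
print(pick_proxy(data_url, both))        # the http proxy is selected
```

Under this model, the redirect to the plain-http data host finds no proxy entry when only 'https' is configured, which would be consistent with the direct connection attempt timing out behind a firewall.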

So use

s = requests.Session()
s.proxies = {
    'http': 'http://proxy.xxxxxx.net:xxxxx',
    'https': 'https://proxy.xxxxxx.net:xxxxx',
}  # set this only once using valid proxy urls.

Then use

s.get()

and

s.post()

instead of

requests.get()

and

requests.post()
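
Putting the advice together, the download step might look like the sketch below. It assumes a session configured as shown earlier in this answer; download_feed and its defaults are hypothetical names chosen for illustration. The function takes the session as a parameter and relies only on get(), raise_for_status(), and iter_content(), so redirects are followed by the session itself and the gz body is streamed to a file object in chunks.

```python
def download_feed(session, url, out_file, timeout=30, chunk_size=64 * 1024):
    """Fetch url through the session and stream the body into out_file.

    Hypothetical helper: `session` is expected to behave like requests.Session.
    Returns the number of bytes written.
    """
    # stream=True avoids loading the whole gz file into memory at once
    response = session.get(url, stream=True, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of saving an error page
    written = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            out_file.write(chunk)
            written += len(chunk)
    return written


# Usage with the real library (needs network access and valid credentials):
#   import requests
#   s = requests.Session()
#   s.proxies = {...}  # both 'http' and 'https' entries, as above
#   with open('log_level_feed.gz', 'wb') as f:
#       download_feed(s, download_endpoint_url, f)
```

The timeout here is more generous than the timeout=1 used in the question, since one second may be too short for a proxied file download.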

1 vote

Original content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/48014240
