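I need to update the query part of a URL (page_index=). I have tried several approaches, such as the one shown below, but I'm stuck. I'm new to Python and looking for guidance. The page index ranges from 0 to 511 (new pages are added daily), and I need to update the URL in a loop that goes through all of the indexes. The index always starts at 0.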
import urlparse
url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
parts = urlparse.urlparse(url)
parts = parts._replace(query = page_index [2])
parts.geturl()
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-29-066332f37bb3> in <module>()
3 url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
4 parts = urlparse.urlparse(url)
----> 5 parts = parts._replace(query = page_index [2])
6 parts.geturl()
7
TypeError: 'function' object has no attribute '__getitem__'
Posted on 2017-07-10 04:08:13
You have to take the query component out of the urlparse() result, modify it, and then rebuild a new URL, like this:
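pr = urlparse.urlparse(url)
parts = pr.query.split('&')
parts[2] = 'page_index=2'  # page_index is the third parameter in this query string
new_url = urlparse.urlunparse([pr.scheme, pr.netloc, pr.path, pr.params, "&".join(parts), pr.fragment])
To iterate over all page numbers, repeat the last two lines in a loop over the desired range of page numbers.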
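A minimal sketch of that loop follows. It assumes Python 3, where the `urlparse` module became `urllib.parse`, and it swaps in a slightly different technique: it replaces `page_index` by name with `parse_qsl` and `urlencode` instead of relying on its position in the query string. The helper name `with_page_index` is hypothetical and not part of the original answer.

```python
from urllib.parse import urlparse, parse_qsl, urlencode

URL = ("https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
       "?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US")


def with_page_index(url, page_index):
    """Return url with its existing page_index query parameter replaced."""
    parts = urlparse(url)
    # parse_qsl returns the query as an ordered list of (name, value) pairs
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs = [(name, str(page_index) if name == "page_index" else value)
             for name, value in pairs]
    # Rebuild the URL with only the query component changed
    return parts._replace(query=urlencode(pairs)).geturl()


# range(512) covers page indexes 0 through 511
urls = [with_page_index(URL, i) for i in range(512)]
print(urls[2])
```

Note that this helper only rewrites a `page_index` parameter that is already present; it does not add one to a URL that lacks it.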
Posted on 2017-07-10 03:57:22
The simplest way is to modify the URL directly:
base_url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index={}&countries=US"
for pi in range(512):
    this_url = base_url.format(pi)
    # now get it
A slightly more complex but more customizable way is to pass the parameters as a dictionary:
import requests
url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
params = {
    "start_date": "2016-1-01",
    "end_date": "2017-8-26",
    "countries": "US"
}
for pi in range(512):
    # requests builds the query string from the dictionary
    params["page_index"] = pi
    res = requests.get(url, params=params)
    if res.ok:
        html = res.text
https://stackoverflow.com/questions/45000379