Q: Speeding up API calls made with requests through pandas apply()

Stack Overflow user

Asked 2018-01-18 16:24:23

Answers: 1, Views: 1K, Followers: 0, Votes: 3

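I have a list of IP addresses in a df. Each IP address is sent in a GET request, using requests, to the ARIN database, and I am interested in getting the organization or customer for that IP address. I am using a requests Session() inside a requests-futures FuturesSession() in the hope of speeding up the API calls. Here is the code: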

import requests
from requests_futures.sessions import FuturesSession

s = requests.Session()
session = FuturesSession(session=s, max_workers=10)

def getIPAddressOrganization(IP_Address):
    url = 'https://whois.arin.net/rest/ip/' + IP_Address + '.json'
    request = session.get(url)  # returns a Future
    response = request.result().json()  # blocks until this request finishes
    try:
        organization = response['net']['orgRef']['@name']
    except KeyError:
        # Fall back to the customer name when there is no organization entry
        organization = response['net']['customerRef']['@name']
    return organization

df['organization'] = df['IP'].apply(getIPAddressOrganization)

Adding a regular requests Session() improved performance considerably, but the requests-futures FuturesSession() did not help (probably because of my lack of knowledge).

How should pandas apply() be used together with requests-futures, and/or is there another, more effective way to speed up the API calls?

pandas

python-requests

python

multithreading

python-3.x

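1 Answer

Stack Overflow user

Answered 2021-12-15 20:13:27

This does not answer the question directly, but it shows that pandas' apply() function does wait for the result of each API call and does not parallelize or otherwise optimize the IO time: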

import time
import pandas as pd


df = pd.DataFrame(data=range(10))
start = time.perf_counter()
df.apply(lambda r: time.sleep(5), axis=1)
end = time.perf_counter() - start

print(f'total time: {end}')

total time: 50.05315346799034
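
For contrast, here is a minimal sketch of the same kind of blocking workload run on a thread pool from the standard library's concurrent.futures. The difference from calling .result() inside apply() is that every task is submitted before any result is awaited, so the waits overlap. fake_lookup, lookup_all and the 192.0.2.x addresses are hypothetical stand-ins (a short sleep replaces the HTTP request); this is an illustration under those assumptions, not a tuned implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lookup(ip):
    """Hypothetical stand-in for the HTTP call: sleeps instead of querying ARIN."""
    time.sleep(0.2)
    return f"org-for-{ip}"


def lookup_all(ips, max_workers=10):
    """Submit every lookup first, then collect the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # All tasks are queued here before any .result() call blocks.
        futures = [executor.submit(fake_lookup, ip) for ip in ips]
        return [future.result() for future in futures]


ips = [f"192.0.2.{i}" for i in range(10)]
start = time.perf_counter()
organizations = lookup_all(ips)
elapsed = time.perf_counter() - start
print(f"total time: {elapsed:.2f}s for {len(organizations)} lookups")
```

With ten workers the ten 0.2-second waits overlap, so the total should be close to a single wait rather than their sum. Because the results are collected in submission order, the returned list lines up with the input list and could be assigned back to a DataFrame column built from the same IPs.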

Conclusion: it is probably better to consider an asynchronous IO approach.

Tentative direction:

import asyncio
from typing import List

import aiohttp


async def parallel_rest_calls(data: List[str]):
    # Share one ClientSession so connections are reused across requests.
    async with aiohttp.ClientSession() as session:
        tasks = []
        for ip in data:
            tasks.append(getIPAddressOrganization(session=session, IP_Address=ip))

        # With return_exceptions=True, a failed lookup comes back as an
        # exception object in the result list instead of aborting the batch.
        enriched_data_col = await asyncio.gather(*tasks, return_exceptions=True)
        return enriched_data_col


async def getIPAddressOrganization(session: aiohttp.ClientSession, IP_Address):
    url = 'https://whois.arin.net/rest/ip/' + IP_Address + '.json'
    async with session.get(url) as response:
        payload = await response.json()

        try:
            organization = payload['net']['orgRef']['@name']
        except KeyError:
            # Fall back to the customer name when there is no organization entry
            organization = payload['net']['customerRef']['@name']
        return (IP_Address, organization)
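
Two details of this direction may be worth sketching: limiting how many requests are in flight at once, and turning the gathered results (which can include exception objects when return_exceptions=True is used) back into a column. The example below is a self-contained illustration that uses only asyncio; fake_lookup, lookup_all, to_column, the limit of 5 and the 192.0.2.x addresses are all hypothetical, and a short asyncio.sleep stands in for the aiohttp request.

```python
import asyncio


async def fake_lookup(ip):
    """Hypothetical stand-in for the aiohttp request; fails for one address."""
    await asyncio.sleep(0.01)
    if ip == "192.0.2.2":
        raise KeyError("customerRef")
    return (ip, f"org-for-{ip}")


async def lookup_all(ips, limit=5):
    """Run all lookups concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(ip):
        async with semaphore:
            return await fake_lookup(ip)

    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(bounded(ip) for ip in ips), return_exceptions=True)


def to_column(results):
    """Keep the organization name; failed lookups become None."""
    return [None if isinstance(r, BaseException) else r[1] for r in results]


ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
results = asyncio.run(lookup_all(ips))
organizations = to_column(results)
print(organizations)
```

Since the result order matches the input order, a list built this way from df['IP'].tolist() could be assigned directly to df['organization']. The concurrency limit is an assumption about being polite to the remote service, not something the original answer specifies.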

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/48325908
