
aiohttp.ClientSession().get stops working after a large number of iterations

Stack Overflow user
Asked 2022-08-15 16:10:12
Answers: 1 · Views: 135 · Followers: 0 · Votes: 1

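I wrote a simple program that scrapes a game database in which every item has its own page. It iterates over a list of names and parses each item's page (246 pages in total). But the more names I include, the less likely it is to work.

For example, if I iterate over only 100 items, the code runs in 9 seconds, but if I run it again it finishes in 1.5 seconds. Sometimes it freezes, and my debugger doesn't even tell me why. If I use 150+ names, it never works; it looks like I am stuck in an endless loop. Using the magic of print(123), I found that it stops at responses = await asyncio.gather(*tasks), but I don't know what the problem is.

So what is the reason? Am I doing something wrong, or is the website ignoring my "DDoS attack"?

This is a very strange problem to me, because I have seen programs perform thousands of requests.

Much appreciated.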

import requests
import aiohttp
import asyncio
from bs4 import BeautifulSoup

url = 'https://www.thecycledb.com/items' # website url
data = ['Hammer', 'Shattergun', 'Advocate', 'S-576 PDW', 'Kinetic Arbiter', 'Bulldog', 'KOR-47'] # names of every item (short version)

list = []

def get_tasks(session):  # creating a task for every item
    tasks = []
    for i in data:
        tasks.append(asyncio.create_task(session.get(url[:-1]+'/'+i.lower().replace("'",'').replace(' ','-')))) #iterate every URL
    return tasks


async def parse():
    async with aiohttp.ClientSession() as session:
        tasks = get_tasks(session)               # Tasks 

        responses = await asyncio.gather(*tasks) # where the program stops


        for i in responses:                                    # Parsing process itself, it works okay              
            item_soup = BeautifulSoup(await i.text(),'lxml')  
            try:
                a = item_soup.find('h3',string='Shop Price')
                list.append(int(a.next_element.next_element.text.replace(',','')))
            except(AttributeError):
                list.append(0)
        print(list)


asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) # keeps a RuntimeError away 
asyncio.run(parse())

Output (as it should be):

[73000, 54000, 76000, 1200, 412000, 6400, 210000]

If you want to run this yourself, you will need the full data list:

data = ['Hammer', 'Shattergun', 'Advocate', 'S-576 PDW', 'Kinetic Arbiter', 'Bulldog', 'KOR-47', 'Rusty K-28', 'Rusty B9 Trenchgun', 'KARMA-1', "KM-9 'Scrapper'", 'C-32 Bolt Action', 'Lacerator', 'Voltaic Brute', 'PKR Maelstrom', 'B9 Trenchgun', 'Basilisk', 'K-28', 'Binocular', 'Rusty S-576 PDW', 'Rusty AR-55 Autorifle', 'Scarab', 'Phasic Lancer', 'Manticore', 'ICA Guarantee', 'Gorgon', 'Heavy Mining Tool', 'KOMRAD', 'KBR Longshot', 'Asp Flechette Gun', 'Zeus Beam', 'Flashlight', 'Mineral Scanner', 'AR-55 Autorifle', 'Audio Decoy', 'Combat Stim', 'Weak Stim', 'Grenade', 'Combat Medkit', 'Strong Stim', 'Strong Medkit', 'Smoke Grenade', 'Gas Grenade', 'Weak Medkit', 'Large Backpack', 'Small Backpack', 'Heavy Duty Backpack', 'Worn Emergency Bag', 'Medium Backpack', 'Epic Helmet', 'Common Tactical Helmet', 'Uncommon Tactical Helmet', 'Common Helmet', 'Rare Helmet', 'Rare Tactical Helmet', 'Exotic Helmet', 'Rare Restoration Helmet', 'Legendary NV Helmet', 'Uncommon Helmet', 'Uncommon Restoration Helmet', 
'Marauder Head', 'ICA Scrip', 'NiC Oil Cannister', '"Magic-GROW" Fertilizer', 'Heavy Strider Flesh', 'Savage Marauder Flesh', 'Resin Gun', 'Smart Mesh', 'Heavy Strider Head', 'Crusher Hide', 'Mature Rattler Head', 'Crusher Flesh', 'Pure Focus Crystal', '"Fusion Cartridge" Batteries ', 'Salvaged Insulation', 'Rattler Skin', 'Autoloader', 'Spinal Base', 'Mature Rattler Eyes', 'Portable Lab', 'Shard Slicer', 'Copper Wire', 'Hardened Metals', 'Interactive Screen', 'Meteor Core', 'Electronic Cables', 'Shock Absorber', 'Meteor Fragment', 'Titan Ore', 'Clear Veltecite', 'Crusher Head', 'Miniature Reactor', 'Hydraulic Piston', 'Blue Runner Egg', 'Circuit Board', 'Marauder Flesh', 'Strider Head', 'Korolev Scrip', 'Nutritional Bar', 'Dustbloom', 'Letium Clot', 'Radio Equipment', 'Polymetallic Prefabricate', 'Biological Sampler', 'Old Medicine', 'Print Resin', 'Zero Systems CPU', 'Master Unit CPU', 'Ball Bearings', 'Altered Nickel', 'Veltecite Heart', 'Toxic Glands', 'Brittle Titan Ore', 'Nickel', 'Pure Veltecite', 'Aluminum scrap', 'Alpha Crusher Skull', 'Sample Container', 'Medical Supplies', 'Charged Spinal Base', 'Rattler Head', 'Optic Glass', 'Metallic Alloys', 'Old Currency', 'Brightcap Mushroom', 'Jewellery', 'Osiris Scrip', 'Pale Ivy Blossom', 'Waterweed Filament', 'Focus Crystal', 'Textiles', 'Indigenous Fruit', 'Rattler Eyes', 'Alpha Crusher Heart', 'Flawed Veltecite', 'Hardened Bone Plates', 'Strider Flesh', 'Glowy Brightcap Mushroom', 'Co-TEC MultiTool', 'Azure Tree Bark', 'Compound Sheets', 'Gyroscope', 'Savage Marauder Head', 'Magnetic Field Stabilizer', 'Derelict Explosives', 'Cloudy Veltecite', 'Ultralight Stock', 'Tactical Stock', 'Light Converter', '4x Optic', 'MKM Ultralight Stock', 'Small Suppressor', 'Medium Creature Dmg', 'Titan Ore Scanning Module', 'Standard Stock', 'Ergonomic Grip', 'Shotgun Slugs', 'Shotgun Quickdraw', 'Tactical Foregrip', 'Quickdraw Stock', 'Heavy Quickdraw', '8x Optic', 'Medium Muzzle Brake', 'Crude Oil Scanning Module',
'Focus Crystal Scanning Module', 'MKM Tactical Stock', 'Holographic Sight', 'Quickdraw Foregrip', 'Light Extended Quickdraw', 'Shotgun Extended', '2x Optic', 'Veltecite Scanning Module', 'Small Muzzle Brake', 'Medium Quickdraw', 'Angled Foregrip', 'Marksman Stock', 'Tactical Rear Grip', 'Medium Suppressor', 'Shotgun Converter', 'Medium Converter', 'Light Quickdraw', 'Medium Extended Quickdraw', 'Light Creature Dmg', 'Heavy Converter', 'Red Dot Sight', '6x Optic', 'MKM Quickdraw Stock', 'Medium Extended', 'Heavy Extended', 'Light Extended', 'Tactical Light', '2 - 4x Variable Optic', 'Common Shield', 'Uncommon Shield', 'Rare Tactical Shield', 'Rare Shield', 'Common Tactical Shield', 'Epic Shield', 'Uncommon Tactical Shield', 'Uncommon Restoration Shield', 'Rare Restoration Shield', 'Exotic Shield', 'Janitors Key', 'Armory Key', 'Loose House Key', 'Lab Keycard', 'Bar Storage Key', 'Bright Sands Observation Room Key', 'Community Room', 'Overseers Office', 'Tall House Key', 'Mine Access Key', "Boss' Office", 'Garage Office', 'Server Access Key', 'Luggage Saferoom Key', 'Skeleton Key', 'Letium Bio Samples', 'Uncommon Data Drive', 'Notes on Meteor Experiment - 1', 'Letium Coated Helmet', 'Miner Cam #D027', 'Old Bones', 'Data Drive', 'Notes on Meteor Experiment - 2', 'Oil Pump Part', 'Warden Skull', 'Miner Cam #2F53', 'Orbital Cannon Beacon', 'Rare Data Drive', 'Unique Data Drive', 'Miner Cam #A45D', 'Valuable Data Drive', '"Dig Site" Data Drive', 'Flight Recorder', 'Laser Drill Control Unit', 'Oil Pump Beacon', 'Laser Drill Beacon', 'Alpha Crusher Bait', 'Old Notebook', 'Sign of Life from Stranded Prospector', 'Medium ammo', 'Special ammo', 'Light ammo', 'Shotgun ammo', 'Heavy ammo']
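
A side note on the full list: the code turns each name into a URL by lowercasing it, dropping apostrophes, and replacing spaces with hyphens. The sketch below applies that same rule to a few names from the list, to show what it yields for names containing `#` or double quotes. The helper names `item_slug` and `item_url` are made up for illustration. Whether the site really uses these slugs for such items is not known here, so treat this as a way to inspect the generated URLs, not as the site's actual scheme.

```python
from urllib.parse import urlsplit

# Same base as the question builds with url[:-1] + '/' (assumed unchanged).
ITEM_URL = "https://www.thecycledb.com/item/{}"


def item_slug(name: str) -> str:
    """Apply the question's rule: lowercase, drop apostrophes, spaces -> hyphens."""
    return name.lower().replace("'", "").replace(" ", "-")


def item_url(name: str) -> str:
    """Hypothetical helper: full item URL for a display name."""
    return ITEM_URL.format(item_slug(name))


if __name__ == "__main__":
    samples = ["S-576 PDW", "KM-9 'Scrapper'", "Miner Cam #D027", '"Magic-GROW" Fertilizer']
    for name in samples:
        parts = urlsplit(item_url(name))
        # Anything after '#' is a URL fragment and is not part of the request path.
        print(f"{name!r}: path={parts.path!r} fragment={parts.fragment!r}")
```

For names containing `#`, the part after it ends up in the fragment, so the path that is requested is shorter than intended. Such pages would probably fall into the except branch and be recorded as 0 rather than cause the hang.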

1 Answer

Stack Overflow user

Answered 2022-08-15 21:19:50

Try limiting the async concurrency (for example with asyncio.Semaphore) to avoid DDoSing the server:

import aiohttp
import asyncio
from bs4 import BeautifulSoup

url = "https://www.thecycledb.com/items"
item_url = "https://www.thecycledb.com/item/{}"

data = [
    "Hammer",
    "Shattergun",
    "Advocate",
    "S-576 PDW",
    "Kinetic Arbiter",
    "Bulldog",
    "KOR-47",
]  # names of every item (short version)

lst = []


async def download(session, sem, u):
    async with sem:
        rv = await session.get(u)
        print(f"Downloading {u} done")
        return rv


def get_tasks(session, sem):
    tasks = []
    for i in data:
        item_name = i.lower().replace("'", "").replace(" ", "-")
        u = item_url.format(item_name)
        tasks.append(download(session, sem, u))
    return tasks


async def parse():
    sem = asyncio.Semaphore(2)   # <-- limit to max 2 parallel downloads

    async with aiohttp.ClientSession() as session:
        tasks = get_tasks(session, sem)

        responses = await asyncio.gather(*tasks)

        for i in responses:  # Parsing process itself, it works okay
            item_soup = BeautifulSoup(await i.text(), "lxml")
            try:
                a = item_soup.find("h3", string="Shop Price")
                lst.append(
                    int(a.next_element.next_element.text.replace(",", ""))
                )
            except (AttributeError):
                lst.append(0)
        print(lst)


asyncio.run(parse())

Prints:

Downloading https://www.thecycledb.com/item/shattergun done
Downloading https://www.thecycledb.com/item/hammer done
Downloading https://www.thecycledb.com/item/advocate done
Downloading https://www.thecycledb.com/item/s-576-pdw done
Downloading https://www.thecycledb.com/item/kinetic-arbiter done
Downloading https://www.thecycledb.com/item/bulldog done
Downloading https://www.thecycledb.com/item/kor-47 done
[73000, 54000, 76000, 1200, 412000, 6400, 210000]
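
The sketch below illustrates the semaphore idea without touching the network, using only the standard library. It also adds something that is not in the answer above: a per-request timeout via asyncio.wait_for, combined with gather(return_exceptions=True), so that a stalled request surfaces as an exception instead of leaving gather waiting forever. `fake_fetch`, `Tracker`, `bounded_fetch` and the delays are hypothetical stand-ins; in real code the body of `fake_fetch` would be the session.get call plus reading the response text.

```python
import asyncio


async def fake_fetch(url: str, delay: float) -> str:
    """Hypothetical stand-in for session.get(url) + response.text(); it only sleeps."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


class Tracker:
    """Records how many fetches are in flight at once."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0


async def bounded_fetch(sem, tracker, url, delay, timeout):
    async with sem:  # at most Semaphore(N) coroutines are inside this block
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        try:
            # wait_for cancels the fetch and raises a timeout error if it stalls.
            return await asyncio.wait_for(fake_fetch(url, delay), timeout)
        finally:
            tracker.active -= 1


async def main():
    sem = asyncio.Semaphore(2)  # same limit as in the answer
    tracker = Tracker()
    # (url, simulated delay in seconds); the third one simulates a stalled request.
    jobs = [("item/hammer", 0.01), ("item/bulldog", 0.01),
            ("item/stalled", 5.0), ("item/kor-47", 0.01)]
    results = await asyncio.gather(
        *(bounded_fetch(sem, tracker, u, d, timeout=0.2) for u, d in jobs),
        return_exceptions=True,  # collect errors instead of aborting the batch
    )
    return results, tracker.peak


if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print("peak concurrency:", peak)
    for r in results:
        print("failed:" if isinstance(r, Exception) else "ok:", repr(r))
```

In this sketch the whole fetch, including reading the body, happens while the semaphore is held. In the answer's code the semaphore only covers session.get, and the body is read later by i.text(). With aiohttp it may be worth reading the text inside download as well, since a response that has not been read or released can keep its connection checked out of the session's pool (100 connections by default), which may be related to the hang with 150+ names.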
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73363624
