
Web scraping error message: 'int' object has no attribute 'get'
Stack Overflow user
Asked on 2020-05-14 18:36:49
1 answer · 495 views · 0 following · 0 votes

Hello, Stack Overflow contributors!

I want to scrape multiple pages of a news website, but it raises an error at this step:

 response = requests.get(page, headers = user_agent)

The error message is:

AttributeError: 'int' object has no attribute 'get'

The code is:

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

#controlling the crawl-rate
start_time = time() 
request = 0

def scrape(url):
    urls = [url + str(x) for x in range(0,10)]
    for page in urls:
        response = requests.get(page, headers = user_agent)   
    print(page)
print(scrape('https://nypost.com/search/China+COVID-19/page/'))

More specifically, this page and the ones after it are what I want to scrape:

https://nypost.com/search/China+COVID-19/page/1/?orderby=relevance

Any help would be greatly appreciated!


1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-05-14 18:50:36

For me this code runs fine. I had to move request inside your function. Make sure not to mix up the requests module with the request variable.
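That name mix-up is exactly what produces the reported error: once an integer named request (a typo for the requests module) is the thing you call .get on, Python raises this AttributeError. A minimal, self-contained repro (names here are illustrative, not from the original post):

```python
request = 0  # an int counter, one letter away from the module name 'requests'

try:
    # typo: 'request.get' instead of 'requests.get'
    response = request.get("https://example.com")
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'get'
```
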

import requests
from random import randint
from time import sleep, time
from warnings import warn

from bs4 import BeautifulSoup as bs


user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

# controlling the crawl-rate
start_time = time() 

def scrape(url):
    request = 0
    urls = [f"{url}{x}" for x in range(0,10)]
    params = {
       "orderby": "relevance",
    }
    for page in urls:
        response = requests.get(url=page,
                                headers=user_agent,
                                params=params)   

        #pause the loop
        sleep(randint(8,15))

        #monitor the requests
        request += 1
        elapsed_time = time() - start_time
        print('Request:{}; Frequency: {} request/s'.format(request, request/elapsed_time))
#         clear_output(wait = True)

        #throw a warning for non-200 status codes
        if response.status_code != 200:
            warn('Request: {}; Status code: {}'.format(request, response.status_code))

        #Break the loop if the number of requests is greater than expected
        if request > 72:
            warn('Number of request was greater than expected.')
            break

        #parse the content
        soup_page = bs(response.text, 'lxml') 
        
print(scrape('https://nypost.com/search/China+COVID-19/page/'))
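One detail worth noting: scrape() has no return statement, so the final print outputs None. A hedged sketch of the accumulation pattern, using static HTML instead of live requests so it runs offline (the helper name and the <title> selector are assumptions, not from the accepted answer):

```python
from bs4 import BeautifulSoup as bs

def parse_pages(html_pages):
    """Parse each page's HTML and collect its <title> text.
    In the real scraper each item would be response.text from
    requests.get; returning the list lets the caller use the
    results instead of printing None."""
    titles = []
    for html in html_pages:
        soup = bs(html, "html.parser")  # stdlib parser backend; no lxml needed
        if soup.title is not None:
            titles.append(soup.title.get_text(strip=True))
    return titles

pages = [
    "<html><head><title>Result page 1</title></head><body></body></html>",
    "<html><head><title>Result page 2</title></head><body></body></html>",
]
print(parse_pages(pages))  # ['Result page 1', 'Result page 2']
```
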
1 vote
Original content provided by Stack Overflow; translation supported by Tencent Cloud's IT-domain translation engine.
Original link:

https://stackoverflow.com/questions/61804908
