
Q: Web scraping error message: 'int' object has no attribute 'get'

Stack Overflow user

Asked 2020-05-14 18:36:49

1 answer · 495 views · 0 followers · 0 votes

Hello, Stack Overflow contributors!

I want to scrape multiple pages of a news website, but I get an error message at this step:

 response = requests.get(page, headers = user_agent)

The error message is:

AttributeError: 'int' object has no attribute 'get'

The code is:

import requests
from time import time

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

#controlling the crawl-rate
start_time = time() 
request = 0

def scrape(url):
    urls = [url + str(x) for x in range(0,10)]
    for page in urls:
        response = requests.get(page, headers = user_agent)   
    print(page)

print(scrape('https://nypost.com/search/China+COVID-19/page/'))

More specifically, this page and the pages after it are the ones I want to scrape:

https://nypost.com/search/China+COVID-19/page/1/?orderby=relevance
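For reference, here is a minimal sketch of generating that URL and the ones after it with only the standard library. It assumes the pages are numbered consecutively and that the trailing slash before the query string (as in the example URL above) is part of the pattern; the helper name build_page_urls is hypothetical.

```python
from urllib.parse import urlencode


def build_page_urls(base, first, last, params):
    """Return search-result URLs for pages first..last (inclusive).

    Assumption: pages are numbered consecutively and each page URL ends in a
    trailing slash before the query string, as in the example URL.
    """
    query = urlencode(params)  # {"orderby": "relevance"} -> "orderby=relevance"
    return [f"{base}{n}/?{query}" for n in range(first, last + 1)]


urls = build_page_urls(
    "https://nypost.com/search/China+COVID-19/page/", 1, 3, {"orderby": "relevance"}
)
for u in urls:
    print(u)
# https://nypost.com/search/China+COVID-19/page/1/?orderby=relevance
# https://nypost.com/search/China+COVID-19/page/2/?orderby=relevance
# https://nypost.com/search/China+COVID-19/page/3/?orderby=relevance
```

Passing params to requests.get appends the same query string, but to a URL built as base + page number, i.e. without the trailing slash; whether the site treats both forms the same is not confirmed here.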

Any help would be greatly appreciated!

Tags: for-loop, web-scraping, beautifulsoup, fetch, python

1 answer

Stack Overflow user

Accepted answer

Answered 2020-05-14 18:50:36

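This code runs fine for me. I had to move request inside your function. Make sure not to mix up the module requests with the variable request: the error means that when requests.get(...) runs, the name requests is bound to an integer instead of the module (for example, after an assignment such as requests = 0 somewhere else in your script or notebook).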
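A minimal offline reproduction of that name clash follows. To keep it runnable without network access or the third-party package, it uses a hypothetical stand-in module object named requests; for this particular error only the name binding matters, not what the module contains.

```python
import types

# Hypothetical stand-in for the third-party `requests` module so the sketch
# runs offline; for this error only the *name binding* matters.
requests = types.ModuleType("requests")
requests.get = lambda url, **kwargs: f"fetched {url}"

request = 0  # a counter with a *different* name is harmless


def scrape(url, pages=3):
    # The global name `requests` is looked up each time scrape() runs.
    return [requests.get(f"{url}{x}") for x in range(pages)]


print(scrape("https://example.com/page/"))
# ['fetched https://example.com/page/0', 'fetched https://example.com/page/1', 'fetched https://example.com/page/2']

requests = 0  # BUG: the module name now refers to an int
try:
    scrape("https://example.com/page/")
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'get'
```

Because the function looks up requests at call time, a rebinding anywhere before the call (including in an earlier notebook cell) is enough to trigger the error.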

import requests
from random import randint
from time import sleep, time
from warnings import warn

from bs4 import BeautifulSoup as bs


user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}

# controlling the crawl-rate
start_time = time() 

def scrape(url):
    request = 0
    urls = [f"{url}{x}" for x in range(0,10)]
    params = {
       "orderby": "relevance",
    }
    for page in urls:
        response = requests.get(url=page,
                                headers=user_agent,
                                params=params)   

        #pause the loop
        sleep(randint(8,15))

        #monitor the requests
        request += 1
        elapsed_time = time() - start_time
        print('Request:{}; Frequency: {} request/s'.format(request, request/elapsed_time))
        # clear_output(wait=True)  # Jupyter only; needs: from IPython.display import clear_output

        #throw a warning for non-200 status codes
        if response.status_code != 200:
            warn('Request: {}; Status code: {}'.format(request, response.status_code))

        #Break the loop if the number of requests is greater than expected
        if request > 72:
            warn('Number of request was greater than expected.')
            break

        #parse the content
        soup_page = bs(response.text, 'lxml') 
        
print(scrape('https://nypost.com/search/China+COVID-19/page/'))

Votes: 1
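The loop in the answer combines three crawl-rate controls: a random pause, a request counter with a cap, and a warning on non-200 status codes. Below is a sketch that gathers them into one helper so the throttling can be exercised offline. The names crawl, fetch and fake_fetch are hypothetical; unlike the answer, it checks the cap before sending a request (the answer's check only fires after request 73 has already been sent), and it uses time.monotonic instead of time.time for elapsed time.

```python
import random
import time
import warnings
from types import SimpleNamespace


def crawl(urls, fetch, max_requests=72, delay_range=(8, 15), sleep=time.sleep):
    """Fetch URLs one at a time with a random pause between requests.

    `fetch` is any callable that takes a URL and returns an object with a
    `status_code` attribute (for example, a thin wrapper around requests.get).
    `sleep` can be swapped out so the throttling logic is testable offline.
    """
    start = time.monotonic()
    responses = []
    for count, url in enumerate(urls, start=1):
        # Stop *before* exceeding the request budget.
        if count > max_requests:
            warnings.warn("Number of requests was greater than expected.")
            break
        response = fetch(url)
        responses.append(response)

        # Pause for a random whole number of seconds in delay_range (inclusive).
        sleep(random.randint(*delay_range))

        # Monitor the observed request rate.
        elapsed = time.monotonic() - start
        rate = count / elapsed if elapsed > 0 else float("inf")
        print(f"Request: {count}; Frequency: {rate:.3f} requests/s")

        # Warn (but keep going) on non-200 responses.
        if response.status_code != 200:
            warnings.warn(f"Request: {count}; Status code: {response.status_code}")
    return responses


def fake_fetch(url):
    # Offline stand-in: pretend every URL ending in "2" returns 404.
    return SimpleNamespace(url=url, status_code=404 if url.endswith("2") else 200)


pages = [f"https://example.com/page/{n}" for n in range(1, 4)]
results = crawl(pages, fake_fetch, max_requests=2, sleep=lambda seconds: None)
print([r.status_code for r in results])  # [200, 404]
```

With real requests, fetch could be lambda url: requests.get(url, headers=user_agent, params=params), reusing the headers and parameters from the answer.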

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/61804908
