
Only the first page's results are returned when iterating over pages

Stack Overflow user
Asked 2020-05-26 06:24:32
Answers: 1 · Views: 36 · Followers: 0 · Votes: 0

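I am scraping links to news articles from this page: https://time.com/search/?q=China%20COVID-19&page=1. I wrote code to collect the links from page 1 and page 2, but it only returns the articles from page 1. I don't know how to fix this so that it returns results from multiple pages.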

# Imports are assumed; the original post did not show them.
from random import randint
from time import sleep, time

import requests
from bs4 import BeautifulSoup as bs
from IPython.display import clear_output


def scrape(url):
    user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}
    request = 0
    params = {
        'q': 'China%20COVID-19',
    }
    pagelinks = []

    myarticle = []
    for page_no in range(1, 3):
        params['page'] = page_no
        response = requests.get(url=url,
                                headers=user_agent,
                                params=params)

        # controlling the crawl-rate
        start_time = time()
        # pause the loop
        sleep(randint(8, 15))
        # monitor the requests
        request += 1
        elapsed_time = time() - start_time
        print('Request:{}; Frequency: {} request/s'.format(request, request/elapsed_time))
        clear_output(wait=True)

        # parse the content
        soup_page = bs(response.text, 'lxml')
        # select all the articles for a single page
        containers = soup_page.findAll("article", {'class': 'partial tile media image-top margin-16-right search-result'})

        # scrape the links of the articles
        for i in containers:
            url = i.find('a')['href']
            pagelinks.append(url)
        print(pagelinks)


scrape('https://time.com/search/')

Any suggestions would be appreciated!

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-05-27 05:25:22

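Change the part of the code that appends to pagelinks as shown here. The original loop reassigns url to each article's href, so the request for page 2 goes to the last article found on page 1 instead of the search page. Do not overwrite the url variable, because it is used again in the next request: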

#scrape the links of the articles
for i in containers:
    pagelinks.append(i.find('a')['href'])
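
To make the cause visible without touching the network, here is a minimal sketch that reproduces the overwritten-variable behaviour with a stand-in fetcher. The names `crawl`, `fake_fetch`, `keep_search_url` and the `example.com` URLs are hypothetical and exist only for this illustration; they are not part of the original script or of any library.

```python
from urllib.parse import urlencode


def crawl(search_url, pages, fetch, keep_search_url=True):
    """Request each page and collect the links that `fetch` reports.

    `fetch` is a hypothetical stand-in for requests.get plus HTML parsing:
    it takes a full URL and returns the article links found at that URL.
    """
    requested = []   # every URL that was actually requested
    pagelinks = []   # every article link collected
    url = search_url
    for page_no in pages:
        full_url = url + "?" + urlencode({"page": page_no})
        requested.append(full_url)
        for href in fetch(full_url):
            if not keep_search_url:
                url = href   # the bug: `url` now points at an article
            pagelinks.append(href)
    return requested, pagelinks


def fake_fetch(full_url):
    # Pretend that only the search endpoint returns result tiles.
    if full_url.startswith("https://example.com/search/?"):
        page = full_url.rsplit("=", 1)[1]
        return ["https://example.com/article-{}a/".format(page),
                "https://example.com/article-{}b/".format(page)]
    return []


buggy_requests, buggy_links = crawl(
    "https://example.com/search/", range(1, 3), fake_fetch, keep_search_url=False)
fixed_requests, fixed_links = crawl(
    "https://example.com/search/", range(1, 3), fake_fetch, keep_search_url=True)

print(buggy_requests)   # the second request targets an article, not the search page
print(fixed_requests)   # both requests target the search page
print(len(buggy_links), len(fixed_links))
```

In the buggy run the second request is built from the last article link of page 1, so no result tiles are found and only the first page's links are collected. Using a separate name for the link, or appending the href directly as in the answer, keeps the search URL intact.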

After this change, the script prints:

Request:1; Frequency: 838860.8 request/s
Request:2; Frequency: 1398101.3333333333 request/s
['https://time.com/5841895/global-coronavirus-battle/', 'https://time.com/5842256/world-health-organization-china-coronavirus-outbreak/', 'https://time.com/5826025/taiwan-who-trump-coronavirus-covid19/', 'https://time.com/5836611/china-superpower-reopening-coronavirus/', 'https://time.com/5783401/covid19-hubei-cases-classification/', 'https://time.com/5782633/covid-19-drug-remdesivir-china/', 'https://time.com/5778994/coronavirus-china-country-future/', 'https://time.com/5830420/trump-china-rivalry-coronavirus-intelligence/', 'https://time.com/5810493/coronavirus-china-united-states-governments/', 'https://time.com/5813628/china-coronavirus-statistics-wuhan/', 'https://time.com/5793363/china-coronavirus-covid19-abandoned-pets-wuhan/', 'https://time.com/5779678/li-wenliang-coronavirus-china-doctor-death/', 'https://time.com/5820389/africans-guangzhou-china-coronavirus-discrimination/', 'https://time.com/5824599/china-coronavirus-covid19-economy/', 'https://time.com/5784286/covid-19-china-plasma-treatment/', 'https://time.com/5796425/china-coronavirus-lockdown/', 'https://time.com/5825362/china-coronavirus-lawsuit-missouri/', 'https://time.com/5811222/wuhan-coronavirus-death-toll/']
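
A side note on the Frequency figures: in the question's code, start_time is reset inside the loop while the request counter keeps accumulating, so the printed value divides a cumulative count by the duration of a single iteration and is not an average rate. The sketch below is one possible way to measure an average instead, assuming the timer is started once before the loop. The function name `crawl_with_rate` and its `fetch`, `pause` and `clock` parameters are hypothetical and introduced only for this illustration.

```python
from time import monotonic, sleep


def crawl_with_rate(pages, fetch, pause=0.0, clock=monotonic):
    """Call `fetch` for each page and return the average rate after each call.

    `fetch` stands in for the real request, `pause` for the crawl delay,
    and `clock` is injectable so the timing can be checked deterministically.
    """
    start_time = clock()   # started once, before the first request
    rates = []
    for count, page_no in enumerate(pages, start=1):
        fetch(page_no)
        sleep(pause)
        elapsed = clock() - start_time
        # Guard against a zero interval on very fast runs.
        rates.append(count / elapsed if elapsed > 0 else float("inf"))
    return rates


# Deterministic demonstration with a fake clock that advances 10 s per request.
ticks = iter([0.0, 10.0, 20.0])
rates = crawl_with_rate(range(1, 3), fetch=lambda page_no: None,
                        clock=lambda: next(ticks))
print(rates)  # [0.1, 0.1]
```

With the real clock and a pause of 8 to 15 seconds, as in the question, the reported rate would stay at or below 0.125 request/s.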
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62011704
