文章/答案/技术大牛

发布

社区首页 >问答首页 >美丽的汤KeyError:返回self.attrs[key]，但是在错误发生之前(和之后)保留值

问美丽的汤KeyError:返回self.attrs[key]，但是在错误发生之前(和之后)保留值
EN

Stack Overflow用户

提问于 2019-07-15 13:30:48

回答 1查看 482关注 0票数 0

我有一个脚本，它正在刮网络。但它会引发一些错误：

返回self.attrskey KeyError：“数据索引”

也许，这是因为‘数据索引’是not existing。但是，我想收集所有可用的数据，并且

如果没有数据，只需放置一个'NONE‘(我只能写一个循环，如果没有数据就继续，但是我无法存储可用的数据)；
然后，将文件保存到一个数据框架中(由于上面的错误，这现在是不可能的)。

期望输出：

熊猫数据框架(这是虚拟数据)：

标题价格赞助的url asin index_asin $12，Price 1，B $14，No，Y ABCD 4，B $14，Price 1

import requests
from bs4 import BeautifulSoup
#from textwrap import shorten
import pandas as pd 

urls = ['https://www.amazon.com/s?k=shaver+for+men&i=beauty&ref=nb_sb_noss_2',
       "https://www.amazon.com/s?k=electric+shaver&ref=nb_sb_noss_2"]

headers={'User-Agent':'Mozilla/5.0'}

#df = pd.DataFrame(columns =['Title', 'Price', 'Sponsored', 'asin', 'index_asin'])
df = []

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(requests.get(url, headers=headers).text, 'lxml') #lxml

    for div in soup.select('div[data-asin]'):
        title, price = div.select_one('span.a-text-normal').text, div.select_one('.a-offscreen').text if div.select_one('.a-offscreen') else '-'
        sponsored = 'Yes' if div.select_one('span:contains("Sponsored")') else 'No'
        url = response.url
        asin = div['data-asin']
        index_asin = div['data-index']  

        print('title',title)
        print('price',price)
        print('sponsored',sponsored)
        print('url',url)
        print('asin',asin)
        print('index_asin',index_asin)

        # I want to store everything in a data frame 
        #df.append(title, price, sponsored, url, asin, index_asin)

pandas

beautifulsoup

python

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-07-15 13:52:16

如果索引不存在，则使用try..except块，它将转到not块。

import requests
from bs4 import BeautifulSoup
#from textwrap import shorten
import pandas as pd

urls = ['https://www.amazon.com/s?k=shaver+for+men&i=beauty&ref=nb_sb_noss_2',
       "https://www.amazon.com/s?k=electric+shaver&ref=nb_sb_noss_2"]

headers={'User-Agent':'Mozilla/5.0'}

#df = pd.DataFrame(columns =['Title', 'Price', 'Sponsored', 'asin', 'index_asin'])
df = []

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(requests.get(url, headers=headers).text, 'lxml') #lxml

    for div in soup.select('div[data-asin]'):
        title, price = div.select_one('span.a-text-normal').text, div.select_one('.a-offscreen').text if div.select_one('.a-offscreen') else '-'
        sponsored = 'Yes' if div.select_one('span:contains("Sponsored")') else 'No'
        url = response.url
        asin = div['data-asin']
        try:
          index_asin = div['data-index']
        except:
          index_asin='NAN'

        print('title',title)
        print('price',price)
        print('sponsored',sponsored)
        print('url',url)
        print('asin',asin)
        print('index_asin',index_asin)

        # I want to store everything in a data frame
        df.append({title, price, sponsored, url, asin, index_asin})


print(df)

编辑了 df。

df=df.append({'Title':title,'Price':price,'Sponsored':sponsored,'url':url,'asin':asin,'index_asin':index_asin},ignore_index=True)

import requests
from bs4 import BeautifulSoup
import pandas as pd

urls = ['https://www.amazon.com/s?k=shaver+for+men&i=beauty&ref=nb_sb_noss_2',
       "https://www.amazon.com/s?k=electric+shaver&ref=nb_sb_noss_2"]

headers={'User-Agent':'Mozilla/5.0'}

df = pd.DataFrame(columns =['Title', 'Price', 'Sponsored','url', 'asin', 'index_asin'])

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(requests.get(url, headers=headers).text, 'lxml') #lxml

    for div in soup.select('div[data-asin]'):
        title, price = div.select_one('span.a-text-normal').text, div.select_one('.a-offscreen').text if div.select_one('.a-offscreen') else '-'
        sponsored = 'Yes' if div.select_one('span:contains("Sponsored")') else 'No'
        url = response.url
        asin = div['data-asin']
        try:
          index_asin = div['data-index']
        except:
          index_asin='NAN'

        print('title',title)
        print('price',price)
        print('sponsored',sponsored)
        print('url',url)
        print('asin',asin)
        print('index_asin',index_asin)

        # I want to store everything in a data frame
        df=df.append({'Title':title,'Price':price,'Sponsored':sponsored,'url':url,'asin':asin,'index_asin':index_asin},ignore_index=True)



print(df)

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/57040852

复制

相似问题

问美丽的汤KeyError:返回self.attrs[key]，但是在错误发生之前(和之后)保留值
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问美丽的汤KeyError:返回self.attrs[key]，但是在错误发生之前(和之后)保留值EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问美丽的汤KeyError:返回self.attrs[key]，但是在错误发生之前(和之后)保留值
EN