
Web scraping car prices from a website with BeautifulSoup

Stack Overflow user
Asked on 2022-07-19 14:00:01
1 answer · 59 views · 0 following · Score: -1

I want to scrape this webpage and export the result to a Google Sheet. Ideally this would be coded with a FOR loop so it captures all the trims (and also works for other car/URL inputs). How do I get the trims and prices in my code? Ideally one row per trim.

Since I have only just started with scraping and coding, I would really appreciate your input!

Desired output, and the input website:

输出:[<td class="car-sub-model-trim-levels-table__name cb-table__sticky-left"><a href="/cars/tesla/model-3/2022-tesla-model-3" title="2022 Tesla Model 3 Specs">Model 3</a></td>, <td class="car-sub-model-trim-levels-table__name cb-table__sticky-left"><a href="/cars/tesla/model-3/2022-tesla-model-3-long-range" title="2022 Tesla Model 3 Long Range Specs">Long Range</a></td>, <td class="car-sub-model-trim-levels-table__name cb-table__sticky-left"><a href="/cars/tesla/model-3/2022-tesla-model-3-performance" title="2022 Tesla Model 3 Performance Specs">Performance</a></td>]

Code:

Code language: python
from googleapiclient.discovery import build
from google.oauth2 import service_account
from bs4 import BeautifulSoup
import requests
import time

#Google Sheets verification
SERVICE_ACCOUNT_FILE = 'keys.json'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive']
creds = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
SAMPLE_SPREADSHEET_ID = 'PLACEHOLDER'
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()

#Google sheets header row
sheet_header = [['Trim', 'MSRP']]
request = sheet.values().update(spreadsheetId=SAMPLE_SPREADSHEET_ID, range="caranddriver!A1", valueInputOption="USER_ENTERED", body={"values":sheet_header}).execute()

#Inputs/URLs to scrape:
URL2 = 'https://carbuzz.com/cars/tesla/model-3'
response = requests.get(URL2)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')

# Grab every trim-name cell from the spec table
trim = soup.find_all(class_='car-sub-model-trim-levels-table__name cb-table__sticky-left')
print(trim)


msrp = ''


data_sheets = [[trim, msrp]]
request = sheet.values().append(spreadsheetId=SAMPLE_SPREADSHEET_ID, range="caranddriver!A2", valueInputOption="USER_ENTERED", body={"values":data_sheets}).execute()

1 Answer

Stack Overflow user

Answered on 2022-07-19 14:51:35

I suggest you use pandas. It is a perfect fit for the task you describe.

Code language: python
import pandas as pd

# read_html returns a list of DataFrames, one per <table> on the page
df = pd.read_html('https://carbuzz.com/cars/tesla/model-3')
print(df[0])

Output:

          Trim    Engine            Transmission        Drivetrain Price (MSRP)
0      Model 3  Electric  Single Speed Automatic  Rear-Wheel Drive      $46,990
1   Long Range  Electric  Single Speed Automatic   All-Wheel Drive      $57,990
2  Performance  Electric  Single Speed Automatic   All-Wheel Drive      $62,990
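
The first DataFrame in that list already holds both columns the question asks for, so it can be reshaped directly into the row-per-trim list that the Sheets `values().append()` body expects. A minimal sketch, using a hand-built DataFrame in place of the live `read_html` result (the column names `Trim` and `Price (MSRP)` are taken from the output above):

```python
import pandas as pd

# Stand-in for df[0] from read_html; same columns as the printed table above
df0 = pd.DataFrame({
    'Trim': ['Model 3', 'Long Range', 'Performance'],
    'Price (MSRP)': ['$46,990', '$57,990', '$62,990'],
})

# One [trim, msrp] row per trim, ready for body={"values": rows}
rows = df0[['Trim', 'Price (MSRP)']].values.tolist()
print(rows)
```

With the live page you would select the same two columns from `df[0]` and pass `rows` straight to the append call from the question.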

Update: a non-pandas solution

Code language: python
import requests
from bs4 import BeautifulSoup


url = 'https://carbuzz.com/cars/tesla/model-3'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
# Each table row holds one trim; pull the name and price cells from it
for tr in soup.find('tbody').find_all('tr'):
    trim = tr.find('td', {'class': ['car-sub-model-trim-levels-table__name', 'cb-table__sticky-left']}).get_text()
    price = tr.find('td', class_='car-sub-model-trim-levels-table__price').get_text()
    print(trim, price)

Output:

Model 3 $46,990
Long Range $57,990
Performance $62,990
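
To finish the question's Google Sheets export, the loop above can collect every `[trim, price]` pair into one list and hand it to a single `values().append()` call instead of printing. A sketch using a small inline HTML snippet in place of the live page (the table structure and class names are copied from the question; the snippet itself is a stand-in):

```python
from bs4 import BeautifulSoup

# Stand-in HTML mimicking the carbuzz trim table (classes from the question)
html = '''
<table><tbody>
<tr><td class="car-sub-model-trim-levels-table__name cb-table__sticky-left">Model 3</td>
    <td class="car-sub-model-trim-levels-table__price">$46,990</td></tr>
<tr><td class="car-sub-model-trim-levels-table__name cb-table__sticky-left">Long Range</td>
    <td class="car-sub-model-trim-levels-table__price">$57,990</td></tr>
</tbody></table>
'''

soup = BeautifulSoup(html, 'html.parser')
rows = []
for tr in soup.find('tbody').find_all('tr'):
    trim = tr.find('td', class_='car-sub-model-trim-levels-table__name').get_text(strip=True)
    price = tr.find('td', class_='car-sub-model-trim-levels-table__price').get_text(strip=True)
    rows.append([trim, price])

print(rows)

# With the question's Sheets setup, one append call then writes every row:
# sheet.values().append(spreadsheetId=SAMPLE_SPREADSHEET_ID, range="caranddriver!A2",
#                       valueInputOption="USER_ENTERED",
#                       body={"values": rows}).execute()
```

Batching the rows into one append keeps the Sheets API usage to a single request per scrape, and the same loop works for any car/URL input whose page uses these table classes.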
Votes: 0
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's specialized IT-domain engine.
Original link:

https://stackoverflow.com/questions/73038344
