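I want to scrape this web page and export the results to Google Sheets. Ideally this would be coded with a for loop so that it picks up all the trims (and also works for other car/URL inputs). How do I get the trim and price in my code? Ideally one row per trim.
Since I'm just starting out with scraping and coding, I would really appreciate your input!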
Output: [<td class="car-sub-model-trim-levels-table__name cb-table__sticky-left"><a href="/cars/tesla/model-3/2022-tesla-model-3" title="2022 Tesla Model 3 Specs">Model 3</a></td>, <td class="car-sub-model-trim-levels-table__name cb-table__sticky-left"><a href="/cars/tesla/model-3/2022-tesla-model-3-long-range" title="2022 Tesla Model 3 Long Range Specs">Long Range</a></td>, <td class="car-sub-model-trim-levels-table__name cb-table__sticky-left"><a href="/cars/tesla/model-3/2022-tesla-model-3-performance" title="2022 Tesla Model 3 Performance Specs">Performance</a></td>]
Code:
from googleapiclient.discovery import build
from google.oauth2 import service_account
from bs4 import BeautifulSoup
import requests
import time
#Google Sheets verification
SERVICE_ACCOUNT_FILE = 'keys.json'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive']
creds = None
creds = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
SAMPLE_SPREADSHEET_ID = 'PLACEHOLDER'
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
#Google sheets header row
sheet_header = [['Trim', 'MSRP']]
request = sheet.values().update(spreadsheetId=SAMPLE_SPREADSHEET_ID, range="caranddriver!A1", valueInputOption="USER_ENTERED", body={"values":sheet_header}).execute()
#Inputs/URLs to scrape:
URL2 = ('https://carbuzz.com/cars/tesla/model-3')
(response := requests.get(URL2)).raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()
trim = overview.find_all(class_='car-sub-model-trim-levels-table__name cb-table__sticky-left')
print(trim)
msrp = ''
data_sheets = [[trim, msrp]]
request = sheet.values().append(spreadsheetId=SAMPLE_SPREADSHEET_ID, range="caranddriver!A2", valueInputOption="USER_ENTERED", body={"values":data_sheets}).execute()
Posted on 2022-07-19 14:51:35
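I suggest using pandas. It is a very good fit for the task you describe: read_html returns a list of DataFrames, one per table on the page.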
import pandas as pd
df = pd.read_html('https://carbuzz.com/cars/tesla/model-3')
print(df[0])
Output:
          Trim    Engine            Transmission        Drivetrain  Price (MSRP)
0      Model 3  Electric  Single Speed Automatic  Rear-Wheel Drive       $46,990
1   Long Range  Electric  Single Speed Automatic   All-Wheel Drive       $57,990
2  Performance  Electric  Single Speed Automatic   All-Wheel Drive       $62,990
Update: a non-pandas solution
import requests
from bs4 import BeautifulSoup
url = 'https://carbuzz.com/cars/tesla/model-3'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
for tr in soup.find('tbody').find_all('tr'):
    # Each table row holds one trim: read its name cell and its price cell
    trim = tr.find('td', {'class': ['car-sub-model-trim-levels-table__name', 'cb-table__sticky-left']}).get_text()
    price = tr.find('td', class_='car-sub-model-trim-levels-table__price').get_text()
    print(trim, price)
Output:
Model 3 $46,990
Long Range $57,990
Performance $62,990
https://stackoverflow.com/questions/73038344
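To get one row per trim into the sheet, the scraped values have to be passed as a list of rows rather than as a single [[trim, msrp]] pair. Below is a minimal sketch of that step only. The helper name build_sheet_rows is hypothetical, and the sample pairs are copied from the printed output above rather than scraped live; the commented-out append call reuses the names from the question's code and assumes the credentials set up there.

```python
def build_sheet_rows(pairs):
    """Turn (trim, price) pairs into one [trim, price] row per trim."""
    rows = []
    for trim, price in pairs:
        trim, price = trim.strip(), price.strip()
        if trim:  # skip entries where no trim name was found
            rows.append([trim, price])
    return rows


# Sample pairs copied from the printed output above. In the scraping loop,
# collect them with pairs.append((trim, price)) instead of print(trim, price).
pairs = [
    ("Model 3", "$46,990"),
    ("Long Range", "$57,990"),
    ("Performance", "$62,990"),
]
rows = build_sheet_rows(pairs)
print(rows)

# The rows can then be sent with the append call from the question
# (sheet and SAMPLE_SPREADSHEET_ID are defined in the question's code):
# sheet.values().append(spreadsheetId=SAMPLE_SPREADSHEET_ID,
#                       range="caranddriver!A2",
#                       valueInputOption="USER_ENTERED",
#                       body={"values": rows}).execute()
```

For several cars, the same idea should carry over: loop over a list of URLs, extend one shared pairs list inside the loop, and append all rows to the sheet once at the end.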