I am scraping data from a website and want to store it as JSON, Excel, SQLite, or plain text so that the data is well organized and makes sense. Please help.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.amazon.in/Skybags-Brat-Black-Casual-Backpack/dp/B08Z1HHHTD/ref=sr_1_2?dchild=1&keywords=skybags&qid=1627786382&sr=8-2')

# Wait for the product title to appear, then read it.
product_title = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "productTitle"))).text
print(product_title)

# Open the full list of reviews.
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[@data-hook='see-all-reviews-link-foot']"))).click()

while True:
    # Print every review on the current page.
    for item in WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[data-hook='review']"))):
        reviewer = item.find_element(By.CSS_SELECTOR, "span.a-profile-name").text
        review = ' '.join([i.text.strip() for i in item.find_elements(By.XPATH, ".//span[@data-hook='review-body']")])
        print(reviewer, review)
    try:
        # Go to the next page and wait until the old reviews go stale.
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@data-hook='pagination-bar']//a[contains(@href,'/product-reviews/') and contains(text(),'Next page')]"))).click()
        WebDriverWait(driver, 10).until(EC.staleness_of(item))
    except Exception:
        # No "Next page" link was found (or the wait timed out): stop paginating.
        break

driver.quit()

Posted on 2021-08-01 09:32:04
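Store the values product_title, review, and reviewer in a dictionary, and use the json module to convert it to JSON.
You can store the data in this format and finally convert the list to JSON.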
lst = [{"product_title": <title>, "reviews": [{"review": <review>, "reviewer": <reviewer>}, {"review": <review>, "reviewer": <reviewer>}, ...]}]

import json
json.dumps(lst)

Write the data to a JSON file:

with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(lst, f, ensure_ascii=False)

https://stackoverflow.com/questions/68608789
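
To make the approach above concrete, here is a minimal sketch of how the scraped values might be gathered into that list-of-dictionaries layout and then written out. It assumes the scraping loop has been changed to collect (reviewer, review) tuples instead of printing them. The helper names build_record, save_json, and save_sqlite, the table schema, and the sample data are all hypothetical and not part of the original answer. Since the question also mentions SQLite, the sketch includes one possible way to store the same records with the standard sqlite3 module.

```python
import json
import sqlite3


def build_record(product_title, scraped_reviews):
    """Return one product dict in the layout described in the answer.

    scraped_reviews is an iterable of (reviewer, review) tuples.
    """
    return {
        "product_title": product_title,
        "reviews": [
            {"reviewer": reviewer, "review": review}
            for reviewer, review in scraped_reviews
        ],
    }


def save_json(records, path):
    """Write the list of product dicts to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def save_sqlite(records, path):
    """Store one row per review in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS reviews "
                "(product_title TEXT, reviewer TEXT, review TEXT)"
            )
            conn.executemany(
                "INSERT INTO reviews VALUES (?, ?, ?)",
                [
                    (rec["product_title"], r["reviewer"], r["review"])
                    for rec in records
                    for r in rec["reviews"]
                ],
            )
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder data standing in for what the Selenium loop would collect.
    pairs = [("Reviewer A", "Sturdy bag."), ("Reviewer B", "Good value.")]
    lst = [build_record("Example backpack", pairs)]
    save_json(lst, "data.json")
    save_sqlite(lst, "data.db")
```

In the scraping loop, the print(reviewer, review) call would be replaced by appending (reviewer, review) to a list, and that list would be passed to build_record once pagination ends. JSON keeps the nested structure as is, while the SQLite version flattens it to one row per review, which is easier to query later.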