
Converting multiple strings from Selenium and BeautifulSoup into a CSV file

Stack Overflow user
Asked 2021-01-02 09:35:07
1 answer · 42 views · 0 following · Score: 1

I have this scraper that I'm trying to export to a CSV file in Google Colab. I receive the scraped information as string values, but I can't convert it to CSV. I want each scraped attribute ("title", "size", etc.) to fill a column in the CSV file. I've run the strings through BeautifulSoup to remove the HTML formatting. See the code below.
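Since pandas is already imported below, the target shape (one CSV column per scraped attribute, one row per listing) can be sketched up front; the listing values here are invented stand-ins for the scraped strings:

```python
import io
import pandas as pd

# Hypothetical scraped values; in the real scraper these come from Selenium
titles = ["Jordan 1 Retro", "Yeezy Boost 350"]
prices = ["$250", "$310"]
sizes = ["10", "9.5"]

# One column per attribute, one row per listing
df = pd.DataFrame({"title": titles, "price": prices, "size": sizes})

# to_csv works the same with a file path, e.g. df.to_csv("sold_shoes.csv", index=False)
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

With `index=False`, the output contains only the header row and the data rows, which is usually what you want for a scrape export.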

Code language: python
import pandas as pd
import time
import io
from io import StringIO
import csv
#from google.colab import drive
#drive.mount('drive')
#Use new Library (kora.selenium) to run chromedriver 
from kora.selenium import wd
#Import BeautifulSoup to parse HTML formatting
from bs4 import BeautifulSoup
wd.get("https://www.grailed.com/sold/EP8S3v8V_w") #Get webpage

ScrollNumber=round(200/40)+1
for i in range(0,ScrollNumber):
  wd.execute_script("window.scrollTo(0,document.body.scrollHeight)")
  time.sleep(2)

#--------------#
#Each new attribute has to be found using XPath, because Grailed's site is rendered with JavaScript (React), not static HTML
#Only 39 results will show because the JS page is an infinite scroll and Selenium must be told to keep scrolling.
follow_loop = range(2, 200)
for x in follow_loop:
    #Title
    title = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    title += str(x)
    title += "]/a/div[3]/div[2]/p"
    title = wd.find_elements_by_xpath(title)
    title = str(title)
    #Price
    price = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    price += str(x)
    price += "]/div/div/p/span"
    price = wd.find_elements_by_xpath(price)
    price = str(price)
    #Size
    size = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    size += str(x)
    size += "]/a/div[3]/div[1]/p[2]"
    size = wd.find_elements_by_xpath(size)
    size = str(size)
    #Sold
    sold = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    sold += str(x)
    sold += "]/a/p/span"
    sold = wd.find_elements_by_xpath(sold)
    sold = str(sold)
    #Strip HTML formatting with BeautifulSoup
    cleantitle = BeautifulSoup(title, "lxml").text
    cleanprice = BeautifulSoup(price, "lxml").text
    cleansize = BeautifulSoup(size, "lxml").text
    cleansold = BeautifulSoup(sold, "lxml").text
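Part of why this code produces unusable strings is that `str()` on the list returned by `find_elements_by_xpath` gives the list's *repr*, not the page text. A minimal stand-in (no browser required) shows the effect; `FakeElement` is a hypothetical class mimicking Selenium's WebElement:

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement."""
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "<selenium.webdriver.remote.webelement.WebElement (session=...)>"

elements = [FakeElement("Jordan 1 Retro"), FakeElement("Yeezy Boost 350")]

# What the question's code does: str() yields the list's repr, no page text at all
as_string = str(elements)

# What the scraper actually needs: .text on each element
texts = [el.text for el in elements]

print(as_string)
print(texts)
```

Once `.text` is used, there is no HTML left to strip, so the BeautifulSoup pass becomes unnecessary.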

1 Answer

Stack Overflow user

Answered 2021-01-02 12:14:32

That was quite a lot of work.

Code language: python
from selenium import webdriver
import time
import csv

driver = webdriver.Chrome()

driver.get("https://www.grailed.com/sold/EP8S3v8V_w")

scroll_count = round(200 / 40) + 1
for i in range(scroll_count):
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    time.sleep(2)

time.sleep(3)

titles = driver.find_elements_by_css_selector("p.listing-designer")
prices = driver.find_elements_by_css_selector("p.sub-title.sold-price")
sizes = driver.find_elements_by_css_selector("p.listing-size.sub-title")
sold = driver.find_elements_by_css_selector("div.-overlay")

data = [titles, prices, sizes, sold]

data = [list(map(lambda element: element.text, arr)) for arr in data]

# newline='' keeps the csv module from inserting blank lines between rows on Windows
with open('sold_shoes.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    j = 0
    while j < len(titles):
        row = []
        for i in range(len(data)):
            row.append(data[i][j])
        writer.writerow(row)
        j += 1

I'm not sure why it puts a blank line between each row in the file, but I don't think that's a problem. (On Windows this comes from the csv module's newline translation; opening the file with newline='' prevents it.) Also, this is a naive solution, since it assumes every list is the same length; consider locating the parent elements instead and building each row from a parent's children. I also used plain Selenium without BeautifulSoup because it was easier for me, but you should learn BS too, since it parses faster than Selenium. Happy coding.
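The index-based row loop above can be written more compactly with `zip`, which also degrades gracefully when the lists have unequal lengths by stopping at the shortest one. Sample data stands in for the Selenium results, and a `StringIO` stands in for the file:

```python
import csv
import io

# Hypothetical text already extracted from the WebElements
titles = ["Jordan 1 Retro", "Yeezy Boost 350"]
prices = ["$250", "$310"]
sizes  = ["10", "9.5"]
sold   = ["Sold", "Sold"]

# StringIO stands in for open('sold_shoes.csv', 'w', newline='')
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["title", "price", "size", "sold"])   # header row
for row in zip(titles, prices, sizes, sold):          # one listing per row
    writer.writerow(row)

print(out.getvalue())
```

`zip` silently drops trailing items when one list is shorter, so if you need to detect mismatched lengths instead, check the lengths explicitly before writing.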

Score: 1
Original content provided by Stack Overflow; translation supported by Tencent Cloud's translation engine.
Original link:

https://stackoverflow.com/questions/65535311