I have this scraper, and I'm trying to export the results to a CSV file in Google Colab. I receive the scraped information as string values but can't convert it to CSV. I want each scraped attribute ("title", "size", etc.) to populate a column in the CSV file. I have already run the strings through BeautifulSoup to remove the HTML formatting. Please see the code below.
import pandas as pd
import time
import io
from io import StringIO
import csv
#from google.colab import drive
#drive.mount('drive')

# Use the kora.selenium library to run chromedriver in Colab
from kora.selenium import wd
# Import BeautifulSoup to parse HTML formatting
from bs4 import BeautifulSoup

wd.get("https://www.grailed.com/sold/EP8S3v8V_w")  # Get webpage

ScrollNumber = round(200 / 40) + 1
for i in range(0, ScrollNumber):
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)

#--------------#
# Each new attribute has to be found using XPath because Grailed's page is
# rendered with JavaScript (React), not static HTML.
# Only 39 results will show otherwise, because the page is an infinite scroll
# and Selenium must be told to keep scrolling.
follow_loop = range(2, 200)
for x in follow_loop:
    # Title
    title = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    title += str(x)
    title += "]/a/div[3]/div[2]/p"
    title = wd.find_elements_by_xpath(title)
    title = str(title)
    # Price
    price = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    price += str(x)
    price += "]/div/div/p/span"
    price = wd.find_elements_by_xpath(price)
    price = str(price)
    # Size
    size = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    size += str(x)
    size += "]/a/div[3]/div[1]/p[2]"
    size = wd.find_elements_by_xpath(size)
    size = str(size)
    # Sold
    sold = "//*[@id='shop']/div/div/div[3]/div[2]/div/div["
    sold += str(x)
    sold += "]/a/p/span"
    sold = wd.find_elements_by_xpath(sold)
    sold = str(sold)
    # Clean HTML formatting using BeautifulSoup
    cleantitle = BeautifulSoup(title, "lxml").text
    cleanprice = BeautifulSoup(price, "lxml").text
    cleansize = BeautifulSoup(size, "lxml").text
    cleansold = BeautifulSoup(sold, "lxml").text
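The core problem in the code above is that find_elements_by_xpath returns a list of WebElement objects, so str(title) produces the list's repr (something like "[<selenium.webdriver...WebElement ...>]"), not HTML, and BeautifulSoup then has nothing meaningful to parse. A minimal sketch of the intended export, reusing the question's XPath patterns and its pre-Selenium-4 API, collecting each element's .text into rows and writing them out via the pandas import already present (the column names and output filename are illustrative):

rows = []
for x in range(2, 200):
    base = "//*[@id='shop']/div/div/div[3]/div[2]/div/div[" + str(x) + "]"
    titles = wd.find_elements_by_xpath(base + "/a/div[3]/div[2]/p")
    prices = wd.find_elements_by_xpath(base + "/div/div/p/span")
    sizes = wd.find_elements_by_xpath(base + "/a/div[3]/div[1]/p[2]")
    solds = wd.find_elements_by_xpath(base + "/a/p/span")
    if titles:  # skip indices where no listing loaded
        rows.append({
            "title": titles[0].text,                    # .text already strips
            "price": prices[0].text if prices else "",  # the HTML tags, so no
            "size": sizes[0].text if sizes else "",     # BeautifulSoup pass
            "sold": solds[0].text if solds else "",     # is needed
        })

pd.DataFrame(rows).to_csv("scraped_listings.csv", index=False)  # one column per attribute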
That's a whole lot of work. Here is a simpler way:
from selenium import webdriver
import time
import csv

driver = webdriver.Chrome()
driver.get("https://www.grailed.com/sold/EP8S3v8V_w")

# Keep scrolling so the infinite-scroll page loads about 200 listings
scroll_count = round(200 / 40) + 1
for i in range(scroll_count):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)
time.sleep(3)

# Select each attribute by its CSS class instead of a positional XPath
titles = driver.find_elements_by_css_selector("p.listing-designer")
prices = driver.find_elements_by_css_selector("p.sub-title.sold-price")
sizes = driver.find_elements_by_css_selector("p.listing-size.sub-title")
sold = driver.find_elements_by_css_selector("div.-overlay")

# Convert each list of WebElements into a list of plain strings
data = [titles, prices, sizes, sold]
data = [list(map(lambda element: element.text, arr)) for arr in data]

# newline='' keeps the csv module from inserting a blank line between rows
with open('sold_shoes.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    j = 0
    while j < len(titles):
        row = []
        for i in range(len(data)):
            row.append(data[i][j])
        writer.writerow(row)
        j += 1

Opening the file with newline='' stops the blank line that otherwise appears between each row of the file. Note that this is a naive solution, because it assumes every list has the same length; consider grabbing each listing's parent element and building each row from its children instead (see the sketch below). Also, I used Selenium alone without BeautifulSoup because it's easier for me, but you should learn BS too, since it's faster than Selenium. Happy coding.
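A sketch of that parent-element approach, using the same Selenium 3-style element lookups as above; the div.feed-item parent class and the relative child selectors are guesses at Grailed's markup, not verified, and should be checked against the live page:

# Sketch only: one row per listing card, so a missing field cannot shift columns
from selenium import webdriver
import csv
import time

driver = webdriver.Chrome()
driver.get("https://www.grailed.com/sold/EP8S3v8V_w")
time.sleep(3)

def child_text(card, selector):
    # Return the text of the first matching child, or "" when it is absent
    matches = card.find_elements_by_css_selector(selector)
    return matches[0].text if matches else ""

cards = driver.find_elements_by_css_selector("div.feed-item")  # assumed parent class
with open('sold_shoes.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["title", "price", "size", "sold"])  # header row
    for card in cards:
        writer.writerow([
            child_text(card, "p.listing-designer"),
            child_text(card, "p.sub-title.sold-price"),
            child_text(card, "p.listing-size.sub-title"),
            child_text(card, "div.-overlay"),
        ])

Because every field for a listing comes from the same card, listings that lack a price or size simply produce an empty cell instead of misaligning the whole column.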
https://stackoverflow.com/questions/65535311