
Why does this code generate multiple files? I want one file containing all entries

Stack Overflow user
Asked on 2019-05-01 00:27:59
1 answer · 65 views · 0 followers · 0 votes

I'm experimenting with beautifulsoup and xpath and tried the code below, but now I get one file per URL instead of a single file for all URLs, as before.

I just switched to reading the list of URLs from a CSV, and also added the request to each URL and the parsing of the response. But when I run it now I get lots of separate files, and in some cases one file may actually contain the scraped data of two pages. So do I need to move the file-saving code out (dedent it)?

Code language: python
import scrapy
import requests
from DSG2.items import Dsg2Item
from bs4 import BeautifulSoup
import time
import datetime
import csv

class DsgSpider(scrapy.Spider):
    name = "dsg"

    def start_requests(self):
        urlLinks = []
        with open('dsgLinks.csv','r') as csvf:
            urls = csv.reader(csvf)
            for urlLink in urls:
                urlLinks.append(urlLink)

        for url in urlLinks:
            yield scrapy.Request(url=url[0], callback=self.parse)

    def parse(self, response):
        dets = Dsg2Item()
        now = time.mktime(datetime.datetime.now().timetuple())
        r = requests.get(response.url, timeout=5)

        html = r.text
        soup = BeautifulSoup(html, "html.parser")

        dets['style'] = " STYLE GOES HERE "
        dets['brand'] = " BRAND GOES HERE "                    
        dets['description'] = " DESCRIPTION GOES HERE "
        dets['price'] = " PRICE GOES HERE "
        dets['compurl'] = response.url  # response.url is a string; [0] would keep only its first character
        dets['reviewcount'] = " REVIEW COUNT GOES HERE "
        dets['reviewrating'] = " RATING COUNT GOES HERE "
        dets['model'] = " MODEL GOES HERE "
        dets['spechandle'] = " HANDLE GOES HERE "
        dets['specbladelength'] = " BLADE LENGTH GOES HERE "
        dets['specoveralllength'] = " OVERALL LENGTH GOES HERE "
        dets['specweight'] = " WEIGHT GOES HERE "
        dets['packsize'] = " PACKSIZE GOES HERE "

        for h1items in soup.find_all('h1',class_="product-title"):
            strh1item = str(h1items.get_text())
            dets['description']=strh1item.lstrip()

        for divitems in soup.find_all('div', class_="product-component"):
            for ulitems in divitems.find_all('ul'):
                for litem in ulitems.find_all('li'):
                    strlitem = str(litem.get_text())
                    if 'Model:' in strlitem:
                        bidx = strlitem.index(':')+1
                        lidx = len(strlitem)
                        dets['model']=strlitem[bidx:lidx].lstrip()

                    elif 'Handle:' in strlitem:
                        bidx = strlitem.index(':')+1
                        lidx = len(strlitem)
                        dets['spechandle']=strlitem[bidx:lidx].lstrip()

                    elif 'Blade Length:' in strlitem:
                        bidx = strlitem.index(':')+1
                        lidx = len(strlitem)
                        dets['specbladelength'] = strlitem[bidx:lidx].lstrip()

                    elif 'Overall Length:' in strlitem:
                        bidx = strlitem.index(':')+1
                        lidx = len(strlitem)
                        dets['specoveralllength'] = strlitem[bidx:lidx].lstrip()

                    elif 'Weight:' in strlitem:
                        bidx = strlitem.index(':')+1
                        lidx = len(strlitem)
                        dets['specweight'] = strlitem[bidx:lidx].lstrip()

                    elif 'Pack Qty:' in strlitem:
                        bidx = strlitem.index(':')+1
                        lidx = len(strlitem)
                        dets['packsize']=strlitem[bidx:lidx].lstrip()             

        for litems in soup.find_all('ul', class_="prod-attr-list"):
            for litem in litems.find_all('li'):
                strlitem = str(litem.get_text())
                if 'Style:' in strlitem:
                    bidx = strlitem.index(':')+1
                    lidx = len(strlitem)
                    dets['style']=strlitem[bidx:lidx].lstrip()

                elif 'Brand:' in strlitem:
                    bidx = strlitem.index(':')+1
                    lidx = len(strlitem)
                    dets['brand']=strlitem[bidx:lidx].lstrip()                    

        for divitems in soup.find_all('div', class_="outofstock-label"):
            dets['price'] = divitems.text          

        for spanitems in soup.find_all('span',class_="final-price"):
            for spanitem in spanitems.find_all('span',itemprop="price"):
                strspanitem = str(spanitem.get_text())
                dets['price'] = '${:,.2f}'.format(float(strspanitem.lstrip()))

        for divitems in soup.find_all('div',id="BVRRSummaryContainer"):
            for spanitem in divitems.find_all('span',class_="bvseo-reviewCount"):
                strspanitem = str(spanitem.get_text())
                dets['reviewcount']=strspanitem.lstrip()
            for spanitem in divitems.find_all('span',class_="bvseo-ratingValue"):
                strspanitem = str(spanitem.get_text())
                dets['reviewrating']=strspanitem.lstrip()

        filename = 'dsg-%s.csv' % str(int(now))
        locallog = open(filename, 'a+')
        locallog.write(','.join(map(str, dets.values())) +"\n")
        locallog.close()

I'd like to fix this code so that, as it originally did, it saves all of the scraped data to one file.
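The filenames diverge because `now` is recomputed inside `parse()`, so responses handled in different seconds get different names. A minimal, self-contained sketch (outside Scrapy) of the expression the spider evaluates on every callback:

```python
import time
import datetime

def make_filename():
    # the same expression the spider evaluates on each parse() call
    now = time.mktime(datetime.datetime.now().timetuple())
    return 'dsg-%s.csv' % str(int(now))

a = make_filename()
time.sleep(1.1)          # simulate a second response arriving a moment later
b = make_filename()
assert a != b            # two callbacks, two different filenames
```

Responses that happen to arrive within the same second share a name, which is why some files end up containing two pages' data.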


1 Answer

Stack Overflow user

Answer accepted

Posted on 2019-05-01 00:29:35

This builds a new timestamped filename every time `parse()` runs:

filename = 'dsg-%s.csv' % str(int(now))

Simply replace it with:

filename = 'dsg.csv'
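With a fixed name, every callback appends to the same file. A minimal sketch of that behavior, using `io.StringIO` to stand in for the single `dsg.csv` opened in append mode:

```python
import csv
import io

rows = [
    ["style-a", "brand-a", "$10.00"],
    ["style-b", "brand-b", "$12.50"],
]

buf = io.StringIO()       # stands in for the one 'dsg.csv' file
writer = csv.writer(buf)
for row in rows:          # each iteration plays the role of one parse() call
    writer.writerow(row)
```

As a side note, `csv.writer` also quotes fields that contain commas, which the spider's manual `','.join(...)` would not; that may be worth adopting regardless of the filename fix.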

Votes: 2
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's IT-domain engine.
Original link:

https://stackoverflow.com/questions/55925202
