Q: Scrapy CSV file is not formatted correctly

Stack Overflow user

Asked 2020-07-09 21:46:02

Answers: 1 · Views: 46 · Followers: 0 · Votes: 1

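Basically, I write my extracted data to a CSV file, but there are some problems with the format.

- First, only the parts show up and nothing else, e.g. quantity and price.
- Second, the column titles seem to repeat down the rows.

I would like the parts, prices, and quantities to be displayed in separate columns, with the names as headers. If anyone could point me to where I can learn how to do this, it would help a lot!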

    name = 'digi'
    allowed_domains = ['digikey.com']
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"


    }
    start_urls = ['https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?FV=-1%7C428%2C-8%7C774%2C7%7C1&quantity=0&ColumnSort=0&page=1&k=cy621&pageSize=500&pkeyword=cy621']

    def parse(self, response):
        data={}
        parts=response.css('Table#productTable.productTable')
        for part in parts:
            for p in part.css('tbody#lnkPart'):
                yield {
                    'Part': p.css('td.tr-mfgPartNumber span::text').extract(),
                    'Quantity': p.css('td.tr-minQty.ptable-param span.desktop::text').extract(),
                    'Price': p.css('td.tr-unitPrice.ptable-param span::text').extract()
                }

Settings

BOT_NAME = 'website1'

SPIDER_MODULES = ['website1.spiders']
NEWSPIDER_MODULE = 'website1.spiders'

#Export as CSV Feed
#FEED_EXPORT_FIELDS: ["parts", "quantity", "price"]
FEED_FORMAT = "csv"
FEED_URI = "parts.csv"

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'website1 (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

Tags: csv, scrapy, python

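1 Answer

Stack Overflow user

Accepted answer

Posted 2020-07-09 22:07:21

Do you get the correct data when you test in the Scrapy shell? It is worth trying your selectors in the Scrapy shell before committing them to your script.

I haven't looked at your CSS selectors in detail, but there are a lot of for loops. Essentially all you need to do is loop over the tr elements, so it is probably more efficient to find a CSS selector that returns every row directly, rather than looping over the whole table and working your way down.

Update:

Since you asked about the for loop: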

for p in response.css('tbody#lnkPart > tr'):
    yield {
        'Part': p.css('td.tr-mfgPartNumber span::text').get(),
        'Quantity': p.css('td.tr-minQty.ptable-param span.desktop::text').get(),
        'Price': p.css('td.tr-unitPrice.ptable-param span::text').get()
    }

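Note that we only need to loop over the tr elements; this selector picks up all of them. Inside the loop, get() returns only the first match within that particular tr, as a single string (or None if nothing matches), whereas extract() returns a list of every match.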
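To make the difference concrete, here is a minimal sketch of the two item shapes. It uses the standard-library csv module rather than Scrapy's own CSV exporter (which handles list values in its own way), and the part numbers, quantities, and prices are made-up placeholders, not data from the page. The point is only the row structure: one item holding lists becomes one row, while one item per tr becomes one row per part.

```python
import csv
import io


def to_csv(items, fields):
    """Write dict items to CSV text, one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


fields = ["Part", "Quantity", "Price"]

# Shape you get when looping over the whole tbody and calling extract():
# a single item whose values are lists covering every row of the table.
one_item_with_lists = [
    {
        "Part": ["PART-A", "PART-B"],      # placeholder values
        "Quantity": ["1", "10"],
        "Price": ["2.50", "1.75"],
    },
]

# Shape you get when looping over each tr and calling get():
# one item per row, each value a plain string.
per_row_items = [
    {"Part": "PART-A", "Quantity": "1", "Price": "2.50"},
    {"Part": "PART-B", "Quantity": "10", "Price": "1.75"},
]

list_csv = to_csv(one_item_with_lists, fields)
row_csv = to_csv(per_row_items, fields)

print(list_csv)  # header plus a single row with whole lists crammed into each cell
print(row_csv)   # header plus one clean row per part
```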

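Note: you need to think about how to handle whitespace and None values. It is worth considering this part carefully and coming up with a simple way to clean up the results.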
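One possible way to keep that cleanup in a single place is a pair of small helpers. This is only a sketch: the names clean_text and parse_quantity are hypothetical, and the choice to fall back to a default instead of raising on non-numeric text is an assumption about what you want in the CSV.

```python
def clean_text(value, default=None):
    """Strip surrounding whitespace; return default for None or blank strings."""
    if value is None:
        return default
    value = value.strip()
    return value if value else default


def parse_quantity(value, default=None):
    """Turn a string such as ' 1,000 ' into an int; return default if it is not numeric."""
    text = clean_text(value)
    if text is None:
        return default
    try:
        # Drop thousands separators before converting.
        return int(text.replace(",", ""))
    except ValueError:
        return default
```

Inside parse() you could then write, for example, parse_quantity(p.css('td.tr-minQty.ptable-param span.desktop::text').get(), 'No quantity'), so a missing or odd cell never stops the spider.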

Updated code

def parse(self, response):
    for p in response.css('tbody#lnkPart > tr'):
        # Minimum quantity: strip whitespace and thousands separators
        quantity = p.css('td.tr-minQty.ptable-param span.desktop::text').get()
        if quantity:
            cleaned_quantity = int(quantity.strip().replace(',', ''))
        else:
            cleaned_quantity = 'No quantity'

        # Unit price: strip surrounding whitespace
        price = p.css('td.tr-unitPrice.ptable-param span::text').get()
        if price:
            cleaned_price = price.strip()
        else:
            cleaned_price = 'No Price'

        yield {
            'Part': p.css('td.tr-mfgPartNumber span::text').get(),
            'Quantity': cleaned_quantity,
            'Price': cleaned_price
        }
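On the column order: the commented-out FEED_EXPORT_FIELDS line in the question's settings uses a colon instead of an assignment, and its names ("parts", "quantity", "price") do not match the keys the spider yields ('Part', 'Quantity', 'Price'). A sketch of what the settings might look like, assuming you want the columns in that order and keep the output file name from the question:

```python
# settings.py (sketch): field names must match the keys yielded by parse()
FEED_EXPORT_FIELDS = ["Part", "Quantity", "Price"]
FEED_FORMAT = "csv"
FEED_URI = "parts.csv"
```

In Scrapy 2.1 and later, FEED_FORMAT and FEED_URI are deprecated in favor of the FEEDS setting, so check the feed exports documentation for the version you have installed.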

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62816350
