Python: problem joining a relative img URL to an absolute URL

Stack Overflow user
Asked on 2018-11-09 21:11:15
Answers: 1 · Views: 101 · Followers: 0 · Votes: 0

I'm having trouble getting my current code to work. I simply concatenate the two parts of the URL, but it doesn't work:

Current relative path (this is what I get from a plain response.xpath call while crawling):

/imagename.jpg

Here is my current code:

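from scrapy.exceptions import CloseSpider
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# MercadoItem is defined in the asker's own project (its import is not shown in the question).

class MercadoSpider(CrawlSpider):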
    name = 'extractor'
    item_count = 0

    rules = {
        # For each item
        Rule(LinkExtractor(allow = (), restrict_xpaths = ('//*[@id="main-container"]/div/div[2]/div[1]/ul/li[7]/a'))),
        Rule(LinkExtractor(allow =(), restrict_xpaths = ('//*[@id="main-container"]/div/div[2]/div[2]/div/div/div/h4/a')),
                            callback = 'parse_item', follow = False)
    }

    def parse_item(self, response):
        ml_item = MercadoItem()
        ml_item['titulo'] = response.xpath('normalize-space(//*[@id="main-container"]/div/div[2]/div[1]/div[2]/h2)').extract_first()
        ml_item['sku'] = response.xpath('normalize-space(//*[@id="main-container"]/div/div[2]/div[1]/div[2]/ul/li[2]/a)').extract()
        ml_item['marca'] = response.xpath('normalize-space(//*[@id="main-container"]/div/div[2]/div[1]/div[2]/ul/li[1]/a)').extract()
        ml_item['tecnologia'] = response.xpath('normalize-space(//*[@id="DetailedSpecs"]/table/tbody/tr[4]/td)').extract_first()
        ml_item['tipo'] = response.xpath('normalize-space(//*[@id="DetailedSpecs"]/table/tbody/tr[3]/td)').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="main-container"]/div/div[2]/div[1]/div[2]/div[1]/span[2])').extract()
        ml_item['color'] = response.xpath('normalize-space(//*[@id="mainC"]/div/div/div/div/ul/li/b)').extract()
        ml_item['potencia'] = response.xpath('normalize-space(//*[@id="ProductReview"]/div/div/div/dl/dd/strong)').extract()
        ml_item['condicion'] = response.xpath('normalize-space(//*[@class="stock in-stock"])').extract_first()
        ml_item['desc_corta'] = response.xpath('normalize-space(//*[@id="tab-additional_information"])').extract()
        ml_item['descripcion'] = response.xpath('normalize-space(//*[@id="main-container"]/div/div[2]/div[2]/div)').extract()
        ml_item['id_publicacion'] = response.xpath('normalize-space(//*[@id="mainC"]/div/div/div[11]/div[1]/ul/li[1]/b)').extract()
        # product images
        xpath1 = 'http://www.website.com.ar'
        xpath2 = response.xpath('//*[@id="main-container"]/div/div[2]/div[1]/div[1]/p/img/@src').extract_first()
        ml_item['image_urls'] = xpath1 + xpath2
        ml_item['image_name'] = response.xpath('//*[@id="main-container"]/div/div[2]/div[1]/div[1]/p/img/@src').extract()
        # store or seller info
        ml_item['categoria'] = response.xpath('normalize-space(//*[@class="woocommerce-breadcrumb breadcrumbs"])').extract_first()
        self.item_count += 1
        if self.item_count > 10000:
            raise CloseSpider('item_exceeded')
        yield ml_item

1 Answer

Stack Overflow user

Accepted answer

Posted on 2018-11-10 05:12:22

Try:

absolute_url = response.urljoin(your_url_from_xpath)

Scrapy documentation
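To illustrate what the join does, here is a minimal sketch using only the standard library. Scrapy's response.urljoin resolves a possibly relative URL against the response's own URL in the same way as urllib.parse.urljoin (for HTML responses it also takes a <base> tag into account). The helper name absolute_image_urls and the page URL below are hypothetical, chosen only for this example.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve each src value against page_url, skipping empty or missing values."""
    return [
        urljoin(page_url, src.strip())
        for src in srcs
        if src and src.strip()
    ]


# Hypothetical product page URL; in a spider this would be response.url.
page_url = "http://www.website.com.ar/producto/ejemplo/"

# A root-relative path, a page-relative path, an already absolute URL, and a missing value.
srcs = ["/imagename.jpg", "img/foto.jpg", "https://cdn.example.com/a.jpg", None]

for url in absolute_image_urls(page_url, srcs):
    print(url)
```

Inside the spider, the equivalent would probably be ml_item['image_urls'] = [response.urljoin(src)], guarded against src being None. Note the list: Scrapy's images pipeline expects image_urls to be a list of URLs, so assigning a single string there may be part of why the original code failed.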

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/53233318
