Q: Get data from inner page and merge with current page

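Stack Overflow user

Asked on 2011-10-30 12:11:30

1 answer · 90 views · 0 followers · 0 votes

My HTML page has a two-column table: the first column is a name and the second column is a link to a page that contains a date. I want to download the linked page, extract the date, and pull it up next to the name, so that the output has both the name and the date. For example, the first page contains: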

<table>
      <tr>
         <td>A</td>
         <td>http://something.com/2564.html</td>
      </tr>
</table>

The page 2564.html contains:

<body>
     <p>the date is: 25 April 2009</p>
</body>

How can I get the following output?

<xml>
     <row>
         <name>A</name>
         <date>25 April 2009</date>
     </row>
</xml>

python

web-crawler

scrapy

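1 Answer

Stack Overflow user

Answered on 2011-11-01 13:14:05

My approach is to create an item, fill it with the data available on the first page, and then issue a request for the page that holds the missing data, passing the item along in the request's meta. When the second page has been downloaded, I take the item back out of meta and fill in the remaining data: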

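from scrapy import Field, Item, Request

class PageItem(Item):
    # Hypothetical item class: Scrapy items only accept declared fields.
    firstdata = Field()
    otherdata = Field()

# Both callbacks below are methods of the spider class.
def parseItem(self, response):
    '''Get data from the first page.'''
    item = PageItem()
    item['firstdata'] = '???'
    ...
    otherDataPageLink = '???'
    # Carry the partially filled item to the next callback via meta.
    yield Request(otherDataPageLink, meta={'item': item}, callback=self.parseComments)

def parseComments(self, response):
    '''Get all the other data from the second page.'''
    item = response.meta['item']
    item['otherdata'] = '???'
    yield item  # return the item with all the data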
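
To see the same hand-off applied to the exact pages from the question, here is a minimal standard-library sketch. It is not Scrapy: the `PAGES` dictionary stands in for the network, the index URL `http://something.com/index.html` is made up for illustration, and `crawl`, `parse_index`, `parse_detail` and `to_xml` are hypothetical helper names. The regular expressions assume the markup is as simple as in the question. The point is only to show a half-filled item travelling in `meta` from the first callback to the second.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for the network: URL -> HTML, using the pages from the question.
# The index URL is an assumption; the question does not name the first page.
PAGES = {
    "http://something.com/index.html": (
        "<table><tr><td>A</td>"
        "<td>http://something.com/2564.html</td></tr></table>"
    ),
    "http://something.com/2564.html": (
        "<body><p>the date is: 25 April 2009</p></body>"
    ),
}

ROW_RE = re.compile(r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>", re.S)
DATE_RE = re.compile(r"the date is:\s*([^<]+)")


def parse_index(html, meta):
    """First page: start one item per table row, then request the linked page."""
    for name, link in ROW_RE.findall(html):
        item = {"name": name.strip()}
        # The follow-up request carries the half-filled item in its meta dict.
        yield ("request", link.strip(), {"item": item}, parse_detail)


def parse_detail(html, meta):
    """Second page: complete the item that arrived in meta."""
    item = meta["item"]
    match = DATE_RE.search(html)
    item["date"] = match.group(1).strip() if match else ""
    yield ("item", item)


def crawl(start_url):
    """Tiny driver: run callbacks, follow requests, collect finished items."""
    queue = [(start_url, {}, parse_index)]
    items = []
    while queue:
        url, meta, callback = queue.pop(0)
        for result in callback(PAGES[url], meta):
            if result[0] == "request":
                queue.append(result[1:])  # (url, meta, callback)
            else:
                items.append(result[1])
    return items


def to_xml(items):
    """Serialize the items in the layout shown in the question."""
    root = ET.Element("xml")
    for item in items:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "name").text = item["name"]
        ET.SubElement(row, "date").text = item["date"]
    return ET.tostring(root, encoding="unicode")


items = crawl("http://something.com/index.html")
print(to_xml(items))
```

Running it prints `<xml><row><name>A</name><date>25 April 2009</date></row></xml>`. In a real spider, Scrapy's scheduler plays the role of `crawl`, and its selectors would be a more robust choice than regular expressions for pulling out the cells and the date.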

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei engine for the IT domain.

Original link:

https://stackoverflow.com/questions/7944802
