Hi, I have a website's home page that I'm reading with cURL, and I need to grab the number of pages the site has.
The information is in a div:
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">…</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
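</div>
The value I need is 15. It could be any number depending on the site, but it will always be in the same position.
How can I easily read this value in PHP and assign it to a variable?
Thanks,
Jonathan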
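Posted on 2009-10-20 14:41:46
You can use PHP's DOM module for this. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query the document for all span elements that have the attribute class="page-numbers".
(Edit: oh, that's not what you're looking for; see the second code snippet.)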
$html = '<html><head><title>:::</title></head><body>
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">…</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>
</body></html>';
$doc = new DOMDocument;
// since the content "is already here" we use loadhtml(content)
// instead of loadhtmlfile(url)
$doc->loadhtml($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//span[@class="page-numbers"]');
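echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';
Edit: does the
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
(the second-to-last a element) always point to the last page, i.e. does this link contain the value you're looking for?
If so, you can use an XPath expression that selects the second-to-last a element and then its child span element.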
//div[@class="pager"] <- select each <div> whose class attribute equals "pager"
//div[@class="pager"]/a <- select each <a> that is a direct child of the pager div
//div[@class="pager"]/a[position()=last()-1] <- select the <a> that is second to last
//div[@class="pager"]/a[position()=last()-1]/span <- select the direct child <span> of that second-to-last <a> element in the pager <div>
(You might want to pick up a good XPath tutorial ;-)
$doc->loadhtml($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
if ( 0 < $nodelist->length ) {
echo $nodelist->item(0)->nodeValue;
}
else {
echo 'not found';
}
Posted on 2009-10-20 14:50:06
This is something you'd probably want to do with XPath, which requires loading the page as a DOM document object:
$domDoc = new DOMDocument();
$domDoc->loadHTMLFile("http://path/to/yourfile.html");
$xp = new DOMXPath($domDoc);
$nodes = $xp->query("//xpath/to/relevant/node");
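// DOMXPath::query() returns a DOMNodeList; take the text of the first match, if any
$value = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
It's been a while since I wrote any good XPath, so you'll need to do some reading to work out that part, but it shouldn't be too hard.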
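To make the placeholder query above concrete, here is a minimal sketch that works on an HTML string (for example, the string returned by curl_exec() when CURLOPT_RETURNTRANSFER is set). The function name extractLastPage is hypothetical, and the sketch assumes the pager always ends with a "next" link, so the second-to-last a element is the last page; on the final page of a real site that assumption may not hold.

```php
<?php
// Hypothetical helper, not part of the original answers.
// Returns the last page number from the pager, or null if it cannot be found.
function extractLastPage(string $html): ?int
{
    if ($html === '') {
        return null; // nothing to parse
    }

    // Real-world HTML is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: the second-to-last <a> in the pager links to the last page.
    $nodes = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $text = trim($nodes->item(0)->nodeValue);
    return ctype_digit($text) ? (int) $text : null;
}

// Sample markup shortened from the question.
$html = '<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2"><span class="page-numbers">2</span></a>
<a href="/users?page=15"><span class="page-numbers">15</span></a>
<a href="/users?page=2"><span class="page-numbers next"> next</span></a>
</div>';

$lastPage = extractLastPage($html);
echo $lastPage === null ? 'not found' : $lastPage, PHP_EOL;
```

Returning null rather than 0 keeps "pager not found" distinguishable from a real page count.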
Posted on 2009-10-20 14:50:55
Maybe:
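// Assumes $dom is a DOMDocument that has already loaded the page
$nodes = $dom->getElementsByTagName("span");
$maxPageNum = 0;
foreach ($nodes as $node)
{
    // Attributes are read with getAttribute(), inner text with nodeValue
    if ($node->getAttribute("class") == "page-numbers" && (int) $node->nodeValue > $maxPageNum)
    {
        $maxPageNum = (int) $node->nodeValue;
    }
}
I don't know PHP, so reading a DOM node's class or inner text may not be exactly this easy, but there must be some way to get that information, and the approach here should work.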
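As a self-contained variant of the scan above, the following sketch takes the largest numeric page-numbers span instead of relying on element position. The function name maxPageNumber is hypothetical. Unlike an exact class comparison, it also accepts spans with extra classes (such as "page-numbers current") and skips non-numeric ones (the dots and "next" spans).

```php
<?php
// Hypothetical helper, not part of the original answers.
// Returns the highest page number shown in any "page-numbers" span, or 0 if none.
function maxPageNumber(string $html): int
{
    if ($html === '') {
        return 0; // nothing to parse
    }

    // Collect libxml warnings instead of printing them for imperfect HTML.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $max = 0;
    foreach ($dom->getElementsByTagName('span') as $span) {
        // Split the class attribute into tokens so "page-numbers current" also matches.
        $classes = preg_split('/\s+/', trim($span->getAttribute('class')));
        $text = trim($span->nodeValue);
        if (in_array('page-numbers', $classes, true) && ctype_digit($text)) {
            $max = max($max, (int) $text);
        }
    }
    return $max;
}

// Sample markup shortened from the question.
$sample = '<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2"><span class="page-numbers">2</span></a>
<a href="/users?page=15"><span class="page-numbers">15</span></a>
<a href="/users?page=2"><span class="page-numbers next"> next</span></a>
</div>';

echo maxPageNumber($sample), PHP_EOL;
```

This does not depend on a "next" link being present, at the cost of visiting every span in the document.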
https://stackoverflow.com/questions/1595072