I'm trying to get the content of a node in a parsed web page. Here is my code:
include('simplehtmldom_1_5/simple_html_dom.php');
// get DOM from URL or file
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
$html = file_get_html($feedUrl);
$xpath = "/html/body/div[5]/div[1]/div[1]/div[1]/div[5]/div[3]/div[1]/div[1]/div[1]/div[1]/a[1]/div[1]/div[1]/div[3]/div[1]/div[2]/h3[1]/div[1]/a[1]";
foreach($html->find($xpath) as $e)
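    echo $e->title . '<br>';
In this example, I'm trying to get the name "Tax Experience CPA Inc" from the page. The problem is that the array returned by find($xpath) is always empty. When I open Google and search for the node with that XPath, I find exactly the node I want, but the same path does not work in my code. Something must be wrong with the path I'm using, but I can't tell what. I have searched and searched and can't find what I'm doing wrong. Please help.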
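Posted on 2013-12-21 12:26:40
This site has many nodes with ids and classes. Use them to build a shorter, simpler XPath expression that retrieves what you want!
Here is your working code: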
// includes Simple HTML DOM Parser
include "simple_html_dom.php";
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL or file
$html->load_file($feedUrl);
// Find all anchors
$anchors = $html->find("//div[@class='srp-business-name']/a");
// Display all titles
foreach($anchors as $a)
    echo $a->title . '<br>';
Output:
Tax Experience CPA Inc
Bernice Hassan CPA Accounting & Tax Services
Begosh Tax Service CPA
At-Home CPA Tax Service
CPA Financial & Tax Service
My Tax CPA
...
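To show why a class-based expression holds up better than an absolute path copied from the browser, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The inline markup is a hypothetical stand-in for the real page (its actual structure is an assumption), and the absolute path is shortened for illustration.

```php
<?php
// Hypothetical stand-in markup; the real page's structure is an assumption.
$markup = <<<HTML
<html><body>
  <div class="wrapper">
    <div class="srp-business-name"><a href="/a" title="Tax Experience CPA Inc">Tax Experience CPA Inc</a></div>
    <div class="srp-business-name"><a href="/b" title="My Tax CPA">My Tax CPA</a></div>
  </div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Class-based expression: matches wherever the listing sits in the tree.
$titles = [];
foreach ($xpath->query("//div[@class='srp-business-name']/a") as $a) {
    $titles[] = $a->getAttribute('title');
}

// Absolute, position-based expression: it only matches if every step
// lines up with the markup the parser actually received.
$brittle = $xpath->query('/html/body/div[5]/div[1]/a[1]');

echo implode("\n", $titles), "\n";
echo "absolute path matches: ", $brittle->length, "\n";
```

With this sample markup the class-based query finds both anchors and the absolute path finds none, because body has no fifth div. A path copied from the browser can fail the same way when the served HTML differs from the DOM the browser displays.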
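Edit:
Here is modified code that gets the title and the phone number from each "element/div".
Note that find("...", $index) returns the single element specified by $index (the Nth match, counting from 0); if $index is not set, it returns an array of elements.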
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a URL or file
$html->load_file($feedUrl);
// Find all elements
$divs = $html->find('div.business-container-inner');
// loop through all elements and display the useful parts
foreach($divs as $div) {
    $title = $div->find('div.srp-business-name a', 0)->title;
    $phone = $div->find('span.business-phone', 0)->plaintext;
    echo $title ." - ". $phone . "<br>";
}
// Clear DOM object
$html->clear();
unset($html);
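The loop above assumes every container has both a name link and a phone number. As far as I know, find("...", 0) returns null when nothing matches, so a listing without a phone would trigger an error on ->plaintext. Here is a minimal sketch of the same per-container lookup with a guard, written with PHP's built-in DOMXPath so it runs without the library. The markup and phone number are hypothetical sample data, and item(0) plays the role of the $index argument.

```php
<?php
// Hypothetical sample data: the second listing deliberately has no phone.
$markup = <<<HTML
<html><body>
  <div class="business-container-inner">
    <div class="srp-business-name"><a title="Tax Experience CPA Inc">Tax Experience CPA Inc</a></div>
    <span class="business-phone">(555) 010-0001</span>
  </div>
  <div class="business-container-inner">
    <div class="srp-business-name"><a title="My Tax CPA">My Tax CPA</a></div>
  </div>
</body></html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query("//div[@class='business-container-inner']") as $container) {
    // The leading "." makes each query relative to the current container.
    // item(0) returns the first match, or null when there is none.
    $link  = $xpath->query(".//div[@class='srp-business-name']/a", $container)->item(0);
    $phone = $xpath->query(".//span[@class='business-phone']", $container)->item(0);

    $rows[] = [
        'title' => $link !== null ? $link->getAttribute('title') : '(no title)',
        'phone' => $phone !== null ? trim($phone->textContent) : '(no phone)',
    ];
}

foreach ($rows as $row) {
    echo $row['title'], ' - ', $row['phone'], "\n";
}
```

The same guard carries over to Simple HTML DOM: store the result of find("...", 0) in a variable and check it against null before reading ->title or ->plaintext.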
Posted on 2013-12-21 13:19:50
I think you should try this.
include('simplehtmldom_1_5/simple_html_dom.php');
// get DOM from URL or file
$feedUrl = "http://www.yellowpages.com/md/cpa-tax?menu_search=false&page=1&refinements%5Bfacet_clicked%5D=HeadingText&refinements%5Bheadingtext%5D%5B%5D=Accountants-Certified+Public&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation&refinements%5Bheadingtext%5D%5B%5D=Tax+Return+Preparation-Business";
$html = new simple_html_dom();
$html->load_file($feedUrl);
$xpath = ".srp-business-name a";
foreach($html->find($xpath) as $e)
    echo $e->title . '<br>';
https://stackoverflow.com/questions/20716713