This was my first attempt, but it did not work.
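// Fetch the page and keep the resulting crawler.
$this->crawler = $client->request('GET', $this->url);

// Build a new document and copy the crawler's first node into it.
$document = new \DOMDocument('1.0', 'UTF-8');
$root = $document->appendChild($document->createElement('_root'));
$this->crawler->rewind();
$root->appendChild($document->importNode($this->crawler->current(), true));

// Try to remove every element matching these selectors.
$selectorsToRemove = ['script', 'p'];
foreach ($selectorsToRemove as $selector) {
    $crawlerInverse = $this->crawler->filter($selector);
    foreach ($crawlerInverse as $elementToRemove) {
        $parent = $elementToRemove->parentNode;
        $parent->removeChild($elementToRemove);
    }
}

// Replace the crawler's contents with the new document.
$this->crawler->clear();
$this->crawler->add($document);

I want to get the "p" tags from this page, but the paragraphs contain some JS, so when I call $node->text() I get both the text and the JS from the "script" inside the "p". The structure looks like this: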
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<script>
"JS CODE"
</script>
</p>

So I only want the Lorem ipsum text.
Posted on 2014-12-20 12:04:16
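I took a look at DomCrawler and did not see the point of it. It seems to be just a wrapper around the DOM extension, which is already easy to use, so I am going to take a shortcut and use that directly.
This example is short and simple; you should be able to adapt it more or less as-is. You already have a DOMDocument ready.
Example: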
$html = <<<'HTML'
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<script>
"JS CODE"
</script>
</p>
HTML;
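// loadXML() works here because the sample is well-formed XML.
$dom = new DOMDocument();
$dom->loadXML($html);

// Select every <script> that is a direct child of a <p> and detach it.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//p/script') as $node) {
    $node->parentNode->removeChild($node);
}

echo $dom->saveXML();

Output: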
<?xml version="1.0"?>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</p>

https://stackoverflow.com/questions/27579469
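
The example above relies on loadXML(), which only accepts well-formed XML, and it prints the whole document rather than just the paragraph text. As a minimal sketch, assuming the real page is ordinary HTML and that only the text of each paragraph is wanted, the same removal step could be combined with loadHTML() and textContent. The function name paragraphTexts and the sample markup are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of
// every <p> in an HTML string, ignoring any <script> nested inside it.
function paragraphTexts(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely well-formed XML, so use the lenient HTML
    // parser and collect its warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Detach every <script> found at any depth inside a <p>.
    foreach ($xpath->query('//p//script') as $script) {
        $script->parentNode->removeChild($script);
    }

    $texts = [];
    foreach ($xpath->query('//p') as $p) {
        // Collapse whitespace runs (including the gap left by the removed
        // node) into single spaces and trim the ends.
        $texts[] = trim(preg_replace('/\s+/', ' ', $p->textContent));
    }

    return $texts;
}

// Sample input, assumed for illustration only.
$html = '<html><body><p>Lorem ipsum dolor sit amet.<script>"JS CODE"</script></p></body></html>';
print_r(paragraphTexts($html));
```

One caveat with this sketch: loadHTML() does not assume UTF-8 unless the markup declares its charset, so non-ASCII pages may need a charset declaration or a conversion step before parsing.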