How can I parse nested HTML tags like these:
<article class="tile">
<div class="tile-content">
<a href=link-1">ignore</a>
<div class="tile-content__text tile-content__text--arrow-white">
<label class="label-date label-date--blue">01.12.2021</label>
<h4><a class="link-color-black" href="link-1">title-1</a></h4>
<p class="tile-content__paragraph tile-content__paragraph--gray pd-ver-10">
content-1
</p>
</div>
<a href="link-1" class="btn btn-link btn-link__more btn-link--arrow-right float-right">more</a>
</div>
</article>
<article class="tile">
<div class="tile-content">
<a href=link-1">ignore</a>
<div class="tile-content__text tile-content__text--arrow-white">
<label class="label-date label-date--blue">02.12.2021</label>
<h4><a class="link-color-black" href="link-2">title-2</a></h4>
<p class="tile-content__paragraph tile-content__paragraph--gray pd-ver-10">
content-2
</p>
</div>
<a href="link-2" class="btn btn-link btn-link__more btn-link--arrow-right float-right">more</a>
</div>
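</article>

so that they end up in an array like this:

$parsedArray = [
    0 => [
        'title' => 'title-1',
        'link' => 'link-1',
        'date' => '2021-12-01',
        'content' => 'content-1',
    ],
    1 => [
        'title' => 'title-2',
        'link' => 'link-2',
        'date' => '2021-12-02',
        'content' => 'content-2',
    ],
    // ...
];

I am using an XPath query as shown below, but this strips all the tags and leaves me with only the combined text of every tag. I need the information from each tag separately. Any suggestions?
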
$dom = new DOMDocument();
$dom->loadHTML($html['html']);
$xpath = new DOMXPath($dom);
$nodelist = $xpath->query("//article[contains(@class, 'tile')]");
foreach ($nodelist as $n) {
    echo '<pre>';
    var_dump($n);
    echo '</pre>';
}

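Posted on 2021-12-03 02:10:07

var_dump doesn't parse the DOM :)

You just need to query again for the elements inside each tile, then assign them to an array.

If it matters, allocate a working item array to define the structure; otherwise just build the result directly.
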
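One detail of querying inside a tile is worth illustrating: an XPath expression that starts with `//` searches the whole document even when a context node is passed, while `.//` searches only below that node. The following is a minimal sketch on a hypothetical two-tile input (the `$absolute` and `$relative` variable names are illustrative, not part of the question):

```php
<?php
// Hypothetical two-tile input, reduced to the parts that matter here.
$html = '<article class="tile"><h4><a href="link-1">title-1</a></h4></article>'
      . '<article class="tile"><h4><a href="link-2">title-2</a></h4></article>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence libxml warnings about HTML5 tags
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$absolute = [];
$relative = [];
foreach ($xpath->query("//article[contains(@class, 'tile')]") as $tile) {
    // "//h4/a" ignores the context node and searches the whole document.
    $absolute[] = $xpath->query("//h4/a", $tile)->item(0)->nodeValue;
    // ".//h4/a" searches only the descendants of the current tile.
    $relative[] = $xpath->query(".//h4/a", $tile)->item(0)->nodeValue;
}

print_r($absolute); // both entries are title-1
print_r($relative); // title-1, then title-2
```

With a single tile the two forms return the same node, so the difference only shows up once the document contains several tiles.
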
<?php
$str = '<article class="tile">
<div class="tile-content">
<a href=link-1">ignore</a>
<div class="tile-content__text tile-content__text--arrow-white">
<label class="label-date label-date--blue">02.12.2021</label>
<h4><a class="link-color-black" href="link-2">title-2</a></h4>
<p class="tile-content__paragraph tile-content__paragraph--gray pd-ver-10">
content-2
</p>
</div>
<a href="link-2" class="btn btn-link btn-link__more btn-link--arrow-right float-right">more</a>
</div>
</article>';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$result = [];
foreach ($xpath->query("//article[contains(@class, 'tile')]") as $tile) {
    // define item structure
    $item = [
        'title' => '',
        'link' => '',
        'date' => '',
        'content' => ''
    ];

    // find date (the leading "." keeps the search inside the current tile)
    $query = $xpath->query(".//label[contains(@class, 'label-date')][1]", $tile);
    if ($query->length) {
        $item['date'] = $query->item(0)->nodeValue;
    }

    // find link/title
    $query = $xpath->query(".//h4/a[1]", $tile);
    if ($query->length) {
        $item['link'] = $query->item(0)->getAttribute('href');
        $item['title'] = $query->item(0)->nodeValue;
    }

    // find content
    $query = $xpath->query(".//p[contains(@class, 'tile-content__paragraph')][1]", $tile);
    if ($query->length) {
        $item['content'] = $query->item(0)->nodeValue;
    }

    // assign
    $result[] = $item;

    // cleanup
    unset($item, $query);
}
print_r($result);

Output:

Array
(
[0] => Array
(
[title] => title-2
[link] => link-2
[date] => 02.12.2021
[content] =>
content-2
)
)

https://stackoverflow.com/questions/70208171
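
The output above keeps the date as 02.12.2021 and the content with its surrounding line breaks, whereas the question asks for dates like 2021-12-02. A possible post-processing step is sketched below. It assumes the dates always arrive as dd.mm.YYYY, and `normalizeDate` is a hypothetical helper name, not part of the original answer:

```php
<?php
// Hypothetical helper: convert "dd.mm.YYYY" to "YYYY-mm-dd".
// Returns the trimmed input unchanged if it cannot be parsed.
function normalizeDate(string $raw): string
{
    // The leading "!" resets all unparsed fields (time of day) to zero.
    $date = DateTime::createFromFormat('!d.m.Y', trim($raw));
    return $date === false ? trim($raw) : $date->format('Y-m-d');
}

// An item as produced by the extraction loop, before clean-up.
$item = [
    'title' => 'title-2',
    'link' => 'link-2',
    'date' => '02.12.2021',
    'content' => "\ncontent-2\n",
];

$item['date'] = normalizeDate($item['date']);
$item['content'] = trim($item['content']); // drop the line breaks around the text

print_r($item);
```

Inside the loop, the same two lines could be applied to `$item` just before it is appended to `$result`.
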