WordPress: Generate a table of contents from the headings in the content

Stack Overflow user

Asked 2022-02-01 09:00:42

Answers 1 | Views 308 | Followers 0 | Votes -1

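I want to generate a table of contents from the headings in my post.

I have already found a solution that grabs all headings from the content and replaces the <h2> tags with <a> tags.

The problem is that I also need to turn the <h3> tags into links and show them in a nested list of links.

My result should look like this: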

<ul>
    <li><a href="#h2-1">I was a H2 headline</a></li>
    <li>
        <a href="#h2-2">Also a H2 headline</a>
        <ul>
            <li><a href="#h3-1">H3 headline</a></li>
            <li><a href="#h3-2">Another H3 headline</a></li>
        </ul>
    </li>
</ul>

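My problem is that some headings may have a class="" attribute while others don't. At the moment I simply strip every class="" attribute. That is not the best solution, but it works for me and for my limited understanding of regex.

The code below is the function I use to get every heading from the content.

I first fetch the content of the post and store it in $content.

From there, I collect all headings (<h2> to <h6>) and store them in $results with the following line: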

preg_match_all('#<h[2-6]*[^>]*>.*?<\/h[2-6]>#',$content,$results);

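At the moment I only handle <h2> headings, because I don't know how to do this in a sensible way and I would have to repeat the following lines for every heading level: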

$toc = str_replace('<h2','<li><a',$toc);
$toc = str_replace('</h2>','</a></li>',$toc);

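But my biggest problem is nesting the headings. How can I generate HTML like the example above?

Just as important: how do I handle these different heading formats: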

<h2 class="style" id="name">
<h2 id="name" class="style">
<h2 id="name">

Here is my current code:

$content_postid = get_the_ID();
$content_post   = get_post($content_postid);
$content        = $content_post->post_content;
$content        = apply_filters('the_content', $content);
$content        = str_replace(']]>', ']]&gt;', $content);

preg_match_all('#<h[2-6]*[^>]*>.*?<\/h[2-6]>#',$content,$results);

$toc = implode("\n",$results[0]);

// This part is messy because I don't really understand regex :-(
$toc = preg_replace('/class=".*?"/', '', $toc);
$toc = str_replace('<strong>','',$toc);
$toc = str_replace('</strong>','',$toc);
$toc = str_replace('<h2','<li><a',$toc);
$toc = str_replace('</h2>','</a></li>',$toc);
$toc = str_replace('id="','href="#',$toc);

//plug the results into appropriate HTML tags
$toc = '<div id="toc">
<ul class="list-unstyled">
'.$toc.'
</ul>
</div>';

echo $toc;

This is my current output (as you can see, only <h2> headings):

<ul class="list-unstyled">
    <li><a href="#h2-1">I was a H2 headline</a></li>
    <li><a href="#h2-2">Also a H2 headline</a></li>
</ul>

Edit: here is a sample of the HTML inside $content

<p>Lorem ipsum dolor sit amet...</p>
<p>consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat</p>
<img src="/path/to/image.jpg" />
<h2 class="style" id="name">
<p>Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat</p>
<p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat</p> 
<h3 class="style" id="name">Headline 3</h3>
<p>vel illum dolore eu feugiat nulla facilisis at vero et accumsan et iusto odio dignissim qui</p>
<h3 class="style" id="name">On more Headline 3</h3>
<p>blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi</p>
<h2 id="name" class="style">Headline 2 with class</h2>
<p>Nam liber tempor cum soluta nobis eleifend option congue nihil imperdiet</p>
<h2 id="name">Another Headline 2 without class</h2>
<p>doming id quod mazim placerat facer possim assum</p>

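Edit 2:

I found a function that looks right (linked in the original post), but I can't get it to work.

I also found a function that uses DOMDocument (also linked in the original post), which I'm testing right now. At the moment, it outputs the entire content.

Here is the code: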

$doc = new DOMDocument();
$doc->loadHTML($code);

// create document fragment
$frag = $doc->createDocumentFragment();
// create initial list
$frag->appendChild($doc->createElement('ol'));
$head = &$frag->firstChild;
$xpath = new DOMXPath($doc);
$last = 1;

// get all H1, H2, …, H6 elements
foreach ($xpath->query('//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]') as $headline) {
    // get level of current headline
    sscanf($headline->tagName, 'h%u', $curr);

    // move head reference if necessary
    if ($curr < $last) {
        // move upwards
        for ($i=$curr; $i<$last; $i++) {
            $head = &$head->parentNode->parentNode;
        }
    } else if ($curr > $last && $head->lastChild) {
        // move downwards and create new lists
        for ($i=$last; $i<$curr; $i++) {
            $head->lastChild->appendChild($doc->createElement('ol'));
            $head = &$head->lastChild->lastChild;
        }
    }
    $last = $curr;

    // add list item
    $li = $doc->createElement('li');
    $head->appendChild($li);
    $a = $doc->createElement('a', $headline->textContent);
    $head->lastChild->appendChild($a);

    // build ID
    $levels = array();
    $tmp = &$head;
    // walk subtree up to fragment root node of this subtree
    while (!is_null($tmp) && $tmp != $frag) {
        $levels[] = $tmp->childNodes->length;
        $tmp = &$tmp->parentNode->parentNode;
    }
    $id = 'sect'.implode('.', array_reverse($levels));
    // set destination
    $a->setAttribute('href', '#'.$id);
    // add anchor to headline
    $a = $doc->createElement('a');
    $a->setAttribute('name', $id);
    $a->setAttribute('id', $id);
    $headline->insertBefore($a, $headline->firstChild);
}

// append fragment to document
$doc->getElementsByTagName('body')->item(0)->appendChild($frag);

// echo markup
echo $doc->saveHTML();

Tags: wordpress, replace, php, html, regex

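1 Answer

Stack Overflow user

Accepted answer

Answered 2022-02-01 20:55:45

Here is an approach that uses only the DOM to parse the HTML source and extract the relevant information. The result is built up as a string.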

libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$nodes = $xp->query('//*[contains("h1 h2 h3 h4 h5 h6", name())]');

$currentLevel = ['level' => 0 /*, 'count' => 0*/ ];
$stack = [];
$format = '<li><a href="#%s">%s</a></li>';
$result = '';

foreach($nodes as $node) {
    $level = (int)$node->tagName[1]; // extract the digit after h
  
    while($level < $currentLevel['level']) {
        $currentLevel = array_pop($stack);
        $result .= '</ul>';
    }
    
    if ($level === $currentLevel['level']) {
        $currentLevel['count']++;
    } else {
        $stack[] = $currentLevel;
        $currentLevel = ['level' => $level, 'count' => 1];
        $result .= '<ul>';
    }

    $result .= sprintf($format, $node->getAttribute('id'), $node->nodeValue);    
}

$result .= str_repeat('</ul>', count($stack));


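To build the expected tree structure step by step, this code uses a stack (FILO) that stores an array holding the level (the digit after h) and the number of nodes already added at that level. When the current node's level is higher than the previous one (a larger digit, i.e. a deeper heading), the array is pushed onto the stack and a new <ul> is opened. When the current node's level is lower than the previous one, entries are popped off the stack, closing one <ul> each time, until the restored level is no longer higher than the current node's. When the current and previous nodes have the same level, the stack stays unchanged and the count item in the array is incremented.

After the main loop, the code counts the items remaining on the stack so that every <ul> tag still open is closed correctly.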
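
One thing to be aware of: the loop above closes each <li> immediately, so a deeper <ul> ends up as a direct child of the outer <ul> rather than inside the <li> of its parent heading, as the question's target markup has it. The following is a minimal sketch of one way to get that nesting, not the answer's own code. The function name buildNestedToc is hypothetical, the XML prolog passed to loadHTML() is a commonly used workaround assumed here to make the input be read as UTF-8, and headings that skip a level are simply treated as siblings.

```php
<?php
/**
 * Hypothetical helper (not from the answer): builds a nested <ul> list
 * in which every sub-list sits inside the <li> of its parent heading.
 */
function buildNestedToc(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    // Assumption: the prolog makes loadHTML() treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $nodes = $xp->query('//*[self::h2 or self::h3 or self::h4 or self::h5 or self::h6]');

    $stack = [];   // heading levels of the lists that are currently open
    $result = '';

    foreach ($nodes as $node) {
        $level = (int) $node->tagName[1]; // digit after "h"

        if ($stack === [] || $level > end($stack)) {
            // Deeper heading: open a new list inside the still-open <li>.
            $result .= '<ul>';
            $stack[] = $level;
        } else {
            // Shallower heading: close lists until we are back at its depth.
            while (count($stack) > 1 && $level < end($stack)) {
                array_pop($stack);
                $result .= '</li></ul>';
            }
            $stack[count($stack) - 1] = $level; // sibling in the current list
            $result .= '</li>';                 // close the previous sibling
        }

        // The <li> is left open so that a sub-list can still be nested in it.
        $result .= sprintf(
            '<li><a href="#%s">%s</a>',
            htmlspecialchars($node->getAttribute('id')),
            htmlspecialchars(trim($node->textContent))
        );
    }

    // Close every list that is still open.
    return $result . str_repeat('</li></ul>', count($stack));
}

$sample = '<h2 id="h2-1">I was a H2 headline</h2>'
    . '<h2 class="style" id="h2-2">Also a H2 headline</h2>'
    . '<h3 id="h3-1" class="style">H3 headline</h3>'
    . '<h3 id="h3-2">Another H3 headline</h3>';

echo buildNestedToc($sample), "\n";
```

Because the id is read with getAttribute(), the order of the class and id attributes on a heading does not matter, and no class stripping is needed.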

XPath query details:

 //*        [contains("h1 h2 h3 h4 h5 h6", name())]
|___|      |_______________________________________|
location   predicate
path

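Location path

// anywhere in the DOM tree, starting from the current location (by default the root)
* any element node

Predicate

name() returns the name of the current element
contains(haystack, needle) returns true if haystack contains needle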
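
As a small illustration of the query described above, the sketch below runs it against a made-up snippet (the sample HTML and the helper name headingSummary are assumptions for this example only) and compares it with an explicit self:: test like the one in the question's DOMDocument code. Since contains() is a substring test, it would also match an element whose name is any other substring of the haystack, such as a stray <h> element. The self:: form avoids that.

```php
<?php
$html = <<<'HTML'
<p>Intro</p>
<h2 class="style" id="first">First</h2>
<h3 id="second" class="style">Second</h3>
<h2 id="third">Third</h2>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

/**
 * Hypothetical helper: returns "tag#id" for every node matched by $query,
 * in document order.
 */
function headingSummary(DOMXPath $xp, string $query): array
{
    $found = [];
    foreach ($xp->query($query) as $node) {
        $found[] = $node->tagName . '#' . $node->getAttribute('id');
    }
    return $found;
}

// Substring test on the element name, as in the answer.
$viaContains = headingSummary($xp, '//*[contains("h1 h2 h3 h4 h5 h6", name())]');

// Explicit test for the element names that are wanted.
$viaSelf = headingSummary($xp, '//*[self::h2 or self::h3]');

print_r($viaContains);
print_r($viaSelf);
```

On this input both queries select the same three headings, and the id is found whether it appears before or after the class attribute.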

Votes: 3

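Original page content provided by Stack Overflow. The Chinese version of the page was translated with support from Tencent Cloud's IT-domain engine.

Original link: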

https://stackoverflow.com/questions/70937827
