Q: Create DOM elements after finding a substring - PHP

Stack Overflow user

Asked 2012-03-19 23:21:25

Answers: 2 | Views: 241 | Followers: 0 | Score: 1

I want to split a string with a regex and create a DOM element wherever I find a match, until the end of the string. Given a string:

$str="hi there! [1], how are you? [2]";

Desired result:

<sentence>
hi there! <child1>1</child1>, how are you? <child2>2</child2>
</sentence>

I am using PHP DOM: $dom = new DOMDocument('1.0'); ...

Creating the root (this may be irrelevant here, but some people complain about lack of effort and so on...):

        $root= $dom->createElement('sentence', null);
        $root= $dom->appendChild($root);
        $root->setAttribute('attr-1', 'value-1');

I have tried several approaches, such as the one below, and some using preg_split:

$counter = 1;
$pos = preg_match('/\[([1-9][0-9]*)\]/', $str);
if ($pos == true) {
    $substr = $dom->createElement('child', $counter);
    $root->appendChild($substr);
    $counter++;
}

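I know the code isn't worth much, but again, it is here to show that I am not just asking for a handout.

Any help would be much appreciated.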

Tags: php, dom, preg-match, preg-split

2 Answers

Stack Overflow user

Accepted answer

Answered 2012-03-19 23:44:16

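Your original code was not that far off. However, you need to make the regular expression also match the text you want to add (and you need a text node for that text). After each match you also need to advance the offset so that matching continues from there: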

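$str = "hi there! [1], how are you? [2]";

$dom = new DOMDocument('1.0');
$root = $dom->createElement('sentence'); // the root needs no text value
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1'); # ...

$counter = 0;
$offset = 0;
// (.*?) captures the text in front of each [n] marker. The s modifier lets it
// span newlines, so every match starts exactly at $offset.
// The flags argument is 0 rather than NULL, which PHP 8.1+ deprecates here.
while (preg_match('/(.*?)\[([1-9][0-9]*)\]/s', $str, $matches, 0, $offset)) {
    list(, $text, $number) = $matches;
    if (strlen($text)) {
        $root->appendChild($dom->createTextNode($text));
    }
    if (strlen($number)) {
        $counter++;
        $root->appendChild($dom->createElement("child$counter", $number));
    }
    // Move past the whole match so the next search continues after it.
    $offset += strlen($matches[0]);
}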

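The while loop is comparable to the if you had; it just turns it into a loop. A text node is added only if some text was actually matched (two markers can sit directly next to each other in the string, in which case the text is empty). Output of this example: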

<?xml version="1.0"?>
<sentence attr-1="value-1">
  hi there! <child1>1</child1>, how are you? <child2>2</child2>
</sentence>
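One limitation of the loop above is that text following the last marker is never appended, because the pattern only matches text that is followed by a marker. A minimal sketch of one way around this, assuming preg_split with PREG_SPLIT_DELIM_CAPTURE is acceptable (the helper name buildSentence is hypothetical and not part of the original answer):

```php
<?php
// Hypothetical helper: builds <sentence> from text containing [n] markers
// and keeps any text that follows the last marker.
function buildSentence(DOMDocument $dom, string $str): DOMElement
{
    $root = $dom->createElement('sentence');
    $dom->appendChild($root);

    // With PREG_SPLIT_DELIM_CAPTURE the captured number is returned between
    // the text pieces: even indexes hold text, odd indexes hold numbers.
    $parts = preg_split('/\[([1-9][0-9]*)\]/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $counter = 0;
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $counter++;
            $root->appendChild($dom->createElement("child$counter", $part));
        } elseif ($part !== '') {
            // Skip the empty strings produced by adjacent or trailing markers.
            $root->appendChild($dom->createTextNode($part));
        }
    }
    return $root;
}

$dom = new DOMDocument('1.0');
$root = buildSentence($dom, 'hi there! [1], how are you? [2] test');
echo $dom->saveXML($root), "\n";
```

Empty pieces are deliberately not filtered with PREG_SPLIT_NO_EMPTY, since dropping them would break the even/odd alternation that tells text and numbers apart.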

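Edit: After playing with this a little, I came to the conclusion that you probably want to split the problem apart. One part is parsing the string; the other is actually inserting the nodes (a text node for text, an element node for a number). Starting from the back, because that immediately looks practical, here is the second part first: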

$dom = new DOMDocument('1.0');
$root = $dom->createElement('sentence'); // passing null as the value is deprecated in PHP 8.1+
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1'); # ...

$str = "hi there! [1], how are you? [2] test";

$it = new Tokenizer($str);
$counter = 0;
foreach ($it as $type => $string) {
    switch ($type) {
        case Tokenizer::TEXT:
            $root->appendChild($dom->createTextNode($string));
            break;

        case Tokenizer::NUMBER:
            $counter++;
            $root->appendChild($dom->createElement("child$counter", $string));
            break;

        default:
            throw new Exception(sprintf('Invalid type %s.', $type));
    }
}

echo $dom->saveXML();

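In this example we do not care about parsing at all. We get either a text or a number ($type), and we decide whether to insert a text node or an element. So however the string is parsed, this code will always work. If something is wrong with it (for example, the $counter scheme stops being adequate), that has nothing to do with how the string is parsed or tokenized.

The parsing itself is encapsulated in an Iterator called Tokenizer. It contains everything needed to break the string into text and number elements. It handles all the details, such as what happens if there is some text after the last number: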

class Tokenizer implements Iterator
{
    const TEXT = 1;
    const NUMBER = 2;
    private $offset = 0;
    private $string;
    private $fetched = array(); // queue of array(type, value) pairs not yet consumed

    public function __construct($string)
    {
        $this->string = $string;
    }

    // Return types match the Iterator interface; "mixed" requires PHP 8.0+.
    public function rewind(): void
    {
        $this->offset = 0;
        $this->fetched = array(); // drop tokens left over from an earlier pass
        $this->fetch();
    }

    // Queue the tokens up to and including the next [n] marker,
    // or the remaining text if no marker is left.
    private function fetch()
    {
        if ($this->offset >= strlen($this->string)) {
            return;
        }
        $result = preg_match('/\[([1-9][0-9]*)\]/', $this->string, $matches, PREG_OFFSET_CAPTURE, $this->offset);
        if (!$result) {
            $this->fetched[] = array(self::TEXT, substr($this->string, $this->offset));
            $this->offset = strlen($this->string);
            return;
        }
        $pos = $matches[0][1]; // byte position of the marker in the whole string
        if ($pos != $this->offset) {
            $this->fetched[] = array(self::TEXT, substr($this->string, $this->offset, $pos - $this->offset));
        }
        $this->fetched[] = array(self::NUMBER, $matches[1][0]);
        $this->offset = $pos + strlen($matches[0][0]);
    }

    public function current(): mixed
    {
        list(, $current) = current($this->fetched);
        return $current;
    }

    // The key is the token type (self::TEXT or self::NUMBER), so keys repeat.
    public function key(): mixed
    {
        list($key) = current($this->fetched);
        return $key;
    }

    public function next(): void
    {
        array_shift($this->fetched);
        if (!$this->fetched) $this->fetch();
    }

    public function valid(): bool
    {
        return (bool)$this->fetched;
    }
}

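That separates the two problems from each other. Instead of an iterator class you could also create an array or something similar, but I find an iterator more useful, so I quickly wrote one.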
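For comparison, the same tokenizing logic can be sketched as a generator function, which needs less boilerplate than a full Iterator class. This is only an illustration under a few assumptions: the function name tokenize and the string keys 'text' and 'number' are hypothetical stand-ins for Tokenizer::TEXT and Tokenizer::NUMBER, and generators require PHP 5.5 or later, so they were not available when the answer was written.

```php
<?php
// Hypothetical generator counterpart of the Tokenizer class.
// Yields 'text' => string and 'number' => string pairs in document order.
function tokenize(string $string): Generator
{
    $offset = 0;
    $length = strlen($string);
    while ($offset < $length) {
        if (!preg_match('/\[([1-9][0-9]*)\]/', $string, $m, PREG_OFFSET_CAPTURE, $offset)) {
            // No marker left: the rest of the string is one text token.
            yield 'text' => substr($string, $offset);
            return;
        }
        $pos = $m[0][1]; // position of the marker within the whole string
        if ($pos > $offset) {
            yield 'text' => substr($string, $offset, $pos - $offset);
        }
        yield 'number' => $m[1][0];
        $offset = $pos + strlen($m[0][0]);
    }
}

foreach (tokenize('hi there! [1], how are you? [2] test') as $type => $value) {
    printf("%s: %s\n", $type, var_export($value, true));
}
```

Because the keys repeat, the tokens are best consumed with foreach; iterator_to_array() with its default key handling would overwrite earlier tokens that share a key.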

Again, this example outputs the XML at the end, so here it is for illustration. Note that I added some text after the last element:

<?xml version="1.0"?>
<sentence attr-1="value-1">
  hi there! <child1>1</child1>, how are you? <child2>2</child2> test
</sentence>

Score: 3

Stack Overflow user

Answered 2012-03-19 23:29:16

First perform the replacement with a regular expression, then parse the document.

$xml = preg_replace('/\[(\d+)\]/', '<child$1>$1</child$1>', $str);
$doc = new DOMDocument('1.0');
$doc->loadXML("<sentence>$xml</sentence>");

Here's a demo.
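One caveat with this approach is that loadXML() fails if the input text contains characters such as & or <. Note also that the element names come from the number inside the marker rather than from a running counter, which happens to coincide in the example. A hedged sketch that escapes the text first, assuming the input is valid UTF-8 (the helper name sentenceFromMarkup is hypothetical):

```php
<?php
// Hypothetical helper: escape the text, then turn [n] markers into elements.
function sentenceFromMarkup(string $str): DOMDocument
{
    // Escape &, < and > so arbitrary text cannot break the XML.
    // Square brackets are left untouched, so the markers still match.
    $escaped = htmlspecialchars($str, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
    $xml = preg_replace('/\[([1-9][0-9]*)\]/', '<child$1>$1</child$1>', $escaped);

    $doc = new DOMDocument('1.0');
    $doc->loadXML("<sentence>$xml</sentence>");
    return $doc;
}

echo sentenceFromMarkup('Q&A [1], see <b> [2]')->saveXML();
```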

Score: -1

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/9772851
