文章/答案/技术大牛

发布

社区首页 >问答首页 >Perl:如何将最后一个句子分割成另一个数组？

问Perl:如何将最后一个句子分割成另一个数组？
EN

Stack Overflow用户

提问于 2018-05-01 09:35:41

回答 1查看 239关注 0票数 0

我试图将<Description>文本拆分为Bit编号，并将其放入特定的Bit number元素中。这是文件，我正在解析。

        <Register>
                <Name>abc</Name>
                <Abstract></Abstract>
                <Description>Bit 6  random description
                    Bit 5 msg octet 2
                    Bit 4-1 
                    Bit 0 msg octet 4
                    These registers containpart of the Upstream Message. 
                    They should be written only after the cleared by hardware.
                    </Description>
        <Field>
        <Name>qwe</Name>

        <Description></Description>
        <BitFieldOffset>6</BitFieldOffset>
        <Size>1</Size>
        <AccessMode>Read/Write</AccessMode>

        </Field>
    <Field>
        <Name>qwe</Name>

        <Description></Description>
        <BitFieldOffset>5</BitFieldOffset>
        <Size>1</Size>
        <AccessMode>Read/Write</AccessMode>

        </Field>
<Field>
....
</Field>
                </Register>
            <Register>
                <Name>xyz</Name>
                <Abstract></Abstract>
                <Description>Bit 3  msg octet 1
                    Bit 2 msg octet 2
                    Bit 1 msg octet 3
                    Bit 0 msg octet 4
                    These registers. 
                    They should be written only after the cleared by hardware.
                </Description>
<Field>
....
</Field>
<Field>
....
</Field>
            </Register>

预期产出将是：

Expected output:

<Register>
<long_description>
These registers containpart of the Upstream Message. 
    They should be written only after the cleared by hardware.
</long_description>

<bit_field position="6" width=" 1">
<long_description>
<p> random description</p>
</long_description>
<bit_field position="5" width=" 1">
<long_description>
<p>...</p>
</long_description>
<bit_field position="1" width=" 4">
<long_description>
<p>...</p>
</long_description>

</Register>

<Register>
.
.
.
</Register>

我正在使用splitting包来解析这个文件，但却陷入了分裂。

foreach my $register ( $twig->get_xpath('//Register') ) # get each <Register>
    {

        my $reg_description= $register->first_child('Description')->text;
        .
        .
        .
          foreach my $xml_field ($register->get_xpath('Field'))
          {
             .
             .
             my @matched = split ('Bit\s+[0-9]', $reg_description);
             .
             .
           }
   }

我不知道如何相应地创建<bit_field>并将除Bit之外的文本保存到<Register> <long_description>中。有人能帮忙吗？

编辑：<Description>中的Bit可以有多行。在下面的例子中，Bit 10-9的描述是Bit 8的开始。

<Description>Bit 11 GOOF 
Bit 10-9 Clk Selection:
 00 :  8 MHz
 01 :  4 MHz
 10 :  2 MHz
 11 :  1 MHz
Bit 8 Clk Enable : 1 = Enable CLK
<Description>

regex

perl

xml-parsing

xml-twig

回答 1

Stack Overflow用户

回答已采纳

发布于 2018-05-01 11:40:54

如果我做得对，你可以逐行查看整个文本。

使用正则表达式检查一行是否与模式匹配。抓取相关的部分。在保存散列的数组中一点一点地缓存，存储每一位的细节。

不包含位模式的缓冲区行。如果后面另一行包含位模式，则缓冲区必须属于最近的位。把它附加到那里。所有其他行都必须是整体描述的一部分。注意：--这并不区分对最后一位描述的任何附加行。如果有这样的一点，它的附加行将成为整体描述的开始。(但你说过这类事情不在你的数据中。)

概念证明：

#!/usr/bin/perl
use strict;
use warnings;

my $description_in = 'Bit 6  random description
                    Bla bla additional line bla bla
                    bla bla
                    Bit 5 msg octet 2
                    Empty line below

                    Bla bla set to gain instant world domination bla bla
                    Bit 4-1
                    Bit 0 msg octet 4
                    These registers containpart of the Upstream Message.
                    They should be written only after the cleared by hardware.

                    Empty line above
                    Bla bla bla...';

my @bits = ();
my $description_overall = '';

my $line_buffer = '';
foreach my $line (split("\n", $description_in)) {
  # if line
  #  begins with optional white spaces
  #  followed by "Bit"
  #  followed by at least one white space
  #  followed by at least one digit (we capture the digits)
  #  followed by an optional sequence of optional white spaces, "-", optional white spaces and at least one digit (we capture the digits)
  #  followed by an optional sequence of at least one white space and any characters (we capture the characters)
  #  followed by the end of the line
  if ($line =~ m/^\s*Bit\s+(\d+)(?:\s*-\s*(\d+))?(?:\s+(.*?))?$/) {
    my ($position_begin, $position_end, $description) = ($1, $2, $3);
    my $width;

    # if there already are bits we've processed
    if (scalar(@bits)) {
      # the lines possibly buffered belong to the bit before the current one, so append them to its description
      $bits[$#bits]->{description} .= (length($bits[$#bits]->{description}) ? "\n" : '') . $line_buffer;
      # and reset the line buffer to collect the additional lines of the current bit;
      $line_buffer = '';
    }

    # $position_end is defined only if it was a "Bit n-m"
    # otherwise set it to $position_begin
    $position_end = defined($position_end) ? $position_end : $position_begin;

    $width = abs($position_end - $position_begin) + 1;

    # set description to the empty string if not defined (i.e. no description was found)
    $description = defined($description) ? $description : '';

    # push a ref to a new hash with the keys position, description and width into the list of bits
    push(@bits, { position => (sort({$a <=> $b} ($position_begin, $position_end)))[0], # always take the lower position
                  description => $description,
                  width => $width });
  }
  else {
    # it's not a bit pattern, so just buffer the line
    $line_buffer .= (length($line_buffer) ? "\n" : '') . $line;
  }
}
# anything still in the buffer must belong to the overall description
$description_overall .= $line_buffer;

print("<Register>\n  <long_description>\n$description_overall\n  </long_description>\n");
foreach my $bit (@bits) {
  print("  <bit_field position=\"$bit->{position}\" width=\"$bit->{width}\">\n    <long_description>\n$bit->{description}\n    </long_description>\n  </bit_field>\n")
}
print("</Register>\n");

指纹：

<Register>
  <long_description>
                        These registers containpart of the Upstream Message.
                        They should be written only after the cleared by hardware.

                        Empty line above
                        Bla bla bla...
  </long_description>
  <bit_field position="6" width="1">
    <long_description>
random description
                        Bla bla additional line bla bla
                        bla bla
    </long_description>
  </bit_field>
  <bit_field position="5" width="1">
    <long_description>
msg octet 2
                        Empty line below

                        Bla bla set to gain instant world domination bla bla
    </long_description>
  </bit_field>
  <bit_field position="1" width="4">
    <long_description>

    </long_description>
  </bit_field>
  <bit_field position="0" width="1">
    <long_description>
msg octet 4
    </long_description>
  </bit_field>
</Register>

我把它写成独立的脚本，这样我就可以测试它了。你得把它改编成你的剧本。

也许添加一些处理的整体描述，消除了这些长序列的空白。

首先，我尝试使用一个连续模式(while ($x =~ m/^...$/gc))，但不知怎么地，这会使行尾消失，结果只匹配第二行。为了使它们与实际的匹配保持一致，查找不起作用(说它没有实现；我猜，我必须在这台计算机上检查我的Perl？)，所以显式拆分成行是一项工作。

还可以使用grep()s、map()s等缩短它。但我认为，冗长的版本更好地证明了这些想法。所以我甚至都没调查过。

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/50114268

复制

相似问题

问Perl:如何将最后一个句子分割成另一个数组？
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Perl:如何将最后一个句子分割成另一个数组？EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Perl:如何将最后一个句子分割成另一个数组？
EN