
Q: Substring with Perl (regex?)

Stack Overflow user

Asked 2019-02-01 10:39:34

Answers: 1 | Views: 99 | Followers: 0 | Votes: 0

I need help extracting the "BODY" part from a string in the following two cases:

Case 1:

Var1 = 
Content-Type: text/plain; charset="UTF-8"

BODY 

--000000000000ddc1610580816add


Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY56 text/html

--000000000000ddc1610580816add-

Case 2:

Var1=
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY

--000000000000ddc1610580816add--



Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY56 text/html

--000000000000ddc1610580816add-

What I want to do:

If Var1 contains Content-Type: text/plain; charset="UTF-8", extract the text between Content-Type: text/plain; charset="UTF-8" and --000000000000ddc1610580816add.

If Var1 contains:

Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

then extract the text between:

Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

and --000000000000ddc1610580816add--.

Here is my code, which does not work; I need help fixing it:

 if (index($body, "Content-Type: text\/plain; charset=\"UTF-8\"\n
Content-Transfer-Encoding: quoted-printable") != -1) {
    $body =~ /Content-Type: text\/plain; charset="UTF-8"\n
Content-Transfer-Encoding: quoted-printable(.*?)--00.*/s ;
                        $body=$1;

}
    elsif   (index($body, "Content-Type: text\/plain; charset=\"UTF-8\"") != -1)
                              {
    $body =~ /Content-Type: text\/plain; charset="UTF-8"(.*?)--00.*/s ;
                        $body=$1;

}

perl

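1 Answer

Stack Overflow user

Accepted answer

Answered 2019-02-01 11:17:07

One solution: use the /ms modifiers (see perlre). With /m, ^ matches at the start of every line; with /s, . also matches a newline, so the capture group can span a multi-line body.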

#!/usr/bin/perl
use strict;
use warnings;

my $regex = qr/\AContent-Type: [^\n]+\n(?:^Content-Transfer-Encoding: [^\n]+\n)?(.+)^--.+\Z/ms;
my $body;

my $input = <<'END_OF_STRING';
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

INPUT 1 BODY

--000000000000ddc1610580816add--
END_OF_STRING

($body) = ($input =~ $regex)
    or die "mismatch in INPUT 1!\n";
print "INPUT 1 '${body}'\n";

$input = <<'END_OF_STRING';
Content-Type: text/plain; charset="UTF-8"

INPUT 2 BODY

--000000000000ddc1610580816add--
END_OF_STRING

($body) = ($input =~ $regex)
    or die "mismatch in INPUT 2!\n";
print "INPUT 2 '${body}'\n";

exit 0;

Test run:

$ perl dummy.pl
INPUT 1 '
INPUT 1 BODY

'
INPUT 2 '
INPUT 2 BODY

'
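
To illustrate why both modifiers are needed in the regex above, here is a minimal sketch. The sample string and the variable names are made up for this illustration and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical sample: one header line, a blank line, a two-line body,
# and a delimiter line.
my $text = "Content-Type: text/plain\n\nline one\nline two\n--sep--\n";

# Without /s, "." stops at a newline, so the two-line body cannot be
# captured up to the delimiter.
my $without_s = $text =~ /\n\n(.+)^--/m ? 1 : 0;

# Without /m, "^" matches only at the very start of the string, so it
# never matches in front of the delimiter line.
my $without_m = $text =~ /\n\n(.+)^--/s ? 1 : 0;

# With both, the capture spans lines and "^--" anchors at a line start.
my ($body) = $text =~ /\n\n(.+)^--/ms;

print "without /s: $without_s\n";
print "without /m: $without_m\n";
print "with /ms: '$body'\n";
```

Only the last match succeeds; the two single-modifier variants fail for the reasons given in the comments.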

Update: with the new input string provided by the OP:

#!/usr/bin/perl
use strict;
use warnings;

# multipart MIME content as single string
my $input = <<'END_OF_STRING';
--0000000000007bcdff05808169f5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY text/plain

--0000000000007bcdff05808169f5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY text/html

--0000000000007bcdff05808169f5
END_OF_STRING

# split into multiple parts at the separator
foreach my $part (split(/^--[^\n]+\n/ms, $input)) {
    # skip empty parts
    next if $part =~ /\A\s*\Z/m;

    # split header and body
    my($header, $body) = split("\n\n", $part, 2);

    # Only match parts with text/plain content
    # "Content-Type" must be matched case-insensitive
    if ($header =~ m{^(?i)Content-Type(?-i):\s+text/plain[;\s]}ms) {
        print "plain text BODY: '${body}'\n";
    }
}

exit 0;

Test output:

$ perl dummy.pl
plain text BODY: 'BODY text/plain

'
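
The same split-based idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name plain_text_body is hypothetical, the input uses plain "\n" line endings, and any line starting with "--" is treated as a delimiter (so a body line that happens to start with "--" would be cut off). It is run here against Case 2 from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the body of
# the first text/plain part, or undef if there is none.
sub plain_text_body {
    my ($mime) = @_;

    # Delimiter lines start with "--"; the closing one may end in "--".
    foreach my $part (split /^--[^\n]*\n/m, $mime) {
        next if $part !~ /\S/;    # skip empty parts

        # Header and body are separated by the first blank line.
        my ($header, $body) = split /\n\n/, $part, 2;
        next unless defined $body;

        # The \z alternative also accepts a bare "Content-Type: text/plain"
        # header that has no parameters after the media type.
        if ($header =~ m{^Content-Type:\s+text/plain(?:[;\s]|\z)}mi) {
            $body =~ s/\s+\z//;   # drop trailing blank lines
            return $body;
        }
    }
    return;
}

# Case 2 from the question.
my $var1 = <<'END_OF_STRING';
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY

--000000000000ddc1610580816add--
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

BODY56 text/html

--000000000000ddc1610580816add-
END_OF_STRING

my $result = plain_text_body($var1);
print "plain text BODY: '$result'\n";
```

Because the function keys on the blank line after the headers rather than on specific header lines, it handles Case 1 and Case 2 the same way, with or without the Content-Transfer-Encoding header.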

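Votes: 1

The original page content is provided by Stack Overflow; translation support was provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/54477772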
