
Q: Remove words from file2 using file1

Stack Overflow user

Asked 2015-10-12 15:24:48

2 answers, 170 views, 0 followers, 2 votes

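I'm using a Perl script to remove all stop words from a text. The stop words are stored one per line. I'm on the Mac command line, and Perl is installed correctly.

The script doesn't work properly: there is a word-boundary problem.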

#!/usr/bin/env perl
# usage: script.pl words text >newfile
use strict;
use warnings;
use English;

# poor man's argument handler
open(my $words_fh, '<', shift @ARGV) or die "failed to open words file: $!";
open(my $text_fh,  '<', shift @ARGV) or die "failed to open text file: $!";

my @words;
# get all words into an array
while (my $line = <$words_fh>) {
  chomp $line;                    # strip eol (chop would also eat the last char of a line with no newline)
  push @words, split ' ', $line;  # break up words on line
}

# (optional)
# sort by length (makes sure smaller words don't trump bigger ones); ie, "then" vs "the"
@words = sort { length($b) <=> length($a) } @words;

# slurp text file into one variable.
my $text = do { local $RS; <$text_fh> };

# now for each word, do a global search-and-replace; make sure only words are replaced; remove possible following space.
foreach my $word (@words) {
  $text =~ s/\b\Q$word\E\s?//sg;
}

# output "fixed" text
print $text;

sample.txt

$ cat sample.txt
how about i decide to look at it afterwards what
across do you think is it a good idea to go out and about i 
think id rather go up and above

stopwords.txt

I
a
about
an
are
as
at
be
by
com
for
from
how
in
is
it
..

Output:

$ ./remove.pl stopwords.txt sample.txt 
i decide look fterwards cross do you think good idea go out d i 
think id rather go up d bove

As you can see, it turns afterwards into fterwards. I think it's a regex problem. Can anyone help me patch this quickly? Thanks for your help :)

Tags: regex, linux, perl

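2 Answers

Stack Overflow user

Accepted answer

Answered 2015-10-12 15:37:23

Use word boundaries on both sides of $word. Right now you only check at the start.

You don't need the \s? once you have the \b condition: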

$text =~ s/\b\Q$word\E\b//sg;
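Dropping \s? has a side effect worth knowing: the pattern then removes only the word itself, so the space after each removed word stays behind. The minimal sketch below compares this pattern with the \b...\b\s? variant on a short line; the two-word stop list and the sample string are illustrative choices, not taken from the question's files.

```perl
use strict;
use warnings;

# Illustrative subset of the stop list and a fragment of sample.txt.
my @stop = qw(about how);
my $text = "how about i decide";

my ($bare, $eat) = ($text, $text);
for my $word (@stop) {
    $bare =~ s/\b\Q$word\E\b//g;     # whole word only; its trailing space remains
    $eat  =~ s/\b\Q$word\E\b\s?//g;  # whole word plus at most one following whitespace char
}

print "[$bare]\n";   # prints "[  i decide]"
print "[$eat]\n";    # prints "[i decide]"
```

Which form is preferable depends on whether the leftover spacing matters for the output file.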

Votes: 1

Stack Overflow user

Answered 2015-10-12 15:37:47

Your match isn't strict enough.

$text =~ s/\b\Q$word\E\s?//sg;

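When $word is a, the command is effectively s/\ba\s?//sg. That means: delete every a that starts a word, together with zero or one following whitespace character. In afterwards, this successfully matches the leading a.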
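A minimal sketch that reproduces this on a single stop word: with \b only in front, any word that merely begins with a loses its first letter, while adding the closing \b limits the match to a standalone a. The input string is an illustrative fragment in the style of sample.txt.

```perl
use strict;
use warnings;

my $word = 'a';
my $text = 'look at it afterwards';

# Question's pattern: \b only before the word, so "at" and "afterwards"
# each lose their leading "a".
(my $loose = $text) =~ s/\b\Q$word\E\s?//sg;

# Stricter pattern: \b on both sides, so only a standalone "a" matches.
(my $strict = $text) =~ s/\b\Q$word\E\b\s?//sg;

print "$loose\n";    # prints "look t it fterwards"
print "$strict\n";   # prints "look at it afterwards"
```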

You can make the match stricter by closing the word with another \b, like this:

$text =~ s/\b\Q$word\E\b\s?//sg;
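Putting this fix back into the question's script gives something like the sketch below. It is one possible arrangement, not either answerer's code: the work is moved into a hypothetical remove_stop_words sub so it can be tested without files, and the file handling only runs when exactly two arguments are given.

```perl
#!/usr/bin/env perl
# usage: script.pl stopwords.txt sample.txt > newfile
use strict;
use warnings;

# Remove each stop word only as a whole word, plus at most one whitespace
# character after it (the stricter pattern from this answer).
# remove_stop_words is a hypothetical helper name.
sub remove_stop_words {
    my ($words, $text) = @_;
    # Longest first, kept from the question's script.
    for my $word (sort { length($b) <=> length($a) } @$words) {
        $text =~ s/\b\Q$word\E\b\s?//g;   # /s omitted: the pattern has no "."
    }
    return $text;
}

# Only read files when run with two arguments, so the sub can be reused.
if (@ARGV == 2) {
    my ($words_file, $text_file) = @ARGV;

    open(my $wfh, '<', $words_file) or die "failed to open words file: $!";
    my @words;
    while (my $line = <$wfh>) {
        chomp $line;
        push @words, split ' ', $line;   # allow several words per line
    }
    close $wfh;

    open(my $tfh, '<', $text_file) or die "failed to open text file: $!";
    my $text = do { local $/; <$tfh> };  # slurp the whole file
    close $tfh;

    my $cleaned = remove_stop_words(\@words, $text);
    print $cleaned;
}
```

It is run the same way as in the question, e.g. ./remove.pl stopwords.txt sample.txt. Matching stays case-sensitive, so the stop word I still leaves the lowercase i in sample.txt untouched; adding the /i modifier would change that if case-insensitive removal is wanted.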

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/33084749
