Question: Delete everything between the first tab and the last semicolon

Stack Overflow user

Asked 2012-12-20 22:16:05

1 answer, 168 views, 0 followers, 0 votes

I have a file whose lines look like this:

EF457507|S000834932     Root;Bacteria;"Acidobacteria";Acidobacteria_Gp4;Gp4
EF457374|S000834799     Root;Bacteria;"Acidobacteria";Acidobacteria_Gp14;Gp14
AJ133184|S000323093     Root;Bacteria;Cyanobacteria/Chloroplast;Cyanobacteria;Family I;GpI
DQ490004|S000686022     Root;Bacteria;"Armatimonadetes";Armatimonadetes_gp7
AF268998|S000340459     Root;Bacteria;TM7;TM7_genera_incertae_sedis

I want to delete everything between the first tab and the last semicolon, so that the output looks like this:

EF457507|S000834932     Gp4
EF457374|S000834799     Gp14
AJ133184|S000323093     GpI
DQ490004|S000686022     Armatimonadetes_gp7
AF268998|S000340459     TM7_genera_incertae_sedis

I tried using a regular expression, but it didn't work. Is there a way to do this with standard Linux tools, awk, or Perl?

perl

1 Answer

Stack Overflow user

Accepted answer

Answered 2012-12-20 22:22:10

You can use sed:

sed 's/\t.*;/\t/' file

## This matches a tab character '\t', followed by any character '.' repeated any
## number of times '*', followed by a semicolon, and replaces the whole match with
## a single tab character '\t'. Because '.*' is greedy, the match extends to the
## last semicolon on the line.

sed 's/[^\t]*;//' file

## Characters inside square brackets form a character class. For example, '[0-9]'
## matches any single digit from zero to nine. When the first character in the
## class is '^', the class is negated. So '[^\t]*;' matches any run of non-tab
## characters followed by a semicolon, and the greedy '*' again reaches the last
## semicolon on the line.
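
One portability caveat: the '\t' escape in a sed pattern is a GNU sed extension, and other sed implementations may not treat it as a tab. The following is a minimal sketch, not part of the original answer, that builds a literal tab with printf instead. The function name strip_lineage and the variable tab are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: reads lines on stdin and removes everything between
# the first tab and the last semicolon, keeping the tab itself.
strip_lineage() {
    tab=$(printf '\t')    # a literal tab, for seds that do not understand '\t'
    sed "s/${tab}.*;/${tab}/"
}

# One sample line from the question, with a real tab between the two fields.
printf 'EF457507|S000834932\tRoot;Bacteria;"Acidobacteria";Acidobacteria_Gp4;Gp4\n' | strip_lineage
# prints: EF457507|S000834932<TAB>Gp4
```

To process a whole file, redirect it into the function, for example strip_lineage < file.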

Or awk:

awk 'BEGIN { FS=OFS="\t" } { sub(/.*;/,"",$2) }1' file

awk '{ sub(/[^\t]*;/,"") }1' file

Result:

EF457507|S000834932     Gp4
EF457374|S000834799     Gp14
AJ133184|S000323093     GpI
DQ490004|S000686022     Armatimonadetes_gp7
AF268998|S000340459     TM7_genera_incertae_sedis

Per the comment below ("delete everything after the last semicolon"), use sed:

sed 's/[^;]*$//' file

## '[^;]*$' matches any run of characters other than a semicolon, anchored to
## the end of the line.

Or awk:

awk 'BEGIN { FS=OFS="\t" } { sub(/[^;]*$/,"",$2) }1' file

awk '{ sub(/[^;]*$/,"") }1' file
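
To make the effect of this second variant concrete, here is a small sketch that runs the same sed expression on one sample line from the question. The function name keep_through_last_semicolon is hypothetical.

```shell
# Hypothetical helper: deletes everything after the last semicolon on each line.
# Caveat: a line that contains no semicolon at all is emptied completely,
# because '[^;]*$' then matches the whole line.
keep_through_last_semicolon() {
    sed 's/[^;]*$//'
}

# One sample line from the question, with a real tab between the two fields.
printf 'EF457507|S000834932\tRoot;Bacteria;"Acidobacteria";Acidobacteria_Gp4;Gp4\n' | keep_through_last_semicolon
# prints: EF457507|S000834932<TAB>Root;Bacteria;"Acidobacteria";Acidobacteria_Gp4;
```

Note that the trailing semicolon is kept, since the pattern only matches the characters that follow it.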

Votes: 5

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/13974083
