
sed: rename selective strings in a file

Stack Overflow user
Asked 2021-02-04 14:47:08
Answers: 2, Views: 38, Followers: 0, Votes: 2

I have a file named protein.faa whose content is:

>WP_004066472.1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>WP_004066568.1 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
>WP_004066764.1 MULTISPECIES: Lrp/AsnC ligand binding domain-containing protein [Thermococcus]
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>WP_004067064.1 MULTISPECIES: hypothetical protein [Thermococcus]
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE

I want to rename only the name after each > to the file name plus its order number, i.e.:

>protein_1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>protein_2 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
>protein_3 MULTISPECIES: Lrp/AsnC ligand binding domain-containing protein [Thermococcus]
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>protein_4 MULTISPECIES: hypothetical protein [Thermococcus]
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE

My code is:

name="$(echo protein.faa | sed 's/....$//')"
sed "s/>.*/>${name}/" protein.faa 

This only gives me:

>protein
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>protein
MVEALYKCAKCGKEF
>protein
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>protein
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE

How can I add the order number and keep whatever comes after >protein_i?


2 Answers

Stack Overflow user

Accepted answer

Posted 2021-02-04 14:53:27

This job suits GNU awk better:

awk -i inplace 'BEGINFILE {fn=FILENAME; sub(/\..*$/, "", fn); i=0} $1 ~ /^>/{$1 = ">" fn "_" ++i} 1' *.faa

>protein_1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>protein_2 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
>protein_3 MULTISPECIES: Lrp/AsnC ligand binding domain-containing protein [Thermococcus]
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>protein_4 MULTISPECIES: hypothetical protein [Thermococcus]
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE

The same GNU awk command, laid out to be more readable:

awk -i inplace 'BEGINFILE {
   fn = FILENAME
   sub(/\..*$/, "", fn)
   i = 0
}
$1 ~ /^>/{
   $1 = ">" fn "_" ++i
} 1' *.faa

For non-GNU awk:

for f in *.faa; do
   # BEGINFILE is gawk-only; FNR == 1 is the portable way to act on the first line of the file
   awk 'FNR == 1 {fn=FILENAME; sub(/\..*$/, "", fn)} $1 ~ /^>/{$1 = ">" fn "_" ++i} 1' "$f" > _tmp && mv _tmp "$f"
done
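
One side effect of assigning to $1 is that awk rebuilds the record with single spaces between fields, which would alter headers that contain tabs or repeated spaces. The following is a minimal sketch, assuming a POSIX awk, that rewrites only the first token with sub() and writes to stdout so the result can be checked before any file is overwritten. The helper name rename_headers and the temporary sample file are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical demo: build a small sample file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/protein.faa" <<'EOF'
>WP_004066472.1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
>WP_004066568.1 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
EOF

# rename_headers FILE: print FILE with each header ID replaced by
# <file name without directory and extension>_<counter>.
rename_headers() {
    awk '
        FNR == 1 {
            fn = FILENAME
            sub(/^.*\//, "", fn)    # drop any directory part
            sub(/\..*$/, "", fn)    # drop the extension
            i = 0
        }
        # Replace only the leading ">ID" token; the rest of the line is untouched.
        /^>/ { i++; sub(/^>[^ \t]*/, ">" fn "_" i) }
        1
    ' "$1"
}

result=$(rename_headers "$tmpdir/protein.faa")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because sub() treats & and \ specially in the replacement string, file names containing those characters would need escaping first; this sketch assumes plain names such as protein.faa.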
Votes: 1

Stack Overflow user

Posted 2021-02-04 15:46:33

Use this Perl one-liner:

perl -pe 'BEGIN { $i = 1; chomp( $basename = `basename $ARGV[0] .faa` ); } s{^>\S+}{>${basename}_${i}} and $i++; ' in.faa > out.faa

To change the file in place:

perl -i.bak -pe 'BEGIN { $i = 1; chomp( $basename = `basename $ARGV[0] .faa` ); } s{^>\S+}{>${basename}_${i}} and $i++; ' in.faa

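The Perl one-liner uses these command-line flags:

-e: Tells Perl to look for code in-line, instead of in a file.

-p: Loops over the input one line at a time, assigning each line to $_ by default, and adds print $_ after each loop iteration.

-i.bak: Edits input files in place (overwrites the input file). Before overwriting, saves a backup copy of the original file by appending the extension .bak to its name.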

See also:

perldoc perlrun: how to execute the Perl interpreter: command line switches

perldoc perlre: Perl regular expressions (regexes)

perldoc perlrequick: Perl regular expressions quick start

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/66047949
