I have a file named protein.faa with the following content:
>WP_004066472.1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>WP_004066568.1 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
>WP_004066764.1 MULTISPECIES: Lrp/AsnC ligand binding domain-containing protein [Thermococcus]
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>WP_004067064.1 MULTISPECIES: hypothetical protein [Thermococcus]
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE
I just want to rename the name after each > to the file name plus its order number, i.e.:
>protein_1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>protein_2 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
>protein_3 MULTISPECIES: Lrp/AsnC ligand binding domain-containing protein [Thermococcus]
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>protein_4 MULTISPECIES: hypothetical protein [Thermococcus]
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE
My code is:
name="$(echo protein.faa | sed 's/....$//')"
sed "s/>.*/>${name}/" protein.faa
This only gives me:
>protein
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>protein
MVEALYKCAKCGKEF
>protein
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>protein
MEITIEKFKPKVTRPFKRKNEYWVKL
PSAKELVDEYFSE
How can I add the order number and keep whatever follows >protein_i?
Posted on 2021-02-04 14:53:27
This job is better suited to GNU awk:
awk -i inplace 'BEGINFILE {fn=FILENAME; sub(/\..*$/, "", fn); i=0} $1 ~ /^>/{$1 = ">" fn "_" ++i} 1' *.faa
>protein_1 MULTISPECIES: NADH-quinone oxidoreductase subunit K [Thermococcus]
MIPLQFVTAFLMIFMGIYAFLYKRNLIKLILALNLI
LVLTSIVIGVCVLSLAMALTINAYRHYGTLDVNKLRRLRG
>protein_2 MULTISPECIES: DNA-directed RNA polymerase subunit P [Thermococcus]
MVEALYKCAKCGKEF
>protein_3 MULTISPECIES: Lrp/AsnC ligand binding domain-containing protein [Thermococcus]
MVTAFILMVTAAGKEREVMEKLLTYPEVKEAYVVYG
>protein_4 MULTISPECIES: hypothetical protein [Thermococcus]
MEITIEKFKPKVTRPFKRKNEYWVKL
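PSAKELVDEYFSE
To make the GNU awk version more readable:
awk -i inplace 'BEGINFILE {
    fn = FILENAME           # current input file, e.g. protein.faa
    sub(/\..*$/, "", fn)    # strip everything from the first dot onward
    i = 0                   # restart the numbering for each file
}
$1 ~ /^>/ {
    $1 = ">" fn "_" ++i     # replace only the first field of the header
} 1' *.faa
For non-GNU awk: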
for f in *.faa; do
    awk 'FNR == 1 {fn=FILENAME; sub(/\..*$/, "", fn)} $1 ~ /^>/{$1 = ">" fn "_" ++i} 1' "$f" > _tmp && mv _tmp "$f"
done
Posted on 2021-02-04 15:46:33
Use this Perl one-liner:
perl -pe 'BEGIN { $i = 1; chomp( $basename = `basename $ARGV[0] .faa` ); } s{^>\S+}{>${basename}_${i}} and $i++; ' in.faa > out.faa
To change the file in place:
perl -i.bak -pe 'BEGIN { $i = 1; chomp( $basename = `basename $ARGV[0] .faa` ); } s{^>\S+}{>${basename}_${i}} and $i++; ' in.faa
The Perl one-liner uses these command line flags:
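-e: Tells Perl to look for code in-line, instead of in a file.
-p: Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak: Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending the extension .bak to its name.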
See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches
https://stackoverflow.com/questions/66047949