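I am trying to change the coordinates of the gene and mRNA entries in my GFF file. I want the other entries, such as CDS and exon, to be left untouched by my code but still listed in the output. The code I am using gives me a syntax error. How can I get the desired output?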
My input GFF:
Chr01 xyz gene 210262 212819 . - . ID=Chr01.g13944
Chr01 xyz mRNA 210262 212819 . - . ID=Chr01.g13944;Parent=Chr01.g13944
Chr01 xyz CDS 210262 210528 . - 0 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 210262 210528 . - . ID=Chr01.g13944.exon4;Parent=Chr01.g13944
Chr01 xyz CDS 210622 210728 . - 2 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 210622 210728 . - . ID=Chr01.g13944.exon3;Parent=Chr01.g13944
Chr01 xyz CDS 210933 212121 . - 0 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 210933 212121 . - . ID=Chr01.g13944.exon2;Parent=Chr01.g13944
Chr01 xyz CDS 212730 212819 . - 0 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 212730 212819 . - . ID=Chr01B.g13944.exon1;Parent=Chr01B.g13944
Desired output:
Chr01 xyz gene 210162 212919 . - . ID=Chr01.g13944
Chr01 xyz mRNA 210162 212919 . - . ID=Chr01.g13944;Parent=Chr01.g13944
Chr01 xyz CDS 210262 210528 . - 0 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 210262 210528 . - . ID=Chr01.g13944.exon4;Parent=Chr01.g13944
Chr01 xyz CDS 210622 210728 . - 2 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 210622 210728 . - . ID=Chr01.g13944.exon3;Parent=Chr01.g13944
Chr01 xyz CDS 210933 212121 . - 0 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 210933 212121 . - . ID=Chr01.g13944.exon2;Parent=Chr01.g13944
Chr01 xyz CDS 212730 212819 . - 0 ID=Chr01.g13944.cds;Parent=Chr01.g13944
Chr01 xyz exon 212730 212819 . - . ID=Chr01B.g13944.exon1;Parent=Chr01B.g13944
The code I am using, which gives the syntax error:
awk -F '\t' '{if ($3 ~ /gene/ || $3 ~ /mRNA/) print $1,$2,$3,$4-100,$5+100,$6,$7,$8,$9 || if ($3 ~ /CDS/ || $3 ~ /exon/) print$0}' input.gff > out.gff
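For comparison, the one-liner above fails because "||" cannot join two "if" statements; awk expects an "else" (or a separate pattern-action pair) instead. There is also a quieter problem: "-F '\t'" sets only the input separator, so "print $1,$2,..." would join the rebuilt gene/mRNA fields with the default output separator, a single space, and break the tab-separated GFF layout. A minimal sketch of the asker's own if/else structure with both issues fixed, assuming tab-separated input; the two sample lines are copied from the question into a temporary file, and "out.gff" is written to the current directory. It also uses exact comparison ("==") rather than the regex match "~ /gene/", which would additionally match any type containing "gene", such as "pseudogene".

```shell
#!/bin/sh
# Sample input: two lines from the question, written with real tabs.
in=$(mktemp)
printf 'Chr01\txyz\tgene\t210262\t212819\t.\t-\t.\tID=Chr01.g13944\n' > "$in"
printf 'Chr01\txyz\tCDS\t210262\t210528\t.\t-\t0\tID=Chr01.g13944.cds;Parent=Chr01.g13944\n' >> "$in"

# if/else replaces the invalid "print ... || if (...)" construct.
# FS = OFS = "\t" keeps the rebuilt gene/mRNA lines tab-separated.
awk 'BEGIN { FS = OFS = "\t" }
{
    if ($3 == "gene" || $3 == "mRNA")
        print $1, $2, $3, $4 - 100, $5 + 100, $6, $7, $8, $9
    else
        print $0
}' "$in" > out.gff

rm -f "$in"
cat out.gff
```

The gene line comes out as 210162 to 212919, matching the desired output, while the CDS line is printed exactly as read.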
Posted on 2022-10-27 17:37:18
Try:
awk 'BEGIN{ FS=OFS="\t" }
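($3=="gene" || $3=="mRNA"){ $4-=100; $5+=100 }1' infile
This changes the start and end coordinates (columns 4 and 5) of the "gene" and "mRNA" features only, widening each by 100 bp on both sides, and leaves all other feature types unchanged. FS=OFS="\t" keeps the output tab-separated, and the trailing 1 is an always-true pattern, so every line, modified or not, is printed.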
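One edge case the one-liner does not handle: a gene or mRNA that starts within 100 bp of the beginning of a sequence would get a start coordinate of 0 or less, which is invalid because GFF coordinates are 1-based. A minimal sketch that clamps the start at 1, assuming the same tab-separated input. The variable "pad" and the sample feature "g1" are hypothetical, chosen only for illustration. The end coordinate is not clamped, because the sequence lengths are not available from the GFF lines shown in the question.

```shell
#!/bin/sh
# Hypothetical sample: a gene starting 50 bp from the sequence start,
# followed by a CDS line that must pass through unchanged.
in=$(mktemp)
printf 'Chr01\txyz\tgene\t50\t900\t.\t+\t.\tID=g1\n' > "$in"
printf 'Chr01\txyz\tCDS\t50\t300\t.\t+\t0\tID=g1.cds;Parent=g1\n' >> "$in"

# pad: number of bases added on each side (hypothetical variable name).
awk -v pad=100 'BEGIN { FS = OFS = "\t" }
($3 == "gene" || $3 == "mRNA") {
    $4 -= pad
    if ($4 < 1) $4 = 1    # GFF coordinates are 1-based; never go below 1
    $5 += pad
}
1' "$in" > padded.gff

rm -f "$in"
cat padded.gff
```

Passing the padding with "-v" keeps the awk program unchanged if a different flank size is needed.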
https://unix.stackexchange.com/questions/722703