我有一个带有特定ID的文件。
ID.txt
aaa
bbb
ccc我还有一个类似于File.txt的文件
Query: ABC1
aaa
abc
bbb
ccc
Query: CAB1
bbb
ccc
abc
Query: CBB1
ass
aaa
bbc
**Expected output:**
Query: ABC1
aaa
bbb
ccc
Query: CAB1
bbb
ccc
Query: CBB1
aaa实际例子:
**IDs**
LYSC_CHICK
LACB_BOVIN
B5B0D4_BOVIN
DEF1_ARAHY
DEF2_ARAHY
DEF3_ARAHY
TRFL_BOVIN
Q0PKR4_ARAHY
Q0GM57_ARAHY
Q647G5_ARAHY
Q6JYQ7_HEVBR
AMP2_FAGES
**File**
Query: PROKKA_00022 hypothetical protein - 36 aa
Hit: AMP1_FAGES UniProt Fag e 4 UniProt P0DKH7 http://www.u
100.0% identity
Hit: AMP2_FAGES UniProt Fag e 4 UniProt P0DKH8 http://www.u
100.0% identity
Hit: O49860_HEVBR UniProt Hev b 6 UniProt O49860 http://www
100.0% identity
Hit: Q6JYQ7_HEVBR UniProt Hev b 6 UniProt Q6JYQ7 http://www
100.0% identity
Hit: HEVE_HEVBR UniProt Hev b 6 UniProt P02877 http://www.u
Query: PROKKA_00572 hypothetical protein - 36 aa
Hit: AMP1_FAGES UniProt Fag e 4 UniProt P0DKH7 http://www.u
100.0% identity
Hit: AMP2_FAGES UniProt Fag e 4 UniProt P0DKH8 http://www.u
100.0% identity
Hit: O49860_HEVBR UniProt Hev b 6 UniProt O49860 http://www
100.0% identity
Hit: Q6JYQ7_HEVBR UniProt Hev b 6 UniProt Q6JYQ7 http://www
100.0% identity
Query: PROKKA_01572 hypothetical protein - 36 aa
Hit: AMP1_FHYES UniProt Fag e 4 UniProt P0DKH7 http://www.u
100.0% identity
Hit: AMX5_FAGES UniProt Fag e 4 UniProt P0DKH8 http://www.u
100.0% identity
Hit: O87860_HLLBR UniProt Hev b 6 UniProt O49860 http://www
100.0% identity
Hit: JHYYQ7_HEVBR UniProt Hev b 6 UniProt Q6JYQ7 http://www
100.0% identity
**Expected output:**
Query: PROKKA_00022 hypothetical protein - 36 aa
Hit: Q6JYQ7_HEVBR UniProt Hev b 6 UniProt Q6JYQ7 http://www
Hit: AMP2_FAGES UniProt Fag e 4 UniProt P0DKH8 http://www.u
Query: PROKKA_00572 hypothetical protein - 36 aa
Hit: Q6JYQ7_HEVBR UniProt Hev b 6 UniProt Q6JYQ7 http://www
Hit: AMP2_FAGES UniProt Fag e 4 UniProt P0DKH8 http://www.u我需要在循环中完成这个任务吗?我试过这样的方法,但运气不佳:
for i in `cat ID.txt`
do
awk '/Query/{bar=$2} /"$i"/{print bar}' File.txt > output.txt
done(原员额更新,以反映预期的实际产出)。非常感谢你的帮助。02-01-2020更新,包括ID和文件以及预期输出文件的其他详细信息)
发布于 2020-01-30 13:11:52
你能试一下吗。
awk '
FNR==NR{
a[$1]
next
}
/^Query/ || $0 in a
' id.txt file.txt输出如下。
Query: ABC1
aaa
bbb
ccc
Query: CAB1
bbb
ccc
Query: CBB1
aaa发布于 2020-02-01 10:41:18
对于那些对@RavinderSingh13感兴趣的人来说,答案是正确的,只是对@EdMorton所述的一个小改动进行了一个实际的例子:
awk '
FNR==NR{
a[$1]
next
}
/^Query/ || $2 in a
' IDs File另一个输出的“查询”行列表中没有所需的匹配ID。因此,我使用了以下方法来删除其他行:
awk '
FNR==NR{
a[$1]
next
}
/^Query/ || $2 in a
' IDs File | grep -B 1 'Hit: ' | grep -v '^--' > output_filehttps://stackoverflow.com/questions/59986630
复制相似问题