A sample text file looks like this:
ID Z4WTH3_9ACTN Unreviewed; 182 AA.
AC Z4WTH3; A0SD0SDF;
AC Z12SDFG3; ADFFGDF;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
//
ID Z4WXU8_9ACTN Unreviewed; 203 AA.
AC Z4WXU8;
AC QWERDFV1;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
//
ID Z9JHX1_9GAMM Unreviewed; 132 AA.
AC Z9JHX1;
SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
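//
The actual file is much larger. Each record contains exactly one "ID" line, always starts with that "ID" line, and ends with "//".
There can be multiple "AC" lines. The first element of the first "AC" line must be used as the file name.
The file needs to be split into multiple files at each "//", and each file should be named after the text on the line starting with AC.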
So the output files would look like this:
Z4WTH3.txt
ID Z4WTH3_9ACTN Unreviewed; 182 AA.
AC Z4WTH3; A0SD0SDF;
AC Z12SDFG3; ADFFGDF;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
//
Z4WXU8.txt
ID Z4WXU8_9ACTN Unreviewed; 203 AA.
AC Z4WXU8;
AC QWERDFV1;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
//
Z9JHX1.txt
ID Z9JHX1_9GAMM Unreviewed; 132 AA.
AC Z9JHX1;
SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
//

Posted on 2018-02-27 06:25:28
The following awk may help you with this.
awk '/^ID/{close(filename);val=$2;sub(/_.*/,"",val);filename=val".txt"} {print > filename}' Input_file
Solution 2: The OP says the file name should come from the AC string, so the following solution has been added as well.
awk '/^ID/{close(filename);first=$0 ORS;seen=0;next} /^AC/ && !seen{seen=1;val=$2;sub(";","",val);filename=val".txt";print first $0 > filename;next} {print > filename}' Input_file
Alternatively, if Input_file does not have an ID line in every section, we can call close() in the AC block instead, as follows:
awk '/^ID/{first=$0 ORS;next} /^AC/ && !seen{seen=1;close(filename);val=$2;sub(";","",val);filename=val".txt";print first $0 > filename;next} {print > filename} $0=="//"{seen=0;first=""}' Input_file
Explanation: An explanation of the solution has been added as well:
awk '
/^ID/{                        ##If the current line starts with ID, do the following:
  first=$0 ORS;               ##Create variable first holding the current line plus ORS (output record separator).
  next}                       ##next is an awk keyword that skips all further statements for this line.
/^AC/ && !seen{               ##If the line starts with AC and is the first AC line of this record, do the following:
  seen=1;                     ##Remember that this record already has its file name, so later AC lines do not change it.
  close(filename);            ##Close the previously written file so that we do NOT hit a "too many open files" error.
  val=$2;                     ##Create variable val holding the 2nd field of the current line, e.g. "Z4WTH3;".
  sub(";","",val);            ##Use awk sub() to replace the semicolon in val with an empty string.
  filename=val".txt";         ##Create variable filename from val plus ".txt" (this builds the output file name).
  print first $0 > filename;  ##Print variable first and the current line into the output file.
  next                        ##next skips all further statements for this line.
}
{
  print > filename            ##Print every line that matched neither condition above (including later AC lines) into the output file.
}
$0=="//"{                     ##At the end of a record:
  seen=0;                     ##let the next record pick its own file name from its first AC line,
  first=""                    ##and forget this record ID line in case the next record has none.
}
' Input_file                  ##Mentioning the Input_file name here.

Posted on 2018-02-27 07:26:58
Another way is to use RS to separate the records (GNU awk, because of the multi-character RS):
$ gawk '
BEGIN {
RS=ORS="\n//\n" # record separators
}
{
for(i=1;i<=NF;i++) # go thru each field in record
if($i=="AC") { # once AC found
f=$(i+1) "TXT" # next one is the filename
sub(/;/,".",f) # replace ; with .
print > f # print to file (multiple AC:s lead to multiple files)
close(f) # close to avoid problem with too many open files
# overwrites files when files with same name
}
}' file
Files:
$ ls -l Z*
-rw-r--r-- 1 james james 254 Feb 27 09:23 Z4WTH3.TXT
-rw-r--r-- 1 james james 254 Feb 27 09:23 Z4WXU8.TXT
-rw-r--r-- 1 james james 202 Feb 27 09:23 Z9JHX1.TXT
In a file:
$ cat Z9JHX1.TXT
ID Z9JHX1_9GAMM Unreviewed; 132 AA.
AC Z9JHX1;
SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
//

Posted on 2018-02-27 13:09:54
With GNU awk for multi-char RS and RT:
awk -v RS='\n//\n' -v ORS= -F'[[:space:];]+' '{print $0 RT > ($7".txt")}' file
With any awk:
awk -F'[[:space:];]+' '
($1 == "AC") && (out == "") { out = $2".txt" }
{ rec = rec $0 ORS }
$0 == "//" {
printf "%s", rec > out
close(out)
rec = ""
out = ""
}
' file
https://stackoverflow.com/questions/49002341
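The line-based answers above can be tried on the question's sample without GNU awk. What follows is a minimal self-contained sketch. It assumes a POSIX sh and awk, and the input name "file" and this exact streaming layout are illustrative choices rather than any one answer verbatim. The script writes the three sample records. Only the first AC line of each record then chooses the output name, so later AC lines (Z12SDFG3, QWERDFV1) do not start new files.

```sh
#!/bin/sh
# Sketch, assuming a POSIX shell and awk.
# Write the three sample records from the question to a file named "file".
cat > file <<'EOF'
ID Z4WTH3_9ACTN Unreviewed; 182 AA.
AC Z4WTH3; A0SD0SDF;
AC Z12SDFG3; ADFFGDF;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 182 AA; 20675 MW; B85D18AC3B1F0E75 CRC64;
MNFLEYNKDE KLHFNYKKSC GLWLIVVALI IFAATVIGGK QIINMSVFSF GYVAAFLSIN
//
ID Z4WXU8_9ACTN Unreviewed; 203 AA.
AC Z4WXU8;
AC QWERDFV1;
DT 11-JUN-2014, integrated into UniProtKB/TrEMBL.
SQ SEQUENCE 203 AA; 23224 MW; 35F1AE4342F6B3AC CRC64;
MDCKSIRSEV LWQVVRLREK LMNFLEYNKD EKLCFNYKKS CGLWLIVVAL IIFAATVIGG
//
ID Z9JHX1_9GAMM Unreviewed; 132 AA.
AC Z9JHX1;
SQ SEQUENCE 132 AA; 13880 MW; 0E09988C0F3ED155 CRC64;
MKISVDTNVL ARAVLQDDAN QGRSASTLLK DASLIAVSLP CLCELVWILS RGAKLSKEDV
//
EOF

# Hold the ID line until the first AC line names the output file, then stream.
awk '
/^ID/ { close(out); out = ""; hdr = $0; next }   # start of a record
/^AC/ && out == "" {                             # first AC line only
    acc = $2; sub(/;.*/, "", acc)                # "Z4WTH3;" -> "Z4WTH3"
    out = acc ".txt"
    print hdr > out                              # the ID line saved above
}
{ print > out }                                  # every other line, incl. the final //
' file
```

On the sample this should leave Z4WTH3.txt, Z4WXU8.txt and Z9JHX1.txt, each holding one full record that ends in "//". Writing each line as it arrives, instead of buffering the whole record, keeps memory use flat on a very large input.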
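The RS-based answer notes in a comment that records with the same file name overwrite each other. On a large input it may be worth checking for repeated first accessions before splitting. Here is a minimal POSIX awk sketch of such a check. It assumes every record starts with an ID line and has at least one AC line. The helper name find_dup_accessions and the tiny demo input are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print each first-AC accession that names more than one record.
find_dup_accessions() {
    awk '
    /^ID/ { seen = 0 }                       # new record: its first AC line is still ahead
    /^AC/ && !seen {
        seen = 1
        acc = $2; sub(/;.*/, "", acc)        # "P1;" -> "P1"
        if (count[acc]++ == 1) print "duplicate: " acc
    }
    ' "$1"
}

# Made-up demo input: records A and C both start with accession P1.
printf '%s\n' \
    'ID A_X Unreviewed; 1 AA.' 'AC P1; P9;' 'SQ SEQUENCE 1 AA;' 'M' '//' \
    'ID B_X Unreviewed; 1 AA.' 'AC P2;' 'SQ SEQUENCE 1 AA;' 'M' '//' \
    'ID C_X Unreviewed; 1 AA.' 'AC P1;' 'AC P3;' 'SQ SEQUENCE 1 AA;' 'M' '//' \
    > dup_demo.in

find_dup_accessions dup_demo.in    # prints: duplicate: P1
```

Each duplicate is reported once, on its second occurrence. An empty result on the real input means every record would get its own output file.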