Given the input file below, is there a command or method in Linux to convert it into the desired output file shown after it?
Input file:
Column_1 Column_2
scaffold_A SNP_marker1
scaffold_A SNP_marker2
scaffold_A SNP_marker3
scaffold_A SNP_marker4
scaffold_B SNP_marker5
scaffold_B SNP_marker6
scaffold_B SNP_marker7
scaffold_C SNP_marker8
scaffold_A SNP_marker9
scaffold_A SNP_marker10
Desired output file:
Column_1 Column_2
scaffold_A SNP_marker1;SNP_marker2;SNP_marker3;SNP_marker4
scaffold_B SNP_marker5;SNP_marker6;SNP_marker7
scaffold_C SNP_marker8
scaffold_A SNP_marker9;SNP_marker10
I was thinking of using grep, uniq, etc., but I still can't figure out how to get this done.
Answered 2013-07-24 19:37:43
Perl solution:
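perl -lane 'sub output {
        # Print the group key and its collected values joined by ";".
        print "$last\t", join ";", @buff;
    }
    $last //= $F[0];        # first line: initialize the current key
    if ($F[0] ne $last) {   # key changed: flush the finished group
        output();
        undef @buff;
        $last = $F[0];
    }
    push @buff, $F[1];
    # The "}{" below closes the implicit -n loop, so the last group is flushed at end of input.
}{ output();'
Answered 2013-07-24 21:51:00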
Python solution (the code below uses the hardcoded placeholder names 'infile' and 'outfile'; substitute your own paths):
from __future__ import print_function  # not needed with Python 3

with open('infile') as infile, open('outfile', 'w') as outfile:
    outfile.write(infile.readline())  # transfer the header
    col_one, col_two = infile.readline().split()
    col_two = [col_two]  # make it a list
    for line in infile:
        data = line.split()
        if col_one != data[0]:
            # key changed: write out the finished group and start a new one
            print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
            col_one = data[0]
            col_two = [data[1]]
        else:
            col_two.append(data[1])
    # write out the last group
    print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
Answered 2013-07-24 21:19:28
awk solution in a bash script
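#!/bin/bash
awk '
BEGIN {
    str = ""
}
{
    if ( str != $1 ) {
        # New key: end the previous output line (except before the first) and start a new one.
        if ( NR != 1 ) {
            printf("\n")
        }
        str = $1
        printf("%s\t%s", $1, $2)
    } else {
        # Same key as the previous line: append the value.
        printf(";%s", $2)
    }
}
END {
    printf("\n")
}' your_file.txt
https://stackoverflow.com/questions/17832631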
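For comparison, the grouping that all three answers perform (merging only consecutive rows that share the same first column, so scaffold_A can appear more than once in the output) could also be sketched with itertools.groupby from Python's standard library. This is a minimal sketch, assuming every non-blank line has exactly two whitespace-separated fields; the helper name collapse_runs and the sample list are hypothetical and not taken from the original answers.

```python
from itertools import groupby


def collapse_runs(lines, sep=";"):
    """Merge consecutive rows that share the same first column.

    Hypothetical helper: assumes each non-blank line has exactly two
    whitespace-separated fields.
    """
    # Split each non-blank line into its fields.
    rows = (line.split() for line in lines if line.strip())
    merged = []
    # groupby only groups adjacent items, so a key that reappears later
    # in the input starts a new output row instead of being merged globally.
    for key, group in groupby(rows, key=lambda fields: fields[0]):
        values = sep.join(fields[1] for fields in group)
        merged.append("{}\t{}".format(key, values))
    return merged


if __name__ == "__main__":
    # Shortened version of the question's input, header included.
    sample = [
        "Column_1 Column_2",
        "scaffold_A SNP_marker1",
        "scaffold_A SNP_marker2",
        "scaffold_B SNP_marker5",
        "scaffold_A SNP_marker9",
    ]
    for row in collapse_runs(sample):
        print(row)
```

The header needs no special handling here because it forms a one-line group of its own, which is also how the Perl and awk answers treat it. To process a real file, the lines of an open file object can be passed to the function in place of the sample list.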