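I have a CSV input file in the following format, with a nucleotide sequence in field 1, text in field 2, and an integer in the last field (field 3 in this simplified example; field 4 in my actual file, where field 3 is empty):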
ATGC,CD3,56
ATGC,CD4,67
ATGC,IgD,126
ATGC,IgM,127
AGTC,CD3,67
AGTC,CD4,78
AGTC,IgD,102
AGTC,IgM,89
TCGA,CD3,334
TCGA,CD4,123
TCGA,IgD,456
TCGA,IgM,80
CGTA,CD3,54
CGTA,CD4,32
CGTA,IgD,82
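CGTA,IgM,117
I opened this CSV file as three columns in Numbers on a Mac. However, I would like to convert it to a table (or matrix) layout, also saved as a CSV file, in which the nucleotide sequences from the first column become the header row. The result should look like this: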
ATGC AGTC TCGA CGTA
CD3 56 67 334 54
CD4 67 78 123 32
IgD 126 102 456 82
IgM 127 89 80 117
Here is a portion of my actual input CSV file (sample input.txt):
AGAATAGTCTGATTCT,-,,38
AGAATAGTCTGATTCT,AnnexinV,,51
AGAATAGTCTGATTCT,CD127,,39
AGAATAGTCTGATTCT,CD138,,3
AGAATAGTCTGATTCT,CD14,,2
AGAATAGTCTGATTCT,CD16,,4
AGAATAGTCTGATTCT,CD19,,10
AGAATAGTCTGATTCT,CD20,,6
AGAATAGTCTGATTCT,CD24,,21
AGAATAGTCTGATTCT,CD25,,4
AGAATAGTCTGATTCT,CD27,,87
AGAATAGTCTGATTCT,CD3,,235
AGAATAGTCTGATTCT,CD34,,5
AGAATAGTCTGATTCT,CD38,,18
AGAATAGTCTGATTCT,CD4,,412
AGAATAGTCTGATTCT,CD43,,99
AGAATAGTCTGATTCT,CD5,,430
AGAATAGTCTGATTCT,CD56,,3
AGAATAGTCTGATTCT,CD8,,7
AGAATAGTCTGATTCT,IgD,,4
AGAATAGTCTGATTCT,IgM,,2
TGTGGTAGTTCGTCTC,-,,9
TGTGGTAGTTCGTCTC,AnnexinV,,42
TGTGGTAGTTCGTCTC,CD127,,6
TGTGGTAGTTCGTCTC,CD138,,4
TGTGGTAGTTCGTCTC,CD16,,40
TGTGGTAGTTCGTCTC,CD19,,7
TGTGGTAGTTCGTCTC,CD20,,2
TGTGGTAGTTCGTCTC,CD24,,24
TGTGGTAGTTCGTCTC,CD25,,2
How can I do this using Linux text-formatting commands?
Posted on 2019-06-03 07:35:17
Using Miller (https://github.com/johnkerl/miller) with
mlr --n2p --ifs "," label key,property,emptyfield,value \
then reshape -s key,value \
then unsparsify \
then cut -x -f emptyfield input.csv
you will get
property AGAATAGTCTGATTCT TGTGGTAGTTCGTCTC
- 38 9
AnnexinV 51 42
CD127 39 6
CD138 3 4
CD14 2 -
CD16 4 40
CD19 10 7
CD20 6 2
CD24 21 24
CD25 4 2
CD27 87 -
CD3 235 -
CD34 5 -
CD38 18 -
CD4 412 -
CD43 99 -
CD5 430 -
CD56 3 -
CD8 7 -
IgD 4 -
IgM 2 -
https://unix.stackexchange.com/questions/522046
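For systems where Miller is not installed, the same long-to-wide reshaping can be sketched with plain awk. This is not part of the answer above, only a minimal illustration under stated assumptions: the function name `pivot_long_to_wide` is hypothetical, the sequence is taken from field 1, the marker from field 2, and the count from the last field (so it handles both the three-field example and the four-field sample with its empty third field), fields contain no quoted commas, and missing combinations are filled with `-` to mirror the Miller output shown above.

```shell
#!/bin/sh
# pivot_long_to_wide: hypothetical helper name, not part of Miller or any other tool.
# Reads comma-separated records (sequence, marker, ..., count) from stdin or
# from the files given as arguments and writes a wide CSV to stdout.
# Assumption: no quoted fields or embedded commas in the input.
pivot_long_to_wide() {
    awk -F, -v OFS=, '
        NF < 3 { next }    # skip blank or malformed lines
        {
            seq = $1; marker = $2; count = $NF
            # Remember sequences (columns) and markers (rows) in first-seen order.
            if (!(seq in seen_seq))       { seen_seq[seq] = 1;       seqs[++nseq] = seq }
            if (!(marker in seen_marker)) { seen_marker[marker] = 1; markers[++nmarker] = marker }
            cell[marker, seq] = count
        }
        END {
            # Header row: a label for the marker column, then one column per sequence.
            line = "property"
            for (j = 1; j <= nseq; j++) line = line OFS seqs[j]
            print line
            # One row per marker; "-" marks a marker/sequence pair absent from the input.
            for (i = 1; i <= nmarker; i++) {
                line = markers[i]
                for (j = 1; j <= nseq; j++) {
                    k = markers[i] SUBSEP seqs[j]
                    line = line OFS ((k in cell) ? cell[k] : "-")
                }
                print line
            }
        }
    ' "$@"
}
```

After sourcing the function, something like `pivot_long_to_wide input.txt > wide.csv` would write the table as CSV, which is the output format the question asks for. The sketch holds the whole table in memory, so it suits files of moderate size.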