我将数据帧存储在df中。和存储在samp中的标头
samp = ["1C", "1P1", "1P2"]
for i in samp:
c1 = df[df['SampleID']==i]
print(i,"\n", c1['Ranked'].value_counts(),"\n" )这将产生如下输出:
1C
missense_variant 630
frameshift_variant 39
inframe_insertion 24
splice_donor_variant 16
inframe_deletion 15
stop_gained 14
splice_acceptor_variant 4
stop_lost 2
start_lost 2
protein_altering_variant 1
Name: Ranked, dtype: int64
1P1
missense_variant 349
frameshift_variant 21
inframe_insertion 19
inframe_deletion 13
stop_gained 12
splice_donor_variant 11
splice_acceptor_variant 3
stop_lost 1
protein_altering_variant 1
Name: Ranked, dtype: int64
1P2
missense_variant 323
frameshift_variant 30
inframe_insertion 21
splice_donor_variant 12
inframe_deletion 6
stop_gained 4
stop_lost 2
start_lost 2
splice_acceptor_variant 2
protein_altering_variant 1
Name: Ranked, dtype: int64 我想通过合并count_value()来创建一个新的df,如下所示:
Type 1C 1P1 1P2
missense_variant 630 349 323
frameshift_variant 39 21 30
inframe_insertion 24 19 21
splice_donor_variant 16 11 12
inframe_deletion 15 13 6
stop_gained 14 12 4
splice_acceptor_variant 4 3 2
stop_lost 2 0 2
start_lost 2 1 2
protein_altering_variant 1 1 1发布于 2021-05-11 19:18:00
首先使用Series.isin过滤SampleID,然后使用SeriesGroupBy.value_counts
samp = ["1C", "1P1", "1P2"]
df2 = df[df['SampleID'].isin(samp)].groupby('SampleID')['Ranked'].value_counts().unstack()或crosstab
df1 = df[df['SampleID'].isin(samp)]
df2 = pd.crosstab(df1['SampleID'], df1['Ranked'])https://stackoverflow.com/questions/67485636
复制相似问题