I have a script that does linear modeling between pairs of conditions. The dataframe looks like this:
Accession Sequence variable value
0 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 39.300171
1 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 132.637125
2 O14548 [R].gLPDQMLYr.[T] DMSO 1165.245826
3 O14548 [R].gLPDQMLYr.[T] DMSO 642.971908
4 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 83.906058
5 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 160.718841
6 O14548 [R].gLPDQMLYr.[T] DMSO 1240.856710
7 O14548 [R].gLPDQMLYr.[T] DMSO 557.508092
8 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 56.228425
9 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 302.346775
10 O14548 [R].gLPDQMLYr.[T] DMSO 1176.998098
11 O14548 [R].gLPDQMLYr.[T] DMSO 766.993819
12 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.387985
13 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.175678
14 O14548 [R].gLPDQMLYr.[T] CCCP 885.174420
15 O14548 [R].gLPDQMLYr.[T] CCCP 130.458963
16 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.557088
17 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.095801
18 O14548 [R].gLPDQMLYr.[T] CCCP 612.171540
19 O14548 [R].gLPDQMLYr.[T] CCCP 46.449990
20 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 6.016590
21 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.466220
22 O14548 [R].gLPDQMLYr.[T] CCCP 586.392482
23 O14548 [R].gLPDQMLYr.[T] CCCP 303.857624
24 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] C+I 44.627773
25 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] C+I 0.841494
26 O14548 [R].gLPDQMLYr.[T] C+I 632.355914
27 O14548 [R].gLPDQMLYr.[T] C+I 162.333292
28 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] C+I 12.075158
29 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] C+I 154.253098
30 O14548 [R].gLPDQMLYr.[T] C+I 159.767999
31 O14548 [R].gLPDQMLYr.[T] C+I 1031.399087
32 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] C+I 150.724386
33 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] C+I 260.684163
34 O14548 [R].gLPDQMLYr.[T] C+I 141.459156
35 O14548 [R].gLPDQMLYr.[T] C+I 262.659208

Now I want to fit a linear model for every pair. I get the pairs with the following code:
def tessa(source):
    # build every unordered pair [source[i], source[j]] with i < j
    result = []
    for p1 in range(len(source)):
        for p2 in range(p1 + 1, len(source)):
            result.append([source[p1], source[p2]])
    return result

unique_conditions = list(set(conditions))
pairs = tessa(unique_conditions)
print(pairs)

I loop over the pairs and filter the dataframe by condition:
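for pair in pairs:
    pair.sort()
    print(pair)
    print(pair[0], pair[1])
    # keep the rows whose 'variable' contains either condition of the pair
    temp = melted_Peptides[(melted_Peptides['variable'].str.contains(pair[0])) |
                           (melted_Peptides['variable'].str.contains(pair[1]))]
    print(temp)

This is where the problem is: for some pairs the filter does not work correctly. For ['C+I', 'CCCP'] only the CCCP rows are returned and the C+I rows are missing. The output: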
['C+I', 'CCCP']
C+I CCCP
Accession Sequence variable value
12 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.387985
13 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.175678
14 O14548 [R].gLPDQMLYr.[T] CCCP 885.174420
15 O14548 [R].gLPDQMLYr.[T] CCCP 130.458963
16 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.557088
17 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.095801
18 O14548 [R].gLPDQMLYr.[T] CCCP 612.171540
19 O14548 [R].gLPDQMLYr.[T] CCCP 46.449990
20 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 6.016590
21 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.466220
22 O14548 [R].gLPDQMLYr.[T] CCCP 586.392482
23 O14548 [R].gLPDQMLYr.[T] CCCP 303.857624

For the next comparison, on the other hand, the result looks correct:
['CCCP', 'DMSO']
CCCP DMSO
Accession Sequence variable value
0 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 39.300171
1 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 132.637125
2 O14548 [R].gLPDQMLYr.[T] DMSO 1165.245826
3 O14548 [R].gLPDQMLYr.[T] DMSO 642.971908
4 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 83.906058
5 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 160.718841
6 O14548 [R].gLPDQMLYr.[T] DMSO 1240.856710
7 O14548 [R].gLPDQMLYr.[T] DMSO 557.508092
8 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 56.228425
9 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 302.346775
10 O14548 [R].gLPDQMLYr.[T] DMSO 1176.998098
11 O14548 [R].gLPDQMLYr.[T] DMSO 766.993819
12 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.387985
13 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.175678
14 O14548 [R].gLPDQMLYr.[T] CCCP 885.174420
15 O14548 [R].gLPDQMLYr.[T] CCCP 130.458963
16 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.557088
17 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.095801
18 O14548 [R].gLPDQMLYr.[T] CCCP 612.171540
19 O14548 [R].gLPDQMLYr.[T] CCCP 46.449990
20 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 6.016590
21 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] CCCP 0.466220
22 O14548 [R].gLPDQMLYr.[T] CCCP 586.392482
23 O14548 [R].gLPDQMLYr.[T] CCCP 303.857624

For the third pair it looks wrong again; only the DMSO rows are returned:
['C+I', 'DMSO']
C+I DMSO
Accession Sequence variable value
0 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 39.300171
1 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 132.637125
2 O14548 [R].gLPDQMLYr.[T] DMSO 1165.245826
3 O14548 [R].gLPDQMLYr.[T] DMSO 642.971908
4 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 83.906058
5 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 160.718841
6 O14548 [R].gLPDQMLYr.[T] DMSO 1240.856710
7 O14548 [R].gLPDQMLYr.[T] DMSO 557.508092
8 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 56.228425
9 O14548 [K].lAGAWASEAYSPQGLkPVVSTEAPPIIFATPTk.[L] DMSO 302.346775
10 O14548 [R].gLPDQMLYr.[T] DMSO 1176.998098
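11 O14548 [R].gLPDQMLYr.[T] DMSO 766.993819

I use the same code for approx. 5000 different dataframes, and it has always worked so far. The condition names are exactly the same everywhere, but in some cases the filtering breaks.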
Can anyone help?
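The behaviour described above can be reproduced in isolation. This minimal sketch assumes a hypothetical three-row frame df with only a "variable" and a "value" column. With the default regex=True, str.contains reads "C+I" as the pattern "one or more C followed by I", which matches neither "CCCP" nor the literal text "C+I". That is why the C+I rows disappear while the other condition of each pair is still found.

```python
import re

import pandas as pd

# Hypothetical miniature frame: one row per condition name from the question
df = pd.DataFrame({"variable": ["DMSO", "CCCP", "C+I"], "value": [1.0, 2.0, 3.0]})

# Default regex=True: "C+I" means one or more "C" followed by "I",
# so not even the row whose value is literally "C+I" matches
as_regex = df["variable"].str.contains("C+I").tolist()
print(as_regex)  # [False, False, False]

# regex=False: plain substring test, the "C+I" row is found
as_literal = df["variable"].str.contains("C+I", regex=False).tolist()
print(as_literal)  # [False, False, True]

# Alternative: keep regex=True but escape the special characters first
escaped = df["variable"].str.contains(re.escape("C+I")).tolist()
print(escaped)  # [False, False, True]
```

The same reasoning explains why DMSO and CCCP were always filtered correctly: neither name contains a regex metacharacter.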
Answer, posted on 2021-01-22 19:05:14
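You can add the parameter regex=False so that Series.str.contains treats the value as a literal string instead of a regular expression (the + in C+I is a regex quantifier):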
melted_Peptides['variable'].str.contains(pair[0], regex=False)

https://stackoverflow.com/questions/65843780
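Going one step beyond the answer, here is a minimal sketch of the whole pairing-and-filtering loop. It assumes a small hypothetical stand-in for melted_Peptides built from a few rounded values in the question's table. It makes two substitutions that are not part of the answer: itertools.combinations replaces the hand-written tessa() function, and Series.isin, which tests for exact equality, replaces str.contains.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for melted_Peptides (rounded values from the question)
melted_Peptides = pd.DataFrame({
    "Accession": ["O14548"] * 6,
    "variable": ["DMSO", "DMSO", "CCCP", "CCCP", "C+I", "C+I"],
    "value": [39.3, 132.6, 0.39, 0.18, 44.6, 0.84],
})

# combinations() yields every unordered pair once, like tessa(); sorting the
# conditions first makes each pair's order deterministic (set() order is not)
conditions = sorted(melted_Peptides["variable"].unique())
pairs = list(combinations(conditions, 2))

subsets = {}
for cond_a, cond_b in pairs:
    # isin() compares whole values exactly: no regex interpretation
    # and no accidental substring matches
    mask = melted_Peptides["variable"].isin([cond_a, cond_b])
    subsets[(cond_a, cond_b)] = melted_Peptides[mask]
    print(cond_a, cond_b, len(subsets[(cond_a, cond_b)]))
```

Unlike str.contains(..., regex=False), isin() also will not pick up a condition whose name merely contains another one (for example, a hypothetical "CCCP_2h" when filtering for "CCCP"), which is usually what a pairwise comparison needs.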