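After coming across the small matplotlib-venn module, I have been using it for a while, and I wonder whether there is a better way of doing things than what I have done so far. I know that for a very simple Venn diagram the following lines are enough:

from collections import Counter
union = set1.union(set2).union(set3)
indicators = ['%d%d%d' % (a in set1, a in set2, a in set3) for a in union]
subsets = Counter(indicators)

...but I would also like to have lists of the entries in the different combinations of the three sets.
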
import numpy as np
from matplotlib_venn import venn3, venn3_circles
from matplotlib import pyplot as plt
import pandas as pd
# Read data
data = pd.read_excel(input_file, sheet_name=sheet)
# Create three sets of the lists to be compared
set_1 = set(data[compare[0]].dropna())
set_2 = set(data[compare[1]].dropna())
set_3 = set(data[compare[2]].dropna())
# Create the union of all three sets
union = set_1.union(set_2).union(set_3)
# Gather names of all elements and list them in groups
lists = [[], [], [], [], [], [], []]
for gene in union:
    if (gene in set_1) and (gene not in set_2) and (gene not in set_3):
        lists[0].append(gene)
    elif (gene in set_1) and (gene in set_2) and (gene not in set_3):
        lists[1].append(gene)
    elif (gene in set_1) and (gene not in set_2) and (gene in set_3):
        lists[2].append(gene)
    elif (gene in set_1) and (gene in set_2) and (gene in set_3):
        lists[3].append(gene)
    elif (gene not in set_1) and (gene in set_2) and (gene not in set_3):
        lists[4].append(gene)
    elif (gene not in set_1) and (gene in set_2) and (gene in set_3):
        lists[5].append(gene)
    elif (gene not in set_1) and (gene not in set_2) and (gene in set_3):
        lists[6].append(gene)
# Write gene lists to file
ew = pd.ExcelWriter('../Gene lists/Venn lists/' + compare[0] + ' & '
                    + compare[1] + ' & ' + compare[2] + ' gene lists.xlsx')
pd.DataFrame(lists[0], columns=[compare[0]]) \
    .to_excel(ew, sheet_name=compare[0], index=False)
pd.DataFrame(lists[1], columns=[compare[0] + ' & ' + compare[1]]) \
    .to_excel(ew, sheet_name=compare[0] + ' & ' + compare[1], index=False)
pd.DataFrame(lists[2], columns=[compare[0] + ' & ' + compare[2]]) \
    .to_excel(ew, sheet_name=compare[0] + ' & ' + compare[2], index=False)
pd.DataFrame(lists[3], columns=['All']) \
    .to_excel(ew, sheet_name='All', index=False)
pd.DataFrame(lists[4], columns=[compare[1]]) \
    .to_excel(ew, sheet_name=compare[1], index=False)
pd.DataFrame(lists[5], columns=[compare[1] + ' & ' + compare[2]]) \
    .to_excel(ew, sheet_name=compare[1] + ' & ' + compare[2], index=False)
pd.DataFrame(lists[6], columns=[compare[2]]) \
    .to_excel(ew, sheet_name=compare[2], index=False)
ew.close()
# Count the elements in each group, in the region order venn3 expects:
# 100, 010, 110, 001, 101, 011, 111
subsets = [len(lists[0]), len(lists[4]), len(lists[1]), len(lists[6]),
           len(lists[2]), len(lists[5]), len(lists[3])]
# Basic venn diagram
fig = plt.figure(1)
ax = fig.add_subplot(1, 1, 1)
v = venn3(subsets, (compare[0], compare[1], compare[2]), ax=ax)
c = venn3_circles(subsets)
# Annotation
ax.annotate('Total genes:\n' + str(len(union)),
            xy=v.get_label_by_id('111').get_position() - np.array([-0.5, 0.05]),
            xytext=(0, -70), ha='center', textcoords='offset points',
            bbox=dict(boxstyle='round,pad=0.5', fc='gray', alpha=0.3))
# Title
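plt.title(compare[0] + ' & ' + compare[1] + ' & ' + compare[2] +
          ' gene expression overlap')
plt.show()

So basically there are many different cases, each handled by hand, and I wonder whether there is a more "automated", less verbose, or simply better way to do this. For example, can I somehow extract the entries from the three-line snippet at the top?
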
Answered 2014-11-18 19:39:04
Maybe something like the following?

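values_to_sets = {a: (a in set1, a in set2, a in set3) for a in union}
sets_to_values = {}
for a, s in values_to_sets.items():
    if s not in sets_to_values:
        sets_to_values[s] = []
    sets_to_values[s].append(a)
print(sets_to_values)

This first builds a dictionary that assigns each item a tuple indicating which sets the item belongs to. It then inverts that mapping, so that each tuple maps to the list of items belonging to the combination of sets the tuple describes.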
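
To make the inversion concrete, here is a minimal sketch on three small made-up sets. The example data and the use of `defaultdict` are assumptions of mine for illustration, not part of the code above. It also shows one way the grouped lists could yield the seven counts in the region order that `venn3` takes for a plain sequence (100, 010, 110, 001, 101, 011, 111).

```python
from collections import defaultdict

# Hypothetical example data; any three sets would do.
set1 = {"a", "b", "c", "d"}
set2 = {"b", "c", "e"}
set3 = {"c", "d", "e", "f"}
union = set1 | set2 | set3

# Group every item under its membership tuple, e.g. (True, False, True).
sets_to_values = defaultdict(list)
for item in sorted(union):
    key = (item in set1, item in set2, item in set3)
    sets_to_values[key].append(item)

# Region order for a 7-element subsets sequence passed to venn3.
# (1, 0, 0) and (True, False, False) compare and hash equal, so either works as a key.
venn3_order = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1),
               (1, 0, 1), (0, 1, 1), (1, 1, 1)]
# .get() avoids creating empty entries for regions that have no members.
subsets = [len(sets_to_values.get(key, [])) for key in venn3_order]

print(dict(sets_to_values))
print(subsets)
```

Because `defaultdict(list)` creates the list on first access, the explicit `if s not in sets_to_values` check is no longer needed in this variant.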
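You can even extend this to an arbitrary number of sets:

sets = [set1, set2, set3, set4]
union = set().union(*sets)
# tuple() materializes the membership flags so they can be used as dictionary keys
values_to_sets = {a: tuple(a in s for s in sets) for a in union}

Answered 2015-01-28 12:02:28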
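The code Gordon provided (above) is great, but I just found that it stops working when the comparison includes a case in which one list has no unique entries. In that case, the snippet

for a, s in values_to_sets.items():
    if s not in sets_to_values:
        sets_to_values[s] = []

cannot produce all of the (1/0, 1/0, 1/0) tuples, because at least one tuple s never occurs among the values of values_to_sets. I am not sure whether there is a good, general fix, but I found that simply removing the last two lines above and replacing

sets_to_values = {}

...with...

sets_to_values = {(1, 0, 0): [], (0, 1, 0): [], (1, 1, 0): [],
                  (0, 0, 1): [], (1, 0, 1): [], (0, 1, 1): [],
                  (1, 1, 1): []}

...does the trick. In case anyone happens to come across this thread, the solution should now be more complete!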
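
A possibly more general variant of this fix, offered as a sketch rather than a definitive solution: instead of typing the seven keys by hand, every membership combination can be generated with `itertools.product`, so the same code should work for any number of sets. The example sets and the helper name `group_by_membership` are hypothetical and only serve the illustration.

```python
from itertools import product

def group_by_membership(sets):
    """Map each membership tuple to the sorted items in that region.

    Every non-empty combination is created up front, so regions without
    members map to an empty list instead of being absent.
    """
    groups = {combo: [] for combo in product((False, True), repeat=len(sets))
              if any(combo)}
    for item in set().union(*sets):
        groups[tuple(item in s for s in sets)].append(item)
    return {combo: sorted(items) for combo, items in groups.items()}

# Hypothetical data in which set2 has no unique entries.
set1 = {"a", "b", "c"}
set2 = {"b", "c"}
set3 = {"c", "d"}
groups = group_by_membership([set1, set2, set3])

print(len(groups))        # 7 regions for three sets
print(groups[(0, 1, 0)])  # [] because nothing is unique to set2
```

The keys are tuples of booleans, but since `True == 1` and `False == 0` in Python, lookups such as `groups[(0, 1, 0)]` find the same entries.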
https://codereview.stackexchange.com/questions/64635