我正在使用package ete3。我有这样的树:
((Species1_order1,(Species2_order2,Species3_order2)),Species4_order3,Species5_order5);我希望看到与树中特定节点最密切相关的叶子(这里的树是Species1_order1)。在这个例子中,最密切相关的叶子是Species2_order2/ Species3_order2和Species4_order3/Species5_order5。
代码:
tree = ete3.Tree('((Species1_order1, \
(Species2_order2, Species3_order2)), \
Species4_order3, Species5_order5);')新的例子:
tree=ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')我得到的结果是:
A B C D E F G H I
A 0.0 2.0 3.0 4.0 6.0 6.0 6.0 8.0 8.0
B 2.0 0.0 3.0 4.0 6.0 6.0 6.0 8.0 8.0
C 3.0 3.0 0.0 3.0 5.0 5.0 5.0 7.0 7.0
D 4.0 4.0 3.0 0.0 4.0 4.0 4.0 6.0 6.0
E 6.0 6.0 5.0 4.0 0.0 2.0 4.0 6.0 6.0
F 6.0 6.0 5.0 4.0 2.0 0.0 4.0 6.0 6.0
G 6.0 6.0 5.0 4.0 4.0 4.0 0.0 4.0 4.0
H 8.0 8.0 7.0 6.0 6.0 6.0 4.0 0.0 2.0
I 8.0 8.0 7.0 6.0 6.0 6.0 4.0 2.0 0.0但是,例如,E和F与树中的A、B、C和D有一个相等的距离,在结果中它们似乎与D很接近。
一个好的矩阵结果应该是:
A B C D E F G H I
A 0 1 2 3 4 4 5 6 6
B 1 0 2 3 4 4 5 6 6
C 2 2 0 3 4 4 5 6 6
D 3 3 3 0 4 4 5 6 6
E 4 4 4 4 0 1 5 6 6
F 4 4 4 4 1 0 5 6 6
G 5 5 5 5 5 5 0 6 6
H 6 6 6 6 6 6 6 0 1
I 6 6 6 6 6 6 6 1 0不是吗?
发布于 2020-02-27 13:49:09
正如注释中所讨论的,ete3给出了一个名为Tree.get_closest_leaf的函数,但它的输出不是预期的(我不确定这个值在这里代表什么):
>>> t=ete3.Tree('((Species1_order1,(Species2_order2,Species3_order2)),Species4_order3,Species5_order5);')
>>> t.get_closest_leaf('Species2_order2')
(Tree node 'Species4_order3' (0x115b2f29), 0.0)相反,您可以得到这样的节点距离:
import ete3
import pandas as pd
def make_matrix(tree):
def get_root_path(node):
root_path = [node]
if node.up:
root_path.extend(get_root_path(node.up))
return root_path
leaves = tree.get_leaves()
leaf_ct = len(leaves)
paths = {node.name: set(get_root_path(node)) for node in leaves}
col_lbls = [leaf.name for leaf in leaves]
dist_matrix = pd.np.array([pd.np.zeros(leaf_ct)] * leaf_ct)
df = pd.DataFrame(dist_matrix, index=col_lbls, columns=col_lbls)
for node1_name, col in df.iteritems():
for node2_name in col.keys():
path = paths[node2_name].symmetric_difference(paths[node1_name])
dist = sum(node.dist for node in path)
df.at[node1_name, node2_name] = dist
df.at[node2_name, node1_name] = dist
return df注:这是一个次优的解决方案,有几个原因,但这个问题并没有要求最有效的解决方案。有关系统发育距离矩阵方法的更多信息,请参见此链接。
该解决方案还使用过高的pandas,因为它实际上只是为了方便行/列标签。移除pandas依赖项并使用本机列表进行操作并不困难。
这是输出:
>>> tree=ete3.Tree('((Species1_order1, (Species2_order2, Species3_order2)), Species4_order3, Species5_order5);')
>>> make_matrix(tree)
Species1_order1 Species2_order2 Species3_order2 Species4_order3 Species5_order5
Species1_order1 0.0 3.0 3.0 3.0 3.0
Species2_order2 3.0 0.0 2.0 4.0 4.0
Species3_order2 3.0 2.0 0.0 4.0 4.0
Species4_order3 3.0 4.0 4.0 0.0 2.0
Species5_order5 3.0 4.0 4.0 2.0 0.0对于发布的更新,我没有看到任何问题。它似乎提供了正确的结果。下面是由ete3呈现的树(我突出显示了从Interest_sequence到Rhopalosiphum_maidis_Hemiptera之间的4个跳数):

下面是对应于Interest_sequence的矩阵列:
>>> m['Interest_sequence']
Rhopalosiphum_maidis__Hemiptera 4.0
Drosophila_novamexicana__Hemiptera 5.0
Drosophila_arizonae__Hemiptera 6.0
Drosophila_navojoa__Hemiptera 6.0
Interest_sequence 0.0
Heliothis_virescens_droso_3a__nan 5.0
Mythimna_separata_droso__nan 6.0
Heliothis_virescens_droso_3i__nan 6.0
Scaptodrosophila_lebanonensis__Diptera 5.0
Mythimna_unipuncta_droso_A__nan 6.0
Xestia_c-nigrum_droso__nan 8.0
Helicoverpa_armigera_droso__nan 8.0
Mocis_latipes_droso__nan 7.0
Drosophila_busckii__Diptera 4.0
Drosophila_bipectinata__Diptera 5.0
Drosophila_mojavensis__Diptera 7.0
Drosophila_yakuba__Diptera 7.0
Drosophila_hydei__Diptera 7.0
Drosophila_serrata__Diptera 8.0
Drosophila_takahashii__Diptera 9.0
Drosophila_eugracilis__Diptera 11.0
Drosophila_ficusphila__Diptera 11.0
Drosophila_erecta__Diptera 12.0
Drosophila_melanogaster__Diptera 13.0
Sequence_A_nan__nan 14.0
Drosophila_sechellia__Diptera 15.0
Drosophila_simulans__Diptera 15.0
Drosophila_suzukii__Diptera 12.0
Drosophila_biarmipes__Diptera 12.0
Name: Interest_sequence, dtype: float64发布于 2021-03-22 15:12:16
ete3函数按预期工作,只需在树的节点上进行迭代,如下所示:
from ete3 import Tree t = Tree() # Creates an empty tree
A = t.add_child(name="A") # Adds a new child to the current tree root and returns it
B = t.add_child(name="B") # Adds a second child to the current tree root and returns it
C = A.add_child(name="C") # Adds a new child to one of the branches D =
C.add_sister(name="D") # Adds a second child to same branch as before, but using a sister as the starting point
R = A.add_child(name="R") # Adds a third child to the branch. Multifurcations are supported
# Next, I add 6 random leaves to the R branch names_library is an optional argument. If no names are provided, they will be generated randomly.
R.populate(6, names_library=["r1","r2","r3","r4","r5","r6"])
for node in t.traverse():
print(node.get_closest_leaf())注意:如果内部节点是一个终端节点,它将有两个最近的叶子,用node.get_children()获取它们
https://stackoverflow.com/questions/60434327
复制相似问题