bdms.tree#

Birth-death-mutation-sampling (BDMS) process simulation.

Example

>>> import bdms

Initialize a tree with a single root node.

>>> tree = bdms.Tree()
>>> print(tree)

--0

Evolve the tree for one time unit with default parameters.

>>> tree.evolve(1.0, seed=0)
>>> print(tree)

      /-2
     |
-- /-|      /-6
     |   /-|
      \-|   \-7
        |
         \-5

Module Contents#

Classes#

TreeNode

A tree generated by a BDMS process. Subclasses ete3.TreeNode.

exception bdms.tree.TreeError(value='')#

Bases: ete3.coretype.tree.TreeError

A problem occurred during a TreeNode operation

add_note()#

Exception.add_note(note) – add a note to the exception

with_traceback()#

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class bdms.tree.TreeNode(t=0, state=None, state_attr='state', **kwargs)#

Bases: ete3.Tree

A tree generated by a BDMS process. Subclasses ete3.TreeNode.

Parameters:
  • t (float) – Time of this node.

  • state (Hashable) – State of this node.

  • state_attr (str) – Name of the node attribute to store the state in.

  • kwargs (Any) – Keyword arguments passed to ete3.TreeNode initializer.

t: float#

Time of the node.

state_attr: str#

Name of the node attribute to store the state in.

event: str | None#

Event at this node.

n_mutations: int = 0#

Number of mutations on the branch above this node (zero unless the tree has been pruned above this node, removing mutation event nodes).

evolve(t, birth_process=poisson.ConstantProcess(1), death_process=poisson.ConstantProcess(0), mutation_process=poisson.ConstantProcess(0), mutator=mutators.DiscreteMutator((None,), np.array([[1]])), birth_mutation_prob=0.0, min_survivors=1, capacity=1000, capacity_method=None, init_population=1, seed=None, verbose=False)#

Evolve the tree for a duration of \(t\).

Parameters:
  • t (float) – Evolve for a duration of \(t\) time units.

  • birth_process (Process) – Birth process function.

  • death_process (Process) – Death process function.

  • mutation_process (Process) – Mutation process function.

  • mutator (Mutator) – Generator of mutation effects at mutation events (and possibly on daughters of birth events if birth_mutation_prob > 0).

  • birth_mutation_prob (float | Callable[[Any], float]) – Probability of a mutation event for each daughter node of a birth event. If a callable, then it should take a node attribute as an argument and return a probability.

  • min_survivors (int) – Minimum number of survivors. If the simulation finishes with fewer than this number of survivors, then a TreeError is raised.

  • capacity (int) – Population carrying capacity.

  • capacity_method (Literal[birth, death, hard] | None) – Method to enforce population carrying capacity. If None, then a TreeError is raised if the population exceeds the carrying capacity. If "stop", then the simulation stops when the population reaches the carrying capacity. If "birth", then the birth rate is logistically modulated such that the process is critical when the population is at carrying capacity. If "death", then the death rate is logistically modulated such that the process is critical when the population is at carrying capacity. If "hard", then a random individual is chosen to die whenever a birth event results in carrying capacity being exceeded.

  • init_population (int) – Initial population size.

  • seed (int | Generator | None) – A seed to initialize the random number generation. If None, then fresh, unpredictable entropy will be pulled from the OS. If an int, then it will be used to derive the initial state. If a numpy.random.Generator, then it will be used directly.

  • verbose (bool) – Flag to indicate whether to print progress information.

Raises:
  • TreeError – If the tree is not a root node, or if the tree has already evolved, or if the number of survivors is less than min_survivors, or if the population exceeds capacity with capacity_method=None.

  • ValueError – If init_population is greater than capacity or if the event processes refer to different node attributes.

Return type:

None
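The "birth" and "death" options of capacity_method are characterized above only by their defining property: the process is critical (birth rate equals death rate) when the population is at carrying capacity. The following is a minimal sketch of one functional form with that property. The function name modulated_birth_rate and the linear interpolation are assumptions made for illustration and are not necessarily the formula bdms uses internally.

```python
def modulated_birth_rate(birth_rate: float, death_rate: float,
                         population: int, capacity: int) -> float:
    """Hypothetical logistic-style modulation of the birth rate.

    Interpolates from the unmodulated birth rate (empty population) to the
    death rate (population at capacity), so that the process is critical,
    i.e. birth rate == death rate, exactly at carrying capacity.
    """
    fill = population / capacity  # fraction of the carrying capacity in use
    return birth_rate + (death_rate - birth_rate) * fill


# Net growth shrinks as the population fills up and vanishes at capacity.
for n in (0, 500, 1000):
    rate = modulated_birth_rate(2.0, 1.0, population=n, capacity=1000)
    print(n, rate)
```

Modulating the death rate instead would follow the same idea with the roles of the two rates swapped.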

sample_survivors(n=None, p=1.0, seed=None)#

Choose \(n\) survivor leaves from the tree, or each survivor leaf with probability \(p\), to mark as sampled (via the event attribute).

Parameters:
  • n (int | None) – Number of leaves to sample.

  • p (float | None) – Probability of sampling a leaf.

  • seed (int | Generator | None) – A seed to initialize the random number generation. If None, then fresh, unpredictable entropy will be pulled from the OS. If an int, then it will be used to derive the initial state. If a numpy.random.Generator, then it will be used directly.

Raises:

ValueError – If the tree has already been sampled below this node, or if neither n nor p is specified.

Return type:

None
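The two sampling modes described above (a fixed number \(n\) of leaves, or each leaf independently with probability \(p\)) can be sketched with the standard library. This is an illustration of the sampling logic only: the free function below, its list-of-names input, and the use of random.Random in place of a numpy.random.Generator are all assumptions, not the bdms implementation.

```python
import random


def sample_survivors(survivors, n=None, p=1.0, seed=None):
    """Return the subset of survivors to mark as sampled (illustrative only)."""
    rng = random.Random(seed)  # stdlib stand-in for numpy.random.Generator
    if n is not None:
        # Exactly n survivors, chosen uniformly without replacement.
        return rng.sample(survivors, n)
    if p is not None:
        # Each survivor is kept independently with probability p.
        return [leaf for leaf in survivors if rng.random() < p]
    raise ValueError("either n or p must be specified")


leaves = ["leaf%d" % i for i in range(10)]
fixed = sample_survivors(leaves, n=3, seed=0)        # always 3 leaves
bernoulli = sample_survivors(leaves, p=0.5, seed=0)  # a random number of leaves
```

Passing the same seed twice gives the same selection, which is the point of the seed parameter.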

slice(t, attr='x')#

Return the values of attribute attr for all lineages alive at time \(t\).

Parameters:
  • t (float) – Slice the tree at time \(t\).

  • attr (str) – Attribute to extract from slice.

Returns:

List of attribute attr values at time \(t\) for all lineages alive at that time.

Raises:

ValueError – If the tree has not evolved or has already been pruned below this node, or if the tree has not been sampled below this node, or if t is before the root time or after the tree end time.

Return type:

list[Any]

prune_unsampled()#

Prune the tree to the subtree subtending the sampled leaves, removing unobserved subtrees.

Raises:
  • ValueError – If the tree has not been sampled below this node, or if the tree has already been pruned below this node.

  • TreeError – If no leaves were sampled.

Return type:

None

remove_mutation_events()#

Remove unifurcating mutation event nodes, preserving branch length, and annotate mutation counts in child node n_mutations attribute.

Raises:

ValueError – If the tree has not been pruned below this node with prune_unsampled().

Return type:

None
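As a rough illustration of what removing unifurcating mutation nodes involves, the sketch below splices single-child mutation nodes out of a toy tree, adds their branch length to the child, and accumulates the count in n_mutations. The Node class, the event labels "mutation" and "sampling", and the free function are hypothetical stand-ins chosen for the example, not the bdms or ete3 classes.

```python
class Node:
    """Minimal stand-in for a tree node (not the bdms or ete3 class)."""

    def __init__(self, name, dist=0.0, event=None):
        self.name, self.dist, self.event = name, dist, event
        self.children, self.up, self.n_mutations = [], None, 0

    def add_child(self, child):
        child.up = self
        self.children.append(child)
        return child


def remove_mutation_events(root):
    """Splice out single-child mutation nodes below root, in place."""
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if node.event == "mutation" and len(node.children) == 1:
            child, parent = node.children[0], node.up
            child.dist += node.dist                    # preserve branch length
            child.n_mutations += node.n_mutations + 1  # count this mutation
            child.up = parent
            parent.children[parent.children.index(node)] = child
            stack.append(child)  # the child may itself be a mutation node
        else:
            stack.extend(node.children)


root = Node("root")
m1 = root.add_child(Node("m1", dist=0.3, event="mutation"))
m2 = m1.add_child(Node("m2", dist=0.2, event="mutation"))
leaf = m2.add_child(Node("leaf", dist=0.5, event="sampling"))
remove_mutation_events(root)
print(leaf.up.name, leaf.n_mutations)  # root 2
```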

render(file_name, color_by='state', color_map=None, mode='r', scale=None, **kwargs)#

A thin wrapper around ete3.TreeNode.render() that adds some custom decoration and a color bar. See also ETE’s tree rendering tutorial and linked API docs pages there.

If the tree is not pruned (or is pruned without removing mutation events), then branches are colored according to the attribute specified by color_by, extinct lineages are indicated as dotted branches, unsampled non-extinct lineages are indicated as solid branches, and sampled lineages are indicated as thick solid branches. Sampled leaves are indicated with a circle.

If the tree is pruned without retaining mutation events, then nodes are colored according to the attribute specified by color_by, and branches are annotated above with branch length (in black text) and below with number of mutations (in green text).

Parameters:
  • file_name (str) – Filename to save the rendered tree to. Use "%%inline" to render inline in a notebook.

  • color_by (str | None) – If not None, color tree by this numerical node attribute.

  • color_map (Mapping[Any, str] | None) – Mapping from node attribute values to color names or hex codes.

  • mode (Literal[c, r]) – "c" for circular tree, "r" for rectangular tree.

  • scale (float | None) – Scale branch lengths by this factor (ignored if a tree_style is passed in kwargs). If None, then the scale is chosen automatically.

  • kwargs (Any) – Keyword arguments to pass to ete3.TreeNode.render().

Returns:

The return value of ete3.TreeNode.render().

Return type:

Any

add_feature(pr_name, pr_value)#

Add or update a node’s feature.

add_features(**features)#

Add or update several features.

del_feature(pr_name)#

Permanently deletes a node’s feature.

add_child(child=None, name=None, dist=None, support=None)#

Adds a new child to this node. If the child node is not supplied as an argument, a new node instance will be created.

Parameters:
  • child (None) – the node instance to be added as a child.

  • name (None) – the name that will be given to the child.

  • dist (None) – the distance from the node to the child.

  • support (None) – the support value of child partition.

Returns:

The child node instance

remove_child(child)#

Removes a child from this node (parent and child nodes still exist but are no longer connected).

add_sister(sister=None, name=None, dist=None)#

Adds a sister to this node. If sister node is not supplied as an argument, a new TreeNode instance will be created and returned.

remove_sister(sister=None)#

Removes a sister node. It has the same effect as `TreeNode.up.remove_child(sister)`

If a sister node is not supplied, the first sister will be deleted and returned.

Parameters:

sister – A node instance

Returns:

The node removed

delete(prevent_nondicotomic=True, preserve_branch_length=False)#

Deletes the node from the tree structure. Note that this method makes the node ‘disappear’ from the tree structure: the children of the deleted node are transferred to the next available parent.

Parameters:
  • prevent_nondicotomic (True) – When True (default), the delete function will be executed recursively to prevent single-child nodes.

  • preserve_branch_length (False) – If True, branch lengths of the deleted nodes are transferred (summed up) to its parent’s branch, thus keeping original distances among nodes.

Example:

      / C
root-|
     |        / B
      \--- H |
              \ A

> H.delete() will produce this structure:

      / C
     |
root-|--B
     |
      \ A

detach()#

Detaches this node (and all its descendants) from its parent and returns a reference to itself.

A detached node keeps its whole structure of descendants and can be attached to another node through the ‘add_child’ function. This mechanism can be seen as a cut and paste.

prune(nodes, preserve_branch_length=False)#

Prunes the topology of a node to conserve only the selected list of leaf or internal nodes. The minimum number of nodes that conserve the topological relationships among the requested nodes will be retained. The root node is always conserved.

Variables:

nodes – a list of node names or node objects that should be retained

Parameters:

preserve_branch_length (False) – If True, branch lengths of the deleted nodes are transferred (summed up) to its parent’s branch, thus keeping original distances among nodes.

Examples:

t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B'])


#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E
#   |  |                        /-A
#-root  \-G                 -root
#   |                           \-B
#   |   /-I
#    \K|
#       \-J



t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B', 'C'])

#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E
#   |  |                              /-A
#-root  \-G                  -root- C|
#   |                                 \-B
#   |   /-I
#    \K|
#       \-J



t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B', 'I'])


#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E                    /-I
#   |  |                      -root
#-root  \-G                      |   /-A
#   |                             \C|
#   |   /-I                          \-B
#    \K|
#       \-J

t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B', 'F', 'H'])

#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E
#   |  |                              /-A
#-root  \-G                -root-H /F|
#   |                                 \-B
#   |   /-I
#    \K|
#       \-J

swap_children()#

Swaps current children order.

get_children()#

Returns an independent list of node’s children.

get_sisters()#

Returns an independent list of sister nodes.

iter_leaves(is_leaf_fn=None)#

Returns an iterator over the leaves under this node.

Parameters:

is_leaf_fn (None) – See TreeNode.traverse() for documentation.

get_leaves(is_leaf_fn=None)#

Returns the list of terminal nodes (leaves) under this node.

Parameters:

is_leaf_fn (None) – See TreeNode.traverse() for documentation.

iter_leaf_names(is_leaf_fn=None)#

Returns an iterator over the leaf names under this node.

Parameters:

is_leaf_fn (None) – See TreeNode.traverse() for documentation.

get_leaf_names(is_leaf_fn=None)#

Returns the list of terminal node names under the current node.

Parameters:

is_leaf_fn (None) – See TreeNode.traverse() for documentation.

iter_descendants(strategy='levelorder', is_leaf_fn=None)#

Returns an iterator over all descendant nodes.

Parameters:

is_leaf_fn (None) – See TreeNode.traverse() for documentation.

get_descendants(strategy='levelorder', is_leaf_fn=None)#

Returns a list of all (leaves and internal) descendant nodes.

Parameters:

is_leaf_fn (None) – See TreeNode.traverse() for documentation.

traverse(strategy='levelorder', is_leaf_fn=None)#

Returns an iterator to traverse the tree structure under this node.

Parameters:
  • strategy ("levelorder") – Sets the way in which the tree will be traversed. Possible values are: "preorder" (first the parent and then the children), "postorder" (first the children and then the parent), and "levelorder" (nodes are visited in order from root to leaves).

  • is_leaf_fn (None) – If supplied, the is_leaf_fn function will be used to decide whether each node is terminal or internal. It should receive a node instance as its first argument and return True or False. Use this argument to traverse a tree while dynamically collapsing the internal nodes that match is_leaf_fn.
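The three traversal strategies differ only in when a parent is visited relative to its children. The sketch below shows the visiting orders on a toy tree stored as nested tuples; the tuple representation and the generator names are assumptions for illustration and do not reflect how ETE stores nodes.

```python
from collections import deque

# Tree as nested tuples: (name, [children]); purely illustrative.
tree = ("root", [("H", [("A", []), ("B", [])]), ("C", [])])


def preorder(node):
    name, children = node
    yield name                       # parent first ...
    for child in children:
        yield from preorder(child)   # ... then its children


def postorder(node):
    name, children = node
    for child in children:
        yield from postorder(child)  # children first ...
    yield name                       # ... then the parent


def levelorder(node):
    queue = deque([node])
    while queue:
        name, children = queue.popleft()
        yield name                   # visit level by level from the root
        queue.extend(children)


print(list(preorder(tree)))    # ['root', 'H', 'A', 'B', 'C']
print(list(postorder(tree)))   # ['A', 'B', 'H', 'C', 'root']
print(list(levelorder(tree)))  # ['root', 'H', 'C', 'A', 'B']
```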

iter_prepostorder(is_leaf_fn=None)#

Iterate over all nodes in a tree yielding every node in both pre and post order. Each iteration returns a postorder flag (True if node is being visited in postorder) and a node instance.

iter_ancestors()#

New in version 2.2.

Iterates over the list of all ancestor nodes from current node to the current tree root.

get_ancestors()#

New in version 2.2.

Returns the list of all ancestor nodes from current node to the current tree root.

describe()#

Prints general information about this node and its connections.

write(features=None, outfile=None, format=0, is_leaf_fn=None, format_root_node=False, dist_formatter=None, support_formatter=None, name_formatter=None, quoted_node_names=False)#

Returns the newick representation of current node. Several arguments control the way in which extra data is shown for every node:

Parameters:
  • features – a list of feature names to be exported using the Extended Newick Format (e.g. features=["name", "dist"]). Use an empty list to export all available features in each node (features=[]).

  • outfile – writes the output to a given file

  • format – defines the newick standard used to encode the tree. See tutorial for details.

  • format_root_node (False) – If True, it allows features and branch information from root node to be exported as a part of the newick text string. For newick compatibility reasons, this is False by default.

  • is_leaf_fn – See TreeNode.traverse() for documentation.

Example:

t.write(features=["species","name"], format=1)

get_tree_root()#

Returns the absolute root node of current tree structure.

get_common_ancestor(*target_nodes, **kargs)#

Returns the first common ancestor between this node and a given list of ‘target_nodes’.

Examples:

from ete3 import Tree

t = Tree("(((A:0.1, B:0.01):0.001, C:0.0001):1.0[&&NHX:name=common], (D:0.00001):0.000001):2.0[&&NHX:name=root];")
# Look up the two leaves by name, then find their first common ancestor.
A = t.get_leaves_by_name("A")[0]
C = t.get_leaves_by_name("C")[0]
common = A.get_common_ancestor(C)
print(common.name)

iter_search_nodes(**conditions)#

Search nodes in an iterative way. Matches are yielded as they are being found. This avoids needing to scan the full tree topology before returning the first matches. Useful when dealing with huge trees.

search_nodes(**conditions)#

Returns the list of nodes matching a given set of conditions.

Example:

tree.search_nodes(dist=0.0, name="human")

get_leaves_by_name(name)#

Returns a list of leaf nodes matching a given name.

is_leaf()#

Return True if current node is a leaf.

is_root()#

Returns True if current node has no parent

get_distance(target, target2=None, topology_only=False)#

Returns the distance between two nodes. If only one target is specified, it returns the distance between the target and the current node.

Parameters:
  • target – a node within the same tree structure.

  • target2 – a node within the same tree structure. If not specified, current node is used as target2.

  • topology_only (False) – If set to True, distance will refer to the number of nodes between target and target2.

Returns:

branch length distance between target and target2. If topology_only flag is True, returns the number of nodes between target and target2.

get_farthest_node(topology_only=False)#

Returns the node’s farthest descendant or ancestor node, and the distance to it.

Parameters:

topology_only (False) – If set to True, the distance between nodes will be measured as the number of nodes between them. In other words, topological distance will be used instead of branch length distances.

Returns:

A tuple containing the farthest node relative to the current node and the distance to it.

get_farthest_leaf(topology_only=False, is_leaf_fn=None)#

Returns node’s farthest descendant node (which is always a leaf), and the distance to it.

Parameters:

topology_only (False) – If set to True, the distance between nodes will be measured as the number of nodes between them. In other words, topological distance will be used instead of branch length distances.

Returns:

A tuple containing the farthest leaf relative to the current node and the distance to it.

get_closest_leaf(topology_only=False, is_leaf_fn=None)#

Returns node’s closest descendant leaf and the distance to it.

Parameters:

topology_only (False) – If set to True, the distance between nodes will be measured as the number of nodes between them. In other words, topological distance will be used instead of branch length distances.

Returns:

A tuple containing the closest leaf relative to the current node and the distance to it.

get_midpoint_outgroup()#

Returns the node that divides the current tree into two distance-balanced partitions.

populate(size, names_library=None, reuse_names=False, random_branches=False, branch_range=(0, 1), support_range=(0, 1))#

Generates a random topology by populating current node.

Parameters:
  • names_library (None) – If provided, names library (list, set, dict, etc.) will be used to name nodes.

  • reuse_names (False) – If True, node names will not necessarily be unique, which makes the process a bit more efficient.

  • random_branches (False) – If True, branch distances and support values will be randomized.

  • branch_range ((0,1)) – If random_branches is True, this range of values will be used to generate random distances.

  • support_range ((0,1)) – If random_branches is True, this range of values will be used to generate random branch support values.

set_outgroup(outgroup)#

Sets a descendant node as the outgroup of a tree. This function can be used to root a tree or even an internal node.

Parameters:

outgroup – a node instance within the same tree structure that will be used as a basal node.

unroot(mode='legacy')#

Unroots the current node. This function is expected to be used on the absolute tree root node, but it can also be applied to any other internal node. It will convert a split into a multifurcation.

Parameters:

mode ("legacy") – The value can be “legacy” or “keep”.

If the value is “keep”, then the function keeps the distance between the leaves by adding the distance associated with the deleted edge to the remaining edge. Otherwise, the distance value of the deleted edge is dropped.

show(layout=None, tree_style=None, name='ETE')#

Starts an interactive session to visualize current node structure using provided layout and TreeStyle.

copy(method='cpickle')#

Returns a copy of the current node.

Variables:

method (cpickle) – Protocol used to copy the node structure. The following values are accepted:

  • “newick”: Tree topology, node names, branch lengths and branch support values will be copied as represented in the newick string (copy by newick string serialisation).

  • “newick-extended”: Tree topology and all node features will be copied based on the extended newick format representation. Only node features will be copied, thus excluding other node attributes. As this method is also based on newick serialisation, features will be converted into text strings when making the copy.

  • “cpickle”: The whole node structure and its content is cloned based on cPickle object serialisation (slower, but recommended for full tree copying)

  • “deepcopy”: The whole node structure and its content is copied based on the standard “copy” Python functionality (this is the slowest method, but it allows copying complex objects even if attributes point to lambda functions, etc.)

get_ascii(show_internal=True, compact=False, attributes=None)#

Returns a string containing an ascii drawing of the tree.

Parameters:
  • show_internal – includes internal edge names.

  • compact – use exactly one line per tip.

  • attributes – A list of node attributes to be shown in the ASCII representation.

ladderize(direction=0)#

Sort the branches of a given tree (swapping children nodes) according to the size of each partition.

t =  Tree("(f,((d, ((a,b),c)),e));")

print(t)

#
#      /-f
#     |
#     |          /-d
# ----|         |
#     |     /---|          /-a
#     |    |    |     /---|
#     |    |     \---|     \-b
#      \---|         |
#          |          \-c
#          |
#           \-e

t.ladderize()
print(t)

#      /-f
# ----|
#     |     /-e
#      \---|
#          |     /-d
#           \---|
#               |     /-c
#                \---|
#                    |     /-a
#                     \---|
#                          \-b
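The reordering shown in the example above can be reproduced with a short recursive sort. This is a minimal sketch on nested lists, assuming that smaller partitions are placed first (as in the output above); the helper names are hypothetical and it returns a new structure rather than modifying a tree in place.

```python
def size(tree):
    """Number of leaves under a nested-list tree (leaves are strings)."""
    return 1 if isinstance(tree, str) else sum(size(child) for child in tree)


def ladderize(tree):
    """Return a copy with children sorted by increasing partition size."""
    if isinstance(tree, str):
        return tree
    return sorted((ladderize(child) for child in tree), key=size)


# Same topology as "(f,((d, ((a,b),c)),e));" in the example above.
t = ["f", [["d", [["a", "b"], "c"]], "e"]]
print(ladderize(t))  # ['f', ['e', ['d', ['c', ['a', 'b']]]]]
```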
sort_descendants(attr='name')#

Sort the branches of a given tree by the node attribute attr (node names by default). Note that if duplicated names are present, extra criteria should be added to sort nodes.

get_cached_content(store_attr=None, container_type=set, leaves_only=True, _store=None)#

Returns a dictionary pointing to the preloaded content of each internal node under this tree. Such a dictionary is intended to work as a cache for operations that require many traversal operations.

Parameters:
  • store_attr (None) – Specifies the node attribute that should be cached (e.g. name, distance, etc.). When None, the whole node instance is cached.

  • _store – (internal use)

robinson_foulds(t2, attr_t1='name', attr_t2='name', unrooted_trees=False, expand_polytomies=False, polytomy_size_limit=5, skip_large_polytomies=False, correct_by_polytomy_size=False, min_support_t1=0.0, min_support_t2=0.0)#

Returns the Robinson-Foulds symmetric distance between current tree and a different tree instance.

Parameters:
  • t2 – reference tree

  • attr_t1 (name) – Compare trees using a custom node attribute as a node name.

  • attr_t2 (name) – Compare trees using a custom node attribute as a node name in target tree.

  • unrooted_trees (False) – If True, consider trees as unrooted.

  • expand_polytomies (False) – If True, all polytomies in the reference and target tree will be expanded into all possible binary trees. The Robinson-Foulds distance will be calculated between all tree combinations and the minimum value will be returned. See also TreeNode.expand_polytomies().

Returns:

(rf, rf_max, common_attrs, names, edges_t1, edges_t2, discarded_edges_t1, discarded_edges_t2)
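For intuition about the first element of the returned tuple, the Robinson-Foulds distance counts the partitions found in one tree but not in the other. The sketch below computes it for rooted trees given as nested tuples, treating each internal node as the set of leaf names below it. It assumes unique leaf names and ignores everything else the method supports (unrooted trees, polytomy expansion, support thresholds); the function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a nested-tuple tree.

    Leaves are strings; internal nodes are tuples of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def robinson_foulds(t1, t2):
    """Size of the symmetric difference of the clade sets of two rooted trees."""
    leaves1, c1 = clades(t1)
    leaves2, c2 = clades(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf names")
    # The clade holding every leaf is shared by construction; drop it.
    c1.discard(leaves1)
    c2.discard(leaves2)
    return len(c1 ^ c2)


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(robinson_foulds(t1, t2))  # 2
```

Here the two trees differ only in the clades {a, b} and {a, c}, so the distance is 2.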

compare(ref_tree, use_collateral=False, min_support_source=0.0, min_support_ref=0.0, has_duplications=False, expand_polytomies=False, unrooted=False, max_treeko_splits_to_be_artifact=1000, ref_tree_attr='name', source_tree_attr='name')#

Compares this tree with another using the Robinson-Foulds symmetric difference and the number of shared edges. Trees of different sizes and with duplicated items are allowed.

Returns:

A Python dictionary with results.

iter_edges(cached_content=None)#

New in version 2.3.

Iterate over the list of edges of a tree. Each edge is represented as a tuple of two elements, each containing the list of nodes separated by the edge.

get_edges(cached_content=None)#

New in version 2.3.

Returns the list of edges of a tree. Each edge is represented as a tuple of two elements, each containing the list of nodes separated by the edge.

standardize(delete_orphan=True, preserve_branch_length=True)#

New in version 2.3.

Processes the current tree structure to produce a standardized topology: nodes with only one child are removed and multifurcations are automatically resolved.

get_topology_id(attr='name')#

New in version 2.3.

Returns the unique ID representing the topology of the current tree. Two trees with the same topology will produce the same id. If trees are unrooted, make sure that the root node is not binary or use the tree.unroot() function before generating the topology id.

This is useful to detect the number of unique topologies over a bunch of trees, without requiring full distance methods.

The id is, by default, calculated based on the terminal node’s names. Any other node attribute could be used instead.
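The idea of a topology identifier can be illustrated by hashing a canonical, order-independent serialization of the tree. The sketch below works on rooted nested tuples and sorts the children at every node before hashing. It is not ETE's algorithm; the function names and the choice of MD5 are assumptions made for the example.

```python
import hashlib


def canonical(tree):
    """Order-independent newick-like string for a nested-tuple tree."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


def topology_id(tree):
    """Hex digest that is equal for trees with the same rooted topology."""
    return hashlib.md5(canonical(tree).encode()).hexdigest()


same_a = topology_id((("a", "b"), "c"))
same_b = topology_id(("c", ("b", "a")))   # children listed in another order
other = topology_id((("a", "c"), "b"))    # genuinely different topology
print(same_a == same_b, same_a == other)  # True False
```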

convert_to_ultrametric(tree_length=None, strategy='balanced')#

Converts a tree into an ultrametric topology (all leaves have the same distance to the root). Note that, for visual inspection of ultrametric trees, node.img_style["size"] should be set to 0.

check_monophyly(values, target_attr, ignore_missing=False, unrooted=False)#

Returns True if a given target attribute is monophyletic under this node for the provided set of values.

If not all values are represented in the current tree structure, a ValueError exception will be raised to warn that strict monophyly could never be reached (this behaviour can be avoided by enabling the ignore_missing flag).

Parameters:
  • values – a set of values for which monophyly is expected.

  • target_attr – node attribute being used to check monophyly (e.g. species for species trees, names for gene family trees, or any custom feature present in the tree).

  • ignore_missing (False) – Avoid raising an Exception when missing attributes are found.

  • unrooted (False) – If True, the tree will be treated as unrooted, which allows monophyly to be found even when the current outgroup splits a monophyletic group.

Returns:

A tuple of: whether the values are monophyletic (boolean), the clade type (‘monophyletic’, ‘paraphyletic’ or ‘polyphyletic’), and the leaves breaking the monophyly (set).
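A simplified view of the check: a set of values is monophyletic when the smallest clade containing all of them contains nothing else. The sketch below applies this to rooted nested tuples whose leaves are the values themselves. It returns only the boolean and the offending leaves, does not classify paraphyly versus polyphyly, and ignores the unrooted and ignore_missing options; all names are hypothetical.

```python
def leaf_set(tree):
    """Leaf names under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def smallest_clade(tree, values):
    """Leaf set of the smallest subtree containing every name in values."""
    for child in (() if isinstance(tree, str) else tree):
        if values <= leaf_set(child):
            return smallest_clade(child, values)  # descend while still covered
    return leaf_set(tree)


def check_monophyly(tree, values):
    """Return (is_monophyletic, leaves breaking the monophyly)."""
    values = set(values)
    missing = values - leaf_set(tree)
    if missing:
        raise ValueError("values not found in tree: %s" % sorted(missing))
    extra = smallest_clade(tree, values) - values
    return not extra, extra


tree = ((("a", "b"), "c"), ("d", "e"))
print(check_monophyly(tree, {"a", "b"}))  # (True, set())
print(check_monophyly(tree, {"a", "c"}))  # (False, {'b'})
```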

get_monophyletic(values, target_attr)#

New in version 2.2.

Returns a list of nodes matching the provided monophyly criteria. For a node to be considered a match, all target_attr values within the node, and exclusively them, should be grouped.

Parameters:
  • values – a set of values for which monophyly is expected.

  • target_attr – node attribute being used to check monophyly (e.g. species for species trees, names for gene family trees).

expand_polytomies(map_attr='name', polytomy_size_limit=5, skip_large_polytomies=False)#

New in version 2.3.

Given a tree with one or more polytomies, this function returns the list of all trees (in newick format) resulting from the combination of all possible solutions of the multifurcated nodes.

http://ajmonline.org/2010/darwin.php

resolve_polytomy(default_dist=0.0, default_support=0.0, recursive=True)#

Resolve all polytomies under the current node by creating an arbitrary dichotomous structure among the affected nodes. This function randomly modifies the current tree topology and should only be used for compatibility reasons (e.g. programs rejecting multifurcated nodes in the newick representation).

Parameters:
  • default_dist (0.0) – artificial branch distance of new nodes.

  • default_support (0.0) – artificial branch support of new nodes.

  • recursive (True) – Resolve any polytomy under this node. When False, only current node will be checked and fixed.

cophenetic_matrix()#

Generate a cophenetic distance matrix of the tree to standard output.

The cophenetic matrix <https://en.wikipedia.org/wiki/Cophenetic> is a matrix representation of the distance between each node.

If we have a tree like:

                           ----A
              _____________|y
              |            |
              |            ----B
      ________|z
              |            ----C
              |____________|x     -----D
                           |      |
                           |______|w
                                  |
                                  -----E

Where w,x,y,z are internal nodes. d(A,B) = d(y,A) + d(y,B) and d(A, E) = d(z,A) + d(z, E) = {d(z,y) + d(y,A)} + {d(z,x) + d(x,w) + d(w,E)}

We use an idea inspired by the ete3 team: https://gist.github.com/jhcepas/279f9009f46bf675e3a890c19191158b :

For each node find its path to the root.

e.g.

A -> A, y, z
E -> E, w, x, z

and make these orderless sets. Then we XOR the two sets to only find the elements that are in one or other sets but not both. In this case A, E, y, x, w.

The distance between the two nodes is the sum of the distances from each of those nodes to its parent.

One more optimization: since the distances are symmetric and the distance from a node to itself is zero, we use itertools.combinations rather than itertools.permutations. This cuts the number of pairs to compute from n(n-1) to n(n-1)/2, which is still O(n^2) but in practice speeds things up for large trees.

For this tree, we will return the two-dimensional array:

     A                  B                  C                  D                  E
A    0                  d(A-y) + d(B-y)    d(A-z) + d(C-z)    d(A-z) + d(D-z)    d(A-z) + d(E-z)
B    d(B-y) + d(A-y)    0                  d(B-z) + d(C-z)    d(B-z) + d(D-z)    d(B-z) + d(E-z)
C    d(C-z) + d(A-z)    d(C-z) + d(B-z)    0                  d(C-x) + d(D-x)    d(C-x) + d(E-x)
D    d(D-z) + d(A-z)    d(D-z) + d(B-z)    d(D-x) + d(C-x)    0                  d(D-w) + d(E-w)
E    d(E-z) + d(A-z)    d(E-z) + d(B-z)    d(E-x) + d(C-x)    d(E-w) + d(D-w)    0

We will also return the one dimensional array with the leaves in the order in which they appear in the matrix (i.e. the column and/or row headers).

Parameters:

filename – the optional file to write to. If not provided, output will be to standard output

Returns:

two-dimensional array and a one dimensional array
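The path-set approach described above can be sketched in a few lines. The example uses the topology of the tree in this section (z is the root, with children y and x; y has leaves A and B; x has leaf C and child w; w has leaves D and E). The branch lengths are made-up numbers, and the dictionary representation is an assumption for illustration rather than the ETE implementation. Unlike the description above, the paths here leave out the root, which has no parent branch and would cancel in the XOR anyway.

```python
from itertools import combinations

# child -> (parent, branch length); branch lengths are invented for the example.
parent = {
    "y": ("z", 1.0), "x": ("z", 2.0), "w": ("x", 1.0),
    "A": ("y", 1.0), "B": ("y", 2.0),
    "C": ("x", 3.0), "D": ("w", 1.0), "E": ("w", 2.0),
}
leaves = ["A", "B", "C", "D", "E"]


def path_to_root(node):
    """Set of nodes on the path from node up to, but excluding, the root."""
    path = set()
    while node in parent:
        path.add(node)
        node = parent[node][0]
    return path


paths = {leaf: path_to_root(leaf) for leaf in leaves}
index = {leaf: i for i, leaf in enumerate(leaves)}
matrix = [[0.0] * len(leaves) for _ in leaves]

# Distances are symmetric with a zero diagonal, so unordered pairs suffice.
for a, b in combinations(leaves, 2):
    # XOR keeps the nodes whose parent branch lies on exactly one path.
    dist = sum(parent[node][1] for node in paths[a] ^ paths[b])
    matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = dist

print(matrix[index["A"]][index["E"]])  # 7.0
```

With these branch lengths, d(A, E) = {d(z,y) + d(y,A)} + {d(z,x) + d(x,w) + d(w,E)} = 2 + 5 = 7, matching the formula given earlier in this section.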

add_face(face, column, position='branch-right')#

Add a fixed face to the node. This type of face will always be attached to nodes, independently of the layout function.

Parameters:
  • face – a Face or inherited instance

  • column – An integer number starting from 0

  • position ("branch-right") – Possible values are: “branch-right”, “branch-top”, “branch-bottom”, “float”, “aligned”

set_style(node_style)#

Set ‘node_style’ as the fixed style for the current node.

static from_parent_child_table(parent_child_table)#

Converts a parent-child table into an ETE Tree instance.

Parameters:

parent_child_table – a list of tuples containing parent-child relationships. For example: [("A", "B", 0.1), ("A", "C", 0.2), ("C", "D", 1), ("C", "E", 1.5)], where each tuple represents (parent, child, child-parent-dist).

Returns:

A new Tree instance

Example:

>>> tree = Tree.from_parent_child_table([("A", "B", 0.1), ("A", "C", 0.2), ("C", "D", 1), ("C", "E", 1.5)])
>>> print(tree)
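To make the table format concrete, the sketch below turns the same parent-child table into a newick string using only the standard library. The root is identified as the one node that never appears as a child. Both functions are hypothetical helpers for illustration and do not build ETE Tree objects.

```python
from collections import defaultdict


def from_parent_child_table(table):
    """Return (root, children) where children maps parent -> [(child, dist)]."""
    children = defaultdict(list)
    all_children = set()
    for parent, child, dist in table:
        children[parent].append((child, dist))
        all_children.add(child)
    # The root is the only node that never appears as a child.
    roots = [p for p in children if p not in all_children]
    if len(roots) != 1:
        raise ValueError("table must describe a single rooted tree")
    return roots[0], dict(children)


def to_newick(node, children):
    """Serialize the subtree under node as a newick fragment with distances."""
    if node not in children:
        return node
    inner = ",".join(
        "%s:%s" % (to_newick(child, children), dist)
        for child, dist in children[node]
    )
    return "(%s)%s" % (inner, node)


table = [("A", "B", 0.1), ("A", "C", 0.2), ("C", "D", 1), ("C", "E", 1.5)]
root, children = from_parent_child_table(table)
print(to_newick(root, children) + ";")  # (B:0.1,(D:1,E:1.5)C:0.2)A;
```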
static from_skbio(skbio_tree, map_attributes=None)#

Converts a scikit-bio TreeNode object into ETE Tree object.

Parameters:
  • skbio_tree – a scikit bio TreeNode instance

  • map_attributes (None) – A list of attribute names in the scikit-bio tree that should be mapped into the ETE tree instance. (name, id and branch length are always mapped)

Returns:

A new Tree instance

Example:

>>> tree = Tree.from_skbio(skbioTree, map_attributes=["value"])