hypertree


Tree Class Documentation

The Tree class is designed to manage and manipulate phylogenetic trees using the treeswift package, while offering additional utilities for embedding, logging, and tree operations. It provides methods to load trees from files, compute distances, normalize branch lengths, and embed trees in Euclidean or hyperbolic spaces.

This class is particularly useful for handling large-scale tree data, embedding them into various geometric spaces, and generating visualizations, making it an essential tool for researchers working in phylogenetics or comparative genomics.

Usage

The Tree class supports two modes of initialization:

  1. From a file path (to load a tree).

  2. From an existing treeswift.Tree object with a custom name.

Example: Initialize from a file path

from your_package import tree_collections

tree = tree_collections.Tree('path/to/treefile.nwk’)

Example: Initialize with a name and treeswift.Tree object

import treeswift as ts

t = ts.read_tree_newick('path/to/treefile.nwk’)

tree = tree_collections.Tree('Tree_Name’, t)

Logging

The class supports logging for debugging and tracing execution. By setting enable_logging=True, a log file will be created in the specified directory (default: as configured in your package’s config)

Example: Enable logging when initializing the tree

tree = tree_collections.Tree('path/to/treefile.nwk’, enable_logging=True)

Distance Matrix and Tree Operations

Once a tree is loaded, the Tree class provides functions to compute key properties like the distance matrix, diameter, and terminal node names.

  1. distance_matrix(): Computes the pairwise distance matrix between terminal nodes.

  2. diameter(): Computes the diameter of the tree, defined as the longest path between any two terminal nodes.

  3. terminal_names(): Returns a list of terminal node names.

Example: Compute distance matrix and normalize the tree

terminals = tree.terminal_names()

distance_matrix = tree.distance_matrix()

diameter = tree.diameter()

Tree Normalization

The normalize() method scales the branch lengths of the tree such that the maximum distance (tree diameter) is set to 1. This is particularly useful before performing embeddings, to ensure consistent scaling across different trees.

Example: Normalize tree to have a diameter of 1

tree.normalize()

Embedding

The embed() method allows embedding the phylogenetic tree into either Euclidean or hyperbolic geometry using different dimensions. This is useful for visualizing or analyzing trees in a geometric space. This method has several optional parameters that allow customization of the embedding process. Below is a detailed explanation of each parameter.

  • dimension (int) [Required]: Defines the number of dimensions for the embedding space. For example, dimension=2 creates a 2D embedding. Example: A 2D hyperbolic embedding is common for visualization purposes, while higher dimensions (e.g., dimension=3) might be used for more complex analyses.

  • geometry (str) [Default: 'hyperbolic’]: Specifies the geometric space to use for embedding.

    • 'hyperbolic’: Embeds the tree in hyperbolic space, which is often suitable for representing hierarchical or tree-like structures.

    • 'euclidean’: Embeds the tree in Euclidean space, more appropriate for flat, linear structures.

  • accurate (bool) [Default: False]: Determines whether to use a more accurate and optimized embedding method. The False method is quicker but less precise, while the True method uses an advanced, optimization-based approach for more accurate results.

  • total_epochs (int) [Default: 2000]: The number of epochs used for optimization during the embedding process. Relevant when accurate=True is selected. More epochs typically yield more accurate results at the cost of increased computation time.

  • max_diameter (float) [Default: 10.0]: The maximum diameter used to scale the distance matrix before embedding. This helps to ensure consistent scaling across different trees, which is especially useful for comparing multiple trees embedded in the same space.

  • initial_lr (float) [Default: 0.1]: The initial learning rate for the optimizer during the embedding process. This value controls how quickly the optimization process converges. A higher learning rate can speed up convergence but may overshoot optimal values, while a lower learning rate ensures a smoother, more gradual approach.

  • enable_save (bool) [Default: True]: Whether to save intermediate states during the embedding process. This can be useful for tracking progress or debugging, especially when using long running optimizations with accurate = True.

  • enable_movie (bool) [Default: True]: Whether to generate a visual representation (movie) of the embedding process. This can be helpful for understanding the evolution of the embedding as it progresses through optimization steps.

  • learning_rate (float, optional) [Default: None]: Specifies a fixed learning rate for the optimization process. If not provided, an adaptive learning rate is used by default. This allows more control over the optimization process.

  • scale_learning (callable, optional) [Default: None]: A custom function to dynamically adjust the scale during the optimization process. This provides a way to influence how distances are scaled throughout the optimization steps.

Example

def default_scale_learning(epoch: int, total_epochs: int, loss_list = List[float]) -> bool:

if total_epochs <= 1:

raise ValueError(“Total epochs must be greater than 1.”)

return epoch < int(0.5 * total_epochs)

Example: Embed the tree into a 3D Euclidean space

embedding = tree.embed(dimension=3, geometry='euclidean’) tree.embed(dimension=2, accurate=True)

Saving and Copying Trees

The class provides utility functions to save and copy trees:

save_tree(): Save the tree to a file in Newick format. copy(): Create a deep copy of the tree object, useful for making modifications without affecting the original tree.

Example: Save and copy a Tree

tree.save_tree(“/path/to/save/treefile.nwk”)

tree_copy = tree.copy()

Visualization

Example Workflow

tree = tree_collections.Tree('path/to/treefile.nwk’) tree.normalize() embedding = tree.embed(dimension=2, geometry='hyperbolic’)

>>> print 'Interactive Python code.’

from your_package import tree_collections

tree = tree_collections.Tree('path/to/treefile.nwk’)

import treeswift as ts

t = ts.read_tree_newick('path/to/treefile.nwk’) tree = tree_collections.Tree('Tree_Name’, t, enable_logging=True)