Add documentation about the internal workings.

This documentation is meant mainly for developers. It tries to explain the general structure of the trees and their visualization, how they are represented, who is in charge of what, what are the main code pathways, and so on.
etetoolkit · Jan 22, 2025 · 0aee76c · 0aee76c
1 parent 22bd8ed
commit 0aee76c

Showing 13 changed files with 419 additions and 0 deletions.
diff --git a/doc/images/node_id.png b/doc/images/node_id.png
diff --git a/doc/images/preorder.png b/doc/images/preorder.png
diff --git a/doc/images/size.png b/doc/images/size.png
diff --git a/doc/images/tree.png b/doc/images/tree.png
diff --git a/doc/images/tree_parts.png b/doc/images/tree_parts.png
diff --git a/doc/images/walk.png b/doc/images/walk.png
diff --git a/doc/index.rst b/doc/index.rst
@@ -13,6 +13,7 @@ Welcome to ETE's documentation!
    about
    tutorial/index
    reference/index
+   internals/index
    faqs
 
 

diff --git a/doc/internals/index.rst b/doc/internals/index.rst
@@ -0,0 +1,11 @@
+Internals
+=========
+
+.. toctree::
+   :maxdepth: 2
+
+   internals_overview
+   internals_essentials
+   internals_api
+   internals_detailed_layout
+   internals_drawing
diff --git a/doc/internals/internals_api.rst b/doc/internals/internals_api.rst
@@ -0,0 +1,55 @@
+API
+===
+
+The server exposes a `RESTful API
+<https://en.wikipedia.org/wiki/Representational_state_transfer#Applied_to_web_services>`_,
+with the following endpoints (defined in ``explorer.py``)::
+
+  GET:
+  /api  # get info about the api endpoints
+  /trees  # get info about all the existing trees
+  /trees/<name>  # get info about the given tree
+  /trees/<name>/draw  # get graphical commands to draw the tree
+  /trees/<name>/layouts  # get available layouts for the tree
+  /trees/<name>/style  # get tree style
+  /trees/<name>/newick  # get newick representation
+  /trees/<name>/size  # get inner width and height of the full tree
+  /trees/<name>/properties  # names of extra properties defined in any node
+  /trees/<name>/nodecount  # total count of nodes and leaves
+  /trees/<name>/search  # search for nodes
+
+  PUT:
+  /trees/<name>/sort  # sort branches
+  /trees/<name>/set_outgroup  # set node as outgroup (1st child of root)
+  /trees/<name>/move  # move branch
+  /trees/<name>/remove  # prune branch
+  /trees/<name>/rename  # change the name of a node
+  /trees/<name>/edit  # edit any node property
+  /trees/<name>/clear_searches  # clear stored searches (free server memory)
+  /trees/<name>/to_dendrogram  # convert tree to dendrogram (no distances)
+  /trees/<name>/to_ultrametric  # convert tree to ultrametric (equidistant leaves)
+
+  POST:
+  /trees  # add a new tree
+
+  DELETE:
+  /trees/<name>  # remove tree
+
+In addition to the api endpoints, the server has these other ones::
+
+  /  # redirects to /static/gui.html?tree=<name>
+  /help  # gives some pointers for using ete
+  /static/<path>  # all the static content, including javascript like gui.js
+
+The api can be directly queried with the browser (for some endpoints
+that accept a GET request), or with tools such as `curl
+<https://curl.se/>`_ or `httpie <https://httpie.io/>`_.
+
+The frontend uses those endpoints to draw and manipulate the trees. It
+works as a web application, which mainly translates the list of
+graphical commands coming from ``/trees/<name>/draw`` into svgs.
+
+It is possible to use the same backend and write a different frontend
+(as a desktop application, or in a different language, or using a
+different graphics library), while still taking advantage of all the
+optimizations done for the drawing.
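As a rough illustration of querying the api from a script instead of the browser, the following sketch uses only the Python standard library. The server address and the assumption that replies are json are not stated above; they are made only for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = 'http://localhost:5000'  # assumption: address where the server listens


def endpoint_url(path, base_url=BASE_URL):
    """Return the full url for an endpoint path like '/trees/<name>/size'."""
    # quote() escapes characters such as spaces in tree names, and keeps '/'.
    return base_url.rstrip('/') + quote(path)


def get_json(path, base_url=BASE_URL):
    """Send a GET request to the endpoint and decode the reply.

    Assumption: the endpoint answers with json.
    """
    with urlopen(endpoint_url(path, base_url)) as response:
        return json.loads(response.read().decode('utf-8'))
```

With a server running and a tree named ``tree-1`` loaded (a hypothetical name), ``get_json('/trees')`` and ``get_json('/trees/tree-1/size')`` would then send the corresponding GET requests.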
diff --git a/doc/internals/internals_detailed_layout.rst b/doc/internals/internals_detailed_layout.rst
@@ -0,0 +1,77 @@
+Detailed Layout
+===============
+
+There are several parts to the project.
+
+The module ``ete4`` has submodules to create trees (``core/tree.pyx``)
+and parse newicks (``parser/newick.pyx``) and other formats, do
+tree-related operations (``core/operations.pyx``), and more.
+
+The ``smartview`` module contains an http server based on `bottle
+<https://bottlepy.org/>`_ (in ``explorer.py``). It exposes through an
+api all the operations that can be done to manage and represent the
+trees, and also serves ``gui.html``, which shows a gui in the browser
+to explore the trees. That page uses the code in ``gui.js`` and all
+the other js modules that it imports from the same directory.
+
+It also serves an entry page with a short description and an easy way
+to upload new trees, ``upload.html`` (which uses ``upload.js``).
+
+There are tests for most of the python code in ``tests``. They can be
+run with pytest.
+
+A more complete layout with the relevant parts for tree exploration::
+
+  README.md
+  pyproject.toml  # build system (see PEP 518)
+  setup.py
+  ete4/  # the module directory
+    core/
+      tree.pyx  # the Tree class
+      operations.pyx  # tree-related operations
+      text_viz.py  # text visualization of trees
+    parser/
+      newick.pyx  # newick parser
+      nexus.py  # functions to handle trees in the nexus format
+      ete_format.py  # ete's own optimized format
+      indent.py  # parser for indented trees
+    smartview/
+      draw.py  # drawing classes and functions to represent a tree
+      explorer.py  # http server that exposes an api to interact with trees
+      layout.py
+      faces.py
+      coordinates.py
+      graphics.py
+      static/  # files served for the gui and uploading
+        gui.html  # entry point for the interactive visualization (html)
+        gui.css
+        upload.html  # landing page with the upload tree interface
+        upload.css
+        images/
+          icon.png
+          spritesheet.png
+          spritesheet.json
+        js/
+          gui.js  # entry point for the interactive visualization (js)
+          menu.js  # initialize the menus
+          draw.js  # call to the api to get drawing items and convert to svg
+          pixi.js
+          minimap.js  # handle the current tree view on the minimap
+          zoom.js
+          drag.js
+          download.js
+          contextmenu.js  # what happens when one right-clicks on the tree
+          events.js  # hotkeys, mouse events
+          search.js
+          collapse.js
+          label.js
+          tag.js
+          api.js  # handle calls to the server's api
+          upload.js  # upload trees to the server
+        external/  # where we keep a copy of external libraries
+          readme.md  # description of where to find them
+          tweakpane.min.js
+          sweetalert2.min.js
+          pixi.min.mjs
+  tests/  # tests for the existing functionality, to run with pytest
+  doc/  # documentation
diff --git a/doc/internals/internals_drawing.rst b/doc/internals/internals_drawing.rst
@@ -0,0 +1,102 @@
+Drawing
+=======
+
+This document explains the internal details of how ETE draws trees
+in the browser.
+
+The :mod:`smartview` module contains a :func:`draw` function that
+takes mainly a tree and a viewport, and can produce a list of
+graphical commands.
+
+
+Graphical Elements
+------------------
+
+The :func:`draw` function is a generator that yields graphical
+commands. They are mainly a description of basic graphic items, which
+are lists that look like::
+
+  ['hz-line', [0.0, 0.5], [8.0, 0.5], [], '']
+
+Each command starts with its name, followed by its parameters. (This
+is similar to creating a `DSL
+<https://en.wikipedia.org/wiki/Domain-specific_language>`_ of drawing
+functions, which are interpreted in the frontend.)
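To make the "name followed by parameters" structure concrete, here is a minimal sketch of how a consumer could dispatch such commands. It is written in Python for illustration only (the actual frontend is javascript). The handler and the meaning it gives to the parameters are assumptions, not the real interpretation.

```python
def split_command(command):
    """Return the name of a graphical command and the list of its parameters."""
    name, *params = command
    return name, params


def interpret(commands, handlers):
    """Call the handler registered for each command, passing its parameters."""
    results = []
    for command in commands:
        name, params = split_command(command)
        if name not in handlers:
            raise ValueError(f'unknown graphical command: {name}')
        results.append(handlers[name](*params))
    return results


def describe_hz_line(p1, p2, *rest):
    """Hypothetical handler: describe the line instead of drawing it.

    Assumption: the first two parameters are points. The remaining ones
    are left uninterpreted.
    """
    return f'line from {tuple(p1)} to {tuple(p2)}'


commands = [['hz-line', [0.0, 0.5], [8.0, 0.5], [], '']]
print(interpret(commands, {'hz-line': describe_hz_line}))
```

A dictionary of handlers keeps the interpreter independent of the graphics library, which is what allows different frontends to reuse the same list of commands.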
+
+
+gui.js
+------
+
+When the browser opens a tree, it loads the file
+``ete4/smartview/static/gui.html``. This is a simple web page, which
+loads ``js/gui.js`` to provide all the functionality.
+
+The code in ``gui.js`` loads many different modules to visualize and
+interact with trees. It contains an object ``view`` (with information
+on the current view of the tree) which is a sort of global variable
+repository. This is mainly used in the menus to expose and control its
+values.
+
+
+Drawing areas
+-------------
+
+The drawing areas are defined in ``gui.html``. They are::
+
+   div_tree
+   div_aligned
+   div_legend
+   div_minimap
+
+
+API calls
+---------
+
+When starting to browse a tree, the frontend makes the following
+:doc:`api calls <internals_api>`, in this order::
+
+  /trees
+
+  /trees/tree-1/layouts
+
+  /trees/tree-1/style?active=["basic","Example+layout"]
+
+  /trees/tree-1/size
+
+  /trees/tree-1/nodecount
+
+  /trees/tree-1/properties
+
+  /static/images/spritesheet.json
+
+  /static/images/spritesheet.png
+
+  /trees/tree-1/draw?shape=rectangular&node_height_min=30&content_height_min=4&zx=1.2&zy=362.7&x=-0.3&y=-0.1&w=3.3&h=3.3&collapsed_shape=skeleton&collapsed_ids=[]&layouts=["basic","Example+layout"]&labels=[]
+
+From that moment on, moving and zooming the tree typically triggers
+many similar draw calls, like::
+
+  /trees/tree-1/draw?shape=rectangular&node_height_min=30&content_height_min=4&zx=1.4[...]
+
+Other kinds of api calls are made when editing or changing the tree
+in different ways.
+
+
+Code paths
+~~~~~~~~~~
+
+The initial api calls come from the following places in the code:
+
+::
+
+  gui.js
+    main
+      init_trees  # /trees
+      populate_layouts  # /trees/tree-1/layouts
+      set_tree_style  # /trees/tree-1/style?active=["basic","Example+layout"]
+      set_consistent_values  # /trees/tree-1/size
+      store_node_count  # /trees/tree-1/nodecount
+      store_node_properties  # /trees/tree-1/properties
+      init_pixi  # /static/images/spritesheet.json /static/images/spritesheet.png
+      update  # (in draw.js)
+        draw_tree  # /trees/tree-1/draw?[...]
diff --git a/doc/internals/internals_essentials.rst b/doc/internals/internals_essentials.rst
@@ -0,0 +1,134 @@
+.. currentmodule:: ete4
+
+Essentials
+==========
+
+Tree Representation
+-------------------
+
+In general, a *tree* is `a set of linked nodes
+<https://en.wikipedia.org/wiki/Tree_(data_structure)>`_ that simulates
+a hierarchical `tree structure
+<https://en.wikipedia.org/wiki/Tree_structure>`_.
+
+The trees we are interested in are `phylogenetic trees
+<https://en.wikipedia.org/wiki/Phylogenetic_tree>`_, which show the
+relationships among various biological species. Every node can have a
+*name* and other *node properties*, and is connected to other nodes by
+branches with some *length* that measures in some form the
+evolutionary distance between the nodes, and other *branch
+properties*.
+
+.. image:: ../images/tree.png
+
+In the representation that we have chosen, a :class:`Tree` is a
+structure with some content (internally stored in a dictionary
+``props`` of properties) and a list of ``children``, which can be
+viewed as trees themselves. It can have a parent (``up``), which is
+the tree that has it as a child.
+
+.. image:: ../images/tree_parts.png
+
+(This representation is quite straightforward and is based on how
+phylogenetic trees are normally described as `newicks
+<https://en.wikipedia.org/wiki/Newick_format>`_. But it has some
+drawbacks: the concepts of *node* and *tree* are blurred together, the
+*branch* is considered somehow part of the node (which is possible
+since there is only one branch per node linking to its parent), and
+there is no clear distinction between *node properties* and *branch
+properties*. Other representations may be more appropriate.)
+
+
+Size
+----
+
+A :class:`Tree` also has a ``size``, which is a tuple ``(dx, dy)``
+formed by the distance to its furthest leaf (including its own length),
+and the total number of descendant leaves.
+
+This concept is exploited when drawing with different representations,
+as it will help discover when a node (including its descendants) is
+visible in the current viewport.
+
+We distinguish between this size, also called ``node_size``, and the
+size occupied only by the contents of the node itself, which we call
+``content_size``.
+
+.. image:: ../images/size.png
+
+Both sizes have the same ``dy``, since the number of descendant leaves
+is the same. They differ in their ``dx``, which is just the node's
+length (distance) for ``content_size``. Since we can easily compute
+``content_size``, it is not stored separately in the tree itself.
+
+Finally, there is another distance that becomes relevant for the
+description of the node when drawing: its *branch dy* (``bdy``). It is
+the distance from the top of the node to its branch, which is
+computed so that it lies halfway along the line that encompasses all
+of its children.
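The sizes described in this section can be sketched as follows. The ``Node`` class is a hypothetical stand-in (not the real :class:`Tree`), a leaf is assumed to count as one leaf with ``bdy`` of 0.5, and the line spanning the children is assumed to go from the branch of the first child to the branch of the last one.

```python
class Node:
    """Hypothetical minimal node: a branch length and a list of children."""

    def __init__(self, length=1.0, children=None):
        self.length = length
        self.children = children or []


def node_size(node):
    """Return (dx, dy): distance to the furthest leaf, and number of leaves."""
    if not node.children:
        return (node.length, 1)  # assumption: a leaf counts as one leaf
    sizes = [node_size(child) for child in node.children]
    dx = node.length + max(size[0] for size in sizes)  # includes own length
    dy = sum(size[1] for size in sizes)
    return (dx, dy)


def content_size(node):
    """Return the size of the node's own content: same dy, dx is its length."""
    return (node.length, node_size(node)[1])


def branch_dy(node):
    """Return the distance from the top of the node to its branch."""
    if not node.children:
        return 0.5  # assumption: the branch of a leaf is at its middle
    first, last = node.children[0], node.children[-1]
    top = branch_dy(first)  # position of the first child's branch
    bottom = node_size(node)[1] - node_size(last)[1] + branch_dy(last)
    return (top + bottom) / 2  # halfway along the line spanning the children


tree = Node(1.0, [Node(2.0), Node(0.5, [Node(1.0), Node(3.0)])])
print(node_size(tree), content_size(tree), branch_dy(tree))
```

This sketch recomputes sizes recursively on every call for clarity. Storing ``size`` in each node, as the text describes, avoids that repeated work.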
+
+
+Traversal
+---------
+
+We can traverse all the nodes of a tree by using the tree's
+``traverse`` method to get an iterator.
+
+The following code would visit the different nodes in preorder::
+
+  for node in tree.traverse():
+      ... # do things with node
+
+.. image:: ../images/preorder.png
+
+There is a more versatile way of traversing the tree, which is very
+useful for drawing: ``walk(tree)``. Its name comes from the standard
+library function ``os.walk``, which traverses a directory tree.
+
+It visits all internal nodes twice: when they appear first in
+preorder, and after having visited all their descendants. It provides
+an iterator ``it`` that can be manipulated to stop visiting the
+descendants of a node (by setting ``it.descend = False``). The
+iterator also provides the current node (``it.node``), says if it is
+the first time that the node is being visited (``it.first_visit``),
+and can give an id that identifies the position of the node within the
+original tree (``it.node_id``).
+
+::
+
+  from ete4.core import operations as ops
+
+  for it in ops.walk(tree):
+      ... # do things with  it.node
+      ... # possibly depending on  it.first_visit
+      ... # maybe using  it.node_id  too
+      ... # and maybe set  it.descend = False  to skip its children
+
+.. image:: ../images/walk.png
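The visiting order of ``walk`` can be reproduced with a small recursive generator. This is only a sketch of the behavior described above, not the actual implementation in ``operations.pyx``. The ``Node`` and ``Visit`` classes are hypothetical, and the sketch assumes that a node whose descendants are skipped is not visited a second time.

```python
class Node:
    """Hypothetical minimal node with a name and a list of children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


class Visit:
    """State handed out while walking (stand-in for the iterator ``it``)."""

    def __init__(self):
        self.node = None
        self.node_id = ()
        self.first_visit = True
        self.descend = True


def walk(tree):
    """Yield the same Visit object, updated for every visit."""
    yield from _walk(tree, (), Visit())


def _walk(node, node_id, it):
    it.node, it.node_id, it.first_visit, it.descend = node, node_id, True, True
    yield it  # first visit (preorder)
    if not node.children or not it.descend:
        return  # leaf, or the caller asked to skip the descendants
    for i, child in enumerate(node.children):
        yield from _walk(child, node_id + (i,), it)
    it.node, it.node_id, it.first_visit = node, node_id, False
    yield it  # second visit, after all the descendants


tree = Node('root', [Node('a'), Node('b', [Node('c'), Node('d')])])

for it in walk(tree):
    print(it.node.name, it.node_id, it.first_visit)
```

Reusing a single ``Visit`` object means that callers should read its attributes during the loop rather than keep references to it for later.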
+
+
+node_id
+~~~~~~~
+
+The ``node_id`` is a tuple that looks like ``(0, 1, 0)`` for the node
+that comes from the root's 1st child, then its 2nd child, and then its
+1st child.
+
+A tree can be indexed like a list to access one of its nodes directly.
+The syntax ``tree[name]``, where ``name`` is a string, will return the
+first node whose name matches the given one. And ``tree[node_id]``,
+where ``node_id`` is a tuple as described before, will return the
+corresponding node at that position.
+
+.. image:: ../images/node_id.png
+
+The syntax is composable, in the sense::
+
+  tree[0,1,0] == tree[0][1,0] \
+              == tree[0,1][0] \
+              == tree[0][1][0] \
+              == tree[()][0][1][0]
+
+This simplifies working with subtrees, since they can be treated as
+independent trees and are easily recovered from the original tree at
+any moment.
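A minimal sketch of how this kind of indexing can be supported with ``__getitem__`` follows. The class below is a hypothetical stand-in written for illustration, not the code of the real :class:`Tree`. In it, an integer is treated as a tuple with one element, and the empty tuple returns the tree itself.

```python
class Tree:
    """Hypothetical minimal tree with a name and a list of children."""

    def __init__(self, name='', children=None):
        self.name = name
        self.children = children or []

    def traverse(self):
        """Yield this node and all its descendants in preorder."""
        yield self
        for child in self.children:
            yield from child.traverse()

    def __getitem__(self, key):
        if isinstance(key, str):  # first node with the given name
            return next(n for n in self.traverse() if n.name == key)
        if isinstance(key, int):  # a single child index
            key = (key,)
        node = self
        for i in key:  # follow the path of child indices
            node = node.children[i]
        return node


tree = Tree('root', [Tree('a', [Tree('b'), Tree('c', [Tree('d')])]), Tree('e')])

print(tree[0, 1, 0].name)
print(tree[0, 1, 0] is tree[0][1, 0] is tree[0, 1][0] is tree[0][1][0])
```

Because every step returns a node that is itself a tree, the composed forms all end at the same object.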