Phylogenetics
Adapted from Wikipedia
Phylogenetics is the study of the evolutionary history of life. It looks at things like DNA, proteins, and physical features to figure out how different living things are related. Scientists use this information to draw diagrams called phylogenetic trees, which show how species might have evolved from common ancestors.
These trees have points, called nodes and tips, that represent different animals, plants, or even fossils. Some trees (rooted trees) show a common ancestor, while others (unrooted trees) just show relationships without indicating where they started.
Phylogenetics helps us understand biodiversity, evolution, and how species interact with their environment. It is also important in areas like cancer research, where it helps track how tumors change over time. In drug discovery, it helps scientists find useful traits in different species, such as compounds in animal venoms that can be turned into medicines.
Phylogenetics is even used in forensic science to analyze DNA evidence. For example, it helps track differences in HIV genes to see how the virus spreads between people. However, it cannot prove who gave the virus to whom, only how related different samples are.
Taxonomy and classification
Main article: Taxonomy
Taxonomy is the science of naming and grouping living things. Long ago, a scientist named Carolus Linnaeus created a system to classify organisms based on their physical features. Today, scientists often use DNA to help classify living things, in addition to looking at their physical traits.
There are different ways scientists group organisms. Some focus just on how similar organisms look, while others try to show how they are related through evolution. Each method helps us understand the connections between different types of life on Earth.
Inference of a phylogenetic tree
Main article: Computational phylogenetics
Scientists use special computer programs to figure out how different plants and animals are related. They look at things like the code inside their cells and the shapes of their bodies. These programs help them decide which creatures share a common ancestor by using rules to find the best matches.
A long time ago, before 1950, scientists would just tell stories about how animals might be related. But now, they have better ways to make sure their ideas are right. One old way was to look at how similar creatures looked on the outside, but this method isn’t used much anymore. Instead, scientists now use DNA and other data to build family trees that show how all living things are connected.
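One of the "rules" such programs can use is parsimony: prefer the tree that needs the fewest changes to explain the data. The sketch below is only an illustration, not how any particular program works. It counts changes with the Fitch method on trees written as nested tuples. The four organisms, their three-letter sequences, and the function names are all made up for this example.

```python
def fitch_sets(tree, states):
    """Return (possible ancestral states, number of changes) for one character.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    `states` maps each leaf name to its observed state, e.g. a DNA letter.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_sets(left, states)
    right_set, right_changes = fitch_sets(right, states)
    shared = left_set & right_set
    if shared:
        # The two branches agree on at least one state: no extra change.
        return shared, left_changes + right_changes
    # The branches disagree: keep all candidates and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_sets(tree, column)[1]
    return total


# Toy data (invented): four organisms, three DNA positions each.
toy_alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CTT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, toy_alignment))  # 3 changes
print(parsimony_length(tree_2, toy_alignment))  # 5 changes
```

Under the parsimony rule, the first tree would be preferred for this toy data because it needs fewer changes. Real programs also have to search through many possible trees, which this sketch does not attempt.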
Impacts of taxon sampling
In phylogenetic analysis, scientists pick a smaller group of organisms to study the evolutionary history of a larger group. This is called stratified sampling or clade-based sampling. Choosing the right organisms to study is very important because it can be hard to look at every single species, and computers can only handle so much data at once. If the wrong organisms are chosen, the results might not be correct.
There is some debate about whether it is better to study more organisms or to look at more details from each organism. Studying more details from each organism often gives better results. A graphic shows that, in most cases, looking at more details from fewer organisms gives more accurate results than looking at fewer details from more organisms. This was tested using different methods to build family trees of life.
History
The word "phylogeny" comes from a German word used by Haeckel in 1866. Scientists have studied how living things are related for a long time, going back to Aristotle.
A big idea in the past was Ernst Haeckel's theory that a baby's growth follows the steps of its ancestors' lives. But scientists later found this was not true. They learned that while babies do share traits with their ancestors, you can't always see the whole history just by watching how a baby grows.
Many important ideas helped shape our understanding of how species are related:
- In the 1300s, a principle said we should look for the simplest answers.
- In 1763, a way to measure how likely something is was published.
- In 1809, a French scientist talked about how species change over time.
- In 1837, Charles Darwin sketched an evolutionary tree in his notebook, showing how species might be related.
- In 1840, an American scientist made one of the earliest "Tree of Life" drawings.
Many other scientists added new ideas and methods over the years to help us understand and show these relationships better.
- 14th century, lex parsimoniae (parsimony principle), William of Ockham, English philosopher, theologian, and Franciscan friar, although the idea goes back to Aristotle as a precursor concept. He introduced the concept of Occam's razor, the problem-solving principle that recommends searching for explanations constructed with the smallest possible set of elements. Though he did not use these exact words, the principle can be summarized as "Entities must not be multiplied beyond necessity." The principle advocates that, when presented with competing hypotheses about the same prediction, one should prefer the one that requires the fewest assumptions.
- 1763, Bayesian probability, Rev. Thomas Bayes, precursor concept. Bayesian probability began a resurgence in the 1950s, allowing scientists in the computing field to pair traditional Bayesian statistics with other, more modern techniques. It is now used as a blanket term for several related interpretations of probability as an amount of epistemic confidence.
- 18th century, Pierre Simon (Marquis de Laplace), perhaps the first to use ML (maximum likelihood), precursor concept. His work gave rise to the Laplace distribution, which can be directly linked to least absolute deviations.
- 1809, evolutionary theory, Philosophie Zoologique, Jean-Baptiste de Lamarck, precursor concept. It was foreshadowed in the 17th and 18th centuries by Voltaire, Descartes, and Leibniz. Leibniz even proposed evolutionary changes to account for observed gaps, suggesting that many species had become extinct, others had transformed, and different species that share common traits may at one time have been a single race. It was also foreshadowed by some early Greek philosophers, such as Anaximander in the 6th century BC and the atomists of the 5th century BC, who proposed rudimentary theories of evolution.
- 1837, Darwin's notebooks show an evolutionary tree.
- 1840, American geologist Edward Hitchcock published what is considered to be the first paleontological "Tree of Life". Many critiques, modifications, and explanations would follow.
- 1843, distinction between homology and analogy (the latter now referred to as homoplasy), Richard Owen, precursor concept. Homology is the term used to characterize the similarity of features that can be parsimoniously explained by common ancestry. Homoplasy is the term used to describe a feature that has been gained or lost independently in separate lineages over the course of evolution.
- 1858, paleontologist Heinrich Georg Bronn (1800–1862) published a hypothetical tree illustrating the paleontological "arrival" of new, similar species following the extinction of an older species. Bronn did not propose a mechanism responsible for such phenomena. Precursor concept.
- 1858, elaboration of evolutionary theory, Darwin and Wallace, also in Origin of Species by Darwin the following year, precursor concept.
- 1866, Ernst Haeckel first publishes his phylogeny-based evolutionary tree, precursor concept. Haeckel introduces the now-disproved recapitulation theory. He introduced the term "Cladus" as a taxonomic category just below subphylum.
- 1893, Dollo's Law of Character State Irreversibility, precursor concept. The law states that "an organism never comes back exactly to its previous state due to the indestructible nature of the past, it always retains some trace of the transitional stages through which it has passed."
- 1912, ML (maximum likelihood) recommended, analyzed, and popularized by Ronald Fisher, precursor concept. Fisher is one of the main contributors to the early 20th-century revival of Darwinism, and has been called the "greatest of Darwin's successors" for his contributions to the revision of the theory of evolution and his use of mathematics to combine Mendelian genetics and natural selection in the 20th-century "modern synthesis".
- 1921, Tillyard uses the term "phylogenetic" and distinguishes between archaic and specialized characters in his classification system.
- 1940, Lucien Cuénot coined the term "clade": "terme nouveau de clade (du grec κλάδος, branche) [A new term clade (from the Greek word klados, meaning branch)]". He used it for evolutionary branching.
- 1947, Bernhard Rensch introduced the term Kladogenesis in his German book Neuere Probleme der Abstammungslehre: Die transspezifische Evolution, translated into English in 1959 as Evolution Above the Species Level (still using the same spelling).
- 1949, Jackknife resampling, Maurice Quenouille (foreshadowed in '46 by Mahalanobis and extended in '58 by Tukey), precursor concept.
- 1950, Willi Hennig's classic formalization. Hennig is considered the founder of phylogenetic systematics, and published his first works in German in this year. He also asserted a version of the parsimony principle, stating that the presence of apomorphous characters in different species 'is always reason for suspecting kinship, and that their origin by convergence should not be presumed a priori'. This has been considered a foundational view of phylogenetic inference.
- 1952, William Wagner's ground plan divergence method.
- 1957, Julian Huxley adopted Rensch's terminology as "cladogenesis" with a full definition: "Cladogenesis I have taken over directly from Rensch, to denote all splitting, from subspeciation through adaptive radiation to the divergence of phyla and kingdoms." With it he introduced the word "clades", defining it as: "Cladogenesis results in the formation of delimitable monophyletic units, which may be called clades."
- 1960, Arthur Cain and Geoffrey Ainsworth Harrison coined "cladistic" to mean evolutionary relationship.
- 1963, first attempt to use ML (maximum likelihood) for phylogenetics, Edwards and Cavalli-Sforza.
- 1965
  - Camin-Sokal parsimony, first parsimony (optimization) criterion and first computer program/algorithm for cladistic analysis, both by Camin and Sokal.
  - Character compatibility method, also called clique analysis, introduced independently by Camin and Sokal (loc. cit.) and E. O. Wilson.
- 1966
  - English translation of Hennig.
  - "Cladistics" and "cladogram" coined (Webster's, loc. cit.).
- 1969
  - Dynamic and successive weighting, James Farris.
  - Wagner parsimony, Kluge and Farris.
  - CI (consistency index), Kluge and Farris.
  - Introduction of pairwise compatibility for clique analysis, Le Quesne.
- 1970, Wagner parsimony generalized by Farris.
- 1971
  - First successful application of ML (maximum likelihood) to phylogenetics (for protein sequences), Neyman.
  - Fitch parsimony, Walter M. Fitch. These gave way to the most basic ideas of maximum parsimony. Fitch is known for his work on reconstructing phylogenetic trees from protein and DNA sequences. His definition of orthologous sequences has been referenced in many research publications.
  - NNI (nearest neighbour interchange), first branch-swapping search strategy, developed independently by Robinson and Moore et al.
  - ME (minimum evolution), Kidd and Sgaramella-Zonta (it is unclear whether this is the pairwise distance method or related to ML, as Edwards and Cavalli-Sforza call ML "minimum evolution").
- 1972, Adams consensus, Adams.
- 1976, prefix system for ranks, Farris.
- 1977, Dollo parsimony, Farris.
- 1979
  - Nelson consensus, Nelson.
  - MAST (maximum agreement subtree), also called GAS (greatest agreement subtree), a consensus method, Gordon.
  - Bootstrap, Bradley Efron, precursor concept.
- 1980, PHYLIP, first software package for phylogenetic analysis, Joseph Felsenstein. PHYLIP is a free package of programs for inferring evolutionary trees (phylogenies). One of its programs, Drawgram, draws rooted trees. The figure below shows the evolution of phylogenetic trees over time.
- 1981
  - Majority consensus, Margush and MacMorris.
  - Strict consensus, Sokal and Rohlf.
  - First computationally efficient ML (maximum likelihood) algorithm, Felsenstein. Felsenstein created the Felsenstein maximum likelihood method for the inference of phylogeny, which evaluates a hypothesis about evolutionary history in terms of the probability that the proposed model and the hypothesized history would give rise to the observed data set.
- 1982
  - PHYSIS, Mikevich and Farris.
  - Branch and bound, Hendy and Penny.
- 1985
  - First cladistic analysis of eukaryotes based on combined phenotypic and genotypic evidence, Diana Lipscomb.
  - First issue of Cladistics.
  - First phylogenetic application of bootstrap, Felsenstein.
  - First phylogenetic application of jackknife, Scott Lanyon.
- 1986, MacClade, Maddison and Maddison.
- 1987, neighbor-joining method, Saitou and Nei.
- 1988
  - Hennig86 (version 1.5), Farris.
  - Bremer support (decay index), Bremer.
- 1989
  - RI (retention index), RCI (rescaled consistency index), Farris.
  - HER (homoplasy excess ratio), Archie.
- 1990
  - Combinable components (semi-strict) consensus, Bremer.
  - SPR (subtree pruning and regrafting), TBR (tree bisection and reconnection), Swofford and Olsen.
- 1991
  - DDI (data decisiveness index), Goloboff.
  - First cladistic analysis of eukaryotes based only on phenotypic evidence, Lipscomb.
- 1993, implied weighting, Goloboff.
- 1994, reduced consensus: RCC (reduced cladistic consensus) for rooted trees, Wilkinson.
- 1995, reduced consensus: RPC (reduced partition consensus) for unrooted trees, Wilkinson.
- 1996, first working methods for BI (Bayesian inference), independently developed by Li, by Mau, and by Rannala and Yang, all using MCMC (Markov chain Monte Carlo).
- 1998, TNT (Tree Analysis Using New Technology), Goloboff, Farris, and Nixon.
- 1999, Winclada, Nixon.
- 2003, symmetrical resampling, Goloboff.
- 2004, 2005, similarity metric (using an approximation to Kolmogorov complexity) or NCD (normalized compression distance), Li et al., Cilibrasi and Vitanyi.
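The last entry in the timeline, normalized compression distance, is simple enough to try out. The idea is that two similar sequences compress better together than two unrelated ones. The sketch below assumes zlib as a stand-in compressor (the choice of compressor is an assumption of this example) and uses invented DNA-like strings. The helper names `toy_sequence` and `mutate_every` are hypothetical.

```python
import random
import zlib

BASES = "ACGT"


def toy_sequence(length, seed):
    """Hypothetical helper: a reproducible pseudo-random DNA-like string."""
    rng = random.Random(seed)
    return "".join(rng.choice(BASES) for _ in range(length))


def mutate_every(seq, step):
    """Hypothetical helper: change every `step`-th letter to the next base."""
    chars = list(seq)
    for i in range(0, len(chars), step):
        chars[i] = BASES[(BASES.index(chars[i]) + 1) % 4]
    return "".join(chars)


def compressed_size(text):
    """Size in bytes after zlib compression (an approximation of complexity)."""
    return len(zlib.compress(text.encode("ascii"), 9))


def ncd(x, y):
    """Normalized compression distance: near 0 for similar, near 1 for unrelated."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


ancestor = toy_sequence(2000, seed=1)
close_relative = mutate_every(ancestor, 50)   # differs at 1 letter in 50
unrelated = toy_sequence(2000, seed=2)

print("close relative:", round(ncd(ancestor, close_relative), 2))
print("unrelated:     ", round(ncd(ancestor, unrelated), 2))
```

The exact numbers depend on the compressor, but the close relative should come out with a clearly smaller distance than the unrelated sequence.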
Uses of phylogenetic analysis
Pharmacology
Phylogenetic analysis helps scientists study groups of related organisms. With better computer programs and molecular techniques, researchers can now identify species that might contain useful medicines. For example, they studied plants in the Apocynaceae family, like Catharanthus, which produces vincristine, a drug used to treat certain cancers. Modern methods let scientists look at close relatives of known species to find more of these helpful compounds.
Biodiversity
Phylogenetic analysis is also used to study biodiversity, especially in fungi. It helps scientists understand how different species are related and predict how they might change in the future. New imaging and analysis techniques help discover more genetic links, which can guide efforts to protect rare species and support healthy ecosystems around the world.
Infectious disease epidemiology
Data from the full genetic makeup of pathogens during disease outbreaks can give clues about how infections spread and help plan public health responses. Recent studies look at genetic data alone to understand transmission patterns using phylodynamics, which examines the properties of pathogen family trees. Coalescent theory, which studies probabilities in these trees, has also been used for epidemiology. The way these trees are shaped can show different spread patterns, helping scientists understand how diseases move through populations.
The connections between hosts during outbreaks affect how diseases spread, and understanding these patterns helps with management. Pathogens spreading through different contact networks, like chains or networks with super-spreaders, show different mutation patterns in their family trees. Researchers have studied these tree shapes to classify outbreak types. These properties help create tools to analyze real outbreaks, and predictions often match known data.
Different spread networks lead to different tree shapes. Researchers simulated bacterial genome evolution across three outbreak types—homogeneous, super-spreading, and chain-like—and measured five tree shape metrics. Figures show clear differences in tree topology based on the host contact network.
Super-spreader networks create trees with higher imbalance, longer ladder patterns, lower Δw (the largest difference in width between neighboring levels of the tree), and deeper trees compared to homogeneous networks. Trees from chain-like networks are less variable, deeper, more imbalanced, and narrower.
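Two of the tree-shape ideas above, imbalance and depth, can be measured on toy trees. The sketch below writes trees as nested tuples and uses the Colless index (the sum, over all branching points, of the difference in leaf counts between the two sides) as the imbalance measure. Choosing that particular index, and the function names, are assumptions of this example.

```python
def leaves_and_colless(tree):
    """Return (number of leaves, Colless imbalance) for a nested-tuple tree."""
    if isinstance(tree, str):
        return 1, 0
    left, right = tree
    n_left, c_left = leaves_and_colless(left)
    n_right, c_right = leaves_and_colless(right)
    # Each branching point adds the difference between its two sides.
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def depth(tree):
    """Number of branching steps from the root to the deepest leaf."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + max(depth(left), depth(right))


balanced = (("a", "b"), ("c", "d"))   # splits evenly at every step
ladder = ((("a", "b"), "c"), "d")     # one lineage keeps branching off

for name, tree in [("balanced", balanced), ("ladder", ladder)]:
    _, colless = leaves_and_colless(tree)
    print(name, "imbalance:", colless, "depth:", depth(tree))
# balanced imbalance: 0 depth: 2
# ladder imbalance: 3 depth: 3
```

The ladder-shaped tree is both deeper and more imbalanced than the balanced one, which is the kind of difference the outbreak studies look for.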
Scatter plots can show relationships between variables like the number of infected people and time since infection, helping identify trends. Box plots can display data ranges and help spot important features in transmission patterns.
Linguistic and cultural phylogenetics
Overview
Phylogenetic methods are also used in fields outside biology, like language and culture. Some researchers think languages and cultural forms may change in similar ways to biological species. By studying these changes, scientists can build family trees or networks to show how they evolved from common ancestors.
Languages
Languages work well for phylogenetic analysis. Just like biological species, languages share similarities because they come from a common ancestor. When groups of people separate, their languages change, forming new ones. Languages that split recently look more alike than those that split long ago. Words in different languages that descend from the same ancestral word are called cognates, and they help scientists build datasets to study language family trees. Languages with more shared cognates are usually more closely related.
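A cognate dataset can be turned into distances between languages. In the sketch below, each language records, for each meaning, which cognate group its word belongs to. The distance is the share of meanings where two languages use words from different groups. The three languages, the group numbers, and the function name are invented for illustration and do not describe real languages.

```python
def cognate_distance(lang_a, lang_b):
    """Fraction of shared meanings whose words are NOT cognate (0 = all cognate)."""
    shared_meanings = [m for m in lang_a if m in lang_b]
    if not shared_meanings:
        raise ValueError("the two languages have no meanings in common")
    different = sum(1 for m in shared_meanings if lang_a[m] != lang_b[m])
    return different / len(shared_meanings)


# Invented data: the number says which cognate group a word falls into.
languages = {
    "LangA": {"water": 1, "fire": 1, "hand": 1, "stone": 1},
    "LangB": {"water": 1, "fire": 1, "hand": 1, "stone": 2},
    "LangC": {"water": 2, "fire": 3, "hand": 1, "stone": 3},
}

names = sorted(languages)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = cognate_distance(languages[first], languages[second])
        print(first, second, d)
# LangA LangB 0.25
# LangA LangB are the closest pair; LangC is 0.75 from both.
```

Distances like these can then be passed to a tree-building method. Real studies use far more meanings and must first decide which words count as cognates, which takes expert judgment.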
Phylogenetic methods are part of quantitative comparative linguistics. Historians, archaeologists, and anthropologists use these models to study past populations. Language changes can hint at when and where people moved. Family trees can be dated using known history to estimate when older splits happened and where people traveled. This has been done for language families like Indo-European, African Bantu, Austronesian, and Australian Pama-Nyungan.
Culture
Cultural forms may also change in ways similar to biological species, and phylogenetic methods help study their history. Examples include manuscripts, stories, rituals, and objects. In archaeology, these methods have been used to study stone tools and ceramics.
Phylogenetic methods can test ideas about cultural adaptation by using language family trees to remove the effect of shared history. Examples include studies of family structures, rituals, and political systems.
Issues and criticisms
Cultural and language changes aren’t always passed down directly; sometimes they mix sideways, like borrowing words from neighbors. Linguists focus on words and grammar that change less often. Mixing between cultures is common, and repeated contact can make cultures look similar. Some traits may develop separately in different places, leading to wrong ideas about shared history. Some cultures change faster than others, which can also confuse family trees. Creating datasets for these studies needs skill, and modern computer methods help show uncertainty in the results.
Methods and models
Bayesian phylogenetic methods use probability to find the family trees that fit the data best. Tools like BEAST and MrBayes are used in biology, language, and culture studies because they can model different change scenarios and show uncertainty. Another approach measures distances between groups and uses algorithms like neighbor joining to show possible histories. NeighborNet is one method used to study mixed inheritance in both biology and culture.
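The distance approach can be sketched in a few lines. The code below is a minimal version of neighbor joining that returns only the grouping (no branch lengths). It repeatedly joins the pair of groups that the method's score marks as closest neighbors. The five labels and the distance table are toy values chosen for illustration, and the function names are this example's own.

```python
def neighbor_joining(names, matrix):
    """Return a nested-tuple grouping built by neighbor joining (topology only)."""
    nodes = list(names)
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                # Score: small when a and b are close to each other
                # but far from everything else.
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)
        d[new] = {}
        for c in nodes:
            if c != a and c != b:
                dist = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[new][c] = dist
                d[c][new] = dist
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)


def clades(tree):
    """Collect the set of leaf groups found anywhere inside the tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    found, members = set(), frozenset()
    for child in tree:
        child_members, child_found = clades(child)
        members |= child_members
        found |= child_found
    found.add(members)
    return members, found


labels = ["a", "b", "c", "d", "e"]
toy_distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, toy_distances))
```

For this toy table the method groups a with b and d with e. Full implementations also report branch lengths, and Bayesian tools such as BEAST and MrBayes work quite differently, sampling many trees rather than building a single one.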
Main article: Comparative phylogenetic methods
This article is a child-friendly adaptation of the Wikipedia article on Phylogenetics, available under CC BY-SA 4.0.
Images from Wikimedia Commons.