by Javier Ortega-Hernández *1
The principle of parsimony, also known as Occam’s razor, has been widely attributed to the English Franciscan friar William of Occam (c. 1288–1348). It states Pluralitas non est ponenda sine necessitate, which translates to ‘Plurality is not to be assumed without necessity’. In other words, when one is faced with a problem or question that can have several different answers, the solution that requires the fewest assumptions is most likely to be correct, unless there is evidence that proves that it is false. Parsimony has an enduring influence in most scientific activities, as it allows researchers to make comparisons and choose between different hypotheses that aim to explain a phenomenon using the same body of evidence. The incomplete nature of the rock record and the loss of information associated with the fossilization of biological remains make palaeobiology a heavily interpretative science. Because of this uncertainty, parsimony has an important role in testing the conclusions drawn from the fossil record, and encourages consideration of alternative scenarios when applicable.
This article provides a brief overview of the application of parsimony, particularly in the field of evolutionary biology, and uses examples from the scientific literature to illustrate some of the more abstract aspects of this principle.
The simplicity criterion:
The simplicity criterion is the most widespread aspect of parsimony in research and everyday life; in the latter context, this criterion is more or less equivalent to ‘common sense’, in that it will lead to the least complicated solution required by the evidence, at least in most cases. It is particularly useful for scientists when they are interpreting the available data, because it serves as a methodological guideline for providing the best explanation possible without the need to invoke excessively complicated scenarios.
A common misconception about the application of parsimony is that the simplest explanation is likely to be the correct one. In fact, there is a difference between providing the simplest explanation, which is problematic because there is no standard definition of ‘simple’, and proposing an explanation that best fits the observed data and requires the fewest possible assumptions.
A good example of this can be found by taking a glimpse at some of the research into the cause of the Cretaceous–Palaeogene (K–Pg) mass extinction that took place 65.5 million years ago (Ma): the fateful event that eradicated almost all dinosaurs, along with several other groups of animals and plants (Fig. 1). In 1980, a team of scientists led by Nobel prizewinning physicist Luis Alvarez found that the strata that compose the K–Pg boundary in several locations around the world contain unusually high concentrations of iridium (Fig. 1A), a dense metal that is exceedingly rare in Earth’s crust; however, iridium is abundant in asteroids and other celestial objects. On the basis of this, Alvarez and his team hypothesized that a large, iridium-rich meteorite had collided with Earth (Fig. 1B), ejecting massive amounts of this element into the atmosphere. The atmospheric debris spread worldwide, said the team, producing dramatic changes in the environment that resulted in the mass-extinction event. At the time this interpretation was met with considerable scepticism, and a number of alternatives were proposed. In 1995, for example, physicists John Ellis and David Schramm put forward an interesting case for the mass extinction having been caused by the cosmic radiation produced by a nearby supernova (Fig. 1C). This hypothesis accounted for the iridium anomaly; the explosion would certainly have had devastating effects that could have led to a mass extinction; and calculations that estimated how long a nearby cosmic explosion would take to reach our planet correlated relatively well with the distance from Earth of known supernova remnants.
However, a wider acceptance of the impact hypothesis has come with more recent findings, such as the discovery in several locations around the world of large impact craters that correspond to the age of the extinction event; furthermore, supporters of the supernova hypothesis have failed to find evidence of certain isotopes that the model predicted should be found alongside the iridium in the K–Pg boundary. On the basis of the simplicity criterion the impact hypothesis can be interpreted as a more likely, although not necessarily the only, cause of the K–Pg mass-extinction event. This example also helps to illustrate the difference between a simple explanation, and the best explanation supported by the evidence. It is hard to describe a massive meteorite impact or a deadly supernova as simple events, but the body of evidence that supports the impact hypothesis provides a more satisfactory explanation given the available data, and thus currently represents the most parsimonious scenario.
The other main application of parsimony, this time almost exclusive to evolutionary biology, is phylogenetic or maximum parsimony. The objective is to use the properties of parsimony to reconstruct the phylogenetic relationships between organisms, or in other words, to better understand their evolutionary history by analysing how the different species relate to each other. The main rationale behind phylogenetic parsimony is that, given the available data (genes, morphology, geographic distribution) for a number of different species, the evolutionary tree that requires the fewest character transformations is likely to be the one that best represents the relationships between those species. As with the simplicity criterion, the interpretations that result are but a reflection of the body of evidence available, so the input of new information can lead to previous hypotheses being corroborated or overturned.
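The idea of choosing the tree with the fewest character transformations can be made concrete with a small sketch. The code below is not taken from any study mentioned in this article: it uses Fitch's counting method (one common way of scoring unordered characters, chosen here for illustration) on a hypothetical matrix of four taxa, A to D, scored for three invented binary characters. Trees are written as nested pairs.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree   -- a taxon name (leaf) or a 2-tuple of subtrees (rooted, binary)
    states -- dict mapping each taxon name to its observed state
    """
    if isinstance(tree, str):          # leaf: the state is observed directly
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:                         # the two subtrees can agree: no extra change
        return shared, left_steps + right_steps
    # the subtrees disagree: one more transformation is required here
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total number of transformations over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


# Hypothetical data (an assumption for illustration only):
# four taxa scored for three binary characters.
matrix = {
    "A": [1, 1, 0],
    "B": [1, 1, 1],
    "C": [0, 0, 1],
    "D": [0, 0, 0],
}

# The three possible ways of grouping four taxa into two pairs.
candidates = {
    "(A,B),(C,D)": (("A", "B"), ("C", "D")),
    "(A,C),(B,D)": (("A", "C"), ("B", "D")),
    "(A,D),(B,C)": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(tree, matrix) for name, tree in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths)
print("most parsimonious:", best)
```

With this invented matrix the grouping (A,B),(C,D) needs four transformations, against five and six for the alternatives, so it would be preferred. Real analyses involve many more taxa and characters, and search over far more candidate trees than can be listed by hand.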
The correct application of phylogenetic parsimony is heavily reliant on how the information is interpreted. The morphology, for example, of a species can be broken down into characters, which can be compared with those of other species and ultimately result in a hypothesis about phylogenetic relationships. Unlike approaches that take into account only the overall similarity between the different species (such as phenetics), phylogenetic parsimony emphasizes the importance of shared characters that are directly comparable between the different species (see synapomorphy below), and uses that information as the main parameter for reconstructing relationships. To better understand this process, it is necessary to define some important concepts that apply to how characters are interpreted in terms of their phylogenetic value.
Phylogenetic homology is defined as similarity due to common ancestry. In other words, two (or more) different species share homologous characters if these traits can be traced back to the last common ancestor of said species. The recognition of homologous characters is of prime importance for reconstructing the evolutionary history of any given group, because they allow researchers to recognize relationships between different species even if those species have developed drastically dissimilar morphologies as a result of living in different environments and experiencing distinct ecological pressures.
Let us use the nostrils, or nose openings, as an example of a character that has become drastically modified in some groups (Fig. 2). The nostrils have an important biological role in the tetrapods (amphibians, reptiles, birds, mammals): they are involved in the mechanisms of inhalation and exhalation of air during breathing. The presence of these structures can be traced back to the last common ancestor of all tetrapods, and thus it is possible to recognize the nostrils of all extant tetrapod species as a homologous character. These openings are generally located at the front of the snout (Figs 2A–E); however, in cetaceans (dolphins and whales) they have migrated from their original position to facilitate breathing in the aquatic environment, and are now found on top of the head as blowholes (Fig. 2F). This is an impressive change, and without the context provided by the phylogenetic relationships between cetaceans and other mammals it would be hard to picture how this transformation took place. Fortunately, the evolutionary history of cetaceans contains several fossils that clearly show this precise transition. For instance, in pakicetids (Fig. 2D), a group of terrestrial hoofed carnivorous mammals with a rather wolf-like appearance that lived during the early to middle Eocene epoch (55.8–40.4 Ma) and were distantly related to modern cetaceans, the nostrils were located at the front of the snout (Fig. 2D1), similarly to other tetrapods. Another important group, this time more closely related to modern cetaceans, is the protocetids (Fig. 2E). These extinct animals also lived during the early and middle Eocene, and had already made the transition from the terrestrial to the aquatic environment. Here, the nostrils were somewhat displaced backwards.
This example can also be used to introduce two important concepts that are intimately associated with the interpretation of homologous characters: synapomorphy and symplesiomorphy. A synapomorphy is a character that is shared by all members of a particular group, but not with the members of other closely related groups. In the case of the nostrils, even though these openings are homologous for all tetrapods, their position on the top of the head is exclusive to cetaceans (Fig. 2F), and thus represents a synapomorphic character that defines Cetacea as a group. The recognition of synapomorphic characters in extant and extinct organisms is very helpful for reconstructing their phylogenetic relationships, because it lets researchers define discrete groups. On the other hand, a symplesiomorphy, or plesiomorphic character, is an ancestral character that is shared by several species. The presence of nostrils at the front of the snout (Figs 2A–E) represents the condition inherited from the last common ancestor of all tetrapods, and thus it is a character that does not by itself provide detailed information about how the different groups of tetrapods are related to each other.
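The distinction between the two concepts can be written as a simple rule. The sketch below is only an illustration: the function name, the choice of representative living taxa and the simplified coding of nostril position ("front" or "top") are assumptions made for the example, following the description given in the text.

```python
def classify_character(states, group, ancestral_state):
    """Describe how one character is distributed relative to a group.

    states          -- dict mapping each taxon to its observed state
    group           -- set of taxa whose grouping is being evaluated
    ancestral_state -- state inferred for the last common ancestor of all taxa
    """
    inside = {states[taxon] for taxon in group}
    outside = {states[taxon] for taxon in states if taxon not in group}
    if len(inside) != 1:
        return "variable within group"
    (state,) = inside
    if state == ancestral_state:
        return "symplesiomorphy"       # shared, but inherited from the ancestor
    if state not in outside:
        return "synapomorphy"          # derived and exclusive to the group
    return "derived, but also present outside the group"


# Simplified, illustrative coding of nostril position in living tetrapods.
nostril_position = {
    "frog": "front",
    "lizard": "front",
    "bird": "front",
    "dog": "front",
    "dolphin": "top",
    "whale": "top",
}

print(classify_character(nostril_position, {"dolphin", "whale"}, "front"))
print(classify_character(nostril_position, {"frog", "lizard"}, "front"))
```

The first call reports a synapomorphy, because the derived position is shared by the two cetaceans and by nothing else in the sample. The second reports a symplesiomorphy: frogs and lizards do share nostrils at the front of the snout, but so does every other non-cetacean in the sample, so the character says nothing about whether those two are each other's closest relatives.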
Phylogenetic homoplasy is generally viewed as the opposite of homology. Whereas homology reflects similarity due to common ancestry, homoplasy indicates superficial similarity due to analogy or common function. Homoplastic traits represent a source of analytical error in phylogenetic parsimony, because they can lead researchers to conclude that two or more species belong to the same group on the basis of a character that actually has not been inherited from their last common ancestor, but has been acquired independently. The recognition of homoplastic characters is important when reconstructing the evolutionary history of any group of organisms. Unfortunately, there are several cases in which this is easier said than done, particularly when dealing with the incomplete morphological information available from the fossil record.
There are two types of homoplasy. The first is convergent evolution, which occurs when two or more very distantly related species develop strikingly similar biological traits that were not present in their last common ancestor. To illustrate this we will look at two organisms as phylogenetically distant as possible: the Ediacaran fossil Avalofractus abaculus (Fig. 3A) and the extant Romanesco broccoli, a cultivar of Brassica oleracea (Fig. 3B). Avalofractus abaculus is a rangeomorph, one of a group of enigmatic fossil organisms with a frond-like appearance, described from the Ediacaran (approximately 565 Ma) of Newfoundland, Canada. As with many other body fossils from this age, the precise phylogenetic affinities of rangeomorphs are shrouded in mystery, but some researchers have proposed that they may be distantly related to animals and fungi. Avalofractus neatly illustrates an important aspect of this group’s morphology: a fractal organization in which each of the branches that comprise the body is a smaller version of the entire frond (Fig. 3A), and thus the fossil possesses a self-similar pattern. The precise biological significance of this organization is still a subject of debate, but it has been suggested that it may have served to increase the surface area of the organism and facilitate the absorption of nutrients from the environment. By contrast, the Romanesco broccoli is an angiosperm, a member of the group that contains all flowering plants. The inflorescence (cluster of flowers around the stem) of the Romanesco broccoli also displays a striking, and delicious, fractal pattern composed of self-similar helically arranged cones. The fossil record of angiosperms dates back to only the early Cretaceous period (approximately 146 Ma), which indicates that the last common ancestor of both rangeomorphs and broccoli is older than 565 Ma, and was probably a type of single-celled organism that almost certainly did not feature a fractal organization. 
Despite the fact that Avalofractus and the Romanesco broccoli are as distantly related as two organisms can possibly be, they have independently acquired a very similar and complex organization pattern under completely different ecological and adaptive pressures. It would be erroneous to conclude that these two species are closely related solely on the basis of this character, and thus it is more parsimonious to recognize this as a case of evolutionary convergence.
The second type of homoplasy is parallel evolution, which occurs when two or more closely related species acquire a very similar trait that was absent from their last common ancestor. To exemplify this, we can revisit the position of the nostrils in cetaceans and other tetrapods. The cetaceans are not in fact the only tetrapods in which the nostrils have migrated from their original position at the front of the snout to the top of the head (Fig. 2): some non-cetacean tetrapods have developed similar adaptations. One example is the sauropod dinosaur group Macronaria (Latin for ‘big nose’), which includes the well known genus Brachiosaurus (Fig. 4A1). These large dinosaurs roamed the Earth from the middle Jurassic period to the late Cretaceous period (approximately 175–65 Ma), and are characterized not only by their tremendous body size and long necks, but also by the presence of large nostrils positioned high on the skull (Fig. 4A). The distinctive location of the nostrils of macronarian dinosaurs was once thought to serve a function similar to that in cetaceans, and so these animals were thought to live in an aquatic environment. This interpretation is now defunct, and it has been suggested instead that the nostrils were probably covered by a fleshy membrane, and acted as a resonance chamber for emitting powerful calls. The last common ancestor of dinosaurs and cetaceans would have been a reptile-like amphibian, whose nostrils were located at the front of the snout. As before, it would be unparsimonious to group cetaceans and macronarian sauropods together solely on the basis of the position of their nostrils; rather, the feature has clearly evolved more than once in these different lineages (Fig. 4B). The careful reader will notice that the main difference between convergent evolution and parallel evolution is the rather arbitrary interpretation of how closely related the species are. 
As a general rule, it is more straightforward to envisage parallel evolution as the independent acquisition of traits with a similar function (analogous traits) within the context of relatively small taxonomic ranks, such as between closely related species, genera or families, whereas convergence tends to occur at much larger scales, such as classes, phyla and kingdoms.
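The reasoning in the nostril example can be checked by counting transformations. The sketch below is a deliberately simplified illustration: the four taxa, the three characters and their 0/1 coding are assumptions chosen for the example, and the counting rule is Fitch's method for unordered characters rather than anything prescribed by the studies discussed here.

```python
def min_changes(tree, states):
    """Fitch-style count: (candidate ancestral states, minimum changes)."""
    if isinstance(tree, str):                  # leaf taxon
        return {states[tree]}, 0
    left_set, left_steps = min_changes(tree[0], states)
    right_set, right_steps = min_changes(tree[1], states)
    if left_set & right_set:                   # subtrees agree: no new change
        return left_set & right_set, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


# Illustrative coding (1 = present, 0 = absent); a simplification for the example.
characters = {
    "nostrils high on skull": {"lizard": 0, "Brachiosaurus": 1, "dog": 0, "whale": 1},
    "hair":                   {"lizard": 0, "Brachiosaurus": 0, "dog": 1, "whale": 1},
    "mammary glands":         {"lizard": 0, "Brachiosaurus": 0, "dog": 1, "whale": 1},
}

# Grouping suggested by nostril position alone.
nostril_tree = (("whale", "Brachiosaurus"), ("dog", "lizard"))
# Conventional grouping: reptiles together, mammals together.
conventional_tree = (("lizard", "Brachiosaurus"), ("dog", "whale"))


def steps_per_character(tree):
    """Number of transformations each character requires on the given tree."""
    return {name: min_changes(tree, coding)[1] for name, coding in characters.items()}


for label, tree in (("nostril-based", nostril_tree), ("conventional", conventional_tree)):
    steps = steps_per_character(tree)
    # A binary character needs at least one change; any more implies homoplasy.
    homoplastic = [name for name, n in steps.items() if n > 1]
    print(label, "total:", sum(steps.values()), "homoplastic:", homoplastic)
```

On the nostril-based tree the nostril character changes only once, but hair and mammary glands must each arise twice, giving five transformations in total. The conventional tree needs four, with the nostril character as the only one that evolves more than once, which is the pattern expected if the high nostrils were acquired independently.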
Complications of phylogenetic parsimony:
Phylogenetic parsimony is a powerful tool, but there are caveats. The cases presented so far have been uncontroversial examples of the basic applications of parsimony to the study of evolutionary history. There are, however, more than a handful of scenarios in which parsimony is of limited use with the available data, and it can even become a source of error. For the final case study, we will take a very brief look at the picturesque field of arthropod phylogeny. Living arthropods are divided into four main groups: Chelicerata (horseshoe crabs and arachnids), Myriapoda (centipedes, millipedes and lesser known forms), Hexapoda (insects and their kin) and Crustacea (shrimps, true crabs, woodlice and many others). The precise phylogenetic relationships among these four groups have been subject to continuous debate, and practically every possible combination has been proposed at some point during the last century. The general agreement is that myriapods, hexapods and crustaceans form a group collectively known as Mandibulata, defined by the presence of mandible-like appendages on the head. Within Mandibulata, it was thought until the mid-1990s that myriapods and hexapods were the most closely related, forming a sub-group called Atelocerata or Tracheata (Fig. 5A), defined by the presence of single-branched legs, a tracheal system for breathing air and specialized excretion organs. However, with the widespread implementation of molecular-biology techniques that allow researchers to analyse the gene sequences of numerous organisms, increasing support has been found for a new group that includes hexapods and crustaceans, named Tetraconata or Pancrustacea (Fig. 5B). This is supported by several similarities of the nervous and visual systems, and the structure of the mandibles.
Currently the evidence is in favour of Tetraconata/Pancrustacea, and it has been suggested that the major morphological similarities shared by hexapods and myriapods are actually not homologous, but rather are the result of parallel evolution. This serves as a cautionary tale for evolutionary biologists: the validity of Atelocerata/Tracheata as a natural group was not questioned for several years, and even received strong support from parsimony. It was not until further sources of data became available that it was possible to reconsider previous assumptions and propose a new interpretation that better reflects their evolutionary history.
The widespread implementation of molecular techniques has allowed researchers to explore vast amounts of new information. However, this has not come without a cost: evolutionary biologists who rely mainly on molecular data have encountered an unexpected source of error known as long-branch attraction, which is closely associated with parsimony. Long-branch attraction is a phenomenon, sometimes encountered in phylogenetic analyses based on parsimony, in which two different lineages are clustered together, and thus inferred to be closely related, regardless of their true evolutionary relationships. Molecular phylogenies use gene sequences as the primary source of data, and every gene is composed of a combination of the same four nucleotides (adenine, thymine, guanine and cytosine); hence, the probability of two or more similar nucleotide sequences developing independently in rapidly evolving groups is very high. Parsimony-based analyses will cause these species to be grouped together, as without a broader evolutionary context it is often difficult to recognize if the similarity is due to common ancestry or chance. Long-branch attraction can be thought of as the result of convergent evolution at the level of the nucleotide sequences. Recent studies of arthropod phylogeny provide a clear example of this: although molecular phylogenies have recovered the association between hexapods and crustaceans, some studies have also found support for an unconventional group known as Paradoxopoda or Myriochelata (Fig. 5C), which includes chelicerates and myriapods despite their lack of morphological similarity. This result stands in marked contrast to the widely recognized Mandibulata hypothesis (Myriapoda + Hexapoda + Crustacea), and thus considerable effort has been put into resolving this issue.
The latest research suggests that Paradoxopoda/Myriochelata is an artificial group produced by long-branch attraction, and thus current opinion favours the authenticity of Mandibulata; however, this is an ongoing debate, as recent studies have reported that some components of the nervous system of chelicerates and myriapods are much more similar to each other than to those of hexapods and crustaceans.
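Why a four-letter alphabet makes chance resemblance so likely can be shown with a back-of-envelope calculation. The sketch below assumes the Jukes-Cantor model of sequence change (all four nucleotides equally frequent and equally likely to replace one another); this model is an assumption introduced here for illustration and is not part of the studies discussed above. Under it, the expected fraction of identical sites between two sequences separated by d substitutions per site is 1/4 + (3/4)·exp(−4d/3).

```python
import math


def expected_identity(d):
    """Expected fraction of identical sites under the Jukes-Cantor model.

    d -- expected number of substitutions per site separating two sequences
    """
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


# Longer branches (larger d) erase the signal of common ancestry,
# but identity never drops below the 25% expected by chance alone.
for d in (0.0, 0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"{d:>4} substitutions per site -> {expected_identity(d):.3f} identical")
```

However long the branches become, about a quarter of the sites in two unrelated sequences are still expected to match. A method that counts every match as potential evidence of common ancestry can therefore be misled when two lineages have both changed a great deal.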
Suggestions for further reading:
Alvarez, L. W., Alvarez, W., Asaro, F. & Michel, H. V. 1980 Extraterrestrial cause for the Cretaceous–Tertiary extinction. Science 208, 1095–1108. (doi:10.1126/science.208.4448.1095)
Barnes, E. C. 2000 Ockham’s razor and the anti-superfluity principle. Erkenntnis 53, 353–374. (doi:10.1023/A:1026464713182)
Bergsten, J. 2005 A review of long-branch attraction. Cladistics 21, 163–193. (doi:10.1111/j.1096-0031.2005.00059.x)
Edgecombe, G. D. 2010 Arthropod phylogeny: An overview from the perspectives of morphology, molecular data and the fossil record. Arthropod Structure & Development 39, 74–87. (doi:10.1016/j.asd.2009.10.002)
Ellis, J. & Schramm, D. N. 1995 Could a nearby supernova explosion have caused a mass extinction? Proceedings of the National Academy of Sciences, USA 92, 235–238. (doi:10.1073/pnas.92.1.235)
Farris, J. S. 1983 The logical basis of phylogenetic analysis. In: Platnick, N. I. & Funk, V. A. (Eds) Advances in Cladistics 2: Proceedings of the Second Meeting of the Willi Hennig Society, 7–36. New York: Columbia University Press. ISBN 9780231048088
Gatesy, J. & O’Leary, M. A. 2001 Deciphering whale origins with molecules and fossils. Trends in Ecology & Evolution 16, 562–570. (doi:10.1016/S0169-5347(01)02236-4)
Narbonne, G. M., Laflamme, M., Greentree, C. & Trusler, P. 2009 Reconstructing a lost world: Ediacaran rangeomorphs from Spaniard’s Bay, Newfoundland. Journal of Paleontology 83, 503–523. (doi:10.1666/08-072R1.1)
Nixon, K. C. & Carpenter, J. M. 2011 On homology. Cladistics 27, 1–10. (doi:10.1111/j.1096-0031.2011.00371.x)
Sober, E. 1990 Let’s razor Ockham’s razor. In: Knowles, D. Explanation and its Limits, 73–94. Cambridge University Press. ISBN 9780521395984
Telford, M. J. & Thomas, R. H. 1995 Demise of the Atelocerata? Nature 376, 123–124. (doi:10.1038/376123a0)
1 Department of Earth Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EQ, UK.