Cambridge University Science Magazine

Chris Howe is a professor of biology, specialising in photosynthetic organisms. Although photosynthesis is his life’s passion, his group explores topics outside of typical biology. 

He recalls a dinner debate at High Table with a visiting fellow, Prof. Linne Mooney, and Dr. Fred Ratcliffe, the University Librarian. At the time, Prof. Mooney was an English scholar interested in the copying history of mediaeval manuscripts, specifically Chaucer’s The Canterbury Tales. Before the advent of the printing press, mediaeval manuscripts had to be copied painstakingly by hand. During the process, scribes often introduced changes, accidentally or deliberately, sometimes even altering the meaning of the original text. To trace the copying history, scholars manually construct ‘stemmata’: tree-like diagrams showing the relationships of the various versions of a text to earlier manuscripts.

Chris’ group realised that textual scholars and biologists face the same fundamental problem: working out the relationships between texts or between organisms. In a field called ‘phylogenetics’, biologists construct trees and networks that represent an organism’s ancestral lineage and show how it has evolved: ‘phylogenetic tree building is basically the same as stemma building in texts’. A wide range of computational tools already exists for analysing biological sequence data, so ‘it should be relatively simple to translate them to the analysis of texts’, which are also just sequences of letters!

Chaucer’s The Canterbury Tales: The Wife of Bath’s Prologue

Fast-forward a year: in 1998, the interdisciplinary team of humanists and biologists published an article in Nature, representing the first ever proof-of-concept application of phylogenetic tree building to texts. 

They analysed 850 lines from 58 surviving manuscripts of Chaucer’s The Wife of Bath’s Prologue and constructed trees using two different analytic methods (Fig. 1A), work that would have taken traditional scholars months. Each internal node in the tree represents a common ancestor, and each branch length represents the number of changes between texts (a measure of how accurately a text was copied): longer branches mean more changes.
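
The idea of treating texts as sequences can be sketched in a few lines of Python. The example below is a minimal illustration, not the method used in the Nature paper: the four witnesses and their readings are invented, differences are counted position by position, and the tree is built with UPGMA (simple average-linkage clustering), chosen here only because it is short to write.

```python
from itertools import combinations

# Hypothetical toy data: each manuscript is reduced to its reading at five
# aligned points of variation (one letter per distinct reading).
readings = {
    "A": "aabab",
    "B": "aabaa",
    "C": "bbaba",
    "D": "bbabb",
}


def hamming(x, y):
    """Count the aligned positions at which two witnesses disagree."""
    return sum(1 for p, q in zip(x, y) if p != q)


def average_distance(members_a, members_b, seqs):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(hamming(seqs[a], seqs[b]) for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))


def upgma(seqs):
    """Average-linkage clustering; returns the tree as nested tuples of names."""
    # Each cluster is (subtree, member names); start with one cluster per witness.
    clusters = [(name, [name]) for name in sorted(seqs)]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(
                clusters[ij[0]][1], clusters[ij[1]][1], seqs
            ),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


tree = upgma(readings)
print(tree)
```

Here A and B differ at one point of variation, as do C and D, so the sketch pairs them first and joins the two pairs last. Real analyses use far more points of variation and methods that also estimate branch lengths.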

They found that the manuscripts group together, descending from a single, distinct common ancestor. Interestingly, the analysis suggested that this ancestor (Chaucer’s original copy) was probably not a finished text but a working draft containing Chaucer’s own notes and alternative drafts of sections. Like Chinese whispers, the message changed subtly, leading editors to produce radically different copies of The Canterbury Tales.

Orlando Gibbons’ Prelude in G

Phylogenetic techniques can also be applied to music. Because printing music was so complex, music continued to be propagated through handwritten manuscripts even after the invention of the printing press. Chris collaborated with musicologists to investigate Orlando Gibbons’ Prelude in G, the final piece in Parthenia (the first printed keyboard music collection in England). Changes introduced in copying often resulted in audibly different versions. At each location in the music, each variation was classified by pitch, note pattern, ties, ornaments, and rhythm. They found that the sources can be split into two main groups, suggesting that two main versions of the text were circulating, which could have been derived from Parthenia itself.

Little Red Riding Hood and Persian carpets

Chris discussed how phylogenetic analysis has been extended by others in anthropology. Anthropologist Jamshid J. Tehrani was interested in tracing the evolution of Little Red Riding Hood-like folktales from around the world. Phylogenetic analysis became a powerful tool, used by Jamshid to reveal their geography of circulation and oral evolution, overcoming previous problems in tracking the history of orally transmitted stories. Where does the ancestral tale come from? East Asian tales, which were found to form a new, separate group, could be the missing link. The ancestral tale could have originated in the East and later spread to the West by trade, finally splitting into two lineages that gave rise to the familiar Little Red Riding Hood.

Jamshid also applied phylogenetic analysis to the evolution of Turkish carpet designs. Tree analysis revealed that the textiles arose through natural design evolution, rather than trade and cultural borrowing. The ‘consistency index’ is used as a measure of how well a trait is retained through the tree: a low index indicates that a trait has been gained or lost independently in separate lineages over time. Jamshid was interested in how war affected textiles following the Turks’ defeat in the 19th century; the lower consistency index indicated more borrowing of designs and motifs. Thus, phylogenetics revealed a greater dependency on trade, which changed the driving force behind textile design.
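
As a rough sketch of how a consistency index can be computed for one trait: the index is the minimum number of changes the trait could need (its number of distinct states minus one) divided by the number of changes it requires on a given tree. The Python below counts changes with Fitch’s parsimony algorithm on a small rooted binary tree. The tree, the motif names and their presence or absence in each textile are all hypothetical and are not taken from Jamshid’s data.

```python
def fitch_steps(tree, states):
    """Return (candidate states at this node, minimum changes below it).

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:  # the two children can agree: no extra change needed
        return shared, left_steps + right_steps
    # the children disagree: one extra change is required at this node
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """Minimum conceivable changes divided by changes required on the tree."""
    min_steps = len(set(states.values())) - 1
    _, observed_steps = fitch_steps(tree, states)
    # A trait that never varies needs no changes; this sketch reports 1.0
    # for that case (an assumption, to avoid dividing by zero).
    return min_steps / observed_steps if observed_steps else 1.0


# Hypothetical tree of four textiles and two hypothetical motifs.
tree = (("A", "B"), ("C", "D"))
motif_x = {"A": "present", "B": "present", "C": "absent", "D": "absent"}
motif_y = {"A": "present", "B": "absent", "C": "present", "D": "absent"}

print(consistency_index(tree, motif_x))  # motif follows the tree
print(consistency_index(tree, motif_y))  # motif appears in both lineages
```

In this toy case motif_x changes once, in step with the tree, and scores 1.0, while motif_y needs two changes on the same tree and scores 0.5, the kind of drop that would be read as independent gain, loss or borrowing.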

Similarities with biology

For Chris, the most remarkable finding is the striking similarity of copy-transmission to biological evolution (Fig. 1B): ‘we use words as analogies of mutations in DNA in textbooks, so why don’t we use DNA as an analogy of words?’ Each error is replicated in future editions of a text — and, as errors accumulate, each text lineage adopts a subtly different meaning. The same happens during evolution: mutations in DNA are passed down and accumulate, creating new species. Chris’ team found scribes sometimes switch exemplars mid-way, resulting in a final copy that is a hybrid of two sources — something which scholars termed ‘contamination’. This is akin to the biological process of ‘recombination’, where DNA molecules mix between organisms to produce new genetic mixtures, creating variation in traits.
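
One simple way to look for this kind of hybrid copy is to ask, block by block, which candidate exemplar a manuscript matches best. The Python sketch below does this with invented data. The function names, the block size and the two exemplars X and Y are assumptions made for illustration, and this is not presented as the approach Chris’ team took.

```python
def hamming(x, y):
    """Count the aligned positions at which two texts disagree."""
    return sum(1 for p, q in zip(x, y) if p != q)


def closest_exemplar(copy, exemplars, start, end):
    """Name of the exemplar that best matches the copy within [start, end)."""
    return min(
        sorted(exemplars),  # sorted so that ties are broken alphabetically
        key=lambda name: hamming(copy[start:end], exemplars[name][start:end]),
    )


def exemplar_by_block(copy, exemplars, block_size):
    """Best-matching exemplar for each consecutive block of the copy."""
    return [
        closest_exemplar(copy, exemplars, start, min(start + block_size, len(copy)))
        for start in range(0, len(copy), block_size)
    ]


# Hypothetical readings at eight aligned points of variation.
exemplars = {"X": "aaaaaaaa", "Y": "bbbbbbbb"}
copy = "aaacbbbb"  # follows X (with one new error), then switches to Y

blocks = exemplar_by_block(copy, exemplars, block_size=4)
print(blocks)
```

A change of best match partway through, as in this toy copy, is the pattern one would expect from a scribe who switched exemplars. On real data such a signal would need checking against chance agreement.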

Trying to reconstruct the ancestral text is akin to ancestral protein reconstruction, in which we try to infer the original protein sequence from the sequences of current and past proteins. A change in the sequence of one protein makes changes in another, interacting protein more likely. We see the same in music, where one change in notation causes a ‘domino effect’ as subsequent notes are adjusted to fit the time signature or alternative chords. Like silent mutations in DNA (which cause no change in protein sequence), some changes in musical notation have no audible effect; for example, a semibreve is performed identically to two tied minims!

Tepid reactions

Have scholars found phylogenetic techniques useful? Chris says the reactions have been polarised: ‘the ones that hate it often misunderstand what we are trying to do’. Scholars and biologists alike often assume that computer-found relationships are prescriptive and binary. Computational results can be particularly uncomfortable when they go against years of qualitative intuition built up from studying the artefacts. In reality, each evolutionary prediction carries some degree of uncertainty, and different algorithms may produce subtly different trees. Tree building can seem like a statistical black box, and Chris believes scientists are partly to blame for the mixed reception: ‘we must try to better explain what we are really doing’.

Future perspectives

Since the seminal Canterbury Tales paper, phylogenetic analysis has been applied to a wide variety of artefacts, mapping histories across literature, music and textiles. So, what can we expect for the future? Chris alluded to future phylogenetic analyses of Bach’s The Well-Tempered Clavier, led by Dr. Heather Windram in collaboration with musicologists Yo Tomita (Queen's University, Belfast) and Terence Charlston (Royal College of Music). Thus, in a field now termed ‘phylomemetics’ (the application of phylogenetics to non-biological objects), the tales of phylogenetics continue…


Bartek Witek is a third-year undergraduate in biochemistry at St. Catherine's College. Artwork by Sumit Sen.