Cambridge University Science Magazine
Different diseases are associated with different pathogens and a diverse range of genetic, social, and environmental factors. The distribution and abundance of diseases have changed from the 19th to the 21st century due to shifts in climate and lifestyle. Adaptable and innovative epidemiological approaches are required to navigate this ever-changing ‘disease-scape’.

The process of seeking connections becomes paramount during an outbreak investigation. Epidemiologists are under pressure to find links between cases, asking questions about shared sociodemographic characteristics and common exposures to identify the causative agent and halt transmission. As the threat posed by emerging infectious diseases rises, epidemiology’s toolkit must be sharpened by enhancing traditional methods and integrating data analytics, genomic sequencing, and interdisciplinary collaboration.


John Snow’s pioneering investigation of the 1854 cholera epidemic in London’s Soho district is a classic historical example of how connection-finding lies at the core of epidemiology. During this outbreak, Snow meticulously mapped cholera deaths, allowing the visualisation of case clustering around the Broad Street water pump. Snow hypothesised that contaminated water from the Broad Street pump was the source of cholera; two other astute observations strengthened his hypothesis. Firstly, a local brewery and workhouse had much lower cholera death rates than the surrounding area. Further investigation revealed that the water sources for these two establishments were separate from the Broad Street pump. Secondly, Snow’s map contained an outlier, a cholera case far from the pump. This ‘outlier’ was a 59-year-old woman who sent for Broad Street pump water daily as she liked the taste. The removal of the pump handle, carried out at Snow’s urging, coincided with the end of the outbreak and helped to break the dogma that cholera and other diseases were spread by miasma or ‘bad air’.

Through his population studies, John Snow inferred the existence of disease-causing microbes and many of their properties before the development of modern germ theory and before the causative agent of cholera, Vibrio cholerae, had been isolated. Soon after, Louis Pasteur and Robert Koch began to prove that specific microbes were responsible for causing specific diseases. Koch’s postulates set the criteria for disease causality. To be causal, the microbe must:
  1. be present in all diseased organisms,
  2. be isolated from a diseased organism in pure culture,
  3. produce disease in healthy organisms upon inoculation,
  4. be re-isolated from the inoculated host individuals.

With these criteria, the microbial basis of many infectious diseases was established.

The second half of the 20th century saw a shift in emphasis to non-communicable diseases, reflecting a decline in infectious diseases and an increase in cancer and cardiovascular diseases in many countries. As most non-communicable diseases are multifactorial (not having a single cause), Koch’s postulates could no longer be used to establish causality. To unravel the complex aetiologies of non-communicable diseases, the field of epidemiology began transitioning from purely descriptive observations to the development and widespread use of analytical approaches. These included case-control and cohort studies, combined with the refinement of statistical methods. Qualitative connections between time, place, and people gave way to quantitative assessments of causality.

The use of these analytical approaches was critical in providing incontrovertible evidence implicating smoking as a major risk factor for lung cancer. This process began by noticing that the marked increase in tobacco consumption during the 20th century, primarily due to the invention of the cigarette, was followed by a large increase in lung cancer cases. In the 1950s, two large case-control studies indicated a strong association between smoking and lung cancer. However, these studies were not enough to halt the tobacco epidemic, and the tobacco industry actively resisted and tirelessly campaigned against the scientific consensus.
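To make the idea of a "strong association" concrete, the sketch below shows how a case-control study is commonly summarised with an odds ratio. The counts are hypothetical and invented purely for illustration; they are not taken from any of the studies mentioned here, and the function name `odds_ratio` is our own.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio with an approximate 95% confidence interval (log method)."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Hypothetical counts: smokers and non-smokers among 100 cases and 100 controls
estimate, lower, upper = odds_ratio(
    exposed_cases=90, unexposed_cases=10,
    exposed_controls=60, unexposed_controls=40,
)
print(f"odds ratio = {estimate:.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With these made-up counts the odds of having smoked are six times higher among cases than controls. An interval that excludes 1 suggests the association is unlikely to be due to chance alone, although it cannot rule out bias or confounding.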

Whilst case-control studies were critical in establishing the link between smoking and lung cancer, critics, including in the tobacco industry, argued that these studies were vulnerable to recall bias due to their retrospective nature. Richard Doll and Austin Bradford Hill undertook the landmark ‘British Doctors Study’ to address these concerns and strengthen the evidence. This was a longitudinal study conducted over 50 years on a population of doctors who were questioned on their smoking habits. The results were clear and hard to disprove: the longer you smoke and the more you smoke, the greater your risk of lung cancer. After prolonged court battles, the implementation of anti-smoking campaigns and policies against tobacco use eventually turned the tide on smoking and lung cancer.
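A dose-response pattern of this kind can be sketched by comparing incidence rates across exposure groups in a cohort. The groups, event counts, and person-years below are hypothetical placeholders, not figures from the British Doctors Study.

```python
def rate_per_1000(events, person_years):
    """Incidence rate per 1,000 person-years of follow-up."""
    return 1000 * events / person_years

# Hypothetical cohort: (lung cancer cases, person-years of follow-up)
cohort = {
    "never smoked": (5, 50_000),
    "light smoker": (20, 40_000),
    "moderate smoker": (45, 30_000),
    "heavy smoker": (60, 20_000),
}

baseline = rate_per_1000(*cohort["never smoked"])

# Rate ratio of each group relative to people who never smoked
rate_ratios = {
    group: rate_per_1000(events, person_years) / baseline
    for group, (events, person_years) in cohort.items()
}

for group, ratio in rate_ratios.items():
    print(f"{group}: rate ratio = {ratio:.1f}")
```

Because a cohort study follows people forward in time, exposure is recorded before the disease occurs, which is why this design is less exposed to recall bias than a retrospective case-control study. A rate ratio that rises steadily with exposure is one of the patterns that makes a causal interpretation more plausible.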

In each of these examples, the initial hypothesis put forward by connection-observing individuals challenged prevailing beliefs and practices and was met with resistance. Resistance takes many forms and may stem from economic interests, cultural factors, or the inertia of established practices. However, as evidence accumulates, acceptance grows, and paradigms eventually shift. Epidemiology now firmly relies on case-control studies and cohort studies to establish statistical significance and causality. Even so, these studies can only begin after astute connections and observations have been made by epidemiologists and pursued with diligence.


Advances in technology, laboratory techniques, and data analysis have added to epidemiologists’ “connection-finding” toolkit. Genomic sequencing is one such revolutionary technology, and it has led to the blossoming of genomic and genetic epidemiology.

Genetic epidemiology seeks to reveal the role of genetic factors and their interplay with the environment in determining health and disease. In the pre-genomic era, family and twin studies were used to investigate the relative contributions of genetic and environmental factors to disease occurrence. Family studies use linkage analysis to narrow down the chromosomal location of disease genes. These studies rely on specific polymorphic markers co-segregating with the disease because they are tightly linked to the causal locus.

Linkage analysis allowed the identification of rare alleles causing monogenic disorders, which are diseases caused by the inheritance of single-gene mutations. However, unveiling low-penetrance loci contributing to complex diseases required genome-wide association studies (GWAS). GWAS involve taking large samples of cases and controls and testing hundreds of thousands of genetic variants to find those statistically associated with a specific trait or disease. Integrating GWAS findings with functional genomics data is required to move beyond association and establish causal relationships between genetic variants and disease. Establishing causality is a critical step in translating GWAS discoveries into clinical applications.
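The core statistical step of a GWAS can be sketched as one association test per variant, followed by a correction for the very large number of tests. The example below is a minimal illustration, assuming a simple allelic chi-square test and a Bonferroni correction; real studies typically use regression models that adjust for ancestry and other covariates. The variant names, allele counts, and the figure of 500,000 tests are all hypothetical.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: (case_alt, case_ref, control_alt, control_ref)
variants = {
    "variant_A": (600, 1400, 400, 1600),
    "variant_B": (500, 1500, 495, 1505),
}

n_tests = 500_000              # assumed number of variants tested genome-wide
threshold = 0.05 / n_tests     # Bonferroni-corrected significance threshold

hits = []
for name, counts in variants.items():
    chi2, p_value = allelic_chi_square(*counts)
    if p_value < threshold:
        hits.append(name)

print("variants passing the corrected threshold:", hits)
```

The strict threshold matters: with hundreds of thousands of tests, an uncorrected cut-off of 0.05 would flag thousands of variants by chance alone. A variant that passes is still only associated with the trait, which is why follow-up with functional data is needed before claiming causality.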

Genomic epidemiology uses pathogen genomic data to determine the distribution and spread of an infectious disease in a specified population. The power of genomic epidemiology was clearly demonstrated during the COVID-19 pandemic. SARS-CoV-2 genome sequencing allowed the construction of phylogenetic trees, which show the evolutionary relationships between variants. By combining phylogenetic analysis and epidemiological variables, researchers could identify outbreak sources, monitor evolutionary changes, and assess the impact of these evolutionary changes on transmission, clinical severity, and vaccine efficacy.
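As a rough illustration of the first step behind such analyses, the sketch below counts the differences between aligned genome sequences. It is a deliberate simplification: real phylogenetic trees are built from full-length genomes with dedicated software and explicit models of evolution, and the short sequences and sample names here are invented.

```python
from itertools import combinations

def hamming(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))

# Hypothetical aligned fragments from four patient samples
samples = {
    "sample_1": "ACGTACGTAC",
    "sample_2": "ACGTACGTAT",
    "sample_3": "ACGAACGTAT",
    "sample_4": "TCGAACCTAT",
}

# Pairwise distances: fewer differences suggest more closely related viruses
distances = {
    (a, b): hamming(samples[a], samples[b])
    for a, b in combinations(sorted(samples), 2)
}

for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a} vs {b}: {d} differences")
```

Samples separated by only one or two differences are candidates for belonging to the same transmission chain, which is the kind of lead that can then be checked against epidemiological variables such as dates and locations.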

Genomic sequencing has allowed the development of new vaccine technologies and the identification of potential targets. In fact, two of the main COVID-19 vaccines in circulation, those from Pfizer-BioNTech and Moderna, relied on the sequenced SARS-CoV-2 genome to design mRNA encoding the viral spike (S) glycoprotein. The collaboration between genomics and epidemiology will no doubt continue to assist our understanding of the dynamics of diseases, leading to more effective control and prevention strategies.

As the field adapts to the changing landscape of diseases and the emergence of new technologies, connection-finding will remain the bedrock of epidemiological inquiry.

Article by Bethan Powell

Artwork by Mariadaria Ianni Ravn