Experience
Boston University
- Integrating diverse, large-scale biological datasets from a variety of experimental sources in different formats.
- Building mathematical models of biological processes, such as protein folding and alternative splicing, and evaluating them through comparisons with data.
- Developed a Bayesian network model of a high-throughput experimental method for detecting protein interactions, and used it to estimate experimental parameters such as false-positive and false-negative rates.
- Identified relationships between the structure of proteins and their evolutionary history, using robust statistical testing.
CERN
- Analyzed a petabyte-scale dataset of particle collisions, using thousands of CPUs in parallel.
- Worked with highly sophisticated models of particle collisions, improving them and quantifying the uncertainty on the model parameters through comparisons with experimental data.
- Developed statistical software in Python to conduct hypothesis tests between different particle physics models and calculate confidence intervals on model parameters, taking into account multiple sources of measurement uncertainty.
- Wrote high-performance C++ code that runs as part of the detector software.
- Trained and optimized a neural network to classify particle collisions.
Education
University College London
Awarded the UCL High Energy Physics Group Prize for outstanding research.
University of Warwick
Skills
Python and C++ development experience: Over 4 years of writing software to analyze data.
Statistics: Strong experience using advanced statistics, including Bayesian methods, hypothesis testing and machine learning. Responsible for the statistical methodology of a CERN publication.
Analysis on petabyte-scale datasets: Using large computing clusters.
Publication-standard data visualization: Produced figures published in reputable scientific journals.
Machine learning: Trained and optimized a neural network to classify different types of particle collisions.
Presentations: Ranging from weekly meetings to talks at 3 national conferences and 1 international conference, to audiences of up to 200 people.
Software: UNIX environments, shell scripting, R, SQL, Git, LaTeX.