Overview

Experience and Education

Senior Data Science Engineer, SimSpace

Principal Data Scientist, Geneia

Medical Science Liaison (MSL), Rheumatology, Bristol-Myers Squibb

Medical Science Liaison (MSL), Neurology, EMD Serono

University of Pennsylvania, Philadelphia, PA, Ph.D., Neuroscience


LinkedIn

GitHub

Publications

Thesis Lab


Professional Outlets

AI and Bias In Healthcare – a video discussion about social bias in artificial intelligence and how to address it

AI interpretability is especially critical in healthcare – a blog post about model interpretability

Model interpretability and healthcare – highlights from a podcast about data science, model interpretability, COVID-19, and healthcare


Personal Projects

State-Space Models: Learning the Kalman Filter – Different research fields may speak different mathematical languages. There’s nothing like rigorous software testing for accurate translation. Go here for the code.

Beyond Point Estimates – When we need to predict more than just a mean or a median, full posterior distributions from Bayesian models are often the way to go. But sometimes, that’s too computationally intensive and we need some shortcuts. Quantile regression is a handy alternative. For even more efficiency, we can use multi-task learning so that a single model produces all the quantiles we want. Go here for the code.

Weather and climate API – Using mock testing and FastAPI to query, create, and test web APIs. Go here for the code.

Pandas vs. Polars, Python vs. Rust: Who will win? – Benchmarks are nice, but how fast are our favorite data tools on realistic data workflows? Go here for the code.

Bayesian Updating with a Beta-Binomial Model: Basketball Edition – We start the season thinking our team is this good (or bad). But as the wins and losses pile up, how do we update our priors? Go here for the code.

Bayesian Updating with a Dirichlet-Multinomial Model: Visualizing More Outcomes – As we add outcomes to our model, the concepts stay the same but the dynamics grow more complex. Viewing animations of the model can help us develop intuitions about how it works. Go here for the code.

Investment Performance Metrics Dashboard – Plotly Dash app for tracking profit/loss and other investment performance metrics per transaction or over time. Go here for the code.

Monitoring Data Pipelines with Airflow and Tcl/Tk – Airflow is terrific for scheduling and monitoring data pipeline components. But we also want to monitor, in real time, what’s happening inside those components. Go here for the code.

Add Columns to Polars Dataframes Quickly – There are straightforward, slow ways to do things, and then there are faster ways. Know how to choose. Go here for the code.

Deep Reinforcement Learning and Rainbow – How does a computer learn to play video games?

Information Theory for Toddlers – A low-entropy bedtime story

SHAP Tutorial – How do we use Shapley values to interpret machine learning models? Go here for the code.

Case Study: How to Translate a Healthcare Problem into a Predictive Modeling Problem – How do we correctly select cases for our training data?

The Peanuts Project – Charlie Brown, Snoopy, Lucy, Linus . . . who was the most important character? Which of their relationships was the strongest? Indulge some nostalgia and hum some Guaraldi!

Classifying Medicine – How do patients experience conventional and alternative medicine differently? Yelp, random forests, ROC curves, and so much more!
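
For the Beta-Binomial basketball project listed above, the updating step can be sketched in a few lines. This is a minimal illustration, not the project's code: the `BetaBelief` class, the Beta(5, 5) starting prior, and the 7-3 record are assumptions made up for the example. It relies only on the standard conjugacy result that a Beta(alpha, beta) prior combined with binomial win/loss data gives a Beta(alpha + wins, beta + losses) posterior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a team's win probability, stored as Beta(alpha, beta)."""

    alpha: float  # prior pseudo-count of wins
    beta: float   # prior pseudo-count of losses

    def update(self, wins: int, losses: int) -> "BetaBelief":
        # Conjugate update: add observed wins and losses to the pseudo-counts.
        return BetaBelief(self.alpha + wins, self.beta + losses)

    @property
    def mean(self) -> float:
        # Expected win probability under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior: we start the season thinking the team is average.
prior = BetaBelief(alpha=5.0, beta=5.0)

# Hypothetical record after ten games: 7 wins, 3 losses.
posterior = prior.update(wins=7, losses=3)

print(f"prior mean:     {prior.mean:.2f}")
print(f"posterior mean: {posterior.mean:.2f}")
```

A larger alpha + beta in the prior means a stronger starting opinion, so the same record moves the estimate less.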


Recent posts

Recent posts, mostly links to interesting articles that I have been reading:

  • What I Read: data migrations
    https://yorickpeterse.com/articles/building-a-better-and-scalable-system-for-data-migrations Building a better and scalable system for data migrations, Yorick Peterse, October 24, 2024. “I’ve been thinking about what a better solution to data migrations might look like for a while,…”
  • What I Read: myths, randomisation
    https://www.bps.org.uk/psychologist/dispelling-myths-about-randomisation Dispelling myths about randomisation, 14 October 2024. “The first myth is that randomisation works because it balances confounders…. This leads to the second myth, which is that we should test…”
  • What I Read: Sampling, SQL
    https://blog.moertel.com/posts/2024-08-23-sampling-with-sql.html Sampling with SQL, Tom Moertel, August 23, 2024. “Sampling is one of the most powerful tools you can wield to extract meaning from large datasets…. If you know how to take…”
  • What I Read: Gaussians
    https://gestalt.ink/gaussians Understanding Gaussians “The Gaussian distribution, or normal distribution is a key subject in statistics, machine learning, physics, and pretty much any other field that deals with data and probability.”
  • What I Read: evaluation quicksand
    https://www.interconnects.ai/p/building-on-evaluation-quicksand Building on evaluation quicksand, Nathan Lambert, Oct 16, 2024. “In my article on “Big Tech’s LLM evals are just marketing,” I didn’t uncover the deeper reasons as to why can’t fully…”
  • What I Read: Mamba, State Space
    https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state A Visual Guide to Mamba and State Space Models, Maarten Grootendorst, Feb 19, 2024. “To further improve LLMs, new architectures are developed that might even outperform the Transformer architecture. One of…”
  • What I Read: Unit Disk Sampling
    https://towardsdatascience.com/unit-disk-uniform-sampling-91880f3740fa?gi=4b73b464a4d0 Unit Disk Uniform Sampling, Thomas Rouch, Sep 16, 2024. “Discover the optimal transformations to apply on the standard [0,1] uniform random generator for uniformly sampling a 2D disk”
  • What I Read: Multi Objective Optimisation
    https://blog.flipkart.tech/multi-objective-optimisation-in-suggestions-ranking-flipkart-49099b951eae?gi=04415d605535 Multi Objective Optimisation in Suggestions Ranking @ Flipkart, Pranjal Sanjanwala, Apr 19, 2024. “…we aim to provide a perfectly tailored set of suggestions for that user at that point in time.”
  • What I Read: cosine similarity
    https://tomhazledine.com/cosine-similarity-alternatives Alternatives to cosine similarity, Tom Hazledine, 9/20/24 8:00 PM. “Cosine similarity is the recommended way to compare vectors, but what other distance functions are there? And are any of them better?”
  • What I Read: Bounded Kernel Density Estimation
    https://towardsdatascience.com/bounded-kernel-density-estimation-2082dff3f47f Bounded Kernel Density Estimation, Thomas Rouch, Feb 28, 2024. “…when it comes to estimating continuous densities, people often resort to treating it as a mysterious black box. However, understanding this concept…”

Browse posts