Overview

Experience and Education

Senior Data Science Engineer, SimSpace

Principal Data Scientist, Geneia

Medical Science Liaison (MSL), Rheumatology, Bristol-Myers Squibb

Medical Science Liaison (MSL), Neurology, EMD Serono

University of Pennsylvania, Philadelphia, PA, Ph.D., Neuroscience


LinkedIn

Github

Publications

Thesis Lab


Professional Outlets

AI and Bias In Healthcare – a video discussion about social bias in artificial intelligence and how to address it

AI interpretability is especially critical in healthcare – a blog post about model interpretability

Model interpretability and healthcare – highlights from a podcast about data science, model interpretability, COVID-19, and healthcare


Personal Projects

State-Space Models: Learning the Kalman Filter – Different research fields may speak different mathematical languages. There’s nothing like rigorous software testing for accurate translation. Go here for the code.
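As a companion sketch (not the project's code, and with illustrative noise settings), the scalar special case that a state-space model generalizes boils down to two short steps per measurement, a predict and a gain-weighted update:

```python
def kalman_step(x, p, z, q=1e-3, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x, p -- prior state estimate and its variance
    z    -- new measurement
    q, r -- process and measurement noise variances (illustrative values)
    """
    p = p + q                # predict: constant-state model, variance grows
    k = p / (p + r)          # Kalman gain, between 0 and 1
    x = x + k * (z - x)      # update: blend prediction and measurement
    p = (1 - k) * p          # uncertainty shrinks after the update
    return x, p

# Noisy readings of a true value of 5.0, with a deliberately bad initial guess.
x, p = 0.0, 1.0
for z in [5.3, 4.8, 5.1, 4.9, 5.2, 5.0]:
    x, p = kalman_step(x, p, z)
```

After six measurements, the estimate has moved from 0.0 to near 5.0 and the variance has collapsed, which is exactly the behavior a test suite can pin down.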

Beyond Point Estimates – When we need to predict more than just a mean or a median, full posterior distributions from Bayesian models are often the way to go. But sometimes, that’s too computationally intensive and we need some shortcuts. Quantile regression is a handy alternative. For even more efficiency, we can use multi-task learning so that a single model produces all the quantiles we want. Go here for the code.
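A minimal sketch of the core idea, with all numbers invented for illustration: the pinball loss penalizes under- and over-prediction asymmetrically, and a "multi-task" quantile model simply emits one prediction per quantile, each scored by its own loss:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: under-prediction costs q, over-prediction 1 - q."""
    err = y_true - y_pred
    return float(np.mean(np.maximum(q * err, (q - 1) * err)))

# A "multi-task" head is one prediction per quantile; each row below plays
# the role of one output head, scored against its own pinball loss.
quantiles = [0.1, 0.5, 0.9]
y_true = np.array([1.0, 2.0, 3.0])
preds = np.array([
    [0.5, 1.0, 1.8],   # q = 0.1 head: deliberately low
    [1.0, 2.0, 3.0],   # q = 0.5 head: exactly the targets
    [1.5, 3.0, 4.2],   # q = 0.9 head: deliberately high
])
losses = [pinball_loss(y_true, preds[i], q) for i, q in enumerate(quantiles)]
```

The asymmetry is what pushes each head toward its own quantile: the low head is rewarded for undershooting, the high head for overshooting.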

Weather and climate API – Using mock testing and FastAPI to query, create, and test web APIs. Go here for the code.
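The project itself uses FastAPI; as a stdlib-only sketch of the mock-testing idea (the endpoint URL and JSON shape below are hypothetical), the network call is patched out so the "API query" runs with no network at all:

```python
import json
from unittest.mock import MagicMock, patch
from urllib import request

def current_temp(city):
    """Query a weather endpoint (placeholder URL) for a city's temperature."""
    url = f"https://example.com/weather?city={city}"
    with request.urlopen(url) as resp:
        return json.load(resp)["temp_c"]

# Mock testing: patch urlopen so the "API call" needs no network at all.
fake = MagicMock()
fake.__enter__.return_value.read.return_value = b'{"temp_c": 21.5}'
with patch("urllib.request.urlopen", return_value=fake) as mocked:
    temp = current_temp("Boston")
```

The same pattern scales up to FastAPI's `TestClient`: the test controls exactly what the "server" returns, so assertions are deterministic.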

Pandas vs. Polars, Python vs. Rust: Who will win? – Benchmarks are nice, but how fast are our favorite data tools on realistic data workflows? Go here for the code.

Bayesian Updating with a Beta-Binomial Model: Basketball Edition – We start the season thinking our team is this good (or bad). But as the wins and losses pile up, how do we update our priors? Go here for the code.
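The conjugate update itself is one line; this toy sketch (prior and win-loss record invented for illustration) shows a preseason hunch being pulled down by a losing record:

```python
def update_beta(alpha, beta, wins, losses):
    """Beta-binomial conjugate update: just add observed counts to the prior."""
    return alpha + wins, beta + losses

# Preseason prior Beta(6, 4): we think the team is a bit better than .500.
alpha, beta = 6, 4
# Then the season opens 10-20, and the data drags the estimate down.
alpha, beta = update_beta(alpha, beta, wins=10, losses=20)
posterior_mean = alpha / (alpha + beta)   # 16 / 40 = 0.4
```

The prior acts like 10 pseudo-games already on the books, which is why 30 real games move the estimate most of the way toward the observed .333 record without reaching it.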

Bayesian Updating with a Dirichlet-Multinomial Model: Visualizing More Outcomes – As we add outcomes to our model, the concepts stay the same but the dynamics grow more complex. Viewing animations of the model can help us develop intuitions about how it works. Go here for the code.
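The Dirichlet-multinomial update is the same bookkeeping with more outcomes; in this sketch (outcome labels and counts invented for illustration), per-outcome counts are added to a uniform prior:

```python
def update_dirichlet(alphas, counts):
    """Dirichlet-multinomial conjugate update: add per-outcome counts."""
    return [a + c for a, c in zip(alphas, counts)]

# Three outcomes now (win / regulation loss / overtime loss), uniform prior.
alphas = [1.0, 1.0, 1.0]
alphas = update_dirichlet(alphas, counts=[10, 20, 5])
total = sum(alphas)
posterior_means = [a / total for a in alphas]   # one probability per outcome
```

The posterior means always sum to one, but watching how mass shifts among three or more outcomes is where the animations earn their keep.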

Investment Performance Metrics Dashboard – Plotly Dash app for tracking profit/loss and other investment performance per transaction or over time. Go here for the code.

Monitoring Data Pipelines with Airflow and Tcl/Tk – Airflow is terrific for scheduling and monitoring data pipeline components. But we also want to monitor in real-time what’s happening inside those components. Go here for the code.

Add Columns to Polars Dataframes Quickly – There are straightforward, slow ways to do things, and then there are faster ways. Know how to choose. Go here for the code.

Deep Reinforcement Learning and Rainbow – How does a computer learn to play video games?

Information Theory for Toddlers – A low-entropy bedtime story

SHAP Tutorial – How do we use Shapley values to interpret machine learning models? Go here for the code.
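The tutorial uses the `shap` library; this from-scratch toy instead computes exact Shapley values on a known linear function (chosen so the attributions can be checked by hand), by averaging each feature's marginal contribution over all feature orderings:

```python
from itertools import permutations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering of the features (fine for a handful of features)."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)      # start from the reference input
        prev = f(current)
        for i in order:
            current[i] = x[i]         # reveal feature i
            now = f(current)
            phi[i] += now - prev      # marginal contribution of feature i
            prev = now
    return [p / factorial(n) for p in phi]

# Toy model: a known linear function, so the attributions can be verified.
f = lambda v: 2 * v[0] + 3 * v[1]
phi = shapley_values(f, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

For a linear model the Shapley value of each feature is just its weight times its deviation from the baseline, and the attributions sum to the model's output minus the baseline output, which is the "efficiency" property `shap` relies on.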

Case Study: How to Translate a Healthcare Problem into a Predictive Modeling Problem – How do we correctly select cases for our training data?

The Peanuts Project – Charlie Brown, Snoopy, Lucy, Linus . . . who was the most important character? Which of their relationships was the strongest? Indulge some nostalgia and hum some Guaraldi!

Classifying Medicine – How do patients experience conventional and alternative medicine differently? Yelp, random forests, ROC curves, and so much more!


Recent posts

Recent posts, mostly links to interesting articles that I have been reading:

  • What I Read: S3, Age
    “S3 Is Showing Its Age,” Chris Riccomini, May 22, 2024
    https://materializedview.io/p/s3-is-showing-its-age
    “There’s no denying that S3 is a feat of engineering…. But S3’s feature set is falling behind its competitors.”
  • What I Read: history, tidyverse
    “A personal history of the tidyverse,” Hadley Wickham, January 28, 2025
    https://hadley.github.io/25-tidyverse-history
    “I’ll explore the defining features that make the tidyverse unique, highlight the contributions that I’m most proud of, and examine…”
  • What I Read: AI-Ready Data
    “3 Steps to AI-Ready Data,” Barr Moses, December 12, 2024
    https://www.montecarlodata.com/blog-3-steps-to-ai-ready-data
    “But data leaders understand something that’s often lost on most C-Suites: GenAI products are only as valuable as the first-party data…”
  • What I Read: Short, Nvidia
    “The Short Case for Nvidia Stock,” Jeffrey Emanuel, January 25, 2025
    https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
    “…NVIDIA faces an unprecedented convergence of competitive threats that make its premium valuation increasingly difficult to justify… The company’s supposed…”
  • What I Read: VAE
    “What the F*** is a VAE?,” Rehan Sheikh, January 23, 2025
    https://www.rehansheikh.com/blog/vae
    “A disentangled variational autoencoder aims for each latent dimension… to correspond to a single factor of variation in your dataset.”
  • What I Read: commonality analysis
    “Making sense of commonality analysis,” Ladislas Nalborczyk, January 7, 2025
    https://lnalborczyk.github.io/blog/2025-01-07-commonality/index.html
    “Commonality analysis provides a valuable tool for addressing such questions by partitioning the explained variance in multiple regression into distinct components.”
  • What I Read: Single-Node Processing
    “The Rise of Single-Node Processing: Challenging the Distributed-First Mindset,” Alireza Sadeghi, January 6, 2025
    https://practicaldataengineering.substack.com/p/the-rise-of-single-node-processing
    “As we move away from the ‘big data’ era’s distributed-first mindset, many businesses are discovering that single-node…”
  • What I Read: memorization, novelty
    “How memorization happens: Novelty,” December 9, 2024
    https://blog.kjamistan.com/how-memorization-happens-novelty.html
    “…repeated text and images incentivize training data memorization, but that’s not the only training data that machine learning models memorize. Let’s take a…”
  • What I Read: Adaptive LLMs
    “Transformer²: Self-Adaptive LLMs,” sakana.ai, January 15, 2025
    https://sakana.ai/transformer-squared
    “Imagine a machine learning system that could adjust its own weights dynamically to thrive in unfamiliar settings, essentially illustrating a system that evolves as…”
  • What I Read: Tensor Dimensions, Transformers
    “Mastering Tensor Dimensions in Transformers,” Hafedh Hichri, January 12, 2025
    https://huggingface.co/blog/not-lain/tensor-dims
    “Most generative AI models are built using a decoder-only architecture. In this blog post, we’ll explore a simple text generation model…”

Browse posts