Overview

Experience and Education

Senior Data Science Engineer, SimSpace

Principal Data Scientist, Geneia

Medical Science Liaison (MSL), Rheumatology, Bristol-Myers Squibb

Medical Science Liaison (MSL), Neurology, EMD Serono

University of Pennsylvania, Philadelphia, PA, Ph.D., Neuroscience


LinkedIn

GitHub

Publications

Thesis Lab


Professional Outlets

AI and Bias In Healthcare – a video discussion about social bias in artificial intelligence and how to address it

AI interpretability is especially critical in healthcare – a blog post about model interpretability

Model interpretability and healthcare – highlights from a podcast about data science, model interpretability, COVID-19, and healthcare


Personal Projects

State-Space Models: Learning the Kalman Filter – Different research fields may speak different mathematical languages. There’s nothing like rigorous software testing for accurate translation. Go here for the code.
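As an illustration of that kind of cross-check, here is a minimal Python sketch. It assumes a scalar local-level (random-walk) model rather than the general matrix form, and the function names are hypothetical, not taken from the linked code. With no process noise (q = 0), the filter should reproduce the conjugate Gaussian posterior for a constant mean, which makes a handy unit test when translating between notations.

```python
def kalman_step(mean, var, y, q, r):
    """One predict/update cycle for x_t = x_{t-1} + w_t, y_t = x_t + v_t,
    with Var(w_t) = q and Var(v_t) = r (scalar local-level model)."""
    # Predict: a random walk keeps the mean and adds process noise to the variance.
    pred_mean, pred_var = mean, var + q
    # Update: the Kalman gain weighs the new observation against the prediction.
    gain = pred_var / (pred_var + r)
    return pred_mean + gain * (y - pred_mean), (1.0 - gain) * pred_var


def kalman_filter(ys, mean0, var0, q, r):
    """Run the filter over a sequence, returning filtered means and variances."""
    means, variances = [], []
    mean, var = mean0, var0
    for y in ys:
        mean, var = kalman_step(mean, var, y, q, r)
        means.append(mean)
        variances.append(var)
    return means, variances


def static_posterior(ys, mean0, var0, r):
    """Conjugate Gaussian posterior for a constant state: the q = 0 special case."""
    precision = 1.0 / var0 + len(ys) / r
    return (mean0 / var0 + sum(ys) / r) / precision, 1.0 / precision


if __name__ == "__main__":
    ys = [1.0, 2.0, 3.0]  # hypothetical observations
    means, variances = kalman_filter(ys, mean0=0.0, var0=1.0, q=0.0, r=1.0)
    # The two "languages" should agree when there is no process noise.
    print((means[-1], variances[-1]), static_posterior(ys, 0.0, 1.0, 1.0))
```

Writing the filter recursively and the posterior in closed form, then asserting they match, is one way to catch a sign or indexing slip made while moving between fields' conventions.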

Beyond Point Estimates – When we need to predict more than just a mean or a median, full posterior distributions from Bayesian models are often the way to go. But sometimes, that’s too computationally intensive and we need some shortcuts. Quantile regression is a handy alternative. For even more efficiency, we can use multi-task learning so that a single model produces all the quantiles we want. Go here for the code.
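The core of quantile regression is the pinball (quantile) loss. The NumPy sketch below is an illustration under assumptions, not the project's code: `pinball_loss` and `multi_quantile_loss` are hypothetical names, and a grid search over constant predictions stands in for a real model. It shows that minimizing the loss at level q lands on the empirical q-th quantile, and that a multi-task model can be trained on the sum of per-quantile losses, with one output column per quantile.

```python
import numpy as np


def pinball_loss(y, pred, q):
    """Mean pinball loss at quantile level q (0 < q < 1)."""
    diff = y - pred
    # Under-predictions (diff > 0) cost q per unit; over-predictions cost (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


def multi_quantile_loss(y, preds, quantiles):
    """Multi-task loss: preds has one column per quantile level in `quantiles`."""
    return sum(pinball_loss(y, preds[:, j], q) for j, q in enumerate(quantiles))


if __name__ == "__main__":
    y = np.arange(1, 101, dtype=float)  # hypothetical targets 1..100
    candidates = np.arange(1, 101, dtype=float)
    losses = [pinball_loss(y, np.full_like(y, c), 0.9) for c in candidates]
    best = candidates[int(np.argmin(losses))]
    # The loss is flat between the 90th and 91st order statistics, so best is 90 or 91.
    print(best, np.quantile(y, 0.9))
```

Because the per-quantile losses simply add, a single network with several output heads can learn all the requested quantiles at once instead of fitting one model per quantile.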

Weather and climate API – Using mock testing and FastAPI to query, create, and test web APIs. Go here for the code.

Pandas vs. Polars, Python vs. Rust: Who will win? – Benchmarks are nice, but how fast are our favorite data tools on realistic data workflows? Go here for the code.

Bayesian Updating with a Beta-Binomial Model: Basketball Edition – We start the season thinking our team is this good (or bad). But as the wins and losses pile up, how do we update our priors? Go here for the code.
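The update itself is short because the Beta prior is conjugate to the binomial likelihood: add wins to α and losses to β. A minimal Python sketch, with a hypothetical `BetaBelief` class and made-up numbers (a prior worth ten games at a 60% win rate), not the project's code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    alpha: float  # prior "pseudo-wins"
    beta: float   # prior "pseudo-losses"

    def update(self, wins: int, losses: int) -> "BetaBelief":
        # Conjugacy: Beta prior x binomial likelihood -> Beta posterior.
        return BetaBelief(self.alpha + wins, self.beta + losses)

    @property
    def mean(self) -> float:
        # Expected win probability under the current belief.
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    prior = BetaBelief(6, 4)  # hypothetical: expect ~60% wins, weighted like 10 games
    season = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # hypothetical results, 1 = win
    belief = prior
    for game in season:  # update game by game as results come in
        belief = belief.update(game, 1 - game)
    print(belief, round(belief.mean, 3))
```

Updating game by game ends at the same posterior as one batch update with the season totals, since only the counts of wins and losses matter.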

Bayesian Updating with a Dirichlet-Multinomial Model: Visualizing More Outcomes – As we add outcomes to our model, the concepts stay the same but the dynamics grow more complex. Viewing animations of the model can help us develop intuitions about how it works. Go here for the code.
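The same conjugate logic extends to K outcomes: add each outcome's count to its Dirichlet concentration parameter. A minimal NumPy sketch under assumptions (three hypothetical outcomes and made-up data; the function names are illustrative, not from the project). `posterior_path` produces the sequence of posterior means that an animation could show frame by frame.

```python
import numpy as np


def dirichlet_update(alpha, counts):
    # Conjugacy: Dirichlet prior + multinomial counts -> Dirichlet posterior.
    return np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)


def dirichlet_mean(alpha):
    # Mean probability of each outcome under a Dirichlet(alpha).
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


def posterior_path(alpha, outcomes):
    """Posterior means after each observed outcome index (one frame per step)."""
    alpha = np.asarray(alpha, dtype=float).copy()
    frames = [dirichlet_mean(alpha)]
    for k in outcomes:
        alpha[k] += 1.0
        frames.append(dirichlet_mean(alpha))
    return np.vstack(frames)


if __name__ == "__main__":
    prior = [2.0, 2.0, 1.0]               # hypothetical prior over 3 outcomes
    outcomes = [0, 1, 0, 0, 1, 0, 0, 1]   # hypothetical observed outcome indices
    frames = posterior_path(prior, outcomes)
    batch = dirichlet_update(prior, np.bincount(outcomes, minlength=3))
    print(frames[-1], dirichlet_mean(batch))
```

With two outcomes this reduces to the Beta-Binomial case; with more, each new observation pulls probability toward its own outcome and away from all the others at once.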

Investment Performance Metrics Dashboard – Plotly Dash app for tracking profit/loss and other investment performance metrics, per transaction or over time. Go here for the code.

Monitoring Data Pipelines with Airflow and Tcl/Tk – Airflow is terrific for scheduling and monitoring data pipeline components. But we also want to monitor what’s happening inside those components in real time. Go here for the code.

Add Columns to Polars DataFrames Quickly – There are straightforward, slow ways to do things, and then there are faster ways. Know how to choose. Go here for the code.

Deep Reinforcement Learning and Rainbow – How does a computer learn to play video games?

Information Theory for Toddlers – A low-entropy bedtime story

SHAP Tutorial – How do we use Shapley values to interpret machine learning models? Go here for the code.
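Shapley values credit each feature with its average marginal contribution across all coalitions of the other features. The brute-force Python sketch below is an illustration, not the SHAP library's API: it fills "missing" features from a single baseline point (one of several possible conventions) and enumerates every coalition, so its cost grows exponentially with the number of features. The SHAP library relies on faster estimators such as Kernel SHAP and Tree SHAP for real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, filling absent features from `baseline`."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take x's values; the rest take the baseline's.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical model with an interaction between features 0 and 1.
    model = lambda z: 2.0 * z[0] + z[0] * z[1] - z[2]
    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = shapley_values(model, x, base)
    # Efficiency: the attributions sum to f(x) - f(baseline).
    print(phi, sum(phi), model(x) - model(base))
```

For a linear model the result collapses to each weight times the feature's distance from the baseline, which makes a useful sanity check before trusting attributions on a nonlinear model.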

Case Study: How to Translate a Healthcare Problem into a Predictive Modeling Problem – How do we correctly select cases for our training data?

The Peanuts Project – Charlie Brown, Snoopy, Lucy, Linus . . . who was the most important character? Which of their relationships was the strongest? Indulge some nostalgia and hum some Guaraldi!

Classifying Medicine – How do patients experience conventional and alternative medicine differently? Yelp, random forests, ROC curves, and so much more!


Recent posts

Recent posts, mostly links to interesting articles that I have been reading:

  • What I Read: composable data platforms
    Towards composable data platforms · Jack Vanlightly · February 17, 2025
    https://jack-vanlightly.com/blog/2025/2/17/towards-composable-data-platforms
    “The key is that it allows for the separation of data from metadata and shared storage from compute. Through metadata, one table…”
  • What I Read: Model, Product
    The Model is the Product · Alexander Doria
    https://vintagedata.org/blog/posts/model-is-the-product
    “There were a lot of speculation over the past years about what the next cycle of AI development could be. Agents? Reasoners? Actual multimodality?…”
  • What I Read: Model Calibration
    Understanding Model Calibration: A Gentle Introduction & Visual Exploration · Maja Pavlovic · Feb 11, 2025
    “Calibration makes sure that a model’s estimated probabilities match real-world outcomes… if a weather forecasting model predicts…”
  • What I Read: Gamma Hurdle
    The Gamma Hurdle Distribution · Jeff Allard · Feb 7, 2025
    “Modeling highly skewed continuous values in marketing experiments”
  • What I Read: BAML
    BAML is like building blocks for AI engineers · Prashanth Rao · 2025-02-10
    https://thedataquarry.com/posts/baml-is-building-blocks-for-ai-engineers
    “I’ll explain more about how BAML, a domain-specific language for helping LLMs generate better structured outputs, provides AI engineers the…”
  • What I Read: BAML, agentic
    Why I’m excited about BAML and the future of agentic workflows · Prashanth Rao · 2025-01-29
    https://thedataquarry.com/posts/baml-and-future-agentic-workflows
    “Although there have been new agentic and AI workflow orchestration frameworks coming out seemingly every month lately…”
  • What I Read: RL, PPO, GRPO
    A vision researcher’s guide to some RL stuff: PPO & GRPO · Yuge (Jimmy) Shi · January 31, 2025
    https://yugeten.github.io/posts/2025/01/ppogrpo
    “This is a deep dive into Proximal Policy Optimization (PPO), which is one of…”
  • What I Read: group relative policy optimization
    group relative policy optimization (GRPO) · Apoorv Nandan · Jan 31, 2025
    https://superb-makemake-3a4.notion.site/group-relative-policy-optimization-GRPO-18c41736f0fd806eb39dc35031758885
    “GRPO became popular primarily due to the success of deepseek r1, which used this algorithm to train reasoning capabilities into their…”
  • What I Read: Reasoning LLMs
    Understanding Reasoning LLMs · Sebastian Raschka, PhD · Feb 05, 2025
    https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
    “This article describes the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities.”
  • What I Read: individual risk
    On individual risk · Philip Dawid · 2017
    https://link.springer.com/content/pdf/10.1007/s11229-015-0953-4.pdf
    “We distinguish between “groupist” and “individualist” understandings of probability, and explore both “group to individual” and “individual to group” approaches to characterising individual risk.”
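The calibration idea in the reading list can be made concrete with the usual binned check: group predictions by predicted probability, compare each bin's average prediction with the observed frequency of positive outcomes, and summarize the gaps as expected calibration error (ECE). A minimal NumPy sketch for binary outcomes, assuming equal-width bins; the function names and data are hypothetical, not taken from the linked article.

```python
import numpy as np


def reliability_table(probs, outcomes, n_bins=10):
    """Per-bin (count, mean predicted probability, observed positive rate)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Equal-width bins on [0, 1]; clip so that p == 1.0 falls in the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    table = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            table.append((int(mask.sum()), probs[mask].mean(), outcomes[mask].mean()))
    return table


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Count-weighted average gap between predicted probability and observed frequency."""
    n = len(probs)
    return sum(count / n * abs(conf - freq)
               for count, conf, freq in reliability_table(probs, outcomes, n_bins))


if __name__ == "__main__":
    # Hypothetical forecasts: "90% chance of rain" on 10 days, but it rained on only 5.
    probs = [0.9] * 10
    rain = [1] * 5 + [0] * 5
    print(expected_calibration_error(probs, rain))  # roughly 0.4: overconfident forecasts
```

A well-calibrated model keeps each bin's average prediction close to its observed frequency, which drives ECE toward zero; plotting the table gives the familiar reliability diagram.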
