Experience and Education
Senior Data Science Engineer, SimSpace
Principal Data Scientist, Geneia
Medical Science Liaison (MSL), Rheumatology, Bristol-Myers Squibb
Medical Science Liaison (MSL), Neurology, EMD Serono
University of Pennsylvania, Philadelphia, PA, Ph.D., Neuroscience
Professional Outlets
AI and Bias In Healthcare – a video discussion about social bias in artificial intelligence and how to address it
AI interpretability is especially critical in healthcare – a blog post about model interpretability
Model interpretability and healthcare – highlights from a podcast about data science, model interpretability, COVID-19, and healthcare
Personal Projects
State-Space Models: Learning the Kalman Filter – Different research fields may speak different mathematical languages. There’s nothing like rigorous software testing for accurate translation. Go here for the code.
Beyond Point Estimates – When we need to predict more than just a mean or a median, full posterior distributions from Bayesian models are often the way to go. But sometimes, that’s too computationally intensive and we need some shortcuts. Quantile regression is a handy alternative. For even more efficiency, we can use multi-task learning so that a single model produces all the quantiles we want. Go here for the code.
Weather and climate API – Using mock testing and FastAPI to query, create, and test web APIs. Go here for the code.
Pandas vs. Polars, Python vs. Rust: Who will win? – Benchmarks are nice, but how fast are our favorite data tools on realistic data workflows? Go here for the code.
Bayesian Updating with a Beta-Binomial Model: Basketball Edition – We start the season thinking our team is this good (or bad). But as the wins and losses pile up, how do we update our priors? Go here for the code.
Bayesian Updating with a Dirichlet-Multinomial Model: Visualizing More Outcomes – As we add outcomes to our model, the concepts stay the same but the dynamics grow more complex. Viewing animations of the model can help us develop intuitions about how it works. Go here for the code.
Investment Performance Metrics Dashboard – Plotly Dash app for tracking profit/loss and other investment performance metrics per transaction or over time. Go here for the code.
Monitoring Data Pipelines with Airflow and Tcl/Tk – Airflow is terrific for scheduling and monitoring data pipeline components. But we also want to monitor in real time what’s happening inside those components. Go here for the code.
Add Columns to Polars DataFrames Quickly – There are straightforward, slow ways to do things, and then there are faster ways. Know how to choose. Go here for the code.
Deep Reinforcement Learning and Rainbow – How does a computer learn to play video games?
Information Theory for Toddlers – A low-entropy bedtime story
SHAP Tutorial – How do we use Shapley values to interpret machine learning models? Go here for the code.
Case Study: How to Translate a Healthcare Problem into a Predictive Modeling Problem – How do we correctly select cases for our training data?
The Peanuts Project – Charlie Brown, Snoopy, Lucy, Linus . . . who was the most important character? Which of their relationships was the strongest? Indulge some nostalgia and hum some Guaraldi!
Classifying Medicine – How do patients experience conventional and alternative medicine differently? Yelp, random forests, ROC curves, and so much more!
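As a quick illustration of the Beta-Binomial updating idea mentioned in the basketball project above, here is a minimal sketch. It is not the code from that project; the class name `BetaPrior` and the prior and win/loss numbers are hypothetical, chosen only to show how a conjugate update works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about a team's win probability (hypothetical helper)."""

    alpha: float
    beta: float

    def update(self, wins: int, losses: int) -> "BetaPrior":
        # Conjugacy: a Beta prior combined with a binomial likelihood
        # yields a Beta posterior with the counts added to the parameters.
        return BetaPrior(self.alpha + wins, self.beta + losses)

    @property
    def mean(self) -> float:
        # Expected win probability under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Assumed starting belief: roughly a .500 team, worth about 10 games of evidence.
prior = BetaPrior(alpha=5.0, beta=5.0)

# Assumed early-season record: 8 wins, 2 losses.
posterior = prior.update(wins=8, losses=2)

print(f"prior mean:     {prior.mean:.2f}")
print(f"posterior mean: {posterior.mean:.2f}")
```

The larger `alpha + beta` is at the start, the more games it takes to move the estimate, which is one way to encode how strongly we hold the preseason opinion.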
Recent posts
Mostly links to interesting articles that I have been reading:
- What I Read: Classifying pdfs: https://snats.xyz/pages/articles/classifying_a_bunch_of_pdfs.html “Classifying all of the pdfs on the internet”, Santiago Pedroza, 2024-08-18. “How would you classify all the pdfs in the internet? Well, that is what I tried doing this time.”
- What I Read: Tool Retrieval, RAG
- What I Read: LLM Pre-training Post-training
- What I Read: Difference, Statements and Expressions
- What I Read: Open-endedness, Agentic AI
- What I Read: Turing Test, intelligence
- What I Read: Regularization, polynomial bases
- What I Read: Contextual Bandit, LinUCB: https://truetheta.io/concepts/reinforcement-learning/lin-ucb “A Reliable Contextual Bandit Algorithm: LinUCB”, DJ Rich, August 6, 2024. “A user visits a news website. Which articles should they be shown?”
- What I Read: Big Data is Dead
- What I Read: Visual Guide, Quantization
Browse posts
- November 2024
- October 2024
- September 2024
- August 2024
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- June 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- September 2022
- August 2022
- July 2022
- June 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
- October 2021
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- August 2020