Andrew Fairless, Ph.D.

Data, Science, and Tinkering

What I Read: Optimizing LLM in production

By Andrew Fairless on November 6, 2023 | October 5, 2023

https://huggingface.co/blog/optimize-llm

Optimizing your LLM in production
September 15, 2023
Patrick von Platen

“…efficient LLM deployment…. pros and cons of adopting lower precision, provide a comprehensive exploration of the latest attention algorithms, and discuss improved LLM architectures.”

Categories: What I Learn

Tags: attention, chatbot, embedding, large language model, machine learning, neural network, quantization, transformer

Copyright © 2025 Andrew Fairless, Ph.D. All Rights Reserved.
