https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/
Adversarial Attacks on LLMs
Lilian Weng
October 25, 2023
“Adversarial attacks are inputs that trigger the model to output something undesired.”