https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/
Adversarial Attacks on LLMs
Lilian Weng
October 25, 2023
“Adversarial attacks are inputs that trigger the model to output something undesired.”