GPU – Andrew Fairless, Ph.D.

What I Read: Short, Nvidia

By Andrew Fairless on May 8, 2025; February 2, 2025

https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
The Short Case for Nvidia Stock
Jeffrey Emanuel
January 25, 2025
“…NVIDIA faces an unprecedented convergence of competitive threats that make its premium valuation increasingly difficult to justify… The company’s supposed…”

What I Read: optimizing softmax

By Andrew Fairless on April 16, 2025; January 11, 2025

https://maharshi.bearblog.dev/optimizing-softmax-cuda
Learning CUDA by optimizing softmax: A worklog
Maharshi Pandya
04 Jan, 2025
“Optimizing softmax, especially in the context of GPU programming with CUDA, presents many opportunities for learning.”

What I Read: GPU Computing

By Andrew Fairless on March 4, 2025; November 16, 2024

https://blog.codingconfessions.com/p/gpu-computing
What Every Developer Should Know About GPU Computing
Abhinav Upadhyay
Oct 18, 2023
“GPUs have become incredibly important because of their pervasive use in deep learning. Today, it is essential for…”

What I Read: Shapes, Matrix Multiplications

By Andrew Fairless on February 27, 2025; November 16, 2024

https://www.thonking.ai/p/what-shapes-do-matrix-multiplications
What Shapes Do Matrix Multiplications Like?
Horace He
Apr 01, 2024
“It has become tribal knowledge that the particular shapes chosen for matmuls has a surprisingly large effect on their performance.”

What I Read: Transformers Inference Optimization

By Andrew Fairless on January 27, 2025; October 19, 2024

https://astralord.github.io/posts/transformer-inference-optimization-toolset
Transformers Inference Optimization Toolset
Aleksandr Samarin
Oct 1, 2024
“Large Language Models are pushing the boundaries of artificial intelligence, but their immense size poses significant computational challenges. As these models grow,…”

What I Read: bare metal to 70B

By Andrew Fairless on September 25, 2024; July 8, 2024

https://imbue.com/research/70b-infrastructure
From bare metal to a 70B model: infrastructure set-up and scripts
The Imbue Team
June 25, 2024
“…we trained a 70B parameter model from scratch on our own infrastructure that outperformed…”

What I Read: Ring Attention

By Andrew Fairless on July 8, 2024; April 23, 2024

https://coconut-mode.com/posts/ring-attention
Ring Attention Explained
Kilian Haefeli, Simon Zirui Guo, Bonnie Li
10 Apr 2024
“Context length in Large Language Models has expanded rapidly…. What if we could use multiple devices to…”

What I Read: Scaling ChatGPT, Engineering Challenges

By Andrew Fairless on April 18, 2024; March 1, 2024

https://newsletter.pragmaticengineer.com/p/scaling-chatgpt
Scaling ChatGPT: Five Real-World Engineering Challenges
Gergely Orosz
Feb 20, 2024
“Just one year after its launch, ChatGPT had more than 100M weekly users. In order to meet this explosive demand,…”

What I Read: misleading GPU, CPU benchmarks

By Andrew Fairless on March 21, 2024; February 5, 2024

https://pythonspeed.com/articles/gpu-vs-cpu/
Beware of misleading GPU vs CPU benchmarks
Itamar Turner-Trauring
Last updated 17 Jan 2024, originally created 17 Jan 2024
“Unfortunately, while those speed-ups are impressive, they are also misleading. GPU-based libraries…”

Tag: GPU