Deep Dive: Optimizing LLM inference
