Accelerating LLM Inference with vLLM (and SGLang) - Ion Stoica