Similar Tracks
- Efficient Inference on MI300X: Our Journey at Microsoft | Rajat Monga (CVP AI Frameworks, Microsoft) | AMD Developer Central
- Intermediate English Practice | Improve Your Listening & Speaking | Learn English With Podcast | The English Pod Community
- Exploring the Latency/Throughput & Cost Space for LLM Inference | Timothée Lacroix (CTO, Mistral) | MLOps.community
- Mastering LLM Inference Optimization From Theory to Cost Effective Deployment | Mark Moyou | AI Engineer