Optimizing vLLM Performance through Quantization | Ray Summit 2024
