vLLM

High-throughput, memory-efficient inference engine for LLMs

About

vLLM is an inference and serving engine for large language models (LLMs). It uses PagedAttention to manage the attention key-value cache in paged blocks, reducing memory waste and enabling high-throughput generation. It supports a wide range of open-source models on hardware platforms including NVIDIA GPUs, AMD GPUs, and Apple Silicon.
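
As a quick illustration, the snippet below runs offline batched generation through vLLM's Python entry point. It is a minimal sketch in the style of the library's quickstart, using the `LLM` and `SamplingParams` classes; the model name is just an example, and any supported Hugging Face model ID can be substituted.

```python
# Minimal sketch of offline batched inference with vLLM's Python API.
# The model ID below is illustrative; swap in any supported model.
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "Explain PagedAttention in one sentence:",
]

# Sampling settings for generation: temperature, nucleus sampling, length cap.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loading the model allocates the paged KV-cache blocks that PagedAttention manages.
llm = LLM(model="facebook/opt-125m")

# generate() batches and schedules all prompts together for high throughput.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For serving rather than offline use, vLLM also ships an OpenAI-compatible HTTP server (e.g., `vllm serve <model>` in recent releases), which exposes the same engine over a REST API.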