About

Efficient LLM inference in C/C++, with support for CPU execution and Metal and CUDA acceleration.