llama.cpp
LLM inference library in C/C++ with CPU, Metal, CUDA, and OpenVINO support
About
llama.cpp is a C/C++ library for running large language model inference on local hardware with minimal setup. It supports multiple backends, including CPU, Metal, CUDA, and OpenVINO, for hardware-accelerated inference.
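
Below is a minimal sketch of the library's C API in use: load a GGUF model, tokenize a prompt, and greedily generate a few tokens. It is modeled on the upstream `simple` example; the function names follow a recent revision of `llama.h` and may differ in older versions, and the prompt string and generation length are illustrative only.

```cpp
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    // load the model weights; GPU offload is controlled via
    // llama_model_params (e.g. n_gpu_layers)
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(argv[1], mparams);
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // create an inference context (KV cache, compute buffers)
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 512;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // greedy sampler chain
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // tokenize the prompt (a first call with NULL reports the required size)
    const std::string prompt = "The capital of France is";
    const int n_tokens = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), NULL, 0, true, false);
    std::vector<llama_token> tokens(n_tokens);
    llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), tokens.size(), true, false);

    // feed the prompt, then generate tokens one at a time
    llama_batch batch = llama_batch_get_one(tokens.data(), tokens.size());
    for (int i = 0; i < 32; i++) {
        if (llama_decode(ctx, batch) != 0) break;

        llama_token tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, tok)) break;

        char buf[128];
        const int n = llama_token_to_piece(vocab, tok, buf, sizeof(buf), 0, true);
        printf("%.*s", n, buf);

        batch = llama_batch_get_one(&tok, 1);
    }
    printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Backend selection happens at build time (e.g. compiling with Metal or CUDA enabled); the same application code then runs against whichever backend was built in.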