💡 Get help - ❓FAQ 💭Discussions 💬 Discord 📖 Documentation website
💻 Quickstart 🖼️ Models 🚀 Roadmap 🛫 Examples
LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that is compatible with the OpenAI (Elevenlabs, Anthropic...) API specifications for local AI inferencing. It allows you to run LLMs and to generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU. It is created and maintained by Ettore Di Giacinto.
Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tools; you might also like:
WebUI screenshots: Talk Interface, Generate Audio, Models Overview, Generate Images, Chat Interface, Home, Login, Swarm.
⚠️ Note: The `install.sh` script is currently experiencing issues due to the heavy changes underway in LocalAI and may produce broken or misconfigured installations. Please use the Docker installation (see below) or a manual binary installation until issue #8032 is resolved.
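In the meantime, a manual binary installation can look like the minimal sketch below; the release asset name and install location are assumptions, so check the GitHub Releases page for the correct file for your platform.

# Download the latest release binary (asset name is an assumption; check the Releases page for your platform)
curl -Lo local-ai https://github.com/mudler/LocalAI/releases/latest/download/local-ai-Linux-x86_64
chmod +x local-ai
./local-ai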
Run the installer script:
# Basic installation
curl https://localai.io/install.sh | sh
For more installation options, see Installer Options.
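As a rough sketch, the installer can be configured through environment variables passed on the pipe; the variable names below (DOCKER_INSTALL, USE_AIO) are assumptions based on the installer documentation, so verify them against Installer Options.

# Example: opt into a Docker-based install using the all-in-one image
# (variable names are assumptions; see Installer Options for the supported set)
curl https://localai.io/install.sh | DOCKER_INSTALL=true USE_AIO=true sh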
Note: the DMGs are not signed by Apple, so macOS quarantines them. See https://github.com/mudler/LocalAI/issues/6268 for a workaround; the fix is tracked here: https://github.com/mudler/LocalAI/issues/6244
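A common workaround for quarantined apps is to clear the quarantine attribute after copying the app out of the DMG; the path below is an assumption (adjust it to wherever you installed the app), and issue #6268 describes the recommended steps.

# Remove the macOS quarantine attribute (path is an assumption; adjust to your install location)
xattr -d com.apple.quarantine /Applications/LocalAI.app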
💡 Docker Run vs Docker Start
`docker run` creates and starts a new container. If a container with the same name already exists, this command will fail. `docker start` starts an existing container that was previously created with `docker run`. If you've already run LocalAI before and want to start it again, use:
docker start -i local-ai
# CPU-only image
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# CUDA 13.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# CUDA 12.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# NVIDIA Jetson (L4T) ARM64
# CUDA 12 (for Nvidia AGX Orin and similar platforms)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
# CUDA 13 (for Nvidia DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
# AMD GPU (ROCm/hipBLAS)
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# Intel GPU (SYCL)
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
# Vulkan
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 13 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
For more information about the AIO images and pre-downloaded models, see Container Documentation.
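If you prefer Docker Compose over plain `docker run`, a minimal sketch is below; the image tag and the `/models` mount path are assumptions, so cross-check the Container Documentation for the recommended setup.

# Minimal Docker Compose sketch (image tag and /models mount path are assumptions)
cat > docker-compose.yaml <<'EOF'
services:
  local-ai:
    image: localai/localai:latest-aio-cpu
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
EOF
docker compose up -d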
To load models:
# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# Start LocalAI with the phi-2 model directly from huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
⚡ Automatic Backend Detection: When you install models from the gallery or YAML files, LocalAI automatically detects your system's GPU capabilities (NVIDIA, AMD, Intel) and downloads the appropriate backend. For advanced configuration options, see GPU Acceleration.
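Once a model is loaded, you can query it through the OpenAI-compatible REST API. The sketch below assumes LocalAI is listening on localhost:8080 and that the model name matches the gallery name used above.

# Query the OpenAI-compatible chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct:q4_k_m",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'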
For more information, see 💻 Getting started. If you are interested in our roadmap items and future enhancements, you can see the issues labeled as Roadmap here.
Related pull requests (models carrying the `development` suffix in the gallery): https://github.com/mudler/LocalAI/pull/6049, https://github.com/mudler/LocalAI/pull/6119, https://github.com/mudler/LocalAI/pull/6121, https://github.com/mudler/LocalAI/pull/6060

Roadmap items: List of issues
LocalAI supports a comprehensive range of AI backends (llama.cpp, transformers, vllm, whisper.cpp, and more) with multiple acceleration options:
| Backend | Description | Acceleration Support |
|---|---|---|
| llama.cpp | LLM inference in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Fast LLM inference with PagedAttention | CUDA 12/13, ROCm, Intel |
| transformers | HuggingFace transformers framework | CUDA 12/13, ROCm, Intel, CPU |
| MLX | Apple Silicon LLM inference | Metal (M1/M2/M3+) |
| MLX-VLM | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) |
| Backend | Description | Acceleration Support |
|---|---|---|
| whisper.cpp | OpenAI Whisper in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, CPU |
| moonshine | Ultra-fast transcription engine for low-end devices | CUDA 12/13, Metal, CPU |
| coqui | Advanced TTS with 1100+ languages | CUDA 12/13, ROCm, Intel, CPU |
| kokoro | Lightweight TTS model | CUDA 12/13, ROCm, Intel, CPU |
| chatterbox | Production-grade TTS | CUDA 12/13, CPU |
| piper | Fast neural TTS system | CPU |
| kitten-tts | Kitten TTS models | CPU |
| silero-vad | Voice Activity Detection | CPU |
| neutts | Text-to-speech with voice cloning | CUDA 12/13, ROCm, CPU |
| vibevoice | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU |
| pocket-tts | Lightweight CPU-based TTS | CUDA 12/13, ROCm, Intel, CPU |
| qwen-tts | High-quality TTS with custom voice, voice design, and voice cloning | CUDA 12/13, ROCm, Intel, CPU |
| ace-step | Music generation from text descriptions, lyrics, or audio samples | CUDA 12/13, ROCm, Intel, Metal, CPU |
| Backend | Description | Acceleration Support |
|---|---|---|
| stablediffusion.cpp | Stable Diffusion in C/C++ | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| diffusers | HuggingFace diffusion models | CUDA 12/13, ROCm, Intel, Metal, CPU |
| Backend | Description | Acceleration Support |
|---|---|---|
| rfdetr | Real-time object detection | CUDA 12/13, Intel, CPU |
| rerankers | Document reranking API | CUDA 12/13, ROCm, Intel, CPU |
| local-store | Vector database | CPU |
| huggingface | HuggingFace API integration | API-based |
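Backends can also be managed from the CLI; recent LocalAI versions expose a `backends` subcommand for the backend gallery, but treat the exact command names below as assumptions and confirm them with `local-ai --help` on your version.

# List and install backends from the backend gallery (subcommand names are assumptions)
local-ai backends list
local-ai backends install llama-cpp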
| Acceleration Type | Supported Backends | Hardware Support |
|---|---|---|
| NVIDIA CUDA 12 | All CUDA-compatible backends | Nvidia hardware |
| NVIDIA CUDA 13 | All CUDA-compatible backends | Nvidia hardware |
| AMD ROCm | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts, ace-step | AMD Graphics |
| Intel oneAPI | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts, ace-step | Intel Arc, Intel iGPUs |
| Apple Metal | llama.cpp, whisper, diffusers, MLX, MLX-VLM, moonshine, ace-step | Apple M1/M2/M3+ |
| Vulkan | llama.cpp, whisper, stablediffusion | Cross-platform GPUs |
| NVIDIA Jetson (CUDA 12) | llama.cpp, whisper, stablediffusion, diffusers, rfdetr, ace-step | ARM64 embedded AI (AGX Orin, etc.) |
| NVIDIA Jetson (CUDA 13) | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (DGX Spark) |
| CPU Optimized | All backends | AVX/AVX2/AVX512, quantization support |
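Where automatic detection is not enough, the backend and GPU offload can be pinned in a model's YAML configuration. The sketch below uses field names (`backend`, `gpu_layers`, `f16`) taken from common LocalAI configuration examples; treat them as assumptions and consult the GPU Acceleration documentation for the authoritative schema.

# Hypothetical model configuration pinning the llama.cpp backend with GPU offload
# (field names are assumptions; verify against the model configuration docs)
cat > models/my-model.yaml <<'EOF'
name: my-model
backend: llama-cpp
parameters:
  model: my-model.Q4_K_M.gguf
f16: true
gpu_layers: 35
EOF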
Community integrations include tooling to build and deploy custom containers, WebUIs, agentic libraries, MCPs, OS assistants, model galleries, voice projects, and more.
If you utilize this repository or its data in a downstream project, please consider citing it with:
@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},
}
Do you find LocalAI useful?
Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.
A huge thank you to our generous sponsors who support this project by covering CI expenses, and to everyone on our Sponsor list:
A special thanks to the individual sponsors who have contributed to the project; a full list is available on GitHub and Buy Me a Coffee. A special shout-out goes to drikster80 for being generous. Thank you, everyone!
LocalAI is a community-driven project created by Ettore Di Giacinto.
MIT - Author Ettore Di Giacinto mudler@localai.io
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
This is a community project, a special thanks to our contributors! 🤗