**LocalAI** is a straightforward, drop-in replacement API compatible with OpenAI for local CPU inferencing, based on [llama.cpp](https://github.com/ggerganov/llama.cpp), [gpt4all](https://github.com/nomic-ai/gpt4all), [rwkv.cpp](https://github.com/saharNooby/rwkv.cpp) and [ggml](https://github.com/ggerganov/ggml), including support GPT4ALL-J which is licensed under Apache 2.0.
**LocalAI** is a drop-in replacement REST API compatible with OpenAI for local CPU inferencing. It allows to run models locally or on-prem with consumer grade hardware. It is based on [llama.cpp](https://github.com/ggerganov/llama.cpp), [gpt4all](https://github.com/nomic-ai/gpt4all), [rwkv.cpp](https://github.com/saharNooby/rwkv.cpp) and [ggml](https://github.com/ggerganov/ggml), including support GPT4ALL-J which is licensed under Apache 2.0.
- OpenAI compatible API
- Supports multiple-models
@ -19,7 +19,13 @@
LocalAI is a community-driven project, focused on making the AI accessible to anyone. Any contribution, feedback and PR is welcome! It was initially created by [mudler](https://github.com/mudler/) at the [SpectroCloud OSS Office](https://github.com/spectrocloud).
### News
- 02-05-2023: Support for `rwkv.cpp` models ( https://github.com/go-skynet/LocalAI/pull/158 ) and for `/edits` endpoint
- 01-05-2023: Support for SSE stream of tokens in `llama.cpp` backends ( https://github.com/go-skynet/LocalAI/pull/152 )
### Socials and community chatter
- Follow [@LocalAI_API](https://twitter.com/LocalAI_API) on twitter.
- [Reddit post](https://www.reddit.com/r/selfhosted/comments/12w4p2f/localai_openai_compatible_api_to_run_llm_models/) about LocalAI.
- [cerebras-GPT with ggml](https://huggingface.co/lxe/Cerebras-GPT-2.7B-Alpaca-SP-ggml)
- [RWKV](https://github.com/BlinkDL/RWKV-LM) with [rwkv.cpp](https://github.com/saharNooby/rwkv.cpp)
- [RWKV](https://github.com/BlinkDL/RWKV-LM) models with [rwkv.cpp](https://github.com/saharNooby/rwkv.cpp)
It should also be compatible with StableLM and GPTNeoX ggml models (untested)
Note: You might need to convert older models to the new format, see [here](https://github.com/ggerganov/llama.cpp#using-gpt4all) for instance to run `gpt4all`.
### RWKV
<details>
For `rwkv` models, you need to put also the associated tokenizer along with the ggml model:
```
ls models
36464540 -rw-r--r-- 1 mudler mudler 1.2G May 3 10:51 rwkv_small
36464543 -rw-r--r-- 1 mudler mudler 2.4M May 3 10:51 rwkv_small.tokenizer.json
```
</details>
## Usage
> `LocalAI` comes by default as a container image. You can check out all the available images with corresponding tags [here](https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest).
![Screenshot from 2023-04-26 23-59-55](https://user-images.githubusercontent.com/2420543/234715439-98d12e03-d3ce-4f94-ab54-2b256808e05e.png)
To see other examples on how to integrate with other projects for instance chatbot-ui, see: [examples](https://github.com/go-skynet/LocalAI/tree/master/examples/).
## Prompt templates
### Advanced configuration
LocalAI can be configured to serve user-defined models with a set of default parameters and templates.
<details>
You can create multiple `yaml` files in the models path or either specify a single YAML configuration file.
Consider the following `models` folder in the `example/chatbot-ui`:
```
base ❯ ls -liah examples/chatbot-ui/models
36487587 drwxr-xr-x 2 mudler mudler 4.0K May 3 12:27 .
36487586 drwxr-xr-x 3 mudler mudler 4.0K May 3 10:42 ..
# template file ".tmpl" with the prompt template to use by default on the endpoint call. Note there is no extension in the files
completion: completion
chat: ggml-gpt4all-j
```
Specifying a `config-file` via CLI allows to declare models in a single file as a list, for instance:
```yaml
- name: list1
parameters:
model: testmodel
context_size: 512
threads: 10
stopwords:
- "HUMAN:"
- "### Response:"
roles:
user: "HUMAN:"
system: "GPT:"
template:
completion: completion
chat: ggml-gpt4all-j
- name: list2
parameters:
model: testmodel
context_size: 512
threads: 10
stopwords:
- "HUMAN:"
- "### Response:"
roles:
user: "HUMAN:"
system: "GPT:"
template:
completion: completion
chat: ggml-gpt4all-j
```
See also [chatbot-ui](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) as an example on how to use config files.
</details>
### Prompt templates
The API doesn't inject a default prompt for talking to the model. You have to use a prompt similar to what's described in the standford-alpaca docs: https://github.com/tatsu-lab/stanford_alpaca#data-release.
@ -145,15 +255,143 @@ The below instruction describes a task. Write a response that appropriately comp
See the [prompt-templates](https://github.com/go-skynet/LocalAI/tree/master/prompt-templates) directory in this repository for templates for some of the most popular models.
For the edit endpoint, an example template for alpaca-based models can be:
```yaml
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{{.Instruction}}
### Input:
{{.Input}}
### Response:
```
</details>
### CLI
You can control LocalAI with command line arguments, to specify a binding address, or the number of threads.
| config-file | CONFIG_FILE | empty | Path to a LocalAI config file. |
</details>
## Setup
Currently LocalAI comes as a container image and can be used with docker or a container engine of choice. You can check out all the available images with corresponding tags [here](https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest).
"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
```
</details>
## Installation
### Windows compatibility
Currently LocalAI comes as container images and can be used with docker or a containre engine of choice.
It should work, however you need to make sure you give enough resources to the container. See https://github.com/go-skynet/LocalAI/issues/2
### Run LocalAI in Kubernetes
LocalAI can be installed inside Kubernetes with helm.
<details>
1. Add the helm repo
@ -198,51 +436,7 @@ Check out also the [helm chart repository on GitHub](https://github.com/go-skyne
</details>
## API
`LocalAI` provides an API for running text generation as a service, that follows the OpenAI reference and can be used as a drop-in. The models once loaded the first time will be kept in memory.
| config-file | CONFIG_FILE | empty | Path to a LocalAI config file. |
Once the server is running, you can start making requests to it using HTTP, using the OpenAI API.
</details>
### Supported OpenAI API endpoints
## Supported OpenAI API endpoints
You can check out the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create).
@ -253,7 +447,7 @@ Note:
- You can also specify the model as part of the OpenAI token.
- If only one model is available, the API will use it for all the requests.
#### Chat completions
### Chat completions
<details>
For example, to generate a chat completion, you can send a POST request to the `/v1/chat/completions` endpoint with the instruction as the request body:
"prompt": "A long time ago in a galaxy far, far away",
"temperature": 0.7
}'
```
See also [chatbot-ui](https://github.com/go-skynet/LocalAI/tree/master/examples/chatbot-ui) as an example on how to use config files.
Available additional parameters: `top_p`, `top_k`, `max_tokens`
</details>
## Windows compatibility
It should work, however you need to make sure you give enough resources to the container. See https://github.com/go-skynet/LocalAI/issues/2
## Build locally
Pre-built images might fit well for most of the modern hardware, however you can and might need to build the images manually.
In order to build the `LocalAI` container image locally you can use `docker`:
### List models
```
# build the image
docker build -t LocalAI .
docker run LocalAI
```
Or build the binary with `make`:
```
make build
```
## Build on mac
Building on Mac (M1 or M2) works, but you may need to install some prerequisites using brew. The below has been tested by one mac user and found to work. Note that this doesn't use docker to run the server: