| alpaca | ALPACA | true | Set to true for alpaca models. |
| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
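
As a minimal sketch, the options in the table can presumably also be set through their environment variables using Docker's `-e` flag. The helper below only prints the command instead of running it, so it can be inspected first. The function name `llama_api_cmd` and the default model path are hypothetical, and whether `GPT4ALL=true` alone is enough for a given model is an assumption:

```shell
# Hypothetical helper: print (do not run) a docker command that enables
# gpt4all mode through the GPT4ALL environment variable from the table.
llama_api_cmd() {
  # Placeholder model path; pass the real path as the first argument.
  model_path="${1:-/models/model.bin}"
  # \$PWD is escaped so the printed command still contains the literal $PWD.
  echo "docker run -v \$PWD:/models -p 8080:8080 -ti --rm -e GPT4ALL=true quay.io/go-skynet/llama-cli:v0.3-lite api --model $model_path"
}

llama_api_cmd /models/model.bin
```

Once the printed command looks right, it can be pasted into a terminal or run with `eval "$(llama_api_cmd /models/model.bin)"`.
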
Once the server is running, you can make requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:
@ -111,9 +113,9 @@ Below is an instruction that describes a task. Write a response that appropriate
## Using other models
You can use the lite images (for example, `quay.io/go-skynet/llama-cli:v0.3-lite`), which don't ship any model, and specify a model binary to be used for inference with `--model`.
You can specify a model binary to be used for inference with `--model`.
13B and 30B models are known to work:
13B and 30B alpaca models are known to work:
```
# Download the model image, extract the model
@ -121,6 +123,17 @@ You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.3-lite
docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3-lite api --model /models/model.bin
```
gpt4all (https://github.com/nomic-ai/gpt4all) works as well; however, the original model needs to be converted: