Add gpt4all instructions

add/first-example
mudler 2 years ago
parent c61b023bc8
commit 999676b106
README.md (17 changes)

@@ -38,6 +38,7 @@ llama-cli --model <model_path> --instruction <instruction> [--input <input>] [--
| top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
| context-size | CONTEXT_SIZE | 512 | Default token context size. |
| alpaca | ALPACA | true | Set to true for alpaca models. |
| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
Here's an example of using `llama-cli`:
@@ -84,6 +85,7 @@ The API takes takes the following:
| address | ADDRESS | :8080 | The address and port to listen on. |
| context-size | CONTEXT_SIZE | 512 | Default token context size. |
| alpaca | ALPACA | true | Set to true for alpaca models. |
| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
Once the server is running, you can make requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:
@@ -111,9 +113,9 @@ Below is an instruction that describes a task. Write a response that appropriate
## Using other models
-You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.3-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
+You can specify a model binary to be used for inference with `--model`.
-13B and 30B models are known to work:
+13B and 30B alpaca models are known to work:
```
# Download the model image, extract the model
@@ -121,6 +123,17 @@ You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.3-lite
docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3-lite api --model /models/model.bin
```
gpt4all (https://github.com/nomic-ai/gpt4all) works as well, however the original model needs to be converted:
```bash
wget -O tokenizer.model https://huggingface.co/decapoda-research/llama-30b-hf/resolve/main/tokenizer.model
mkdir models
cp gpt4all.. models/
git clone https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82
pip install sentencepiece
python 828bddec6162a023114ce19146cb2b82/gistfile1.txt models tokenizer.model
```
### Golang client API
The `llama-cli` codebase has also a small client in go that can be used alongside with the api:
