From 999676b10642668b1a1443f81b2ea4d874577964 Mon Sep 17 00:00:00 2001
From: mudler
Date: Wed, 29 Mar 2023 18:58:54 +0200
Subject: [PATCH] Add gpt4all instructions

---
 README.md | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a0b09ee..cb680aa 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,7 @@ llama-cli --model <model> --instruction <instruction> [--input <input>] [--
 | top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
 | context-size | CONTEXT_SIZE | 512 | Default token context size. |
 | alpaca | ALPACA | true | Set to true for alpaca models. |
+| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
 
 Here's an example of using `llama-cli`:
 
@@ -84,6 +85,7 @@ The API takes takes the following:
 | address | ADDRESS | :8080 | The address and port to listen on. |
 | context-size | CONTEXT_SIZE | 512 | Default token context size. |
 | alpaca | ALPACA | true | Set to true for alpaca models. |
+| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
 
 Once the server is running, you can make requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:
 
@@ -111,9 +113,9 @@ Below is an instruction that describes a task. Write a response that appropriate
 
 ## Using other models
 
-You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.3-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
+You can specify a model binary to be used for inference with `--model`.
 
-13B and 30B models are known to work:
+13B and 30B alpaca models are known to work:
 
 ```
 # Download the model image, extract the model
@@ -121,6 +123,17 @@ You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.3-lite
 docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3-lite api --model /models/model.bin
 ```
 
+gpt4all (https://github.com/nomic-ai/gpt4all) works as well; however, the original model needs to be converted first:
+
+```bash
+wget -O tokenizer.model https://huggingface.co/decapoda-research/llama-30b-hf/resolve/main/tokenizer.model
+mkdir models
+cp gpt4all.. models/  # copy the gpt4all model binary into models/
+git clone https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82
+pip install sentencepiece
+python 828bddec6162a023114ce19146cb2b82/gistfile1.txt models tokenizer.model  # run the gist's conversion script
+```
+
 ### Golang client API
 
 The `llama-cli` codebase has also a small client in go that can be used alongside with the api:
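
Not part of the patch, but as a usage sketch: once the conversion above has produced a ggml-compatible model file in `models/`, the converted model can be served the same way as the alpaca models, with gpt4all mode enabled through the `GPT4ALL` environment variable added to the options table. The model filename `gpt4all-converted.bin` below is a hypothetical placeholder; the gist script's actual output name may differ:

```bash
# Sketch only: serve a converted gpt4all model with the API.
# GPT4ALL=true matches the environment variable from the options table;
# the model filename is a placeholder, not guaranteed by the conversion script.
docker run -v $PWD/models:/models -p 8080:8080 -ti --rm \
  -e GPT4ALL=true \
  quay.io/go-skynet/llama-cli:v0.3-lite api --model /models/gpt4all-converted.bin
```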
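Likewise, a request against the `/predict` endpoint mentioned in the patched README might look as follows. The JSON field names (`text`, `topK`, `temperature`, `tokens`) are assumptions inferred from the option names in the tables; they are not defined by this patch:

```bash
# Hypothetical request; the JSON field names are inferred, not confirmed here.
curl --location --request POST 'http://localhost:8080/predict' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "What is an alpaca?",
    "topK": 20,
    "temperature": 0.9,
    "tokens": 100
  }'
```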