"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
# {"model":"ggml-gpt4all-j","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}
```
</details>
## Prompt templates
The API doesn't inject a default prompt when talking to the model. You have to use a prompt similar to the one described in the Stanford Alpaca docs: https://github.com/tatsu-lab/stanford_alpaca#data-release.
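As an illustration, the Alpaca format (the variant without a separate input field) can be produced with a small shell function. `alpaca_prompt` is a hypothetical helper, not part of the API. This is only a sketch of that one template; other models may expect different wording.

```shell
#!/bin/sh
# Sketch: wrap an instruction in the Stanford Alpaca "no input" prompt format.
# alpaca_prompt is a hypothetical helper name, not something the API provides.
alpaca_prompt() {
  # The instruction is passed as a printf argument, so '%' in it is printed literally.
  printf 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n%s\n\n### Response:\n' "$1"
}

alpaca_prompt "Tell me a joke about llamas."
```

The printed text is what you would then send as the prompt or message content in a request.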
The API takes the following parameters:

| Parameter | Environment Variable | Default Value | Description |
| --------- | -------------------- | ------------- | ----------- |
| threads | THREADS | Number of physical cores | The number of threads to use for text generation. |
| address | ADDRESS | :8080 | The address and port to listen on. |
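A minimal sketch of setting both environment variables from a POSIX shell before starting the server. It assumes Linux's `lscpu` is available to count physical cores (unique core/socket pairs) and otherwise falls back to the logical CPU count from `getconf`, which can be higher than the physical count on machines with SMT. `physical_cores` is a hypothetical helper and `:8081` is just an example value.

```shell
#!/bin/sh
# Sketch: set THREADS and ADDRESS before launching the server from the same shell.
# physical_cores is a hypothetical helper: it counts unique (core, socket) pairs
# reported by lscpu (Linux), falling back to the logical CPU count elsewhere.
physical_cores() {
  n=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
  if [ "${n:-0}" -gt 0 ]; then
    echo "$n"
  else
    getconf _NPROCESSORS_ONLN
  fi
}

THREADS="$(physical_cores)"
ADDRESS=":8081"   # example: listen on port 8081 instead of the default :8080
export THREADS ADDRESS
echo "THREADS=$THREADS ADDRESS=$ADDRESS"
# Start the server from this shell afterwards so it inherits both variables.
```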
Once the server is running, you can start making HTTP requests to it that follow the OpenAI API.
You can check out the [OpenAI API reference](https://platform.openai.com/docs/ap
Below is the list of supported endpoints and parameters.
Note:
- You can also specify the model as part of the OpenAI token.
- If only one model is available, the API uses it for all requests.
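A sketch of selecting the model through the token rather than the request body, assuming the server accepts the model name as the `Authorization: Bearer` value when the body has no `model` field. `chat_with_token_model` is a hypothetical helper, `ggml-gpt4all-j` stands in for whichever model file you have, and setting `CURL=echo` prints the request instead of sending it.

```shell
#!/bin/sh
# Sketch: pass the model name as the OpenAI token instead of a "model" field.
# Assumption: the server treats the bearer token as a model name.
API_URL="${API_URL:-http://localhost:8080}"

# Usage: chat_with_token_model MODEL MESSAGE
# MESSAGE is inserted into the JSON as-is, so avoid quotes and backslashes in it.
# Set CURL=echo to print the arguments instead of sending the request.
chat_with_token_model() {
  ${CURL:-curl} -s "$API_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $1" \
    -d "{\"messages\": [{\"role\": \"user\", \"content\": \"$2\"}], \"temperature\": 0.9}"
}

# Dry run: show the request that would be sent.
CURL=echo chat_with_token_model ggml-gpt4all-j "How are you?"
```

With a server running, drop the `CURL=echo` prefix to actually send the request.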
#### Chat completions
<details>
For example, to generate a chat completion, send a POST request to the `/v1/chat/completions` endpoint with the instruction in the request body.
Additional available parameters: `top_p`, `top_k`, and `max_tokens`.
</details>
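The sketch below builds a chat completion request that also sets the optional `top_p`, `top_k` and `max_tokens` parameters. The helper names `chat_body` and `send_chat` are hypothetical; the model name `ggml-gpt4all-j` is the one from the example response earlier in this README, and the numeric values are illustrative rather than recommended settings.

```shell
#!/bin/sh
# Sketch: chat completion request with the additional sampling parameters.
API_URL="${API_URL:-http://localhost:8080}"

# Print the JSON body for one user message.
# Only backslashes and double quotes are escaped, so keep the message on one line.
chat_body() {
  msg=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  cat <<EOF
{
  "model": "ggml-gpt4all-j",
  "messages": [{"role": "user", "content": "$msg"}],
  "temperature": 0.9,
  "top_p": 0.9,
  "top_k": 40,
  "max_tokens": 128
}
EOF
}

# POST the body read from stdin (-d @-) to the endpoint; needs a running server.
send_chat() {
  chat_body "$1" | curl -s "$API_URL/v1/chat/completions" \
    -H "Content-Type: application/json" -d @-
}

# Show the body without sending anything.
chat_body "How are you?"
```

With the server up, `send_chat "How are you?"` sends the same body and prints the JSON response.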
#### List models
<details>
You can list all the models available with:
```
curl http://localhost:8080/v1/models
```
</details>
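If you only need the model names, the response can be filtered in the shell. This sketch assumes the endpoint returns an OpenAI-style list such as `{"object":"list","data":[{"id":"...","object":"model"}]}`; `model_ids` is a hypothetical helper that uses plain text matching instead of a JSON parser, and the canned response in it is made up for illustration. With `jq` installed, `jq -r '.data[].id'` is the sturdier choice.

```shell
#!/bin/sh
# Sketch: print one model id per line from a /v1/models response on stdin.
# Text matching only: ids that contain double quotes are not handled.
model_ids() {
  grep -o '"id": *"[^"]*"' | sed 's/^"id": *"//; s/"$//'
}

# Canned example response; against a running server use:
#   curl -s http://localhost:8080/v1/models | model_ids
printf '%s\n' '{"object":"list","data":[{"id":"ggml-gpt4all-j","object":"model"}]}' | model_ids
```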
## Using other models
gpt4all (https://github.com/nomic-ai/gpt4all) works as well; however, the original model needs to be converted (the same applies to old Alpaca models):