"messages": [{"role": "user", "content": "How are you?"}],
"temperature": 0.9
}'
# {"model":"ggml-gpt4all-j","choices":[{"message":{"role":"assistant","content":"I'm doing well, thanks. How about you?"}}]}
```
</details>
## Prompt templates
The API doesn't inject a default prompt for talking to the model. You have to supply a prompt similar to the one described in the stanford_alpaca docs: https://github.com/tatsu-lab/stanford_alpaca#data-release.
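As a rough illustration of what such a prompt can look like, the sketch below wraps an instruction in an Alpaca-style template (the variant without an input field). The helper name `build_prompt` is hypothetical, and the exact wording your model expects may differ, so treat the template text as an assumption to check against the linked docs.

```python
# Alpaca-style template (no-input variant). Assumption: the model was
# fine-tuned on prompts shaped like this; verify against the linked docs.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


# Print the full prompt that would be sent as the message content.
print(build_prompt("How are you?"))
```

The resulting string would then be used as the `content` of the user message.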
@ -127,6 +163,7 @@ The API takes takes the following parameters:
| threads | THREADS | Number of Physical cores | The number of threads to use for text generation. |
| address | ADDRESS | :8080 | The address and port to listen on. |
Once the server is running, you can start making requests to it over HTTP using the OpenAI API.
@ -136,10 +173,16 @@ Once the server is running, you can start making requests to it using HTTP, usin
You can check out the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create).
The supported endpoints and parameters are listed below.
Note:
- You can also specify the model as part of the OpenAI token.
- If only one model is available, the API will use it for all the requests.
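One way to read the first note is that the model name travels in the token that OpenAI clients normally send. The sketch below assumes the token is passed as an `Authorization: Bearer <model name>` header and that the `model` field can then be left out of the body; both points are assumptions, not confirmed by this document, and `build_request_with_token` is a hypothetical helper.

```python
import json
import urllib.request


def build_request_with_token(model_name, content, base_url="http://localhost:8080"):
    """Build (but do not send) a chat request that names the model in the token.

    Assumption: the server reads the model name from the bearer token.
    """
    payload = {"messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + model_name,
        },
        method="POST",
    )


req = build_request_with_token("ggml-gpt4all-j", "How are you?")
print(req.get_header("Authorization"))
```

Nothing is sent here; pass `req` to `urllib.request.urlopen` once a server is listening.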
#### Chat completions
<details>
For example, to generate a chat completion, you can send a POST request to the `/v1/chat/completions` endpoint with the instruction as the request body:
Available additional parameters: `top_p`, `top_k`, `max_tokens`
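For readers calling the endpoint from code, here is a minimal Python sketch of the same kind of request with the additional parameters set. It assumes the server listens on the default `:8080` address and that a model named `ggml-gpt4all-j` is available; `build_chat_request` and `send` are hypothetical helpers, and the parameter values are arbitrary.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: default ADDRESS of :8080


def build_chat_request(model, content, **params):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    payload.update(params)  # e.g. temperature, top_p, top_k, max_tokens
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_chat_request(
    "ggml-gpt4all-j", "How are you?",
    temperature=0.9, top_p=0.9, top_k=40, max_tokens=128,
)
print(req.full_url)
print(req.data.decode("utf-8"))
# With a server running:
# reply = send(req)
# print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes it easy to inspect the payload before any model is loaded.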
</details>
#### List models
<details>
You can list all the models available with:
```
curl http://localhost:8080/v1/models
```
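The same call can be made from Python. The sketch below assumes the reply follows the OpenAI model-list shape (a `data` array of objects with an `id` field), which this document does not show, so the sample body is illustrative only; `parse_model_ids` and `list_models` are hypothetical helpers.

```python
import json
import urllib.request


def parse_model_ids(body: str) -> list:
    """Extract model ids from an OpenAI-style model list (assumed shape)."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]


def list_models(base_url="http://localhost:8080"):
    """Fetch /v1/models and return the ids (needs a running server)."""
    with urllib.request.urlopen(base_url + "/v1/models") as response:
        return parse_model_ids(response.read().decode("utf-8"))


# Illustrative response body, not captured from a real server.
sample = '{"object": "list", "data": [{"id": "ggml-gpt4all-j", "object": "model"}]}'
print(parse_model_ids(sample))
```

Call `list_models()` once the server is up to get the real list.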
</details>
## Using other models
gpt4all (https://github.com/nomic-ai/gpt4all) works as well; however, the original model needs to be converted first (the same applies to old alpaca models):