OpenAI-Compatible API
DocsGPT exposes /v1/chat/completions following the standard chat completions protocol. Point any compatible client — opencode, Aider, LibreChat or the OpenAI SDKs — at your DocsGPT Agent by changing only the base URL and API key.
Quick Start
Python
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:7091/v1",  # or https://gptcloud.arc53.com/v1
    api_key="your_agent_api_key",
)
response = client.chat.completions.create(
    model="docsgpt-agent",
    messages=[{"role": "user", "content": "Summarize our refund policy"}],
)
print(response.choices[0].message.content)
The model field is accepted but ignored: the agent bound to your API key determines the model. The agent’s prompt, sources, tools, and default model are loaded automatically.
Base URL & Auth
| Environment | Base URL |
|---|---|
| Local | http://localhost:7091/v1 |
| Cloud | https://gptcloud.arc53.com/v1 |
Authenticate with Authorization: Bearer <agent_api_key>.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | Chat request (streaming or non-streaming) |
| GET | /v1/models | List agents available to your key |
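A minimal sketch of building authenticated requests for both endpoints with only the Python standard library. The helper name `build_request` is hypothetical and not part of DocsGPT; nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7091/v1"  # or https://gptcloud.arc53.com/v1


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request to a /v1 endpoint.

    Illustrative helper only; the name is an assumption of this sketch.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


chat = build_request(
    "POST",
    "/chat/completions",
    "your_agent_api_key",
    {"model": "docsgpt-agent", "messages": [{"role": "user", "content": "hi"}]},
)
models = build_request("GET", "/models", "your_agent_api_key")

# To actually send one of them:
# with urllib.request.urlopen(chat) as resp:
#     print(json.load(resp))
```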
Streaming
Set "stream": true. You’ll receive SSE chunks with choices[0].delta.content. DocsGPT-specific events (sources, tool calls) arrive as extra frames that carry a top-level docsgpt key on an otherwise-empty chunk — standard clients ignore them.
stream = client.chat.completions.create(
    model="docsgpt-agent",
    stream=True,
    messages=[{"role": "user", "content": "Explain vector search"}],
)
for chunk in stream:
    # Defensive: skip any frame that arrives without choices.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
Sampling Parameters
Standard OpenAI sampling parameters are forwarded to the model. When omitted, the agent’s configured defaults apply. Supported: temperature, max_tokens (or max_completion_tokens), top_p, frequency_penalty, presence_penalty, stop, seed.
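One way to keep the agent's defaults in play from Python is to leave unset parameters out of the body entirely. This is a minimal sketch: `chat_body` and `SUPPORTED_SAMPLING` are hypothetical names, and the tuple simply mirrors the parameters listed above.

```python
SUPPORTED_SAMPLING = (
    "temperature", "max_tokens", "max_completion_tokens", "top_p",
    "frequency_penalty", "presence_penalty", "stop", "seed",
)


def chat_body(messages, model="docsgpt-agent", **sampling):
    """Build a request body that carries only the sampling parameters that were set."""
    unknown = set(sampling) - set(SUPPORTED_SAMPLING)
    if unknown:
        raise ValueError(f"unsupported sampling parameters: {sorted(unknown)}")
    body = {"model": model, "messages": messages}
    # Drop None values so the agent's configured defaults apply for them.
    body.update({key: value for key, value in sampling.items() if value is not None})
    return body


body = chat_body(
    [{"role": "user", "content": "Write a haiku about search"}],
    temperature=0.2,
    max_tokens=256,
    seed=42,
    top_p=None,  # left unset, so it is omitted from the body
)
```

Serialized, `body` matches the raw request shown next.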
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Write a haiku about search"}],
  "temperature": 0.2,
  "max_tokens": 256,
  "seed": 42
}
Structured Output
You can force the model to return JSON matching a schema, using either the OpenAI response_format field or the response_schema convenience field.
response_format
{
  "model": "docsgpt-agent",
  "messages": [{"role": "user", "content": "Extract the order id and total"}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "order",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "order_id": {"type": "string"},
          "total": {"type": "number"}
        },
        "required": ["order_id", "total"]
      }
    }
  }
}
- response_format follows OpenAI Structured Outputs. strict defaults to true; set strict: false to relax enforcement.
- response_format: {"type": "json_object"} requests JSON without a fixed schema (the model is steered by the prompt).
- response_schema is a DocsGPT convenience: pass a raw JSON Schema object (or a {"schema": {...}} wrapper) directly.
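A minimal sketch of the round trip in Python: build the `response_format` request, then decode the reply. It assumes the structured result arrives as a JSON string in `choices[0].message.content`, as it does with OpenAI Structured Outputs. `structured_request` and `parse_structured` are hypothetical helper names, and the required-key check is a light client-side sanity check, not full JSON Schema validation.

```python
import json

ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["order_id", "total"],
}


def structured_request(prompt, schema, name="order"):
    """Build a chat request body that asks for JSON matching `schema`."""
    return {
        "model": "docsgpt-agent",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        },
    }


def parse_structured(content, schema):
    """Decode the assistant content and check the schema's required keys."""
    data = json.loads(content)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data
```

Pass `response.choices[0].message.content` to `parse_structured` once the request has been sent.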
Multimodal Input (text + images)
User messages may use OpenAI typed-content arrays with image_url parts. Images are forwarded to vision-capable models.
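In Python, a typed-content message can be assembled with a small helper. This is a sketch only; `image_message` is a hypothetical name, and it covers just the text and image_url part types described above.

```python
def image_message(text, image_urls):
    """Build a user message with one text part followed by image_url parts."""
    parts = [{"type": "text", "text": text}]
    parts.extend(
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    )
    return {"role": "user", "content": parts}


message = image_message(
    "What's in this screenshot?",
    ["https://example.com/shot.png"],
)
```

The equivalent raw request body: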
{
  "model": "docsgpt-agent",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this screenshot?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/shot.png"}}
      ]
    }
  ]
}
Tool Calling (client-side, stateless)
You can register your own tools and execute them on the client. The flow is stateless — OpenAI clients that don’t carry a conversation_id re-send the full message history each turn, and DocsGPT rebuilds the agent from it.
- Send a request with a tools array.
- If the agent decides to call a tool, the response comes back with finish_reason: "tool_calls" and a tool_calls array (and content: null).
- Execute the tool(s) on your side, then re-POST the full message history with the assistant’s tool_calls message followed by role: "tool" result messages.
- DocsGPT continues the run and returns the final answer.
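The steps above can be sketched as a small client-side loop. This is a minimal sketch under assumptions: `run_with_tools`, `post`, and `handlers` are hypothetical names, `post` stands in for whatever HTTP call sends a body to /v1/chat/completions and returns the decoded JSON, and the response is assumed to follow the OpenAI layout (`choices[0].finish_reason`, `choices[0].message.tool_calls`).

```python
import json


def run_with_tools(post, messages, tools, handlers, max_rounds=5):
    """Drive the stateless tool loop until the agent returns a final answer.

    `post(body)` sends a request body and returns the decoded response dict.
    `handlers` maps tool names to local Python callables.
    """
    history = list(messages)
    for _ in range(max_rounds):
        response = post({"model": "docsgpt-agent", "messages": history, "tools": tools})
        choice = response["choices"][0]
        if choice["finish_reason"] != "tool_calls":
            return choice["message"]["content"]
        # Keep the assistant's tool_calls message, then add one result per call.
        history.append(choice["message"])
        for call in choice["message"]["tool_calls"]:
            arguments = json.loads(call["function"]["arguments"])
            result = handlers[call["function"]["name"]](**arguments)
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            )
    raise RuntimeError("tool loop did not finish within max_rounds")
```

Because the full history is re-sent on every round, the loop needs no conversation_id. For reference, the re-POSTed body has this shape: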
{
  "model": "docsgpt-agent",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "tool_calls": [
      {"id": "call_1", "type": "function",
       "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18°C, clear"}
  ],
  "tools": [ { "type": "function", "function": { "name": "get_weather", "...": "..." } } ]
}
Reasoning
For models that emit reasoning (“thinking”) tokens, the response surfaces them in a non-standard reasoning_content field (a reasoning_content delta when streaming). Standard clients ignore it; clients that understand it can display the model’s thinking separately from the answer.
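A minimal sketch of reading the field from decoded JSON. It assumes `reasoning_content` sits next to `content` on `choices[0].message` (and on each streamed delta); the exact placement is not stated above. `split_reasoning` and `collect_stream` are hypothetical helper names.

```python
def split_reasoning(response):
    """Return (reasoning, answer) from a decoded non-streaming response.

    `reasoning` is None when the model emitted no reasoning tokens.
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content")


def collect_stream(deltas):
    """Accumulate reasoning and answer text separately from streamed deltas."""
    reasoning, answer = [], []
    for delta in deltas:
        reasoning.append(delta.get("reasoning_content") or "")
        answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)
```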
Idempotent Retries
Add an Idempotency-Key header so a retried request returns the stored first response instead of re-running the agent (which would duplicate the answer and double-bill tokens).
curl -X POST http://localhost:7091/v1/chat/completions \
  -H "Authorization: Bearer your_agent_api_key" \
  -H "Idempotency-Key: 8f1c...unique-per-request" \
  -H "Content-Type: application/json" \
  -d '{"model":"docsgpt-agent","messages":[{"role":"user","content":"hi"}]}'
- Opt-in: no header means today’s behavior (every request runs).
- Non-streaming only: streaming replay is not supported.
- A completed key replays the cached body (and status) for 24 hours.
- A request with a key whose first attempt is still in flight returns HTTP 409.
- Keys are scoped per agent and capped at 256 characters (oversized keys are rejected).
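A client-side retry loop built on these rules might look like the sketch below. `post_with_retries` and `send` are hypothetical names: `send(body, headers)` stands in for your HTTP call and is assumed to return `(status_code, decoded_body)` or raise `ConnectionError`/`TimeoutError`. The attempt count and backoff are illustrative choices, not DocsGPT defaults.

```python
import time
import uuid


def post_with_retries(send, body, attempts=3, backoff=0.5):
    """Retry a non-streaming request under a single Idempotency-Key."""
    # One key for every attempt of this logical request (36 chars, under the cap).
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(attempts):
        try:
            status, payload = send(body, headers)
        except (ConnectionError, TimeoutError):
            status, payload = None, None  # network failure: retry with the same key
        if status is not None and status != 409:
            return status, payload  # any other status goes back to the caller
        # 409 means the first attempt is still in flight, so wait and ask again.
        if attempt < attempts - 1:
            time.sleep(backoff * (2 ** attempt))
    raise RuntimeError("no final response after retries")
```

Reusing the same key on every attempt is what lets the server replay the stored response rather than run the agent again.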
System Prompt Override
System messages are dropped by default — the agent’s configured prompt is used. To allow callers to override it, enable Allow prompt override in the agent’s Advanced settings.
When an override is active, the agent’s prompt template is replaced wholesale — template variables like {summaries} are not substituted.
Conversation Persistence
Conversations are always persisted server-side, and the response includes docsgpt.conversation_id. They never appear in the agent owner’s sidebar — /v1 traffic is stored hidden, so external clients can’t clutter the owner’s conversation list.
Stateless tool continuations (no conversation_id, e.g. opencode) skip persistence by default to avoid writing orphan rows; set docsgpt.persist to override. The legacy docsgpt.save_conversation flag from older releases is deprecated and ignored.
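A minimal sketch of setting the flag on a request and reading the ID back from a decoded response. It assumes `docsgpt` is a top-level object in both bodies and that `persist` takes a boolean; `with_persist` and `conversation_id` are hypothetical helper names.

```python
import copy


def with_persist(body, persist):
    """Return a copy of a request body with docsgpt.persist set."""
    updated = copy.deepcopy(body)
    updated.setdefault("docsgpt", {})["persist"] = persist
    return updated


def conversation_id(response):
    """Read docsgpt.conversation_id from a decoded response, if present."""
    return (response.get("docsgpt") or {}).get("conversation_id")


base = {"model": "docsgpt-agent", "messages": [{"role": "user", "content": "hi"}]}
persisted = with_persist(base, True)
```

With the OpenAI Python SDK, non-standard request fields such as this one can be sent through the `extra_body` argument of `client.chat.completions.create`.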
DocsGPT Extension Fields
DocsGPT adds an optional docsgpt object to both requests and responses for features outside the OpenAI schema.
Request (docsgpt.*):
| Field | Description |
|---|---|
| attachments | List of attachment IDs to include as context for this turn. |
| persist | Force-enable/disable conversation persistence (mainly for stateless tool continuations). |
Response (docsgpt.*):
| Field | Description |
|---|---|
| conversation_id | Server-side conversation ID for this exchange. |
| sources | RAG sources used to answer. |
| tool_calls | Completed tool-call results from the run. |
When streaming, these arrive on otherwise-empty chunks that carry a top-level docsgpt key, so strict OpenAI clients still validate each frame.
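A client that wants the extension data can separate those frames while reading the stream. The sketch below assumes OpenAI-style SSE framing (`data: {json}` lines ending with `data: [DONE]`); `split_sse` is a hypothetical helper, and the contents of each `docsgpt` payload are passed through untouched because their exact shape is not specified here.

```python
import json


def split_sse(lines):
    """Separate answer text from DocsGPT extension frames in an SSE stream.

    `lines` is an iterable of decoded text lines from the response body.
    Returns (answer_text, list_of_docsgpt_payloads).
    """
    text, extras = [], []
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if "docsgpt" in chunk:
            extras.append(chunk["docsgpt"])
        # Extension frames may carry no choices, so tolerate missing or empty lists.
        for choice in chunk.get("choices") or []:
            text.append((choice.get("delta") or {}).get("content") or "")
    return "".join(text), extras
```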
When to Use Native Endpoints Instead
Use /api/answer or /stream if you need server-side attachments, passthrough template variables, explicit conversation_id reuse, or sidebar visibility control via visibility.