Chat Completions (Streaming)
Stream real-time responses from AI models via the OpenAI Chat Completions protocol. Ideal for chat applications and interactive use cases.
Quick Start
Step 1: Get your API Key from the Console.
Step 2: Send a streaming request:
cURL:

```bash
curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Briefly explain quantum computing"}
    ]
  }'
```

Python:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Briefly explain quantum computing"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Node.js:

```javascript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const stream = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: 'Briefly explain quantum computing' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '')
}
```

Step 3: Parse the SSE data stream and concatenate choices[0].delta.content from each chunk to build the complete response.
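If you read the HTTP response yourself instead of using an SDK, Step 3 comes down to a small loop over the `data:` lines. The following is a minimal sketch, not the platform's reference parser: `collect_content` is a hypothetical helper name, and the sample lines are trimmed-down chunks in the format shown on this page.

```python
import json
from typing import Iterable


def collect_content(lines: Iterable[str]) -> str:
    """Concatenate choices[0].delta.content from raw SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and anything that is not a data line.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # tolerate chunks that carry no choices
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Sample input: shortened chunks, for illustration only.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_content(sample))  # prints "Quantum computing"
```

In a real client the `lines` argument would be the line iterator of your HTTP library's streaming response, so text can be displayed as each chunk arrives rather than after the loop ends.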
Endpoint
| Item | Value |
|---|---|
| Method | POST |
| Path | /v1/chat/completions |
| Base URL | https://open.dieyuyun.com |
| Protocol | OpenAI Chat Completions |
Authentication
All requests require a Bearer Token in the request header:
```http
Authorization: Bearer sk-xxx
```

Supported Models
| Model | Provider | Context Length | Description |
|---|---|---|---|
| deepseek-v4-flash | DeepSeek | 1M | Fast responses, low cost; suited to high-frequency calls |
| deepseek-v4-pro | DeepSeek | 1M | Complex reasoning, high quality output |
| qwen3.7-max | Qwen | 1M | Qwen flagship model |
| glm-5.7 | Zhipu AI | 200K | GLM flagship model |
| kimi-k2.6 | Moonshot AI | 256K | Ultra-long context, document analysis |
| minimax-m3 | MiniMax | 1M | MiniMax flagship model |
TIP
See the full model list in the Console. Model availability may change.
Request Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model identifier, e.g. deepseek-v4-flash |
| messages | array | Yes | — | List of messages, each with role and content |
| stream | boolean | No | false | Set to true to enable streaming |
| stream_options | object | No | — | Stream config, e.g. {"include_usage": true} for usage in last chunk |
| temperature | number | No | 1 | Sampling temperature, from 0 to 2; higher values give more random output |
| top_p | number | No | 1 | Nucleus sampling parameter |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| max_completion_tokens | integer | No | Model default | Maximum number of completion tokens (newer parameter) |
| stop | string / array | No | null | Stop sequences |
| presence_penalty | number | No | 0 | Presence penalty, from -2 to 2 |
| frequency_penalty | number | No | 0 | Frequency penalty, from -2 to 2 |
| n | integer | No | 1 | Number of candidate completions |
| tools | array | No | — | Tool (function) definitions |
| tool_choice | string / object | No | auto | Tool calling strategy: auto, none, required, or specific function |
| response_format | object | No | — | Response format, e.g. {"type": "json_object"} for JSON output |
| reasoning_effort | string | No | — | Reasoning effort (reasoning models only): low, medium, high |
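To catch out-of-range values before they reach the API, a request body can be assembled through a small helper. This is only a sketch under stated assumptions: `build_payload` is a hypothetical name, it checks just the three numeric ranges listed in the table above, and it leaves every other option for the server to validate.

```python
from typing import Any


def build_payload(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Build a streaming request body, rejecting values outside the documented ranges."""
    ranges = {
        "temperature": (0, 2),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    payload: dict[str, Any] = {"model": model, "messages": messages, "stream": True}
    # Drop unset options so the server applies its own defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_payload(
    "deepseek-v4-flash",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=256,
    stream_options={"include_usage": True},
)
print(sorted(payload))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body.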
Message Roles
| Role | Description |
|---|---|
| system | System prompt, defines model behavior |
| user | User messages |
| assistant | Model's previous responses |
| tool | Tool call results |
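To show how the four roles fit together in one conversation, here is an illustrative `messages` list. It assumes the standard OpenAI protocol shape, in which the assistant turn carries `tool_calls` and the `tool` message points back to it through `tool_call_id`; the call ID and the weather result are made-up placeholders.

```python
import json

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What's the weather like in Beijing today?"},
    # The assistant turn that requested the tool call (the ID is made up).
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Beijing"}),
                },
            }
        ],
    },
    # The tool result, linked to the call above by tool_call_id (placeholder data).
    {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": json.dumps({"city": "Beijing", "condition": "sunny"}),
    },
]

roles = [m["role"] for m in messages]
print(roles)  # prints ['system', 'user', 'assistant', 'tool']
```

Sending this list in a follow-up request lets the model write its final answer from the tool result.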
Request Examples
Basic Conversation
cURL:

```bash
curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ]
  }'
```

Python:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    stream=True,
    stream_options={"include_usage": True},
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ]
)

for chunk in stream:
    # Print each text fragment as it arrives.
    if chunk.choices:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
    # Token usage appears only in the last chunk.
    if hasattr(chunk, 'usage') and chunk.usage:
        print(f"\n\n[Usage] Input: {chunk.usage.prompt_tokens}, "
              f"Output: {chunk.usage.completion_tokens}")
```

Node.js:

```javascript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const stream = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  stream: true,
  stream_options: { include_usage: true },
  messages: [
    { role: 'system', content: 'You are a helpful AI assistant.' },
    { role: 'user', content: 'Tell me about the history of artificial intelligence.' },
  ],
})

for await (const chunk of stream) {
  // Print each text fragment as it arrives.
  const content = chunk.choices[0]?.delta?.content
  if (content) process.stdout.write(content)
  // Token usage appears only in the last chunk.
  if (chunk.usage) {
    console.log(`\n\n[Usage] Input: ${chunk.usage.prompt_tokens}, ` + `Output: ${chunk.usage.completion_tokens}`)
  }
}
```

Streaming with Tool Calls
```json
{
  "model": "deepseek-v4-flash",
  "stream": true,
  "messages": [{ "role": "user", "content": "What's the weather like in Beijing today?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}
```

Response Format
Streaming SSE Chunks
Streaming responses are delivered as Server-Sent Events (SSE). Each chunk is a line prefixed with `data: `, and the stream ends with `data: [DONE]`:

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":" intelligence"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":28,"completion_tokens":156,"total_tokens":184}}

data: [DONE]
```

Field Reference
| Field | Description |
|---|---|
| id | Unique request identifier |
| object | Always chat.completion.chunk |
| created | Unix timestamp |
| model | Actual model ID used |
| choices[].delta.role | Role (only in the first chunk) |
| choices[].delta.content | Incremental text content |
| choices[].delta.tool_calls | Incremental tool call information |
| choices[].finish_reason | Stop reason: stop, length, tool_calls, content_filter |
| usage | Only in the last chunk (requires stream_options.include_usage) |
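Because `choices[].delta.tool_calls` arrives in pieces, a client has to stitch the fragments together before it can run the tool. The sketch below assumes the standard OpenAI fragment shape (an `index` per call, with `function.arguments` streamed as pieces of one JSON string); `merge_tool_call_deltas` is a hypothetical helper and the sample chunks are hand-written for illustration.

```python
import json
from typing import Any


def merge_tool_call_deltas(chunks: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Rebuild complete tool calls from streamed delta.tool_calls fragments."""
    calls: dict[int, dict[str, str]] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(frag["index"], {"id": "", "name": "", "arguments": ""})
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call["name"] = fn["name"]
                # Arguments arrive as pieces of one JSON string.
                call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hand-written sample chunks (assumed shape, not captured from the API).
chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Beijing"}'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

calls = merge_tool_call_deltas(chunks)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))  # prints: get_weather {'city': 'Beijing'}
```

Parse the arguments only after `finish_reason` is `tool_calls`; an earlier fragment is usually not valid JSON on its own.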
Error Response
```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
```

See Error Codes for details.
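When calling the endpoint without an SDK, one possible way to surface this body in application code is to turn it into an exception. The names `ApiError` and `raise_for_error` below are hypothetical; the sketch relies only on the `error` object shown above.

```python
import json
from typing import Any, Optional


class ApiError(Exception):
    """Hypothetical exception type carrying the fields of the error body."""

    def __init__(self, message: str, type_: Optional[str], code: Optional[str]):
        super().__init__(message)
        self.type = type_
        self.code = code


def raise_for_error(body: str) -> Any:
    """Return the parsed body, or raise ApiError if it contains an "error" object."""
    data = json.loads(body)
    error = data.get("error") if isinstance(data, dict) else None
    if error:
        raise ApiError(error.get("message", "unknown error"), error.get("type"), error.get("code"))
    return data


error_body = (
    '{"error": {"message": "Invalid API key provided", '
    '"type": "invalid_request_error", "param": null, "code": "invalid_api_key"}}'
)
try:
    raise_for_error(error_body)
except ApiError as exc:
    print(exc.code, "-", exc)  # prints: invalid_api_key - Invalid API key provided
```

Branching on `code` rather than on the message text keeps the handling stable if the wording changes.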
Compatibility
| Feature | Wuliang AI | OpenAI Native |
|---|---|---|
| Streaming SSE format | Fully compatible | Native protocol |
| stream_options | Supports include_usage | Supported |
| tools / function | Pass-through support | Native support |
| response_format | Supports json_object | Native support |
| reasoning_effort | Supported (reasoning models only) | Supported (o1/o3 series) |
| Multimodal input | Supports images, audio | Native support |
| Endpoint path | /v1/chat/completions | /v1/chat/completions |
TIP
The platform maintains full compatibility with the OpenAI protocol. Successful responses return the upstream format directly, without additional code / data wrappers.
Best Practices
- Set `stream_options`: Add `{"include_usage": true}` to receive token usage in the final chunk for billing and monitoring.
- Handle connection drops: Streaming can be interrupted by network issues. Implement reconnection and timeout logic.
- Set `max_tokens` appropriately: Prevent excessive generation and unnecessary costs.
- Use the `system` role: Define model behavior via system prompts rather than repeating instructions in user messages.
- Implement exponential backoff: On 429 errors, retry with increasing delays (1s → 2s → 4s).
- Consume the stream promptly: Process chunks as they arrive instead of buffering the entire response in memory.
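The exponential backoff advice above could be implemented roughly as follows. This is a sketch, not a platform-provided utility: `RateLimited` is a hypothetical stand-in for however your HTTP client reports a 429 (with the `openai` Python SDK that would be `openai.RateLimitError`), and the `sleep` parameter is there only so the delays can be observed without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response."""


def with_backoff(call: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RateLimited with delays of 1s, 2s, 4s, ..."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries, surface the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed; record the delays instead of sleeping.
delays: list[float] = []
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


result = with_backoff(flaky, sleep=delays.append)
print(result, delays)  # prints: ok [1.0, 2.0]
```

In production code it is common to add random jitter to each delay so that many clients do not retry at the same instant.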
Rate Limits
See Rate Limits for details.
Related Docs
- Chat Completions (Non-Streaming) - Get a complete response in one call
- Manage API Keys - Create and configure API keys