Chat Completions (Non-Streaming)
Get complete model responses in a single request via the OpenAI Chat Completions protocol. Ideal for backend processing, data analysis, and batch tasks.
Quick Start
Step 1: Get your API Key from the Console.
Step 2: Send a non-streaming request:
bash
curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxx" \
-d '{
"model": "deepseek-v4-flash",
"stream": false,
"messages": [
{"role": "user", "content": "Briefly explain quantum computing"}
]
}'
python
from openai import OpenAI
client = OpenAI(
api_key="sk-xxx",
base_url="https://open.dieyuyun.com/v1"
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Briefly explain quantum computing"}
]
# stream defaults to False, no need to set explicitly
)
print(response.choices[0].message.content)
print(f"Usage: {response.usage.total_tokens} tokens")
javascript
import OpenAI from 'openai'
const client = new OpenAI({
apiKey: 'sk-xxx',
baseURL: 'https://open.dieyuyun.com/v1',
})
const response = await client.chat.completions.create({
model: 'deepseek-v4-flash',
messages: [{ role: 'user', content: 'Briefly explain quantum computing' }],
})
console.log(response.choices[0].message.content)
console.log(`Usage: ${response.usage.total_tokens} tokens`)
Step 3: Read the complete response from choices[0].message.content.
Endpoint
| Item | Value |
|---|---|
| Method | POST |
| Path | /v1/chat/completions |
| Base URL | https://open.dieyuyun.com |
| Protocol | OpenAI Chat Completions |
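As a minimal sketch of how the values in this table combine, the snippet below assembles (but does not send) a request using only Python's standard library. The helper name `build_request` is hypothetical, `sk-xxx` is the placeholder key used throughout this page, and the Bearer header follows the Authentication section.

```python
import json
import urllib.request

BASE_URL = "https://open.dieyuyun.com"
PATH = "/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Assemble a POST request for the endpoint (hypothetical helper, nothing is sent)."""
    return urllib.request.Request(
        url=BASE_URL + PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(
    "sk-xxx",  # placeholder key
    {
        "model": "deepseek-v4-flash",
        "stream": False,
        "messages": [{"role": "user", "content": "Briefly explain quantum computing"}],
    },
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the SDK examples in Quick Start do the same work with less code.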
Authentication
All requests require a Bearer Token in the request header:
http
Authorization: Bearer sk-xxx
Supported Models
| Model | Provider | Context Length | Description |
|---|---|---|---|
| deepseek-v4-flash | DeepSeek | 1M | Fast responses, low cost; suited to high-frequency calls |
| deepseek-v4-pro | DeepSeek | 1M | Complex reasoning, high quality output |
| qwen3.7-max | Qwen | 1M | Qwen flagship model |
| glm-5.7 | Zhipu AI | 200K | GLM flagship model |
| kimi-k2.6 | Moonshot AI | 256K | Ultra-long context, document analysis |
| minimax-m3 | MiniMax | 1M | MiniMax flagship model |
TIP
See the full model list in the Console.
Request Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model identifier, e.g. deepseek-v4-flash |
| messages | array | Yes | — | List of messages, each with role and content |
| stream | boolean | No | false | Set to false or omit for non-streaming response |
| temperature | number | No | 1 | Sampling randomness, range 0 to 2 |
| top_p | number | No | 1 | Nucleus sampling parameter |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| max_completion_tokens | integer | No | Model default | Maximum number of completion tokens (newer parameter) |
| stop | string / array | No | null | Stop sequences |
| presence_penalty | number | No | 0 | Presence penalty, range -2 to 2 |
| frequency_penalty | number | No | 0 | Frequency penalty, range -2 to 2 |
| n | integer | No | 1 | Number of candidate completions |
| tools | array | No | — | Tool (function) definitions |
| tool_choice | string / object | No | auto | Tool calling strategy |
| response_format | object | No | — | Response format control |
| reasoning_effort | string | No | — | Reasoning effort (reasoning models only) |
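The ranges in the table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_payload` is a hypothetical helper, not part of any SDK, and it only validates the three fields whose ranges are stated above.

```python
from typing import Any


def build_payload(model: str, messages: list, **options: Any) -> dict:
    """Build a non-streaming request body, checking the ranges listed in the table."""
    if not messages:
        raise ValueError("messages must contain at least one message")

    # Ranges from the parameter table: temperature 0 to 2, penalties -2 to 2.
    ranges = {
        "temperature": (0, 2),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    # Options left as None are dropped so the documented defaults apply.
    chosen = {key: value for key, value in options.items() if value is not None}
    # stream is written last so this helper always produces a non-streaming body.
    return {"model": model, "messages": messages, **chosen, "stream": False}


payload = build_payload(
    "deepseek-v4-flash",
    [{"role": "user", "content": "Briefly explain quantum computing"}],
    temperature=0.7,
    max_tokens=1000,
)
print(payload)
```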
Request Examples
Basic Conversation
bash
curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-xxx" \
-d '{
"model": "deepseek-v4-flash",
"stream": false,
"messages": [
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "Tell me about the history of artificial intelligence."}
]
}'
python
from openai import OpenAI
client = OpenAI(
api_key="sk-xxx",
base_url="https://open.dieyuyun.com/v1"
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "Tell me about the history of artificial intelligence."}
],
temperature=0.7,
max_tokens=1000
)
print(response.choices[0].message.content)
print(f"Input: {response.usage.prompt_tokens} tokens")
print(f"Output: {response.usage.completion_tokens} tokens")
javascript
import OpenAI from 'openai'
const client = new OpenAI({
apiKey: 'sk-xxx',
baseURL: 'https://open.dieyuyun.com/v1',
})
const response = await client.chat.completions.create({
model: 'deepseek-v4-flash',
messages: [
{ role: 'system', content: 'You are a helpful AI assistant.' },
{ role: 'user', content: 'Tell me about the history of artificial intelligence.' },
],
temperature: 0.7,
max_tokens: 1000,
})
console.log(response.choices[0].message.content)
console.log(`Input: ${response.usage.prompt_tokens} tokens`)
console.log(`Output: ${response.usage.completion_tokens} tokens`)
JSON Mode Output
Force the model to return JSON using response_format:
json
{
"model": "deepseek-v4-flash",
"stream": false,
"response_format": { "type": "json_object" },
"messages": [
{ "role": "system", "content": "Return the result as JSON with name and description fields." },
{ "role": "user", "content": "List three common machine learning algorithms." }
]
}
Multi-Turn Conversation
json
{
"model": "deepseek-v4-flash",
"stream": false,
"messages": [
{ "role": "system", "content": "You are a technical interviewer." },
{ "role": "user", "content": "I'd like to learn about React Hooks." },
{ "role": "assistant", "content": "React Hooks were introduced in React 16.8..." },
{ "role": "user", "content": "Can you explain useEffect in more detail?" }
]
}
Response Format
Successful Response
json
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1712345678,
"model": "deepseek-v4-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The history of artificial intelligence typically begins in the 1950s. The 1956 Dartmouth Conference is widely considered the founding moment of AI as an independent field..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 256,
"total_tokens": 284
}
}
Field Reference
| Field | Description |
|---|---|
| id | Unique request identifier |
| object | Always chat.completion |
| created | Unix timestamp |
| model | Actual model ID used |
| choices[].message.role | Response role, always assistant |
| choices[].message.content | Complete model response text |
| choices[].message.tool_calls | Tool call requests (if any) |
| choices[].finish_reason | Stop reason: stop, length, tool_calls, content_filter |
| usage.prompt_tokens | Number of input tokens |
| usage.completion_tokens | Number of output tokens |
| usage.total_tokens | Total token count |
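When working with the raw JSON body rather than an SDK object, the fields above can be read as plain dictionary keys. The following is a small sketch; `summarize_completion` is a hypothetical helper, and the sample body is an abbreviated copy of the successful response shown earlier.

```python
def summarize_completion(response: dict) -> dict:
    """Pull the commonly used fields out of a chat.completion body."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage", {})
    return {
        "text": message.get("content"),
        # tool_calls may be absent or null when no tool call was requested.
        "tool_calls": message.get("tool_calls") or [],
        # finish_reason "length" means generation stopped at the token limit.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "deepseek-v4-flash",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The history of artificial intelligence..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 256, "total_tokens": 284},
}

summary = summarize_completion(sample)
print(summary["text"], summary["truncated"], summary["total_tokens"])
```

Checking `truncated` before using the text is a cheap way to notice replies that were cut off by `max_tokens`.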
Error Response
json
{
"error": {
"message": "This model's maximum context length is 128000 tokens, however you requested 150000 tokens.",
"type": "invalid_request_error",
"param": "messages",
"code": "context_length_exceeded"
}
}
See Error Codes for details.
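A caller handling raw JSON can turn the error object into an exception and branch on its `code`. The sketch below assumes the body shape shown in the example; `ChatAPIError` and `raise_for_error` are hypothetical names, not part of any SDK.

```python
from typing import Optional


class ChatAPIError(Exception):
    """Hypothetical exception mirroring the fields of the error object."""

    def __init__(self, message: str, error_type: Optional[str],
                 param: Optional[str], code: Optional[str]):
        super().__init__(message)
        self.error_type = error_type
        self.param = param
        self.code = code


def raise_for_error(body: dict) -> dict:
    """Return the body unchanged, or raise ChatAPIError if it carries an error object."""
    error = body.get("error")
    if error is None:
        return body
    raise ChatAPIError(
        error.get("message", "unknown error"),
        error.get("type"),
        error.get("param"),
        error.get("code"),
    )


error_body = {
    "error": {
        "message": "This model's maximum context length is 128000 tokens, however you requested 150000 tokens.",
        "type": "invalid_request_error",
        "param": "messages",
        "code": "context_length_exceeded",
    }
}

try:
    raise_for_error(error_body)
except ChatAPIError as exc:
    if exc.code == "context_length_exceeded":
        print("Prompt too long; shorten the message list before retrying.")
    else:
        raise
```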
Compatibility
| Feature | Wuliang AI | OpenAI Native |
|---|---|---|
| Non-streaming format | Fully compatible | Native protocol |
| JSON mode | Supported | Native support |
| tools / function | Pass-through support | Native support |
| n > 1 candidates | Supported | Native support |
| reasoning_effort | Supported (reasoning models only) | Supported (o1/o3 series) |
| Endpoint path | /v1/chat/completions | /v1/chat/completions |
TIP
Streaming and non-streaming requests share the same endpoint /v1/chat/completions, distinguished only by the stream parameter. The platform returns the upstream format directly without additional wrappers.
Best Practices
- Set temperature appropriately: Use temperature=0 for deterministic tasks (classification, extraction) and 0.7 to 1.0 for creative tasks.
- Use max_tokens to control costs: Explicitly set the maximum output length to avoid excessive generation.
- Implement retry logic: For 5xx and 429 errors, retry with exponential backoff.
- Manage conversation context: Keep the message list within the model's context window to avoid truncation.
- Structured output: When JSON output is needed, use response_format: {"type": "json_object"} and specify the format in the system prompt.
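For the structured-output practice, the reply still arrives as a string in choices[0].message.content and has to be decoded by the caller. A minimal sketch, assuming a raw JSON response body; `parse_json_reply` is a hypothetical helper and the sample reply is invented for illustration.

```python
import json


def parse_json_reply(response: dict):
    """Decode the JSON document carried in the first choice of a JSON-mode reply."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        # A reply cut off at the token limit is very likely incomplete JSON.
        raise ValueError("reply was truncated; raise max_tokens or shorten the task")
    return json.loads(choice["message"]["content"])


# Invented sample reply, shaped like the successful response documented above.
json_mode_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": '{"name": "Linear regression", "description": "Fits a line to data."}',
            },
            "finish_reason": "stop",
        }
    ]
}

result = parse_json_reply(json_mode_response)
print(result["name"])
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so callers may want to catch it and retry the request.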
Rate Limits
See Rate Limits for details.
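Rate-limited requests (HTTP 429) and 5xx failures are the cases Best Practices suggests retrying with exponential backoff. The sketch below shows one way to do that around any request function. `HTTPStatusError` and `call_with_backoff` are hypothetical names, and the attempt count, base delay, and 10% jitter are illustrative values rather than platform recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    """429 and 5xx are treated as transient; other statuses are not retried."""
    return status == 429 or 500 <= status < 600


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying transient failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except HTTPStatusError as exc:
            is_last_attempt = attempt == max_attempts - 1
            if is_last_attempt or not is_retryable(exc.status):
                raise
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Here `send` would be a function that performs one request and raises `HTTPStatusError` on a non-2xx status. The `sleep` parameter exists so the waiting can be replaced in tests.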
Related Docs
- Chat Completions (Streaming) - Real-time streaming output
- Manage API Keys - Create and configure API keys