Chat Completions (Non-Streaming)

Get complete model responses in a single request via the OpenAI Chat Completions protocol. Ideal for backend processing, data analysis, and batch tasks.

Quick Start

Step 1: Get your API Key from the Console.

Step 2: Send a non-streaming request:

bash

curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role": "user", "content": "Briefly explain quantum computing"}
    ]
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Briefly explain quantum computing"}
    ]
    # stream defaults to False, no need to set explicitly
)

print(response.choices[0].message.content)
print(f"Usage: {response.usage.total_tokens} tokens")

javascript

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const response = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: 'Briefly explain quantum computing' }],
})

console.log(response.choices[0].message.content)
console.log(`Usage: ${response.usage.total_tokens} tokens`)

Step 3: Read the complete response from choices[0].message.content.

Endpoint

Item	Value
Method	POST
Path	`/v1/chat/completions`
Base URL	`https://open.dieyuyun.com` (OpenAI SDK clients set `base_url` to `https://open.dieyuyun.com/v1`)
Protocol	OpenAI Chat Completions

Authentication

Every request must carry the API key as a Bearer token in the Authorization header:

http

Authorization: Bearer sk-xxx
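
For clients that do not use the OpenAI SDK, the endpoint and header above can be combined by hand. The following is a minimal sketch using only the Python standard library; `build_request` is a hypothetical helper name, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://open.dieyuyun.com"  # Base URL from the Endpoint table
PATH = "/v1/chat/completions"           # Path from the Endpoint table


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer token.

    build_request is an illustrative helper, not part of any SDK.
    """
    return urllib.request.Request(
        url=BASE_URL + PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the header logic easy to check without network access.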

Supported Models

Model	Provider	Context Length	Description
deepseek-v4-flash	DeepSeek	1M	Fast responses at low cost; suited to high-frequency calls
deepseek-v4-pro	DeepSeek	1M	Complex reasoning, high quality output
qwen3.7-max	Qwen	1M	Qwen flagship model
glm-5.7	Zhipu AI	200K	GLM flagship model
kimi-k2.6	Moonshot AI	256K	Ultra-long context, document analysis
minimax-m3	MiniMax	1M	MiniMax flagship model

TIP

See the full model list in the Console.

Request Parameters

Field	Type	Required	Default	Description
model	string	Yes	—	Model identifier, e.g. `deepseek-v4-flash`
messages	array	Yes	—	List of messages, each with `role` and `content`
stream	boolean	No	false	Set to `false` or omit for a non-streaming response
temperature	number	No	1	Sampling randomness, range 0~2
top_p	number	No	1	Nucleus sampling parameter
max_tokens	integer	No	Model default	Maximum number of tokens to generate
max_completion_tokens	integer	No	Model default	Maximum completion tokens (newer parameter)
stop	string / array	No	null	Stop sequences
presence_penalty	number	No	0	Presence penalty, range -2~2
frequency_penalty	number	No	0	Frequency penalty, range -2~2
n	integer	No	1	Number of candidate completions
tools	array	No	—	Tool (function) definitions
tool_choice	string / object	No	auto	Tool calling strategy
response_format	object	No	—	Response format control
reasoning_effort	string	No	—	Reasoning effort (reasoning models only)
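
The ranges in the table can be checked on the client before a request is sent. This is a rough sketch, assuming the documented ranges (temperature 0~2, penalties -2~2) are the only constraints worth checking locally; `build_payload` is a hypothetical helper, and the server remains the authority on what is valid.

```python
from typing import Any


def build_payload(model: str, messages: list, **options: Any) -> dict:
    """Assemble a non-streaming request body and check the documented ranges.

    build_payload is an illustrative helper; options set to None are dropped
    so the server-side defaults from the table apply.
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be within 0~2")

    for name in ("presence_penalty", "frequency_penalty"):
        value = options.get(name)
        if value is not None and not -2 <= value <= 2:
            raise ValueError(f"{name} must be within -2~2")

    extras = {key: value for key, value in options.items() if value is not None}
    # stream is forced to False last, since this sketch targets non-streaming calls.
    return {"model": model, "messages": messages, **extras, "stream": False}
```

The resulting dictionary can be passed as the JSON body of a raw HTTP request, or unpacked into `client.chat.completions.create(**payload)` when using the SDK.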

Request Examples

Basic Conversation

bash

curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ]
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
print(f"Input: {response.usage.prompt_tokens} tokens")
print(f"Output: {response.usage.completion_tokens} tokens")

javascript

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const response = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [
    { role: 'system', content: 'You are a helpful AI assistant.' },
    { role: 'user', content: 'Tell me about the history of artificial intelligence.' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
})

console.log(response.choices[0].message.content)
console.log(`Input: ${response.usage.prompt_tokens} tokens`)
console.log(`Output: ${response.usage.completion_tokens} tokens`)

JSON Mode Output

Force the model to return JSON using response_format:

json

{
  "model": "deepseek-v4-flash",
  "stream": false,
  "response_format": { "type": "json_object" },
  "messages": [
    { "role": "system", "content": "Return the result as JSON with name and description fields." },
    { "role": "user", "content": "List three common machine learning algorithms." }
  ]
}
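
With JSON mode the JSON arrives as a string in `choices[0].message.content`, so the client still has to parse it. A minimal sketch, assuming the response has been decoded into a plain dictionary; `parse_json_content` is a hypothetical helper name.

```python
import json


def parse_json_content(response: dict):
    """Parse the JSON string returned in choices[0].message.content.

    parse_json_content is an illustrative helper operating on the decoded
    response body, not on an SDK object.
    """
    choice = response["choices"][0]
    if choice.get("finish_reason") == "length":
        # Output stopped at the token limit, so the JSON is probably incomplete.
        raise ValueError("response was cut off before the JSON was complete")
    content = choice["message"]["content"]
    try:
        return json.loads(content)
    except (TypeError, json.JSONDecodeError) as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
```

Checking `finish_reason` first gives a clearer error than a parse failure when the output was cut short by `max_tokens`.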

Multi-Turn Conversation

json

{
  "model": "deepseek-v4-flash",
  "stream": false,
  "messages": [
    { "role": "system", "content": "You are a technical interviewer." },
    { "role": "user", "content": "I'd like to learn about React Hooks." },
    { "role": "assistant", "content": "React Hooks were introduced in React 16.8..." },
    { "role": "user", "content": "Can you explain useEffect in more detail?" }
  ]
}
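
The request above resends the earlier turns, so the client is responsible for keeping the history. One way to maintain it is sketched below; `append_turn` is a hypothetical helper, and the response is assumed to be the decoded JSON body.

```python
def append_turn(history: list, response: dict, next_user_message: str) -> list:
    """Return a new message list extended with the reply and the next question.

    append_turn is an illustrative helper; it does not mutate `history`.
    """
    reply = response["choices"][0]["message"]
    return history + [
        {"role": "assistant", "content": reply["content"]},
        {"role": "user", "content": next_user_message},
    ]
```

Returning a new list instead of mutating the old one keeps earlier snapshots of the conversation intact, which helps when a failed request needs to be retried with the same history.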

Response Format

Successful Response

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The history of artificial intelligence typically begins in the 1950s. The 1956 Dartmouth Conference is widely considered the founding moment of AI as an independent field..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 256,
    "total_tokens": 284
  }
}

Field Reference

Field	Description
id	Unique request identifier
object	Always `chat.completion`
created	Unix timestamp
model	Actual model ID used
choices[].message.role	Response role, always `assistant`
choices[].message.content	Complete model response text
choices[].message.tool_calls	Tool call requests (if any)
choices[].finish_reason	Stop reason: `stop`, `length`, `tool_calls`, `content_filter`
usage.prompt_tokens	Number of input tokens
usage.completion_tokens	Number of output tokens
usage.total_tokens	Total token count
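
The fields above can be collected into a small structure before further processing. This is a sketch under the assumption that the response is a decoded dictionary shaped like the Successful Response example; `CompletionSummary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionSummary:
    text: Optional[str]    # choices[0].message.content (may be absent on tool calls)
    finish_reason: str     # stop, length, tool_calls, or content_filter
    truncated: bool        # True when generation stopped at the token limit
    has_tool_calls: bool   # True when the model requested a tool call
    total_tokens: int      # usage.total_tokens


def summarize(response: dict) -> CompletionSummary:
    """Extract the commonly used fields from the first choice (illustrative helper)."""
    choice = response["choices"][0]
    message = choice["message"]
    finish_reason = choice["finish_reason"]
    return CompletionSummary(
        text=message.get("content"),
        finish_reason=finish_reason,
        truncated=finish_reason == "length",
        has_tool_calls=bool(message.get("tool_calls")),
        total_tokens=response["usage"]["total_tokens"],
    )
```

Only the first choice is inspected here; a request with `n` greater than 1 would need to loop over `choices`.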

Error Response

json

{
  "error": {
    "message": "This model's maximum context length is 128000 tokens, however you requested 150000 tokens.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}

See Error Codes for details.
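
When calling the endpoint over raw HTTP, the error object can be turned into an exception so callers do not have to inspect every body. A minimal sketch, assuming error bodies follow the shape shown above; `ChatApiError` and `raise_for_error` are hypothetical names.

```python
class ChatApiError(Exception):
    """Illustrative exception carrying the fields of the error object."""

    def __init__(self, message, error_type=None, param=None, code=None):
        super().__init__(message)
        self.error_type = error_type
        self.param = param
        self.code = code


def raise_for_error(body: dict) -> dict:
    """Return the body unchanged, or raise ChatApiError if it holds an error object."""
    error = body.get("error")
    if not error:
        return body
    raise ChatApiError(
        error.get("message", "unknown error"),
        error_type=error.get("type"),
        param=error.get("param"),
        code=error.get("code"),
    )
```

Callers can then branch on `exc.code`, for example shortening the message list when it equals `context_length_exceeded`.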

Compatibility

Feature	Wuliang AI	OpenAI Native
Non-streaming format	Fully compatible	Native protocol
JSON mode	Supported	Native support
tools / function	Pass-through support	Native support
n > 1 candidates	Supported	Native support
reasoning_effort	Supported (reasoning models only)	Supported (o1/o3 series)
Endpoint path	`/v1/chat/completions`	`/v1/chat/completions`

TIP

Streaming and non-streaming requests share the same endpoint /v1/chat/completions, distinguished only by the stream parameter. The platform returns the upstream format directly without additional wrappers.

Best Practices

Set temperature appropriately: Use temperature=0 for deterministic tasks (classification, extraction) and 0.7~1.0 for creative tasks.
Use max_tokens to control costs: Explicitly set the maximum output length to avoid excessive generation.
Implement retry logic: For 5xx and 429 errors, use exponential backoff retry strategies.
Manage conversation context: Keep the message list within the model's context window to avoid truncation or a context_length_exceeded error.
Structured output: When JSON output is needed, use response_format: {"type": "json_object"} and specify the format in the system prompt.
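
The retry advice above could look roughly like the following. This is a sketch, not a prescribed implementation: `send` is a hypothetical callable that performs one request and returns `(status_code, body)`, and the attempt count and delays are illustrative defaults rather than platform recommendations.

```python
import random
import time


def is_retryable(status: int) -> bool:
    """429 and 5xx responses are worth retrying; other statuses are not."""
    return status == 429 or 500 <= status <= 599


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    The delay doubles after each retryable failure, is capped at max_delay,
    and gets random jitter of up to half the delay added.
    """
    status, body = send()
    for attempt in range(max_attempts - 1):
        if not is_retryable(status):
            break
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body
```

The jitter spreads out retries from many clients that hit a rate limit at the same moment; the `sleep` parameter exists so the waiting can be replaced in tests.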

Rate Limits

See Rate Limits for details.

Related Docs

Chat Completions (Streaming) - Real-time streaming output
Manage API Keys - Create and configure API keys
