Chat Completions (Non-Streaming)

Get complete model responses in a single request via the OpenAI Chat Completions protocol. Ideal for backend processing, data analysis, and batch tasks.

Quick Start

Step 1: Get your API Key from the Console.

Step 2: Send a non-streaming request:

```bash
curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role": "user", "content": "Briefly explain quantum computing"}
    ]
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Briefly explain quantum computing"}
    ]
    # stream defaults to False, no need to set explicitly
)

print(response.choices[0].message.content)
print(f"Usage: {response.usage.total_tokens} tokens")
```

```javascript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const response = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: 'Briefly explain quantum computing' }],
})

console.log(response.choices[0].message.content)
console.log(`Usage: ${response.usage.total_tokens} tokens`)
```

Step 3: Read the complete response from choices[0].message.content.

Endpoint

| Item | Value |
| --- | --- |
| Method | POST |
| Path | /v1/chat/completions |
| Base URL | https://open.dieyuyun.com |
| Protocol | OpenAI Chat Completions |

Authentication

All requests require a Bearer Token in the request header:

```http
Authorization: Bearer sk-xxx
```
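For callers who are not using an SDK, the endpoint and authentication details above can be combined into a single request object. The following is a minimal sketch using only the Python standard library; the helper name `build_chat_request` and the environment variable `DIEYUYUN_API_KEY` are assumptions made for illustration and are not part of the platform. The request is built but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://open.dieyuyun.com"
CHAT_PATH = "/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    if not api_key:
        raise ValueError("API key is empty")
    return urllib.request.Request(
        url=BASE_URL + CHAT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # Bearer token header
        },
        method="POST",
    )


# DIEYUYUN_API_KEY is a hypothetical variable name; "sk-xxx" is this page's placeholder.
api_key = os.environ.get("DIEYUYUN_API_KEY", "sk-xxx")
request = build_chat_request(
    api_key,
    {"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Hello"}]},
)
print(request.get_method(), request.full_url)
```

Reading the key from the environment keeps it out of source control. Passing the resulting object to `urllib.request.urlopen` would send it.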

Supported Models

| Model | Provider | Context Length | Description |
| --- | --- | --- | --- |
| deepseek-v4-flash | DeepSeek | 1M | Fast responses, low cost, suited to high-frequency calls |
| deepseek-v4-pro | DeepSeek | 1M | Complex reasoning, high-quality output |
| qwen3.7-max | Qwen | 1M | Qwen flagship model |
| glm-5.7 | Zhipu AI | 200K | GLM flagship model |
| kimi-k2.6 | Moonshot AI | 256K | Ultra-long context, document analysis |
| minimax-m3 | MiniMax | 1M | MiniMax flagship model |

TIP

See the full model list in the Console.

Request Parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | Model identifier, e.g. deepseek-v4-flash |
| messages | array | Yes | - | List of messages, each with role and content |
| stream | boolean | No | false | Set to false or omit for a non-streaming response |
| temperature | number | No | 1 | Sampling randomness, range 0 to 2 |
| top_p | number | No | 1 | Nucleus sampling parameter |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| max_completion_tokens | integer | No | Model default | Maximum completion tokens (newer parameter) |
| stop | string / array | No | null | Stop sequences |
| presence_penalty | number | No | 0 | Presence penalty, range -2 to 2 |
| frequency_penalty | number | No | 0 | Frequency penalty, range -2 to 2 |
| n | integer | No | 1 | Number of candidate completions |
| tools | array | No | - | Tool (function) definitions |
| tool_choice | string / object | No | auto | Tool calling strategy |
| response_format | object | No | - | Response format control |
| reasoning_effort | string | No | - | Reasoning effort (reasoning models only) |
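Checking the documented ranges on the client side can catch mistakes before a request is sent. The sketch below assembles a non-streaming request body from the parameters listed above; the function `build_payload` is a hypothetical helper, and whether the server enforces these ranges in the same way is an assumption.

```python
from typing import Any


def build_payload(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Assemble a non-streaming request body, checking the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs role and content")

    # Ranges taken from the parameter table: temperature 0 to 2, penalties -2 to 2.
    ranges = {
        "temperature": (0, 2),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # Drop unset options so the server-side defaults apply.
    payload = {key: value for key, value in options.items() if value is not None}
    # This helper is for non-streaming calls, so stream is always forced to False.
    payload.update(model=model, messages=messages, stream=False)
    return payload


payload = build_payload(
    "deepseek-v4-flash",
    [{"role": "user", "content": "Briefly explain quantum computing"}],
    temperature=0.7,
    max_tokens=1000,
)
print(sorted(payload))
```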

Request Examples

Basic Conversation

```bash
curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": false,
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ]
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
print(f"Input: {response.usage.prompt_tokens} tokens")
print(f"Output: {response.usage.completion_tokens} tokens")
```

```javascript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const response = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [
    { role: 'system', content: 'You are a helpful AI assistant.' },
    { role: 'user', content: 'Tell me about the history of artificial intelligence.' },
  ],
  temperature: 0.7,
  max_tokens: 1000,
})

console.log(response.choices[0].message.content)
console.log(`Input: ${response.usage.prompt_tokens} tokens`)
console.log(`Output: ${response.usage.completion_tokens} tokens`)
```

JSON Mode Output

Force the model to return JSON using response_format:

```json
{
  "model": "deepseek-v4-flash",
  "stream": false,
  "response_format": { "type": "json_object" },
  "messages": [
    { "role": "system", "content": "Return the result as JSON with name and description fields." },
    { "role": "user", "content": "List three common machine learning algorithms." }
  ]
}
```
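In JSON mode the result still arrives as a string in choices[0].message.content, so the caller has to decode it. A minimal sketch of that step follows; `parse_json_content` is a hypothetical helper, and the sample response is invented for illustration rather than captured from the platform.

```python
import json


def parse_json_content(response: dict) -> object:
    """Decode choices[0].message.content from a JSON-mode response."""
    choice = response["choices"][0]
    if choice.get("finish_reason") == "length":
        # The output hit the token limit, so the JSON is probably cut off.
        raise ValueError("response truncated; raise max_tokens and retry")
    try:
        return json.loads(choice["message"]["content"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc


# Invented sample shaped like the documented successful response.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": '{"algorithms": [{"name": "k-means", "description": "Clustering"}]}',
            },
            "finish_reason": "stop",
        }
    ]
}
print(parse_json_content(sample)["algorithms"][0]["name"])
```

Checking finish_reason first gives a clearer error than a decode failure when the output was cut off by max_tokens.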

Multi-Turn Conversation

```json
{
  "model": "deepseek-v4-flash",
  "stream": false,
  "messages": [
    { "role": "system", "content": "You are a technical interviewer." },
    { "role": "user", "content": "I'd like to learn about React Hooks." },
    { "role": "assistant", "content": "React Hooks were introduced in React 16.8..." },
    { "role": "user", "content": "Can you explain useEffect in more detail?" }
  ]
}
```
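Each request must carry the whole conversation in messages, so the client has to append every assistant reply to its own history before sending the next user turn. The sketch below shows one way to do that; the `Conversation` class is hypothetical, and trimming by message count is a simplifying assumption, since a production client would more likely trim by token count.

```python
from typing import Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps a system prompt plus a bounded window of recent turns."""

    def __init__(self, system_prompt: str, max_turn_messages: int = 20) -> None:
        self.system: Message = {"role": "system", "content": system_prompt}
        self.turns: List[Message] = []
        self.max_turn_messages = max_turn_messages

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError("role must be 'user' or 'assistant'")
        self.turns.append({"role": role, "content": content})
        # Drop the oldest turns once the window is full (no-op otherwise).
        del self.turns[: max(0, len(self.turns) - self.max_turn_messages)]

    def to_payload(self, model: str) -> dict:
        # The system message is always sent first and is never trimmed.
        return {"model": model, "stream": False, "messages": [self.system, *self.turns]}


chat = Conversation("You are a technical interviewer.")
chat.add("user", "I'd like to learn about React Hooks.")
chat.add("assistant", "React Hooks were introduced in React 16.8...")
chat.add("user", "Can you explain useEffect in more detail?")
print(len(chat.to_payload("deepseek-v4-flash")["messages"]))
```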

Response Format

Successful Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The history of artificial intelligence typically begins in the 1950s. The 1956 Dartmouth Conference is widely considered the founding moment of AI as an independent field..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 256,
    "total_tokens": 284
  }
}
```

Field Reference

| Field | Description |
| --- | --- |
| id | Unique request identifier |
| object | Always chat.completion |
| created | Unix timestamp |
| model | Actual model ID used |
| choices[].message.role | Response role, always assistant |
| choices[].message.content | Complete model response text |
| choices[].message.tool_calls | Tool call requests (if any) |
| choices[].finish_reason | Stop reason: stop, length, tool_calls, content_filter |
| usage.prompt_tokens | Number of input tokens |
| usage.completion_tokens | Number of output tokens |
| usage.total_tokens | Total token count |
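The fields above are enough to tell a complete answer from a truncated or filtered one. As a rough sketch, a caller working with the raw JSON response might reduce it to the parts it acts on; `summarize_completion` is a hypothetical helper name, and only the first choice is inspected.

```python
def summarize_completion(response: dict) -> dict:
    """Pull the commonly used fields out of a chat.completion response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage") or {}
    reason = choice.get("finish_reason")
    return {
        "text": message.get("content"),
        "tool_calls": message.get("tool_calls") or [],
        "truncated": reason == "length",         # stopped by the token limit
        "filtered": reason == "content_filter",  # stopped by content filtering
        "total_tokens": usage.get("total_tokens"),
    }


# Abbreviated copy of the successful response shown earlier on this page.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The history of artificial intelligence..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 256, "total_tokens": 284},
}
summary = summarize_completion(response)
print(summary["truncated"], summary["total_tokens"])
```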

Error Response

```json
{
  "error": {
    "message": "This model's maximum context length is 128000 tokens, however you requested 150000 tokens.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
```

See Error Codes for details.
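When calling the endpoint without an SDK, the error object can be mapped onto an exception so that callers can branch on code or type. The following sketch assumes the error body has the shape shown above; `ChatAPIError` and `parse_error` are hypothetical names, and the HTTP status is supplied by the caller because it is not part of the body.

```python
import json
from typing import Optional


class ChatAPIError(Exception):
    """Hypothetical exception type carrying the fields of the error object."""

    def __init__(self, status: int, message: str, type_: Optional[str] = None,
                 param: Optional[str] = None, code: Optional[str] = None) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.param = param
        self.code = code


def parse_error(status: int, body: str) -> ChatAPIError:
    """Turn an error response body into a ChatAPIError, tolerating odd bodies."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = None
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        # Not the documented shape (for example an HTML page from a proxy).
        return ChatAPIError(status, body[:200] or "empty response body")
    return ChatAPIError(
        status,
        str(error.get("message", "")),
        error.get("type"),
        error.get("param"),
        error.get("code"),
    )


body = '{"error": {"message": "too long", "type": "invalid_request_error", "param": "messages", "code": "context_length_exceeded"}}'
err = parse_error(400, body)  # 400 is an assumed status for this example
print(err.code)
```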

Compatibility

| Feature | Wuliang AI | OpenAI Native |
| --- | --- | --- |
| Non-streaming format | Fully compatible | Native protocol |
| JSON mode | Supported | Native support |
| tools / function | Pass-through support | Native support |
| n > 1 candidates | Supported | Native support |
| reasoning_effort | Supported (reasoning models only) | Supported (o1/o3 series) |
| Endpoint path | /v1/chat/completions | /v1/chat/completions |

TIP

Streaming and non-streaming requests share the same endpoint /v1/chat/completions, distinguished only by the stream parameter. The platform returns the upstream format directly without additional wrappers.

Best Practices

  • Set temperature appropriately: Use temperature=0 for deterministic tasks (classification, extraction) and 0.7 to 1.0 for creative tasks.
  • Use max_tokens to control costs: Explicitly cap the output length to avoid excessive generation.
  • Implement retry logic: For 5xx and 429 errors, retry with exponential backoff.
  • Manage conversation context: Keep the message list within the model's context window to avoid truncation or a context_length_exceeded error.
  • Structured output: When JSON output is needed, use response_format: {"type": "json_object"} and specify the format in the system prompt.
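The retry advice above can be sketched as a small wrapper around whatever function sends the request. Everything here is illustrative: `HTTPStatusError` and `call_with_backoff` are hypothetical names, and the delay values are assumptions rather than platform recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error raised by the caller's send function on a non-2xx status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Retry rate limiting (429) and server-side failures (5xx) only.
    return status == 429 or 500 <= status <= 599


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying retryable failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return send()
        except HTTPStatusError as exc:
            attempt += 1
            if attempt >= max_attempts or not is_retryable(exc.status):
                raise
            # Delay doubles per failure (1s, 2s, 4s, ...) and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from many clients.
            sleep(delay + random.uniform(0, delay / 2))
```

A caller would pass a zero-argument function that performs the request and raises `HTTPStatusError` on failure. The `sleep` parameter exists so that tests can replace the real delay. Client errors such as 400 are re-raised immediately, since repeating an invalid request will not help.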

Rate Limits

See Rate Limits for details.