Chat Completions (Streaming)

Stream real-time responses from AI models via the OpenAI Chat Completions protocol. Ideal for chat applications and interactive use cases.

Quick Start

Step 1: Get your API Key from the Console.

Step 2: Send a streaming request:

bash

curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Briefly explain quantum computing"}
    ]
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Briefly explain quantum computing"}
    ],
    stream=True
)

for chunk in stream:
    # Guard against chunks that carry no choices (e.g. usage-only chunks).
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

javascript

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const stream = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: 'Briefly explain quantum computing' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '')
}

Step 3: Parse the SSE data stream and concatenate choices[0].delta.content from each chunk to build the complete response.
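If you are not using an SDK, Step 3 can be sketched in plain Python as below. This is a minimal sketch, not a definitive parser: it assumes each event arrives as a single `data: ` line holding one JSON chunk and that the stream ends with `data: [DONE]`. `collect_text` is a hypothetical helper name, and the sample lines are trimmed illustrations rather than captured output.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate choices[0].delta.content from SSE lines into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:  # e.g. a chunk that only reports usage
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Illustrative lines, shortened to the fields this helper reads.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" intelligence"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body, read incrementally rather than from a list.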

Endpoint

Item	Value
Method	POST
Path	`/v1/chat/completions`
Base URL	`https://open.dieyuyun.com`
Protocol	OpenAI Chat Completions

Authentication

Every request must carry a Bearer token in the `Authorization` request header:

http

Authorization: Bearer sk-xxx
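To keep the key out of source code, the header can be built from an environment variable. The sketch below is one possible approach; the variable name `DIEYUYUN_API_KEY` and the helper `auth_headers` are assumptions made for illustration, not names defined by the platform.

```python
import os


def auth_headers(api_key=None):
    """Build request headers that carry the Bearer token."""
    # DIEYUYUN_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("DIEYUYUN_API_KEY", "")
    if not key:
        raise ValueError("API key is missing")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


print(auth_headers("sk-xxx")["Authorization"])
```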

Supported Models

Model	Provider	Context Length	Description
deepseek-v4-flash	DeepSeek	1M	Fast responses at low cost; suited to high-frequency calls
deepseek-v4-pro	DeepSeek	1M	Complex reasoning, high-quality output
qwen3.7-max	Qwen	1M	Qwen flagship model
glm-5.7	Zhipu AI	200K	GLM flagship model
kimi-k2.6	Moonshot AI	256K	Ultra-long context, document analysis
minimax-m3	MiniMax	1M	MiniMax flagship model

TIP

See the full model list in the Console. Model availability may change.

Request Parameters

Field	Type	Required	Default	Description
model	string	Yes	—	Model identifier, e.g. `deepseek-v4-flash`
messages	array	Yes	—	List of messages, each with `role` and `content`
stream	boolean	No	false	Set to `true` to enable streaming
stream_options	object	No	—	Stream config, e.g. `{"include_usage": true}` for usage in last chunk
temperature	number	No	1	Sampling randomness, range 0 to 2
top_p	number	No	1	Nucleus sampling parameter
max_tokens	integer	No	Model default	Maximum number of tokens to generate
max_completion_tokens	integer	No	Model default	Maximum completion tokens (newer parameter)
stop	string / array	No	null	Stop sequences
presence_penalty	number	No	0	Presence penalty, range -2 to 2
frequency_penalty	number	No	0	Frequency penalty, range -2 to 2
n	integer	No	1	Number of candidate completions
tools	array	No	—	Tool (function) definitions
tool_choice	string / object	No	auto	Tool calling strategy: `auto`, `none`, `required`, or specific function
response_format	object	No	—	Response format, e.g. `{"type": "json_object"}` for JSON output
reasoning_effort	string	No	—	Reasoning effort (reasoning models only): `low`, `medium`, `high`
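The table above can be turned into a small client-side check before a request is sent. This is a minimal sketch: `build_payload` is a hypothetical helper, it validates only the three numeric ranges listed in the table, and the server remains the authority on what it accepts.

```python
def build_payload(model, messages, **options):
    """Assemble a streaming request body and check the documented ranges."""
    ranges = {
        "temperature": (0, 2),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not (low <= value <= high):
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    if not messages:
        raise ValueError("messages must contain at least one message")
    payload = {"model": model, "messages": messages, "stream": True}
    # Drop options left as None so the server applies its own defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_payload(
    "deepseek-v4-flash",
    [{"role": "user", "content": "Briefly explain quantum computing"}],
    temperature=0.2,
    max_tokens=None,
)
print(sorted(payload))
```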

Message Roles

Role	Description
system	System prompt, defines model behavior
user	User messages
assistant	Model's previous responses
tool	Tool call results

Request Examples

Basic Conversation

bash

curl -X POST "https://open.dieyuyun.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deepseek-v4-flash",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ]
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://open.dieyuyun.com/v1"
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    stream=True,
    stream_options={"include_usage": True},
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Tell me about the history of artificial intelligence."}
    ]
)

for chunk in stream:
    # Print text deltas as they arrive; a usage-only chunk may have no choices.
    if chunk.choices:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
    # With include_usage enabled, token usage arrives in the last chunk.
    if getattr(chunk, "usage", None):
        print(f"\n\n[Usage] Input: {chunk.usage.prompt_tokens}, "
              f"Output: {chunk.usage.completion_tokens}")

javascript

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://open.dieyuyun.com/v1',
})

const stream = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  stream: true,
  stream_options: { include_usage: true },
  messages: [
    { role: 'system', content: 'You are a helpful AI assistant.' },
    { role: 'user', content: 'Tell me about the history of artificial intelligence.' },
  ],
})

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content
  if (content) process.stdout.write(content)

  if (chunk.usage) {
    console.log(`\n\n[Usage] Input: ${chunk.usage.prompt_tokens}, ` + `Output: ${chunk.usage.completion_tokens}`)
  }
}

Streaming with Tool Calls

json

{
  "model": "deepseek-v4-flash",
  "stream": true,
  "messages": [{ "role": "user", "content": "What's the weather like in Beijing today?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["city"]
        }
      }
    }
  ]
}
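When a model answers a request like this with a tool call, the call arrives in pieces under `choices[].delta.tool_calls`. The sketch below shows one way to reassemble it. It assumes the fragments follow the OpenAI streaming convention (each fragment carries an `index`, and `function.arguments` is a JSON string split across chunks), which the pass-through behavior suggests but which you should confirm against real output. `accumulate_tool_calls`, the sample chunks, and the id `call_1` are illustrative, not captured from the platform.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed tool_call fragments into complete calls, keyed by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                call = calls.setdefault(
                    frag.get("index", 0), {"id": None, "name": "", "arguments": ""}
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call["name"] = fn["name"]
                # Arguments arrive as pieces of a single JSON string.
                call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hand-written chunks in the assumed shape (not real platform output).
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": "}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Beijing\"}"}}]}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

for call in accumulate_tool_calls(sample_chunks):
    # Parse the arguments only after the stream has finished.
    print(call["name"], json.loads(call["arguments"]))
```

Parsing `arguments` before the final fragment arrives will usually fail, because the partial string is not yet valid JSON.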

Response Format

Streaming SSE Chunks

Streaming responses are delivered as Server-Sent Events (SSE). Each chunk is sent on a line prefixed with `data: ` and followed by a blank line, and the stream ends with `data: [DONE]`:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":"Artificial"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{"content":" intelligence"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712345678,"model":"deepseek-v4-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":28,"completion_tokens":156,"total_tokens":184}}

data: [DONE]

Field Reference

Field	Description
id	Unique request identifier
object	Always `chat.completion.chunk`
created	Unix timestamp
model	Actual model ID used
choices[].delta.role	Role (only in the first chunk)
choices[].delta.content	Incremental text content
choices[].delta.tool_calls	Incremental tool call information
choices[].finish_reason	Stop reason: `stop`, `length`, `tool_calls`, `content_filter`
usage	Token usage; present only in the last chunk, and only when `stream_options.include_usage` is `true`
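The fields above can be reduced to a single result once the stream ends. The following is a minimal sketch that works on already-parsed chunk dictionaries; `summarize_stream` is a hypothetical helper, it follows only the choice with `index` 0, and the sample chunks are shortened versions of the ones shown in this section.

```python
def summarize_stream(chunks):
    """Reduce parsed chunks to text, finish_reason, and usage."""
    text_parts = []
    finish_reason = None
    usage = None
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first candidate
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return {
        "text": "".join(text_parts),
        "finish_reason": finish_reason,
        # "length" means generation stopped at the token limit.
        "truncated": finish_reason == "length",
        "usage": usage,
    }


sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Artificial"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " intelligence"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
     "usage": {"prompt_tokens": 28, "completion_tokens": 156, "total_tokens": 184}},
]
print(summarize_stream(sample_chunks))
```

A `truncated` result is a signal to raise `max_tokens` or to ask the model to continue, depending on the application.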

Error Response

json

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

See Error Codes for details.
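A client can surface these fields as a typed exception instead of passing raw JSON around. The sketch below is one way to do that; `ApiError` and `raise_for_error` are hypothetical names, and the HTTP status 401 used in the example is an assumption, since this page does not state which status accompanies `invalid_api_key`.

```python
import json


class ApiError(Exception):
    """Carries the fields of the documented error object."""

    def __init__(self, status, message, type_=None, code=None, param=None):
        super().__init__(f"HTTP {status} {code or type_ or 'error'}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code
        self.param = param


def raise_for_error(status, body):
    """Raise ApiError for a non-2xx response; return None otherwise."""
    if 200 <= status < 300:
        return None
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # body was not JSON; fall back to the raw text
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    raise ApiError(
        status,
        err.get("message") or body,
        type_=err.get("type"),
        code=err.get("code"),
        param=err.get("param"),
    )


body = ('{"error": {"message": "Invalid API key provided", '
        '"type": "invalid_request_error", "param": null, "code": "invalid_api_key"}}')
try:
    raise_for_error(401, body)  # 401 is an assumed status for this example
except ApiError as exc:
    print(exc.code, "-", exc.message)
```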

Compatibility

Feature	Wuliang AI	OpenAI Native
Streaming SSE format	Fully compatible	Native protocol
stream_options	Supports include_usage	Supported
tools / function	Pass-through support	Native support
response_format	Supports json_object	Native support
reasoning_effort	Supported (reasoning models only)	Supported (o1/o3 series)
Multimodal input	Supports images, audio	Native support
Endpoint path	`/v1/chat/completions`	`/v1/chat/completions`

TIP

The platform maintains full compatibility with the OpenAI protocol. Successful responses are returned in the upstream format directly, without an additional `code` / `data` wrapper.

Best Practices

Set stream_options: Add {"include_usage": true} to receive token usage in the final chunk for billing and monitoring.
Handle connection drops: Streaming can be interrupted by network issues. Implement reconnection and timeout logic.
Set max_tokens appropriately: Prevent excessive generation and unnecessary costs.
Use the system role: Define model behavior via system prompts rather than repeating instructions in user messages.
Implement exponential backoff: On 429 errors, retry with increasing delays (1s → 2s → 4s).
Consume the stream promptly: Process chunks as they arrive instead of buffering the entire response in memory.
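The exponential backoff practice can be sketched as a small wrapper. This is an illustration under stated assumptions, not the platform's recommended implementation: `with_backoff` is a hypothetical helper, the caller supplies the test for what counts as a rate-limit error, and the delays follow the 1s, 2s, 4s pattern listed above.

```python
import time


def with_backoff(call, is_rate_limited, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a rate-limit error wait 1s, 2s, 4s, ... and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not rate limits.
            if attempt == max_retries or not is_rate_limited(exc):
                raise
            sleep(base_delay * (2 ** attempt))


# Demonstration with a fake error and a recorded sleep instead of real waiting.
class FakeRateLimit(Exception):
    pass


attempts = {"n": 0}
delays = []


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeRateLimit()
    return "ok"


print(with_backoff(flaky, lambda e: isinstance(e, FakeRateLimit), sleep=delays.append))
print(delays)
```

With the `openai` Python SDK (version 1 or later), `is_rate_limited` could be `lambda e: isinstance(e, openai.RateLimitError)`. Retry only the request that opens the stream; once chunks have started to arrive, a retry would restart the response from the beginning.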

Rate Limits

See Rate Limits for details.

Related Docs

Chat Completions (Non-Streaming) - Get a complete response in one call
Manage API Keys - Create and configure API keys
