Try It Online (TTS Text-to-Speech)
Today, you deserve gentle care. Give me the text—listen with your heart.
Tonight, I’ll tell you a story—take it slow, the best part’s ahead.
Some words aren’t meant to be loud. Just listen—carefully.
Good morning! Today is a good day—let’s begin.
Select the text language, enter your text, choose a voice/dialect, then generate and download the MP3.
TTS Use Cases
Read books/articles
Comfortable for night-time listening; handles long texts without fatigue.
Short-video/ad dubbing
Local dialects, expressive emotion, batch-ready.
Course/presentation reading
Clear, stable delivery.
Navigation/reminders
Natural and friendly, not robotic.
AI companion/assistant
Human-like speech with real-time responses.
Accessibility
Friendlier for visually impaired users.
Hear the Difference (TTS Samples)
Pick voices/dialects; download MP3/WAV; adjust rate/volume/pitch.
TTS Dialects & Voices
Chinese dialects: Cantonese, Sichuan, Wu (Shanghai), Minnan, Beijing, Tianjin, Nanjing, Shaanxi, etc.
Rate/volume/pitch/bitrate configurable; punctuation respected for natural pauses.
TTS Quick Start
Qwen Chat
Generate a reply and tap “Read aloud”.
Mobile reading
WeChat: long-press → Read; iPhone: Settings → Accessibility → Spoken Content; Android: system Text-to-Speech settings.
TTS in 3 Steps
1. Copy text
Choose the text you want to read.
2. Select voice/dialect
E.g., Chinese female, Cantonese male.
3. Play & download
Save as MP3/WAV when satisfied.
TTS Developer Integration
Realtime API:
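https://dashscope.aliyuncs.com/compatible-mode/v1/services/aigc/multimodal-conversation (500,000 free characters per month). Split text into segments (under 500 characters each) and use punctuation to control pacing; specify language/dialect to improve mixed-language and dialect output.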
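The segmentation advice above can be sketched as a small splitter that packs punctuation-delimited pieces into chunks under the limit. This is a minimal sketch, not part of any SDK: the function name split_text and the punctuation set are assumptions, and a piece with no punctuation at all is hard-wrapped as a fallback.

```python
import re

# Split points: common Chinese and ASCII clause/sentence punctuation.
# The exact set is an assumption; extend it for other languages.
_BOUNDARY = re.compile(r"(?<=[。!?;,!?;.,])")

def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Hypothetical helper: split text into segments of at most max_chars,
    cutting at punctuation where possible so pauses stay natural."""
    segments: list[str] = []
    current = ""
    for piece in _BOUNDARY.split(text):
        if not piece:
            continue
        # Fallback: hard-wrap a single piece that alone exceeds the limit.
        while len(piece) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(piece[:max_chars])
            piece = piece[max_chars:]
        # Start a new segment when adding this piece would overflow.
        if len(current) + len(piece) > max_chars:
            segments.append(current)
            current = piece
        else:
            current += piece
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as a separate synthesis request; because pieces are only regrouped, joining the segments reproduces the original text exactly.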
WebSocket:
wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
Audio params:
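format='wav', sample_rate=22050, bitrate='128k' (bitrate matters for compressed formats such as MP3; an uncompressed WAV's bitrate is fixed by sample rate and bit depth); presence_penalty=0.6 to reduce repetition.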
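Because an uncompressed WAV file's bitrate follows from its sample rate, sample width, and channel count, the sketch below computes that figure and wraps raw PCM in a WAV container with Python's standard wave module. It assumes 16-bit mono little-endian PCM input; the helper names pcm16_to_wav and pcm_bitrate_kbps are hypothetical, and the API's actual output format should be checked before relying on them.

```python
import io
import wave

def pcm16_to_wav(pcm: bytes, sample_rate: int = 22050, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()

def pcm_bitrate_kbps(sample_rate: int, bits: int = 16, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM; for WAV this is fixed, not a tunable setting."""
    return sample_rate * bits * channels / 1000
```

At 22050 Hz, 16-bit mono PCM works out to 352.8 kbps, so a '128k' setting is presumably meant for a compressed output such as MP3.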
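import os
import dashscope

# Non-realtime HTTP call. Assumes a recent DashScope SDK, where Qwen TTS models are
# called through MultiModalConversation.call. qwen3-tts-flash-realtime is the WebSocket
# model, so this HTTP call uses qwen3-tts-flash; verify model and voice names in the docs.
response = dashscope.MultiModalConversation.call(
    model='qwen3-tts-flash',
    api_key=os.getenv('DASHSCOPE_API_KEY'),  # key must belong to the endpoint's region
    text='你好,这是 Qwen3-TTS 测试。',  # "Hello, this is a Qwen3-TTS test."
    voice='Cherry',  # a voice ID; '中文女声' ("Chinese female voice") describes a voice, it is not an ID
    language_type='Chinese',
)
print(response.output.audio['url'])  # URL of the generated audio file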
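If the synthesis response gives you a URL for the generated audio rather than the audio bytes themselves, the file still has to be fetched and saved. A minimal sketch using only the standard library; the helper name save_audio and the 30-second timeout are illustrative assumptions.

```python
import pathlib
import urllib.request

def save_audio(url: str, dest: str, timeout: float = 30.0) -> int:
    """Hypothetical helper: download the audio at url to dest and return bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    pathlib.Path(dest).write_bytes(data)
    return len(data)
```

For example, save_audio(audio_url, "output.wav") writes the file next to your script; download promptly, since hosted result URLs are typically temporary.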
TTS FAQ
API Key invalid or region mismatch?
Beijing-region keys use the sk- prefix, international-region keys use sk-intl-; generate or reset a key for the matching region at https://dashscope.aliyuncs.com.
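One way to avoid region mismatches is to pick the region explicitly and derive the host from it, failing early when the key is missing. A minimal sketch, assuming the key lives in the DASHSCOPE_API_KEY environment variable; the two hosts come from the URLs on this page, while the region labels and the api_config helper are hypothetical.

```python
import os

# Hosts taken from the endpoints listed on this page; region labels are our own.
REGION_HOSTS = {
    "beijing": "dashscope.aliyuncs.com",
    "intl": "dashscope-intl.aliyuncs.com",
}

def api_config(region: str, env_var: str = "DASHSCOPE_API_KEY") -> tuple[str, str]:
    """Return (api_key, host) for an explicitly chosen region, or raise a clear error."""
    try:
        host = REGION_HOSTS[region]
    except KeyError:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(REGION_HOSTS)}"
        ) from None
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; create a key for the {region} region")
    return key, host
```

Passing the region in explicitly, rather than guessing it from the key, keeps the key and endpoint choice in one place.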
High latency or disconnects?
Use WebSocket and add reconnection logic; for local deployment, choose the flash-realtime version.
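Reconnection is usually done with capped exponential backoff. The sketch below is generic: connect stands for whatever opens your WebSocket session (a hypothetical callable), and the retry limits and the default of retrying on OSError (which covers ConnectionError) are assumptions; pass your WebSocket library's own close/error exceptions via retry_on.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_reconnect(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(), retrying failures with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from reconnecting in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the backoff schedule can be tested without actually waiting.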
Unnatural sound or repetition?
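Split long text into segments (under 500 characters each); use punctuation to control pacing; choose a voice that fits the content; tune presence_penalty.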
Custom voice cloning?
Not built in officially yet; you can train an open-source model such as So-VITS-SVC or XTTS and route Qwen3's output through it to clone a voice.
Slow load or out of memory locally?
Run pip install transformers torch, then from transformers import QwenTTSForConditionalGeneration; GPU requires CUDA 11+; on CPU, use --device cpu; if memory runs out, use FP8 quantization.
Free quota and billing?
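About 500,000 free characters per month; split bulk jobs into batches; alternatively, switch to the open-source model running locally, which is completely free.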
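Splitting bulk jobs into batches can be as simple as grouping text segments under a per-batch character budget and checking the running total against the monthly free allowance. A minimal sketch: the 500,000 figure is the one quoted on this page, while the 50,000-character batch size and the function names are hypothetical choices.

```python
# Free allowance as quoted on this page; confirm the current figure in your console.
FREE_CHARS_PER_MONTH = 500_000

def plan_batches(segments: list[str], max_batch_chars: int = 50_000) -> list[list[str]]:
    """Group segments into batches whose total length stays within max_batch_chars.
    A single segment longer than the budget gets a batch of its own."""
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for seg in segments:
        if current and size + len(seg) > max_batch_chars:
            batches.append(current)
            current, size = [], 0
        current.append(seg)
        size += len(seg)
    if current:
        batches.append(current)
    return batches

def fits_free_quota(segments: list[str], used_this_month: int = 0) -> bool:
    """True if synthesizing these segments keeps usage within the free allowance."""
    return used_this_month + sum(map(len, segments)) <= FREE_CHARS_PER_MONTH
```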
Multilingual/dialect glitches?
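Specify language and dialect explicitly; test with short sentences first; mixed English and Chinese switches most seamlessly; for Russian, a slightly slower rate is recommended.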
Unsupported format or low quality?
Set format/sample_rate/bitrate explicitly; before saving, check that the text contains no garbled characters.
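A quick pre-flight check for garbled text is to look for the Unicode replacement character U+FFFD (what bytes.decode(..., errors="replace") produces for invalid input) and for stray control characters. A minimal sketch with the standard unicodedata module; find_garbled is a hypothetical helper, and it flags only these two heuristic cases.

```python
import unicodedata

def find_garbled(text: str) -> list[tuple[int, str]]:
    """Return (index, char) pairs for likely encoding damage: U+FFFD replacement
    characters and control characters other than newline, carriage return, and tab."""
    problems: list[tuple[int, str]] = []
    for i, ch in enumerate(text):
        if ch == "\ufffd":
            problems.append((i, ch))
        elif unicodedata.category(ch) == "Cc" and ch not in "\n\r\t":
            problems.append((i, ch))
    return problems
```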
Tool integration errors?
Update dashscope/qwen-tts; if needed, run inference under torch.no_grad() so autograd does not keep intermediate tensors in memory.
Demo slow or silent?
Refresh or use an incognito window; if needed, clone the Hugging Face Space and run it locally; check device permissions.
TTS Comparison & Choice
Naturalness
Lower WER across languages; stable in Chinese/multilingual.
Dialects & languages
9 Chinese dialects + 10 languages; seamless code‑switching.
Open & cost
Open source for local use; 500k free chars/month in the cloud.
TTS Technical Overview
Architecture
Transformer + MoE; unified multimodal framework.
Pipeline
Tokenize/encode → MoE prosody → VQ-VAE → mel-spectrogram → HiFi-GAN (22 kHz).
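The last two stages are linked by simple arithmetic: the vocoder turns each mel frame into hop_length audio samples, so the frame count fixes the output length. A worked sketch at the 22,050 Hz rate used on this page; the hop length of 256 is an assumption (a common HiFi-GAN setting at this rate), not a confirmed Qwen3-TTS value, and the helper names are hypothetical.

```python
SAMPLE_RATE = 22_050  # output rate from the pipeline above
HOP_LENGTH = 256      # assumption: common HiFi-GAN hop at 22.05 kHz, not confirmed here

def mel_frames(duration_s: float, sample_rate: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> int:
    """Mel frames for a clip under the centered-STFT convention: samples // hop + 1."""
    samples = round(duration_s * sample_rate)
    return samples // hop + 1

def frame_ms(sample_rate: int = SAMPLE_RATE, hop: int = HOP_LENGTH) -> float:
    """Audio time covered by one mel frame (hop samples upsampled by the vocoder)."""
    return 1000 * hop / sample_rate
```

Under these assumptions, one second of speech is 87 mel frames of about 11.6 ms each.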
Key points
Adaptive rhythm, RLHF for stability, CUDA Graphs for low latency.
TTS Pricing & Compliance
Pricing & quota
About 500k free chars/month (~400 minutes of audio); low-cost models; the local open-source version is free.
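The quota figures above imply a rate of 500,000 / 400 = 1,250 characters per minute of audio, which gives a rough way to estimate how much audio a text or the remaining quota corresponds to. A minimal sketch; the ratio comes only from the two quoted figures, and real speaking rates vary with language, voice, and speed settings.

```python
QUOTA_CHARS = 500_000
QUOTA_MINUTES = 400
# Ratio implied by the quoted figures; treat results as rough estimates.
CHARS_PER_MINUTE = QUOTA_CHARS / QUOTA_MINUTES  # 1250.0

def estimated_minutes(char_count: int, chars_per_minute: float = CHARS_PER_MINUTE) -> float:
    """Approximate minutes of audio for char_count characters."""
    return char_count / chars_per_minute

def remaining_free_minutes(used_chars: int, quota: int = QUOTA_CHARS) -> float:
    """Approximate minutes left in the monthly free allowance."""
    return estimated_minutes(max(0, quota - used_chars))
```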
Compliance & safety
Apache 2.0 license; run locally for privacy; use only authorized voice data for cloning.