What can Custom Voices generate?

Custom Voices can generate natural-sounding speech from text and create reusable voice profiles from voices the user is authorized to use.

Does Custom Voices allow adult or NSFW content?

No. Custom Voices prohibits NSFW, adult, pornographic, illegal, hateful, fraudulent, and unauthorized impersonation content.

FAQ

Common questions about Custom Voices

Quick answers for creating voices, generating audio, using credits, and keeping voice assets safe.

Last updated: May 21, 2026

What kind of audio can I generate?

You can generate spoken audio from text using MiMo built-in voices, designed voices, or authorized cloned voices. The best results come from clean scripts with a clear style direction.

Can I clone any voice?

No. Only upload and clone voices that you own or have explicit permission to use. Do not clone someone else's voice for impersonation, fraud, misleading endorsements, or deceptive content.

What sample files work for voice cloning?

MiMo cloning accepts MP3 or WAV samples under the size limit shown in the studio. Use clean speech with minimal background noise, stable volume, and one speaker.
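If you prepare samples in bulk, a quick local check before uploading can catch the wrong file type or an oversized file. The following is only a minimal sketch, not part of Custom Voices: the function name `check_sample` and the `max_bytes` parameter are hypothetical, and you should set `max_bytes` to the limit shown in the studio. It checks only file type and size; it cannot judge noise, volume, or the number of speakers.

```python
from pathlib import Path

# File types the FAQ lists as accepted for cloning samples.
ALLOWED_EXTENSIONS = {".mp3", ".wav"}


def check_sample(path, max_bytes):
    """Return a list of problems; an empty list means the basic checks passed.

    max_bytes is an assumption supplied by the caller: use the size
    limit shown in the studio.
    """
    sample = Path(path)
    problems = []

    # Compare the extension case-insensitively (.WAV and .wav both pass).
    if sample.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {sample.suffix or '(none)'}")

    if not sample.is_file():
        problems.append("file not found")
    elif sample.stat().st_size > max_bytes:
        problems.append(
            f"file is {sample.stat().st_size} bytes, over the {max_bytes}-byte limit"
        )

    return problems
```

For example, `check_sample("my_voice.wav", max_bytes=...)` with the studio's limit filled in returns an empty list when the file exists, has an accepted extension, and is within the limit.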

Why does my generated audio sound different from what I expected?

Most mismatches come from vague style directions, long scripts, unclear punctuation, or conflicting emotional cues. Try a shorter test script, give a single tone direction, and mark important pauses.

How are credits used?

Voice creation and audio generation can consume credits based on the selected workflow and model. If a provider failure prevents generation, the system is designed to refund or avoid charging credits where applicable.

Where are my voice samples and generated audio stored?

Uploaded voice samples and generated audio are treated as private workspace assets and stored according to the configured storage provider and plan retention rules.

Can I delete generated audio?

Yes. Generated audio can be removed from your workspace, and deletion also removes the stored file where supported by the storage provider.

Still need help?

Send us the use case, the voice mode, and the kind of result you expected so support can give a practical suggestion.