Voice Library

In this tutorial, you'll learn how to access and navigate your Voice Library in just minutes and how to customize each voice. You'll also discover the Voice Cloning feature, which lets you replicate your own voice for a truly personalized AI experience.

Creation Date: Mar 20, 2025

Created By: Alexandra Fojas

1. Configure

You can find the list of available voices under Configure. Scroll down to the bottom and you'll find "Voice Library."

2. Voice Library

When you click Voice Library, it shows the Featured Voices at the top and the full list of voices at the bottom.

3. At the side, there's the search button.

You can type the name of the language you're looking for into the search bar, or use the filters to select a specific language.

4. Browse

Different voices consume credits at different rates per minute. This is because we use four different voice providers: Deepgram, 11Labs (ElevenLabs), OpenAI, and Cartesia. Each provider has its own pricing and resource requirements, which affect the credit usage of its voices. Keep this in mind when selecting a voice so you can manage your credits efficiently.

5. If nothing fits, you can clone your voice.

To create a custom voice, simply record your voice and upload the recording to the system. The system then generates a voice model based on your recording. As with the library voices, credit consumption for a cloned voice varies depending on the processing and provider used.

Best practices on recording your voice:

  • Record in a quiet, controlled environment to minimize background noise and echo.
  • Record high-quality audio using a professional microphone and a pop filter to ensure clarity.
  • Provide sufficient audio material, ideally more than 5 minutes, for better AI training.
  • Include diverse speech patterns with varying emotions and intonations for a more natural-sounding clone.

# Editing your voice if you choose the Cartesia provider

6. Choose your ideal voice and click on the settings button.

Once you've chosen a voice by pressing Select, a settings button will appear.

7. This opens the Voice Configuration.

Here you can change the Voice Model, Speed, Emotion Name, and Emotion Level.

8. Voice Model

Cartesia previously offered two speech models under the Sonic-1 category: Sonic-English, which specialized in English language processing, and Sonic-Multilingual, which supported multiple languages.

Recently, they introduced two new models: Sonic-2 and Sonic-Turbo. Unlike the Sonic-1 models, each of these handles both English and non-English languages in a single model.

Pricing is the same across all models; the key difference is latency, meaning how quickly the AI processes text and generates speech. According to Cartesia:

  • Sonic-1 models had a latency of 120-150 ms.
  • Sonic-2 improves on this with a latency of 90 ms.
  • Sonic-Turbo offers the fastest response time at 40 ms, though it may sound less natural.

Lower latency means faster AI responses and less delay in text-to-speech (TTS) output. Based on my experience, Sonic-2 provides a good balance between speed and naturalness, making it my recommended choice.

9. Click on Speed

We recommend the Normal pace so the AI sounds more conversational.

10. Emotion Name

Choose the main emotion your receptionist should convey during calls.

11. Emotion Level

Emotion Level lets you customize the AI further by setting how strongly the selected emotion comes through.

12. Click on OK

Once you're all set, click OK.

# Editing your voice if you choose the 11Labs provider

13. Voice Configuration

As with Cartesia, click the settings button to see the two model options described below.

ElevenLabs offers Flash and Turbo models, each optimized for different needs. Flash models prioritize ultra-low latency for real-time applications; Flash v2 is English-only with a latency under 75 ms. They are ideal for conversational AI but have slightly lower quality. Turbo models focus on lifelike speech and emotional depth, making them better suited to voiceovers and content creation, though they have higher latency. Turbo v2 (English-only) delivers superior quality at the cost of speed.
