Microsoft’s New AI Can Clone Your Voice in Just 3 Seconds



AI is being used to generate everything from images to text to artificial proteins, and now another thing has been added to the list: speech. Last week researchers from Microsoft released a paper on a new AI called VALL-E that can accurately simulate anyone’s voice based on a sample just three seconds long. VALL-E isn’t the first speech simulator to be created, but it’s built in a different way than its predecessors—and could carry a greater risk for potential misuse.

Most existing text-to-speech models use waveforms (graphical representations of sound waves as they move through a medium over time) to create fake voices, tweaking characteristics like tone or pitch to approximate a given voice. VALL-E, though, takes a sample of someone’s voice and breaks it down into components called tokens, then uses those tokens to create new sounds based on the “rules” it already learned about this voice. If a voice is particularly deep, or a speaker pronounces their A’s in a nasal-y way, or they’re more monotone than average, these are all traits the AI would pick up on and be able to replicate.
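To make the idea of "tokens" a little more concrete, here is a minimal sketch in Python. It is not VALL-E's actual method, which relies on a learned neural audio codec; it only assumes the simplest possible scheme, in which each audio sample is rounded to one of 256 levels. The function names `tokenize` and `detokenize`, and the 440 Hz test tone standing in for a voice clip, are hypothetical choices made for illustration.

```python
import math

def tokenize(samples, levels=256):
    """Map each sample in [-1.0, 1.0] to an integer token in [0, levels - 1]."""
    tokens = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp to the valid range
        tokens.append(round((s + 1.0) / 2.0 * (levels - 1)))
    return tokens

def detokenize(tokens, levels=256):
    """Map tokens back to approximate sample values."""
    return [t / (levels - 1) * 2.0 - 1.0 for t in tokens]

# A short 440 Hz tone sampled at 8 kHz stands in for a voice clip
# (an assumption for illustration, not data from the paper).
sample_rate = 8000
wave = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(80)]

tokens = tokenize(wave)          # the clip as a sequence of small integers
restored = detokenize(tokens)    # an approximation of the original samples
max_error = max(abs(a - b) for a, b in zip(wave, restored))
```

The point of the sketch is only that sound can be turned into a sequence of discrete symbols and back again with a small, bounded error. A model that works on such symbols can treat speech somewhat like text.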

The model is based on a technology called EnCodec by Meta, which was released just this past October. The tool uses a three-part system to compress audio to 10 times smaller than MP3s with no loss in quality; its creators meant for one of its uses to be improving the quality of voice and music on calls made over low-bandwidth connections.

To train VALL-E, its creators used an audio library called LibriLight, whose 60,000 hours of English speech is primarily made up of audiobook narration. The model yields its best results when the voice being synthesized is similar to one of the voices from the training library (of which there are over 7,000, so that shouldn’t be too tall of an order).

Besides recreating someone’s voice, VALL-E also simulates the audio environment from the three-second sample. A clip recorded over the phone would sound different than one made in person, and if you’re walking or driving while talking, the unique acoustics of those scenarios are taken into account.

Some of the samples sound fairly realistic, while others are still very obviously computer-generated. But there are noticeable differences between the voices; you can tell they’re based on people who have different speaking styles, pitches, and intonation patterns.

The team that created VALL-E knows it could very easily be used by bad actors; from faking sound bites of politicians or celebrities to using familiar voices to request money or information over the phone, there are countless ways to take advantage of the technology. They’ve wisely refrained from making VALL-E’s code publicly available, and included an ethics statement at the end of their paper (which won’t do much to deter anyone who wants to use the AI for nefarious purposes).

It’s likely just a matter of time before similar tools spring up and fall into the wrong hands. The researchers suggest the risks that models like VALL-E will present could be mitigated by building detection models to gauge whether audio clips are real or synthesized. If we need AI to protect us from AI, how do we know if these technologies are having a net positive impact? Time will tell.

Can Scientists Diagnose Depression With The Sound Of Your Voice?


Could the sound of your voice tell a doctor if you are depressed? That’s the thought behind SimSensei, a system designed to help doctors diagnose depression by analyzing speech patterns. Researchers at the University of Southern California developed the system, which they say can work alongside doctors to diagnose depression in patients more accurately. Research shows that reduced frequency range in vowel production is a speech characteristic of people with psychological and neurological disorders. The system is programmed to specifically identify reductions in vowel expression, a characteristic associated with depression, that human interviewers may not recognize. In a 2009 study, doctors misdiagnosed depression half the time. SimSensei is looking to vastly reduce that percentage. Watch the videos below for more information on depression.

Whose Voice Are You Listening To?


Throughout my life, there had been no shortage of voices, coming from both outside and inside of me, pulling me in different directions. Sometimes the voice was gentle, like a whisper from a lover; sometimes it was harsh and demanding, especially when I was confronted with a fight-or-flight situation. The voices took on different roles, such as the soother, the sergeant, the seductress and the teacher, and they knew exactly when their presence was required. Most of them functioned so automatically that I never truly stopped to question their origins until my awakening.

“Allow thoughts to come and go, just don't serve them tea.” - Shunryu Suzuki

As if having woken up from a dream after a long night, I suddenly noticed not all voices were created equal, that is, some of them clearly did not have my best interest at heart despite the fact they all claimed they were looking after me.

Questioning the voices:

One by one, I began questioning each voice every time it spoke to me. I would ask where it came from and have a conversation with it. Some of them, the self-deprecating and self-denying ones, desperately wanted to be heard, and they brought with them a ton of excuses, justifications and explanations as to why I must follow their lead or else end up facing unimaginable catastrophes.

There was the constant nagging one that sounded perfectly soft and sincere, “You can’t do it because you are not good enough.” If only I could plant a tree every time that voice was uttered in my world! Lo and behold, it was the answer to almost every idea I wanted to manifest that was outside of my comfort zone for three decades of my life. I never questioned that voice because I believed it was protecting me and serving my highest good even though it most often reduced me to curling up in a corner of a cage made of self-loathing and tears.

The first time I questioned it was when I was over at my parents’ house one day. I shared some of my ideas with them about a project I had in mind and immediately, my dad’s reaction was, “Who do you think you are? You can’t do it because you are not made of that material.”

Freeze framing the movie scenes:

Right then and there, I saw a movie titled “I’m not good enough” playing scene by scene in my mind. Some of the scenes were from childhood, when I hopelessly tried to defend myself in front of my dad, but most of them consisted of moments from my adult life, such as the one when I chose a career that was safe rather than one I was passionate about so I wouldn’t end up on the street, or the one when I held onto a failing relationship because I told myself I would never find a partner who wanted to be with me. On and on it played as that moment froze and time stood still.

How many other voices were there trying to stop me from living my life? And how much was I holding onto that was not mine?

Those two questions initiated an intense healing process that was not unlike spring cleaning, accompanied by long periods of soul searching, heartache and tears.

It was the beginning of my awakening.

When we go through our lives set on autopilot (which, by the way, is a necessary mechanism that spares us from having to relearn how to walk and brush our teeth every day), the trade-off is that we are also living on a subconscious level. Without awareness, we allow our lives to be dictated by outside influences such as the values and beliefs of our parents, teachers and culture that we have come to internalize. Moving through life reacting based on our past experiences, rather than creating from who we are today, makes us feel like powerless victims, simply because the life that “just happens to us” is not the one of our choosing: the voices we listen to clearly have no idea what is best for us today, at this moment!

Releasing the chains that bind us to the past:

This is perhaps one of the most empowering and liberating realizations one can have, for the chains that bind us only have power over us for as long as we remain unaware. The second we see them for what they are (a necessary tool or, as I prefer to think of them, the greatest gift, one that allows us to learn what doesn’t work for us in life so we can discover what does), they will simply shatter into a million pieces and dissolve into thin air.

Once we become conscious of how a voice impacts our decisions, it then becomes our choice whether we will follow that voice, or adjust its volume so it cannot take over.

Free-willing our way forward:

This is the magnificent power of our free will. Our brain is hardwired like a disc that can be both programmed and erased once the data is no longer needed. Once we press the delete button, we will end up with an infinite amount of free space – space to rewrite our stories and our lives.

There are many ways to identify disempowering voices; my favorite is to just listen to our feelings, as they are the language of the soul. Whenever an out-of-tune voice shouts out a command, we experience a moment of “yuck.” Those are usually the voices that trigger fear and anxiety, make us want to scream and run, cause us to lose sleep at night, drag us into a sea of depression and manifest as physical pains and illnesses; the list goes on.

Once those voices are identified, befriend them and talk to them. They are our internal barometers that have been working tirelessly day and night to indicate to us what we truly desire in life. They are our cheerleaders that use any means necessary to get us to run that extra mile so we can reach our milestones. They are the x-ray glasses that allow us to see beyond the surface so we can move into our consciousness and elevate it. This is our time to give them our most heartfelt gratitude for having illuminated the paths in front of us so we can now travel in light, or enlightened.

Eventually, we will discover the one voice that has been singing to us all along – the voice of love, joy and healing that has never judged or abandoned us since the day we began our journeys. As the silence inside of us grows, that voice will become clearer and more expansive and fill us with endless inspiration.

We will say to ourselves, “But I’ve known this all along! I just didn’t listen to it.”

Follow that voice — the voice of the heart.

Raspy Voice In The Morning? Here’s Why Your Voice Changes As The Day Goes On


Morning Voice

For many people, a deeper voice in the morning is just one of the many reasons to avoid human contact before you’ve had time to get ready for the day ahead. Other people (mostly men) wish they could maintain their raspy voice all day long. There’s a reason why the voice we go to sleep with is markedly different from the one we wake up with. Actually, there are a couple of reasons for this common and temporary phenomenon. There are also easy ways to remedy a deep voice in the morning if you’re not a fan of sounding like Bill Clinton when you wake up.

“Morning voice,” or the deeper voice we all experience after getting up in the morning, is not to be confused with hoarseness, which tends to be a common symptom caused by a problem with the vocal cords or an inflamed larynx. According to the Cleveland Clinic, hoarseness can be caused by the common cold, upper respiratory tract infection, smoking, allergies, voice abuse, and gastroesophageal reflux or acid reflux — when stomach acid makes its way up the swallowing tube and irritates the vocal cords.

A deeper voice in the morning is an inevitable result of a good night’s rest. During sleep, the tissues in our throat collect fluid, which is also what causes our eyes to look puffy when we first wake up. Our lack of vocal cord use during the night also causes mucus to build up during the hours we spend asleep. People who breathe through their mouth during sleep quickly dry out their vocal cords. This lack of lubrication hinders our vocal cords from moving together, the motion that produces the normal, higher pitch of our voice.

“First and foremost, ‘morning voice’ is caused by fluid collecting in the tissues of our throat and mucus building up overnight,” Susan Berkley, author of Speak To Influence: How To Unlock The Hidden Power Of Your Voice, told Medical Daily. “Mouth breathing is more of a secondary cause. Acid reflux, which refers to stomach acid leaking back up the esophagus, can also result in the raspier voice we tend to wake up with. Eating a big meal before bed or a spicy meal can exacerbate this problem. Sleeping with your head elevated is one strategy for avoiding a raspy voice in the morning.”

If you are one of the people who would rather not have their voice heard first thing in the morning, you are in luck. It doesn’t take long to alleviate your raspy morning voice. Berkley suggests starting each morning with two glasses of room-temperature water complemented by a squeeze of lemon. This should be done before your morning coffee or tea, which can actually make your throat even drier. Next, while the warm and moist air from your morning shower relaxes your throat muscles, practice humming to “wake up” your voice.