My Perspective on Apple Live Speech as a Deaf Person
On May 16, Apple published a press release previewing new accessibility features, including Live Speech, Personal Voice, and Point and Speak in Magnifier. These features will be available later this year.
You can read more about it here: https://www.apple.com/newsroom/2023/05/apple-previews-live-speech-personal-voice-and-more-new-accessibility-features/
Today, I would like to talk about Live Speech, because it is valuable to me as a deaf person who uses written English and sign language to communicate with others. Live Speech lets me type on my iPhone and have what I type spoken aloud during phone calls, video calls, and in-person conversations.
Figure 1: Typing a phrase into my iPhone to be read aloud by my phone.
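Live Speech itself is a built-in system feature rather than something you program, but the underlying idea, typing text and having the device speak it aloud, is easy to illustrate. Here is a minimal Swift sketch using AVSpeechSynthesizer, Apple’s standard AVFoundation text-to-speech API; the phrase and voice choice are placeholders, not what Live Speech actually does internally.

    import AVFoundation

    // Keep a strong reference to the synthesizer; speech stops if it is deallocated.
    let synthesizer = AVSpeechSynthesizer()

    // A phrase the user has typed (placeholder text).
    let utterance = AVSpeechUtterance(string: "What kind of dressing do you have?")

    // Use a system voice; Live Speech similarly lets the user pick a voice.
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate

    synthesizer.speak(utterance)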
Live Speech is valuable to non-speaking individuals and others experiencing speech loss. It pairs with the Personal Voice feature, which lets a person create a synthesized voice that sounds like them and use it with Live Speech. When iOS 17 and iPadOS 17 arrive this fall, Live Speech and Personal Voice will be part of Apple’s built-in accessibility features.
Figure 2: The Personal Voice feature learning a user’s voice so that Live Speech sounds like them.
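Apple also announced at WWDC23 that Personal Voice will be exposed to developers through the speech synthesis APIs in iOS 17. As a rough sketch, assuming the announced AVFoundation additions (requestPersonalVoiceAuthorization and the isPersonalVoice voice trait), an app could ask permission to speak with a person’s Personal Voice roughly like this:

    import AVFoundation

    let synthesizer = AVSpeechSynthesizer()  // keep a strong reference

    // Ask the user for permission to use their Personal Voice (iOS 17+).
    AVSpeechSynthesizer.requestPersonalVoiceAuthorization { status in
        guard status == .authorized else { return }

        // Personal voices appear alongside the system voices,
        // tagged with the .isPersonalVoice trait.
        let personal = AVSpeechSynthesisVoice.speechVoices()
            .filter { $0.voiceTraits.contains(.isPersonalVoice) }

        if let voice = personal.first {
            let utterance = AVSpeechUtterance(string: "Hi, it’s me!")
            utterance.voice = voice
            synthesizer.speak(utterance)
        }
    }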
If it were available today, I would love to try it by ordering food at a restaurant without pointing to items on the menu. It would pair nicely with Live Captions: I would be able to read any questions about my order without asking the server to write them down. Most questions are simple, like “What kind of dressing do you want on your salad?”, and could be answered quickly using Live Speech. In my experience, most people would rather speak to me than write things down on paper.
I would like Apple to provide a feature that automatically adjusts the volume when using Live Speech, because I keep my phone muted all the time. I also have reservations about noisy or crowded restaurants, where Live Captions wouldn’t pick up the speaker’s voice accurately. The speaker would have to talk close to my iPhone, which I dislike. I would love to see Apple improve its microphones and algorithms to capture voices better in those situations. I have used Live Transcribe on Android, and it works well, but it’s not perfect.
One experience I had with Live Transcribe at a car rental place shows how helpful this can be. A guy asked me tons of questions outside, by the car I was about to rent. He had no idea how to communicate with me, whether I would simply type text on the phone or write it down on paper. Thinking quickly, I realized I had an Android phone in my pocket, so I grabbed it to transcribe his voice. Most of the questions were yes/no questions, and I just read them and nodded back to him. The communication barrier disappeared in an instant. It’s amazing how much this technology has evolved just during the past year. This is one of the reasons I am hopeful for Live Speech: Android doesn’t provide an equivalent feature in its Live Transcribe app, and I use an iPhone for everything, not an Android. Ironically, I’m an Android developer here at Deque.
Live Speech is not new to Apple. A version of it is already available on macOS Ventura as part of Live Captions, once you enable it in Settings. Below is a screenshot of Live Captions with the “Type to Speak” feature on my Mac saying “Hello!” for me after I type it. In some apps, such as Zoom, you’ll have to go into the app’s audio settings and select the standalone virtual microphone that Live Captions provides. Once it’s selected, the call carries no background noise from your Mac’s own microphone; it transmits only what you type to say.
Figure 3: Live Captions on macOS Ventura with the Type to Speak feature.
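Under the hood, that Type to Speak microphone is just another audio input device, which is why an app like Zoom can select it. As a small macOS sketch, assuming the standard AVFoundation capture APIs, enumerating the available audio inputs would list a virtual microphone like this one alongside the built-in microphone:

    import AVFoundation

    // Enumerate audio input devices on macOS; a virtual microphone
    // (like the one Live Captions provides) appears like any other input.
    let discovery = AVCaptureDevice.DiscoverySession(
        deviceTypes: [.builtInMicrophone, .externalUnknown],
        mediaType: .audio,
        position: .unspecified)

    for device in discovery.devices {
        print(device.localizedName)
    }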
I’ve tried it on Zoom calls with the awesome mobile team at Deque, and I can proudly claim it as my robot voice. That said, I don’t use it often, because I don’t know when or how to jump in while others are speaking. Sometimes I use it when someone asks me a simple question that I can answer with a yes or no.
I can see how Live Speech and Live Captions are reliable in some situations: small talk, asking for directions, and so on. But they should not be used as a communication accommodation for deaf people in meetings, workplaces, courtrooms, and similar settings, and they shouldn’t replace sign language interpreters or live humans typing captions, because no form of automatic captioning today is always accurate.
Comment from Paul:
Because of a vocal cord operation, I used Live Speech for three weeks. Here are my pros and cons:
1. Pros
a. Enables tailoring to your voice
b. Fast training
c. Text to speech
2. Cons
a. Too slow for interactive communication
b. Does not let the user save a new text-to-speech phrase or word created during a conversation; it should allow immediate saving to the library for future use
c. Needs a louder speaker, or a Bluetooth connection to a small external speaker
d. No ability to segment the library by event, for example major events such as “dinner conversation,” “sports conversation,” or “doctor visits”
Future work
I believe focusing on 2(d) above is critical. It would need an external device, something like a quarterback’s play-calling wristband. I have mapped out such a device and would be interested in exploring it further with you.