Emotional speech recognition for everyone.

Pitch Presentation:

Problem Background  

People with hearing impairments can now caption conversations and make calls like everyone else, thanks to voice-to-text technology. But what about emotions? How can we convey emotions in a non-spoken medium?

We know that there is a fundamental analogy between sign language (visual) and spoken language (acoustic / phonological). Just as phonemes are minimal units without meaning that combine to form the words of spoken language, cheremes, whatever the national sign language, are minimal units without meaning that, as formational parameters, combine to form the signs of sign language.

When a person with a hearing impairment communicates by text, the correspondence between emotion and visual sign is lost.

Presently, there are successful apps with voice-to-text and text-to-voice technology. There is also an extensive literature on AI models that predict a speaker’s emotion by analysing a recorded audio clip of their voice. What we don’t have is an application that lets users control the emotion conveyed by text.

Research Insights

The main situation we wanted to understand was: how do deaf people deal with an emergency situation?

We interviewed three people with hearing impairments and one ASL interpreter, and asked them to walk us through their actions and feelings when facing an emergency situation.

What we discovered is that there are two scenarios in an emergency. In the first, support is provided freely and immediately by a CODA (Child of Deaf Adults), given that deaf people generally have a very supportive network of people helping each other.

In the second, they rely on voice-to-text and text-to-voice applications available on the market, such as Rogervoice or Google Live Transcribe.

During the interviews, while texting our questions, we ran into the same difficulty. Interviewees often asked us to slow down or rephrase a sentence because something was hard for them to catch: the emotion or intention behind the written text.

Proposed Solution 

We narrowed the solution down to adding emotion to text. From the user’s point of view: “It is hard to recognize other people’s emotions by text. I wish my voice-to-text app could recognize their emotions so that I could communicate with people better.”

The features we would like to have in IHEARYOU, expressed from the user’s point of view, are:

  • I am a person with a hearing impairment, and I would like to express emotion and understand others’ emotions through text.
  • I would like to change the font size so that I can read the text quickly and take part in the conversation.
  • I am on a phone call using a live-transcription app, and I would like the text to be nuanced with the speaker’s emotion.

A note on the last feature: automatic recognition of emotions from speech and from transcribed text is a challenging problem. Recent AI and deep learning solutions restrict the emotions in audio data to eight classes: angry, disgusted, fear, happy, sad, neutral, surprise and calm. For each of these emotions, text can have a visual correspondence.

Speech Emotion Recognition (SER) is tough because emotions are subjective and annotating audio is challenging. If emotions can be encoded visually in text, people with hearing impairments would have fewer difficulties in understanding voice-to-text conversations.
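As a minimal sketch of how a transcribed sentence could be tagged with one of the emotion classes above, the helper below picks the highest-scoring class and falls back to neutral when no class is convincing. The function name `pickEmotion`, the shape of the `scores` object and the `0.5` threshold are assumptions made for illustration; the project itself did not implement an emotion recognition API.

```javascript
// Emotion classes as listed in the text above.
const EMOTIONS = [
  "angry", "disgusted", "fear", "happy",
  "sad", "neutral", "surprise", "calm",
];

// Hypothetical helper: choose a label from per-class scores (0..1) that
// some speech emotion classifier would produce. Because emotion labels
// are subjective, a weak best score falls back to "neutral".
function pickEmotion(scores, threshold = 0.5) {
  let best = "neutral";
  let bestScore = -Infinity;
  for (const emotion of EMOTIONS) {
    const score = scores[emotion] ?? 0; // missing classes count as 0
    if (score > bestScore) {
      best = emotion;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : "neutral";
}
```

Falling back to neutral keeps the text unstyled when the classifier is unsure, which seems safer than showing a wrong emotion to the reader.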

Solution Explanation

We created a list of user pain points, shown below.

User Pain Points

User Story #1: As a user, I want to use voice to text with emotion, so that I understand non-visual signals. 

Scenario #1: Understand emotion with text  

Acceptance Criteria:

- User can see a one-to-one correspondence between font and emotion

- User can see text spacing that corresponds to vocal speed

- User can see text contrast that corresponds to vocal intensity

(Settings: equaliser and text controls such as font typeface and size)
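The three acceptance criteria above could be sketched as a single function that turns one transcribed segment into a text style. This is only an illustration under stated assumptions: the font choices, the 150 words-per-minute baseline, and the input fields `wordsPerMinute` and `intensity` (0 to 1) are hypothetical and do not come from the team's design.

```javascript
// Assumed font per emotion (illustrative choices, one font per emotion).
const FONT_BY_EMOTION = {
  angry: "Impact, sans-serif",
  happy: "'Comic Sans MS', cursive",
  sad: "Georgia, serif",
  neutral: "Arial, sans-serif",
};

// Clamp a number into the range [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Build a CSS-like style object for one transcribed segment.
function styleForSegment({ emotion, wordsPerMinute, intensity }) {
  // Slower speech -> wider letter spacing; 150 wpm is the assumed baseline.
  const spacingEm = clamp((150 - wordsPerMinute) / 500, 0, 0.3);
  // Louder speech -> darker, higher-contrast text on a light background.
  const lightness = Math.round(60 - 60 * clamp(intensity, 0, 1));
  return {
    fontFamily: FONT_BY_EMOTION[emotion] ?? FONT_BY_EMOTION.neutral,
    letterSpacing: `${spacingEm.toFixed(2)}em`,
    color: `hsl(0, 0%, ${lightness}%)`,
  };
}
```

Unknown emotions fall back to the neutral font, so a segment is always readable even when no emotion is available.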

User Story #2: As a user, I want to switch to another language, so that I can use the app anywhere. 

Scenario #1: Translate  

Acceptance Criteria: 

- User can choose the device that they want to listen to

- User can see voice-to-text in another language

- App recognizes the language and translates text to voice

User Story #3: As a user, when I don’t understand something, I want someone to explain it to me in simple terms.

Scenario #1: Paraphrase and translate

Acceptance Criteria: 

- Paraphrasing with real-life terminology

- Explaining dictionary terms

Based on our target users’ pain points, we decided to build the text chat box with manual emotion selection as the first feature and, as technical difficulty increases, work up to the automatic emotion recognition feature.


The feedback we received was enthusiastic, largely because of the general lack of resources for deaf people. Interviewees enjoyed the simplicity of the design and the efficacy of the solution. They would like to test the website or app, and suggested that it could become a plug-in for existing live-transcription apps.

Lo-Fi & Hi-Fi Mockups

We created a user flow chart, from which the designer derived the hi-fi mockups.

The chat box has an emotion control dropdown menu from which the user selects an emotion, and the background is colored according to the selected emotion.
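The behavior of the chat box described above can be sketched in plain JavaScript as follows. The color palette, the list of dropdown options and the function name `createMessage` are assumptions for illustration only; the actual values used in the mockups are not given here.

```javascript
// Assumed background color per emotion (illustrative palette).
const BACKGROUND_BY_EMOTION = {
  angry: "#f8d7da",
  happy: "#fff3cd",
  sad: "#d1e7ff",
  calm: "#d4edda",
  neutral: "#ffffff",
};

// Options shown in the dropdown menu, in display order.
const emotionOptions = Object.keys(BACKGROUND_BY_EMOTION);

// Build the message object the chat box would render.
// Unknown or missing emotions are treated as "neutral".
function createMessage(text, emotion = "neutral") {
  const known = emotionOptions.includes(emotion) ? emotion : "neutral";
  return {
    text,
    emotion: known,
    backgroundColor: BACKGROUND_BY_EMOTION[known],
  };
}
```

In a React component, the returned `backgroundColor` could be passed to the message bubble through the `style` prop, and `emotionOptions` could populate the dropdown.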

Implementation Details

Technical implementation

  • Where is it hosted? It’s hosted on
  • What is your tech stack? We used React and JavaScript to build the web app.

Technical challenges

  • What was the hardest part of development? The hardest part was figuring out how to build the app using React Native. We came to realize that we weren’t skilled enough in React Native, so we decided to switch to plain React with JavaScript.
  • Does your app have any scaling issues? Not really. I believe this app could be integrated further into existing applications.
  • What are some key takeaways? Start small: pick a small MVP that is workable within a tight deadline. Under time constraints, don’t attempt to learn a new coding skill while building an app that is due in a few weeks.

Future Steps

We are no longer working on the project as a team, because we are taking different professional paths. Nonetheless, the product is interesting and socially useful, and we would have liked to complete it by adding all the desired features, specifically the API for Speech Emotion Recognition.


Product Manager Learnings:

Viviana Letizia

I learned how to collaborate with a cross-functional team and deal with disagreements and difficult situations, as well as how to prioritize features and listen to customers’ needs. I am already applying these learnings in my work and receiving good feedback.

Designer Learnings:

YeaGyeong (Rachel) Cho

Developer Learnings:

Kat Sauma

Developer Learnings:

Dre Onyinye Anozie


I gained more familiarity with the agile project management process while working with a team that included a product manager, a designer and a developer.

I learned a lot about building applications that prioritize accessibility features for people with hearing impairments, blindness and more. This allowed me to look into the technologies that already exist.

Full Team Learning

We learned how to collaborate and work through stressful situations. We successfully derived a solution to a complex problem, emotion recognition, for the purpose of inclusivity.