Speech-to-Text

Speech-to-Text refers to the flow in which a user speaks to the Digital Assistant and the speech is transcribed into text. The Digital Assistant responds to a spoken request as if it had been typed on a keyboard.

This guideline describes best practices for the complete voice request flow, which saves users effort compared with using only a keyboard.

Context

Speech-to-Text is available in SAP’s web client. The Digital Assistant panel, from which voice input can be triggered, is placed on top of the Fiori Launchpad.

Happy Path

The “Happy Path” defines the interaction flow when the Digital Assistant can transcribe and understand users’ speech input correctly.

1. Trigger Speaking Mode

Once the Digital Assistant panel is open, the user can click the microphone icon at the bottom to trigger speech input. A panel then slides up from the bottom of the window and replaces the text input area, indicating that the Digital Assistant is listening.

2. Real-Time Transcription

As soon as the user starts to speak, a real-time text transcription appears, followed by a cursor that indicates the transcription is ongoing.

3. Automatic Submission

After the user pauses for 2 seconds, the request is automatically submitted to the Digital Assistant as text.
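
The automatic submission rule can be illustrated with a minimal sketch. Only the 2-second pause comes from this guideline. The names `SpeechSession`, `startSession`, `onTranscript`, and `onTick` are hypothetical, and the sketch assumes that nothing is submitted while the transcription is still empty. Timestamps are passed in explicitly so the logic does not depend on a particular timer or speech recognition API.

```typescript
// Hypothetical sketch: only the 2-second pause is taken from the guideline.
const SILENCE_TIMEOUT_MS = 2000;

interface SpeechSession {
  transcript: string;     // text transcribed so far
  lastSpeechAtMs: number; // time of the most recent transcription update
  submitted: boolean;     // true once the request has been sent
}

function startSession(nowMs: number): SpeechSession {
  return { transcript: "", lastSpeechAtMs: nowMs, submitted: false };
}

// Call whenever the recognizer delivers updated text.
function onTranscript(session: SpeechSession, text: string, nowMs: number): SpeechSession {
  if (session.submitted) {
    return session;
  }
  return { ...session, transcript: text, lastSpeechAtMs: nowMs };
}

// Call periodically. Submits once, when the pause reaches the timeout.
// Assumption: an empty transcription is never submitted.
function onTick(
  session: SpeechSession,
  nowMs: number,
  submit: (text: string) => void
): SpeechSession {
  const pausedMs = nowMs - session.lastSpeechAtMs;
  if (session.submitted || session.transcript === "" || pausedMs < SILENCE_TIMEOUT_MS) {
    return session;
  }
  submit(session.transcript);
  return { ...session, submitted: true };
}
```

In a real client, `onTick` would typically be driven by a timer, and `submit` would hand the text to the same request path that keyboard input uses.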

Exit Speaking Mode

To exit speaking mode while speaking, users can click the “X” icon at the bottom right. The panel at the bottom then disappears and is replaced by the text input area.


Error Correction

If the Digital Assistant transcribes the user’s speech incorrectly, the user has two options for correcting the error.

Start Over

Users can click the “start over” icon at the bottom left to restart the speech request. The existing transcription is cleared, and the user can begin the voice input again.


Edit Manually

Alternatively, users can click the text transcription directly. This exits speaking mode and places the text in the text input area, where users can edit it manually and submit it.
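
The mode changes described in this guideline (trigger, exit, start over, and edit manually) can be summarized as a small state machine. The following is a sketch under stated assumptions, not the actual implementation: the names `PanelState`, `PanelEvent`, and `reduce` are hypothetical, and the sketch assumes that exiting with the “X” icon discards the transcription, which the guideline does not state.

```typescript
// Hypothetical sketch of the speech panel's mode changes.
type Mode = "typing" | "listening";

interface PanelState {
  mode: Mode;
  transcript: string; // live transcription shown in the speech panel
  inputText: string;  // content of the regular text input area
}

type PanelEvent =
  | { type: "micClicked" }
  | { type: "transcriptUpdated"; text: string }
  | { type: "exitClicked" }
  | { type: "startOverClicked" }
  | { type: "transcriptClicked" };

function reduce(state: PanelState, event: PanelEvent): PanelState {
  switch (event.type) {
    case "micClicked":
      // The speech panel replaces the text input area.
      return { ...state, mode: "listening", transcript: "" };
    case "transcriptUpdated":
      return state.mode === "listening" ? { ...state, transcript: event.text } : state;
    case "exitClicked":
      // Assumption: the transcription is discarded when the user exits.
      return { ...state, mode: "typing", transcript: "" };
    case "startOverClicked":
      // Clear the transcription but keep listening.
      return state.mode === "listening" ? { ...state, transcript: "" } : state;
    case "transcriptClicked":
      if (state.mode !== "listening") {
        return state;
      }
      // Move the transcription into the text input area for manual editing.
      return { mode: "typing", transcript: "", inputText: state.transcript };
    default:
      return state;
  }
}
```

Keeping the transitions in one pure function makes it straightforward to check that both correction options behave as described: “start over” stays in speaking mode with an empty transcription, while clicking the transcription leaves speaking mode and carries the text over.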
