Android SDK: Build a Speak and Repeat App

The Android platform provides support for both speech recognition and speech synthesis. In this tutorial, we will create a simple Android app which allows the user to speak, attempts to recognize what they say, and then repeats what was recognized back to them using the Text To Speech engine.

We will use the same technique for the TTS engine as we did in Android SDK: Using the Text to Speech Engine, so the focus of this tutorial will be on the speech recognition element. Both speech recognition and synthesis are relatively easy to implement on the Android platform, so you should be able to achieve the steps in this tutorial even if you are an Android beginner.

Step 1: Start an Android Project

Create a new Android project in Eclipse. Alternatively, if you want to implement the speech recognition functionality in an existing app, open it instead. For this tutorial we have a minimum SDK version of 8. You do not need to make any particular additions to your Manifest file; the default contents should suffice.

Step 2: Define the User Interface

Let's start by defining the user interface. When the app launches, the user will be presented with a button. On pressing the button, the app will prompt them to speak, listening for their voice input. When the speech recognition utility processes the speech input, the app will present a list of suggested words to the user. As you'll know if you've tried speech recognition as a user, the recognizer is not always accurate, so this list is essential. When the user selects an item from the list, the app will speak it back to them using the TTS engine. The TTS part of the application is optional, so you can omit it if you prefer.

The app is going to use a few text Strings as part of the interface, so define them by opening the "res/values/strings.xml" file and entering the following content:
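As a rough sketch, the strings file might look like the following. The resource names ("intro", "speech_btn_label", "word_intro") and the wording are assumptions chosen for illustration, not the tutorial's original values:

```xml
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <!-- All names and texts here are illustrative assumptions -->
    <string name="app_name">Speak and Repeat</string>
    <!-- Shown in the informative Text View at the top of the screen -->
    <string name="intro">Press the button to speak!</string>
    <!-- Label displayed on the speech button -->
    <string name="speech_btn_label">Speak</string>
    <!-- Shown in the Text View that precedes the list of suggestions -->
    <string name="word_intro">Suggested words:</string>
</resources>
```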

Of course, you can alter the String content in any way you like.

Open your "res/layout/main.xml" file to create the main app layout. Switch to the XML editor if the graphical editor is displayed by default. Enter a Linear Layout as the main layout for the app's launch Activity:

The Linear Layout contains various style declarations including a background color. Inside the Linear Layout, first enter an informative Text View:

Notice that the Text View refers to one of the Strings we defined. It also sets various display properties which you can alter if you wish. After the Text View, add a button:

The user will press this button in order to speak. We give the button an ID so that we can identify it in the Java code and display one of the Strings we defined on it. After the button, add another informative Text View, which will precede the list of suggested words:

Again, this Text View uses a String resource and contains style properties. The last item in our main.xml Linear Layout is the list of suggested words:
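Putting the elements described so far together, a complete "main.xml" could look roughly like the sketch below. The ID values ("speech_btn", "word_list"), the String names, and all colors and sizes are assumptions for illustration; only the overall structure (Linear Layout, Text View, Button, Text View, List View) comes from the description above:

```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:background="#ff330066"
    android:padding="5dp">

    <!-- Informative text at the top (assumed String name) -->
    <TextView
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:gravity="center"
        android:padding="5dp"
        android:text="@string/intro"
        android:textSize="16sp"
        android:textStyle="bold" />

    <!-- Button the user presses in order to speak (assumed ID) -->
    <Button
        android:id="@+id/speech_btn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:text="@string/speech_btn_label" />

    <!-- Text preceding the list of suggested words -->
    <TextView
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:padding="5dp"
        android:text="@string/word_intro"
        android:textStyle="italic" />

    <!-- List of suggested words, populated at runtime (assumed ID) -->
    <ListView
        android:id="@+id/word_list"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1"
        android:background="@drawable/words_bg"
        android:padding="5dp" />

</LinearLayout>
```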

The List View will be populated with data when the app runs, so we give it an ID for identification in Java. The element also refers to a drawable resource, which you should add to each of the drawables folders in your app's "res" directory, saving it as "words_bg.xml" and entering the following content:
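A minimal shape drawable for "words_bg.xml" might look like this. The gradient colors, corner radius, and stroke are purely illustrative assumptions; any valid shape drawable would serve:

```xml
<?xml version="1.0" encoding="utf-8"?>
<shape xmlns:android="http://schemas.android.com/apk/res/android"
    android:shape="rectangle">
    <!-- Gradient fill; the angle must be a multiple of 45 -->
    <gradient
        android:angle="270"
        android:startColor="#ff220044"
        android:endColor="#ff000000" />
    <!-- Rounded corners and a thin translucent border -->
    <corners android:radius="10dp" />
    <stroke
        android:width="2dp"
        android:color="#66ffffff" />
</shape>
```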

This is a simple shape drawable to display behind the List View. You can of course alter this and the List View style properties if you wish. The only remaining user interface item we need to define now is the layout for a single item within the list, each of which will display a word suggestion. Create a new file in "res/layout" named "word.xml" and then enter the following code:
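One possible "word.xml" is sketched below. The styling values are assumptions; the important point is that the root element is a Text View, because the simple Array Adapter constructor expects the item layout to be a single Text View:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Layout for one suggested word; the root must be a TextView -->
<TextView xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:gravity="center"
    android:padding="5dp"
    android:textColor="#ffffffff"
    android:textSize="16sp" />
```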

Each item in the list will be a simple Text View. That's our interface design complete. This is how the app appears on initial launch:

[Image: Speak and Repeat Launch]

Note: don't worry about the lack of dithering; this is just how it looks in the DDMS screenshot. On the device itself, the gradient is perfectly smooth.

Step 3: Set Up Speech Recognition

Now we can implement our Java code. Open your app's main Activity and add the following import statements at the top:

You may not need all of these if you do not implement the TTS functionality - Eclipse should highlight imports you have not used so check them when you finish coding. Extend your opening class declaration line as follows, altering the Activity name to suit your own:

The "OnInitListener" is only required for the TTS function. Add the following instance variables inside your class declaration, before the "onCreate" method:

Inside your "onCreate" method, your class should already be calling the superclass method and setting your main layout. If not, it should begin like this:

Next, still inside your "onCreate" method, retrieve a reference to the speech button and list we created, using their ID values:

The List View is an instance variable, accessible throughout the class. Now we need to find out whether the user device has speech recognition support:

We query the environment to see if the Recognizer Intent is present. If it is, we instruct the app to listen for the user pressing the speech button. If speech recognition is not supported, we simply disable the button and output an informative message to the user.
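The steps above (imports, class declaration, instance variables, view references, and the support check) can be sketched together as follows. This is an outline under stated assumptions rather than the tutorial's original listing: the Activity name, the request code value, the ID values "speech_btn" and "word_list", and the message text are all hypothetical, and the code needs the Android SDK and your project's generated "R" class to compile:

```java
import java.util.List;

import android.app.Activity;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.tts.TextToSpeech.OnInitListener;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.ListView;
import android.widget.Toast;

// "SpeechRepeatActivity" is a placeholder; use your own Activity name.
public class SpeechRepeatActivity extends Activity
        implements OnClickListener, OnInitListener {

    // Arbitrary request code identifying our listening Intent (assumed value)
    private static final int VR_REQUEST = 999;

    // List View that displays the suggested words
    private ListView wordList;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        // Retrieve the button and list defined in the layout (assumed IDs)
        Button speechBtn = (Button) findViewById(R.id.speech_btn);
        wordList = (ListView) findViewById(R.id.word_list);

        // Ask the Package Manager which Activities can handle speech recognition
        PackageManager packManager = getPackageManager();
        List<ResolveInfo> recognizers = packManager.queryIntentActivities(
                new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);

        if (!recognizers.isEmpty()) {
            // Recognition is available, so listen for button presses
            speechBtn.setOnClickListener(this);
        } else {
            // No recognizer installed: disable the button and tell the user
            speechBtn.setEnabled(false);
            Toast.makeText(this, "Speech recognition is not supported on this device",
                    Toast.LENGTH_LONG).show();
        }
    }

    @Override
    public void onClick(View v) {
        // Implemented in Step 4
    }

    @Override
    public void onInit(int status) {
        // Only required for TTS; implemented in Step 7
    }
}
```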

Step 4: Listen for Speech Input

Let's set up the click listener for the speech button, whose clicks we have already instructed the app to detect. Outside the "onCreate" method, but inside your Activity class declaration, add an "onClick" method as follows:

Now implement the method we've called here after the "onClick" method:

Some of this code is standard for setting up the speech recognition listening functionality. Areas to pay particular attention to include the line in which we specify the "EXTRA_PROMPT" - you can alter this to include text you want to appear for prompting the user to speak. Also notice the "EXTRA_MAX_RESULTS" line, in which we specify how many suggestions we want the recognizer to return when the user speaks. Since we are calling the "startActivityForResult" method, we will handle the recognizer results in the "onActivityResult" method.
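A sketch of the click handler and the listening method described above is shown here. The method name "listenToSpeech", the request code, the button ID, and the prompt text are assumptions; the "RecognizerIntent" constants are part of the Android SDK. Only the members relevant to this step appear in the class:

```java
import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import android.view.View;
import android.view.View.OnClickListener;

// Placeholder Activity name; only the members for this step are shown.
public class SpeechRepeatActivity extends Activity implements OnClickListener {

    // Must match the code checked later in onActivityResult (assumed value)
    private static final int VR_REQUEST = 999;

    @Override
    public void onClick(View v) {
        // Only react to the speech button (assumed ID)
        if (v.getId() == R.id.speech_btn) {
            listenToSpeech();
        }
    }

    // Hypothetical helper that launches the recognizer
    private void listenToSpeech() {
        Intent listenIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        // Identify the calling app to the recognizer
        listenIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
        // Text displayed while the recognizer is listening (illustrative wording)
        listenIntent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say a word!");
        // Free-form language model, suited to general dictation
        listenIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        // Upper limit on the number of suggestions returned
        listenIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 10);
        // The results arrive in onActivityResult
        startActivityForResult(listenIntent, VR_REQUEST);
    }
}
```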

When the app is listening for user speech, it will appear as follows:

[Image: Speak and Repeat Listening]

Step 5: Present Word Suggestions

Implement the "onActivityResult" method inside your class declaration as follows:

Here we retrieve the result of the speech recognition process. Notice that the "if" statement checks whether the request code matches the value we passed when calling "startActivityForResult", in which case we know this method is being called as a result of the listening Intent. The recognizer returns a list of up to 10 suggested words, which we store as an Array List. We then populate the List View with these words by setting an Array Adapter object as the Adapter for the View. Now each of the items in the List View will display one of the suggested words.
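The result handling described here could be sketched as below. The field names, the request code, and the "R.layout.word" item layout are assumptions carried over from the earlier steps; "RecognizerIntent.EXTRA_RESULTS" is the SDK key under which the recognizer returns its suggestions:

```java
import java.util.ArrayList;

import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import android.widget.ArrayAdapter;
import android.widget.ListView;

// Placeholder Activity name; only the members for this step are shown.
public class SpeechRepeatActivity extends Activity {

    // Same request code that was passed to startActivityForResult (assumed value)
    private static final int VR_REQUEST = 999;

    // Assigned in onCreate (Step 3)
    private ListView wordList;

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        // Only handle successful results from our own listening Intent
        if (requestCode == VR_REQUEST && resultCode == RESULT_OK) {
            // Suggested words, best match first
            ArrayList<String> suggestedWords =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            // Display each suggestion using the single-item layout
            wordList.setAdapter(
                    new ArrayAdapter<String>(this, R.layout.word, suggestedWords));
        }
        super.onActivityResult(requestCode, resultCode, data);
    }
}
```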

If the app successfully recognizes the user input speech and returns the list of words, it will appear as follows:

[Image: Speak and Repeat Word List]

Alternatively, if the app does not recognize the user speech input, the following screen will appear:

[Image: Speak and Repeat Failed to Recognize]
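Because recognition can fail or return unusable entries, you may want to tidy the result list before handing it to the adapter. The helper below is an optional addition that is not part of the original tutorial; the class name "SuggestionLists" and its "prepare" method are hypothetical. It uses only plain Java, so it can be tried outside Android:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Android SDK or the original tutorial.
final class SuggestionLists {

    private SuggestionLists() {
        // Static helper; not meant to be instantiated
    }

    // Returns a new list holding at most maxResults non-blank, trimmed words.
    // A null input or a non-positive limit yields an empty list.
    static List<String> prepare(List<String> raw, int maxResults) {
        List<String> cleaned = new ArrayList<String>();
        if (raw == null || maxResults <= 0) {
            return cleaned;
        }
        for (String word : raw) {
            if (word == null) {
                continue; // skip missing entries
            }
            String trimmed = word.trim();
            if (trimmed.length() == 0) {
                continue; // skip blank entries
            }
            cleaned.add(trimmed);
            if (cleaned.size() == maxResults) {
                break; // respect the requested limit
            }
        }
        return cleaned;
    }
}
```

In "onActivityResult" you could pass the list retrieved from the Intent through "SuggestionLists.prepare" and show a message instead of the list when the result is empty.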

Step 6: Detect User Word Choices

We want to detect the user selecting words from the list, so let's implement a click listener for the list items. Back in your "onCreate" method, after the existing code, set the listener for each item in the list as follows:

We use the "setOnItemClickListener" method to assign a listener to each item in the list. Inside the new "OnItemClickListener", we implement the "onItemClick" method to respond to these clicks - this method will fire when the user selects a suggested word from the list. First, we cast the View that has been clicked to a Text View, then we retrieve the text from it. This text is the word the user has selected. We write the chosen word out to the Log for testing and output it back to the user as a Toast message. Depending on the needs of your own application, you may wish to carry out further processing on the chosen word - this code is purely for demonstration.
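A sketch of the item click listener follows. The log tag, field names, IDs, and Toast wording are assumptions for illustration; the listener interface and its "onItemClick" signature come from the Android SDK:

```java
import android.app.Activity;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.AdapterView;
import android.widget.AdapterView.OnItemClickListener;
import android.widget.ListView;
import android.widget.TextView;
import android.widget.Toast;

// Placeholder Activity name; only the members for this step are shown.
public class SpeechRepeatActivity extends Activity {

    // Tag used when writing to the Log (assumed value)
    private static final String LOG_TAG = "SpeechRepeatActivity";

    private ListView wordList;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        wordList = (ListView) findViewById(R.id.word_list);
        // ... speech recognition setup from Step 3 goes here ...

        wordList.setOnItemClickListener(new OnItemClickListener() {
            @Override
            public void onItemClick(AdapterView<?> parent, View view,
                    int position, long id) {
                // Each list item is a Text View (see word.xml)
                TextView wordView = (TextView) view;
                String wordChosen = wordView.getText().toString();
                // Log the choice for testing and confirm it to the user
                Log.v(LOG_TAG, "chosen: " + wordChosen);
                Toast.makeText(SpeechRepeatActivity.this,
                        "You said: " + wordChosen, Toast.LENGTH_SHORT).show();
            }
        });
    }
}
```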

The user can press the touchscreen or use a trackball to select words in the list.

[Image: Speak and Repeat Selecting Words]

When the user selects a word, the Toast message appears confirming it.

[Image: Speak and Repeat Toast Message]

Step 7: Set Up TTS Functionality

If you do not want to implement the Text To Speech functionality, you can stop now and test your app. We only require a little more processing to make our app repeat the user's chosen word. First, to set up the TTS engine, return to the section of your "onCreate" method where you queried the system for speech recognition support. Inside the "if" statement, after "speechBtn.setOnClickListener(this);", add the following code:

Like the speech listening process, we will receive the result of this code checking for TTS data in the "onActivityResult" method. In that method, before the line in which we call the superclass "onActivityResult" method, add the following:

Here we initialize the TTS if the data is already installed, otherwise we prompt the user to install it. For additional guidance on using the TTS engine, see the Android SDK: Using the Text to Speech Engine tutorial.

To complete TTS setup, add the "onInit" method to your class declaration, handling initialization of the TTS as follows:

Here we simply set the Locale for the TTS, but you can carry out other setup tasks if you like.
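The three TTS setup pieces described in this step (the data check, the result handling, and "onInit") might fit together as in the sketch below. The field names, the request code value, and the choice of "Locale.UK" are assumptions. For brevity the check is started directly in "onCreate" here, whereas in the tutorial's flow it sits inside the "if" block that confirms speech recognition support:

```java
import java.util.Locale;

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.speech.tts.TextToSpeech.OnInitListener;

// Placeholder Activity name; only the TTS-related members are shown.
public class SpeechRepeatActivity extends Activity implements OnInitListener {

    // Request code for the TTS data check; must differ from the
    // speech recognition request code (assumed value)
    private static final int MY_DATA_CHECK_CODE = 0;

    // Text To Speech instance used to repeat the chosen word
    private TextToSpeech repeatTTS;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Ask the system whether TTS voice data is installed
        Intent checkTTSIntent = new Intent(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
        startActivityForResult(checkTTSIntent, MY_DATA_CHECK_CODE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == MY_DATA_CHECK_CODE) {
            if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS) {
                // Data is present: create the engine; onInit fires when it is ready
                repeatTTS = new TextToSpeech(this, this);
            } else {
                // Data is missing: prompt the user to install it
                Intent installTTSIntent =
                        new Intent(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
                startActivity(installTTSIntent);
            }
        }
        super.onActivityResult(requestCode, resultCode, data);
    }

    @Override
    public void onInit(int status) {
        // Set the speech language once the engine has initialized successfully
        if (status == TextToSpeech.SUCCESS) {
            repeatTTS.setLanguage(Locale.UK);
        }
    }
}
```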

Step 8: Repeat the User Choice

Finally, we can repeat the user's chosen word. Back in your "onCreate" method, inside the "OnItemClickListener" "onItemClick" method, after the line in which we output a Toast message, add the following:

This will cause the app to repeat the user's chosen word as part of a simple phrase. This will occur at the same time the Toast message appears.
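The call itself might look like the sketch below. It is wrapped in a hypothetical helper class ("WordRepeater") so that the snippet stands alone; in the tutorial's flow the "speak" line would sit directly after the Toast inside "onItemClick". The phrase text is an assumption. The three-argument "speak" overload used here suits the minimum SDK version of 8 but is deprecated from API level 21 onward:

```java
import android.speech.tts.TextToSpeech;

// Hypothetical helper; requires the Android SDK to compile.
final class WordRepeater {

    private WordRepeater() {
        // Static helper; not meant to be instantiated
    }

    // Speaks the chosen word as part of a simple phrase.
    static void repeat(TextToSpeech tts, String wordChosen) {
        if (tts == null) {
            return; // TTS was never initialized, so there is nothing to do
        }
        // QUEUE_FLUSH drops anything still waiting to be spoken
        tts.speak("You said: " + wordChosen, TextToSpeech.QUEUE_FLUSH, null);
    }
}
```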


That's our complete Speak and Repeat app. Test it on an Android device with speech recognition and TTS support - the emulator does not support speech recognition so you need to test this functionality on an actual device. The source code is attached, so you can check if you have everything in the right place. Of course, your own apps may implement speech recognition as part of other processing, but this tutorial should have equipped you with the essentials of supporting speech input.


