Voice Assistants 101: Amazon Alexa

Today we are going to explore an interesting topic which is on everyone’s lips (literally): Voice assistants.

They tell us the time, build our shopping list, read us the weather forecast, turn on the lights at home and a whole bunch of other things. So at Solid GEAR we asked ourselves the following question: what if we could integrate a voice assistant into our applications?

So we got down to work, researching voice assistants and how to integrate them into our Node.js server.

Alexa, your time has come

Alexa is Amazon’s voice assistant. To integrate our apps with Alexa we have to create a Skill at the Alexa developer portal.

What is an Alexa Skill?

It is an interface that will allow us to use our server through voice commands processed by Alexa.

Without buzzwords: It is the equivalent of an application, where the user sends and receives information.

In the usage flow, the user would tell Alexa to open the Skill, which subsequently takes control of the logic of the conversation and manages what it has to do in each case.

Ok, now that we know what an Alexa Skill is and how it works, let’s configure it.

Create an Alexa Skill

Once inside the console, click on the Create Skill button. You will be asked for a name and a default language, then for a model (we will use Custom), and finally for a method to host your Skill's server resources. Select Provision your own, since we are going to run a Node.js server locally, then click on Create Skill again.

Create Skill

A brief guide

In the side menu there are several sections; the one we are going to use is Custom, which contains different subsections. The two that interest us are Interaction Model, which is the one we are going to focus on, and Endpoint, where we will configure the resource on our server that the Skill will call whenever an action is executed.

What is my assistant’s name?

Once we have created the Skill, the first thing we should configure is its invocation name (the famous “Alexa, open x”). To configure it, go to the Custom section and, within it, to Invocation. The Skill Invocation Name field is where the name goes.

Invocation Name

Once the name is set, click on Save Model.

What will our Skill do?

To define the actions that our Skill will do, we will use Intents, which are located within the Interaction Model.

Built-in Intents

There will be 4 default Intents. These Intents must exist, and although configuring them is not mandatory, it is highly recommended, since they cover basic actions that users will want to perform in our Skill. If you want to know more about Built-in Intents, see the documentation.

To configure an Intent we have to provide the model with example phrases the user might say. The more examples we provide, the more natural the Skill will feel, and therefore the more comfortable it will be for the user. To add examples, use the text field under the Sample Utterances section: type a phrase and press the ⏎ key.

Sample Utterances

Custom Intents

Unlike the Built-in Intents, Custom Intents let us add parameters to our phrases. These are known as Slots.

To add a parameter to an example sentence, select the text that we want to parameterize. A popup will appear where we can add a new Slot, which can be edited later in the Intent Slots section.

Intent Slots

We need to specify the parameter type in the Slot Type drop-down. We can select one of those provided by Amazon or create a new one, for which we define the accepted values and their synonyms.

If the parameter does not fit a fixed set of values, Amazon provides a type called AMAZON.SearchQuery, which accepts free-form text.

Slot Types

Once we have defined the parameter type, it can be saved.
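
For reference, the console keeps everything configured so far (invocation name, Intents, sample utterances and Slots) as a JSON document, which can be inspected in the JSON Editor subsection. Below is a minimal sketch of its shape, written as a JavaScript object. The invocation name, the sample phrases and the name Slot are made-up values for illustration; only Custom_HelloIntent is the Intent name used by the handler code in this post.

```javascript
// Sketch of an interaction model, mirroring the JSON kept by the developer console.
// The invocation name, sample phrases and the "name" Slot are hypothetical values.
const interactionModel = {
    interactionModel: {
        languageModel: {
            invocationName: 'my custom skill',
            intents: [
                // Built-in Intents do not need samples of their own
                { name: 'AMAZON.CancelIntent', samples: [] },
                { name: 'AMAZON.HelpIntent', samples: [] },
                { name: 'AMAZON.StopIntent', samples: [] },
                {
                    name: 'Custom_HelloIntent',
                    // Each Slot used in a sample is wrapped in curly braces
                    slots: [{ name: 'name', type: 'AMAZON.FirstName' }],
                    samples: ['say hello to {name}', 'greet {name}']
                }
            ],
            types: []
        }
    }
};

// List every distinct Slot referenced by the sample phrases of an Intent
const slotsInSamples = intent => [
    ...new Set(
        intent.samples.flatMap(sample =>
            [...sample.matchAll(/\{(\w+)\}/g)].map(match => match[1])
        )
    )
];

const hello = interactionModel.interactionModel.languageModel.intents
    .find(intent => intent.name === 'Custom_HelloIntent');
console.log(slotsInSamples(hello));
```

A helper like slotsInSamples is handy to check that every Slot mentioned in a phrase is also declared in the slots list of the Intent.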

To save and train the model after configuring an Intent, click on Build Model.

To verify that the mapping of Intents and Slots is correct, click on Evaluate Model and try one of the example phrases used to configure the Intent. Once the phrase is processed, we will be shown the Selected Intent and a map with the Slots and their values.

Evaluate Model
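
The Intent and Slots shown by Evaluate Model closely resemble what later arrives at the server inside an IntentRequest. Here is a small sketch of how the Slots of such a request could be flattened into plain name/value pairs. The sample request is written by hand, and the name Slot and its value are hypothetical.

```javascript
// Hand-written sample of the "request" part of an IntentRequest (hypothetical values)
const sampleRequest = {
    type: 'IntentRequest',
    intent: {
        name: 'Custom_HelloIntent',
        slots: {
            name: { name: 'name', value: 'Laura' }
        }
    }
};

// Flatten the Slots of a request into { slotName: value }.
// Requests without an Intent (e.g. LaunchRequest) yield an empty object.
const slotValues = request => {
    const slots = (request.intent && request.intent.slots) || {};
    return Object.fromEntries(
        Object.values(slots).map(slot => [slot.name, slot.value])
    );
};

console.log(sampleRequest.intent.name, slotValues(sampleRequest));
```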

Great, but for now we have a silent assistant, and nobody likes that, so let’s see how to make Alexa speak and how to connect the Skill with our server to answer dynamically.

Alexa, say something, please

As our server is going to run locally and Amazon requires the URL used on the Endpoint to have an SSL certificate, we will expose it through an HTTPS tunnel with the help of ngrok.

First things first: install ngrok

npm install -g ngrok

Then, we will launch ngrok on a local port like this:

ngrok http [PORT]

This will generate one URL with http and another with https; we will use the https one.

To connect our Skill to a server we have to specify the API resource it is going to call. In the Endpoint subsection, select HTTPS under Service Endpoint Type. In the Default Region field, enter the complete route: the URL previously created with ngrok plus an endpoint path, which in my case is /alexa, so it should look something like https://486a3467.ngrok.io/alexa. In SSL Certificate Type, select “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority”. Finally, click on Save Endpoints.
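
Since Amazon requires the endpoint to be served over HTTPS, a small check like the following can catch a wrong URL before pasting it into the console. This is only a sketch: buildEndpoint is a hypothetical helper, and the ngrok subdomain is the example one from the paragraph above.

```javascript
// Hypothetical helper: join the ngrok base URL and the endpoint path,
// refusing anything that is not served over HTTPS.
const buildEndpoint = (baseUrl, path) => {
    const url = new URL(path, baseUrl);
    if (url.protocol !== 'https:') {
        throw new Error(`Alexa needs an HTTPS endpoint, got "${url.href}"`);
    }
    return url.href;
};

console.log(buildEndpoint('https://486a3467.ngrok.io', '/alexa'));
```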

This was the last step of the basic configuration of the Skill. Now it’s time to create our server and specify the actions it has to perform for each Intent that arrives.

To save time, here is a simple Node.js server built with Express that already has an endpoint to receive calls from the Skill ([POST] /alexa).

The server is set up to listen on port 4500. If you want to launch it on another one, you can change it in the configuration/config.json file (keep in mind that the port has to be the same one that you used to launch ngrok).

To start the server:

npm install && npm start

How do the Intents work?

When the Alexa Skill is invoked, the server must have registered Handlers that map the actions of Alexa (Intents) to functions on the server. Essentially, a Handler is an object with two functions:

  • canHandle: returns whether this Handler should be used or not, so if it returns true, this will be the Handler used. Usually, this method checks the name of the Intent and/or the type of the request (here are the types of request that exist). In our case we need to return true if the type of the request is IntentRequest and the name of the Intent matches the one we previously configured in our Skill.

  • handle: the function that is executed if this is the selected Handler.

Both functions receive as a parameter an object (handlerInput) that wraps the body of the request which our Skill sends to our server.

Usually, we will add as many Handlers as Intents in our Skill.

When the Skill is invoked, the SDK goes through the Handlers from the first one registered to the last until it finds one whose canHandle returns true. So if canHandle returns true for two or more Handlers, only the first one will be used.
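
This lookup can be pictured with a few lines of plain JavaScript. The sketch below only illustrates the first-match rule and is not the SDK's actual implementation; findHandler and the two sample Handlers are invented for the example.

```javascript
// Illustration of the first-match rule (not the SDK's real code)
const findHandler = async (handlers, handlerInput) => {
    for (const handler of handlers) {
        // canHandle may return a boolean or a Promise of one
        if (await handler.canHandle(handlerInput)) {
            return handler;
        }
    }
    return undefined; // no Handler accepted the request
};

// Two hypothetical Handlers that both accept every request
const first = { name: 'first', canHandle: () => true, handle: () => 'handled by first' };
const second = { name: 'second', canHandle: () => true, handle: () => 'handled by second' };

findHandler([first, second], {}).then(handler => {
    // Registration order decides: "first" is selected, "second" is never asked
    console.log(handler.handle({}));
});
```

This is why more specific Handlers are usually registered before more generic ones.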

Technical part

To write the Handlers we will use the Alexa Skills Kit SDK for Node.js. To install it in our project:

npm install --save ask-sdk-core ask-sdk-model

Then we will need to import it into our server:

const Alexa = require('ask-sdk-core');

If we want Alexa to answer us, we need the object passed as a parameter to the handle function. This object has a responseBuilder that allows us to return a spoken response using the .speak() method.

const response = handlerInput.responseBuilder
    .speak('Hello!')
    .getResponse();

On the other hand, the shouldEndSession field of the response object tells our Skill whether, after this response, Alexa must end the conversation or keep listening to the user.

response.shouldEndSession = false;
return response;
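
It may help to see roughly what travels back to Alexa. The sketch below builds a similar object by hand, without the SDK. buildSpeechResponse is a hypothetical helper, and the exact fields the SDK emits may differ between versions.

```javascript
// Hand-built approximation of the object returned by getResponse().
// Assumption: the spoken text is sent as SSML wrapped in <speak> tags.
const buildSpeechResponse = (text, shouldEndSession) => ({
    outputSpeech: {
        type: 'SSML',
        ssml: `<speak>${text}</speak>`
    },
    shouldEndSession
});

// false keeps the session open, so Alexa keeps listening after speaking
console.log(JSON.stringify(buildSpeechResponse('Hello!', false), null, 2));
```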

Two example Handlers, one checking the type, the other checking the type and the name:

// Check type
const LaunchRequestHandler = {
    canHandle: handlerInput => handlerInput.requestEnvelope.request.type === 'LaunchRequest',
    handle: handlerInput => {
        const response = handlerInput.responseBuilder
            .speak('Welcome to your custom Skill. What do you want to do?')
            .getResponse();
        response.shouldEndSession = false;
        return response;
    }
};

// Check type and name
const HelloWorldHandler = {
    canHandle: handlerInput =>
        handlerInput.requestEnvelope.request.type === 'IntentRequest' &&
        handlerInput.requestEnvelope.request.intent.name === 'Custom_HelloIntent',
    handle: handlerInput => {
        return handlerInput.responseBuilder
            .speak('Hello World!')
            .getResponse();
    }
};
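
Because every custom Intent needs the same two checks (request type and Intent name), the canHandle logic can be factored out. A possible sketch follows, where matchesIntent is a helper name invented here and the request envelopes are trimmed down to the fields the check reads:

```javascript
// Hypothetical helper: builds a canHandle function for a given Intent name
const matchesIntent = intentName => handlerInput => {
    const { request } = handlerInput.requestEnvelope;
    return request.type === 'IntentRequest' && request.intent.name === intentName;
};

const canHandleHello = matchesIntent('Custom_HelloIntent');

// Minimal hand-written inputs containing only the fields used above
const helloInput = {
    requestEnvelope: {
        request: { type: 'IntentRequest', intent: { name: 'Custom_HelloIntent' } }
    }
};
const launchInput = { requestEnvelope: { request: { type: 'LaunchRequest' } } };

console.log(canHandleHello(helloInput));  // true
console.log(canHandleHello(launchInput)); // false
```

A Handler could then declare canHandle: matchesIntent('Custom_HelloIntent') and keep only its handle function specific.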

Then, we create the Skill and register the Handlers on our server.

To create a Skill builder:

const skillBuilder = Alexa.SkillBuilders.custom();

To register the Handlers and build the Skill:

const customSkill = skillBuilder
    .addRequestHandlers(
        LaunchRequestHandler,
        HelloWorldHandler
    )
    .create();

Finally, we invoke the Skill at the moment when the request is received at the /alexa endpoint (inside the endpoint, where we have access to the parameters req and res of the request).

const response = await customSkill.invoke(req.body);
return res.send(response);
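
Put together, the endpoint could look like the sketch below. It is written as a factory so that it can be tried without Express or the SDK: createAlexaEndpoint is a hypothetical name, and the skill passed in only needs an invoke method. Answering with a 500 status on failure is an assumption of this sketch, not necessarily what the sample server does.

```javascript
// Hypothetical factory: returns an Express-style (req, res) handler.
// Assumes a JSON body parser is already registered, so req.body is an object.
const createAlexaEndpoint = skill => async (req, res) => {
    try {
        const response = await skill.invoke(req.body);
        return res.send(response);
    } catch (error) {
        // Assumption: reply with a 500 so the failure is visible to the caller
        console.error('Skill invocation failed:', error.message);
        return res.status(500).send({ error: 'Skill invocation failed' });
    }
};

// With Express this would be wired as:
// app.post('/alexa', createAlexaEndpoint(customSkill));

// Trying it out with stand-ins for the skill and the Express objects
const fakeSkill = {
    invoke: async body => ({ version: '1.0', echoedType: body.request.type })
};
const fakeRes = {
    send: payload => payload,
    status() { return this; }
};

createAlexaEndpoint(fakeSkill)({ body: { request: { type: 'LaunchRequest' } } }, fakeRes)
    .then(result => console.log(result));
```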

Conclusions

As you can see, it is not difficult to create an Alexa Skill, and it can help our applications stand out.

The next steps would be to Certify and Distribute the Skill in the Alexa Skills market. To do this, Amazon provides some instructions on the Certify and Distribute tabs, inside the Alexa Developer Console.

I hope this is useful for you and also I hope you can take advantage of this for your projects.

Bye!
