Learning about Alexa skills from a solutions architect

May 11, 2020 - 15 min read
Cassa Hanon

In January, I attended a Pakistani Women in Computing event hosted by Educative author Samia Khalid to get hands-on experience using the Alexa Skills Kit (ASK). Our presenter was Greg Bulmash, Solutions Architect from the Voice Design Education team at Amazon. The training materials Amazon created are great! With self-service APIs and tools in the Alexa Skills Kit, it’s easy to build nearly any experience. We interviewed Greg after the session to capture some of the insider tips he offered.

Here’s what you need to know:

  • Amazon Alexa training materials are free online at developer.amazon.com
  • Sign in to the Alexa Developer console with any Amazon account you already have
  • If you don’t have an Alexa device, you can use the Alexa simulator in the developer console
  • Host your skill backend with the Alexa-hosted option and the AWS resources will be provisioned for you

Intrigued? Read on!

There are nine modules that you can complete in a few hours.

Greg began with an overview of How an Alexa Skill Works. An Alexa skill has a voice user interface (VUI) and application logic. When you speak to Alexa, the speech is processed in the context of the interaction model to interpret the request. Alexa sends the request to the application logic for action.
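To make that flow concrete, here’s a minimal sketch in Python of the kind of JSON exchange involved. The field names are simplified from the real Alexa request and response format, and the intent and slot names are hypothetical:

```python
# A simplified sketch of the JSON envelope Alexa sends to a skill's
# application logic after interpreting speech against the interaction
# model. (Real requests carry more fields, e.g. session and context.)
request = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",  # or LaunchRequest, SessionEndedRequest
        "intent": {
            "name": "CaptureBirthdayIntent",  # hypothetical intent name
            "slots": {"birthday": {"name": "birthday", "value": "2014-11-06"}},
        },
    },
}

def handle(request):
    """Route the request to application logic and return Alexa's reply."""
    req = request["request"]
    if req["type"] == "LaunchRequest":
        speech = "Welcome! When were you born?"
    elif req["type"] == "IntentRequest":
        slot = req["intent"]["slots"]["birthday"]["value"]
        speech = f"Thanks, I'll remember that you were born on {slot}."
    else:
        speech = "Goodbye."
    # Responses are JSON too; outputSpeech is the part Alexa reads aloud.
    return {"version": "1.0",
            "response": {"outputSpeech": {"type": "PlainText", "text": speech}}}

print(handle(request)["response"]["outputSpeech"]["text"])
```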



There are pre-built models for smart home, news, video, and music skills. Greg walked us through the step-by-step instructions in Module 3 to build a skill called Cake Walk and learn the basics. The Cake Walk skill enables Alexa to ask for your birth date, remember it, and use it in later interactions.

Get hands-on with Alexa skills today.

Try one of our 300+ courses and learning paths: Alexa Skills 101: Building voice apps for Alexa.


Voice Design Concepts

Understanding voice design concepts helps you to create an effective Voice User Interface for your skill. The idea is to design a VUI that will be as close as possible to a natural conversation between human beings.

The components of a VUI are wake word, launch word, invocation name, utterance, prompt, intent, and slot value.

Wake word: The wake word tells Alexa to start listening to your commands. The wake words you can use for Alexa are: Alexa, Computer, Amazon, Echo.

Launch word: A launch word is a transitional action word that tells Alexa that a skill invocation will likely follow. Examples of launch words are: tell, ask, start, open, begin, launch, and use.

Invocation name: To start interacting with a skill, a user says the skill’s invocation name. For example, to use the Weather skill, the user may say, “Alexa, what’s the weather?”

Utterance: An utterance is a user’s spoken request. A spoken request can invoke a skill, provide inputs for a skill, confirm an action for Alexa, etc. The challenge (and fun!) of voice design is considering the many ways a user might make a request.

Prompt: A string of text that should be spoken to the customer to request information. Skill builders include the prompt text in the response to a customer’s request.

Intent: An intent represents an action that fulfills a user’s spoken request. If an Intent needs additional details, it may optionally have arguments called slots.

Slot value: Slots are input values given as part of a user’s spoken request. These values help Alexa figure out the user’s intent and fulfill the request.
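To see how these pieces fit together, here’s a pared-down sketch of an interaction model as a Python dictionary. The real schema in the developer console has more fields, and the intent, slot, and utterance names here are illustrative (AMAZON.DATE is a built-in slot type):

```python
import json

# A pared-down interaction model tying the concepts together: the
# invocation name, an intent, its sample utterances, and a slot.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "cake walk",
            "intents": [
                {
                    "name": "CaptureBirthdayIntent",
                    "slots": [{"name": "birthday", "type": "AMAZON.DATE"}],
                    "samples": [
                        "my birthday is {birthday}",
                        "I was born on {birthday}",
                        "{birthday}",
                    ],
                }
            ],
        }
    }
}

# The model must serialize to valid JSON before it can be uploaded
# via the developer console or the ASK CLI.
print(json.dumps(interaction_model, indent=2)[:80])
```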



Situational design is a voice-first approach to designing a VUI. Just as humans in a conversation take turns speaking, each interaction between a user and the skill is a turn. Each turn has a situation that gives the context. For example, if it’s the user’s first time using a skill, there is data that isn’t yet known. The situation, and context, is “first-time use”. Once the skill collects and stores the information, it will be available for the next use of the skill.

For good voice design, Amazon advises skill builders to:

  1. Stay close to Alexa’s persona. Alexa is designed to be friendly and helpful, with good manners.

  2. Write for the ear, not the eye. A prompt may look right when you type it, but sound odd in text-to-speech (TTS). Listen to the prompts on your test device, or the simulator, and then revise them until they sound natural.

  3. Be contextually relevant. If you’re giving the user options, list the most relevant options first to make it easier to understand.

  4. Be brief. Reduce the number of steps needed to complete a task.

  5. Write for engagement to increase retention. Can you design the skill to leave out information that experienced users learn over time? Give fresh dialog to frequent users to keep the skill from becoming annoying.




Greg Bulmash answered our questions and shared more insights here.

Tell us about Alexa skills invocation names?

The “invocation name” for your skill is the unique name that will allow you to start/launch the skill using an Alexa enabled device. Users say a skill’s invocation name to begin an interaction with a particular custom skill.

It’s important to know that you can change your invocation name at any time while developing a skill, but you cannot change the invocation name after a skill is certified and published. The invocation name is only needed for custom skills. If you are using a pre-built model (Smart Home, etc), users do not need to use an invocation name for the skill.

There are three ways in which users may say your invocation name to start using your custom skill. A good invocation name works well in all three of these contexts:

  1. Invoking the skill with a particular request. Example: “Alexa, ask Cake Walk for xxx”

  2. Invoking the skill without a particular request, using a defined phrase such as “open” or “start.” Examples: “Alexa, open Cake Walk” and “Alexa, start Cake Walk”

  3. Invoking the skill using just the invocation name and nothing else: “Alexa, Cake Walk”


What is an intent?

It’s information you provide to help Alexa determine what your customer wants to do and then pass that request to your skill code for processing.

For example, when the customer opens your Cake Walk skill for the first time, it uses the “Launch” intent. In the skill we’ll create, the response to that is to ask the customer when they were born. We’ll also define a “Capture Birthday” intent that’s triggered when they say a date.


What is an utterance?

That is the information in what a customer says that’s not the wake word or invocation name. It might all come in a single request, like “Alexa, ask [horoscope skill name] what will happen for Gemini today.” If we break that down:

  • Wake word: Alexa
  • Action Verb: ask
  • Invocation: the horoscope skill name
  • Utterance: “what will happen for Gemini today.”

Alexa will compare that utterance to sample utterances you defined for an intent, and if it’s close enough, will then extract any information you defined as a slot or slots in the intent. Then it will pass the slot value(s) and intent name to the handlers in your skill code, so it can respond.
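As a toy illustration of that matching step, here’s a sketch in Python. Alexa’s actual natural-language understanding is far more flexible than the exact-pattern matching shown here, and the intent and slot names are invented:

```python
import re

# Toy version of the matching Alexa performs for you: compare an
# utterance against sample utterances, extract slot values, and hand
# the intent name plus slots to your handlers.
samples = {
    "HoroscopeIntent": ["what will happen for {sign} today"],
}

def match(utterance):
    for intent, patterns in samples.items():
        for pattern in patterns:
            # Turn "{sign}" into a named capture group for the slot.
            regex = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", pattern)
            m = re.fullmatch(regex, utterance, re.IGNORECASE)
            if m:
                return intent, m.groupdict()  # intent name + slot values
    return None, {}

intent, slots = match("what will happen for Gemini today")
print(intent, slots)  # HoroscopeIntent {'sign': 'Gemini'}
```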


What is this idea of creating a storyboard or movie script? What is the significance of creating a storyboard? Why is it explained that way?

One of the best ways to clearly represent a conversation between two people is a screenplay format, and this is a conversational user interface.

You can script out the ways the conversation might go, and with that collection of scripts, you have an easier-to-follow representation of how the process might flow. You can group different variants on how the same goal might be reached, such as asking for someone’s birthday.

Cake Walk really has one primary data-gathering intent: getting the customer’s birthday. Once it collects and stores it, it won’t need to ask for any more information. But getting it…

“When’s your birthday?” “November 6th, 2014.” (Alexa’s birthday)

That’s great. You have everything. But not everyone is so cooperative.

“When’s your birthday?” “November.”

Now you need to follow up to get the year and the day.

Those two exchanges—the one where you get everything perfectly and the one where you need to ask for more information—are two different ways the conversation can go.

But remember how we talked about intents? Part of defining an intent is defining sample utterances. So, by scripting out the different possible answers to our question, we create a list of sample utterances. And with a good set of sample utterances, there’s a higher likelihood your skill will understand what to do.

Once you have those conversations scripted out, you can try to create a flow chart for them, but at this point it’s going to be conceptually easier to create a storyboard for each conversation and then group them by intent.
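The scripting step can be sketched in Python: each scripted answer doubles as a sample utterance, and the slots it leaves unfilled become follow-up questions. The {month}/{day}/{year} slot names are illustrative:

```python
import re

# Scripted answers to "When's your birthday?" become the intent's
# sample utterances; any slot an answer leaves unfilled is something
# the skill must follow up on.
scripted_answers = [
    "{month} {day} {year}",              # "November sixth twenty fourteen"
    "{month} {day}",                     # "November sixth"
    "{month}",                           # "November"
    "I was born on {month} {day} {year}",
]

def missing_slots(sample, required=("month", "day", "year")):
    present = set(re.findall(r"\{(\w+)\}", sample))
    return [slot for slot in required if slot not in present]

for sample in scripted_answers:
    print(sample, "-> still need:", missing_slots(sample))
```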


Tell us a little about yourself, what led you to this role?

I wrote my first “Hello World” at the beginning of 7th grade. Loved computers ever since and actually got through AP Computer Science in high school. But then my love of writing and performing took over. I ended up studying Creative Writing in college. My claims to fame in the 90s all revolved around things I wrote and published online. One of my jokes even has its own Snopes.com page.

My philosophy for keeping things moving is “when in doubt, build something.” That’s taken me in a number of interesting directions.

I’ve spent nearly 11 of the last 22 years with Amazon, but not all in a row. I’ve done things there ranging from being Senior Editor at the Internet Movie Database to writing developer-facing documentation for an AWS service that crowdsources the annotation of machine learning training data.

Along the way, the “build something” philosophy led me to teach myself a few programming languages, get some certifications, found a volunteer group that teaches kids to code, and make some fun projects including an Alexa skill that rolled a Sphero robot around the stage at a couple of tech conferences.

I’m in this role because I found something that lets me be a little silly, do my favorite things (writing, teaching, performing, coding), work with an amazing team, and work with fascinating emerging technology. I’m just happy Amazon lets me do it.


Explain why developers can / should use Adobe XD to create storyboards.

It’s our goal to offer developers, designers and creators a variety of options to build great voice experiences with the tools they are familiar with. Adobe XD is one of those options.

We also have some free templates available for you to use with it.



Explain the backend resources that can be used to work with Alexa (i.e. AWS, Alexa hosted, custom endpoints)

Essentially there are two options: Alexa-Hosted and Custom Endpoints.

With Alexa-Hosted, Amazon Alexa provides your skill with a set of AWS resources. The upside is that you don’t need to have an AWS account or know how to configure the services. The downside is that your use of these services has some limits.

These limits are generous and sufficient for many skills to run well, even in commercial release, but if you have a skill that requires a bigger set of resources or will drive enough traffic to exceed the limits of Alexa-Hosted, you’ll want to set up your own custom endpoints.

Custom endpoints are another name for self-hosting. You can self-host via AWS or via your own servers in your own datacenter.


Which is preferred and why?

We prefer Alexa-Hosted for this project because it’s fairly simple. We don’t have to delve very far into the backend configuration, if at all. You can focus on building your skill.

Once you begin building bigger and more complex skills that surpass Alexa-Hosted capabilities, it’s important to understand what your skill needs and what will help it run most efficiently, so you can provide the best experience and fastest possible response time.


What’s included with Alexa-hosted?

With an Alexa-hosted skill, you can build, edit, and publish a skill without leaving the developer console. You get access to an AWS Lambda endpoint, an Amazon S3 bucket for media storage, and an Amazon S3-backed key-value table for managing session persistence. The code editor allows you to edit the backend code for your skill and deploy directly to AWS Lambda. When you create the skill, the service also sets up an AWS CodeCommit repository for managing your code, which you can access by using the ASK CLI to edit your skill.

The limits are generous, and for most skills, there’s no need for more. But for high-traffic skills or ones needing complex infrastructure, you will need to provision the additional resources in your own AWS account.
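The persistence pattern can be sketched in Python. A plain dictionary stands in here for the S3-backed key-value store, and the greetings are made up for illustration:

```python
# Sketch of the session-persistence pattern: on first use the birthday
# is unknown, so the skill asks for it and saves it; on later launches
# it reads the stored value back.
store = {}  # stand-in for the persistent attribute store

def on_launch(user_id):
    attrs = store.get(user_id, {})
    if "birthday" not in attrs:
        return "Hi, welcome to Cake Walk. When is your birthday?"
    return f"Welcome back! I remember your birthday is {attrs['birthday']}."

def save_birthday(user_id, birthday):
    store.setdefault(user_id, {})["birthday"] = birthday

print(on_launch("user-1"))             # first-time use: asks
save_birthday("user-1", "2014-11-06")
print(on_launch("user-1"))             # next use: remembers
```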


What should I name my Cake Walk skill?

When you first set up your skill, go to the invocation name section of the Build tab and change the invocation name. Based on the default “Hello World” skill that gets set up, the name starts out as “change me.”

People skip that step, then when they start trying their skills and asking Alexa to “open cake walk,” they actually open someone else’s Cake Walk skill. Just to be on the safe side, we recommend adding another word to the invocation name to make it more unique and be less likely to accidentally invoke someone else’s skill.

Something like “daves sweet cake walk” (no punctuation, because it’s not allowed in invocation names) helps. Use your imagination and make it family friendly.


Can you give some detail on the interaction model?

The interaction model lives on the Build tab of the developer console and consists of your invocation name, your intents, and your slots. It basically gives Alexa everything it needs to interpret, parcel out, and route the things said to your skill.


How do utterance, intent, and slots interact?

When defining an intent, you provide sample utterances: things people might say that would trigger that intent. In those sample utterances, you can define slots.

For example, if Alexa asked “What’s your name?” we might have a “GetName” intent with “my name is {name}” as a sample utterance. In that sample, “{name}” is a slot. When Alexa sends the information to your skill handler, she’ll tell the handler that she got a “GetName” intent with a “name” slot value of whatever name you said.

Now your “GetNameHandler” function in the script might respond with “Hi,” followed by the value in the “name” slot.
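A minimal sketch of such a handler in Python, with the request shape simplified from the real Alexa JSON:

```python
# Minimal handler for the GetName example: Alexa has matched
# "my name is {name}" and sends the intent name plus the slot value.
def get_name_handler(intent_request):
    name = intent_request["intent"]["slots"]["name"]["value"]
    return f"Hi, {name}"

request = {"intent": {"name": "GetName",
                      "slots": {"name": {"name": "name", "value": "Cassa"}}}}
print(get_name_handler(request))  # Hi, Cassa
```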


What are some voice design best practices?

We have documentation available through our Alexa Design Guide that can help you understand the principles of situational voice design. The guide is divided into four design patterns:

  • Be adaptable: Let users speak in their own words.
  • Be personal: Individualize your entire interaction.
  • Be available: Collapse your menus; make all options top-level.
  • Be relatable: Talk with them, not at them.

I recommend checking out the guide on developer.amazon.com.


What’s the difference between auto delegation and dialog management?

Auto-delegation tells Alexa “I need all of these slots filled before you send the Intent and the slot values,” and lets Alexa handle getting the slots that weren’t filled based on some sample prompts for how to get those slot values.

Dialog management tells Alexa “I need all of these slots filled, so if you don’t have values for some of them, let me know when you send the Intent, I’ll evaluate what’s missing, and then tell you how to proceed in getting the other bits.”

For example, let’s say I’m asking for three pieces of information, but I can work from any two. That’s something I can manage with dialog management, but it’s too complex for auto-delegation.
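That “any two of three” rule can be sketched in Python; the slot names are invented for illustration:

```python
# Sketch of the "any two of three" rule that needs dialog management:
# the handler inspects which slots arrived and decides whether to
# proceed or to ask Alexa to elicit more.
def next_step(slots):
    filled = [name for name, value in slots.items() if value is not None]
    if len(filled) >= 2:
        return ("fulfill", filled)       # enough info: act on it
    missing = [n for n, v in slots.items() if v is None]
    return ("elicit", missing)           # collect at least one more

print(next_step({"city": "Seattle", "date": None, "time": "5 pm"}))
# ('fulfill', ['city', 'time'])
print(next_step({"city": "Seattle", "date": None, "time": None}))
# ('elicit', ['date', 'time'])
```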


What do you mean by situational design?

Situational Design is another name for how you break up all the conversations that your customer might have with your skill. For each exchange you script out, you name it with the situation.

For example, with Cake Walk, the first time the customer launches the skill, the situation is “Launch - Birthday unknown.” So you might script it like this:

Customer: Alexa, open cake walk.

Alexa: Hi, welcome to cake walk. When is your birthday?

You’ve defined the situation, the utterance that initiated the exchange, and what Alexa says or asks for.

By thinking about the different situations you’ll have to handle, you can build your dialogues around them.


What is a multi-turn dialog?

One where Alexa will need to have multiple back-and-forth exchanges with the customer to get the information it needs to fulfill an intent.

Single Turn:

Customer: “Alexa, set a reminder to order dinner for 5 o’clock this afternoon.”

Alexa: “Okay, I’ll remind you at 5 p.m.”

Multi-Turn:

Customer: “Alexa, set a reminder to order dinner.”

Alexa: “When should I remind you?”

Customer: “5 o’clock.”

Alexa: “Okay, I’ll remind you at 5 p.m.”


What is Amazon’s goal of investing in educational content?

We’re investing in developers and dreamers. We want to inspire people to create Alexa skills, then help them understand how to make their skills even better.



