Alexa.Tip – Build Better Responses in .NET With SSML

In this Alexa.Tip series, we explore some little bits of code that can make your life easier in developing Alexa Skills in many languages including C# + .NET, node.jS + TypeScript, Kotlin, etc. We look at concepts that developers might not be aware of, design patterns that and how they can be applied to voice application development, best practices, and more!

In this post, we are looking at a VERY simple, but useful tool when developing Alexa Skills – SSML or Speech Synthesis Markup Language.

If you’re new to voice application development, SSML is a common standard among all the big players – Alexa, Google, Cortana, etc. At its simplest form, it’s a simple set of XML elements that are used to tell the Voice Assistant (such as Alexa) how to speak your response in a particular way. The alternative to using SSML is to return simple text as a string and let the assistant respond with their normal speech. However, this commonly falls short on things like:

  • Abbreviations
  • Dates
  • Emphasis
  • Pauses
  • Tone
  • Pitch

This is especially true when the assistant is reading out a particularly long set of text. It becomes too… robotic. Ironically…

So let’s look at an example of a C# and .NET response using simple text:

Function.cs

public SkillResponse SimpleTextResponse()
{
    return ResponseBuilder.Tell(new PlainTextOutputSpeech
    {
        Text = "When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us."
    });
}

With a response like this, Alexa will likely read it too quickly which can really diminish the power behind such a great quote (by Helen Keller).

Now we can very easily update a response like this to give ourselves something more powerful:

Function.cs

public SkillResponse RichTextResponse()
{
    return ResponseBuilder.Tell(new SsmlOutputSpeech
    {
        Ssml = 
            @"<speak>
                When <emphasis level=""moderate"">one</emphasis> door of happiness closes, another opens;
                <break time=""1.5s""/>
                <s>but often we look so long at the <emphasis level=""strong"">closed</emphasis> door that we do not see the one which has been opened for us.</s>
            </speak>"
    });
}

Now with this, we can add emphasis of different levels on certain words, add breaks in the speech when we need to, and separate contextual sentences!

For a full list of SSML tags that Alexa supports check this out:
https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html

And be sure to check back for more intro to expert level tips on Alexa Skill development!
Or take a look at other posts in the Alexa.Tip series here!

Side note: I’m also working on a tool to make building SSML responses even better – especially for very large and dynamic pieces of conversations. The idea is to build a more Fluid Design approach for something like this:

// returns a string
return SsmlBuilder.Speak("When")
    .Emphasis("one", Emphasis.Moderate)
    .Speak("door of hapiness closes, another opens;)
    .Break(1.5)
    .Sentence("but often we look so long at the closed door that we do not see the one which has been opened for us.")
    .Build();

Or something like:

// returns a string - uses string searching
return SsmlBuilder.Speak("When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us.")
    .Emphasis("one", Emphasis.Moderate)
    .Break(";")
    .Sentence("but often we look so long at the closed door that we do not see the one which has been opened for us.")
    .Emphasis("closed", Emphasis.Strong)
    .Build();

Or:

// returns a string - uses indexes and lengths
return SsmlBuilder.Speak("When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has been opened for us.")
    .Emphasis(4, 3, Emphasis.Moderate)
    .Break(47)
    .Sentence(49, 112)
    .Emphasis(59, 6, Emphasis.Strong)
    .Build();

Let me know what you think in the comments or on twitter @Suave_Pirate – Would you rather build SSML Responses like this or is writing the XML in a big fat string working alright for you?


If you like what you see, don’t forget to follow me on twitter @Suave_Pirate, check out my GitHub, and subscribe to my blog to learn more mobile and AI developer tips and tricks!

Interested in sponsoring developer content? Message @Suave_Pirate on twitter for details.


voicify_logo
I’m the Director and Principal Architect over at Voicify. Learn how you can use the Voice Experience Platform to bring your brand into the world of voice on Alexa, Google Assistant, Cortana, chat bots, and more: https://voicify.com/


Leave a comment