Robot Skills and Messaging APIs

Messaging services set the stage for humans to interact with programmable robots using the same devices we already use to talk with each other. That kind of interaction feels a little like magic, but it’s magic that anyone who codes can conjure. To show you what I mean, we need to look at Misty’s Photo Booth skill, which Misty demoed at Twilio SIGNAL 2019.

When this skill runs, you can send an SMS to ask your Misty robot to take your picture. When Misty gets your text, she stops what she’s doing, turns to look at you, and snaps your portrait with the camera in her visor. She then sends that picture right back to your phone via MMS.

To show how all this works, this article breaks down what happens behind the visor of Misty’s Photo Booth skill, from your finger touching send to your portrait landing back on your device. It examines the links between the services the skill combines, and it explains a few important blocks of the JavaScript code that controls Misty’s response.

Let’s start by looking at Twilio Autopilot, the service your text visits first.

SMS -> Twilio Autopilot

To get Misty’s attention, the user texts a phone number hooked up to Twilio’s Autopilot service. If you’re new to Autopilot, here’s how Twilio describes it:

Autopilot is a conversational AI platform to build omnichannel bots and virtual assistants with natural language understanding and complete programmability with Autopilot Actions.


When you text this special number, Twilio forwards your message to the Programmable Messaging channel associated with a unique Autopilot “bot”. Each Autopilot bot you create can perform one or more “tasks” (a.k.a. actions that trigger when your bot receives a certain kind of message). You customize tasks in the Twilio Console to program what they should do when triggered.

For the Photo Booth skill, the core function of our Autopilot bot lives in a task we call take_picture. We “train” our bot to trigger the take_picture task when it receives one of the following phrases:

[Screenshot: sample phrases used to train the take_picture task]

The next step is to program what the bot does when the take_picture task triggers. Possible actions include sending a message back to the user, listening for responses, or collecting and storing information. Or, if the built-in actions aren’t enough, you can program tasks to redirect to other services. That’s what we do in the Photo Booth skill. When the take_picture task triggers, it calls a redirect to run a Twilio Function.
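As a rough illustration of what such a redirect can look like, the sketch below builds the JSON for a task whose only action is a redirect. The redirect key follows Autopilot's Actions schema as I understand it; the buildRedirectTask() helper and the URL are hypothetical placeholders, not values taken from the Photo Booth skill.

```javascript
// Hypothetical helper: builds the Actions JSON for a task that hands
// control to another service, such as a Twilio Function.
function buildRedirectTask(functionUrl) {
  return { actions: [{ redirect: functionUrl }] };
}

// Placeholder URL; substitute the URL of your own Twilio Function.
const takePictureTask = buildRedirectTask("https://example.com/take-picture");
console.log(JSON.stringify(takePictureTask, null, 2));
```

You would paste JSON of this shape into the task's configuration in the Twilio Console rather than run it on the robot.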

Autopilot -> Twilio Function

Twilio Functions are serverless functions for handling inbound Twilio communications. These functions are a quick way to introduce the communications you’re processing in Twilio to the rest of the web (a medium of communication in which robots like Misty are exceptionally well-versed).

The Twilio Function we use in our Photo Booth skill does a few things. First, it assigns the user’s phone number to a variable. Then, it uses the Twilio API to reply to the user with an SMS: [••] Time to Pose. Finally, it POSTs the user’s phone number (and the nature of the action they’ve asked for) to the PubNub channel Misty is listening to for new messages. (More detail on this in the next section.)


The Twilio Function code for the Photo Booth skill looks something like this:

exports.handler = function(context, event, callback) {
  // Saves phone number to contact variable
  var contact = event.UserIdentifier || "19294421336";
  // Replies that go back to the user as an SMS
  const replySms = {"actions": [{"say": "[••] Time to Pose"}]};
  const replyErrorSms = {"actions": [{"say": "[••] Oops, an error occurred. Could you please try again?"}]};
  const axios = require('axios');
  // The PubNub URL includes publish/subscribe keys, a
  // channel name (similar to the name of a chatroom),
  // and a client name (a unique name identifying this
  // device in the PubNub channel).
  // (The base URL of the publish endpoint is not shown here.)
  axios.post('<publish-key>/<subscribe-key>/0/<channel-name>/0?store=0&uuid=<client-name>', {
      'phNumber': contact,
      'type': 'photo'
    })
    .then((res) => {
      console.log(res);
      callback(null, replySms);
    })
    .catch((error) => {
      console.log(error);
      callback(null, replyErrorSms);
    });
};

With the Twilio Function and Autopilot Bot set up, we’re ready to look at PubNub, the service that notifies Misty to take a picture.

Twilio Function -> PubNub

Hacking a robot to react to an SMS is pretty cool. Even cooler? When that robot responds with hardly any latency. That’s where PubNub comes in.

PubNub provides a real-time messaging API that developers can leverage via HTTP communication protocols, allowing for quick communication between all kinds of machines. When you use PubNub’s messaging API, you create a data channel — sort of like a chatroom for devices — that multiple devices (like Twilio servers and robots) can subscribe and publish messages to.

While PubNub provides SDKs for several different languages, we get along just fine in the Photo Booth skill with basic HTTP requests. When you create a new “app” in PubNub, you get unique API Keys for publishing and subscribing to that app. To publish data (as we do in the Twilio Function above), we send a POST request to: <publish-key>/<subscribe-key>/0/<channel-name>/0?store=0&uuid=<client-name>

You’ll notice that the PubNub URL includes publish/subscribe keys, a channel name (sort of like the name of a chatroom), and a client name (a unique name that identifies this device in the PubNub channel). When we send this request, we pass along a JSON body with the message we want to publish. In our case, that message resembles:

{
  'phNumber': contact,
  'type': 'photo'
}

You can read more about this in the PubNub developer docs, but the high-level view is that this request publishes a message to the PubNub app we created for the Photo Booth skill. External devices (like a programmable robot) that are listening to that app can then read those messages and make use of them on their own.
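To make the pieces of that request concrete, here is a minimal sketch that assembles the path, query string, and message body described above. The helper names are hypothetical, the keys and names passed in are placeholders, and the endpoint's base URL is left out just as it is in the text, so treat this as an illustration rather than a drop-in client.

```javascript
// Hypothetical helper: assembles the path and query string of the
// publish request. The endpoint's base URL is deliberately omitted.
function buildPublishPath(publishKey, subscribeKey, channel, clientName) {
  const path = [publishKey, subscribeKey, "0", encodeURIComponent(channel), "0"].join("/");
  const query = new URLSearchParams({ store: "0", uuid: clientName });
  return `${path}?${query.toString()}`;
}

// Hypothetical helper: builds the message the Twilio Function publishes.
function buildPhotoMessage(phoneNumber) {
  return { phNumber: phoneNumber, type: "photo" };
}

// Placeholder values for illustration only.
console.log(buildPublishPath("pub-key", "sub-key", "photo-booth", "twilio-function"));
console.log(JSON.stringify(buildPhotoMessage("15555550100")));
```

Building the path in one place keeps the channel and client names consistent between the publisher (the Twilio Function) and the subscriber (the robot).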


PubNub -> Misty

Before we look at the robot’s skill code, it’s helpful to understand what we mean by a skill. In the Misty robo-verse, a skill is your JavaScript code, running locally on the robot. Each skill requires a JavaScript code file, with the JavaScript Misty executes when the skill runs, and a JSON meta file, with other information about the skill.

Misty’s on-board JavaScript API provides methods for subscribing to data from sensors and other events, and you define the callbacks for handling this data in your skill code. This API also includes methods for commands like moving the robot, using her sensors, playing sounds, and sending web requests. That last bit is how Misty gets data from PubNub in the Photo Booth skill.

For a snappy response, the robot operating the Photo Booth skill should be powered on and running the code before anyone sends her a text. While the skill runs, Misty listens for new SMS notifications by regularly sending requests to our PubNub channel. We do this in our code via the misty.SendExternalRequest() method from the Misty JavaScript API.

Each request Misty sends will time out if it doesn’t get a response after twenty seconds, and there’s no guarantee that someone will send Misty a message within that frame of time. We work around this in our skill by pairing our subscription request with a request to publish an empty message to the PubNub channel, which runs on a loop to keep the lines of communication open. When our Twilio bot forwards a message to PubNub, Misty returns it to our skill and passes it into the _pubNubSubscribe() callback function.

In the JavaScript file for our Photo Booth skill, that code looks something like this (you can also find an example of just the extracted PubNub functionality on GitHub):

// Calls the _keepActive() function every 15 seconds
misty.RegisterTimerEvent("keepActive", 15000, true);
// Sends a publish request to work around timeouts
function _keepActive() {
  misty.SendExternalRequest("POST", "<publish-key>/<subscribe-key>/0/<channel-name>/myCallback", null, null, "{}", false, false, "", "application/json");
}
// Gets the message Twilio sends to PubNub and passes
// the response into the _pubNubSubscribe callback function
misty.SendExternalRequest("GET", "<subscribe-key>/<channel-name>/0/0?uuid=<client-id>", null, null, "{}", false, false, "", "application/json", "_pubNubSubscribe");
// Extracts the phone number and runs the function that
// has Misty take a picture
function _pubNubSubscribe(data) {
  outputExt(data.Result.ResponseObject.Data);
}

The outputExt() function (called above) extracts the phone number and the value of the type parameter from the message the Twilio Function sends. Misty stores the phone number and checks that the value of type is equal to photo. If it is, she runs a block of code that has her move her head (and camera) to face the user, change her display image, and play sounds to let the user know what’s going on. Here’s an example of how that can look:

if (data && data.type === 'photo') {
  // Saves the user's contact info
  misty.Set("contact", (data.phNumber).toString(), false);
  // Changes display image, sets head position, and plays
  // sounds to show she's taking a picture
  misty.DisplayImage("DefaultEyes_SystemCamera.jpg");
  misty.Pause(100);
  misty.PlayAudio("DefaultSounds_Awe3.wav", 100);
  misty.Set("pictureMode", true, false);
  misty.MoveHeadPosition(0, 0, 0, 45);
  misty.Pause(3000);
  misty.DisplayImage("DefaultEyes_SystemFlash.jpg");
  misty.ChangeLED(255, 255, 255);
  misty.PlayAudio("DefaultSounds_SystemCameraShutter.wav", 100);
  // Snaps a portrait! By default, this method passes
  // a base64-encoded string with the image data for the
  // picture into the _TakePicture() callback function.
  misty.TakePicture("Photobooth", 375, 812, false, true);
  misty.Pause(200);
  misty.DisplayImage("DefaultEyes_SystemCamera.jpg");
  misty.ChangeLED(140, 0, 255);
  misty.Pause(500);
  misty.DisplayImage("DefaultEyes_Joy2.jpg");
}
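Before that type check can run, the raw response from PubNub has to be turned into a message object. The sketch below shows one way to do that in plain JavaScript. It assumes the response body is a JSON array whose first element is the list of published messages, followed by a timetoken; that shape and the parsePhotoRequests() name are assumptions for illustration, not details taken from the skill.

```javascript
// Hypothetical helper: pulls photo requests out of a raw subscribe response.
// ASSUMPTION: the body is JSON shaped like [[message, ...], "timetoken"].
function parsePhotoRequests(rawBody) {
  let parsed;
  try {
    parsed = JSON.parse(rawBody);
  } catch (err) {
    return []; // Not valid JSON, so there is nothing to act on.
  }
  const messages = Array.isArray(parsed) && Array.isArray(parsed[0]) ? parsed[0] : [];
  // Keeps only well-formed photo requests, which also skips the
  // empty keep-alive messages the skill publishes.
  return messages.filter(
    (msg) => msg && msg.type === "photo" && msg.phNumber !== undefined
  );
}

// Sample body with one keep-alive message and one photo request.
const sample = '[[{}, {"phNumber": "15555550100", "type": "photo"}], "15700000000000000"]';
console.log(parsePhotoRequests(sample));
```

Filtering on both type and phNumber means an empty keep-alive message never triggers the camera sequence.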

Misty -> Imgur

When you code Misty to take pictures, you can pass base64-encoded strings of the picture data into callback functions for additional processing. By default, those callback functions use the same name as the misty.TakePicture() method, prefixed with an underscore: _TakePicture(). In our skill, we use this _TakePicture() callback function to pass the base64-encoded string with our picture data into an uploadImage() function. This callback resembles the following:

function _TakePicture(data) {
  var base64String = data.Result.Base64;
  // Passes the picture data along for upload
  uploadImage(base64String);
}

When we call the uploadImage() function, Misty posts the picture to a private Imgur album. There are several image-sharing services we could use to host these pictures, but Imgur’s API does two things that make it ideal for the Photo Booth skill. Thing One: it accepts base64-encoded strings. Thing Two: it returns the URL for the uploaded image in the response body. By passing this returned URL back into Twilio’s MMS API, Misty can send the picture directly to the person who asked for it.

The code for managing this in the Photo Booth skill looks something like this:

function uploadImage(imageData) {
  // Sets up the JSON body for uploading the picture
  var jsonBody = {
    'image': imageData,
    'type': 'base64',
    'album': '<album-name>'
  };
  // Uploads the picture to a private album; then, passes Imgur
  // response data into the _imageUploadResponse() callback.
  // (The Imgur upload URL is not shown here.)
  misty.SendExternalRequest("POST", "", "Bearer", "<bearer-token>", JSON.stringify(jsonBody), false, false, "", "application/json", "_imageUploadResponse");
}
function _imageUploadResponse(responseData) {
  // Saves the URL, which Imgur returns at data.link in the response body
  misty.Set("imageLink", JSON.parse(responseData.Result.ResponseObject.Data).data.link, false);
  // Runs the code to send the picture
  sendPicture();
}
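If you want the upload callback to be a little more defensive, the link can be extracted with a small helper like the one sketched below. It assumes a successful Imgur response has the shape { "data": { "link": "..." }, "success": true, "status": 200 }; the extractImageLink() name is hypothetical and the sample URL is a placeholder.

```javascript
// Hypothetical helper: returns the uploaded image's URL, or null if the
// upload failed or the body is not shaped as expected.
// ASSUMPTION: success looks like
// { "data": { "link": "https://..." }, "success": true, "status": 200 }.
function extractImageLink(responseBody) {
  let parsed;
  try {
    parsed = JSON.parse(responseBody);
  } catch (err) {
    return null; // Body was not valid JSON.
  }
  if (!parsed || parsed.success !== true || !parsed.data) {
    return null; // Upload was rejected or the body is incomplete.
  }
  return typeof parsed.data.link === "string" ? parsed.data.link : null;
}

// Placeholder response body for illustration only.
const okBody = '{"data": {"link": "https://example.com/abc123.jpg"}, "success": true, "status": 200}';
console.log(extractImageLink(okBody));
```

Returning null on failure gives the skill a clear signal to skip the MMS step instead of sending a message with a broken media URL.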

Misty -> MMS

With the picture in our Imgur album, just one step remains: getting that picture into the phone that wants it. That, too, happens by way of the Twilio API. In our JavaScript skill code, we use the sendPicture() function to post a request that includes the contact information of the person who sent the original text, along with the URL that links to their uploaded image.

When we call the sendPicture() function, Misty sends a request to the Twilio API, which drops the image at the given URL into our user’s messaging inbox. It goes a bit like this:

function sendPicture() {
  misty.Debug("Sending Image to User");
  // Sets up the JSON body for the Twilio SMS API.
  // Includes the phone number of the recipient and
  // the URL for their photograph on Imgur
  var jsonBody = {
    'Body': '[••] Greetings from Misty!',
    'From': '<number-to-send-from>',
    'To': misty.Get("contact"),
    'MediaUrl': misty.Get("imageLink")
  };
  // Sends a request to the Twilio API with our account
  // credentials to send the picture to the person who asked for it
  var credentials = "<base64-encoded-Twilio-credentials>";
  misty.SendExternalRequest("POST", "<account-id>/Messages.json", "Basic", credentials, JSON.stringify(jsonBody), false, false, "", "application/x-www-form-urlencoded");
}
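The <base64-encoded-Twilio-credentials> placeholder stands for the value used in HTTP Basic authentication, where Twilio takes the Account SID as the username and the Auth Token as the password. A minimal sketch for generating that string in Node.js on your development machine follows; the encodeTwilioCredentials() name and the sample values are hypothetical.

```javascript
// Hypothetical helper: produces the value that follows "Basic " in the
// Authorization header, using the Account SID as the username and the
// Auth Token as the password.
function encodeTwilioCredentials(accountSid, authToken) {
  return Buffer.from(`${accountSid}:${authToken}`, "utf8").toString("base64");
}

// Placeholder values; use your own Account SID and Auth Token.
console.log(encodeTwilioCredentials("ACxxxxxxxxxxxxxxxx", "your_auth_token"));
```

Generating the string once and pasting it into the skill keeps the encoding step off the robot.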

To Recap

In this post, we discussed how to link Twilio and PubNub’s messaging APIs, Imgur’s photo-sharing services, and robot-icized JavaScript code to build a Photo Booth skill for Misty. When we use this skill:

  1. Someone sends a message to our Twilio phone number
  2. Twilio passes the message to our Twilio Autopilot bot
  3. Our Autopilot bot reads the message, identifies the task, and redirects to our Twilio Function
  4. The Twilio Function posts the user’s phone number to our PubNub channel
  5. Misty, who’s been running the Photo Booth skill all the while, pulls down the message from PubNub
  6. Misty repositions her camera and takes a picture
  7. Misty uploads the picture to a private Imgur album and calls out to the Twilio API to send it as an MMS to our user

Sending your robot a text is a pretty sociable way to ask it to do something. When the robot replies with a picture of its favorite person? That’s downright chummy.

This UrIoTNews article is syndicated from DZone