Why am I here?
As 2017 rolls in, a few of us are probably pondering this metaphysical question. As they drop the ball in Times Square, you are asking yourself “How did I drop the ball?” But, don’t worry, this is not another New Year’s Eve motivational article asking you to peer deep inside your soul, and wonder where it all went wrong.
Rather, this is an article about the importance of dialog context for chatbots. Speaking of context, I went for a stroll on a nice Saturday afternoon at a nearby trail that I frequently visit. On a whim, I pulled out my iPhone and decided to have a chat with Siri and Google Assistant (Allo). This is what I had in mind:
- Me: Where am I?
- Bot: You are at <address>, GPS location, identifies landmark, shows map
- Me: Why am I here?
- Bot: Oh, this is a trail. You are probably walking or hiking.
Simple enough, right? I am asking where I am, and then, with that context in mind, why I am there. Yes, this is pathetic, I know; lonely me, having a sad little dialog with my bot. But this should be easy for a chatbot, right? Oh, before you object to the “here” part: I am perfectly OK with substituting the question “Why am I at this location?”, which I ended up doing, as you will see below. This is, by the way, an interesting Natural Language Processing (NLP) disambiguation question in itself (related to entity recognition), but that wasn’t my aim.
Now, imagine my surprise when I got these results from Google Assistant (first) and Siri (next). I am not showing the “Where am I?” question, but rest assured that I did ask it first (to set up the context); both bots pulled up a map, though neither identified the landmark as a trail. Google Assistant was marginally better than Siri on that question, but neither could tell that I was at a trail, even though this GPS location is identified as such on Google Maps.
Even accepting that neither bot could tell that I was at a trail, the answers they gave me are just ridiculous. I specifically asked why I am at this location, so the metaphysical, confused answers are inexcusable. More importantly, the bots were not keeping the context of the dialog; they were treating the questions as separate conversations. These bots are more like information-retrieval search engines with an NLP veneer; they are not true dialog bots or genuine conversational AI interfaces.
Instead, a bot’s dialog manager should hold the context variable like so:
- Question-1: Where am I?
- Bot uses the GPS/maps application(s) to set up the context variable, location = X
- Question-2: Why am I here?
- Bot recognizes the entity “here” to mean “location = X”, and properly interprets the question, and attempts an answer based on understanding that the user is at a particular location X.
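The flow above can be sketched in a few lines of Python. This is a deliberately minimal illustration, not how Siri or Google Assistant actually work; the class, the trail name, and the keyword matching are all hypothetical stand-ins for a real NLP pipeline and a real GPS/maps service.

```python
class DialogManager:
    """Toy dialog manager that keeps context variables across turns."""

    def __init__(self):
        self.context = {}  # dialog state, e.g. {"location": ...}

    def handle(self, utterance):
        text = utterance.lower()
        if "where am i" in text:
            # A real bot would call a GPS/maps service here;
            # we stub it with a fixed, made-up location.
            self.context["location"] = "Willow Creek Trail"
            return f"You are at {self.context['location']}."
        if "why am i here" in text:
            location = self.context.get("location")
            if location is None:
                return "I don't know where 'here' is yet."
            # "here" resolves to the context variable set on a previous turn.
            return (f"You are at {location}; it's a trail, "
                    "so you are probably walking or hiking.")
        return "Sorry, I didn't understand."

bot = DialogManager()
print(bot.handle("Where am I?"))
print(bot.handle("Why am I here?"))
```

The point is the second branch: it never re-asks for the location, it reads it from the state that the first turn established.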
Maintaining dialog state by obtaining values for these context variables is a key feature of any chatbot dialog manager. Context variables can be obtained directly from the user or, as in the illustration above, by performing actions (in this case, GPS-based location retrieval) against external applications. For bonus points, the bot would also obtain context variables through personalization, based on access to a user’s dialog and information-access history. Imagine this dialog:
- Me: Where am I?
- Bot: You are at your favorite trail <trail-name>.
- Me: Why am I here?
- Bot: You come here often, especially on weekends. You walk for about an hour or so.
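One way to back such an answer is to derive context variables from the user's visit history. The sketch below assumes a simple log of (location, weekday, duration) records; the data, the threshold of two visits, and the function name are all illustrative.

```python
from collections import Counter

# Fabricated sample history: (location, weekday, duration in minutes).
visit_history = [
    ("Willow Creek Trail", "Saturday", 62),
    ("Willow Creek Trail", "Sunday", 55),
    ("Willow Creek Trail", "Saturday", 70),
    ("Downtown Cafe", "Monday", 30),
]

def personalize(location, history):
    """Answer 'why am I here?' using patterns mined from past visits."""
    visits = [v for v in history if v[0] == location]
    if len(visits) < 2:
        # Not enough history to personalize; fall back to a plain answer.
        return f"You are at {location}."
    top_day = Counter(day for _, day, _ in visits).most_common(1)[0][0]
    avg_minutes = round(sum(d for _, _, d in visits) / len(visits))
    return (f"You are at your favorite spot, {location}. You come here often, "
            f"especially on {top_day}s, usually for about {avg_minutes} minutes.")

print(personalize("Willow Creek Trail", visit_history))
```

Here the "context" never appears in the conversation at all; it is mined from historical data and merged into the dialog state.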
Is this far-fetched? Not really. The information is readily available (thanks to ever-present GPS). Whether the GPS data is recorded and used to track the user’s whereabouts and historical patterns is clearly a matter for privacy experts to sort out. But I, for one, will be glad to allow these bots access to all such personal information to enable them to have better context. Notice that context from these data sources supplements context obtained from the conversation itself. So, integrating data from multiple sources aids the AI, NLP, and machine learning by providing additional data to further the dialog and enable an intelligent, personalized conversation.
Lest you think that I am picking an isolated incident out of context (pardon the pun) to paint these bots in a bad light, let me narrate another recent example of my interaction with Google Assistant. A friend had sent a Google Calendar invite, via Gmail, for lunch at a nearby restaurant. I got there a little early, having used Google Maps on my iPhone to navigate to the place. To kill time before my friend arrived, I stood in front of the restaurant and asked Google Assistant (Allo) where I was, and why I was there. For the first question, it showed me the map, though strangely didn’t precisely identify the restaurant (this was a strip mall). It totally bombed the second question, even though I tried asking in many ways (such as who I was meeting).
Clearly, Google Assistant is not integrated with all these other services that Google owns; I have allowed personalization for all Google services where asked, so I am assuming these Google services have all this data stored under my identity, ready to be leveraged. Perhaps this has been rectified (my experience is a few months old). But this is an example of the importance of integrating many different data sources to obtain the context variables needed to conduct an intelligent dialog that is personal and effective. Ironically, integrating many different data sources and the history of previous dialogs enables the user to have a shorter dialog with the bot. This is important since, in many cases, especially for business applications, the user is not trying to engage in a long conversation with the bot. The aim is to conduct just enough dialog to obtain information and perform the actions needed to accomplish some goal.
Siri and Google Assistant are open-domain bots with unrestricted context and vocabulary. So, the bar is higher for these bots than for narrow-domain, single-application bots, which have restricted context and vocabulary. Still, the importance of data integration from a variety of sources remains the same for both types of bots. Data integration may not seem as sexy as NLP or deep learning, but it is just as important for bots to obtain the context needed to hold an intelligent dialog with the user. Identifying all relevant sources of such data and actively making them available to the bot is important. Historical data is used to train the bot with the proper context. Real-time data allows the bot to update the context variables with current context as it pertains to the dialog.
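The historical-versus-real-time distinction suggests a simple merge rule: fill context variables from slow-moving sources first, then let fresher signals override them. A minimal sketch, with all source names and values made up for illustration:

```python
def merge_context(*sources):
    """Merge context dicts; later (fresher) sources take priority."""
    context = {}
    for source in sources:
        # Skip None values so a source can decline to answer a slot.
        context.update({k: v for k, v in source.items() if v is not None})
    return context

# Hypothetical sources, ordered from historical to real-time.
profile  = {"location": "Austin (home city)", "favorite_trail": "Willow Creek Trail"}
calendar = {"next_event": "Lunch with Sam", "event_place": "Taco Joint"}
realtime = {"location": "Taco Joint parking lot", "time": "11:45"}

ctx = merge_context(profile, calendar, realtime)
print(ctx["location"])    # the real-time GPS fix overrides the profile default
print(ctx["next_event"])  # the calendar supplies who the user is meeting
```

With a merged context like this, "why am I here?" in front of the restaurant has an obvious answer: the calendar slot already says whom the user is meeting.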
Don’t just sit there, say something. Share your bot stories. Or tell me I got this all wrong, and I am just being cranky and mean to these nice bots on New Year’s Eve. Go right ahead. Unlike the bots, I will take it all in context.