Short-Term Memory: Maintaining Conversation Context

In this article, I’ll try to give a high-level overview of STM (Short-Term Memory), a technique used to maintain conversational context. Maintaining the proper conversation context, that is, remembering what the current conversation is about, is essential for all human interaction and thus essential for computer-based natural language understanding.

Let’s dive in.

Parsing User Input

One of the key objectives when parsing a user input sentence for Natural Language Understanding (NLU) is to detect all possible semantic entities, a.k.a. named entities. Let’s consider a few examples:

  • “What’s the current weather in Tokyo?”: this sentence is sufficient for processing on its own since it contains the topic “weather” as well as all the necessary parameters, like time (“current”) and location (“Tokyo”).
  • “What about Tokyo?”: this sentence is unclear since it lacks the subject of the question.
  • “What’s the weather?”: this is also unclear since it is missing the important parameters of location and time.

Sometimes we can use default values like the current user’s location and the current time (if they are missing). However, this can easily lead to the wrong interpretation if the conversation has an existing context.

In real life, as well as in chatbots, we always try to start a conversation with a fully defined sentence, since without a context the missing information cannot be obtained and the sentence cannot be interpreted.

Semantic Entities

Let’s take a closer look at the named entities from the above examples:

  • “weather”: this is an indicator of the subject of the conversation. Note that it indicates the type of question rather than being an entity with multiple possible values.
  • “current”: this is an entity of type Date with the value of “now”.
  • “Tokyo”: this is an entity of type Location with two values, “city” and “Tokyo, Japan”.

Note that we have two distinct classes of entities:

  • Entities that have no values and only act as indicators or types. The entity “weather” is the type indicator for the subject of the user input.
  • Entities that additionally have one or more specific values, like the “current” and “Tokyo” entities.
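The two classes can be captured by a single data structure. This is a minimal Python illustration, assuming a hypothetical `Entity` type whose fields (`id`, `values`) are invented for the example and are not taken from NLPCraft or any other library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical detected entity (illustrative, not a real library type)."""

    id: str              # entity type, e.g. "weather", "date", "location"
    values: tuple = ()   # empty tuple for indicator-only entities

    @property
    def is_indicator(self) -> bool:
        # Indicator entities carry no values; they only mark a type or subject.
        return not self.values


# Entities detected in "What's the current weather in Tokyo?"
weather = Entity("weather")
current = Entity("date", ("now",))
tokyo = Entity("location", ("city", "Tokyo, Japan"))

for e in (weather, current, tokyo):
    print(e.id, "indicator" if e.is_indicator else e.values)
```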

Incomplete Sentences

Assuming we previously asked about the weather in Tokyo (within the span of the ongoing conversation), one could ask the following questions using a shorter, incomplete form:

  • “What about Kyoto?”: this question is missing both the subject and the time. However, we can safely assume we are still talking about the current weather.
  • “What about tomorrow?”: just like above, we automatically assume the weather subject, but we use “Kyoto” as the location since it was mentioned last.

These are incomplete sentences. This type of shorthand cannot be interpreted without prior context (neither by humans nor by machines) since, by themselves, they are missing necessary information.

In the context of the conversation, however, these incomplete sentences work. We can simply provide one or two entities and rely on the “listener” to recall the rest of the missing information from short-term memory, a.k.a. the conversation context.
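The recall step can be sketched as slot filling. This is a minimal sketch under assumed names: the slot names, the `DEFAULTS` table and `resolve()` are hypothetical. Values from the current sentence win, STM fills the gaps, and defaults such as the current time are used only when STM has nothing, which limits the misinterpretation risk of defaults discussed earlier:

```python
# Hypothetical slots for the weather example; "subject" has no default,
# so a sentence without a subject (and an empty STM) stays unresolved.
REQUIRED_SLOTS = ("subject", "date", "location")
DEFAULTS = {"date": "now", "location": "user's current location"}


def resolve(detected: dict, stm: dict) -> dict:
    """Build a frame: detected slots first, then STM, then defaults."""
    frame = {}
    for slot in REQUIRED_SLOTS:
        if slot in detected:
            frame[slot] = detected[slot]
        elif slot in stm:
            frame[slot] = stm[slot]
        elif slot in DEFAULTS:
            frame[slot] = DEFAULTS[slot]
    return frame


stm = {}
# "What's the current weather in Tokyo?" is complete on its own.
stm.update(resolve({"subject": "weather", "date": "now", "location": "Tokyo"}, stm))
# "What about Kyoto?" only supplies a location.
stm.update(resolve({"location": "Kyoto"}, stm))
# "What about tomorrow?" only supplies a date.
frame = resolve({"date": "tomorrow"}, stm)
print(frame)  # {'subject': 'weather', 'date': 'tomorrow', 'location': 'Kyoto'}
```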

Short-Term Memory

The short-term memory is exactly that…a memory that keeps only a small amount of recently used information and that evicts its contents after a short period of inactivity.

Let’s look at an example from real life. If you called your friend a couple of hours later and asked “What about the day after?” (still talking about the weather in Kyoto), he would likely be confused. The conversation has timed out, and your friend has lost the context. You will have to explain to your confused friend what it is you are asking about…
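Timeout-based forgetting is straightforward to sketch. The following minimal Python class is an illustration under assumptions: the `ShortTermMemory` name, the `timeout` parameter and the injectable `clock` are hypothetical, and the actual inactivity period is a tuning choice. Every read or write counts as activity, and a gap longer than `timeout` wipes the stored entities, much like the friend who no longer remembers the Kyoto conversation:

```python
import time


class ShortTermMemory:
    """Hypothetical STM that forgets everything after `timeout` seconds idle."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the behavior is testable
        self.entities = {}
        self.last_access = None

    def _expire_if_idle(self):
        now = self.clock()
        if self.last_access is not None and now - self.last_access > self.timeout:
            self.entities.clear()   # conversation timed out: drop the context
        self.last_access = now      # any access counts as activity

    def remember(self, key, value):
        self._expire_if_idle()
        self.entities[key] = value

    def recall(self, key):
        self._expire_if_idle()
        return self.entities.get(key)


fake_now = [0.0]
stm = ShortTermMemory(timeout=300, clock=lambda: fake_now[0])
stm.remember("location", "Kyoto")
fake_now[0] = 60                # one minute later
print(stm.recall("location"))   # Kyoto
fake_now[0] = 60 + 2 * 3600     # a couple of hours later
print(stm.recall("location"))   # None
```

Injecting the clock keeps the example deterministic; by default it uses `time.monotonic`, which is not affected by changes to the system clock.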

Context Switch

Resetting the context by the timeout is, obviously, not a hard thing to do. What can be trickier is to detect when a conversation topic is switched and the previous context needs to be forgotten to avoid very confusing interpretation errors.

Let’s continue our weather-related conversation. All of a sudden, we ask:

  • “How much is a mocha latte at Starbucks?”
    At this point, we should forget all about the previous conversation about the weather and assume, going forward, that we are talking about coffee at Starbucks.
  • “What about Peet’s?”
    We are now talking about a latte at Peet’s.
  • “…and croissant?”
    Asking about Peet’s crescent-shaped fresh rolls.

Despite the seemingly obvious logic, implementing a context switch is not an exact science. Sometimes you can have a “soft” context switch, where the topic of the conversation doesn’t change completely, yet the shift is enough to warrant forgetting at least some parts of the previously collected context.
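The simplest policy, a “hard” switch, can be sketched as follows, assuming each sentence may carry a topic indicator such as “weather” or “coffee”. The `ConversationContext` class and its `update()` method are hypothetical, and this sketch deliberately does not attempt the softer, partial resets described above:

```python
class ConversationContext:
    """Hypothetical context holder with a hard topic switch.

    When a topic indicator different from the current topic arrives, every
    previously collected entity is forgotten. Soft switches are not modeled.
    """

    def __init__(self):
        self.topic = None
        self.entities = {}

    def update(self, topic=None, **entities):
        if topic is not None and topic != self.topic:
            self.entities.clear()       # new subject: drop the old context
            self.topic = topic
        self.entities.update(entities)  # newer values replace older ones
        return {"topic": self.topic, **self.entities}


ctx = ConversationContext()
ctx.update(topic="weather", date="now", location="Tokyo")
ctx.update(location="Kyoto")
ctx.update(topic="coffee", item="mocha latte", shop="Starbucks")
ctx.update(shop="Peet's")
print(ctx.update(item="croissant"))
# {'topic': 'coffee', 'item': 'croissant', 'shop': "Peet's"}
```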

Overriding Entities

As we’ve seen above, one named entity can replace or override an older entity in the STM; e.g., “Peet’s” replaced “Starbucks” in our previous questions. The actual algorithm that governs this logic is one of the most important parts of an STM implementation. In human conversations, we perform this logic seemingly subconsciously, but the computer algorithm to do it is far from trivial.

One of the most important supporting design decisions is that an entity can belong to one or more groups. You can think of groups as types or classes of entities (to be mathematically precise, categories). An entity’s membership in such groups is what drives the overriding rule.

Let’s look at a specific example.

Consider a data model that defines three entities:

  • “sell” (with synonym “sales”)
  • “buy” (with synonym “purchasing”)
  • “best_employee” (with synonyms like “best”, “best employee”, “best colleague”)

Our task is to support the following conversation:

  • “Give me the sales data”
    We return sales information since we detected the “sell” entity by its synonym “sales”.
  • “Who was the best?”
    We return the best salespeople since we detected “best_employee”, and we should pick the “sell” entity from STM.
  • “OK, give me the purchasing report now.”
    This is a bit trickier. We should return general purchasing data and not the best purchasing employee. It feels counter-intuitive, but we should NOT take the “best_employee” entity from STM; in fact, we should remove it from STM.
  • “…and who’s the best there?”
    Now we should return the best purchasing employee. We detected the “best_employee” entity, and we should pick the “buy” entity from STM.
  • “One more time: show me the general purchasing data again”
    Once again, we should return a general purchasing report and ignore (and remove) “best_employee” from STM.

It’s pretty clear that we need some well-defined logic for how this overriding should work: sometimes entities do override the older ones, sometimes they don’t.

Overriding Rule

Here’s the rule we developed at NLPCraft and have been successfully using in various models:

An entity overrides any entities in STM whose group set is the same as its own group set or a superset of it.

In other words, an entity with a smaller group set (a more specific one) overrides an entity whose group set is the same or a larger set containing it (a more generic one).

Let’s consider an entity that belongs to the following groups: {G1, G2, G3}. This entity will:

  • override an existing entity belonging to {G1, G2, G3} groups (same set)
  • override an existing entity belonging to {G1, G2, G3, G4} groups (superset)
  • NOT override an existing entity belonging to {G1, G2} groups (subset)
  • NOT override an existing entity belonging to {G10, G20} groups (no common groups)
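Because the rule compares group sets, it reduces to a subset test. A minimal Python sketch with a hypothetical `overrides()` helper reproduces the four cases above:

```python
def overrides(new_groups: frozenset, old_groups: frozenset) -> bool:
    """True if a newly detected entity overrides an entity already in STM.

    The old entity is overridden when its group set is the same as, or a
    superset of, the new entity's group set, i.e. new_groups is a subset.
    """
    return new_groups <= old_groups


new = frozenset({"G1", "G2", "G3"})
print(overrides(new, frozenset({"G1", "G2", "G3"})))        # True (same set)
print(overrides(new, frozenset({"G1", "G2", "G3", "G4"})))  # True (superset)
print(overrides(new, frozenset({"G1", "G2"})))              # False
print(overrides(new, frozenset({"G10", "G20"})))            # False
```

One consequence of the subset test: an empty group set is a subset of every set, so an entity without any groups would override everything in STM. Giving every entity at least one group avoids that.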

Let’s come back to our sell/buy/best example. To interpret the questions we’ve outlined above, we need the following four intents:

  • “id=sale term={id=='sell'}”
  • “id=best_sale_person term={id=='sell'} term={id=='best_employee'}”
  • “id=buy term={id=='buy'}”
  • “id=buy_best_person term={id=='buy'} term={id=='best_employee'}”
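To see why STM contents matter for these intents, here is a hypothetical, heavily simplified matcher in Python. It is not NLPCraft’s intent-matching algorithm. Each intent is reduced to the set of entity ids its terms require (using the “sell” id from the data model). The assumed ranking keeps intents that use at least one entity from the current sentence, picks the one with the most terms, and returns `None` when nothing matches or the top candidates tie:

```python
# Each intent is reduced to the entity ids its terms require (an assumption).
INTENTS = {
    "sale": frozenset({"sell"}),
    "best_sale_person": frozenset({"sell", "best_employee"}),
    "buy": frozenset({"buy"}),
    "buy_best_person": frozenset({"buy", "best_employee"}),
}


def match_intent(current: set, stm: set):
    """Return the best matching intent name, or None if none/ambiguous."""
    available = current | stm
    candidates = [
        name for name, required in INTENTS.items()
        # All terms satisfied, and at least one comes from the current sentence.
        if required <= available and required & current
    ]
    if not candidates:
        return None
    most_terms = max(len(INTENTS[name]) for name in candidates)
    top = [name for name in candidates if len(INTENTS[name]) == most_terms]
    return top[0] if len(top) == 1 else None


print(match_intent({"sell"}, set()))                     # sale
print(match_intent({"best_employee"}, {"sell"}))         # best_sale_person
# If "best_employee" stayed in STM, a plain purchasing request would be
# misread as a request for the best purchaser:
print(match_intent({"buy"}, {"sell", "best_employee"}))  # buy_best_person
# With "best_employee" removed from STM, the result is the expected one:
print(match_intent({"buy"}, {"sell"}))                   # buy
```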

We also need to properly configure groups for our entities (names of the groups are arbitrary):

  • Entity “sell” should belong to group A
  • Entity “buy” should also belong to group A
  • Entity “best_employee” should belong to groups A and B

Let’s run through our example again with this configuration:

  • “Give me the sales data”
    We detected an entity from group A. STM is empty at this point. Return the general sales report. Store the “sell” entity with group A in STM.
  • “Who was the best?”
    We detected an entity belonging to groups A and B. STM has an entity belonging to group A. {A, B} does NOT override {A}. Return the best salesmen report. Store the detected “best_employee” entity. STM now has two entities with {A} and {A, B} group sets.
  • “OK, give me the purchasing report now.”
    We detected the “buy” entity with group A. STM has two entities with {A} and {A, B} group sets. {A} overrides both {A} and {A, B}. Return the general purchasing report. Store the “buy” entity with group A in STM.
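The walkthrough can be replayed mechanically. The sketch below is a hypothetical Python simulation, not NLPCraft code: STM is modeled as an ordered list of entity ids, and the group sets are the ones used in the walkthrough above (“sell” and “buy” in {A}, “best_employee” in {A, B}). Each newly detected entity first evicts every STM entity it overrides and is then stored:

```python
# Group sets as used in the walkthrough (group names are arbitrary).
GROUPS = {
    "sell": frozenset({"A"}),
    "buy": frozenset({"A"}),
    "best_employee": frozenset({"A", "B"}),
}


def overrides(new_groups: frozenset, old_groups: frozenset) -> bool:
    # An STM entity with the same group set or a superset is overridden.
    return new_groups <= old_groups


def update_stm(stm: list, detected: list) -> list:
    """Apply the overriding rule for each detected entity, then store it."""
    for eid in detected:
        stm = [old for old in stm if not overrides(GROUPS[eid], GROUPS[old])]
        stm.append(eid)
    return stm


conversation = [
    ("Give me the sales data", ["sell"]),
    ("Who was the best?", ["best_employee"]),
    ("OK, give me the purchasing report now.", ["buy"]),
    ("...and who's the best there?", ["best_employee"]),
    ("One more time: show me the general purchasing data again", ["buy"]),
]

stm = []
for text, detected in conversation:
    stm = update_stm(stm, detected)
    print(f"{text} -> STM {stm}")
```

Running it, STM goes from ['sell'] to ['sell', 'best_employee'], then ['buy'], ['buy', 'best_employee'] and finally ['buy'], which matches the conversation we set out to support.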

And so on… easy, huh? In fact, the logic is indeed relatively straightforward. It also follows common sense: the behavior produced by this rule matches what a human would expect.

Note also that we achieved this result without using clunky deep learning networks with their lengthy preparation and training phases.

Explicit Context Switch

In some cases, you may need to explicitly clear the conversation STM without relying on algorithmic behavior. This happens when the current and new topics of the conversation share some of the same entities. Although it sounds counter-intuitive at first, there are many examples of this in day-to-day life.

Let’s look at this sample conversation:

Q1: “What’s the weather in Tokyo?”
A1: Current weather in Tokyo.

Q2: “Let’s do New York after all then!”
A2: Without an explicit conversation reset we would return current New York weather.

However, the second question was about going to New York (booking tickets, etc.). In real life, your counterpart will likely ask what you mean by “doing New York after all,” and you’ll have to explain the abrupt change of topic.

You can avoid this confusion by simply saying “Enough about the weather! Let’s talk about my weekend plans.” After that, the second question becomes clear. That sentence is an explicit context switch.
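An explicit switch can be as simple as recognizing a reset phrase before any other processing. In this minimal Python sketch, the phrase patterns and the `process()` function are hypothetical; a production system would more likely model such phrases as a dedicated intent than as regular expressions:

```python
import re

# Hypothetical reset phrases; both straight and curly apostrophes are accepted.
RESET_PATTERNS = [
    re.compile(r"\benough about\b", re.IGNORECASE),
    re.compile(r"\blet['’]?s talk about\b", re.IGNORECASE),
]


def process(utterance: str, stm: dict) -> dict:
    """Clear STM if the utterance is an explicit context switch."""
    if any(pattern.search(utterance) for pattern in RESET_PATTERNS):
        stm.clear()   # forget everything collected so far
    return stm


stm = {"subject": "weather", "location": "Tokyo"}
process("Enough about the weather! Let's talk about my weekend plans", stm)
print(stm)  # {}
```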


Let’s collect all our thoughts on STM into a few bullet points:

  • Missing entities in incomplete sentences can be auto-recalled from STM
  • A newly detected type/category entity likely indicates a change of topic
  • The key properties of STM are its short-term storage and its overriding rule
  • The explicit context switch is an important mechanism

It is uncanny how a properly implemented STM can make conversational interfaces feel like a normal human conversation. It minimizes the number of parasitic dialogs and Q&A-driven interactions without unnecessarily complicating the implementation of such systems.

This UrIoTNews article is syndicated from DZone.