In defense of Clippy

Created: Friday, 16 December 2016 Written by Gary Elsbernd

Clippy was ahead of his time.

I'll let that sink in.

Clippy, the infamous Microsoft Office assistant, was introduced in November 1996. He was refined three years later, in Microsoft Office 2000. He went into retirement two years later, when he was turned off by default. And he finally departed this digital veil in 2007, when Microsoft Office dismissed him altogether.

While he was eventually consigned to the dustbin of failed software, like Microsoft Bob, at the time his novelty spun off a wave of "conversational agents." I worked on "Seemore the Sock Puppet" - a conversational agent for Payless ShoeSource back in the 90s whom you would click to "See more" - get it? He waggled his eyebrows and danced around the screen. In a juvenile Easter egg, there was one pixel on the screen that would make him pass gas, if you knew where to find it.

Clippy is famous for being one of the worst user interfaces ever deployed to the mass public. He stopped users to ask them if they needed help with basic tasks, like writing a letter or making a spreadsheet. In user experience terms, Clippy was “optimized for first use”: amusing the first time you encountered him, and frustrating after that. He was a puppet who only knew one script and kept repeating it, at you, throughout the workday.

Today, we have conversational agents again! Apple's Siri, Amazon's Echo, and Google Home, not to mention all manner of chatbots on the web, are all examples of the evolution of the conversational agent. A conversational agent is a software program that interprets and responds to statements made by users in ordinary natural language. It integrates computational linguistics techniques with communication over the internet.

Why are these agents so much more successful than Clippy? I have a few hypotheses:

  1. They are user-invoked. Instead of interrupting your work or conversation with proactive suggestions, these agents do not speak until spoken to. Can you imagine the chaos that would ensue if Siri were to interject "You seem to be having an argument with your spouse - would you like me to read emails relevant to the situation?"
  2. They demonstrate semantic learning and artificial intelligence. Technology has developed to the point that these agents are much more flexible and "conversational." Google Home can keep track of context and respond correctly to unclear pronouns. For example, while playing music, you can ask, "Hey Google, who is this?" and Google will correctly infer that you are asking about the artist of the music currently playing. In addition, you can ask Google about a nearby sushi restaurant, and then ask "How far away is that?" and Google will understand that you are referring to the restaurant you were just asking about.
  3. They are often voice-controlled. With the exception of chatbots, these conversational agents are triggered with voice cues, respond audibly, and can be used hands-free. Being able to wonder out loud who played the Joker in the 1960s Batman series (Cesar Romero), and have Google tell you without pulling out a phone or laptop or reading and parsing text, is a game-changer.
  4. They have personalities, but not overwhelming personalities. Each of the conversational agents has fans, and each has Easter eggs: questions you can ask to get funny answers programmed by developers. They tell jokes, play games, and sing songs, but only when explicitly requested. Otherwise, they are all business.
    Personalities have been added to chatbots as well. Students at The Centre for Psychology at Athabasca University developed Freudbot, a chatbot that lets a student hold an online conversation with a simulated Sigmund Freud. Freudbot is capable of discussing a range of personal and psychological topics.
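
For readers who like to see ideas in code, here is a toy sketch of the first two hypotheses: an agent that stays silent until it is addressed, and that remembers the last thing you asked about so a vague "that" can be resolved. Everything in it is made up for illustration, including the wake phrase, the `Agent` class, and the one-entry list of places. It is not how Siri, Echo, or Google Home actually work, just a minimal way to picture the behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional

WAKE_PHRASE = "hey agent"  # hypothetical wake phrase, not a real product's

# Toy "knowledge base": place type -> distance in miles (invented data).
PLACES = {"sushi restaurant": 1.2}


@dataclass
class Agent:
    # The last place the user asked about; used to resolve "that" or "it".
    last_place: Optional[str] = None

    def hear(self, utterance: str) -> Optional[str]:
        text = utterance.lower().strip()

        # User-invoked: say nothing unless the user addressed the agent.
        if not text.startswith(WAKE_PHRASE):
            return None
        request = text[len(WAKE_PHRASE):].strip(" ,")

        # A request that names a place updates the conversational context.
        for name in PLACES:
            if name in request:
                self.last_place = name
                return f"I found a {name} nearby."

        # A follow-up with a vague pronoun is resolved from that context.
        if "how far" in request and re.search(r"\b(that|it)\b", request):
            if self.last_place is None:
                return "Sorry, what do you mean?"
            miles = PLACES[self.last_place]
            return f"The {self.last_place} is {miles} miles away."

        return "Sorry, I did not understand."
```

A Clippy-style agent would answer the first line below; this one returns nothing until it is spoken to:

```python
agent = Agent()
agent.hear("You seem to be having an argument")       # returns None
agent.hear("Hey agent, find a sushi restaurant")      # remembers the restaurant
agent.hear("Hey agent, how far away is that?")        # "that" is the restaurant
```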

Conversational agents only work when they are truly conversational. They require semantic awareness and the ability to follow a conversation, which technology has only recently made possible. Poor Clippy was just a victim of timing (and poor animation).