Spot is now a scripted tour guide. (Source – Shutterstock)

Guiding through the script: Boston Dynamics’ Spot is a talking tour guide now

  • Boston Dynamics’ Spot becomes a talking tour guide, using a script tailored to each room of the company’s facility.
  • Real-time AI interactions, however, still show occasional hitches.

In an age when technological advances once relegated to the realm of science fiction are becoming daily headlines, Boston Dynamics has once again pushed the boundaries. It’s not just about robots walking or jumping anymore; it’s about them communicating.

Imagine Spot, the renowned robot dog, decked out in a playful top hat, mustache, and those unmistakable googly eyes, leading staff through the intricacies of the company’s facilities. And if that’s not enough, it engages in light-hearted exchanges with a British accent. This is more than just a feat of engineering; it’s emblematic of the evolving relationship between humans and machines. As technology continues its relentless march forward, the line between human interactions and machine dialogues grows increasingly blurred.

Spot, the tour guide with a script

“Shall we commence our journey?” Spot asks. “The charging stations, where Spot robots rest and recharge, is our first point of interest. Follow me, gentlemen.” The demonstration highlights Spot’s conversational prowess: the robot even moves its “mouth” in sync with the audio, creating the illusion of actual speech.

To equip Spot with the gift of speech, Boston Dynamics harnessed OpenAI’s ChatGPT API and integrated open-source large language models (LLMs) to fine-tune its responses. They then furnished the robot with a speaker system and embedded text-to-speech features. The final touch was making its gripper, which acts as its “mouth,” move in tandem with the words, much like a puppeteer animates a puppet.
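Boston Dynamics has not published its code, but the pipeline described here, from LLM to text-to-speech to gripper animation, can be sketched in a few lines. In the minimal Python sketch below, the OpenAI call is a real API; the tts and gripper objects are hypothetical stand-ins for Spot’s speaker system and gripper interface, and the model name is purely illustrative.

```python
# Minimal sketch of the speak-and-animate loop described above.
# Assumptions: `tts` and `gripper` are hypothetical stand-ins for Spot's
# actual speaker and gripper interfaces; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_line(persona: str, room_script: str, visitor_remark: str) -> str:
    """Ask the chat model for the tour guide's next line of dialogue."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"You are Spot, a robot tour guide. Persona: {persona}"},
            {"role": "user",
             "content": f"Room notes: {room_script}\nVisitor said: {visitor_remark}"},
        ],
    )
    return response.choices[0].message.content


def speak(line: str, tts, gripper) -> None:
    """Play synthesized speech while moving the gripper like a mouth."""
    audio = tts.synthesize(line)               # hypothetical text-to-speech call
    for window in audio.amplitude_windows():   # hypothetical: short audio slices
        gripper.set_opening(window.amplitude)  # louder audio -> wider "mouth"
    gripper.set_opening(0.0)                   # close the mouth when the line ends
```

The puppeteering trick is the last loop: rather than lip-syncing phonemes, the gripper opening simply tracks audio loudness, which is enough to sell the illusion of talking.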

Matt Klingensmith, lead software engineer at Boston Dynamics, explains the team’s approach. They provided Spot with a concise script tailored to each room of the facility. Spot then combined this script with visual feeds from the cameras on its gripper and frame, allowing it to interpret its environment and offer more context-driven responses. Spot’s ability to decipher its surroundings and respond pertinently comes from Visual Question Answering (VQA) models, which caption images and answer questions about them.
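As a rough illustration of that last step, here is how an off-the-shelf VQA model can caption a camera frame so the text can be folded into the LLM’s prompt. BLIP is an assumption made for the sketch; the article does not say which VQA model Boston Dynamics used, and build_prompt is a hypothetical helper.

```python
# Sketch of grounding the tour guide in what the cameras see, using an
# off-the-shelf VQA model. BLIP is an assumption -- the article does not
# identify the actual model Boston Dynamics used.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")


def describe_view(frame: Image.Image,
                  question: str = "What is in this room?") -> str:
    """Annotate one camera frame with a short textual answer."""
    inputs = processor(frame, question, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def build_prompt(room_script: str, frame: Image.Image) -> str:
    """Merge the hand-written room script with what the camera currently sees."""
    caption = describe_view(frame)
    return (f"Room notes: {room_script}\n"
            f"Camera sees: {caption}\n"
            f"Give the next tour-guide line.")
```

The key idea is that the LLM never sees pixels: the VQA model turns the camera feed into text, and that text rides along with the room script in the prompt.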

Navigating the script: Real-time challenges in AI interactions

There was, however, a moment in the video that raised eyebrows and offered a window into the challenges of real-time AI interactions. Klingensmith, seemingly taken by Spot’s conversational flair, complimented its accent. Spot, engrossed in guiding the tour, continued with, “Let us venture onward to the calibration board, shall we? Keep close.” Only after this did it pause, perhaps processing the compliment, and then offered a response.

This minor hiccup sheds light on the intricate dance of priorities an AI system like Spot must constantly navigate. Real-time processing demands that the AI prioritize tasks, and sometimes script-driven or pre-programmed actions may take precedence over on-the-spot interactions, as the sketch below illustrates.
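To make that concrete, here is a speculative sketch of one such scheduling policy: scripted tour steps outrank ad-hoc remarks in a priority queue, so a compliment is handled only after the scripted lines drain. Nothing in the article confirms Spot schedules tasks this way; the sketch simply shows how the observed delay could arise.

```python
# Speculative sketch of priority scheduling: scripted tour steps outrank
# off-script remarks, reproducing the delayed reply seen in the video.
import heapq

SCRIPTED, OFF_SCRIPT = 0, 1  # lower value = higher priority


class InteractionQueue:
    """Orders pending tasks; scripted steps always come out first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = 0  # tie-breaker keeps equal-priority tasks in arrival order

    def push(self, priority: int, task: str) -> None:
        heapq.heappush(self._heap, (priority, self._counter, task))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = InteractionQueue()
queue.push(SCRIPTED, "Say: 'Let us venture onward to the calibration board.'")
queue.push(OFF_SCRIPT, "Respond to the compliment about the accent")
queue.push(SCRIPTED, "Walk to the calibration board")

while queue:
    print(queue.pop())
# Both scripted steps drain first; the compliment is answered only afterward.
```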

Another possibility is the inherent delay in processing external inputs, especially when they are unexpected or off-script. AI models, particularly complex ones like LLMs, must run each new input through billions of learned parameters before generating an appropriate response, and that computation takes time.

It’s also a gentle reminder that while machines like Spot are growing increasingly sophisticated, they are not infallible. Their actions are a result of meticulously designed algorithms, vast data sets, and pre-determined priorities. These systems can sometimes miss the nuances of human interactions or take an extra beat to process them.

As AI continues to evolve, addressing such minor but noticeable challenges will be crucial in ensuring seamless human-robot interactions.

Spot’s multiple avatars

In the showcased video, Spot does not limit itself to just one character. It dons multiple avatars — from a refined 1920s explorer and a contemporary teenager to an elocutionist from the Shakespearean era. Spot’s adaptability is further showcased when, in a humorous twist, it assumes a sardonic character and delivers the quirky haiku: “Generator hums low in a room devoid of joy. Much like my soul.”

The journey with Spot as guide was not without its enlightening moments, Boston Dynamics shares. When the team asked about Spot’s “parents,” the robot cleverly navigated to the older Spot iterations exhibited in the company’s showcase area. But, as with all AI, there were moments of inaccuracy: when referring to Stretch, Boston Dynamics’ box-manipulating robot, the LLM humorously misinterpreted its function, suggesting it was crafted for yoga exercises.

Boston Dynamics created a robot tour guide using Spot integrated with ChatGPT. (Source – YouTube)

Klingensmith, in a reflective piece on Boston Dynamics’ platform, writes, “We’re excited to continue exploring the intersection of artificial intelligence and robotics.” He expands on the potential of LLMs, suggesting they can give robots broader cultural understanding, commonsense reasoning, and an adaptability that could be invaluable across numerous robotic functions. One enticing possibility he mentions is instructing a robot through mere conversation, which could make robots far easier to adopt.

However, despite the whimsical undertones of Spot’s presentation, it’s important not to overlook its more pragmatic capabilities. Spot’s adeptness at tasks like opening doors and conducting surveillance carries particular weight given its applications in law enforcement and military operations.