Robotics + Human Robot Interaction

7 min read

Talking Turtle

A ROS 2 package that adds voice control to turtlesim, with speech recognition and text-to-speech feedback, turning the simulator into an interactive, voice-controlled drawing game for kids.

Motivation

Human–Robot Interaction (HRI) has the potential to produce scalable, accessible tools that support learning, emotional growth, and daily independence — especially for children and older adults. The Talking Turtle project was built as a compact, demonstrable proof-of-concept to show how a social, voice-enabled interface can scaffold interaction and encourage engagement. My goals were twofold:

  1. demonstrate rapid technical uptake by learning Robot Operating System (ROS) workflows, and

  2. show a working HRI loop — voice → interpretation → robot behavior → audio/visual feedback — that can be extended to assistive scenarios such as tutoring, motivation prompts, or companion interactions.
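
The loop in (2) can be sketched as a plain-Python pipeline. The function names below are illustrative, not the package's actual API; in the real package these stages are split across ROS 2 nodes.

```python
# Minimal sketch of the HRI loop: voice -> interpretation -> behavior -> feedback.

def interpret(utterance: str) -> str:
    """Map free-form speech to a discrete command."""
    text = utterance.lower()
    if "square" in text:
        return "draw_square"
    if "forward" in text:
        return "move_forward"
    return "unknown"

def execute(command: str) -> str:
    """Trigger the robot behavior and return a spoken confirmation,
    closing the loop with audio feedback."""
    responses = {
        "draw_square": "Okay, let's draw a square together!",
        "move_forward": "Moving forward.",
        "unknown": "Sorry, I didn't catch that.",
    }
    return responses[command]

def hri_loop(utterance: str) -> str:
    return execute(interpret(utterance))
```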

Project

Talking Turtle is a ROS 2 package that converts speech into interactive turtle behaviors in the turtlesim environment. Beyond simple voice control, it includes a guided “Draw with Me” game where the turtle coaches a child to draw shapes step-by-step, text-to-speech confirmations, a modular multi-node architecture for easy extension, and an offline speech alternative. A short hosted demo is available to view at the bottom of this page.

Key features

  • Draw with Me (guided gameplay): The turtle leads the user through drawing shapes (square, rectangle) with stepwise verbal instructions and movement sequences.

  • Robust modular design: Separate nodes for voice input, game manager, TTS, and motor control (easy to extend or port to a physical robot).

  • Multiple input modes: Live Google Speech API for high accuracy + an offline Vosk-based fallback for low-connectivity demos.

  • Text-to-Speech feedback: Immediate spoken confirmations to close the interaction loop.

  • Manual controls & debugging: command_publisher.py for keyboard input and simpler testing.

  • Stable launch scripts: Single-command launch files that orchestrate all nodes for reproducible demos.
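
As a sketch of how the guided game can sequence a shape (this mirrors, but is not copied from, the package's game manager): drawing a square reduces to four repetitions of "move forward, then turn 90°", each paired with a spoken instruction.

```python
import math

def square_steps(side: float = 2.0):
    """Return (linear_distance, turn_radians, spoken_instruction) tuples
    for drawing a square. In the ROS 2 package these would be converted to
    geometry_msgs/Twist commands on /turtle1/cmd_vel; here they stay plain
    data for clarity."""
    steps = []
    for i in range(4):
        steps.append((side, 0.0, f"Side {i + 1}: draw a straight line!"))
        steps.append((0.0, math.pi / 2, "Great! Now turn left."))
    return steps
```

Interleaving the spoken instruction with each movement step is what makes the game feel like coaching rather than remote control.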

Use Cases
  1. Early Shape Learning Without Writing or Typing
    Children often develop speech before writing skills. A social robot that listens and draws shapes on the ground (or screen) can help them learn squares, circles, and triangles hands-free, turning abstract concepts into tangible play.

  2. Playful Pedagogy for Engagement
    Instead of static lessons, the robot transforms learning into a game. “Let’s draw together” becomes a journey—kids stay curious, engaged, and eager to learn while having fun.

  3. Inclusive Learning for Children With Speech or Motor Challenges
    A child who cannot type, or has limited fine motor control, can still guide the robot using simple voice cues. The robot scaffolds participation, ensuring every child has a way to join in.

  4. Collaborative Storytelling
    The robot doesn’t just draw—it becomes part of a narrative. Shapes can build houses, maps, or characters, letting children blend creativity with learning through co-play.

  5. Home or Classroom Assistant
    Whether in a classroom setting or at home, the robot can supplement teachers and parents—providing personalized, patient, repeatable guidance anytime.

  6. Scaffolded Learning Paths
    Starting with basic shapes, the robot can gradually introduce more complex figures and patterns. This mirrors how children naturally build cognitive skills—step by step, with guidance.

Tech Stack
  • ROS 2 (Humble/rolling compatible) - robot middleware that provides the node, topic, and launch infrastructure.

  • Python for node implementations and glue logic.

  • Speech engines: Google Speech API (online), Vosk (offline).

  • TTS: pyttsx3 / system TTS for spoken feedback.

  • turtlesim for rapid visualization and interaction prototyping.

  • Environment tooling: pixi-based build/launch scripts used in my dev setup.
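
The two speech engines above suggest a simple fallback pattern: try the online recognizer first and drop to the offline one on failure. A dependency-free sketch, where the recognizer callables are injected so the real Google Speech API and Vosk clients are only stand-ins:

```python
def recognize_with_fallback(audio, online, offline):
    """Try the online engine first; fall back to the offline one.
    `online` and `offline` are callables taking audio and returning text,
    standing in for the Google Speech API and Vosk clients respectively.
    Returns (transcript, engine_used)."""
    try:
        return online(audio), "online"
    except Exception:  # network errors, quota limits, API failures, etc.
        return offline(audio), "offline"
```

Injecting the engines as callables also makes the fallback logic testable without a microphone or network access.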

Challenges
  • Usability for non-technical reviewers: Created single-command launch files and a short recorded demo so reviewers can evaluate the system quickly.

  • Synchronization of audio + motion: Implemented simple state-machine sequencing (acknowledge → execute → confirm) to avoid clashing outputs.

  • Cross-platform and offline reliability: Added Vosk offline listener and packaged convenient launch scripts so the demo runs even without network access.
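
The acknowledge → execute → confirm sequencing mentioned above can be modeled as a tiny state machine. This is a sketch of the idea, not the package's actual implementation:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    ACKNOWLEDGE = auto()  # speak "Okay, drawing a square!"
    EXECUTE = auto()      # publish movement commands; TTS stays quiet
    CONFIRM = auto()      # speak "Done!", then return to IDLE

TRANSITIONS = {
    State.IDLE: State.ACKNOWLEDGE,
    State.ACKNOWLEDGE: State.EXECUTE,
    State.EXECUTE: State.CONFIRM,
    State.CONFIRM: State.IDLE,
}

def advance(state: State) -> State:
    """Move to the next phase. Audio and motion never clash because
    each phase owns exactly one output channel at a time."""
    return TRANSITIONS[state]
```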

Demo Video

Watch at 1.5x for the best experience. Draw mode starts at 01:38.

