SANS

GERARD

Google Developer Expert

Developer Evangelist

International Speaker

Spoken 209 times in 43 countries

Global

Generative AI for Developers

Vertex AI

Complexity

Features

Experiment with Gemini Live

Create an API key in AI Studio

GoogleAI

VertexAI

Tools: Google Search

Google Search

Run query in

Google Search

Tools: Code Execution

Execute code

Run code in

Python Sandbox

Tools: URL Context

Contains URL

URL is fetched and added to context

Tools: Function Calling

Function Calling vs MCP

Model 2

Model 1

Model 2

Model 1

MCP TypeScript SDK

Voice-first use cases

24/7 Bookings using AI Voice Assistants

Cleaning Company

Robot Cafe

Demo: Web Console - Gemini Live

Scan to access link.

Useful Links and Materials

Workshop: Building an AI Voice Agent to Automate a Robot Cafe with Gemini Live and Angular

By Gerard Sans

Workshop: Building an AI Voice Agent to Automate a Robot Cafe with Gemini Live and Angular

The launch of Alexa+ has sparked renewed excitement around the next generation of AI voice assistants powered by generative AI. With Gemini 2.0 and the new Gemini Live API, developers now have the tools to build voice-driven AI agents that seamlessly integrate into web applications, backend services, and third-party APIs. In this workshop we will go beyond simple chatbot interactions to explore how AI agents can power real-world automation—in this case, running an entire robot cafe. We’ll walk through building a voice-first assistant capable of executing complex workflows, streaming real-time audio, querying databases, and interacting with external services. This marks a shift from "ask and respond" to a more dynamic "talk, show, and act" experience. You might assume taking a coffee order is straightforward, but even a basic interaction involves more than 15 distinct states. These include greeting the customer, handling the order flow, confirming selections, applying offer codes, managing exceptions, and supporting cancellations or changes. Behind the scenes, the AI agent coordinates with multiple systems to fetch menu data, validate inputs, and trigger robotic actions. On the frontend, we’ll build an Angular client from scratch that handles real-time audio input and output. You’ll learn how to stream microphone data, integrate with Gemini voice responses, and use the GenAI SDK to connect everything together. We’ll also cover how to tap into core Gemini capabilities like code execution, grounding search, and function calling—enabling a true AI agent that can manage workflows, make decisions, and take action in real time. Instead of a traditional chat UI, this project creates a fully voice-automated, hands-free experience where the assistant doesn’t just chat—it runs the operation. Join us for a deep dive into the future of AI automation — where natural voice is the interface, and the AI agent takes care of the rest, including your fancy choice of coffee!

  • 25