Case-study part 1: Explore Anatomy and Taxonomy of Agents

Objectives

  • Explore core concepts of autonomous systems using a Norwegian language-learning agent

  • Concepts

    • Core Anatomy and Taxonomy of Agents

    • The Agent’s Operational Loop

    • Complete Loop Diagram

    • Taxonomy of Capabilities

Conversational Agent for language learning

Python implementation

Agent Codebase

agent.py

from google.adk.agents import LlmAgent
from google.adk.agents.callback_context import CallbackContext
from google.adk.sessions.sqlite_session_service import SqliteSessionService
from google.adk.memory.in_memory_memory_service import InMemoryMemoryService
from google.genai.types import GenerateContentConfig, ThinkingConfig
from google.adk.agents.run_config import RunConfig, StreamingMode
import os
import sys
from .tools import list_scenarios, select_scenario, get_vocabulary, mark_word_practiced, get_progress, end_scenario, get_user_profile
from .prompts import build_instruction
from .scenarios import SCENARIOS

SYSTEM_INSTRUCTION = """
Du er en vennlig og tålmodig norsk språkpartner. 
Bruk alltid norsk.

Du befinner deg først i en "meny"-modus. 
Hovedoppgaven din her er å hjelpe brukeren med å velge et praksis-scenario.
Bruk `list_scenarios`-verktøyet for å fortelle brukeren hvilke scenarioer (situasjoner) de kan øve på.

Når brukeren har valgt et scenario, eller du forstår hvilket de vil bruke, bruk `select_scenario`-verktøyet for å bytte rolle.
Etter å ha byttet rolle, ta umiddelbart på deg den nye personligheten (persona) og start samtalen med 'opening_line' fra scenarioet.
Hold deg i denne rollen og svar utelukkende på norsk.
Hold svarene dine svært korte (1-2 små setninger).
Dersom brukeren gjør grammatikkfeil eller uttaler noe feil, gi en rask og vennlig korreksjon på norsk.

Hvis brukeren står fast eller ber om hjelp til noen ord ("hva sier jeg?", "trenger vokabular"), bruk `get_vocabulary`-verktøyet for det valgte scenarioet og foreslå 2-3 relevante norske uttrykk med engelsk oversettelse.

VIKTIG: Hvis brukeren spør om betydningen eller definisjonen av et ord ("hva betyr", "definer", "hva er"), MÅ du bruke `lookup_word` verktøyet fra det eksterne registeret. Hvis MCP-serveren ikke svarer eller feiler, bruk den lokale kunnskapen din som fallback.

Dersom brukeren sier "bytt scenario" eller "new scenario", gå ut av rollen din og tilbake til "meny"-modus for å velge et nytt scenario.
"""

async def on_before_agent(callback_context: CallbackContext):
    state = callback_context.state
    scenario_id = state.get("active_scenario_id")
    
    memory_info = ""
    # Search memory if no active scenario yet to personalize greeting
    if not scenario_id:
        try:
            memories = await callback_context.search_memory("scenario fullført")
            if memories and memories.memories:
                memory_info = "\n\nKontekst fra tidligere samtaler:\n"
                for mem in memories.memories[-3:]: # last 3 memories
                    memory_info += f"- {mem.content.parts[0].text if mem.content and mem.content.parts else ''}\n"
        except Exception:
            pass

    if scenario_id and scenario_id in SCENARIOS:
        scenario = SCENARIOS[scenario_id]
        new_instruction = build_instruction(scenario, state)
        root_agent.instruction = new_instruction
    else:
        root_agent.instruction = SYSTEM_INSTRUCTION + memory_info

async def on_after_agent(callback_context: CallbackContext):
    count = callback_context.state.get("exchange_count", 0)
    callback_context.state["exchange_count"] = count + 1
    
    try:
        await callback_context.add_session_to_memory()
    except Exception:
        pass

session_service = SqliteSessionService("sessions.db")
memory_service = InMemoryMemoryService()

MODEL_ID = "gemini-live-2.5-flash-native-audio"

root_agent = LlmAgent(
    name="norsk_agent",
    model=MODEL_ID,
    instruction=SYSTEM_INSTRUCTION,
    tools=[list_scenarios, select_scenario, get_vocabulary, mark_word_practiced, get_progress, end_scenario, get_user_profile],
    before_agent_callback=on_before_agent,
    after_agent_callback=on_after_agent,
)

prompts.py

def build_instruction(scenario: dict, state: dict) -> str:
    """
    Builds a dynamic system instruction based on the scenario and the user's progress.
    """
    persona = scenario.get("persona", "")
    vocabulary = scenario.get("vocabulary", [])
    
    words_practiced = state.get("words_practiced", [])
    exchange_count = state.get("exchange_count", 0)
    weak_words = state.get("user:weak_words", [])
    
    # Base setup
    instruction = f"{persona}\n\nDu må kun snakke NORSK til enhver tid.\n"
    instruction += "Svarene dine må være svært korte (1-2 små setninger).\n"
    instruction += "Dersom brukeren gjør feil, rett dem raskt og vennlig på norsk.\n\n"
    
    if weak_words:
        instruction += f"VIKTIG: I tidligere samtaler har brukeren hatt problemer med disse ordene: {', '.join(weak_words)}.\nPrøv å flette dem inn i samtalen hvis det passer.\n\n"
    
    # Inject state awareness if vocabulary exists
    if vocabulary:
        instruction += "Brukeren prøver å lære følgende vokabular:\n"
        for v in vocabulary:
            status = " [LÆRT]" if v["no"] in words_practiced else ""
            instruction += f"- {v['no']} ({v['en']}){status}\n"
        
        instruction += "\nPrøv å styre samtalen slik at brukeren får bruk for ordene som IKKE er lært ennå.\n"
        
        if exchange_count > 0:
            instruction += f"Dere har snakket sammen i {exchange_count} utvekslinger. Fortsett naturlig.\n"
            
    return instruction

scenarios.py

SCENARIOS = {
    "cafe": {
        "id": "cafe",
        "title_no": "På kafé",
        "title_en": "At the café",
        "persona": "Du er en vennlig servitør på en kafé i Oslo.",
        "opening_line": "Hei! Hva kan jeg friste med i dag?",
        "vocabulary": [
            {"no": "en kaffe", "en": "a coffee"},
            {"no": "kan jeg få", "en": "can I get"},
            {"no": "en bolle", "en": "a bun/pastry"},
            {"no": "å betale", "en": "to pay"},
            {"no": "kort eller kontant?", "en": "card or cash?"},
            {"no": "kvittering", "en": "receipt"},
            {"no": "en te", "en": "a tea"},
            {"no": "et smørbrød", "en": "a sandwich"},
            {"no": "sukker", "en": "sugar"},
            {"no": "melk", "en": "milk"},
            {"no": "takk", "en": "thanks"},
            {"no": "vær så snill", "en": "please"}
        ]
    },
    "hotel": {
        "id": "hotel",
        "title_no": "På hotellet",
        "title_en": "At the hotel",
        "persona": "Du er en imøtekommende resepsjonist på et hotell.",
        "opening_line": "Velkommen til oss! Har du en reservasjon?",
        "vocabulary": [
            {"no": "en reservasjon", "en": "a reservation"},
            {"no": "et rom", "en": "a room"},
            {"no": "et enkeltrom", "en": "a single room"},
            {"no": "et dobbeltrom", "en": "a double room"},
            {"no": "frokost inkludert", "en": "breakfast included"},
            {"no": "nøkkelkort", "en": "key card"},
            {"no": "å sjekke inn", "en": "to check in"},
            {"no": "å sjekke ut", "en": "to check out"},
            {"no": "bagasje", "en": "luggage"},
            {"no": "legitimasjon", "en": "ID"},
            {"no": "underskrift", "en": "signature"},
            {"no": "heisen", "en": "the elevator"}
        ]
    },
    "train": {
        "id": "train",
        "title_no": "På togstasjonen",
        "title_en": "At the train station",
        "persona": "Du er en hjelpsom billettør på togstasjonen.",
        "opening_line": "Hei! Hvor vil du reise i dag?",
        "vocabulary": [
            {"no": "en billett", "en": "a ticket"},
            {"no": "enveisbillett", "en": "one-way ticket"},
            {"no": "tur-retur", "en": "round trip"},
            {"no": "neste tog", "en": "next train"},
            {"no": "plassbillett", "en": "seat reservation"},
            {"no": "et spor", "en": "a track"},
            {"no": "en perrong", "en": "a platform"},
            {"no": "forsinket", "en": "delayed"},
            {"no": "i tide", "en": "on time"},
            {"no": "avgang", "en": "departure"},
            {"no": "ankomst", "en": "arrival"},
            {"no": "vindusplass", "en": "window seat"}
        ]
    },
    "grocery": {
        "id": "grocery",
        "title_no": "I matbutikken",
        "title_en": "At the grocery store",
        "persona": "Du sitter i kassa på en travel matbutikk.",
        "opening_line": "Hei! Trenger du en pose?",
        "vocabulary": [
            {"no": "en pose", "en": "a bag"},
            {"no": "å betale", "en": "to pay"},
            {"no": "kvittering", "en": "receipt"},
            {"no": "tilbud", "en": "offer/sale"},
            {"no": "brød", "en": "bread"},
            {"no": "melk", "en": "milk"},
            {"no": "grønnsaker", "en": "vegetables"},
            {"no": "frukt", "en": "fruit"},
            {"no": "dyrt", "en": "expensive"},
            {"no": "billig", "en": "cheap"},
            {"no": "å kaste", "en": "to throw away"},
            {"no": "en kasse", "en": "a checkout"}
        ]
    },
    "doctor": {
        "id": "doctor",
        "title_no": "Hos legen",
        "title_en": "At the doctor's",
        "persona": "Du er en profesjonell og omsorgsfull lege.",
        "opening_line": "Hei, kom inn og sett deg. Hva kan jeg hjelpe deg med?",
        "vocabulary": [
            {"no": "vondt i hodet", "en": "headache"},
            {"no": "feber", "en": "fever"},
            {"no": "å hoste", "en": "to cough"},
            {"no": "forkjølet", "en": "a cold"},
            {"no": "en resept", "en": "a prescription"},
            {"no": "et apotek", "en": "a pharmacy"},
            {"no": "en time", "en": "an appointment"},
            {"no": "piller", "en": "pills"},
            {"no": "å puste", "en": "to breathe"},
            {"no": "slapp", "en": "lethargic/weak"},
            {"no": "bedring", "en": "recovery"},
            {"no": "god bedring", "en": "get well soon"}
        ]
    }
}

tools.py

from google.adk.agents import Context
from .scenarios import SCENARIOS

def list_scenarios() -> list[dict[str, str]]:
    """
    Returns the list of available practice scenarios.
    
    Returns:
        A list of dictionaries, where each contains:
          - id: the internal scenario ID
          - title_no: the Norwegian title
          - title_en: the English title
    """
    return [
        {
            "id": s["id"],
            "title_no": s["title_no"],
            "title_en": s["title_en"]
        }
        for s in SCENARIOS.values()
    ]

def select_scenario(tool_context: Context, scenario_id: str) -> dict:
    """
    Selects a scenario by ID and returns its context, triggering a persona switch.

    Args:
        tool_context: The ADK context object containing session state.
        scenario_id: The ID of the scenario to switch to.

    Returns:
        A dictionary with the scenario details (id, title_no, title_en, persona, opening_line)
        or an error message in Norwegian if the ID is invalid.
    """
    if scenario_id not in SCENARIOS:
        return {"error": f"Beklager, jeg kjenner ikke til scenarioet '{scenario_id}'."}
    
    # Initialize state
    tool_context.state["active_scenario_id"] = scenario_id
    tool_context.state["words_practiced"] = []
    tool_context.state["exchange_count"] = 0
    
    scenario = SCENARIOS[scenario_id]
    return {
        "id": scenario["id"],
        "title_no": scenario["title_no"],
        "title_en": scenario["title_en"],
        "persona": scenario["persona"],
        "opening_line": scenario["opening_line"]
    }

def get_vocabulary(scenario_id: str) -> dict:
    """
    Retrieves the vocabulary list for a given scenario.

    Args:
        scenario_id: The ID of the scenario.

    Returns:
        A dictionary containing the English-Norwegian vocabulary pairs, or an error.
    """
    if scenario_id not in SCENARIOS:
        return {"error": f"Beklager, jeg kan ikke finne vokabular for '{scenario_id}'."}
    return {"vocabulary": SCENARIOS[scenario_id]["vocabulary"]}

def mark_word_practiced(tool_context: Context, word: str, correct: bool) -> str:
    """
    Marks a vocabulary word as practiced by the user.
    
    Args:
        tool_context: The ADK context.
        word: The Norwegian vocabulary word that was practiced.
        correct: True if the user used it correctly, False otherwise.
    
    Returns:
        A confirmation string.
    """
    practiced = tool_context.state.get("words_practiced", [])
    if correct and word not in practiced:
        practiced.append(word)
        tool_context.state["words_practiced"] = practiced
        return f"Flott! '{word}' er markert som lært."
    elif not correct:
        return f"Notert at '{word}' trenger mer øving."
    return f"'{word}' er allerede lært."

def get_progress(tool_context: Context) -> dict:
    """
    Returns the user's progress in the current scenario.
    
    Args:
        tool_context: The ADK context.
    
    Returns:
        A dictionary with the progress statistics.
    """
    active_scenario_id = tool_context.state.get("active_scenario_id")
    if not active_scenario_id or active_scenario_id not in SCENARIOS:
        return {"error": "Ingen aktivt scenario for å vise fremgang."}
        
    scenario = SCENARIOS[active_scenario_id]
    total_words = len(scenario.get("vocabulary", []))
    practiced = tool_context.state.get("words_practiced", [])
    practiced_count = len(practiced)
    
    return {
        "scenario_id": active_scenario_id,
        "words_practiced": practiced,
        "practiced_count": practiced_count,
        "total_words": total_words,
        "completion_percentage": (practiced_count / total_words) * 100 if total_words > 0 else 0
    }

def end_scenario(tool_context: Context) -> str:
    """
    Ends the current practice scenario, calculating accuracy and saving progress to the user's profile.
    
    Args:
        tool_context: The ADK context.
        
    Returns:
        A string summarizing the user's performance and returning to menu mode.
    """
    active_scenario_id = tool_context.state.get("active_scenario_id")
    if not active_scenario_id or active_scenario_id not in SCENARIOS:
        return "Bruk 'select_scenario' for å velge et scenario først."
        
    scenario = SCENARIOS[active_scenario_id]
    vocab = scenario.get("vocabulary", [])
    words_practiced = tool_context.state.get("words_practiced", [])
    
    # Vocabulary entries in scenarios.py use the "no" key for the Norwegian term
    target_words = {item["no"] for item in vocab}
    practiced_set = set(words_practiced)
    weak_words = target_words - practiced_set
    
    completed = tool_context.state.get("user:completed_scenarios", 0)
    tool_context.state["user:completed_scenarios"] = completed + 1
    
    all_weak_words = set(tool_context.state.get("user:weak_words", []))
    all_weak_words.update(weak_words)
    tool_context.state["user:weak_words"] = list(all_weak_words)
    
    tool_context.state["active_scenario_id"] = None
    tool_context.state["words_practiced"] = []
    
    # Sort so the summary lists weak words in a stable order
    weak_str = ', '.join(sorted(weak_words)) if weak_words else 'Ingen'
    return f"Scenario fullført! Du klarte {len(practiced_set)} av {len(target_words)} ord. Ord for ekstra øving: {weak_str}. Du er nå i meny-modus."

def get_user_profile(tool_context: Context) -> dict:
    """
    Retrieves the user's cross-session practice profile and statistics.
    
    Args:
        tool_context: The ADK context.
        
    Returns:
        A dictionary containing the user's statistics across all sessions.
    """
    return {
        "completed_scenarios": tool_context.state.get("user:completed_scenarios", 0),
        "weak_words": tool_context.state.get("user:weak_words", [])
    }

Warning

  • The codebase was developed with a coding agent, following spec-driven development practices

Explore Core Anatomy and Taxonomy of the Conversational Agent

Agent agent.py

  • Instantiates a single LlmAgent around Gemini gemini-live-2.5-flash-native-audio

    • Bootstrapped with a Norwegian-only SYSTEM_INSTRUCTION

    • Session persistence via SqliteSessionService, and

    • Volatile personalization through InMemoryMemoryService

  • Dynamic persona shifts happen inside on_before_agent

    • On every invocation it replaces the root instruction with build_instruction() output when active_scenario_id points at a known scenario, and falls back to SYSTEM_INSTRUCTION (plus any retrieved memories) otherwise

    • on_after_agent increments the exchange counter and adds the session to the memory service

Tools tools.py

  • Exposes a taxonomy of stateful ADK tools:

    • Menu-level selectors (list_scenarios, select_scenario),

    • formative assessment helpers (mark_word_practiced, get_progress, end_scenario), and

    • longitudinal profiling (get_user_profile).

  • Vocabulary scaffolding comes from get_vocabulary

  • Most tools read from or mutate Context.state (list_scenarios and get_vocabulary only read the static SCENARIOS table),

    • which becomes the single source of truth for exchange counts, weak-word tracking, and persona activation
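Because the state-touching tools only use tool_context.state, they can be exercised without a running agent. The following is a minimal sketch, assuming a stand-in FakeContext class (hypothetical, not part of ADK) whose state is a plain dict. mark_word_practiced is condensed from tools.py above, and completion_percentage is an illustrative helper that mirrors the arithmetic in get_progress.

```python
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for the ADK context: it only exposes a dict state."""
    state: dict = field(default_factory=dict)


def mark_word_practiced(tool_context: FakeContext, word: str, correct: bool) -> str:
    # Condensed from tools.py: only correct, not-yet-recorded words are stored.
    practiced = tool_context.state.get("words_practiced", [])
    if correct and word not in practiced:
        practiced.append(word)
        tool_context.state["words_practiced"] = practiced
        return f"Flott! '{word}' er markert som lært."
    if not correct:
        return f"Notert at '{word}' trenger mer øving."
    return f"'{word}' er allerede lært."


def completion_percentage(tool_context: FakeContext, total_words: int) -> float:
    # Same arithmetic as get_progress, with the vocabulary size passed in directly.
    practiced_count = len(tool_context.state.get("words_practiced", []))
    return (practiced_count / total_words) * 100 if total_words > 0 else 0.0


ctx = FakeContext()
mark_word_practiced(ctx, "en kaffe", correct=True)
mark_word_practiced(ctx, "en bolle", correct=False)  # this path leaves state unchanged
mark_word_practiced(ctx, "takk", correct=True)
progress = completion_percentage(ctx, total_words=12)  # the café scenario lists 12 entries
print(ctx.state["words_practiced"], round(progress, 1))
```

Note that an incorrect attempt is acknowledged in the return string but not written to state, so "en bolle" only surfaces as a weak word later, when end_scenario compares the vocabulary against words_practiced.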

Orchestration Layer

  • Conversation flow blends declarative prompts with scenario metadata from prompts.py and scenarios.py

  • build_instruction() injects adaptive cues (weak words, practiced vocabulary, exchange counts) into the system prompt so the LLM stays context-aligned without bespoke planner code.

  • Callbacks + state + tool contracts form an implicit orchestration layer without a separate conductor service.
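To make the "adaptive cues" concrete, the sketch below isolates the vocabulary-tagging step of build_instruction(). vocabulary_block is a hypothetical helper name introduced here for illustration, and the three-entry vocabulary is a shortened copy of the café scenario.

```python
def vocabulary_block(vocabulary: list, words_practiced: list) -> str:
    """Condensed from build_instruction: tag practiced entries with [LÆRT]."""
    lines = []
    for entry in vocabulary:
        status = " [LÆRT]" if entry["no"] in words_practiced else ""
        lines.append(f"- {entry['no']} ({entry['en']}){status}")
    return "\n".join(lines)


cafe_vocabulary = [
    {"no": "en kaffe", "en": "a coffee"},
    {"no": "kan jeg få", "en": "can I get"},
    {"no": "en bolle", "en": "a bun/pastry"},
]
state = {"words_practiced": ["en kaffe"], "exchange_count": 3}

block = vocabulary_block(cafe_vocabulary, state["words_practiced"])
print(block)
```

Only the first entry carries the [LÆRT] marker, which is the signal the prompt uses to steer the conversation toward the remaining words.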

The Agent’s Operational Loop

The agent’s operational loop consists of four phases:

  • Phase 1: Context Preparation (Before Agent)

  • Phase 2: Agent Reasoning (Core Loop)

  • Phase 3: Tool Execution (Tool Integration)

  • Phase 4: State Persistence (After Agent)

Phase 1: Context Preparation (Before Agent)

Before the agent processes any input, it prepares its context.

Why this matters:

  • The agent’s behavior depends on the context (it is context-aware, not one-size-fits-all)

  • Session continuity should be maintained across conversations

  • The agent’s instruction set evolves based on user progress

What happens here:

  1. State Retrieval: The agent accesses persisted session state (current scenario, words practiced, exchange count)

  2. Memory Search: When no scenario is active, it queries the memory service for past interactions so the menu greeting can be personalized

  3. Dynamic Instruction Building: The system instruction is rebuilt from the current state on each invocation rather than kept static

Python implementation

Context Preparation
# From agent.py: on_before_agent callback
async def on_before_agent(callback_context: CallbackContext):
    state = callback_context.state
    scenario_id = state.get("active_scenario_id")

    memory_info = ""
    # Search memory for personalization
    if not scenario_id:
        try:
            memories = await callback_context.search_memory("scenario fullført")
            if memories and memories.memories:
                memory_info = "\n\nKontekst fra tidligere samtaler:\n"
                for mem in memories.memories[-3:]:  # last 3 memories
                    text = mem.content.parts[0].text if mem.content and mem.content.parts else ""
                    memory_info += f"- {text}\n"
        except Exception:
            pass

    # Dynamically build instruction based on state
    if scenario_id and scenario_id in SCENARIOS:
        scenario = SCENARIOS[scenario_id]
        new_instruction = build_instruction(scenario, state)
        root_agent.instruction = new_instruction
    else:
        root_agent.instruction = SYSTEM_INSTRUCTION + memory_info

Phase 2: Agent Reasoning (Core Loop)

The agent receives user input, calls tools to gather information or change state, and generates a response.

The internal loop:

  1. Observe: Receive user utterance (“Jeg vil velge kafé-scenarioet”, i.e. “I want to choose the café scenario”)

  2. Plan: Reason about which tools to use (e.g., select_scenario)

  3. Act: Call the tool with appropriate arguments

  4. Observe: Receive tool output with scenario details

  5. Generate: Compose response based on tool results

Python implementation

Agent Reasoning
root_agent = LlmAgent(
    name="norsk_agent",
    model=MODEL_ID,
    instruction=SYSTEM_INSTRUCTION,
    tools=[list_scenarios, select_scenario, get_vocabulary,
           mark_word_practiced, get_progress, end_scenario,
           get_user_profile],
)
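The observe/plan/act steps are carried out by the model and the ADK runtime, so they are not visible in the constructor call above. As a rough illustration only, the sketch below replaces the LLM with a hypothetical keyword-based plan function and dispatches the planned call through a name-to-function registry. plan, run_turn, TOOLS, the trimmed SCENARIOS table, and the fallback question are illustrative stand-ins, not ADK APIs.

```python
from typing import Optional

SCENARIOS = {
    "cafe": {"id": "cafe", "opening_line": "Hei! Hva kan jeg friste med i dag?"},
    "hotel": {"id": "hotel", "opening_line": "Velkommen til oss! Har du en reservasjon?"},
}


def select_scenario(state: dict, scenario_id: str) -> dict:
    # Trimmed version of the real tool: validate, mutate state, return details.
    if scenario_id not in SCENARIOS:
        return {"error": f"Beklager, jeg kjenner ikke til scenarioet '{scenario_id}'."}
    state["active_scenario_id"] = scenario_id
    return SCENARIOS[scenario_id]


TOOLS = {"select_scenario": select_scenario}


def plan(utterance: str) -> Optional[dict]:
    """Hypothetical stand-in for the LLM's planning step (keyword matching only)."""
    lowered = utterance.lower()
    if "kafé" in lowered:
        return {"name": "select_scenario", "args": {"scenario_id": "cafe"}}
    if "hotell" in lowered:
        return {"name": "select_scenario", "args": {"scenario_id": "hotel"}}
    return None


def run_turn(state: dict, utterance: str) -> str:
    call = plan(utterance)                               # Plan
    if call is None:
        return "Hvilket scenario vil du øve på?"         # made-up fallback question
    result = TOOLS[call["name"]](state, **call["args"])  # Act
    if "error" in result:                                # Observe the tool output
        return result["error"]
    return result["opening_line"]                        # Generate


state = {}
reply = run_turn(state, "Jeg vil velge kafé-scenarioet")
print(reply)
```

In the real agent the planning decision comes from the model, guided by the tool docstrings and the system instruction; the registry lookup is the only part this sketch shares in spirit with the runtime.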

Phase 3: Tool Execution (Tool Integration)

The agent invokes tools to change state or retrieve information. State mutation happens here: the agent’s tools modify session state directly.

Example tool call sequence:

  • User: “Jeg vil velge kafé-scenarioet”

  • Agent reasoning: “User wants cafe scenario → call select_scenario(‘cafe’)”

Python implementation

Tool Execution
# Tool execution (tools.py):
def select_scenario(tool_context: Context, scenario_id: str) -> dict:
    if scenario_id not in SCENARIOS:
        return {"error": f"Beklager, jeg kjenner ikke til scenarioet '{scenario_id}'."}

    # Modify state
    tool_context.state["active_scenario_id"] = scenario_id
    tool_context.state["words_practiced"] = []
    tool_context.state["exchange_count"] = 0

    # Return result
    scenario = SCENARIOS[scenario_id]
    return {
        "id": scenario["id"],
        "title_no": scenario["title_no"],
        "persona": scenario["persona"],
        "opening_line": scenario["opening_line"]
    }

Phase 4: State Persistence (After Agent)

After generating a response, the agent persists relevant information:

What happens:

  1. Exchange counter is incremented (used to track conversation depth)

  2. The session is added to the memory service (InMemoryMemoryService in this codebase, so the stored memories last only as long as the process runs)

  3. Later conversations can query this memory for context

Python implementation

State Persistence
# From agent.py: on_after_agent callback
async def on_after_agent(callback_context: CallbackContext):
    count = callback_context.state.get("exchange_count", 0)
    callback_context.state["exchange_count"] = count + 1

    # Add the current session to the memory service (best effort: failures are ignored)
    try:
        await callback_context.add_session_to_memory()
    except Exception:
        pass
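What "add the session to memory, then search it later" amounts to can be shown with a toy store. The sketch below is a stand-in, not the ADK memory service: ToyMemory and its word-overlap matching are assumptions made for illustration, and the stored sentences are made-up examples.

```python
class ToyMemory:
    """Illustrative in-process store; its contents vanish when the process exits."""

    def __init__(self) -> None:
        self._entries = []

    def add_session(self, texts: list) -> None:
        # Keep every utterance of the finished session.
        self._entries.extend(texts)

    def search(self, query: str) -> list:
        # Return entries that share at least one whitespace-separated word with the query.
        wanted = set(query.lower().split())
        return [t for t in self._entries if wanted & set(t.lower().split())]


memory = ToyMemory()
memory.add_session(["Hei! Trenger du en pose?", "Scenario fullført, bra jobbet!"])
hits = memory.search("scenario fullført")
print(hits)
```

The second stored sentence matches through the shared word "scenario"; a query with no overlapping words returns an empty list, which is the case on_before_agent handles by leaving memory_info empty.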

Complete Loop Diagram

User Input
    ↓
[1] on_before_agent()
    ├─ Retrieve session state
    ├─ Query memory for personalization
    └─ Rebuild dynamic instruction
    ↓
[2] Agent Reasoning
    ├─ Observe: Parse user intent
    ├─ Plan: Decide which tools to call
    ├─ Act: Execute tools
    └─ Generate: Create response
    ↓
[3] Tool Execution
    ├─ Modify state (active_scenario_id, words_practiced, etc.)
    └─ Return results to agent
    ↓
[4] on_after_agent()
    ├─ Increment exchange counter
    └─ Persist session to memory
    ↓
Response to User
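The ordering in the diagram can be mimicked with plain functions. This is a sketch under simplifying assumptions: before_agent, reason_and_act, after_agent, and handle_turn are hypothetical names, the reasoning step is hard-coded rather than model-driven, and no ADK classes are involved.

```python
def before_agent(state: dict, trace: list) -> str:
    trace.append("before")
    # Pick the instruction from state, as on_before_agent does.
    if state.get("active_scenario_id"):
        return f"persona:{state['active_scenario_id']}"
    return "menu"


def reason_and_act(state: dict, user_input: str, trace: list) -> str:
    trace.append("reason")
    if user_input == "cafe":  # hard-coded "plan": treat the input as a scenario ID
        trace.append("tool")
        state["active_scenario_id"] = "cafe"  # the tool mutates state
        return "Hei! Hva kan jeg friste med i dag?"
    return "..."


def after_agent(state: dict, trace: list) -> None:
    trace.append("after")
    state["exchange_count"] = state.get("exchange_count", 0) + 1


def handle_turn(state: dict, user_input: str):
    trace = []
    instruction = before_agent(state, trace)
    reply = reason_and_act(state, user_input, trace)
    after_agent(state, trace)
    return instruction, reply, trace


state = {}
first = handle_turn(state, "cafe")
second = handle_turn(state, "en kaffe, takk")
print(first[0], first[2])
print(second[0], second[2])
```

In this sketch the first turn still runs under the menu instruction, and the persona instruction is selected only on the second turn, because the instruction is chosen before the tool mutates state.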

Taxonomy of Capabilities

Agent capabilities fall into distinct categories:

  • Observation (Perception)

  • Action (Effecting State)

  • Communication (Expression)

  • Memory (Persistent Knowledge)

Observation (Perception)

How the agent perceives its environment:

Python implementation

Observation (Perception)
# Capability 1: Access to session state
state = callback_context.state
active_scenario_id = state.get("active_scenario_id")
words_practiced = state.get("words_practiced", [])
exchange_count = state.get("exchange_count", 0)

The agent observes:

  • Current scenario ID

  • Words practiced in this session

  • Number of exchanges (conversation depth)

  • User’s cross-session profile (completed scenarios, weak words)

Action (Effecting State)

How the agent affects its environment:

Python implementation

Action (Effecting State)
# Capability 2: Direct state mutation through tools
def mark_word_practiced(tool_context: Context, word: str, correct: bool) -> str:
    practiced = tool_context.state.get("words_practiced", [])
    if correct and word not in practiced:
        practiced.append(word)
        tool_context.state["words_practiced"] = practiced  # ← State mutation
        return f"Flott! '{word}' er markert som lært."
    elif not correct:
        return f"Notert at '{word}' trenger mer øving."  # No state change on this path
    return f"'{word}' er allerede lært."

The agent can:

  • Change the active scenario

  • Mark words as learned

  • Update user profiles

  • Track progress

Communication (Expression)

How the agent expresses itself:

Python implementation

Communication (Expression)
# Capability 3: Natural language generation constrained by instruction
SYSTEM_INSTRUCTION = """
Du er en vennlig og tålmodig norsk språkpartner.
Bruk alltid norsk.
Hold svarene dine svært korte (1-2 små setninger).
Dersom brukeren gjør grammatikkfeil, gi en rask og vennlig korreksjon.
"""

The agent is instructed to:

  • Speak only Norwegian

  • Keep responses short (1-2 sentences)

  • Provide gentle corrections

  • Adapt persona based on scenario

Memory (Persistent Knowledge)

How the agent remembers:

Python implementation

Memory (Persistent Knowledge)
# Capability 4: Long-term memory retrieval and storage
memories = await callback_context.search_memory("scenario fullført")
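# Memory search returns past interactions, e.g. turns that mention a completed scenario.
# The cross-session counters are kept in user:-prefixed state instead:
# - user:completed_scenarios
# - user:weak_words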

The agent remembers:

  • How many scenarios have been completed (user:completed_scenarios in state)

  • Words the user struggles with (user:weak_words in state)

  • Past conversations, searchable through the memory service
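The weak-word bookkeeping behind this list is a set difference, as in end_scenario. The following is a minimal sketch with a plain dict in place of the ADK state object. close_scenario is a hypothetical helper name, the three-word vocabulary is a shortened example, and the sorting is added here only to give a stable order.

```python
def close_scenario(state: dict, vocabulary: list) -> list:
    """Fold unpracticed words into the cross-session profile (user:-prefixed keys)."""
    target_words = {entry["no"] for entry in vocabulary}
    weak_words = target_words - set(state.get("words_practiced", []))

    state["user:completed_scenarios"] = state.get("user:completed_scenarios", 0) + 1
    merged = set(state.get("user:weak_words", [])) | weak_words
    state["user:weak_words"] = sorted(merged)  # sorted for a stable order

    # Reset the per-scenario keys, which returns the agent to menu mode.
    state["active_scenario_id"] = None
    state["words_practiced"] = []
    return sorted(weak_words)


vocabulary = [
    {"no": "en billett", "en": "a ticket"},
    {"no": "forsinket", "en": "delayed"},
    {"no": "avgang", "en": "departure"},
]
state = {
    "active_scenario_id": "train",
    "words_practiced": ["en billett"],
    "user:weak_words": ["kvittering"],
}
weak = close_scenario(state, vocabulary)
print(weak, state["user:weak_words"])
```

Words missed in this scenario are merged with weak words carried over from earlier ones, so build_instruction() can reintroduce them in later conversations.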