Case-study part 2: Explore basics of context engineering

Objectives

  • Understand how memory behaves in agentic systems

  • Explore how to apply context engineering techniques in a real-world agent

What Is Context?

Context is the complete information the agent needs to make decisions. It includes:

  • Short-term context:

    • Immediate context: Current conversation, active scenario

    • Session context: User’s state within one session

  • Long-term context: User’s profile across sessions

Session Management

What Is a Session?

A session represents one continuous interaction between user and agent:

session_service = SqliteSessionService("sessions.db")

Sessions are:

  • Persistent: Stored in a database

  • Isolated: One user, one conversation thread

  • Stateful: Maintain variables that evolve during the conversation
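The exact `SqliteSessionService` API depends on the framework, but all three properties can be sketched with the standard library alone. The class and its `save`/`load` methods below are illustrative stand-ins, not the real service:

```python
import json
import sqlite3

class TinySessionStore:
    """Minimal sqlite-backed session store: persistent, isolated, stateful."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)"
        )

    def save(self, session_id: str, state: dict) -> None:
        # One row per session id keeps conversations isolated from each other.
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, session_id: str) -> dict:
        row = self.conn.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

store = TinySessionStore()
store.save("user-1", {"exchange_count": 3})
print(store.load("user-1"))  # {'exchange_count': 3}
```

With a file path instead of `":memory:"`, the state survives process restarts, which is the "persistent" property the bullet list describes.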

How Sessions Store State

Python implementation

# In tools.py
def select_scenario(tool_context: Context, scenario_id: str) -> dict:
    # These go into session state
    tool_context.state["active_scenario_id"] = scenario_id
    tool_context.state["words_practiced"] = []
    tool_context.state["exchange_count"] = 0
    return {"status": "ok", "scenario": scenario_id}

# In agent.py - on_after_agent callback
async def on_after_agent(callback_context: CallbackContext):
    count = callback_context.state.get("exchange_count", 0)
    callback_context.state["exchange_count"] = count + 1  # Persisted
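Framework types aside, the state flow above can be exercised with a plain dict standing in for the context objects. The `SimpleNamespace` context here is a stand-in for illustration, not the real API:

```python
from types import SimpleNamespace

def select_scenario(tool_context, scenario_id: str) -> dict:
    # Tool call seeds the session state.
    tool_context.state["active_scenario_id"] = scenario_id
    tool_context.state["words_practiced"] = []
    tool_context.state["exchange_count"] = 0
    return {"status": "ok", "scenario": scenario_id}

def on_after_agent(callback_context) -> None:
    # Callback bumps the turn counter after each exchange.
    count = callback_context.state.get("exchange_count", 0)
    callback_context.state["exchange_count"] = count + 1

ctx = SimpleNamespace(state={})
select_scenario(ctx, "cafe")
on_after_agent(ctx)
on_after_agent(ctx)
print(ctx.state["exchange_count"])  # 2
```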

Session State Schema

Session State = {
  "active_scenario_id": "cafe",           # Current practice scenario
  "words_practiced": ["en kaffe", "å betale"],  # Words learned this session
  "exchange_count": 5,                    # Number of turns in conversation

  # Cross-session profile
  "user:completed_scenarios": 3,
  "user:weak_words": ["kort eller kontant?", "heisen"],

  # Additional context
  "user:preferred_difficulty": "intermediate"
}
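Assuming the `user:` prefix marks keys that belong to the cross-session profile (as the schema above suggests), separating the two tiers is a dictionary comprehension. A sketch, not the framework's actual persistence logic:

```python
session_state = {
    "active_scenario_id": "cafe",
    "words_practiced": ["en kaffe", "å betale"],
    "exchange_count": 5,
    "user:completed_scenarios": 3,
    "user:weak_words": ["kort eller kontant?", "heisen"],
    "user:preferred_difficulty": "intermediate",
}

# Keys with the "user:" prefix belong to the cross-session profile;
# everything else is discarded when the session ends.
profile = {k: v for k, v in session_state.items() if k.startswith("user:")}
ephemeral = {k: v for k, v in session_state.items() if not k.startswith("user:")}

print(sorted(profile))
# ['user:completed_scenarios', 'user:preferred_difficulty', 'user:weak_words']
print(sorted(ephemeral))
# ['active_scenario_id', 'exchange_count', 'words_practiced']
```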

Retrieving Session State

Python implementation

# In on_before_agent
state = callback_context.state
scenario_id = state.get("active_scenario_id")
words_practiced = state.get("words_practiced", [])

# Used to build dynamic instruction
if scenario_id and scenario_id in SCENARIOS:
    scenario = SCENARIOS[scenario_id]
    new_instruction = build_instruction(scenario, state)

Memory Architectures

Our agent uses a two-tier memory architecture:

Tier 1: Short-Term Memory (Session State)

  • Scope: Current conversation only

  • Capacity: Small (dictionary)

  • Retrieval: Instant (in-memory access)

  • Content: Active scenario, current words practiced, exchange count

Python implementation

# Accessed every turn
state.get("active_scenario_id")
state.get("words_practiced")

Tier 2: Long-Term Memory (Persistent Storage)

  • Scope: Across all sessions

  • Capacity: Large (database)

  • Retrieval: On-demand (memory search)

  • Content: Completed scenarios, weak words, interaction history

Python implementation

# From on_before_agent
memories = await callback_context.search_memory("scenario fullført")
if memories and memories.memories:
    memory_info = "\n\nKontekst fra tidligere samtaler:\n"
    for mem in memories.memories[-3:]:  # Last 3 memories
        memory_info += f"- {mem.content}\n"  # exact attribute depends on the memory entry type

Memory Lifecycle

Turn 1 of Session A:
    ├─ Short-term: empty
    ├─ Long-term: search for past completions
    └─ Add personalization to instruction

Turn 2 of Session A:
    ├─ Short-term: words_practiced = ["en kaffe"]
    └─ Update persists in session

Turn 3 of Session A:
    ├─ on_after_agent: serialize session to memory
    ├─ Long-term: "User practiced scenario:cafe, learned 1 word"
    └─ Memory serves future sessions

Session B (next day):
    ├─ Short-term: reset (new session)
    ├─ Long-term: "User completed cafe scenario before"
    └─ Memory retrieval enables personalization ("Glad to see you back!")
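The lifecycle above can be simulated end to end with two stand-in stores, a per-session dict and a long-lived list. None of these names come from the framework; this only illustrates the control flow:

```python
long_term_memory: list[str] = []  # survives across sessions

def run_session(turns: list[str]) -> dict:
    state: dict = {"words_practiced": []}  # short-term: fresh each session
    # on_before_agent: consult long-term memory for personalization
    greeting = "Glad to see you back!" if long_term_memory else "Welcome!"
    for word in turns:
        state["words_practiced"].append(word)
    # on_after_agent: serialize the session into long-term memory
    long_term_memory.append(
        f"User practiced scenario:cafe, learned {len(state['words_practiced'])} words"
    )
    state["greeting"] = greeting
    return state

session_a = run_session(["en kaffe"])
session_b = run_session(["å betale"])  # next day: fresh state, old memory
print(session_a["greeting"], "/", session_b["greeting"])
# Welcome! / Glad to see you back!
```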

How Memory Informs Behavior

Dynamic Instruction Building Using Memory:

Python implementation

# From prompts.py
def build_instruction(scenario: dict, state: dict) -> str:
    instruction = ""  # base instruction from the scenario is built here (elided)
    weak_words = state.get("user:weak_words", [])

    # Memory informs instruction
    if weak_words:
        instruction += f"VIKTIG: In previous conversations, the user struggled with: {weak_words}.\n"
        instruction += "Try to incorporate them into the conversation.\n"

    return instruction

The instruction evolves based on:

  1. Immediate session state (active scenario)

  2. Long-term memory (weak words from past attempts)
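A self-contained sketch of how the two inputs combine; the base instruction text and the `SCENARIOS` entry are invented for illustration:

```python
SCENARIOS = {"cafe": {"title": "Ved kaféen", "vocab": ["en kaffe", "kort eller kontant?"]}}

def build_instruction(scenario: dict, state: dict) -> str:
    # 1. Immediate session state: the active scenario
    instruction = f"You are a Norwegian tutor. Scenario: {scenario['title']}.\n"
    # 2. Long-term memory: weak words from past attempts
    weak_words = state.get("user:weak_words", [])
    if weak_words:
        instruction += f"VIKTIG: The user struggled with: {weak_words}.\n"
        instruction += "Try to incorporate them into the conversation.\n"
    return instruction

text = build_instruction(SCENARIOS["cafe"], {"user:weak_words": ["kort eller kontant?"]})
print("kort eller kontant?" in text)  # True
```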

Example:

Session 1:
  User struggles with: "kort eller kontant?" (card or cash?)
  → Stored in memory as weak_word

Session 2 (next day):
  on_before_agent loads memory
  build_instruction includes: "Try to use 'kort eller kontant?' in conversation"
  → Agent naturally steers conversation to practice this phrase

Memory Search vs. Full Retrieval

Benefits:

  • Efficiency: Only relevant memories retrieved

  • Relevance: Recent memories prioritized

  • Scalability: System works with large memory stores

Python implementation

# Selective retrieval - only get relevant memories
memories = await callback_context.search_memory("scenario fullført")
# Instead of: all_memories = memory_service.get_all()

# Filter by recency
for mem in memories.memories[-3:]:  # Last 3 only
    memory_info += f"- {mem.content}\n"  # exact attribute depends on the memory entry type
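The same selective-retrieval idea, sketched with an in-memory list and naive substring matching in place of the framework's semantic search:

```python
memory_store = [
    "2024-03-01 scenario fullført: cafe, weak_words: ['kvittering']",
    "2024-03-02 small talk about weather",
    "2024-03-05 scenario fullført: hotel",
    "2024-03-09 scenario fullført: cafe, weak_words: []",
]

def search_memory(query: str, limit: int = 3) -> list[str]:
    # Only entries matching the query are retrieved...
    hits = [m for m in memory_store if query in m]
    # ...and only the most recent ones are kept (store is append-only).
    return hits[-limit:]

for mem in search_memory("scenario fullført"):
    print(f"- {mem}")
```

Because retrieval filters first and truncates second, the cost of each turn stays bounded even as the store grows, which is the scalability benefit listed above.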

Practical Memory Example: The Learning Journey

Let’s trace how memory enables adaptive learning across sessions:

DAY 1 - Session 1:
┌─ User selects "café" scenario
├─ on_before_agent: no memory (first time)
│  └─ Instruction: basic introduction
├─ Conversation: user practices 8/12 words
│  └─ Weak: "kort eller kontant?", "kvittering"
├─ on_after_agent: session → memory
│  └─ Memory: "completed scenario:cafe, weak_words: [...]"
└─ Session ends

DAY 2 - Session 2:
┌─ User starts fresh (new session)
├─ on_before_agent: search memory "scenario fullført"
│  ├─ Found: "User completed café scenario"
│  └─ Instruction += "Context: User has practiced café before.\n"
│  └─ Instruction += "Try to use: 'kort eller kontant?', 'kvittering'\n"
├─ Conversation: agent naturally steers toward weak words
│  └─ Agent: "Hvordan vil du betale – kort eller kontant?"
│  └─ User: "Kort, takk!"
│  └─ weak_words now practiced!
├─ on_after_agent: updated memory
│  └─ Memory: weak_words reduced, user profile updated
└─ Session ends

DAY 7 - Session 3:
┌─ User tries "hotel" scenario
├─ on_before_agent: search memory for context
│  ├─ Found: "User completed café, weak words practiced"
│  └─ Instruction += "User is advancing to new scenario\n"
│  └─ Instruction += "Build on café vocabulary where it applies\n"
├─ Conversation: new scenario with continuity
└─ Session ends

Context Integration Patterns

  • Pattern 1: Dynamic Instruction Generation

  • Pattern 2: Callback-Driven State Management

  • Pattern 3: Tool-Based State Mutation

Pattern 1: Dynamic Instruction Generation

The agent doesn’t have a static system prompt. Instead, its instruction is constructed from the current state:

# on_before_agent
if scenario_id and scenario_id in SCENARIOS:
    scenario = SCENARIOS[scenario_id]
    new_instruction = build_instruction(scenario, state)
    root_agent.instruction = new_instruction  # ← Dynamic assignment
else:
    root_agent.instruction = SYSTEM_INSTRUCTION + memory_info

Why dynamic?

  • Reduces context window usage (only relevant info included)

  • Enables precision (instruction tailored to current scenario)

  • Supports learning (instruction references weak words, completed scenarios)
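With a stand-in agent object (`SimpleNamespace` here; the real agent class comes from the framework), the dispatch above reduces to a few lines. The instruction strings are invented for illustration:

```python
from types import SimpleNamespace

SYSTEM_INSTRUCTION = "You are a friendly Norwegian tutor."
SCENARIOS = {"cafe": "Role-play ordering at a café."}

root_agent = SimpleNamespace(instruction=SYSTEM_INSTRUCTION)

def on_before_agent(state: dict, memory_info: str = "") -> None:
    scenario_id = state.get("active_scenario_id")
    if scenario_id and scenario_id in SCENARIOS:
        # Tailored: only the active scenario enters the context window.
        root_agent.instruction = SYSTEM_INSTRUCTION + "\n" + SCENARIOS[scenario_id]
    else:
        # Fallback: generic instruction plus any retrieved memories.
        root_agent.instruction = SYSTEM_INSTRUCTION + memory_info

on_before_agent({"active_scenario_id": "cafe"})
print("café" in root_agent.instruction)  # True
```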

Pattern 2: Callback-Driven State Management

The agent doesn’t directly manage state. Instead, callbacks hook into the agent’s lifecycle:

User Input
    ↓
on_before_agent() ← Prepare context
    ↓
Agent Reasoning Loop
    ↓
on_after_agent() ← Persist state
    ↓
Response

Why callbacks?

  • Separation of concerns: Agent reasoning separate from state management

  • Testability: Callbacks can be mocked

  • Extensibility: New concerns added without modifying agent core
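A minimal turn loop showing the separation; `agent_reason` is a placeholder for the real reasoning loop, and the hook names mirror the callbacks above without claiming their exact signatures:

```python
def on_before_agent(state: dict) -> None:
    state.setdefault("exchange_count", 0)  # prepare context

def agent_reason(state: dict, user_input: str) -> str:
    # Placeholder for the model call: reasoning never touches persistence.
    return f"(turn {state['exchange_count'] + 1}) echo: {user_input}"

def on_after_agent(state: dict) -> None:
    state["exchange_count"] += 1  # persist state

def run_turn(state: dict, user_input: str) -> str:
    on_before_agent(state)                    # 1. prepare context
    reply = agent_reason(state, user_input)   # 2. reasoning loop
    on_after_agent(state)                     # 3. persist state
    return reply

state: dict = {}
run_turn(state, "Hei!")
run_turn(state, "En kaffe, takk.")
print(state["exchange_count"])  # 2
```

Because the hooks own all state handling, a test can swap either one out without touching `agent_reason`, which is the testability benefit listed above.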

Pattern 3: Tool-Based State Mutation

State changes only happen through tools, never directly in callbacks:

# ✓ GOOD: Tool modifies state
def mark_word_practiced(tool_context: Context, word: str, correct: bool) -> str:
    practiced = tool_context.state.get("words_practiced", [])
    tool_context.state["words_practiced"] = practiced + [word]
    return f"Marked '{word}' as {'correct' if correct else 'needing practice'}"

# ✗ BAD: Callback modifies state
# async def on_after_agent(callback_context):
#     callback_context.state["words_practiced"].append(word)  # Side effect!

Why?

  • Auditability: All state changes are tool calls (logged)

  • Consistency: State mutations always go through validation (error checking in tool)

  • Understandability: Reading tools shows what state changes are possible
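One payoff of routing every mutation through tools is that logging a single choke point yields a complete audit trail. A sketch with a hypothetical `audited_tool` decorator and a `SimpleNamespace` context standing in for the real one:

```python
import functools
from types import SimpleNamespace

audit_log: list[str] = []

def audited_tool(fn):
    """Wrap a tool so every state-mutating call is recorded."""
    @functools.wraps(fn)
    def wrapper(tool_context, *args, **kwargs):
        audit_log.append(f"{fn.__name__}{args}")
        return fn(tool_context, *args, **kwargs)
    return wrapper

@audited_tool
def mark_word_practiced(tool_context, word: str, correct: bool) -> str:
    if not word:
        raise ValueError("word must be non-empty")  # validation lives in the tool
    practiced = tool_context.state.get("words_practiced", [])
    tool_context.state["words_practiced"] = practiced + [word]
    return f"Marked '{word}' as {'correct' if correct else 'needing practice'}"

ctx = SimpleNamespace(state={})
mark_word_practiced(ctx, "en kaffe", True)
print(audit_log)  # ["mark_word_practiced('en kaffe', True)"]
```

If a callback mutated state directly, that change would bypass both the validation and the log, which is exactly what the GOOD/BAD contrast above warns against.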