Lab 4: Self-Critique Loops

⏱️ Estimated completion time: 35 minutes

Overview

This lab demonstrates reflection in agentic systems: the agent generates content, critiques its own work, and iteratively improves the output. This self-reflective capability is key to building more reliable, self-improving agents.

Learning Objectives

By the end of this lab, you will understand:

  • Implementation of reflection loops in LangGraph
  • Self-critique mechanisms for quality improvement
  • Iterative refinement processes
  • When to terminate reflection loops

Prerequisites

  • Python 3.8+
  • LangGraph installed (pip install langgraph)

Key Concepts

Reflection Loops

Reflection loops allow agents to critique and improve their own output through iterative cycles of generation, evaluation, and refinement.
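
Stripped of framework details, a reflection loop is just a generate → critique → refine cycle with a stopping rule. Here is a minimal plain-Python sketch, assuming the generate_content, critique_content, and improve_content helpers defined in the lab code below; the LangGraph graph in this lab implements the same cycle as nodes and edges.

def reflect(task: str, max_iters: int = 5, threshold: float = 8.0) -> str:
    """Minimal reflection loop: generate, critique, refine until good enough."""
    draft = generate_content(task)                        # produce a first draft
    for _ in range(max_iters):
        feedback, score = critique_content(draft, task)   # (critique text, 0-10 score)
        if score >= threshold:
            break                                         # quality threshold reached
        draft = improve_content(draft, feedback, task)    # revise using the critique
    return draft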

Self-Critique

Agents can evaluate the quality of their own work using internal criteria or external validation.
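
The lab code below relies on internal heuristics for critique; external validation could instead come from a human reviewer or a second model. A hypothetical sketch of the human-in-the-loop variant:

def external_validation(draft: str) -> float:
    """Hypothetical external check: ask a human reviewer for a 0-10 quality rating."""
    rating = input(f"Rate this draft from 0 to 10:\n{draft}\n> ")
    try:
        return max(0.0, min(10.0, float(rating)))  # clamp to the 0-10 scale
    except ValueError:
        return 0.0  # unparseable input counts as a failing score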

Lab Code

#!/usr/bin/env python3
"""
Chapter 4 - Self-Critique with LangGraph
----------------------------------------
This example demonstrates the powerful concept of reflection in agentic systems.
The agent generates content, critiques its own work, and iteratively improves
the output.

Key concepts:
- Reflection loops for self-improvement
- Quality assessment and iterative refinement
- Termination conditions for reflection cycles
"""

import random
from typing import List, Tuple, TypedDict
from langgraph.graph import StateGraph

class ReflectionState(TypedDict, total=False):
    task: str
    draft: str
    critique: str
    final_output: str
    iteration: int
    quality_score: float
    improvement_history: List[str]

# ---------------------------------------------------------------------------
# Mock LLM functions for demonstration --------------------------------------
# ---------------------------------------------------------------------------

def generate_content(task: str) -> str:
    """Simulate content generation based on task."""
    content_templates = {
        "marketing email": [
            "Dear valued customer, we're excited to announce our new product!",
            "Hello! Don't miss out on this amazing opportunity to save big!",
            "Greetings! Our revolutionary new service is now available."
        ],
        "technical documentation": [
            "This API endpoint accepts POST requests with JSON payload.",
            "The system architecture consists of three main components.",
            "Installation requires Python 3.8+ and the following dependencies."
        ],
        "creative story": [
            "Once upon a time, in a land far away, there lived a brave knight.",
            "The spaceship hurtled through the cosmos towards an unknown destination.",
            "Sarah discovered the hidden doorway behind the old bookshelf."
        ]
    }

    # Use task keywords to determine template category
    for category in content_templates:
        if category in task.lower():
            return random.choice(content_templates[category])

    # Default generic content
    return f"This is a response to the task: {task}"

def critique_content(content: str, task: str) -> Tuple[str, float]:
    """Simulate content critique with quality scoring."""
    issues = []
    quality_score = 7.0  # Base score

    # Check for various quality factors
    if len(content) < 50:
        issues.append("Content is too brief and lacks detail")
        quality_score -= 2.0

    if not any(char.isupper() for char in content):
        issues.append("Content lacks proper capitalization")
        quality_score -= 0.5

    if "!" not in content and "marketing" in task.lower():
        issues.append("Marketing content should be more energetic")
        quality_score -= 1.0

    if "API" in task and "endpoint" not in content:
        issues.append("Technical documentation should mention endpoints")
        quality_score -= 1.5

    if len(content.split()) < 10:
        issues.append("Content needs more comprehensive coverage")
        quality_score -= 1.0

    # Add some randomness to simulate LLM variability
    quality_score += random.uniform(-0.5, 0.5)
    quality_score = max(0.0, min(10.0, quality_score))  # Clamp to 0-10 range

    if not issues:
        critique = "The content meets quality standards."
    else:
        critique = f"Issues identified: {'; '.join(issues)}"

    return critique, quality_score

def improve_content(original: str, critique: str, task: str) -> str:
    """Simulate content improvement based on critique."""
    improved = original

    if "too brief" in critique:
        improved += " Here are additional details and comprehensive information about the topic."

    if "capitalization" in critique:
        improved = improved.capitalize()

    if "energetic" in critique:
        improved += " This is an incredible opportunity you won't want to miss!"

    if "endpoints" in critique:
        improved += " The endpoint supports GET, POST, PUT, and DELETE operations."

    if "comprehensive" in critique:
        improved += f" Let me provide more thorough coverage of {task}."

    return improved

# ---------------------------------------------------------------------------
# Graph nodes ---------------------------------------------------------------
# ---------------------------------------------------------------------------

def generate_draft(state: ReflectionState) -> ReflectionState:
    """Generate initial draft or improved version based on critique."""
    task = state["task"]

    if state.get("critique"):
        # Improve existing draft based on critique
        current_draft = state.get("draft", "")
        critique = state.get("critique", "")
        improved_draft = improve_content(current_draft, critique, task)
        state["draft"] = improved_draft
    else:
        # Generate initial draft
        initial_draft = generate_content(task)
        state["draft"] = initial_draft

    # Track iteration
    state["iteration"] = state.get("iteration", 0) + 1

    # Add to improvement history
    if "improvement_history" not in state:
        state["improvement_history"] = []
    state["improvement_history"].append(f"Iteration {state['iteration']}: {state['draft'][:50]}...")

    return state

def self_critique(state: ReflectionState) -> ReflectionState:
    """Evaluate the current draft and provide critique."""
    draft = state.get("draft", "")
    task = state.get("task", "")

    critique, quality_score = critique_content(draft, task)

    state["critique"] = critique
    state["quality_score"] = quality_score

    print(f"\nIteration {state.get('iteration', 0)}:")
    print(f"Draft: {draft}")
    print(f"Quality Score: {quality_score:.1f}/10")
    print(f"Critique: {critique}")

    return state

def finalize_output(state: ReflectionState) -> ReflectionState:
    """Finalize the output once quality threshold is met."""
    state["final_output"] = state.get("draft", "")
    print(f"\n✅ Final output ready after {state.get('iteration', 0)} iterations")
    print(f"Final quality score: {state.get('quality_score', 0):.1f}/10")
    return state

# ---------------------------------------------------------------------------
# Conditional logic ---------------------------------------------------------
# ---------------------------------------------------------------------------

def should_continue_reflection(state: ReflectionState) -> str:
    """Determine whether to continue refining or finalize the output."""
    quality_score = state.get("quality_score", 0)
    iteration = state.get("iteration", 0)

    # Stop if quality is good enough (score >= 8) or max iterations reached
    if quality_score >= 8.0 or iteration >= 5:
        return "finalize"
    else:
        return "improve"

# ---------------------------------------------------------------------------
# Graph construction --------------------------------------------------------
# ---------------------------------------------------------------------------

def build_reflection_graph() -> StateGraph:
    """Build a graph that implements reflection loops for content improvement."""
    g = StateGraph(ReflectionState)

    # Add nodes
    g.add_node("generate", generate_draft)
    g.add_node("critique", self_critique)
    g.add_node("finalize", finalize_output)

    # Set entry point
    g.set_entry_point("generate")

    # Add edges
    g.add_edge("generate", "critique")

    # Add conditional edge for reflection loop
    g.add_conditional_edges(
        "critique",
        should_continue_reflection,
        {
            "improve": "generate",  # Continue loop
            "finalize": "finalize"  # Exit loop
        }
    )

    # Set finish point
    g.set_finish_point("finalize")

    return g

# ---------------------------------------------------------------------------
# Demo function -------------------------------------------------------------
# ---------------------------------------------------------------------------

def main():
    print("=== Self-Critique Reflection Loop Demo ===\n")

    # Example tasks to test
    tasks = [
        "Write a marketing email for our new AI-powered productivity app",
        "Create technical documentation for our REST API",
        "Write a creative story about time travel"
    ]

    # Build the graph
    graph = build_reflection_graph().compile()

    for task in tasks:
        print(f"\n{'='*60}")
        print(f"Task: {task}")
        print('='*60)

        # Run the reflection loop
        final_state = graph.invoke({"task": task})

        print(f"\n📝 Final Output:")
        print(f"{final_state['final_output']}")

        print(f"\n🔄 Improvement History:")
        for entry in final_state.get('improvement_history', []):
            print(f"  • {entry}")

if __name__ == "__main__":
    main()

How to Run

  1. Save the code above as 04_reflection_loops.py
  2. Install dependencies: pip install langgraph
  3. Run the script: python 04_reflection_loops.py

Expected Output

Exact drafts and scores will vary between runs because the mock critique adds random noise. A typical run looks like this:

=== Self-Critique Reflection Loop Demo ===

============================================================
Task: Write a marketing email for our new AI-powered productivity app
============================================================

Iteration 1:
Draft: Hello! Don't miss out on this amazing opportunity to save big!
Quality Score: 6.5/10
Critique: Issues identified: Marketing content should be more energetic

Iteration 2:
Draft: Hello! Don't miss out on this amazing opportunity to save big! This is an incredible opportunity you won't want to miss!
Quality Score: 8.2/10
Critique: The content meets quality standards.

✅ Final output ready after 2 iterations
Final quality score: 8.2/10

📝 Final Output:
Hello! Don't miss out on this amazing opportunity to save big! This is an incredible opportunity you won't want to miss!

🔄 Improvement History:
  • Iteration 1: Hello! Don't miss out on this amazing opportunity...
  • Iteration 2: Hello! Don't miss out on this amazing opportunity...

Key Concepts Explained

Reflection Loop Architecture

  • Generate: Creates initial content or improvements
  • Critique: Evaluates quality and identifies issues
  • Conditional Logic: Decides whether to continue or finalize

Quality Assessment

  • Numerical scoring system (0-10 scale)
  • Multiple criteria evaluation
  • Threshold-based termination

Iterative Improvement

  • Each iteration builds on previous work
  • Specific improvements based on critique
  • History tracking for transparency

Termination Conditions

  • Quality threshold reached (score ≥ 8.0)
  • Maximum iterations limit (5 iterations)
  • Prevents infinite loops

Advanced Patterns

Multi-Criteria Evaluation

from typing import Dict, List

def evaluate_criterion(content: str, criterion: str) -> float:
    # Placeholder scorer: swap in an LLM call or a real heuristic per criterion.
    return 10.0 if criterion.lower() in content.lower() else 5.0

def advanced_critique(content: str, criteria: List[str]) -> Dict[str, float]:
    """Evaluate content against multiple criteria."""
    return {criterion: evaluate_criterion(content, criterion) for criterion in criteria}

Weighted Quality Scoring

def weighted_quality_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Calculate the weighted average quality score (weights keyed by criterion)."""
    total_score = sum(scores[criterion] * weights[criterion] for criterion in scores)
    total_weight = sum(weights.values())
    return total_score / total_weight if total_weight else 0.0
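
For example, the two helpers above might be combined like this (the criteria names and weights are illustrative only):

criteria = ["clarity", "completeness", "tone"]
weights = {"clarity": 0.5, "completeness": 0.3, "tone": 0.2}

scores = advanced_critique("This API endpoint accepts POST requests.", criteria)
overall = weighted_quality_score(scores, weights)
print(f"Per-criterion scores: {scores}")
print(f"Weighted overall score: {overall:.1f}/10")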

Exercises

  1. Add more critique criteria: Implement checks for grammar, tone, or domain-specific requirements
  2. Implement external validation: Add human feedback or external API evaluation
  3. Dynamic termination: Adjust quality thresholds based on content type (see the sketch after this list)
  4. Parallel critique: Evaluate multiple aspects simultaneously
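
For exercise 3, one approach is to pick the quality threshold from the task description before comparing scores. A sketch with illustrative per-type thresholds, reusing the lab's ReflectionState:

# Illustrative thresholds per content type (exercise 3: dynamic termination).
QUALITY_THRESHOLDS = {
    "marketing email": 7.5,
    "technical documentation": 8.5,
    "creative story": 8.0,
}

def dynamic_threshold(task: str, default: float = 8.0) -> float:
    """Pick a quality threshold based on keywords in the task description."""
    for content_type, threshold in QUALITY_THRESHOLDS.items():
        if content_type in task.lower():
            return threshold
    return default

def should_continue_reflection(state: ReflectionState) -> str:
    """Variant of the lab's router that applies a task-dependent threshold."""
    threshold = dynamic_threshold(state.get("task", ""))
    if state.get("quality_score", 0) >= threshold or state.get("iteration", 0) >= 5:
        return "finalize"
    return "improve"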

Real-World Applications

  • Content Generation: Blog posts, marketing materials, documentation
  • Code Review: Automated code quality improvement
  • Creative Writing: Story refinement and editing
  • Research Papers: Academic writing improvement

Download Code

Download 04_reflection_loops.py