AI Agents Complete Guide: From Theory to Industrial Practice
Chen Kai

Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human language, but they face a critical limitation: they remain passive responders confined to their training data. AI Agents break this barrier by transforming static models into autonomous problem-solvers that can plan, use external tools, maintain memory, and iteratively refine their approaches. This article explores how AI Agents extend LLMs from mere text generators into active reasoning systems capable of handling complex, multi-step real-world tasks.

We'll trace the evolution from basic prompt engineering to sophisticated agent architectures, examine the four core capabilities that define modern agents (planning, memory, tool use, and reflection), dissect popular frameworks like LangChain and AutoGPT, understand multi-agent collaboration patterns, and analyze how these systems are evaluated in production. Whether you're building your first agent or scaling to multi-agent orchestration, this guide provides both theoretical foundations and practical implementation details to help you navigate this rapidly evolving field.

What Are AI Agents?

The term "AI Agent" has become increasingly prevalent in the LLM ecosystem, but its definition varies widely across different contexts. At its core, an AI Agent is an autonomous system powered by a large language model that can perceive its environment, make decisions, take actions, and iteratively work towards achieving specific goals without constant human intervention.

From Static LLMs to Dynamic Agents

Traditional LLM interactions follow a simple request-response pattern: you provide a prompt, the model generates text, and the interaction ends. This approach has significant limitations:

  • No persistence: Each interaction is isolated with no memory of previous exchanges
  • Limited reasoning: Complex problems requiring multiple steps are difficult to solve in a single forward pass
  • No tool access: The model cannot verify facts, perform calculations, or interact with external systems
  • Static knowledge: Information is frozen at the training cutoff date

AI Agents address these limitations by wrapping LLMs in a reasoning loop that enables:

# Traditional LLM interaction
def traditional_llm(prompt):
    response = llm.generate(prompt)
    return response  # One-shot, no follow-up

# AI Agent interaction
def agent_loop(task):
    state = initialize_memory()
    while not is_goal_achieved(state):
        # Perceive current state
        observation = observe_environment(state)

        # Think and plan
        thought = llm.reason(observation, state.memory)
        action = llm.decide_action(thought, available_tools)

        # Act in environment
        result = execute_action(action)

        # Update memory and reflect
        state.memory.update(thought, action, result)
        state = evaluate_progress(state, result)

    return state.final_output

This fundamental shift from passive generation to active reasoning is what distinguishes agents from vanilla LLMs.
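The loop above is pseudocode. As a minimal runnable sketch of the same perceive-reason-act cycle, here is a toy version with a scripted stub standing in for the LLM and a counter standing in for the environment (all names here are illustrative, not part of any framework):

```python
class StubLLM:
    """Pretends to reason: proposes incrementing until the goal is reached."""
    def decide_action(self, observation):
        return "increment" if observation < 3 else "stop"

def agent_loop(max_steps=10):
    llm = StubLLM()
    state = {"counter": 0, "memory": []}
    for _ in range(max_steps):
        observation = state["counter"]           # perceive
        action = llm.decide_action(observation)  # think / decide
        if action == "stop":
            break
        state["counter"] += 1                    # act
        state["memory"].append((observation, action))  # remember
    return state

final = agent_loop()
print(final["counter"])  # 3
```

The structure, not the stub, is the point: state persists across iterations, and each iteration conditions the next decision on what was observed and remembered so far.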

Core Components of AI Agents

Every effective AI Agent consists of several interconnected components:

1. Brain (LLM Core)

The LLM serves as the agent's reasoning engine, responsible for:

  • Understanding natural language instructions
  • Breaking down complex goals into actionable steps
  • Generating code, queries, or API calls
  • Synthesizing information from multiple sources

from langchain.chat_models import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_functions_agent

# Initialize the LLM brain
llm = ChatOpenAI(
    model="gpt-4-turbo",
    temperature=0,  # Deterministic for reliable reasoning
    max_tokens=2000
)

# The brain processes thoughts and generates actions
thought = llm.predict(
    "Given the task 'Find the weather in Tokyo', what should I do first?"
)
# Output: "I need to use a weather API tool to fetch current weather data for Tokyo."

2. Planning Module

Planning is the agent's ability to decompose complex tasks into manageable sub-tasks. Two primary planning strategies exist:

Task Decomposition involves breaking down the goal hierarchically:

class TaskPlanner:
    def __init__(self, llm):
        self.llm = llm

    def decompose_task(self, goal):
        prompt = f"""Break down this goal into a sequence of actionable steps:
Goal: {goal}

Provide steps in this format:
1. [First step with clear action]
2. [Second step depending on step 1]
...
"""
        response = self.llm.predict(prompt)
        return self.parse_steps(response)

    def parse_steps(self, response):
        # Extract numbered steps from the LLM response
        steps = []
        for line in response.split('\n'):
            if line.strip() and line[0].isdigit():
                steps.append(line.split('.', 1)[1].strip())
        return steps

planner = TaskPlanner(llm)
steps = planner.decompose_task(
    "Research competitor pricing and create a comparative analysis report"
)
# Returns: [
#   "Identify top 5 competitors in the market",
#   "For each competitor, search for their public pricing pages",
#   "Extract pricing tiers and feature comparisons",
#   "Compile data into a structured format",
#   "Generate a summary analysis with key insights"
# ]

Multi-path reasoning explores different solution approaches simultaneously:

class MultiPathPlanner:
    def __init__(self, llm):
        self.llm = llm

    def generate_alternatives(self, task, num_paths=3):
        prompt = f"""Generate {num_paths} different approaches to solve this task:
Task: {task}

For each approach, explain:
- Strategy overview
- Pros and cons
- Estimated steps required
"""
        alternatives = self.llm.predict(prompt)
        return self.parse_alternatives(alternatives)

    def select_best_path(self, alternatives, constraints):
        evaluation_prompt = f"""Given these approaches and constraints:
Approaches: {alternatives}
Constraints: {constraints}

Select the most appropriate approach and explain why.
"""
        return self.llm.predict(evaluation_prompt)

3. Memory System

Memory enables agents to maintain context across interactions. Modern agents typically implement multiple memory types:

Short-term Memory stores information within the current session:

class ShortTermMemory:
    def __init__(self, max_tokens=4000):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()

    def _trim_if_needed(self):
        # Estimate tokens (rough approximation: ~4 characters per token)
        total_tokens = sum(len(m["content"]) / 4 for m in self.messages)
        while total_tokens > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(0)  # Remove oldest messages first
            total_tokens = sum(len(m["content"]) / 4 for m in self.messages)

    def get_context(self):
        return self.messages

memory = ShortTermMemory()
memory.add_message("user", "Find restaurants in Paris")
memory.add_message("assistant", "I'll search for restaurants in Paris...")
memory.add_message("system", "Found 50 results")

Long-term Memory persists important information across sessions using vector databases:

from datetime import datetime

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document

class LongTermMemory:
    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            collection_name="agent_memory",
            embedding_function=self.embeddings,
            persist_directory="./memory_db"
        )

    def store_experience(self, task, action, result, success):
        doc = Document(
            page_content=f"Task: {task}\nAction: {action}\nResult: {result}",
            metadata={
                "task_type": self._classify_task(task),
                "success": success,
                "timestamp": datetime.now().isoformat()
            }
        )
        self.vectorstore.add_documents([doc])

    def recall_similar_experiences(self, current_task, k=3):
        # Retrieve similar past experiences
        similar_docs = self.vectorstore.similarity_search(
            current_task, k=k
        )
        return [doc.page_content for doc in similar_docs]

    def _classify_task(self, task):
        # Simple keyword classification - could use the LLM for more accuracy
        if "search" in task.lower():
            return "information_retrieval"
        elif "calculate" in task.lower() or "compute" in task.lower():
            return "computation"
        return "general"

ltm = LongTermMemory()
ltm.store_experience(
    task="Find population of Tokyo",
    action="Used search tool with query 'Tokyo population 2024'",
    result="14 million in city proper, 37 million in metro area",
    success=True
)

# Later, when facing a similar task
similar = ltm.recall_similar_experiences("What's the population of London?")
# Returns relevant past experiences to inform the current approach

4. Tool Interface

Tools extend the agent's capabilities beyond text generation. A robust tool interface includes:

from typing import Callable, Dict, Any
from pydantic import BaseModel, Field

class Tool(BaseModel):
    name: str
    description: str
    function: Callable
    parameters: Dict[str, Any] = Field(default_factory=dict)

    def execute(self, **kwargs):
        try:
            result = self.function(**kwargs)
            return {"success": True, "result": result}
        except Exception as e:
            return {"success": False, "error": str(e)}

# Example: Calculator tool
def calculator(expression: str) -> float:
    """Safely evaluate mathematical expressions"""
    import ast
    import operator

    ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow
    }

    def eval_node(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp):
            return ops[type(node.op)](eval_node(node.left), eval_node(node.right))
        raise ValueError(f"Unsupported operation: {node}")

    tree = ast.parse(expression, mode='eval')
    return eval_node(tree.body)

calc_tool = Tool(
    name="calculator",
    description="Evaluates mathematical expressions. Input should be a valid expression like '2 + 2' or '10 * (3 + 4)'",
    function=calculator,
    parameters={
        "expression": {
            "type": "string",
            "description": "Mathematical expression to evaluate"
        }
    }
)

# Using the tool
result = calc_tool.execute(expression="(100 + 50) * 2")
print(result)  # {"success": True, "result": 300}

5. Reflection and Self-Critique

Advanced agents can evaluate their own outputs and iteratively improve:

class ReflectiveAgent:
    def __init__(self, llm, max_iterations=3):
        self.llm = llm
        self.max_iterations = max_iterations

    def solve_with_reflection(self, task):
        iterations = []
        current_solution = None

        for i in range(self.max_iterations):
            # Generate solution
            if current_solution is None:
                prompt = f"Solve this task: {task}"
            else:
                prompt = f"""Previous solution: {current_solution}
Critique: {iterations[-1]['critique']}

Improve the solution based on the critique."""

            solution = self.llm.predict(prompt)

            # Self-critique
            critique_prompt = f"""Evaluate this solution for the task: {task}
Solution: {solution}

Identify:
1. What's correct
2. What's missing or incorrect
3. How to improve
4. Is this solution satisfactory? (Yes/No)
"""
            critique = self.llm.predict(critique_prompt)

            iterations.append({
                "solution": solution,
                "critique": critique
            })

            # Stop early if the critique judges the solution satisfactory
            if "yes" in critique.lower().split("satisfactory?")[-1][:50]:
                break

            current_solution = solution

        return {
            "final_solution": iterations[-1]["solution"],
            "iterations": iterations
        }

reflective_agent = ReflectiveAgent(llm)
result = reflective_agent.solve_with_reflection(
    "Write a Python function to find all prime numbers up to n"
)

AI Agents vs. Traditional LLMs: A Direct Comparison

To clarify the distinction, consider how each system handles the same complex task:

Task: "Research the top 3 machine learning conferences, find their submission deadlines for 2025, and create a calendar event for each deadline minus 2 weeks."

Traditional LLM:

User: [sends entire task]
LLM: Based on my training data (cutoff 2023), the top ML conferences are:
1. NeurIPS - typically December
2. ICML - typically July
3. CVPR - typically June

However, I cannot access current 2025 deadlines or create calendar events.
You would need to manually search for these deadlines and create the events.
Result: Provides outdated information, cannot execute actions.

AI Agent:

Agent (internal reasoning): 
Step 1: Search for "top machine learning conferences 2025"
[Uses web search tool]
Result: NeurIPS, ICML, ICLR, CVPR, AAAI

Step 2: For top 3 (NeurIPS, ICML, ICLR), search submission deadlines
[Uses web search tool for each]
Results:
- NeurIPS 2025: May 15, 2025
- ICML 2025: January 31, 2025
- ICLR 2026: September 27, 2025

Step 3: Calculate reminder dates (deadline - 14 days)
- NeurIPS: May 1, 2025
- ICML: January 17, 2025
- ICLR: September 13, 2025

Step 4: Create calendar events
[Uses calendar API tool for each]

Agent response:
I've created calendar reminders for submission deadlines:
1. ICML 2025 reminder: January 17, 2025 (deadline: January 31)
2. NeurIPS 2025 reminder: May 1, 2025 (deadline: May 15)
3. ICLR 2026 reminder: September 13, 2025 (deadline: September 27)

All events include links to conference websites and submission portals.
Result: Autonomous execution with current information and concrete actions taken.
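Step 3 of the transcript (deadline minus two weeks) is ordinary date arithmetic that the agent delegates to code rather than computing in-context. The conference names and dates below are taken from the transcript above; the snippet is just a sketch of that calculation:

```python
from datetime import date, timedelta

# Deadlines from the agent transcript above
deadlines = {
    "ICML 2025": date(2025, 1, 31),
    "NeurIPS 2025": date(2025, 5, 15),
    "ICLR 2026": date(2025, 9, 27),
}

# Reminder = deadline minus 14 days
reminders = {name: d - timedelta(days=14) for name, d in deadlines.items()}

for name, r in sorted(reminders.items(), key=lambda kv: kv[1]):
    print(f"{name} reminder: {r.isoformat()}")
# ICML 2025 reminder: 2025-01-17
# NeurIPS 2025 reminder: 2025-05-01
# ICLR 2026 reminder: 2025-09-13
```

Offloading even trivial arithmetic like this to a tool is deliberate: it avoids the silent off-by-one and month-boundary errors LLMs make when doing date math in text.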

Core Capabilities of AI Agents

The power of AI Agents stems from four fundamental capabilities that work in concert. Understanding each capability in depth is essential for building effective agent systems.

Planning and Task Decomposition

Planning is the agent's ability to break down complex, ambiguous goals into structured, executable sequences. Without effective planning, agents resort to trial-and-error approaches that waste resources and fail frequently.

Chain-of-Thought Planning

The simplest planning approach is Chain-of-Thought (CoT), where agents explicitly verbalize their reasoning:

class ChainOfThoughtAgent:
    def __init__(self, llm):
        self.llm = llm

    def solve(self, problem):
        cot_prompt = f"""Solve this step-by-step. For each step, write:
- Thought: [Your reasoning]
- Action: [What to do]
- Observation: [Expected outcome]

Problem: {problem}

Think through this carefully:
"""

        reasoning = self.llm.predict(cot_prompt)
        return self._parse_cot_response(reasoning)

    def _parse_cot_response(self, response):
        steps = []
        current_step = {}

        for line in response.split('\n'):
            if line.startswith('Thought:'):
                if current_step:
                    steps.append(current_step)
                current_step = {'thought': line.replace('Thought:', '').strip()}
            elif line.startswith('Action:'):
                current_step['action'] = line.replace('Action:', '').strip()
            elif line.startswith('Observation:'):
                current_step['observation'] = line.replace('Observation:', '').strip()

        if current_step:
            steps.append(current_step)

        return steps

# Example usage
agent = ChainOfThoughtAgent(llm)
steps = agent.solve(
    "I have $100 and want to buy gifts for 5 friends. "
    "Each gift should cost between $15-25. Do I have enough money?"
)

# Output steps:
# [
#   {
#     'thought': 'First, determine the minimum and maximum total cost',
#     'action': 'Calculate: 5 friends × $15 (minimum) and 5 friends × $25 (maximum)',
#     'observation': 'Minimum total: $75, Maximum total: $125'
#   },
#   {
#     'thought': 'Compare budget to the cost range',
#     'action': 'Check if $100 falls within [$75, $125]',
#     'observation': '$100 is greater than minimum ($75) but less than maximum ($125)'
#   },
#   {
#     'thought': 'Determine if budget is sufficient',
#     'action': 'Conclude based on comparison',
#     'observation': 'You have enough for minimum but not maximum. Budget carefully.'
#   }
# ]

ReAct: Reasoning + Acting

ReAct (Reason + Act) interleaves thinking with tool use, allowing agents to gather information before planning subsequent steps:

import json
from typing import List, Dict, Optional

class ReActAgent:
    def __init__(self, llm, tools: List[Tool], max_iterations=10):
        self.llm = llm
        self.tools = {tool.name: tool for tool in tools}
        self.max_iterations = max_iterations

    def run(self, task: str) -> Dict:
        history = []
        observation = f"Task: {task}"

        for iteration in range(self.max_iterations):
            # Reasoning step
            thought_prompt = self._build_prompt(task, history, observation)
            response = self.llm.predict(thought_prompt)

            # Parse response
            parsed = self._parse_response(response)

            if parsed['type'] == 'final_answer':
                return {
                    'answer': parsed['content'],
                    'iterations': iteration + 1,
                    'history': history
                }

            # Action step
            tool_name = parsed['tool']
            tool_input = parsed['input']

            if tool_name not in self.tools:
                observation = f"Error: Tool '{tool_name}' not found. Available tools: {list(self.tools.keys())}"
            else:
                result = self.tools[tool_name].execute(**tool_input)
                observation = result['result'] if result['success'] else result['error']

            history.append({
                'iteration': iteration + 1,
                'thought': parsed.get('thought', ''),
                'action': f"{tool_name}({tool_input})",
                'observation': observation
            })

        return {
            'answer': 'Max iterations reached without finding answer',
            'iterations': self.max_iterations,
            'history': history
        }

    def _build_prompt(self, task, history, last_observation):
        history_str = "\n".join([
            f"Thought {h['iteration']}: {h['thought']}\n"
            f"Action {h['iteration']}: {h['action']}\n"
            f"Observation {h['iteration']}: {h['observation']}"
            for h in history
        ])

        tools_description = "\n".join([
            f"- {name}: {tool.description}"
            for name, tool in self.tools.items()
        ])

        return f"""You are a helpful assistant that can use tools to solve tasks.

Available tools:
{tools_description}

Task: {task}

{history_str}

Last observation: {last_observation}

Think step by step:
Thought: [Your reasoning about what to do next]
Action: [Tool name to use, or "Final Answer" if task is complete]
Action Input: [Input to the tool as JSON, or your final answer]

Your response:"""

    def _parse_response(self, response):
        # Extract thought, action, and input from the LLM response
        lines = response.strip().split('\n')
        parsed = {'type': 'action'}

        for line in lines:
            if line.startswith('Thought:'):
                parsed['thought'] = line.replace('Thought:', '').strip()
            elif line.startswith('Action:'):
                action = line.replace('Action:', '').strip()
                if 'Final Answer' in action:
                    parsed['type'] = 'final_answer'
                else:
                    parsed['tool'] = action
            elif line.startswith('Action Input:'):
                input_str = line.replace('Action Input:', '').strip()
                if parsed['type'] == 'final_answer':
                    parsed['content'] = input_str
                else:
                    try:
                        parsed['input'] = json.loads(input_str)  # safer than eval
                    except (json.JSONDecodeError, ValueError):
                        parsed['input'] = {'query': input_str}

        return parsed

# Example: Building a ReAct agent with search and calculator tools
def search_tool(query: str) -> str:
    # Simulated search - in production, use a real search API
    search_results = {
        "capital of France": "Paris is the capital of France, with a population of approximately 2.2 million.",
        "population of Tokyo": "Tokyo's population is approximately 14 million in the city proper, 37 million in the metro area."
    }
    return search_results.get(query.lower(), f"No results found for: {query}")

tools = [
    Tool(
        name="Search",
        description="Searches for factual information. Input should be a search query string.",
        function=search_tool
    ),
    calc_tool  # Defined earlier
]

react_agent = ReActAgent(llm, tools)
result = react_agent.run(
    "What's the population of Tokyo divided by the population of Paris?"
)

# The agent will:
# 1. Search for Tokyo population (Observation: ~14 million)
# 2. Search for Paris population (Observation: ~2.2 million)
# 3. Use calculator: 14000000 / 2200000 (Observation: ~6.36)
# 4. Return final answer: "Tokyo's population is approximately 6.36 times Paris's population"

Tree of Thoughts: Exploring Multiple Reasoning Paths

For complex problems, Tree of Thoughts (ToT) explores multiple reasoning branches and evaluates them:

class TreeOfThoughts:
    def __init__(self, llm, max_depth=3, num_branches=3):
        self.llm = llm
        self.max_depth = max_depth
        self.num_branches = num_branches

    def solve(self, problem):
        root = ThoughtNode(
            content=f"Problem: {problem}",
            depth=0,
            value=0
        )

        # Best-first search through the thought tree
        frontier = [root]
        best_solution = None
        best_value = float('-inf')

        while frontier and len(frontier) < 100:  # Cap frontier size to bound the search
            # Select the most promising node
            current = max(frontier, key=lambda n: n.value)
            frontier.remove(current)

            if current.depth >= self.max_depth:
                # Evaluate as a potential solution
                value = self._evaluate_solution(problem, current)
                if value > best_value:
                    best_value = value
                    best_solution = current
                continue

            # Generate child thoughts
            children = self._generate_thoughts(problem, current)
            frontier.extend(children)

        return self._extract_solution_path(best_solution)

    def _generate_thoughts(self, problem, parent_node):
        prompt = f"""Given this problem and current reasoning path:

Problem: {problem}

Current path: {parent_node.get_path()}

Generate {self.num_branches} different next reasoning steps.
For each, provide:
- The reasoning step
- Why this approach might work
- Potential issues

Format each thought as:
Thought N: [reasoning step]
Rationale: [why it might work]
Issues: [potential problems]
"""

        response = self.llm.predict(prompt)
        thoughts = self._parse_thoughts(response)

        children = []
        for thought in thoughts[:self.num_branches]:
            child = ThoughtNode(
                content=thought['content'],
                depth=parent_node.depth + 1,
                parent=parent_node
            )
            # Evaluate this thought's promise
            child.value = self._evaluate_thought(problem, child)
            children.append(child)

        return children

    def _evaluate_thought(self, problem, node):
        eval_prompt = f"""Evaluate this reasoning step for solving the problem.

Problem: {problem}
Reasoning path: {node.get_path()}

Rate this path's promise from 0-10:
- Correctness: Does it make logical sense?
- Relevance: Does it address the problem?
- Progress: Does it move closer to a solution?
- Feasibility: Can it lead to a concrete answer?

Provide a score and brief justification.
"""

        response = self.llm.predict(eval_prompt)
        # Parse the first numeric token in the response as the score
        try:
            score = float([s for s in response.split() if s.replace('.', '').isdigit()][0])
            return min(max(score, 0), 10)  # Clamp to [0, 10]
        except (IndexError, ValueError):
            return 5.0  # Default score

    def _evaluate_solution(self, problem, node):
        eval_prompt = f"""Evaluate this complete solution to the problem.

Problem: {problem}
Solution path: {node.get_path()}

Rate this solution from 0-10:
- Correctness: Is the solution correct?
- Completeness: Does it fully address the problem?
- Clarity: Is the reasoning clear?

Provide a score.
"""

        response = self.llm.predict(eval_prompt)
        try:
            score = float([s for s in response.split() if s.replace('.', '').isdigit()][0])
            return min(max(score, 0), 10)
        except (IndexError, ValueError):
            return 5.0

    def _parse_thoughts(self, response):
        thoughts = []
        current = {}

        for line in response.split('\n'):
            if line.startswith('Thought'):
                if current:
                    thoughts.append(current)
                current = {'content': line.split(':', 1)[1].strip() if ':' in line else ''}
            elif line.startswith('Rationale:'):
                current['rationale'] = line.replace('Rationale:', '').strip()
            elif line.startswith('Issues:'):
                current['issues'] = line.replace('Issues:', '').strip()

        if current:
            thoughts.append(current)

        return thoughts

    def _extract_solution_path(self, node):
        path = []
        current = node
        while current is not None:
            path.append(current.content)
            current = current.parent
        return list(reversed(path))

class ThoughtNode:
    def __init__(self, content, depth, parent=None, value=0):
        self.content = content
        self.depth = depth
        self.parent = parent
        self.value = value

    def get_path(self):
        path = []
        current = self
        while current is not None:
            path.append(current.content)
            current = current.parent
        return " -> ".join(reversed(path))

# Example usage
tot_solver = TreeOfThoughts(llm, max_depth=4, num_branches=3)
solution = tot_solver.solve(
    "Design a database schema for a social media platform that supports "
    "posts, comments, likes, follows, and direct messaging while ensuring "
    "scalability and fast query performance."
)

Memory Architecture

Memory is what transforms a stateless LLM into a stateful agent that learns from experience and maintains context across long interactions.

Working Memory (Conversation Buffer)

The simplest memory form stores recent conversation history:

from collections import deque
from datetime import datetime

class ConversationBuffer:
    def __init__(self, max_messages=20, max_tokens=4000):
        self.messages = deque(maxlen=max_messages)
        self.max_tokens = max_tokens

    def add(self, role, content):
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now()
        })
        self._enforce_token_limit()

    def _enforce_token_limit(self):
        while self._estimate_tokens() > self.max_tokens and len(self.messages) > 2:
            self.messages.popleft()

    def _estimate_tokens(self):
        # Rough token estimation (4 chars ≈ 1 token)
        return sum(len(msg["content"]) for msg in self.messages) // 4

    def get_context(self, include_system=True):
        if include_system:
            return list(self.messages)
        return [msg for msg in self.messages if msg["role"] != "system"]

    def clear(self):
        self.messages.clear()

# Usage in an agent
class ConversationalAgent:
    def __init__(self, llm):
        self.llm = llm
        self.memory = ConversationBuffer()

    def chat(self, user_input):
        self.memory.add("user", user_input)

        # Build prompt with conversation history
        messages = self.memory.get_context()
        response = self.llm.predict_messages(messages)

        self.memory.add("assistant", response)
        return response

Entity Memory: Tracking Important Information

Entity memory extracts and maintains information about specific entities (people, places, concepts):

import json
from datetime import datetime
from typing import Dict, List

class EntityMemory:
    def __init__(self, llm):
        self.llm = llm
        self.entities = {}  # entity_name -> {attributes}

    def extract_entities(self, text):
        extract_prompt = f"""Extract all important entities from this text and their attributes:

Text: {text}

For each entity, provide:
- Name
- Type (person, place, organization, concept, etc.)
- Attributes (key facts about this entity)

Format as JSON:
{{
  "entities": [
    {{ "name": "...", "type": "...", "attributes": ["...", "..."] }},
    ...
  ]
}}
"""

        response = self.llm.predict(extract_prompt)
        try:
            entities_data = json.loads(response)  # safer than eval
            for entity in entities_data.get("entities", []):
                self._update_entity(entity)
        except Exception as e:
            print(f"Error parsing entities: {e}")

    def _update_entity(self, entity_data):
        name = entity_data["name"]
        if name not in self.entities:
            self.entities[name] = {
                "type": entity_data["type"],
                "attributes": set(),
                "mentioned_count": 0,
                "last_mentioned": None
            }

        self.entities[name]["attributes"].update(entity_data.get("attributes", []))
        self.entities[name]["mentioned_count"] += 1
        self.entities[name]["last_mentioned"] = datetime.now()

    def get_entity_info(self, entity_name):
        return self.entities.get(entity_name, None)

    def get_context_about(self, entity_names: List[str]):
        context = []
        for name in entity_names:
            if name in self.entities:
                entity = self.entities[name]
                attributes_str = ", ".join(entity["attributes"])
                context.append(
                    f"{name} ({entity['type']}): {attributes_str}"
                )
        return "\n".join(context)

# Example usage
entity_mem = EntityMemory(llm)

conversation = """
User: Tell me about John. He works at Google as a senior engineer.
Assistant: I'll remember that John is a senior engineer at Google.
User: John is working on their new AI project.
Assistant: Got it, John is involved in Google's AI project.
"""

entity_mem.extract_entities(conversation)

# Later retrieval
john_info = entity_mem.get_entity_info("John")
# Returns: {
#   "type": "person",
#   "attributes": {"senior engineer", "works at Google", "working on AI project"},
#   "mentioned_count": 2,
#   "last_mentioned": datetime(...)
# }

Semantic Memory: Vector-based Retrieval

For long-term knowledge retention, semantic memory uses embeddings to retrieve relevant past experiences:

from datetime import datetime

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document

class SemanticMemory:
    def __init__(self, embedding_model="text-embedding-ada-002"):
        self.embeddings = OpenAIEmbeddings(model=embedding_model)
        self.vectorstore = None
        self.documents = []

    def add_memory(self, content, metadata=None):
        doc = Document(
            page_content=content,
            metadata=metadata or {"timestamp": datetime.now().isoformat()}
        )
        self.documents.append(doc)

        # Build the vector store on first use, then add incrementally
        if self.vectorstore is None:
            self.vectorstore = FAISS.from_documents(self.documents, self.embeddings)
        else:
            self.vectorstore.add_documents([doc])

    def recall(self, query, k=5, filter_metadata=None):
        if self.vectorstore is None:
            return []

        # Semantic similarity search
        results = self.vectorstore.similarity_search_with_score(
            query, k=k
        )

        # Filter by metadata if provided
        if filter_metadata:
            results = [
                (doc, score) for doc, score in results
                if all(doc.metadata.get(k) == v for k, v in filter_metadata.items())
            ]

        return [(doc.page_content, doc.metadata, score) for doc, score in results]

    def get_relevant_context(self, query, max_tokens=1000):
        memories = self.recall(query, k=10)

        context_parts = []
        current_tokens = 0

        for content, metadata, score in memories:
            tokens = len(content) // 4
            if current_tokens + tokens > max_tokens:
                break
            context_parts.append(f"[Relevance: {score:.2f}] {content}")
            current_tokens += tokens

        return "\n\n".join(context_parts)

# Example: Agent with semantic memory
class MemoryEnhancedAgent:
    def __init__(self, llm):
        self.llm = llm
        self.semantic_memory = SemanticMemory()
        self.conversation_buffer = ConversationBuffer()

    def process(self, user_input):
        # Retrieve relevant past context
        relevant_memories = self.semantic_memory.get_relevant_context(user_input)

        # Build prompt with both recent conversation and relevant memories
        prompt = f"""Relevant past context:
{relevant_memories}

Recent conversation:
{self._format_recent_messages()}

User: {user_input}

Assistant (respond naturally):"""

        response = self.llm.predict(prompt)

        # Store this interaction in memory
        self.conversation_buffer.add("user", user_input)
        self.conversation_buffer.add("assistant", response)

        # Add to semantic memory for long-term retention
        self.semantic_memory.add_memory(
            f"User asked: {user_input}\nAssistant responded: {response}",
            metadata={
                "timestamp": datetime.now().isoformat(),
                "type": "conversation"
            }
        )

        return response

    def _format_recent_messages(self):
        messages = self.conversation_buffer.get_context()
        return "\n".join([
            f"{msg['role'].capitalize()}: {msg['content']}"
            for msg in messages[-5:]  # Last 5 messages
        ])

Episodic Memory: Learning from Experience

Episodic memory stores complete episodes (sequences of actions and their outcomes) to inform future behavior:

class Episode:
    def __init__(self, task, steps, outcome, success):
        self.task = task
        self.steps = steps
        self.outcome = outcome
        self.success = success
        self.timestamp = datetime.now()

    def to_dict(self):
        return {
            "task": self.task,
            "steps": self.steps,
            "outcome": self.outcome,
            "success": self.success,
            "timestamp": self.timestamp.isoformat()
        }

class EpisodicMemory:
    def __init__(self):
        self.episodes = []
        self.embeddings = OpenAIEmbeddings()

    def store_episode(self, task, steps, outcome, success):
        episode = Episode(task, steps, outcome, success)
        self.episodes.append(episode)

    def retrieve_similar_episodes(self, current_task, k=3):
        if not self.episodes:
            return []

        # Embed current task
        current_embedding = self.embeddings.embed_query(current_task)

        # Embed all past episodes
        episode_texts = [ep.task for ep in self.episodes]
        episode_embeddings = self.embeddings.embed_documents(episode_texts)

        # Calculate cosine similarities
        from numpy import dot
        from numpy.linalg import norm

        similarities = [
            dot(current_embedding, ep_emb) / (norm(current_embedding) * norm(ep_emb))
            for ep_emb in episode_embeddings
        ]

        # Get top k similar episodes
        sorted_indices = sorted(
            range(len(similarities)),
            key=lambda i: similarities[i],
            reverse=True
        )[:k]

        return [(self.episodes[i], similarities[i]) for i in sorted_indices]

    def learn_from_episodes(self, llm, current_task):
        similar_episodes = self.retrieve_similar_episodes(current_task, k=5)

        if not similar_episodes:
            return "No relevant past experience."

        episodes_text = "\n\n".join([
            f"Episode {i+1} (similarity: {sim:.2f}):\n"
            f"Task: {ep.task}\n"
            f"Steps taken: {ep.steps}\n"
            f"Outcome: {ep.outcome}\n"
            f"Success: {ep.success}"
            for i, (ep, sim) in enumerate(similar_episodes)
        ])

        learning_prompt = f"""Based on these past experiences with similar tasks:

{episodes_text}

Current task: {current_task}

What lessons can we learn? Provide:
1. What approaches worked well
2. What approaches failed
3. Recommended strategy for the current task
"""

        return llm.predict(learning_prompt)

# Example: Agent that learns from experience
class LearningAgent:
    def __init__(self, llm, tools):
        self.llm = llm
        self.tools = tools
        self.episodic_memory = EpisodicMemory()

    def execute_task(self, task):
        # Learn from similar past tasks
        lessons = self.episodic_memory.learn_from_episodes(self.llm, task)

        # Execute with learned knowledge
        steps = []
        prompt = f"""Task: {task}

Lessons from similar past tasks:
{lessons}

Create a step-by-step plan:"""

        plan = self.llm.predict(prompt)

        # Execute plan (simplified)
        try:
            # ... execution logic ...
            outcome = "Successfully completed task"
            success = True
        except Exception as e:
            outcome = f"Failed: {str(e)}"
            success = False

        # Store this episode
        self.episodic_memory.store_episode(task, steps, outcome, success)

        return outcome
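The cosine-similarity ranking inside `retrieve_similar_episodes` can be exercised in isolation, with no embedding API calls. The sketch below substitutes hypothetical pre-computed 2-d vectors for real embeddings; the helper name `rank_by_similarity` is illustrative, not part of the class above:

```python
import numpy as np

def rank_by_similarity(query_vec, candidate_vecs, k=3):
    """Return (index, score) pairs for the k most cosine-similar candidates."""
    q = np.asarray(query_vec, dtype=float)
    sims = []
    for vec in candidate_vecs:
        v = np.asarray(vec, dtype=float)
        sims.append(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    return [(i, sims[i]) for i in order]

# Toy 2-d "embeddings": candidate 0 points the same direction as the query
query = [1.0, 0.0]
candidates = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(rank_by_similarity(query, candidates, k=2))
```

Cosine similarity ignores vector magnitude, which is why candidate 0 scores a perfect 1.0 despite being twice the query's length.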

Tool Use and Function Calling

Tools extend agents beyond text generation into concrete actions. Modern agents use function calling to interact with APIs, databases, calculators, search engines, and more.

Function Calling with OpenAI API

OpenAI's function calling allows structured tool invocation:

import openai
import json

class FunctionCallingAgent:
    def __init__(self, api_key, model="gpt-4-turbo"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model
        self.available_functions = {}

    def register_tool(self, func, schema):
        """Register a Python function as a tool"""
        self.available_functions[func.__name__] = {
            "function": func,
            "schema": schema
        }

    def run(self, user_message, max_iterations=5):
        messages = [{"role": "user", "content": user_message}]

        for iteration in range(max_iterations):
            # Get function schemas
            functions = [
                tool["schema"]
                for tool in self.available_functions.values()
            ]

            # Call LLM with function definitions
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                functions=functions,
                function_call="auto"
            )

            response_message = response.choices[0].message

            # Check if LLM wants to call a function
            if response_message.function_call:
                # Extract function name and arguments
                function_name = response_message.function_call.name
                function_args = json.loads(response_message.function_call.arguments)

                # Execute the function
                function_to_call = self.available_functions[function_name]["function"]
                function_result = function_to_call(**function_args)

                # Add function call and result to conversation
                messages.append(response_message)
                messages.append({
                    "role": "function",
                    "name": function_name,
                    "content": json.dumps(function_result)
                })
            else:
                # LLM provided final answer
                return response_message.content

        return "Max iterations reached"

# Example tools
def get_weather(location: str, unit: str = "celsius"):
    """Get current weather for a location"""
    # Simulated weather data
    weather_data = {
        "Tokyo": {"temp": 22, "condition": "Sunny"},
        "London": {"temp": 15, "condition": "Cloudy"},
        "New York": {"temp": 18, "condition": "Rainy"}
    }

    data = weather_data.get(location, {"temp": 20, "condition": "Unknown"})
    return {
        "location": location,
        "temperature": data["temp"],
        "unit": unit,
        "condition": data["condition"]
    }

def calculate_trip_cost(distance_km: float, fuel_price_per_liter: float, fuel_efficiency_km_per_liter: float):
    """Calculate fuel cost for a trip"""
    liters_needed = distance_km / fuel_efficiency_km_per_liter
    total_cost = liters_needed * fuel_price_per_liter
    return {
        "distance": distance_km,
        "liters_needed": round(liters_needed, 2),
        "total_cost": round(total_cost, 2)
    }

# Register tools with schemas
agent = FunctionCallingAgent(api_key="your-api-key")

agent.register_tool(
    get_weather,
    schema={
        "name": "get_weather",
        "description": "Get the current weather for a specific location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city name, e.g., Tokyo, London"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
)

agent.register_tool(
    calculate_trip_cost,
    schema={
        "name": "calculate_trip_cost",
        "description": "Calculate the fuel cost for a road trip",
        "parameters": {
            "type": "object",
            "properties": {
                "distance_km": {
                    "type": "number",
                    "description": "Distance in kilometers"
                },
                "fuel_price_per_liter": {
                    "type": "number",
                    "description": "Price per liter of fuel"
                },
                "fuel_efficiency_km_per_liter": {
                    "type": "number",
                    "description": "Vehicle's fuel efficiency in km per liter"
                }
            },
            "required": ["distance_km", "fuel_price_per_liter", "fuel_efficiency_km_per_liter"]
        }
    }
)

# Use the agent
result = agent.run(
    "I'm planning a 500km trip from Tokyo. The weather looks good. "
    "If fuel costs 150 yen per liter and my car does 15 km/liter, "
    "how much will the fuel cost?"
)

# Agent will:
# 1. Call get_weather(location="Tokyo")
# 2. Call calculate_trip_cost(distance_km=500, fuel_price_per_liter=150, fuel_efficiency_km_per_liter=15)
# 3. Synthesize: "The weather in Tokyo is sunny at 22°C. For your 500km trip, you'll need approximately 33.33 liters of fuel, costing about 5000 yen."
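Note that newer OpenAI SDK releases deprecate the `functions`/`function_call` parameters in favor of `tools`/`tool_choice`, where each schema is wrapped in a `{"type": "function", ...}` envelope and a single reply may carry several `tool_calls`. A hedged sketch of the weather schema in the newer shape (the request itself is left as comments, since it requires a live API key):

```python
# Newer tool-calling shape: each function schema wrapped in {"type": "function", ...}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a specific location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

# response = client.chat.completions.create(model=..., messages=messages,
#                                           tools=tools, tool_choice="auto")
# for tool_call in response.choices[0].message.tool_calls or []:
#     name = tool_call.function.name
#     args = json.loads(tool_call.function.arguments)
#     # ...then append a {"role": "tool", "tool_call_id": tool_call.id, ...} message
print(tools[0]["function"]["name"])
```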

Building Custom Tool Executors

For more control, implement custom tool execution logic:

from datetime import datetime
from typing import Any, Callable, Dict, List
from enum import Enum

class ToolType(Enum):
    API_CALL = "api_call"
    COMPUTATION = "computation"
    DATABASE = "database"
    FILE_SYSTEM = "file_system"

class ToolExecutor:
    def __init__(self):
        self.tools = {}
        self.execution_history = []

    def register(self, name: str, func: Callable, tool_type: ToolType,
                 description: str, parameters: Dict):
        self.tools[name] = {
            "function": func,
            "type": tool_type,
            "description": description,
            "parameters": parameters,
            "call_count": 0
        }

    def execute(self, tool_name: str, **kwargs) -> Dict[str, Any]:
        if tool_name not in self.tools:
            return {
                "success": False,
                "error": f"Tool '{tool_name}' not found"
            }

        tool = self.tools[tool_name]

        # Validate parameters
        validation_result = self._validate_parameters(tool["parameters"], kwargs)
        if not validation_result["valid"]:
            return {
                "success": False,
                "error": f"Invalid parameters: {validation_result['errors']}"
            }

        # Execute with error handling and logging
        try:
            start_time = datetime.now()
            result = tool["function"](**kwargs)
            execution_time = (datetime.now() - start_time).total_seconds()

            # Log execution
            self.execution_history.append({
                "tool": tool_name,
                "parameters": kwargs,
                "result": result,
                "execution_time": execution_time,
                "timestamp": start_time
            })

            tool["call_count"] += 1

            return {
                "success": True,
                "result": result,
                "execution_time": execution_time
            }

        except Exception as e:
            # Log failures too, so success_rate in get_execution_stats is meaningful
            self.execution_history.append({
                "tool": tool_name,
                "parameters": kwargs,
                "error": str(e),
                "execution_time": 0.0,
                "timestamp": datetime.now()
            })
            return {
                "success": False,
                "error": str(e),
                "tool": tool_name,
                "parameters": kwargs
            }

    def _validate_parameters(self, schema: Dict, provided: Dict) -> Dict:
        errors = []

        # Check required parameters
        for param_name, param_info in schema.items():
            if param_info.get("required", False) and param_name not in provided:
                errors.append(f"Missing required parameter: {param_name}")

            if param_name in provided:
                # Type validation
                expected_type = param_info.get("type")
                actual_value = provided[param_name]

                if expected_type == "string" and not isinstance(actual_value, str):
                    errors.append(f"{param_name} must be a string")
                elif expected_type == "number" and not isinstance(actual_value, (int, float)):
                    errors.append(f"{param_name} must be a number")
                elif expected_type == "boolean" and not isinstance(actual_value, bool):
                    errors.append(f"{param_name} must be a boolean")

        return {
            "valid": len(errors) == 0,
            "errors": errors
        }

    def get_tool_descriptions(self) -> List[Dict]:
        return [
            {
                "name": name,
                "description": tool["description"],
                "type": tool["type"].value,
                "parameters": tool["parameters"],
                "usage_count": tool["call_count"]
            }
            for name, tool in self.tools.items()
        ]

    def get_execution_stats(self) -> Dict:
        total_calls = len(self.execution_history)
        successful_calls = sum(1 for h in self.execution_history if "error" not in h)
        avg_execution_time = sum(h["execution_time"] for h in self.execution_history) / max(total_calls, 1)

        return {
            "total_calls": total_calls,
            "successful_calls": successful_calls,
            "success_rate": successful_calls / max(total_calls, 1),
            "average_execution_time": avg_execution_time,
            "most_used_tools": self._get_most_used_tools()
        }

    def _get_most_used_tools(self) -> List[tuple]:
        tool_counts = [(name, tool["call_count"]) for name, tool in self.tools.items()]
        return sorted(tool_counts, key=lambda x: x[1], reverse=True)[:5]

# Example: Advanced tool ecosystem
executor = ToolExecutor()

# Register computational tools
def prime_factors(n: int) -> List[int]:
    """Find prime factors of a number"""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

executor.register(
    name="prime_factorization",
    func=prime_factors,
    tool_type=ToolType.COMPUTATION,
    description="Finds prime factors of a given number",
    parameters={
        "n": {"type": "number", "required": True, "description": "Number to factorize"}
    }
)

# Register API tools
def fetch_github_repo(username: str, repo_name: str) -> Dict:
    """Fetch GitHub repository information"""
    # Simulated API call
    return {
        "full_name": f"{username}/{repo_name}",
        "stars": 1234,
        "forks": 567,
        "language": "Python",
        "description": "A sample repository"
    }

executor.register(
    name="github_repo_info",
    func=fetch_github_repo,
    tool_type=ToolType.API_CALL,
    description="Fetches information about a GitHub repository",
    parameters={
        "username": {"type": "string", "required": True},
        "repo_name": {"type": "string", "required": True}
    }
)

# Use the executor
result1 = executor.execute("prime_factorization", n=84)
# Returns: {"success": True, "result": [2, 2, 3, 7], "execution_time": 0.001}

result2 = executor.execute("github_repo_info", username="pytorch", repo_name="pytorch")
# Returns repository information

stats = executor.get_execution_stats()
# Returns usage statistics

Reflection and Self-Improvement

Reflection enables agents to critique their own outputs and iteratively improve. This meta-cognitive capability is crucial for handling ambiguous or complex tasks.

ReflexionAgent: Learning from Mistakes

class ReflexionAgent:
    def __init__(self, llm, tools, max_trials=3):
        self.llm = llm
        self.tools = tools
        self.max_trials = max_trials
        self.memory = []

    def solve_with_feedback(self, task, success_criteria):
        """Solve task with self-reflection and retry logic"""

        for trial in range(self.max_trials):
            # Generate solution attempt
            attempt = self._generate_attempt(task, trial)

            # Evaluate attempt
            evaluation = self._evaluate_attempt(attempt, success_criteria)

            # Store in memory
            self.memory.append({
                "trial": trial + 1,
                "attempt": attempt,
                "evaluation": evaluation
            })

            if evaluation["success"]:
                return {
                    "solution": attempt,
                    "trials_needed": trial + 1,
                    "reflection_history": self.memory
                }

            # Generate reflection on failure
            reflection = self._reflect_on_failure(task, attempt, evaluation)
            self.memory.append({
                "trial": trial + 1,
                "reflection": reflection
            })

        return {
            "solution": None,
            "trials_needed": self.max_trials,
            "error": "Failed to solve after max trials",
            "reflection_history": self.memory
        }

    def _generate_attempt(self, task, trial_number):
        if trial_number == 0:
            prompt = f"Solve this task: {task}"
        else:
            # Include reflections from previous trials
            previous_attempts = "\n\n".join([
                f"Trial {m['trial']}:\n"
                f"Attempt: {m.get('attempt', 'N/A')}\n"
                f"Evaluation: {m.get('evaluation', {}).get('feedback', 'N/A')}\n"
                f"Reflection: {m.get('reflection', 'N/A')}"
                for m in self.memory
                if 'reflection' in m
            ])

            prompt = f"""Previous attempts and reflections:
{previous_attempts}

Task: {task}

Based on the reflections above, provide an improved solution:"""

        return self.llm.predict(prompt)

    def _evaluate_attempt(self, attempt, success_criteria):
        eval_prompt = f"""Evaluate this solution against the criteria:

Solution: {attempt}

Success Criteria: {success_criteria}

Provide:
1. Does it meet the criteria? (Yes/No)
2. What's correct about the solution?
3. What's incorrect or missing?
4. Specific feedback for improvement

Format:
Success: [Yes/No]
Correct aspects: [...]
Issues: [...]
Feedback: [...]
"""

        evaluation_text = self.llm.predict(eval_prompt)

        # Parse the verdict; default to failure if the "Success:" marker is missing
        lowered = evaluation_text.lower()
        if "success:" in lowered:
            success = "yes" in lowered.split("success:")[1].split("\n")[0]
        else:
            success = False

        return {
            "success": success,
            "feedback": evaluation_text
        }

    def _reflect_on_failure(self, task, failed_attempt, evaluation):
        reflection_prompt = f"""Reflect on why this solution failed:

Task: {task}
Attempted Solution: {failed_attempt}
Evaluation: {evaluation['feedback']}

Provide a reflection that includes:
1. Root cause analysis: Why did this approach fail?
2. Key insights: What did we learn?
3. Strategy adjustment: What should we try differently?

Keep the reflection concise and actionable.
"""

        return self.llm.predict(reflection_prompt)

# Example usage
reflexion_agent = ReflexionAgent(llm, tools=[])

result = reflexion_agent.solve_with_feedback(
    task="Write a Python function to merge two sorted linked lists",
    success_criteria="Function must handle edge cases (empty lists, lists of different lengths), return a new merged sorted list, and have O(n+m) time complexity"
)

# The agent will:
# Trial 1: Generate initial solution
# If fails: Reflect on why (e.g., "Didn't handle empty list case")
# Trial 2: Generate improved solution incorporating reflection
# If fails: Reflect again (e.g., "Comparison logic was incorrect")
# Trial 3: Final attempt with all accumulated insights

Building agents from scratch provides maximum control but requires significant engineering effort. Several frameworks have emerged to streamline agent development.

LangChain: Modular Agent Framework

LangChain is the most widely adopted agent framework, offering composable components for building LLM applications.

Basic LangChain Agent

from langchain.agents import initialize_agent, AgentType
from langchain.agents import Tool
from langchain.llms import OpenAI
from langchain.utilities import SerpAPIWrapper, PythonREPL

# Initialize LLM
llm = OpenAI(temperature=0, model="gpt-4")

# Define tools
search = SerpAPIWrapper()
python_repl = PythonREPL()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Useful for answering questions about current events or finding recent information. Input should be a search query."
    ),
    Tool(
        name="Python_REPL",
        func=python_repl.run,
        description="Useful for executing Python code. Input should be valid Python code. Use this for calculations, data processing, or any computational task."
    )
]

# Initialize agent
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=5,
    early_stopping_method="generate"
)

# Run agent
result = agent.run(
    "What's the population of the capital of Japan? "
    "Then calculate what 15% of that population would be."
)

# Agent execution trace:
# Thought: I need to find the capital of Japan and its population
# Action: Search
# Action Input: "capital of Japan population"
# Observation: Tokyo is the capital with ~14 million people

# Thought: Now I need to calculate 15% of 14 million
# Action: Python_REPL
# Action Input: 14_000_000 * 0.15
# Observation: 2100000.0

# Thought: I now know the final answer
# Final Answer: The population of Tokyo (Japan's capital) is approximately 14 million people. 15% of that would be 2.1 million people.

Custom LangChain Agent with Memory

from langchain.agents import AgentExecutor, Tool, create_structured_chat_agent
from langchain.memory import ConversationBufferMemory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

class CustomLangChainAgent:
    def __init__(self, tools, model="gpt-4-turbo"):
        self.llm = ChatOpenAI(model=model, temperature=0)
        self.tools = tools

        # Initialize memory
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )

        # Create custom prompt
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful AI assistant with access to tools.

Available tools:
{tools}

Tool names: {tool_names}

When using tools, follow this format:
Thought: [reasoning about what to do]
Action: [tool name]
Action Input: [input for the tool]

When you have the final answer:
Thought: I now know the final answer
Final Answer: [your response to the user]

Begin!"""),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
            MessagesPlaceholder(variable_name="agent_scratchpad")
        ])

        # Create agent
        agent = create_structured_chat_agent(
            llm=self.llm,
            tools=self.tools,
            prompt=self.prompt
        )

        # Create agent executor
        self.agent_executor = AgentExecutor(
            agent=agent,
            tools=self.tools,
            memory=self.memory,
            verbose=True,
            max_iterations=10,
            handle_parsing_errors=True
        )

    def run(self, query):
        return self.agent_executor.invoke({"input": query})

    def clear_memory(self):
        self.memory.clear()

# Example: Building a research agent
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

def summarize_text(text: str, max_length: int = 100) -> str:
    """Summarize text to specified length"""
    words = text.split()
    if len(words) <= max_length:
        return text
    return " ".join(words[:max_length]) + "..."

tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Search Wikipedia for information about people, places, concepts, historical events, etc."
    ),
    Tool(
        name="Summarize",
        func=summarize_text,
        description="Summarizes long text into a shorter version. Input: text to summarize."
    )
]

research_agent = CustomLangChainAgent(tools)

# Multi-turn conversation with memory
response1 = research_agent.run("Who invented the telephone?")
response2 = research_agent.run("What year did that happen?")  # Uses memory of previous context
response3 = research_agent.run("Summarize his biography in 50 words")

AutoGPT: Autonomous Task Execution

AutoGPT represents a different paradigm: fully autonomous agents that pursue goals with minimal human intervention.

import json
from typing import List, Dict

class AutoGPTAgent:
    def __init__(self, llm, tools, workspace_dir="./agent_workspace"):
        self.llm = llm
        self.tools = {tool.name: tool for tool in tools}
        self.workspace_dir = workspace_dir
        self.memory = []
        self.goals = []

    def set_goals(self, goals: List[str]):
        """Set high-level goals for the agent"""
        self.goals = goals

    def run_autonomous(self, max_iterations=20):
        """Run autonomously until goals are achieved or max iterations reached"""

        for iteration in range(max_iterations):
            # Assess current state
            status = self._assess_progress()

            if status["goals_achieved"]:
                return {
                    "success": True,
                    "iterations": iteration + 1,
                    "final_state": status
                }

            # Generate next action plan
            action_plan = self._generate_action_plan(status)

            # Execute actions
            for action in action_plan["actions"]:
                result = self._execute_action(action)
                self.memory.append({
                    "iteration": iteration + 1,
                    "action": action,
                    "result": result
                })

            # Self-reflection
            reflection = self._reflect_on_progress()
            self.memory.append({
                "iteration": iteration + 1,
                "type": "reflection",
                "content": reflection
            })

        return {
            "success": False,
            "iterations": max_iterations,
            "message": "Max iterations reached without achieving all goals"
        }

    def _assess_progress(self) -> Dict:
        """Assess progress towards goals"""
        assessment_prompt = f"""Current goals:
{json.dumps(self.goals, indent=2)}

Memory of recent actions:
{json.dumps(self.memory[-5:], indent=2)}

Assess:
1. Which goals have been achieved?
2. Which goals are in progress?
3. Which goals haven't been started?
4. What obstacles have been encountered?

Provide assessment as JSON:
{{
    "goals_achieved": boolean,
    "completed_goals": [goal indices],
    "in_progress": [goal indices],
    "not_started": [goal indices],
    "obstacles": [list of obstacles]
}}
"""

        response = self.llm.predict(assessment_prompt)
        try:
            return json.loads(response)
        except json.JSONDecodeError:
            return {"goals_achieved": False, "completed_goals": []}

    def _generate_action_plan(self, current_status: Dict) -> Dict:
        """Generate next actions based on current status"""
        planning_prompt = f"""You are an autonomous agent working towards these goals:
{json.dumps(self.goals, indent=2)}

Current status:
{json.dumps(current_status, indent=2)}

Available tools: {list(self.tools.keys())}

Generate a plan for the next 1-3 actions that will make progress towards the goals.

Format as JSON:
{{
    "reasoning": "Why these actions?",
    "actions": [
        {{
            "tool": "tool_name",
            "input": {{ }},
            "expected_outcome": "what should happen"
        }}
    ]
}}
"""

        response = self.llm.predict(planning_prompt)
        try:
            return json.loads(response)
        except json.JSONDecodeError:
            return {"reasoning": "Parse error", "actions": []}

    def _execute_action(self, action: Dict) -> Dict:
        """Execute a single action"""
        tool_name = action.get("tool")
        if tool_name not in self.tools:
            return {"success": False, "error": f"Tool {tool_name} not found"}

        tool = self.tools[tool_name]
        tool_input = action.get("input", {})

        return tool.execute(**tool_input)

    def _reflect_on_progress(self) -> str:
        """Reflect on recent progress"""
        reflection_prompt = f"""Recent actions and results:
{json.dumps(self.memory[-3:], indent=2)}

Goals:
{json.dumps(self.goals, indent=2)}

Reflect:
1. Are we making good progress?
2. Should we change our strategy?
3. What should be prioritized next?

Provide a brief reflection (2-3 sentences):
"""

        return self.llm.predict(reflection_prompt)

# Example: Autonomous research and summary agent
auto_agent = AutoGPTAgent(llm, tools=[
    wikipedia_tool,
    file_writer_tool,
    summarizer_tool
])

auto_agent.set_goals([
    "Research the history of artificial intelligence",
    "Create a timeline of major AI breakthroughs",
    "Write a summary report and save it to a file"
])

result = auto_agent.run_autonomous(max_iterations=15)
# Agent will autonomously:
# 1. Search Wikipedia for AI history
# 2. Extract key dates and events
# 3. Organize into timeline format
# 4. Generate summary report
# 5. Save to file
# All without human intervention between steps
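Both `_assess_progress` and `_generate_action_plan` above fall back silently when the LLM's reply is not clean JSON. In practice, models often wrap JSON in code fences or surrounding prose, so a more forgiving parser recovers many of those replies. A hedged sketch (the helper name `safe_json_loads` is illustrative, not part of any library):

```python
import json
import re

def safe_json_loads(text, default=None):
    """Parse JSON from an LLM reply, tolerating ```json fences and stray prose."""
    # Prefer the contents of a fenced block, if one is present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Otherwise fall back to the first {...} span in the text
    if not candidate.strip().startswith("{"):
        brace = re.search(r"\{.*\}", candidate, re.DOTALL)
        candidate = brace.group(0) if brace else candidate
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return default

print(safe_json_loads('Here you go:\n```json\n{"goals_achieved": false}\n```'))
```

Swapping this in for the bare `json.loads` calls keeps the fallback behavior while rescuing fenced or prose-wrapped replies.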

BabyAGI: Task-Driven Autonomous Agent

BabyAGI implements a task management system where the agent generates, prioritizes, and executes tasks.

from collections import deque
import re

class BabyAGIAgent:
def __init__(self, llm, tools, objective):
self.llm = llm
self.tools = {tool.name: tool for tool in tools}
self.objective = objective
self.task_queue = deque()
self.completed_tasks = []
self.task_id_counter = 1

def run(self, max_iterations=10):
# Create initial task
self.task_queue.append({
"id": self.task_id_counter,
"task": f"Develop a plan to achieve: {self.objective}"
})
self.task_id_counter += 1

iteration = 0
while self.task_queue and iteration < max_iterations:
# Get next task
current_task = self.task_queue.popleft()

print(f"\n{'='*50}")
print(f"Executing Task {current_task['id']}: {current_task['task']}")
print(f"{'='*50}")

# Execute task
result = self._execute_task(current_task)

# Store result
self.completed_tasks.append({
"id": current_task["id"],
"task": current_task["task"],
"result": result
})

# Generate new tasks based on result
new_tasks = self._generate_new_tasks(current_task, result)

# Prioritize and add new tasks
prioritized_tasks = self._prioritize_tasks(new_tasks)
for task in prioritized_tasks:
task["id"] = self.task_id_counter
self.task_id_counter += 1
self.task_queue.append(task)

iteration += 1

return {
"objective": self.objective,
"completed_tasks": self.completed_tasks,
"remaining_tasks": list(self.task_queue)
}

def _execute_task(self, task: Dict) -> str:
"""Execute a single task"""
# Build context from completed tasks
context = "\n".join([
f"Task {t['id']}: {t['task']}\nResult: {t['result']}"
for t in self.completed_tasks[-5:] # Last 5 tasks
])

execution_prompt = f"""You are an AI agent working towards this objective:
{self.objective}

Previously completed tasks:
{context}

Current task: {task['task']}

Available tools: {list(self.tools.keys())}

Execute this task and provide the result. If you need to use a tool, specify:
Tool: [tool_name]
Input: [tool_input]

Otherwise, provide your response directly.
"""

response = self.llm.predict(execution_prompt)

# Check if tool use is requested
if "Tool:" in response:
tool_name = re.search(r"Tool: (\w+)", response).group(1)
if tool_name in self.tools:
# Extract tool input (simplified parsing)
input_match = re.search(r"Input: (.+?)(?:\n|$)", response, re.DOTALL)
tool_input = input_match.group(1).strip() if input_match else ""

tool_result = self.tools[tool_name].execute(query=tool_input)
return f"Tool Result: {tool_result}"

return response

def _generate_new_tasks(self, completed_task: Dict, result: str) -> List[Dict]:
"""Generate new tasks based on completed task result"""
generation_prompt = f"""Objective: {self.objective}

Last completed task:
Task: {completed_task['task']}
Result: {result}

Based on this result, generate a list of new tasks needed to continue progress towards the objective.

Requirements:
- Each task should be specific and actionable
- Tasks should build on the completed work
- Avoid redundancy with these completed tasks: {[t['task'] for t in self.completed_tasks]}

Provide tasks in this format:
1. [First new task]
2. [Second new task]
...
"""

response = self.llm.predict(generation_prompt)

# Parse numbered tasks
tasks = []
for line in response.split('\n'):
match = re.match(r'^\d+\.\s*(.+)$', line.strip())
if match:
tasks.append({"task": match.group(1)})

return tasks

    def _prioritize_tasks(self, tasks: List[Dict]) -> List[Dict]:
        """Prioritize tasks based on objective"""
        if not tasks:
            return []

        tasks_str = "\n".join([f"{i+1}. {t['task']}" for i, t in enumerate(tasks)])

        prioritization_prompt = f"""Objective: {self.objective}

Tasks to prioritize:
{tasks_str}

Reorder these tasks by priority (most important first) considering:
- Which tasks must be completed before others
- Which tasks contribute most directly to the objective
- Task dependencies

Provide the reordered list with numbers:
1. [Most important task]
2. [Second most important]
...
"""

        response = self.llm.predict(prioritization_prompt)

        # Parse prioritized tasks
        prioritized = []
        for line in response.split('\n'):
            match = re.match(r'^\d+\.\s*(.+)$', line.strip())
            if match:
                task_text = match.group(1)
                # Find matching task from original list
                for task in tasks:
                    if task_text.lower() in task['task'].lower() or task['task'].lower() in task_text.lower():
                        prioritized.append(task)
                        break

        # Add any tasks that weren't matched
        for task in tasks:
            if task not in prioritized:
                prioritized.append(task)

        return prioritized

# Example: Using BabyAGI for complex objective
babyagi = BabyAGIAgent(
    llm=llm,
    tools=[search_tool, calculator_tool, file_writer_tool],
    objective="Research quantum computing and create a beginner-friendly explanation with examples"
)

result = babyagi.run(max_iterations=10)

# BabyAGI execution flow:
# Task 1: Develop plan for quantum computing research
#   → Generates: [Research QC basics, Find examples, Create outline]
# Task 2: Research QC basics (prioritized first)
#   → Generates: [Research qubits, Research superposition, Research entanglement]
# Task 3: Research qubits
#   → Result: Explanation of qubits
# Task 4: Research superposition
#   → Result: Explanation of superposition
# ... continues until objective is achieved

Multi-Agent Systems

Single agents have limitations in handling complex, multi-faceted problems. Multi-agent systems distribute work across specialized agents that collaborate towards common goals.

Agent Collaboration Patterns

Hierarchical Organization

class ManagerAgent:
    """Manages and delegates to worker agents"""

    def __init__(self, llm, worker_agents: Dict[str, 'WorkerAgent']):
        self.llm = llm
        self.workers = worker_agents

    def execute_complex_task(self, task: str) -> Dict:
        # Decompose task into subtasks
        subtasks = self._decompose_task(task)

        # Assign subtasks to appropriate workers
        assignments = self._assign_tasks(subtasks)

        # Collect results from workers
        results = {}
        for worker_name, assigned_tasks in assignments.items():
            worker = self.workers[worker_name]
            results[worker_name] = []

            for subtask in assigned_tasks:
                result = worker.execute(subtask)
                results[worker_name].append(result)

        # Synthesize final result
        final_result = self._synthesize_results(task, results)

        return final_result

    def _decompose_task(self, task: str) -> List[Dict]:
        prompt = f"""Decompose this complex task into subtasks:
{task}

Available worker agents and their capabilities:
{self._get_worker_capabilities()}

Create a list of subtasks, each specifying:
- Description of subtask
- Which worker agent should handle it
- Any dependencies on other subtasks

Format as JSON array.
"""

        response = self.llm.predict(prompt)
        try:
            return json.loads(response)
        except json.JSONDecodeError:
            return []

    def _get_worker_capabilities(self) -> str:
        return "\n".join([
            f"- {name}: {worker.capabilities}"
            for name, worker in self.workers.items()
        ])

    def _assign_tasks(self, subtasks: List[Dict]) -> Dict[str, List]:
        assignments = {name: [] for name in self.workers.keys()}

        for subtask in subtasks:
            agent_name = subtask.get("assigned_to")
            if agent_name in assignments:
                assignments[agent_name].append(subtask["description"])

        return assignments

    def _synthesize_results(self, original_task: str, results: Dict) -> Dict:
        results_summary = json.dumps(results, indent=2)

        synthesis_prompt = f"""Original task: {original_task}

Results from worker agents:
{results_summary}

Synthesize these results into a cohesive final answer that addresses the original task.
"""

        final_answer = self.llm.predict(synthesis_prompt)

        return {
            "task": original_task,
            "worker_results": results,
            "final_answer": final_answer
        }

class WorkerAgent:
    """Specialized agent for specific tasks"""

    def __init__(self, name: str, capabilities: str, llm, tools):
        self.name = name
        self.capabilities = capabilities
        self.llm = llm
        self.tools = tools

    def execute(self, task: str) -> Dict:
        prompt = f"""You are {self.name}, specialized in: {self.capabilities}

Task: {task}

Execute this task using your capabilities and available tools: {[t.name for t in self.tools]}
"""

        result = self.llm.predict(prompt)

        return {
            "agent": self.name,
            "task": task,
            "result": result
        }

# Example: Building a research team
research_agent = WorkerAgent(
    name="ResearchAgent",
    capabilities="Searching for information, fact-checking, gathering data from multiple sources",
    llm=llm,
    tools=[search_tool, wikipedia_tool]
)

analysis_agent = WorkerAgent(
    name="AnalysisAgent",
    capabilities="Data analysis, statistical computation, pattern recognition",
    llm=llm,
    tools=[calculator_tool, data_analysis_tool]
)

writing_agent = WorkerAgent(
    name="WritingAgent",
    capabilities="Creating well-structured documents, reports, and summaries",
    llm=llm,
    tools=[file_writer_tool, summarizer_tool]
)

manager = ManagerAgent(
    llm=llm,
    worker_agents={
        "researcher": research_agent,
        "analyst": analysis_agent,
        "writer": writing_agent
    }
)

# Execute complex task
result = manager.execute_complex_task(
    "Create a comprehensive report on global electric vehicle adoption rates, "
    "including market analysis, growth trends, and future projections."
)

# Manager will:
# 1. Assign research task to ResearchAgent
# 2. Assign analysis task to AnalysisAgent
# 3. Assign writing task to WritingAgent
# 4. Synthesize all results into final report

Debate and Consensus

Multiple agents debate different perspectives to reach better solutions:

class DebateSystem:
    def __init__(self, llm, num_agents=3, rounds=2):
        self.llm = llm
        self.num_agents = num_agents
        self.rounds = rounds

    def solve_by_debate(self, problem: str) -> Dict:
        # Initialize agents with different perspectives
        agents = [
            {"id": i, "stance": None, "arguments": []}
            for i in range(self.num_agents)
        ]

        debate_history = []

        # Initial round: each agent proposes solution
        for agent in agents:
            solution = self._generate_solution(problem, agent, [])
            agent["stance"] = solution
            agent["arguments"].append(solution)
            debate_history.append({
                "round": 0,
                "agent": agent["id"],
                "content": solution
            })

        # Debate rounds
        for round_num in range(1, self.rounds + 1):
            for agent in agents:
                # Agent reads other agents' arguments
                other_arguments = [
                    a["arguments"][-1]
                    for a in agents
                    if a["id"] != agent["id"]
                ]

                # Generate response
                response = self._generate_response(
                    problem, agent, other_arguments, round_num
                )

                agent["arguments"].append(response)
                debate_history.append({
                    "round": round_num,
                    "agent": agent["id"],
                    "content": response
                })

        # Final consensus
        consensus = self._reach_consensus(problem, agents)

        return {
            "problem": problem,
            "debate_history": debate_history,
            "final_consensus": consensus,
            "agent_final_stances": [a["arguments"][-1] for a in agents]
        }

    def _generate_solution(self, problem: str, agent: Dict, context: List) -> str:
        prompt = f"""You are Agent {agent['id']} in a debate to solve this problem:
{problem}

Propose your initial solution. Be specific and provide reasoning.
"""

        return self.llm.predict(prompt)

    def _generate_response(self, problem: str, agent: Dict,
                           other_arguments: List[str], round_num: int) -> str:
        others_text = "\n\n".join([
            f"Other Agent's Argument:\n{arg}"
            for arg in other_arguments
        ])

        prompt = f"""You are Agent {agent['id']} in round {round_num} of a debate.

Problem: {problem}

Your previous stance: {agent['arguments'][-1]}

Other agents' arguments:
{others_text}

Respond by:
1. Addressing criticisms of your approach
2. Pointing out flaws in other approaches
3. Refining your solution based on the discussion
4. Or, if convinced, adopting a better solution

Provide your updated stance:
"""

        return self.llm.predict(prompt)

    def _reach_consensus(self, problem: str, agents: List[Dict]) -> str:
        all_arguments = "\n\n".join([
            f"Agent {agent['id']} Final Stance:\n{agent['arguments'][-1]}"
            for agent in agents
        ])

        consensus_prompt = f"""After debate, these agents proposed solutions to: {problem}

{all_arguments}

Synthesize the best elements from all arguments into a final consensus solution.
Explain which ideas from each agent were incorporated and why.
"""

        return self.llm.predict(consensus_prompt)

# Example: Using debate for problem solving
debate_system = DebateSystem(llm, num_agents=3, rounds=2)

result = debate_system.solve_by_debate(
    "Design a recommendation system for an e-commerce platform that balances "
    "accuracy, diversity, and business goals (conversion rate)"
)

# Output includes:
# - Round 0: Agent 0 proposes collaborative filtering
#            Agent 1 proposes content-based filtering
#            Agent 2 proposes hybrid approach
# - Round 1: Agents critique each other and refine
# - Round 2: Further refinement based on critiques
# - Final consensus: Synthesized best solution

Communication Protocols

Agents need structured communication to coordinate effectively:

import uuid
from enum import Enum
from dataclasses import dataclass
from typing import Optional

class MessageType(Enum):
    REQUEST = "request"
    RESPONSE = "response"
    BROADCAST = "broadcast"
    QUERY = "query"

@dataclass
class Message:
    sender: str
    recipient: str
    msg_type: MessageType
    content: str
    context: Optional[Dict] = None
    requires_response: bool = False
    conversation_id: Optional[str] = None

class MessageBus:
    """Central message routing system for multi-agent communication"""

    def __init__(self):
        self.messages = []
        self.agents = {}

    def register_agent(self, agent_id: str, agent):
        self.agents[agent_id] = agent

    def send_message(self, message: Message):
        self.messages.append(message)

        # Route message
        if message.recipient == "broadcast":
            # Send to all agents except sender
            for agent_id, agent in self.agents.items():
                if agent_id != message.sender:
                    agent.receive_message(message)
        elif message.recipient in self.agents:
            # Send to specific agent
            self.agents[message.recipient].receive_message(message)
        else:
            print(f"Warning: Recipient {message.recipient} not found")

    def get_conversation(self, conversation_id: str) -> List[Message]:
        return [
            msg for msg in self.messages
            if msg.conversation_id == conversation_id
        ]

class CommunicativeAgent:
    """Agent with communication capabilities"""

    def __init__(self, agent_id: str, llm, message_bus: MessageBus):
        self.agent_id = agent_id
        self.llm = llm
        self.message_bus = message_bus
        self.inbox = []

        # Register with message bus
        message_bus.register_agent(agent_id, self)

    def receive_message(self, message: Message):
        self.inbox.append(message)

        # Auto-respond if required
        if message.requires_response:
            response = self._generate_response(message)
            self.send_message(
                recipient=message.sender,
                msg_type=MessageType.RESPONSE,
                content=response,
                conversation_id=message.conversation_id
            )

    def send_message(self, recipient: str, msg_type: MessageType,
                     content: str, conversation_id: Optional[str] = None,
                     requires_response: bool = False):
        message = Message(
            sender=self.agent_id,
            recipient=recipient,
            msg_type=msg_type,
            content=content,
            conversation_id=conversation_id or self._generate_conversation_id(),
            requires_response=requires_response
        )
        self.message_bus.send_message(message)

    def broadcast(self, content: str, conversation_id: Optional[str] = None):
        self.send_message(
            recipient="broadcast",
            msg_type=MessageType.BROADCAST,
            content=content,
            conversation_id=conversation_id
        )

    def _generate_response(self, message: Message) -> str:
        prompt = f"""You are agent {self.agent_id}.

You received this message:
From: {message.sender}
Type: {message.msg_type.value}
Content: {message.content}

Generate an appropriate response:
"""

        return self.llm.predict(prompt)

    def _generate_conversation_id(self) -> str:
        return str(uuid.uuid4())[:8]

# Example: Multi-agent collaboration with messaging
message_bus = MessageBus()

coordinator = CommunicativeAgent("coordinator", llm, message_bus)
data_collector = CommunicativeAgent("data_collector", llm, message_bus)
analyzer = CommunicativeAgent("analyzer", llm, message_bus)
reporter = CommunicativeAgent("reporter", llm, message_bus)

# Coordinator initiates workflow
conversation_id = "proj_001"

# Step 1: Request data collection
coordinator.send_message(
    recipient="data_collector",
    msg_type=MessageType.REQUEST,
    content="Collect latest sales data for Q4 2024",
    conversation_id=conversation_id,
    requires_response=True
)

# Step 2: After receiving data, request analysis
coordinator.send_message(
    recipient="analyzer",
    msg_type=MessageType.REQUEST,
    content="Analyze sales trends and identify patterns",
    conversation_id=conversation_id,
    requires_response=True
)

# Step 3: Request report generation
coordinator.send_message(
    recipient="reporter",
    msg_type=MessageType.REQUEST,
    content="Generate executive summary report",
    conversation_id=conversation_id,
    requires_response=True
)

# Retrieve full conversation
conversation = message_bus.get_conversation(conversation_id)

Evaluation and Benchmarking

Measuring agent performance is crucial for understanding capabilities and limitations.

AgentBench: Comprehensive Agent Evaluation

AgentBench evaluates agents across eight environments, spanning operating system interaction, database querying, knowledge graph navigation, web browsing, code generation, and more. The sketch below exercises a representative subset:

import os

class AgentBenchmark:
    def __init__(self, agent):
        self.agent = agent
        self.results = {}

    def run_benchmark_suite(self):
        """Run comprehensive benchmark tests"""

        # 1. Operating System Tasks
        os_score = self._test_os_tasks()

        # 2. Database Queries
        db_score = self._test_database_queries()

        # 3. Knowledge Graph Navigation
        kg_score = self._test_knowledge_graph()

        # 4. Web Browsing
        web_score = self._test_web_browsing()

        # 5. Code Generation
        code_score = self._test_code_generation()

        self.results = {
            "os_tasks": os_score,
            "database": db_score,
            "knowledge_graph": kg_score,
            "web_browsing": web_score,
            "code_generation": code_score
        }
        self.results["overall"] = self._calculate_overall_score(self.results)

        return self.results

    def _test_os_tasks(self) -> float:
        """Test agent's ability to perform OS operations"""
        tasks = [
            {
                "task": "Create a directory named 'test_folder' and create 3 text files in it",
                "validation": lambda result: os.path.exists("test_folder") and len(os.listdir("test_folder")) == 3
            },
            {
                "task": "Find all Python files in the current directory and count lines of code",
                "validation": lambda result: isinstance(result, int) and result > 0
            },
            {
                "task": "Monitor CPU usage and alert if it exceeds 80%",
                "validation": lambda result: "cpu" in result.lower()
            }
        ]

        passed = 0
        for task_spec in tasks:
            try:
                result = self.agent.run(task_spec["task"])
                if task_spec["validation"](result):
                    passed += 1
            except Exception:
                pass

        return passed / len(tasks)

    def _test_database_queries(self) -> float:
        """Test SQL generation and data retrieval"""
        # Test database with sample data
        test_cases = [
            {
                "question": "What is the average age of users who made purchases in the last month?",
                "expected_tables": ["users", "purchases"],
                "expected_operations": ["JOIN", "AVG", "WHERE"]
            },
            {
                "question": "List top 5 products by revenue",
                "expected_tables": ["products"],
                "expected_operations": ["ORDER BY", "LIMIT"]
            }
        ]

        score = 0
        for case in test_cases:
            result = self.agent.run(f"Generate SQL query: {case['question']}")

            # Check if result contains expected elements
            if all(table in result.lower() for table in case["expected_tables"]):
                score += 0.5
            if any(op in result.upper() for op in case["expected_operations"]):
                score += 0.5

        return score / len(test_cases)

    # _test_knowledge_graph and _test_web_browsing follow the same
    # task/validation pattern and are omitted here for brevity.

    def _test_code_generation(self) -> float:
        """Test code generation quality"""
        test_problems = [
            {
                "description": "Write a function to find the longest palindromic substring",
                "test_cases": [
                    ("babad", ["bab", "aba"]),
                    ("cbbd", ["bb"])
                ]
            },
            {
                "description": "Implement binary search on a sorted array",
                "test_cases": [
                    (([1, 2, 3, 4, 5], 3), 2),
                    (([1, 3, 5, 7, 9], 6), -1)
                ]
            }
        ]

        passed = 0
        total = 0

        for problem in test_problems:
            code = self.agent.run(f"Write Python code: {problem['description']}")

            # Try to execute generated code
            try:
                exec_globals = {}
                exec(code, exec_globals)

                # Find the generated function
                func = next(v for v in exec_globals.values() if callable(v))

                # Test all test cases
                for test_input, expected in problem["test_cases"]:
                    total += 1
                    try:
                        if isinstance(test_input, tuple):
                            result = func(*test_input)
                        else:
                            result = func(test_input)

                        if isinstance(expected, list):
                            if result in expected:
                                passed += 1
                        elif result == expected:
                            passed += 1
                    except Exception:
                        pass
            except Exception:
                pass

        return passed / max(total, 1)

    def _calculate_overall_score(self, scores: Dict[str, float]) -> float:
        return sum(scores.values()) / len(scores)

# Run benchmark
benchmark = AgentBenchmark(my_agent)
results = benchmark.run_benchmark_suite()

print(f"Overall Score: {results['overall']:.2%}")
print(f"OS Tasks: {results['os_tasks']:.2%}")
print(f"Database: {results['database']:.2%}")
print(f"Code Generation: {results['code_generation']:.2%}")

GAIA: General AI Assistant Benchmark

class GAIAEvaluator:
    """Evaluates agents on General AI Assistant Benchmark"""

    def __init__(self, agent):
        self.agent = agent

    def evaluate(self, test_set_path: str) -> Dict:
        """Evaluate agent on GAIA benchmark"""

        # Load GAIA test set
        with open(test_set_path, 'r') as f:
            test_cases = json.load(f)

        results = {
            "level_1": [],  # Simple factual questions
            "level_2": [],  # Multi-step reasoning
            "level_3": []   # Complex real-world tasks
        }

        for case in test_cases:
            level = case["level"]
            question = case["question"]
            expected_answer = case["answer"]

            # Agent attempts to answer
            agent_answer = self.agent.run(question)

            # Evaluate answer
            is_correct = self._evaluate_answer(
                agent_answer,
                expected_answer,
                case.get("evaluation_criteria", {})
            )

            results[f"level_{level}"].append({
                "question": question,
                "expected": expected_answer,
                "agent_answer": agent_answer,
                "correct": is_correct
            })

        # Calculate scores
        summary = {
            "level_1_accuracy": self._calculate_accuracy(results["level_1"]),
            "level_2_accuracy": self._calculate_accuracy(results["level_2"]),
            "level_3_accuracy": self._calculate_accuracy(results["level_3"]),
            "overall_accuracy": self._calculate_overall_accuracy(results)
        }

        return {
            "detailed_results": results,
            "summary": summary
        }

    def _evaluate_answer(self, agent_answer: str, expected: str,
                         criteria: Dict) -> bool:
        """Evaluate if agent answer matches expected answer"""

        # Exact match
        if agent_answer.strip().lower() == expected.strip().lower():
            return True

        # Numerical tolerance
        if criteria.get("type") == "numerical":
            try:
                agent_num = float(agent_answer)
                expected_num = float(expected)
                tolerance = criteria.get("tolerance", 0.01)
                # Relative tolerance, guarded against a zero expected value
                return abs(agent_num - expected_num) <= tolerance * max(abs(expected_num), 1e-9)
            except ValueError:
                return False

        # Semantic similarity (using LLM)
        if criteria.get("type") == "semantic":
            eval_prompt = f"""Are these two answers semantically equivalent?

Answer 1: {agent_answer}
Answer 2: {expected}

Respond with only "Yes" or "No".
"""
            evaluation = self.agent.llm.predict(eval_prompt)
            return "yes" in evaluation.lower()

        return False

    def _calculate_accuracy(self, results: List[Dict]) -> float:
        if not results:
            return 0.0
        correct = sum(1 for r in results if r["correct"])
        return correct / len(results)

    def _calculate_overall_accuracy(self, results: Dict) -> float:
        all_results = []
        for level_results in results.values():
            all_results.extend(level_results)
        return self._calculate_accuracy(all_results)

# Example usage
gaia_eval = GAIAEvaluator(my_agent)
results = gaia_eval.evaluate("gaia_test_set.json")

print("GAIA Benchmark Results:")
print(f"Level 1 (Simple): {results['summary']['level_1_accuracy']:.2%}")
print(f"Level 2 (Multi-step): {results['summary']['level_2_accuracy']:.2%}")
print(f"Level 3 (Complex): {results['summary']['level_3_accuracy']:.2%}")
print(f"Overall: {results['summary']['overall_accuracy']:.2%}")

Production Deployment and Best Practices

Error Handling and Robustness

from datetime import datetime

class RobustAgent:
    def __init__(self, llm, tools, max_retries=3):
        self.llm = llm
        self.tools = tools
        self.max_retries = max_retries
        self.error_log = []

    def execute_with_retry(self, task: str) -> Dict:
        """Execute task with automatic retry on failure"""

        for attempt in range(self.max_retries):
            try:
                result = self._execute_internal(task)

                # Validate result
                if self._is_valid_result(result):
                    return {
                        "success": True,
                        "result": result,
                        "attempts": attempt + 1
                    }
                else:
                    raise ValueError("Invalid result format")

            except Exception as e:
                self.error_log.append({
                    "attempt": attempt + 1,
                    "error": str(e),
                    "task": task,
                    "timestamp": datetime.now()
                })

                if attempt < self.max_retries - 1:
                    # Generate recovery strategy
                    recovery = self._generate_recovery_strategy(task, e)
                    print(f"Attempt {attempt + 1} failed. Recovery: {recovery}")
                else:
                    return {
                        "success": False,
                        "error": str(e),
                        "attempts": self.max_retries,
                        "error_log": self.error_log[-self.max_retries:]
                    }

        return {"success": False, "error": "Max retries exceeded"}

    def _execute_internal(self, task: str):
        # Actual execution logic; `agent_logic` is supplied by the concrete agent
        return self.agent_logic(task)

    def _is_valid_result(self, result) -> bool:
        """Validate that result meets quality criteria"""
        if result is None:
            return False
        if isinstance(result, str) and len(result.strip()) == 0:
            return False
        if isinstance(result, dict) and not result:
            return False
        return True

    def _generate_recovery_strategy(self, task: str, error: Exception) -> str:
        recovery_prompt = f"""The agent failed with this error:
Task: {task}
Error: {str(error)}

Suggest a recovery strategy to fix this issue.
"""
        return self.llm.predict(recovery_prompt)

Rate Limiting and Cost Management

import time
from collections import deque

class CostManagedAgent:
    def __init__(self, llm, tools, max_cost_per_hour=10.0):
        self.llm = llm
        self.tools = tools
        self.max_cost_per_hour = max_cost_per_hour
        self.cost_tracker = deque()  # (timestamp, cost) tuples

    def execute(self, task: str) -> Dict:
        # Check if we're within budget
        if not self._check_budget():
            return {
                "success": False,
                "error": "Cost limit exceeded for this hour"
            }

        # Estimate cost before execution
        estimated_cost = self._estimate_cost(task)

        if self._would_exceed_budget(estimated_cost):
            return {
                "success": False,
                "error": f"Task would exceed budget (estimated: ${estimated_cost:.2f})"
            }

        # Execute task (`_execute_task` and `_calculate_cost` are
        # implemented by the concrete agent)
        start_time = time.time()
        result = self._execute_task(task)
        execution_time = time.time() - start_time

        # Calculate actual cost
        actual_cost = self._calculate_cost(task, result, execution_time)
        self.cost_tracker.append((time.time(), actual_cost))

        return {
            "success": True,
            "result": result,
            "cost": actual_cost,
            "execution_time": execution_time
        }

    def _check_budget(self) -> bool:
        # Remove entries older than 1 hour
        current_time = time.time()
        while self.cost_tracker and current_time - self.cost_tracker[0][0] > 3600:
            self.cost_tracker.popleft()

        # Calculate hourly cost
        hourly_cost = sum(cost for _, cost in self.cost_tracker)
        return hourly_cost < self.max_cost_per_hour

    def _estimate_cost(self, task: str) -> float:
        # Rough estimation based on task complexity
        token_estimate = len(task.split()) * 2  # Simple heuristic
        cost_per_1k_tokens = 0.03  # GPT-4 pricing
        return (token_estimate / 1000) * cost_per_1k_tokens

    def _would_exceed_budget(self, additional_cost: float) -> bool:
        current_hourly_cost = sum(cost for _, cost in self.cost_tracker)
        return current_hourly_cost + additional_cost > self.max_cost_per_hour

    def get_cost_report(self) -> Dict:
        current_time = time.time()
        hourly_cost = sum(
            cost for timestamp, cost in self.cost_tracker
            if current_time - timestamp <= 3600
        )

        return {
            "hourly_cost": hourly_cost,
            "remaining_budget": self.max_cost_per_hour - hourly_cost,
            "total_requests": len(self.cost_tracker)
        }

Frequently Asked Questions

Q1: What's the difference between an AI Agent and a chatbot?

A chatbot is designed for conversational interactions and responds to user inputs in a dialogue format. It's reactive and waits for user prompts. An AI Agent, on the other hand, is proactive and goal-oriented. It can:

  • Break down complex tasks autonomously
  • Use external tools without explicit instructions
  • Maintain state and context across multiple steps
  • Make decisions about what to do next based on intermediate results

Example: A chatbot might answer "What's the weather?" with weather information. An agent given the goal "Plan my day" would check the weather, review your calendar, suggest activities based on weather conditions, and potentially book reservations.

Q2: How do I choose between different planning strategies (CoT, ReAct, ToT)?

Choose based on task complexity and resource constraints:

  • Chain-of-Thought (CoT): Best for straightforward reasoning tasks where the path to solution is relatively linear. Fast and cost-effective.
    • Use when: Tasks have clear sequential steps, like math problems or logical puzzles
  • ReAct: Ideal when the agent needs to gather information during execution. Balances reasoning with action.
    • Use when: Tasks require external data, like "Research topic X and summarize findings"
  • Tree of Thoughts (ToT): For complex problems with multiple viable approaches where exploration is valuable. More expensive.
    • Use when: Tasks are open-ended with trade-offs, like "Design a system architecture"
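
The decision rule above can be captured in a small dispatcher. This is a heuristic sketch, not part of any framework; the function name and the boolean task traits are assumptions for illustration:

```python
def choose_planning_strategy(needs_external_data: bool,
                             multiple_viable_approaches: bool) -> str:
    """Map coarse task traits to a planning strategy.

    Mirrors the guidance above: open-ended exploration -> ToT,
    information gathering -> ReAct, linear reasoning -> CoT.
    """
    if multiple_viable_approaches:
        return "tot"    # worth the extra cost of exploring branches
    if needs_external_data:
        return "react"  # interleave reasoning with tool calls
    return "cot"        # a single linear reasoning chain suffices
```

In practice these traits can be set per task type, or estimated by a cheap classifier call before the main run.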

Q3: How much memory should my agent have?

Memory requirements depend on task complexity:

  • Short conversations (< 10 exchanges): Conversation buffer (4000-8000 tokens) is sufficient
  • Long sessions with context requirements: Add semantic memory with vector database
  • Learning from past experiences: Implement episodic memory to store task outcomes
  • Entity tracking: Add entity memory for applications involving multiple people/places/things

Start simple with conversation buffer, then add specialized memory as needs emerge.
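
The "start simple" advice can look like this: a plain buffer that drops the oldest turns once an approximate token budget is exceeded. The whitespace-based token count is a deliberate simplification; a real implementation would use the model's tokenizer:

```python
class ConversationBuffer:
    """Minimal conversation memory with a token budget."""

    def __init__(self, max_tokens: int = 4000):
        self.max_tokens = max_tokens
        self.turns = []  # list of (role, text) pairs, oldest first

    def add(self, role: str, text: str):
        self.turns.append((role, text))
        self._trim()

    def _approx_tokens(self, text: str) -> int:
        return len(text.split())  # crude whitespace proxy for tokens

    def _trim(self):
        # Drop oldest turns until we fit the budget (keep at least one)
        while (sum(self._approx_tokens(t) for _, t in self.turns) > self.max_tokens
               and len(self.turns) > 1):
            self.turns.pop(0)
```

When sessions outgrow this, the same interface can be backed by summarization or a vector store without changing the calling code.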

Q4: When should I use multiple agents vs. a single powerful agent?

Use multiple agents when:

  • Tasks naturally decompose into specialized roles (research, analysis, writing)
  • Different subtasks require different tools or expertise
  • You want parallelization for speed
  • Individual agents can be simpler and more reliable than one complex agent

Use a single agent when:

  • Tasks are tightly coupled and require constant context sharing
  • Coordination overhead would outweigh benefits
  • The problem domain is narrow and well-defined

Q5: How do I handle agent hallucinations in production?

Implement multiple safeguards:

  1. Grounding: Always retrieve factual information from reliable sources rather than relying on LLM knowledge
  2. Verification: Have agents cite sources and cross-reference information
  3. Confidence scoring: Ask the LLM to provide confidence levels for its outputs
  4. Human-in-the-loop: For critical decisions, require human approval
  5. Reflection: Use self-critique mechanisms to catch obvious errors

Example verification pattern:

def verified_fact_retrieval(agent, question):
    # Step 1: Search for answer
    answer = agent.search(question)

    # Step 2: Find supporting evidence
    evidence = agent.search(f"evidence for: {answer}")

    # Step 3: Verify consistency
    verification = agent.llm.predict(f"""
Question: {question}
Answer: {answer}
Evidence: {evidence}

Is the answer consistent with the evidence?
Identify any contradictions.
""")

    return {
        "answer": answer,
        "evidence": evidence,
        "verification": verification
    }

Q6: What are the main failure modes of AI agents?

Common failure patterns:

  1. Infinite loops: Agent gets stuck repeating the same action
    • Mitigation: Set max iteration limits, detect repetition patterns
  2. Tool misuse: Agent calls tools incorrectly or with invalid parameters
    • Mitigation: Strict parameter validation, provide clear tool documentation
  3. Context loss: Agent forgets important information from earlier steps
    • Mitigation: Implement proper memory systems, summarize periodically
  4. Goal drift: Agent pursues tangential objectives
    • Mitigation: Regular goal checking, explicit success criteria
  5. Over-confidence: Agent proceeds with uncertain information
    • Mitigation: Implement confidence thresholds, require verification
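
For the first failure mode, repetition can be detected with a sliding window over the action history. A minimal sketch; the default window and threshold values here are arbitrary choices, not established constants:

```python
from collections import Counter

def is_stuck(action_history, window: int = 6, threshold: int = 3) -> bool:
    """Flag a likely infinite loop: the same action appearing
    `threshold` or more times within the last `window` steps."""
    recent = action_history[-window:]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= threshold
```

The agent loop would call this after each step and either abort or inject a "try a different approach" instruction when it fires.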

Q7: How do I evaluate if my agent is working well?

Implement multi-level evaluation:

Functional Testing:

  • Define test cases with clear inputs and expected outputs
  • Measure success rate on standard tasks

Efficiency Metrics:

  • Steps required to complete tasks
  • Token usage and API costs
  • Time to completion

Quality Metrics:

  • Accuracy of final outputs (compared to ground truth)
  • Appropriateness of tool selection
  • Coherence of reasoning chains

User Experience:

  • Satisfaction scores from end users
  • Task completion rates
  • Error recovery success
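
These metric families can be aggregated with a small report object. A sketch under the assumption that each task run yields a success flag, a step count, a token count, and wall-clock time; the record fields are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class TaskRecord:
    success: bool
    steps: int
    tokens: int
    seconds: float

@dataclass
class EvalReport:
    records: List[TaskRecord] = field(default_factory=list)

    def add(self, record: TaskRecord):
        self.records.append(record)

    def summary(self) -> Dict[str, float]:
        n = len(self.records) or 1  # avoid division by zero on empty reports
        return {
            "success_rate": sum(r.success for r in self.records) / n,
            "avg_steps": sum(r.steps for r in self.records) / n,
            "avg_tokens": sum(r.tokens for r in self.records) / n,
            "avg_seconds": sum(r.seconds for r in self.records) / n,
        }
```

Tracking these per release makes regressions visible: a rising step count at a flat success rate usually means the agent is wandering.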

Q8: Should I use open-source or proprietary LLMs for my agent?

Considerations:

Proprietary (GPT-4, Claude):

  • Pros: Superior reasoning, better tool use, less prompt engineering needed
  • Cons: Ongoing API costs, potential latency, data privacy concerns
  • Best for: Production applications where quality is critical

Open-source (Llama, Mistral):

  • Pros: No per-token costs, full data control, customizable
  • Cons: Requires infrastructure, may need fine-tuning, potentially lower quality
  • Best for: High-volume applications, sensitive data, budget constraints

Many production systems use hybrid approaches: proprietary LLMs for complex reasoning, open-source for routine tasks.
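
A hybrid setup often reduces to a routing function placed in front of the model clients. The keyword heuristic below is a toy assumption for illustration; production routers typically use a small classifier or learned policy instead:

```python
def route_model(task: str,
                complexity_keywords=("design", "plan", "analyze")) -> str:
    """Toy router: send tasks that look complex to a proprietary model,
    everything else to a local open-source model."""
    text = task.lower()
    if any(keyword in text for keyword in complexity_keywords):
        return "proprietary"
    return "open_source"
```

The return value would select which client handles the call, so routine extraction and formatting work never touches the expensive model.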

Q9: How do I prevent my agent from taking harmful actions?

Implement safety layers:

  1. Action whitelist: Explicitly define allowed actions and tools
  2. Approval gates: Require confirmation for high-risk actions
  3. Sandbox environments: Test agent behavior in isolated environments first
  4. Output filtering: Screen agent outputs for harmful content
  5. Monitoring and alerts: Track agent behavior and flag anomalies

Example safety wrapper:

import logging

class SafeAgent:
    def __init__(self, base_agent, forbidden_actions):
        self.base_agent = base_agent
        self.forbidden_actions = forbidden_actions

    def execute(self, task):
        # Check task safety before execution
        if self._is_forbidden(task):
            return {
                "success": False,
                "error": "Task involves forbidden actions",
                "blocked_action": self._identify_forbidden(task),
            }

        # Execute with monitoring
        result = self.base_agent.execute(task)

        # Audit log
        self._log_action(task, result)

        return result

    def _is_forbidden(self, task):
        return self._identify_forbidden(task) is not None

    def _identify_forbidden(self, task):
        # Simple keyword screen; production systems need stronger checks
        for action in self.forbidden_actions:
            if action in task.lower():
                return action
        return None

    def _log_action(self, task, result):
        logging.info("task=%r success=%s", task, result.get("success"))

Q10: What's the future direction of AI agents?

Emerging trends:

  1. Longer context windows: Enabling agents to maintain more information without external memory
  2. Multimodal agents: Processing and generating images, audio, video alongside text
  3. Improved reasoning: Better planning, more reliable multi-step execution
  4. Agent marketplaces: Ecosystems of specialized agents that can be composed
  5. Embodied agents: Integration with robotics and physical systems
  6. Formal verification: Mathematical proofs of agent behavior and safety
  7. Self-improvement: Agents that can autonomously improve their own capabilities

The field is rapidly evolving, with new architectures and techniques emerging continuously.

Conclusion

AI Agents represent a paradigm shift from passive language models to active, goal-oriented systems that can autonomously solve complex real-world problems. By combining planning, memory, tool use, and reflection, agents extend the capabilities of LLMs far beyond text generation.

We've explored the foundational concepts that distinguish agents from simple LLM interactions, examined the core capabilities that enable agent behavior, surveyed popular frameworks and implementation patterns, and discussed multi-agent systems and evaluation methodologies. We've also covered critical production concerns like error handling, cost management, and safety considerations.

The key takeaways for building effective agents:

  1. Start simple: Begin with basic ReAct-style agents before adding complexity
  2. Memory matters: Implement appropriate memory systems for your use case
  3. Tool quality is crucial: Well-designed, reliable tools are more important than clever prompting
  4. Plan for failure: Agents will fail; build robust error handling and recovery
  5. Evaluate continuously: Establish metrics and test regularly
  6. Safety first: Implement guardrails and monitoring from the start

As you build your own agents, remember that this is still an emerging field. Experimentation, iteration, and learning from failures are essential parts of the process. The techniques and patterns described here provide a solid foundation, but the most effective agent architectures will emerge from hands-on experience with real-world problems.

The future of AI agents is promising, with rapid advances in reasoning capabilities, tool use, and multi-agent coordination. By understanding the principles and practices outlined in this guide, you're well-equipped to harness these powerful systems and push the boundaries of what's possible with AI.


Additional Resources:

  • LangChain Documentation: https://docs.langchain.com/
  • AutoGPT Repository: https://github.com/Significant-Gravitas/AutoGPT
  • AgentBench Paper: https://arxiv.org/abs/2308.03688
  • GAIA Benchmark: https://arxiv.org/abs/2311.12983
  • OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling

This article has covered the fundamental concepts, practical implementations, and advanced considerations for building and deploying AI agents. Whether you're creating simple task automation or complex multi-agent systems, these principles and patterns will serve as a comprehensive guide to success in the rapidly evolving field of AI agents.

  • Post title: AI Agents Complete Guide: From Theory to Industrial Practice
  • Post author: Chen Kai
  • Create time: 2025-04-03 00:00:00
  • Post link: https://www.chenk.top/en/ai-agents-complete-guide/
  • Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.