LangChain Agents with Tools Tutorial: Error Handling and Safety Patterns

January 18, 2026 30 minute read

LangChain Agents with Tools Tutorial: Error Handling and Safety Patterns

LangChain agents are smart helpers that can think and use tools to get tasks done. Imagine giving your computer a brain and a toolbox; that’s kind of what LangChain agents are like. They can browse the internet, calculate numbers, or even talk to other programs.

But just like any helper, sometimes things don’t go as planned. Tools might fail, the internet could be down, or the agent might get confused. This is where error handling and safety patterns become super important.

This guide will show you how to build robust LangChain agents that can handle unexpected problems gracefully. You will learn how to make your agents reliable and safe to use. We will cover everything from simple retries to advanced security considerations.

Understanding LangChain Agents and Tools

Before we dive into errors, let’s quickly review what LangChain agents and their tools are. An agent uses a language model, like a super-smart chatbot, to decide what to do next. It looks at your request and thinks about the steps needed to complete it.

Part of its thinking involves choosing the right “tools” from its toolbox. These tools are like mini-programs or functions that the agent can call upon. For example, a tool could be a calculator, a web search function, or a way to send an email.

The agent picks a tool, gives it some input, and then waits for the tool to give back an answer. It uses this answer to continue its thinking process. This cycle of thinking, acting, and observing helps the agent achieve its goal.

The Problem: Why Things Go Wrong

Even the smartest agents and most useful tools can run into trouble. Imagine your agent trying to look up today’s weather, but the weather website is down. Or maybe the agent asks a tool to add “apple” and “banana,” which doesn’t make sense.

These unexpected situations are called tool execution errors. They can stop your agent in its tracks, making it seem broken or unhelpful. Sometimes, the error might even give back confusing or wrong information.

We need to teach our agents how to deal with these problems gracefully. This means planning for failures, rather than just hoping they won’t happen. By thinking ahead, we can make our agents much more robust and reliable.

Core Concepts of Error Handling

Handling errors well means having a plan for different kinds of problems. Let’s look at some key ideas that help us build resilient agents. These concepts are like different backup plans for when things go wrong.

Graceful Degradation

Graceful degradation means that if one part of your system fails, the whole thing doesn’t just crash. Instead, it tries to keep working, perhaps with less functionality or by using a simpler method. For an agent, this might mean if a specific tool isn’t working, the agent tries to solve the problem with other tools or provides a less detailed answer.

Imagine a smart home system where the fancy weather sensor breaks. Instead of saying “I can’t tell you the weather,” it might check a basic online weather report. It’s still providing a service, even if it’s not the best possible one. This makes the user experience much smoother.

You want your LangChain agent to always try its best to deliver something useful. Even if it can’t complete the primary task, it should offer a reasonable fallback or explain the issue clearly. This prevents frustration and keeps your agent helpful.

Retry Mechanisms

Sometimes, a tool might fail just because of a temporary hiccup, like a brief internet connection drop. A retry mechanism means your agent tries the same action again after a short wait. It’s like knocking on a door again if no one answers the first time.

You usually don’t want to retry endlessly, though. There should be a limit to how many times an agent tries. Often, we use something called “exponential backoff,” which means waiting a little longer with each retry. This prevents overloading a struggling service.

Retries are excellent for intermittent problems that might fix themselves quickly. They make your agent more patient and persistent. You can often find pre-built solutions or design your own simple retry logic for your tools.

Timeout Handling

Have you ever waited forever for a webpage to load? That’s what happens when there’s no timeout handling. A tool might get stuck, waiting for a response that never comes. This can make your agent freeze and never finish its task.

Timeout handling sets a maximum amount of time a tool should wait for a response. If the tool doesn’t respond within that time, the agent stops waiting and considers it a failure. This prevents your agent from getting stuck indefinitely.

Setting sensible timeouts is crucial for keeping your agent responsive. It ensures that even if an external service is completely unresponsive, your agent can move on. This allows your agent to either retry, use a fallback, or report the problem.

Fallback Strategies

A fallback strategy is your agent’s backup plan. If an action fails even after retries and timeouts, what should the agent do next? This is where your fallback comes into play. It’s like having a plan B when plan A doesn’t work.

For example, if a tool meant to fetch real-time stock prices fails repeatedly, a fallback might be to provide the last known price. Another fallback could be to tell the user directly that the tool is unavailable. The simplest fallback is often to just return a helpful error message.

Having good fallback strategies ensures that your agent can always respond, even if it’s just to say “I’m sorry, I can’t do that right now.” This is much better than staying silent or crashing. You can even design a fallback tool that provides a default answer.

Implementing Error Handling in LangChain Agents

Now, let’s see how we can put these concepts into practice with LangChain agents. We’ll look at code examples to make these ideas concrete. You’ll learn how to wrap your tools with protective layers.

Catching Tool Errors

The most basic way to handle errors in Python (and thus in your LangChain tools) is using try-except blocks. This allows you to “try” to run some code and “catch” any errors that happen.

from langchain.tools import Tool
import requests # A common library for making web requests

def get_weather_data(city: str) -> str:
    """Fetches current weather data for a given city."""
    try:
        # Example of a tool that might fail (e.g., network error, API key invalid)
        api_key = "YOUR_WEATHER_API_KEY" # Replace with your actual API key
        url = f"http://api.weatherapi.com/v1/current.json?key={api_key}&q={city}"
        response = requests.get(url, timeout=5) # Added timeout here for safety
        response.raise_for_status() # Raises an HTTPError for bad responses (4xx or 5xx)
        data = response.json()
        temp_c = data['current']['temp_c']
        condition = data['current']['condition']['text']
        return f"The current temperature in {city} is {temp_c}°C with {condition}."
    except requests.exceptions.Timeout:
        return f"Weather tool timed out while fetching data for {city}. Please try again later."
    except requests.exceptions.RequestException as e:
        # Catch any other request-related errors (network, connection, etc.)
        return f"Could not fetch weather data for {city} due to a connection error: {e}"
    except KeyError:
        # If the API response structure changes or is unexpected
        return f"Could not parse weather data for {city}. Unexpected API response."
    except Exception as e:
        # Catch any other unexpected errors
        return f"An unexpected error occurred while fetching weather for {city}: {e}"

# Create a LangChain tool from the function
weather_tool = Tool(
    name="WeatherFetcher",
    func=get_weather_data,
    description="Useful for fetching current weather information for a specified city. Input should be a city name.",
)

# Example of how an agent might use it
# If get_weather_data fails, it will return an error string instead of crashing.
print(weather_tool.run("London"))
print(weather_tool.run("InvalidCity"))

In this example, the try block attempts to get weather data. If anything goes wrong, an except block catches the specific error. You can see different except blocks for different types of errors, like a network issue (requests.exceptions.RequestException) or a problem with the data itself (KeyError). This way, you can give specific feedback about what went wrong.

Retrying Failed Tools

For errors that might be temporary, implementing retries is a smart move. You can write your own retry logic or use a library that handles it for you. The tenacity library is excellent for this in Python.

First, you might need to install it: pip install tenacity.

from langchain.tools import Tool
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

# Define which exceptions should trigger a retry
retryable_exceptions = (requests.exceptions.ConnectionError, requests.exceptions.Timeout)

@retry(
    stop=stop_after_attempt(3), # Try up to 3 times
    wait=wait_exponential(multiplier=1, min=2, max=10), # Wait 2s, 4s, 8s, etc.
    retry=retry_if_exception_type(retryable_exceptions),
    reraise=True # Re-raise the last exception if all retries fail
)
def _fetch_data_with_retry(url: str, api_key: str, city: str):
    """Helper function to fetch data that can be retried."""
    print(f"Attempting to fetch weather for {city}...")
    full_url = f"http://api.weatherapi.com/v1/current.json?key={api_key}&q={city}"
    response = requests.get(full_url, timeout=5)
    response.raise_for_status() # This will raise an HTTPError for bad status codes
    return response.json()

def get_weather_data_with_retry(city: str) -> str:
    """Fetches current weather data for a given city with retries."""
    api_key = "YOUR_WEATHER_API_KEY" # Replace with your actual API key

    try:
        data = _fetch_data_with_retry(url="", api_key=api_key, city=city) # URL is constructed inside helper
        temp_c = data['current']['temp_c']
        condition = data['current']['condition']['text']
        return f"The current temperature in {city} is {temp_c}°C with {condition}."
    except retryable_exceptions as e:
        return f"Weather tool failed after multiple retries for {city} due to connection issue: {e}"
    except requests.exceptions.HTTPError as e:
        # Handle non-retryable errors like 404 (city not found) or 401 (bad API key)
        if e.response.status_code == 400:
            return f"Could not find weather data for '{city}'. Please check the city name."
        return f"Weather tool failed for {city} due to an API error ({e.response.status_code}): {e.response.text}"
    except KeyError:
        return f"Could not parse weather data for {city}. Unexpected API response."
    except Exception as e:
        return f"An unexpected error occurred while fetching weather for {city}: {e}"

# Create a LangChain tool from the function
weather_tool_retried = Tool(
    name="WeatherFetcherWithRetry",
    func=get_weather_data_with_retry,
    description="Useful for fetching current weather information for a specified city, includes retries for temporary issues. Input should be a city name.",
)

# Example usage (you might need to temporarily disable network or use a bad API key to see retries in action)
print(weather_tool_retried.run("Paris"))
print(weather_tool_retried.run("NonExistentCity"))

Here, the @retry decorator from tenacity is added to our _fetch_data_with_retry helper function. It tells Python to retry if specific network errors happen. It will try up to 3 times, waiting longer each time, before finally giving up and raising the error. This helps with graceful degradation as the agent tries harder before giving up.

Setting Timeouts for Tools

You already saw a timeout=5 parameter in our requests.get call. This is a very common way to implement timeout handling for web requests. For other types of operations or custom tools, you might need different approaches.

If your tool involves running external commands, you can often pass a timeout argument.

import subprocess
from langchain.tools import Tool

def run_long_command(command_args: list) -> str:
    """Runs a command and limits its execution time."""
    try:
        # Timeout is set to 10 seconds.
        # If the command takes longer, a TimeoutExpired error is raised.
        result = subprocess.run(command_args, capture_output=True, text=True, check=True, timeout=10)
        return result.stdout
    except subprocess.TimeoutExpired:
        return f"Command '{' '.join(command_args)}' timed out after 10 seconds."
    except subprocess.CalledProcessError as e:
        return f"Command failed with error: {e.stderr}"
    except Exception as e:
        return f"An unexpected error occurred: {e}"

long_command_tool = Tool(
    name="LongCommandRunner",
    func=run_long_command,
    description="Executes a system command and stops if it runs too long. Input should be a list of strings for the command and its arguments, e.g., ['sleep', '15'] or ['ls', '-l'].",
)

# Example usage
print(long_command_tool.run(['ls', '-l'])) # Should work fine
print(long_command_tool.run(['sleep', '5'])) # Will wait for 5 seconds
print(long_command_tool.run(['sleep', '15'])) # This will trigger the timeout

By adding timeout to subprocess.run, we ensure that our agent won’t get stuck waiting forever. This is vital for tools that might interact with unreliable external processes or execute long-running computations. You are effectively safeguarding your agent’s responsiveness.

Implementing Fallback Tools/Responses

When all else fails (retries don’t work, timeouts are hit), a fallback strategy provides a default answer or action. This prevents the agent from just stopping or giving a confusing error.

You can create a simple fallback tool that always returns a predefined message. The agent can then be configured to use this tool if others fail.

from langchain.tools import Tool

def default_fallback_response(query: str = "") -> str:
    """A simple fallback tool that provides a helpful default message."""
    return "I'm sorry, I couldn't complete your request using my tools right now. Please try rephrasing or ask me something else."

fallback_tool = Tool(
    name="DefaultFallback",
    func=default_fallback_response,
    description="Provides a generic helpful response when other tools fail.",
)

# Example: How an agent *might* integrate a fallback (conceptual, agent orchestrates this)
# If weather_tool_retried fails for "NonExistentCity", the agent could then choose to use fallback_tool.
print(fallback_tool.run("Get weather for Mars"))

Within a LangChain agent setup, you would typically manage this at the agent’s logic level. You might have your agent catch tool exceptions and then, in its internal _handle_tool_error or similar method, decide to use the DefaultFallback tool. This ensures the agent always provides a coherent response.

Safety Patterns for LangChain Agents

Beyond just handling errors, we must ensure our agents are safe. This means protecting them from bad inputs, securing their interactions, and preventing misuse. These safety patterns are crucial for building agents that are reliable and trustworthy.

Input Validation

One of the most critical safety patterns is input validation. This means checking any information you give to your tools or agent to make sure it’s correct, expected, and safe. Imagine a calculator tool: if you give it “hello” instead of a number, it can’t work. More importantly, bad inputs can sometimes be malicious.

Malicious input prevention is about stopping harmful commands or data from reaching your tools. For example, if your tool interacts with a database, a malicious user might try to use “SQL injection” to steal or delete data. If your tool uses file paths, someone might try to access files they shouldn’t.

You should always assume that any input your agent receives, especially from users, could be trying to cause harm. Think of it like checking ingredients before cooking: you only want good, safe items.

from langchain.tools import Tool
import re # For regular expressions, useful for pattern matching

def safe_calculator(expression: str) -> str:
    """A calculator tool with basic input validation."""
    # Only allow numbers, basic math symbols, and spaces
    # This prevents running arbitrary code or complex operations.
    if not re.fullmatch(r"^[0-9+\-*/().\s]+$", expression):
        return "Invalid input. Only numbers and basic math operations (+-*/) are allowed."
    
    try:
        # Use a safe evaluation method, like ast.literal_eval or a custom parser,
        # rather than eval() which can be dangerous.
        # For simplicity, we'll use eval() with extreme caution, but in real apps, AVOID eval().
        # A better approach would be to parse the expression into an Abstract Syntax Tree (AST)
        # and then evaluate only allowed operations.
        # Example of a safer (though more complex) approach:
        # import ast
        # expr_node = ast.parse(expression, mode='eval')
        # allowed_nodes = (ast.Expression, ast.BinOp, ast.Num, ast.Add, ast.Sub, ast.Mult, ast.Div)
        # for node in ast.walk(expr_node):
        #     if not isinstance(node, allowed_nodes):
        #         return "Forbidden operation detected in expression."
        # result = eval(expression) # Still risky, but illustrative.
        
        # For this example, let's just stick to the regex and a safer (but not foolproof) approach
        # A production-grade calculator would use a dedicated math library or AST parsing.
        # For example, using `numexpr` or `sympy.parsing.mathematica.parse_expr`
        
        # Since eval is dangerous, let's simulate a calculation after validation for simplicity
        # For actual calculation, use a dedicated, safe math library!
        return f"Result of {expression} is (simulated result for safety)."
    except ZeroDivisionError:
        return "Cannot divide by zero."
    except Exception:
        return "Error in calculation. Please check your expression."

calculator_tool_validated = Tool(
    name="ValidatedCalculator",
    func=safe_calculator,
    description="A simple calculator that only allows basic arithmetic and validates input to prevent malicious commands. Input should be a mathematical expression.",
)

print(calculator_tool_validated.run("2+2"))
print(calculator_tool_validated.run("import os; os.system('rm -rf /')")) # Malicious input
print(calculator_tool_validated.run("10/0"))

In this example, the safe_calculator tool uses re.fullmatch to ensure the input expression only contains numbers and basic math symbols. This is a simple form of malicious input prevention. It directly rejects anything that looks suspicious. For more complex validation, especially for structured data, you can use powerful libraries like Pydantic. Pydantic is a great input validation library that lets you define exactly what your data should look like.

For more advanced security, you might consider getting a security best practices guide to understand common attack vectors and defenses.

Output Sanitization

Just as important as validating inputs is output sanitization. This means cleaning up any data that comes out of your tools before your agent or a user sees it. Sometimes, a tool might return messy, unformatted, or even potentially harmful information.

For instance, a web scraping tool might grab some JavaScript code along with the text it’s supposed to get. If your agent then shows this raw output to a user on a webpage, it could lead to a “Cross-Site Scripting” (XSS) attack. Sanitizing means removing anything that isn’t plain, safe text.

from langchain.tools import Tool
from html import escape # Used to escape HTML characters

def sanitize_html_output(raw_html_content: str) -> str:
    """Sanitizes HTML content to prevent XSS attacks."""
    # This is a very basic sanitization. For production, use a dedicated library
    # like bleach or DOMPurify (for JS environments).
    # Here, we're simply escaping HTML special characters.
    escaped_content = escape(raw_html_content)
    
    # You might also want to remove specific tags or attributes if they were permitted
    # and now need to be stripped for a specific context.
    # For a complete solution, you would typically specify allowed tags and attributes.
    return escaped_content

def process_web_content(url: str) -> str:
    """Simulates fetching and sanitizing web content."""
    # In a real scenario, this would involve a web request
    # For demonstration, we'll use a sample malicious HTML string
    malicious_html = f"<p>Hello, this is content from {url}.</p><script>alert('You have been hacked!');</script>"
    
    # Sanitize the output before returning it
    safe_content = sanitize_html_output(malicious_html)
    
    return safe_content

web_content_processor_tool = Tool(
    name="WebContentProcessor",
    func=process_web_content,
    description="Fetches web content and sanitizes it to prevent security issues. Input should be a URL.",
)

print(web_content_processor_tool.run("https://httpbin.org/html"))

In this snippet, html.escape turns characters like < into <, making them harmless text instead of active HTML code. This output sanitization prevents malicious code from being executed by a browser or misused by other systems. Always clean up data from unknown sources.

Rate Limit Handling

Many online services and APIs have rate limits. These limits are put in place to prevent people from making too many requests too quickly, which could overload the service. If your agent hits a rate limit, its tools will start failing.

Your agent needs to be smart about these limits. Instead of just retrying immediately (which will just hit the limit again), it should wait. This often involves slowing down or pausing its requests.

from langchain.tools import Tool
import time
import random
import requests

# A simple class to simulate an API that has rate limits
class MockRateLimitedAPI:
    def __init__(self, requests_per_minute=5):
        self.requests_per_minute = requests_per_minute
        self.request_times = []
        
    def _check_rate_limit(self):
        now = time.time()
        # Remove requests older than 1 minute
        self.request_times = [t for t in self.request_times if now - t < 60]
        if len(self.request_times) >= self.requests_per_minute:
            # Simulate a 429 Too Many Requests error
            raise requests.exceptions.HTTPError("429 Client Error: Too Many Requests for url: mock-api", response=requests.Response())
        self.request_times.append(now)
    
    def fetch_data(self, item: str) -> str:
        self._check_rate_limit()
        time.sleep(random.uniform(0.1, 0.5)) # Simulate some processing time
        return f"Data for {item} (fetched at {time.ctime()})"

mock_api = MockRateLimitedAPI(requests_per_minute=3) # Very strict limit for testing

def get_item_data_with_rate_limit_handling(item: str) -> str:
    """Fetches data from a rate-limited API with handling."""
    max_retries = 5
    for attempt in range(max_retries):
        try:
            return mock_api.fetch_data(item)
        except requests.exceptions.HTTPError as e:
            if "429 Client Error" in str(e):
                wait_time = 2 ** attempt + random.uniform(0, 1) # Exponential backoff with jitter
                print(f"Rate limit hit! Waiting {wait_time:.2f} seconds before retrying... (Attempt {attempt + 1})")
                time.sleep(wait_time)
            else:
                return f"API error for {item}: {e}"
        except Exception as e:
            return f"An unexpected error occurred for {item}: {e}"
    return f"Failed to fetch data for {item} after {max_retries} attempts due to rate limits."

rate_limited_tool = Tool(
    name="RateLimitedDataFetcher",
    func=get_item_data_with_rate_limit_handling,
    description="Fetches specific item data from a rate-limited service, handles 429 errors gracefully. Input should be an item name.",
)

# Simulate hitting rate limits
for i in range(10):
    print(rate_limited_tool.run(f"Item-{i+1}"))

This example shows a tool that tries to fetch data, but if it gets a “Too Many Requests” (429) error, it waits. It uses an exponential backoff strategy, meaning it waits longer with each failed attempt. This prevents the tool from repeatedly hitting the rate limit. You can also use rate limiting services or libraries to help manage this across many tools and agents.

Security Considerations

Beyond specific input/output issues, there are broader security considerations when building agents. These involve how your agent operates and what access it has.

Least Privilege Principle: Your agent and its tools should only have the minimum permissions they need to do their job. If a tool only needs to read a file, it shouldn’t have permission to delete files. This minimizes the damage if a tool is compromised.
Secure API Key Management: API keys and other secrets should never be hardcoded directly into your code. Use environment variables, secure vaults, or secret management services. Treat them like your house keys.
Regular Security Audits: Regularly check your agent’s code and its dependencies for vulnerabilities. This is where security scanning tools can be very helpful. These tools automatically look for known weaknesses in your code.
Penetration Testing: For critical applications, consider penetration testing tools or services. Penetration testing involves simulating cyberattacks to find weaknesses before real attackers do.
Monitoring and Alerting: Keep an eye on your agent’s activity. If something unusual happens (like many failed tool calls or access attempts to sensitive areas), you should be alerted immediately.

These practices ensure your agent isn’t just functional but also resistant to attacks and misuse. Building with security in mind from the start is much easier than trying to fix it later.

Advanced Safety Patterns

As your LangChain agents become more complex and handle more important tasks, you’ll want to employ even more robust safety measures. These advanced patterns provide additional layers of protection and control.

Human-in-the-Loop

Sometimes, a task is too sensitive or complex for an agent to handle entirely on its own. This is where a human-in-the-loop approach comes in. It means the agent knows when to stop and ask for a human’s help or approval.

For example, if an agent is asked to perform a financial transaction, it might confirm the details with a user before proceeding. Or, if it encounters an error it can’t resolve, instead of giving a generic message, it could escalate the issue to a human operator.

from langchain.tools import Tool

def human_approval_tool(action_description: str) -> str:
    """A tool that requires human approval before proceeding."""
    print(f"\n--- Human Intervention Required ---")
    print(f"The agent wants to perform the following action: '{action_description}'")
    
    # In a real application, this would involve sending a notification
    # or displaying a UI prompt to a human user.
    response = input("Do you approve this action? (yes/no): ").lower().strip()
    
    if response == "yes":
        print("Human approved. Proceeding...")
        return "Action approved by human."
    else:
        print("Human denied. Aborting action.")
        return "Action denied by human."

human_approval_agent_tool = Tool(
    name="HumanApproval",
    func=human_approval_tool,
    description="Requires human approval for sensitive or critical actions. Input should be a description of the action.",
)

# Example usage within an agent's thought process
# An agent might decide to use this tool if its confidence is low
# or if the action is marked as high-risk.
print(human_approval_agent_tool.run("send an email to all clients saying 'hello'"))
print(human_approval_agent_tool.run("delete all marketing campaign data"))

This HumanApproval tool demonstrates how an agent can pause its work and ask for input. This is critical for preventing mistakes in high-stakes situations and helps build trust in your agent. You want to make sure your agent doesn’t do anything irreversible without proper oversight.

Monitoring and Alerting

Even with the best error handling and safety patterns, things can still go wrong in production. That’s why monitoring and alerting are essential. Monitoring means constantly watching your agent’s performance, resource usage, and error rates. Alerting means automatically notifying someone when critical issues are detected.

You should log all significant events: when tools are called, what inputs they receive, what outputs they return, and any errors that occur. These logs are invaluable for debugging and understanding your agent’s behavior.

When an error happens, especially one that wasn’t gracefully handled, an alert should be triggered. This could be an email, a Slack message, or a notification to a paging system. You want to know immediately if your agent stops working or behaves unexpectedly. This allows for quick detection and resolution of problems, minimizing downtime and negative impact.

Production Safety Frameworks

For large-scale, critical LangChain agent deployments, you might adopt production safety frameworks. These are established sets of practices, tools, and processes designed to ensure systems are reliable, secure, and performant in real-world use.

Concepts from Site Reliability Engineering (SRE) are highly relevant here. SRE focuses on applying software engineering principles to operations to create highly reliable and scalable systems. This includes error budgets, incident management, and post-mortems.

Adopting such frameworks helps you move beyond ad-hoc error handling to a systematic approach to operational safety. It makes your agent part of a well-managed, resilient ecosystem. You can learn more about these practices through reliability engineering courses or by exploring specific production safety frameworks.

Putting It All Together: A Comprehensive Example

Let’s imagine an agent designed to help with e-commerce support. It needs to fetch order details, calculate shipping, and maybe even look up product information. This agent would need robust error handling and safety.

import requests
import time
import random
import json
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from langchain.tools import Tool

# --- Affiliate Link Placeholders for context ---
# [Error Handling Best Practices Course](https://www.coursera.org/browse/computer-science/software-development)
# [Secure Input Validation Library](https://pydantic-docs.helpmanual.io/)
# [API Rate Limiting Service](https://redis.io/docs/data-types/probabilistic/count-min-sketch/)
# [Cybersecurity Best Practices Guide](https://owasp.org/)
# --- End Affiliate Link Placeholders ---

# 1. Mock External Services (with potential for errors/delays)
class MockOrderAPI:
    def get_order_details(self, order_id: str):
        if not order_id.isdigit():
            raise ValueError("Order ID must be a number.")
        
        order_id_int = int(order_id)
        if order_id_int % 3 == 0: # Simulate a temporary network error
            print(f"Simulating network error for order {order_id}...")
            raise requests.exceptions.ConnectionError("Simulated network issue.")
        if order_id_int % 5 == 0: # Simulate a 404 Not Found
            print(f"Simulating 404 for order {order_id}...")
            response = requests.Response()
            response.status_code = 404
            raise requests.exceptions.HTTPError(f"404 Not Found: Order {order_id} does not exist.", response=response)
        
        time.sleep(random.uniform(0.1, 0.3)) # Simulate latency
        return {"order_id": order_id, "customer": "John Doe", "items": ["Laptop", "Mouse"], "status": "Shipped"}

class MockShippingAPI:
    def calculate_shipping(self, destination: str, weight_kg: float):
        if not isinstance(destination, str) or not isinstance(weight_kg, (int, float)):
            raise ValueError("Invalid input for shipping calculation.")
        if weight_kg > 100: # Simulate a heavy package limit
            raise ValueError("Package too heavy for standard shipping.")
        
        time.sleep(random.uniform(0.2, 0.5))
        return {"destination": destination, "cost": weight_kg * 5.0 + 10.0}

class MockProductAPI:
    # This API will have a strict rate limit
    def __init__(self, requests_per_minute=2):
        self.requests_per_minute = requests_per_minute
        self.request_times = []
        
    def _check_rate_limit(self):
        now = time.time()
        self.request_times = [t for t in self.request_times if now - t < 60]
        if len(self.request_times) >= self.requests_per_minute:
            response = requests.Response()
            response.status_code = 429
            raise requests.exceptions.HTTPError("429 Client Error: Too Many Requests", response=response)
        self.request_times.append(now)
    
    def get_product_info(self, product_id: str):
        self._check_rate_limit()
        if not product_id.isalnum():
            raise ValueError("Product ID must be alphanumeric.")
        
        time.sleep(random.uniform(0.1, 0.4))
        if product_id == "LAPTOP123":
            return {"product_id": product_id, "name": "Super Laptop X", "price": 1200.00, "weight_kg": 2.5}
        elif product_id == "MOUSEPRO":
            return {"product_id": product_id, "name": "Pro Gaming Mouse", "price": 75.00, "weight_kg": 0.2}
        else:
            return None # Product not found

order_api = MockOrderAPI()
shipping_api = MockShippingAPI()
product_api = MockProductAPI(requests_per_minute=2)

# 2. Define Retryable Exceptions for tenacity
RETRYABLE_API_ERRORS = (requests.exceptions.ConnectionError, requests.exceptions.Timeout)

# 3. Enhanced Tool Definitions with Error Handling, Retries, Timeouts, Validation, and Fallbacks

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=5), reraise=True,
       retry=retry_if_exception_type(RETRYABLE_API_ERRORS))
def _get_order_details_retried(order_id: str):
    return order_api.get_order_details(order_id)

def get_customer_order_details(order_id: str) -> str:
    """
    Fetches customer order details with retries for network issues and specific error handling.
    Input should be a numerical order ID string.
    """
    # Input Validation
    if not order_id.isdigit():
        return "Invalid Order ID. Please provide a numerical order ID."

    try:
        details = _get_order_details_retried(order_id)
        # Output Sanitization (for JSON output, pretty printing is a form of sanitization too)
        return json.dumps(details, indent=2)
    except RETRYABLE_API_ERRORS:
        return f"Could not retrieve order {order_id} due to network issues after multiple retries. Please try again later."
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 404:
            return f"Order {order_id} not found. Please check the ID."
        return f"An API error occurred for order {order_id}: {e}"
    except ValueError as e:
        return f"Error with order ID '{order_id}': {e}"
    except Exception as e:
        return f"An unexpected error occurred while fetching order {order_id}: {e}"

order_details_tool = Tool(
    name="GetOrderDetails",
    func=get_customer_order_details,
    description="Fetches customer order details by order ID, handles network errors and missing orders. Input: order_id (string).",
)

def calculate_shipping_cost(destination: str, order_id: str) -> str:
    """
    Calculates shipping cost for an order to a destination. Includes input validation and fallback.
    Input should be a JSON string like '{"destination": "New York", "order_id": "123"}'
    """
    try:
        args = json.loads(destination)
        dest = args.get("destination")
        order_id_for_weight = args.get("order_id")

        if not dest or not isinstance(dest, str):
            raise ValueError("Missing or invalid 'destination' in input.")
        if not order_id_for_weight or not isinstance(order_id_for_weight, str) or not order_id_for_weight.isdigit():
            raise ValueError("Missing or invalid 'order_id' in input for weight calculation.")

        # Try to get order details to determine weight, with error handling
        order_info = order_api.get_order_details(order_id_for_weight) # Using internal mock_api for weight
        # This part assumes a simplified weight calculation for demo.
        # In a real scenario, you'd fetch product details and sum up weights.
        total_weight_kg = 5.0 # Simplified fixed weight for demonstration

        shipping_cost_info = shipping_api.calculate_shipping(dest, total_weight_kg)
        return f"Shipping cost to {shipping_cost_info['destination']} for order {order_id_for_weight} is ${shipping_cost_info['cost']:.2f}."
    except ValueError as e:
        return f"Invalid input or calculation issue for shipping: {e}"
    except requests.exceptions.ConnectionError:
        return "Shipping calculation unavailable due to network issues. Using a fallback flat rate of $15." # Fallback strategy
    except Exception as e:
        return f"An unexpected error occurred during shipping calculation: {e}"

shipping_calc_tool = Tool(
    name="CalculateShippingCost",
    func=calculate_shipping_cost,
    description="Calculates shipping cost for an order to a destination. Input: JSON string with 'destination' and 'order_id'.",
)

# This tool demonstrates rate limit handling and output sanitization
def get_product_information(product_id: str) -> str:
    """
    Retrieves product information, handles rate limits, and ensures safe output.
    Input should be a product ID string.
    """
    # Input Validation
    if not product_id.isalnum():
        return "Invalid Product ID. Must be alphanumeric."

    max_retries = 3
    for attempt in range(max_retries):
        try:
            info = product_api.get_product_info(product_id)
            if info:
                # Output Sanitization: Ensuring JSON output is safe and readable
                return json.dumps(info, indent=2)
            else:
                return f"Product '{product_id}' not found."
        except requests.exceptions.HTTPError as e:
            if "429 Client Error" in str(e):
                wait_time = 2 ** attempt + random.uniform(0.1, 0.5) # Exponential backoff with jitter
                print(f"RATE LIMIT HIT (Product API). Waiting {wait_time:.2f}s... (Attempt {attempt + 1})")
                time.sleep(wait_time)
            else:
                return f"API error for product {product_id}: {e}"
        except ValueError as e:
            return f"Product API validation error: {e}"
        except Exception as e:
            return f"An unexpected error occurred fetching product {product_id}: {e}"
    
    return f"Failed to get product information for '{product_id}' after {max_retries} attempts due to rate limits."

product_info_tool = Tool(
    name="GetProductInfo",
    func=get_product_information,
    description="Retrieves detailed product information, handles rate limits. Input: product_id (string).",
)


# --- Demonstrating the tools with various scenarios ---
print("--- Testing GetOrderDetails Tool ---")
print(order_details_tool.run("123")) # Good order
print(order_details_tool.run("abc")) # Invalid input
print(order_details_tool.run("3"))   # Simulates network error, should retry and eventually fail if actual error occurs
print(order_details_tool.run("5"))   # Simulates 404 error
print("\n" + "="*30 + "\n")

print("--- Testing CalculateShippingCost Tool ---")
print(shipping_calc_tool.run('{"destination": "London", "order_id": "123"}'))
print(shipping_calc_tool.run('{"destination": 123, "order_id": "123"}')) # Invalid destination type
print(shipping_calc_tool.run('{"destination": "Berlin", "order_id": "invalid"}')) # Invalid order_id for weight
print("\n" + "="*30 + "\n")

print("--- Testing GetProductInfo Tool (Rate Limited) ---")
for i in range(5): # This will trigger rate limits
    print(product_info_tool.run("LAPTOP123"))
    time.sleep(0.5) # A small pause to make logs clearer, but not enough to avoid rate limit
print(product_info_tool.run("INVALID!PRODUCT")) # Invalid input
print("\n" + "="*30 + "\n")

This comprehensive example brings together many of the patterns we discussed. The get_customer_order_details tool has input validation, retry mechanisms for network issues, and specific error messages for graceful degradation. The calculate_shipping_cost tool demonstrates more input validation and a fallback strategy when a network error prevents exact calculation. Finally, get_product_information showcases rate limit handling with exponential backoff and output sanitization by ensuring valid JSON.

By carefully crafting each tool with these protective layers, you build an agent that is not just smart but also reliable and safe, even when external systems are temperamental or inputs are unexpected. You are ensuring that your agent can handle tool execution errors and protect itself from malicious input prevention while providing graceful degradation.

Learning More

Building robust and secure LangChain agents is an ongoing process. There’s always more to learn about making your systems resilient and safe.

To deepen your understanding of error handling strategies, consider an error handling course. These courses often cover advanced topics like circuit breakers, bulkheads, and distributed tracing. For ensuring your systems can handle failures and recover quickly, exploring reliability engineering courses can provide invaluable insights into building highly available applications.

You can also refer to the LangChain documentation for the latest features and best practices directly from the source. Additionally, staying updated on general software development principles around security and resilience will always benefit your agent development.

Conclusion

You’ve learned that building powerful LangChain agents means more than just giving them smart tools. It also means preparing them for when things inevitably go wrong. By implementing error handling and safety patterns, you make your agents much more dependable.

Remember to validate inputs, sanitize outputs, and build in ways to retry, timeout, and fallback when tools fail. These practices protect your agent from tool execution errors and malicious input prevention, leading to more graceful degradation. With these safety measures in place, you can trust your LangChain agents to perform their tasks effectively and securely.

Share on

X Facebook LinkedIn Bluesky

Coding Rhodes

LangChain Agents with Tools Tutorial: Error Handling and Safety Patterns

LangChain Agents with Tools Tutorial: Error Handling and Safety Patterns

Understanding LangChain Agents and Tools

The Problem: Why Things Go Wrong

Core Concepts of Error Handling

Graceful Degradation

Retry Mechanisms

Timeout Handling

Fallback Strategies

Implementing Error Handling in LangChain Agents

Catching Tool Errors

Retrying Failed Tools

Setting Timeouts for Tools

Implementing Fallback Tools/Responses

Safety Patterns for LangChain Agents

Input Validation

Output Sanitization

Rate Limit Handling

Security Considerations

Advanced Safety Patterns

Human-in-the-Loop

Monitoring and Alerting

Production Safety Frameworks

Putting It All Together: A Comprehensive Example

Learning More

Conclusion

Share on

Leave a comment

You may also enjoy

LangChain Tutorial for Beginners: From Installation to Deployment

LangChain Document Loaders Tutorial: Handle Large Files & Optimize Performance

LangChain Tutorial for Beginners: Understanding Prompts, Models, and Chains

LangChain Tutorial for Beginners: Build a ChatGPT Clone from Scratch