32 minute read

Debugging and Testing LangGraph Applications: Python Tutorial 2026 Edition

Welcome to the Future of LangGraph Debugging and Testing!

This guide will help you understand how to fix problems and check if your LangGraph programs work correctly. We are looking at practices for the year 2026, so you’ll learn modern ways to keep your code running smoothly. You’ll become a pro at debugging testing langgraph python 2026 applications.

Why Debugging and Testing is Super Important

Imagine building a complex robot that needs to follow many steps to make you a snack. If one step is wrong, the robot might make a mess or not give you anything at all. In the same way, your LangGraph programs have many steps, and we need to make sure each one is perfect. This helps you catch mistakes early and build reliable applications.

Setting Up Your Debugging Toolkit

Before you start fixing things, you need the right tools. Think of it like a detective needing their magnifying glass and notebook. We’ll explore some great helpers for LangGraph debugging tools.

Using Print Statements (Your First Debugging Friend)

Sometimes, the easiest way to see what’s happening is to just print messages. You can add print() statements in your LangGraph nodes to show you the current state or variables. This is like asking your robot to tell you what it’s thinking at each step.

Example: Simple Print Debugging in a Node

Let’s imagine you have a simple LangGraph application. You want to see what information is being passed around between your different “brain” parts, which we call nodes. Adding print statements helps you watch the data flow.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
from langgraph.graph import StateGraph, START
from typing import TypedDict, Annotated, List, Union
import operator

# This is like the robot's memory or current status
class AgentState(TypedDict):
    chat_history: List[str] # A list of things said
    current_thought: str    # What the robot is currently thinking
    question: str           # The user's latest question

def print_node_info(state: AgentState):
    """A node that prints the current state for debugging."""
    print(f"--- Entering print_node_info ---")
    print(f"Chat History: {state['chat_history']}")
    print(f"Current Thought: {state['current_thought']}")
    print(f"Question: {state['question']}")
    print(f"--- Exiting print_node_info ---")
    # We return the state, possibly with a small update, to pass it along
    return {"current_thought": "I'm checking the state now."}

def greet_user(state: AgentState):
    """A node that adds a greeting to the chat history."""
    greeting = "Hello there! How can I help you today?"
    print(f"Node: Greet User. Adding: '{greeting}'")
    return {"chat_history": state["chat_history"] + [greeting]}

# Now, let's build a simple graph
workflow = StateGraph(AgentState)
workflow.add_node("print_state", print_node_info) # Add our print node
workflow.add_node("greeter", greet_user) # Add our greeting node

workflow.set_entry_point("greeter") # Start by greeting the user
workflow.add_edge("greeter", "print_state") # After greeting, print the state

# Let's add a simple decision point
def decide_flow(state: AgentState):
    """Decides what to do next based on the current thought."""
    print(f"Node: Deciding next step based on thought: '{state['current_thought']}'")
    if "checking" in state['current_thought'].lower():
        return "end" # If we just checked, maybe we are done
    return "greeter" # Otherwise, maybe loop back or go to another step

workflow.add_conditional_edges(
    "print_state", # From the print_state node
    decide_flow,   # Use decide_flow function to determine next node
    {"end": START, "greeter": "greeter"} # If "end", go to start (or END), else go back to greeter
)
workflow.set_finish_point("print_state") # For this example, we can finish after print_state.

app = workflow.compile()

# Run the graph and watch the prints!
initial_inputs = {"chat_history": [], "current_thought": "initial idea", "question": "Hi LangGraph!"}
print("\n--- Running LangGraph Application ---")
final_output = app.invoke(initial_inputs)
print("--- LangGraph Application Finished ---")
print(f"Final State: {final_output}")

You can see the print messages appear right in your terminal when you run this code. This helps you follow the flow of your program step by step. It’s a quick and easy way to inspect variables at different stages and understand what your LangGraph is doing.

Using a Debugger (Stepping Through Your Code)

A proper debugger is like having a superpower. It lets you pause your program at any point and look at everything inside it. This is super helpful when things get complicated or you can’t figure out why a variable has the wrong value. Python comes with a built-in debugger called pdb, and most development environments like VS Code or PyCharm have excellent visual debuggers.

How to Use pdb with breakpoint()

You can add breakpoint() in your Python code. When your program reaches this line, it will automatically pause. Then, you can type commands to move through your code line by line, or inspect variables.

Let’s add a breakpoint() to one of our nodes.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
import pdb # Not strictly needed if using breakpoint() but good to know
# ... (rest of your LangGraph setup from above) ...

def problematic_node(state: AgentState):
    """A node where we might suspect an issue and want to debug."""
    print("Inside problematic_node: Preparing to process a complex task.")
    intermediate_value = state['question'].upper() # Maybe this upper() isn't always right
    
    # This line will pause your program!
    breakpoint() # Modern way (Python 3.7+). For older Python, use pdb.set_trace()
    
    print(f"Value after initial processing: {intermediate_value}")
    # Imagine more complex logic here that could go wrong
    final_result = f"Processed: {intermediate_value}"
    return {"current_thought": f"Just processed: {final_result}", "chat_history": state["chat_history"] + [final_result]}

# Add this new node to your workflow
workflow_with_breakpoint = StateGraph(AgentState)
workflow_with_breakpoint.add_node("greeter", greet_user)
workflow_with_breakpoint.add_node("problem_solver", problematic_node)
workflow_with_breakpoint.add_node("print_state", print_node_info)

workflow_with_breakpoint.set_entry_point("greeter")
workflow_with_breakpoint.add_edge("greeter", "problem_solver") # Go to problem_solver after greeting
workflow_with_breakpoint.add_edge("problem_solver", "print_state") # Then to print_state

workflow_with_breakpoint.add_conditional_edges(
    "print_state",
    decide_flow, # Using our existing decide_flow function
    {"end": START, "greeter": "greeter"}
)
workflow_with_breakpoint.set_finish_point("print_state") # Finish point for this example.

app_with_breakpoint = workflow_with_breakpoint.compile()

print("\n--- Running LangGraph with Breakpoint ---")
app_with_breakpoint.invoke({"chat_history": [], "current_thought": "initial idea", "question": "what is the best color?"})
print("--- LangGraph with Breakpoint Finished ---")

When you run this, your program will stop at breakpoint(). Your terminal will change, allowing you to type commands. Common commands include n (next line), s (step into a function), c (continue running until the next breakpoint or end), and p <variable_name> (print the value of a variable). This is a very powerful tool for debugging testing langgraph python 2026 applications.

LangSmith Tracing: Your X-Ray Vision for Graphs

LangSmith is an amazing tool that gives you a visual timeline of your LangGraph application. It’s like having an X-ray machine that shows you every single step your graph takes, what information goes into each node, and what comes out. LangSmith tracing is essential for understanding complex flows and seeing exactly where your graph might be making a wrong turn.

You just need to set a few special messages (environment variables) before running your program. Then, your LangGraph runs will automatically appear on the LangSmith website. You can then click on each step to see inputs, outputs, how long it took, and even the thoughts of your AI model. This helps a lot when you are unit testing graphs or looking at integration testing strategies.

Setting up LangSmith

Before you run your Python script, open your terminal and type these lines. Remember to replace YOUR_LANGCHAIN_API_KEY with your actual key from the LangSmith website. You can find more details on setting up LangSmith on the official LangSmith documentation website [https://docs.smith.langchain.com/].

1
2
3
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="YOUR_LANGCHAIN_API_KEY"
export LANGCHAIN_PROJECT="Your LangGraph Project Name" # Give your project a recognizable name

Once these are set, simply run your LangGraph application as usual. LangSmith will then capture all the details of your runs. This makes understanding error handling patterns and performance profiling much easier, as you can see a visual trace of every single operation your graph performs. If you have an internal blog on setting up LangSmith, you could link to it here: [link to your internal LangSmith setup blog post].

Testing Your LangGraph Applications

Debugging helps you find issues that already exist, but testing helps you prevent them from happening in the first place. Think of testing as giving your robot a series of tasks to make sure it can handle them all perfectly, even the tricky ones. We want to test every part of your LangGraph to ensure it’s robust.

Unit Testing Graphs: Testing Small Pieces

Unit tests focus on the smallest, individual parts of your code. For LangGraph, this means testing individual nodes or the functions that decide which path your graph takes. We want to make sure each piece works exactly as expected, all by itself, without relying on other parts of the graph. This is a key practice for robust debugging testing langgraph python 2026 applications.

Using pytest for LangGraph Nodes

pytest is a very popular testing framework for Python. It makes writing tests easy and helps you organize them well. You can write tests for your individual nodes just like you would for any other Python function.

First, you need to install pytest. Open your terminal and type: pip install pytest.

Then, you create a special test file. Usually, these files start with test_ (e.g., test_nodes.py). Let’s assume your LangGraph nodes and AgentState definition are in a file named my_langgraph_app.py.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
# my_langgraph_app.py (This is what your main app file might look like)
# ... (include AgentState, print_node_info, greet_user, decide_flow definitions here) ...
# Example:
# from typing import TypedDict, List, Literal
# class AgentState(TypedDict):
#     chat_history: List[str]
#     current_thought: str
#     question: str
# def greet_user(state: AgentState):
#     greeting = "Hello there!"
#     return {"chat_history": state["chat_history"] + [greeting]}
# def decide_flow(state: AgentState) -> Literal["end", "greeter"]:
#     if "done" in state['current_thought'].lower():
#         return "end"
#     return "greeter"

# test_nodes.py (This is your separate test file)
import pytest
# We need to import the functions we want to test from your main app file
from my_langgraph_app import greet_user, print_node_info, decide_flow, AgentState
from typing import List, TypedDict

# A 'fixture' provides a common starting point for your tests, like a clean slate
@pytest.fixture
def initial_state_fixture() -> AgentState:
    """Provides a consistent initial state for tests."""
    return AgentState(chat_history=[], current_thought="start_test", question="Test Question?")

def test_greet_user_node(initial_state_fixture: AgentState):
    """Tests if the greet_user node correctly adds a greeting."""
    print("\nRunning test_greet_user_node...")
    result_state = greet_user(initial_state_fixture)
    # Check if the chat history now contains the greeting
    assert "Hello there!" in result_state["chat_history"]
    assert len(result_state["chat_history"]) == 1
    # Check that other parts of the state are unchanged (or as expected)
    assert result_state["current_thought"] == "start_test"
    assert result_state["question"] == "Test Question?"

def test_print_node_info_output(initial_state_fixture: AgentState, capsys):
    """Tests if print_node_info prints the correct information to the console."""
    print("\nRunning test_print_node_info_output...")
    print_node_info(initial_state_fixture)
    captured = capsys.readouterr() # `capsys` captures whatever is printed to the screen
    # Check if specific messages we expect were printed
    assert "--- Entering print_node_info ---" in captured.out
    assert "Chat History: []" in captured.out
    assert "Question: Test Question?" in captured.out
    # Also test the return value of the node if it modifies the state
    returned_state = print_node_info(initial_state_fixture)
    assert returned_state["current_thought"] == "I'm checking the state now."

# Example of testing a conditional function separately
def test_decide_flow_end_path():
    """Tests if decide_flow correctly returns 'end' when appropriate."""
    print("\nRunning test_decide_flow_end_path...")
    state_to_end = AgentState(chat_history=[], current_thought="I'm done!", question="")
    next_step = decide_flow(state_to_end)
    assert next_step == "end"

def test_decide_flow_greeter_path():
    """Tests if decide_flow correctly returns 'greeter' for other cases."""
    print("\nRunning test_decide_flow_greeter_path...")
    state_to_continue = AgentState(chat_history=[], current_thought="More to do", question="")
    next_step = decide_flow(state_to_continue)
    assert next_step == "greeter"

To run these tests, save the code above into test_nodes.py (and your LangGraph code in my_langgraph_app.py). Then, open your terminal in the same folder and just type pytest. This is a great example of pytest integration, making sure each small piece of your LangGraph works as expected. If you want to learn more about pytest fixtures, check out this guide: [link to your internal blog post on pytest fixtures].

Mocking LLM Responses: Simulating AI Brains

LangGraph applications often talk to large language models (LLMs), which are like smart AI brains. When you’re unit testing, you don’t want to actually call a real LLM every time. Why? Because it can be slow, expensive, and sometimes give slightly different answers. Mocking LLM responses means pretending to be the LLM and giving a fake, but predictable, answer.

This helps you test your graph’s logic without relying on the actual AI. It makes your tests much faster and more reliable. You can test how your graph behaves if the LLM says “yes,” “no,” or gives a specific piece of information.

How to Mock with unittest.mock

Python’s built-in unittest.mock library is perfect for this. You can replace the real LLM call with a “MagicMock” object. This mock object can be told exactly what to return when it’s called.

Let’s assume you have a node that calls an LLM.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
# my_langgraph_app.py (continued with an LLM node)
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.language_models import BaseChatModel # This is the type for LLMs

class AgentState(TypedDict):
    chat_history: List[str]
    current_thought: str
    question: str
    llm_response: str # To store the LLM's direct answer

def call_llm_node(state: AgentState, llm: BaseChatModel):
    """A node that calls an LLM with the user's question."""
    print(f"Node: Calling LLM with question: '{state['question']}'")
    # In a real app, 'llm' would be an instance of ChatOpenAI, ChatAnthropic, etc.
    response = llm.invoke([HumanMessage(content=state['question'])])
    print(f"Node: LLM responded: '{response.content}'")
    return {
        "llm_response": response.content,
        "chat_history": state["chat_history"] + [f"AI: {response.content}"]
    }

# test_mocking.py (your separate test file)
import pytest
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage # Import specific message types
from my_langgraph_app import call_llm_node, AgentState # Import the node and state

def test_call_llm_node_with_mocked_response():
    """Tests call_llm_node by mocking the LLM's response."""
    print("\nRunning test_call_llm_node_with_mocked_response...")
    mock_llm = MagicMock() # Create a fake LLM object
    
    # Tell the fake LLM what to return when its 'invoke' method is called
    mock_llm.invoke.return_value = AIMessage(content="Mocked LLM Answer: The capital of France is Paris.")

    initial_state = AgentState(
        chat_history=["User: What is the capital of France?"],
        current_thought="query LLM for info",
        question="What is the capital of France?"
    )
    
    # Pass the mocked LLM to your node function
    result_state = call_llm_node(initial_state, mock_llm)

    # Now, check the results
    assert result_state["llm_response"] == "Mocked LLM Answer: The capital of France is Paris."
    assert "AI: Mocked LLM Answer: The capital of France is Paris." in result_state["chat_history"]
    
    # You can also check if the mock LLM was called and with what arguments
    mock_llm.invoke.assert_called_once() # Make sure the LLM was called exactly once
    
    # Check the exact arguments passed to the LLM
    args, kwargs = mock_llm.invoke.call_args
    # We expect a list containing a HumanMessage with the question
    assert len(args[0]) == 1
    assert isinstance(args[0][0], HumanMessage)
    assert args[0][0].content == "What is the capital of France?"

def test_call_llm_node_with_different_mocked_response():
    """Tests the node's behavior with a different mocked LLM response."""
    print("\nRunning test_call_llm_node_with_different_mocked_response...")
    mock_llm = MagicMock()
    mock_llm.invoke.return_value = AIMessage(content="Mocked LLM Answer: I don't know.")

    initial_state = AgentState(
        chat_history=[],
        current_thought="ask unknown",
        question="What is the meaning of life?"
    )
    
    result_state = call_llm_node(initial_state, mock_llm)
    assert "I don't know" in result_state["llm_response"]
    assert "AI: Mocked LLM Answer: I don't know." in result_state["chat_history"]

This way, you can test specific behaviors of your call_llm_node without hitting an actual LLM service. This is critical for efficient debugging testing langgraph python 2026 applications, especially when dealing with cost or rate limits.

Testing Conditional Logic: Making Sure Your Graph Chooses Wisely

LangGraph is all about conditional logic – making decisions about where to go next based on the current state of your application. You need to thoroughly test these decision points to ensure your graph always takes the correct path, no matter what happens. This is like making sure your robot always goes to the kitchen for snacks and not the garage.

Example: Testing a decide_next_step Function

You saw a basic example of decide_flow earlier. Let’s make it a bit more complex and use a powerful pytest feature called parametrize to test many different scenarios easily.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
# my_langgraph_app.py (continued with a more complex decision function)
from typing import Literal

def decide_next_step_complex(state: AgentState) -> Literal["search", "summarize", "respond", "end"]:
    """
    Decides the next action based on the current thought and question.
    - If "search" is in thought, go to search.
    - If "summarize" is in thought, go to summarize.
    - If "done" or "finish" is in thought, end the conversation.
    - Otherwise, default to "respond".
    """
    current_thought_lower = state['current_thought'].lower()
    print(f"Node: Deciding next step for thought: '{state['current_thought']}'")

    if "search" in current_thought_lower:
        return "search"
    elif "summarize" in current_thought_lower:
        return "summarize"
    elif "done" in current_thought_lower or "finish" in current_thought_lower:
        return "end"
    else:
        return "respond"

# test_conditional_logic.py (your separate test file)
import pytest
from my_langgraph_app import decide_next_step_complex, AgentState

@pytest.mark.parametrize(
    "thought_input, expected_next_node",
    [
        ("I need to search for current news.", "search"),
        ("Please summarize our chat so far.", "summarize"),
        ("Okay, I'm done with this conversation.", "end"),
        ("Finish up the interaction now.", "end"),
        ("What else can I help with?", "respond"), # Default path
        ("Search for prices of bananas.", "search"),
        ("Summarize the last 5 messages.", "summarize"),
        ("I'm not sure what to do.", "respond"),
        ("", "respond"), # Empty thought should default
    ]
)
def test_decide_next_step_complex_paths(thought_input: str, expected_next_node: str):
    """
    Tests if the decide_next_step_complex function correctly routes the graph
    based on different 'current_thought' inputs.
    """
    print(f"\nRunning test for thought: '{thought_input}' expecting: '{expected_next_node}'")
    state = AgentState(chat_history=[], current_thought=thought_input, question="dummy_q")
    result = decide_next_step_complex(state)
    assert result == expected_next_node, f"Expected '{expected_next_node}' but got '{result}' for thought '{thought_input}'"

def test_decide_next_step_complex_case_insensitivity():
    """Tests if the decision logic handles different cases correctly."""
    print("\nRunning test_decide_next_step_complex_case_insensitivity...")
    state_search_upper = AgentState(chat_history=[], current_thought="SEARCH this!", question="")
    assert decide_next_step_complex(state_search_upper) == "search"

    state_summarize_mixed = AgentState(chat_history=[], current_thought="SuMmArIzE now", question="")
    assert decide_next_step_complex(state_summarize_mixed) == "summarize"

The pytest.mark.parametrize decorator is incredibly useful. It lets you run the same test multiple times with different inputs (thought_input) and different expected outputs (expected_next_node). This is perfect for thoroughly testing conditional logic, making sure every possible path in your graph is handled correctly. If you’re building a complex agent, you might want to look at a separate blog post dedicated to [internal link to your blog post on building robust conditional agents].

Integration Testing Strategies: Testing the Whole Journey

While unit tests check small parts, integration tests check how different parts of your LangGraph work together. It’s like checking if all the robot’s arms, legs, and sensors communicate properly to make a snack, from start to finish. These tests make sure that when nodes are connected in a workflow, they pass information correctly and the overall flow works as intended.

Running the Full Graph with Specific Scenarios

You want to run your entire compiled LangGraph application with various starting states and expected outcomes. This helps catch issues that only appear when nodes interact or when information is transformed as it moves through the graph.

For integration tests, you’ll compile your actual LangGraph workflow and invoke it. You will then check the final state or the side effects (like database writes, external API calls - though these would typically be mocked).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
# my_langgraph_app.py (Full workflow definition needed for integration tests)
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated, List, Union, Literal
import operator
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.language_models import BaseChatModel

# Assume AgentState, greet_user, print_node_info, call_llm_node, decide_next_step_complex are defined above.
# For a full example, you'd have these copied or imported.

# Simplified LangGraph for integration testing
class AgentState(TypedDict):
    chat_history: List[str]
    current_thought: str
    question: str
    llm_response: str
    summary: str

def greet_user(state: AgentState):
    greeting = "Hello from the graph!"
    return {"chat_history": state["chat_history"] + [f"AI: {greeting}"]}

def call_llm_node(state: AgentState, llm: BaseChatModel):
    response = llm.invoke([HumanMessage(content=state['question'])])
    return {"llm_response": response.content, "chat_history": state["chat_history"] + [f"AI: {response.content}"]}

def summarize_node(state: AgentState, llm: BaseChatModel):
    # In a real app, this would use the LLM to summarize chat_history
    print(f"Node: Summarizing chat history: {state['chat_history']}")
    mock_summary = f"Summary of {len(state['chat_history'])} messages."
    return {"summary": mock_summary, "chat_history": state["chat_history"] + [f"AI: {mock_summary}"]}

def decide_next_step_integration(state: AgentState) -> Literal["call_llm", "summarize_chat", "end_conversation"]:
    thought_lower = state['current_thought'].lower()
    if "summarize" in thought_lower:
        return "summarize_chat"
    elif "question" in thought_lower and state['question']:
        return "call_llm"
    return "end_conversation"

# Build the main workflow
integration_workflow = StateGraph(AgentState)
integration_workflow.add_node("greet", greet_user)
integration_workflow.add_node("call_llm", call_llm_node)
integration_workflow.add_node("summarize_chat", summarize_node)

integration_workflow.set_entry_point("greet")
integration_workflow.add_edge("greet", "call_llm") # After greeting, assume we always ask LLM for initial example

integration_workflow.add_conditional_edges(
    "call_llm",
    decide_next_step_integration,
    {
        "call_llm": "call_llm", # Loop for more LLM calls
        "summarize_chat": "summarize_chat",
        "end_conversation": END
    }
)
integration_workflow.add_edge("summarize_chat", END) # After summarizing, we end.

# You would compile this in your main application logic or just before testing
# app_for_integration_test = integration_workflow.compile()

# test_integration.py (your separate test file)
import pytest
from unittest.mock import MagicMock, patch
from my_langgraph_app import integration_workflow, AgentState, HumanMessage, AIMessage # Import your workflow and state

# Compile the graph once for all integration tests in this file
@pytest.fixture(scope="module")
def compiled_app_for_integration():
    """Compiles the LangGraph workflow once per test module."""
    return integration_workflow.compile()

@pytest.fixture
def mock_llm_for_integration():
    """Provides a consistent mock LLM for integration tests."""
    mock_llm = MagicMock()
    # Default response for general LLM calls
    mock_llm.invoke.return_value = AIMessage(content="Default mocked LLM answer to your question.")
    return mock_llm

def test_full_graph_simple_question_flow(compiled_app_for_integration, mock_llm_for_integration):
    """
    Tests a complete simple flow: greet -> LLM question -> end.
    The LLM is mocked to ensure predictable behavior.
    """
    print("\nRunning test_full_graph_simple_question_flow...")
    initial_state = AgentState(chat_history=[], current_thought="question for LLM", question="What is your favorite color?")

    # LangGraph's invoke accepts an 'input' and a 'config'.
    # We pass our mocked LLM through the config, specifically as part of the 'configurable' key.
    # The 'llm' key in configurable should match what your nodes expect.
    config = {"configurable": {"llm": mock_llm_for_integration}}
    
    # We also limit recursion to prevent infinite loops in tests if logic is wrong
    final_state = compiled_app_for_integration.invoke(initial_state, config=config, recursion_limit=5)

    assert "Hello from the graph!" in final_state["chat_history"][0]
    assert "Default mocked LLM answer to your question." in final_state["llm_response"]
    assert "AI: Default mocked LLM answer to your question." in final_state["chat_history"][-1]
    mock_llm_for_integration.invoke.assert_called_once() # Ensure LLM was called

def test_full_graph_summarize_path(compiled_app_for_integration, mock_llm_for_integration):
    """
    Tests a flow where the graph is routed to the 'summarize' node and then ends.
    We need to make sure decide_next_step_integration routes correctly.
    """
    print("\nRunning test_full_graph_summarize_path...")
    initial_state = AgentState(
        chat_history=["AI: Hello from the graph!", "AI: Default mocked LLM answer to your question."],
        current_thought="summarize the conversation",
        question="Please summarize."
    )
    
    config = {"configurable": {"llm": mock_llm_for_integration}} # Still need the LLM mock
    final_state = compiled_app_for_integration.invoke(initial_state, config=config, recursion_limit=5)

    assert "Hello from the graph!" in final_state["chat_history"][0]
    assert "Default mocked LLM answer to your question." in final_state["llm_response"] # From initial LLM call if graph loops
    assert "Summary of" in final_state["summary"]
    assert "AI: Summary of 2 messages." in final_state["chat_history"][-1] # The summarize node should update this
    # Note: the specific content of chat_history might vary based on how many times LLM is called
    # before summarize_node based on your decision logic.
    # This test assumes it goes greet -> LLM (once) -> summarize.
    # The actual LLM call count depends on graph structure. For this specific flow, only `summarize_node` might need `llm`.

    # Let's adjust mock for summarize_node if it uses LLM:
    # We mocked `call_llm_node` and `summarize_node` to not actually call an LLM.
    # If `summarize_node` *did* call an LLM, you'd configure the mock_llm_for_integration differently
    # or use another patch for that specific LLM call.

Remember, for integration tests involving LLMs, you often still want to mock the LLM to keep tests fast and predictable. The goal here is to test the flow and interactions between your nodes, not the LLM’s raw intelligence. If you need more complex mocking strategies, you can read our blog post on [internal link to your advanced mocking strategies for complex agents].

Advanced Debugging and Testing Techniques

Now that you’ve got the basics down, let’s look at more advanced ways to keep your LangGraph robust and performing well. These techniques are very important for debugging testing langgraph python 2026 applications, especially as they grow larger and more critical.

Logging Best Practices: Keeping a Diary of Your App

Logging is like your application writing a detailed diary. Instead of just printing to the screen, logs store information about what your program is doing, important events, warnings, and errors in a structured way. This is invaluable for understanding what happened after your program has finished running, especially when it’s deployed and you’re not actively watching it.

Python’s logging module is very powerful. You can configure it to save messages to files, send them to external services, and filter them by importance level.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import logging
import time # For simulated delay

# Configure basic logging. In a real app, this would be more detailed,
# possibly reading from a config file.
# We set it to INFO level, meaning we see INFO, WARNING, ERROR, CRITICAL messages.
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__) # Get a logger for this module/file

# (Using AgentState from previous examples)
# class AgentState(TypedDict): ...

def logged_and_robust_node(state: AgentState):
    """A node that uses logging and includes basic error handling."""
    logger.info(f"Entered logged_and_robust_node to process: {state['question']}")
    try:
        # Simulate some processing that might fail
        if "error" in state['question'].lower():
            logger.warning("Simulating an error condition based on input.")
            raise ValueError("Simulated error: Problem with question processing.")
        
        # Simulate work
        time.sleep(0.1) 
        processed_data = state['question'].strip() + " [PROCESSED]"
        
        logger.debug(f"Successfully processed question. Intermediate data: {processed_data}") # DEBUG level, won't show with INFO level config
        return {"current_thought": f"Processed: {processed_data}", "chat_history": state["chat_history"] + [f"AI: {processed_data}"]}
    except ValueError as ve:
        logger.error(f"Specific processing error in node: {ve}", exc_info=True) # exc_info adds the full traceback
        # You might return an error state or propagate
        return {"current_thought": "Failed to process question", "error": str(ve)}
    except Exception as e:
        logger.critical(f"An unexpected critical error occurred: {e}", exc_info=True)
        return {"current_thought": "Critical system error", "error": str(e)}

# You would integrate this node into your LangGraph workflow.
# For example:
# workflow.add_node("robust_processor", logged_and_robust_node)
# workflow.add_edge("previous_node", "robust_processor")

By using different log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL), you can control how much detail you see. For example, during development, you might set the level to DEBUG to see everything. In production, you might set it to INFO or WARNING to only see important messages and errors. For more information on setting up advanced logging, you can refer to the official Python logging documentation [https://docs.python.org/3/library/logging.html].

Error Handling Patterns: What to Do When Things Go Wrong

No program is perfect, and errors will happen. Good error handling patterns mean your LangGraph can gracefully deal with problems instead of crashing entirely. This is about catching errors inside your nodes or at the graph level and deciding what to do next. It’s like having a backup plan for your robot if it can’t find the snacks.

In LangGraph, you might implement different strategies:

  1. Catch and Log: Catch an error in a node, log it, and perhaps return a special “error” state for the graph to handle.
  2. Retry: If an external service fails temporarily (like an LLM API timeout), you might try again after a short delay.
  3. Fallback: If one method fails, try an alternative, like providing a default answer instead of an LLM response.
  4. Terminate Gracefully: If an error is unrecoverable, ensure your graph stops cleanly and reports the issue.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import time
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type # pip install tenacity

# (Assume AgentState and BaseChatModel from before)
# class AgentState(TypedDict): ...

@retry(stop=stop_after_attempt(3), wait=wait_fixed(2), retry=retry_if_exception_type(Exception))
def _reliable_llm_call(llm_client: BaseChatModel, messages: List[HumanMessage]):
    """Internal function to call LLM with retries."""
    print(f"  Attempting LLM call for: {messages[0].content}")
    # Simulate an occasional failure
    if time.time() % 7 < 2: # Fail for ~2/7ths of the time
        raise ConnectionError("Simulated LLM connection error!")
    response = llm_client.invoke(messages)
    return response

def robust_llm_node_with_retries(state: AgentState, llm: BaseChatModel):
    """
    A LangGraph node that calls an LLM with built-in retry logic.
    If all retries fail, it returns an error state.
    """
    logger.info(f"Entered robust_llm_node_with_retries for question: {state['question']}")
    try:
        llm_messages = [HumanMessage(content=state['question'])]
        response = _reliable_llm_call(llm, llm_messages)
        logger.info("LLM call successful.")
        return {"llm_response": response.content, "chat_history": state["chat_history"] + [f"AI: {response.content}"]}
    except Exception as e:
        logger.error(f"LLM call permanently failed after retries for question: {state['question']} - Error: {e}", exc_info=True)
        # Fallback action: return a predefined error message or switch to a different node
        fallback_message = "I'm sorry, I couldn't connect to my brain. Please try again later."
        return {"llm_response": fallback_message, "chat_history": state["chat_history"] + [f"AI: {fallback_message}"], "error_flag": True}

# In your workflow, you would integrate this:
# workflow.add_node("llm_caller_robust", robust_llm_node_with_retries)
# And then have conditional edges based on 'error_flag' in the state.
# For example, if 'error_flag' is True, route to a human handoff node.

This example shows a retry mechanism using the tenacity library, which simplifies adding retries to functions. You can build even more sophisticated error handling into your LangGraph, perhaps with dedicated error handling nodes that take action when error_flag is set. Learn more about advanced error handling in Python in our dedicated article: [internal link to your blog on Python error handling].

Performance Profiling: Making Your Graph Faster

Sometimes your LangGraph might run slowly. Performance profiling helps you find the bottlenecks – the parts of your code that are taking the most time to execute. This is like finding out which part of your robot is moving slowest and needs a speed boost.

Tools like cProfile (built-in Python profiler) or SnakeViz (for visualizing cProfile output) can show you exactly where time is being spent. LangSmith also provides excellent timing information for each node and step, which is a great starting point for performance profiling, offering a visual breakdown of your graph’s execution time.

Basic Profiling with cProfile

You can use cProfile to run your entire LangGraph application and then get a report of which functions took the most time.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import cProfile
import pstats
from my_langgraph_app import integration_workflow, AgentState, mock_llm_for_integration # Assuming these are available or mocked

def run_profiled_graph(app_to_profile, initial_input, config_for_app):
    """A helper function to run the graph that we want to profile."""
    print("\n--- Running profiled LangGraph ---")
    app_to_profile.invoke(initial_input, config=config_for_app, recursion_limit=10)
    print("--- Profiled LangGraph Finished ---")

# Let's mock an LLM client for profiling, so we don't hit real APIs
from unittest.mock import MagicMock, patch
mock_llm_for_profiling = MagicMock()
mock_llm_for_profiling.invoke.return_value = AIMessage(content="Mocked LLM profiling response.")

# Compile your workflow for profiling
app_for_profiling = integration_workflow.compile()

initial_inputs_profile = AgentState(chat_history=[], current_thought="start_profile", question="Profile this run.")
config_profile = {"configurable": {"llm": mock_llm_for_profiling}}

# To run profiling
profiler = cProfile.Profile()
profiler.enable() # Start recording
run_profiled_graph(app_for_profiling, initial_inputs_profile, config_profile)
profiler.disable() # Stop recording

# Print results to the console, showing the top 10 functions by cumulative time
stats = pstats.Stats(profiler).sort_stats('cumtime') # Sort by the total time spent in a function AND all functions it calls
print("\n--- Profiling Results (Top 10 by Cumulative Time) ---")
stats.print_stats(10) # Print only the top 10 entries

# You can also save the stats to a file and visualize them with SnakeViz:
# stats.dump_stats("profile_results.prof")
# Then run: snakeviz profile_results.prof

Analyzing these statistics helps you pinpoint functions or nodes that are consuming too much time. The cumtime (cumulative time) is especially useful as it shows the total time spent inside a function and all the functions it calls. You can also monitor performance over time in LangSmith to spot any slowdowns or regressions in your production applications.

CI/CD for LangGraph: Automating Quality

CI/CD stands for Continuous Integration and Continuous Delivery (or Deployment). It’s about automating the process of building, testing, and deploying your LangGraph application. This ensures that every change you make to your code is automatically checked for errors and can be quickly and reliably released. It’s like having an automated factory that checks every robot part before assembly.

What is CI? (Continuous Integration)

Continuous Integration means that developers frequently merge their code changes into a central repository (like GitHub or GitLab). After each merge, an automated system builds the application and runs all your tests (unit tests, integration tests, etc.). If any test fails, you know immediately, helping you fix issues quickly.

For LangGraph, this means:

  • Automated pytest runs: Every time you push code, all your test_*.py files are executed.
  • Linting/Code Quality Checks: Tools like flake8 or black ensure your code follows style guidelines, making it easier to read and maintain.
  • Security Scans: Checks your project’s dependencies for known security vulnerabilities.

This ensures that your debugging testing langgraph python 2026 efforts are backed by consistent quality checks, catching bugs before they become big problems.

What is CD? (Continuous Delivery/Deployment)

Continuous Delivery means that after the CI checks pass, your application is always in a state where it’s ready to be released to users at any time. Continuous Deployment takes it a step further, automatically deploying every successful change to your users.

For LangGraph, this could mean:

  • Automatically deploying your LangGraph microservice to a cloud server (e.g., AWS Lambda, Kubernetes, or a serverless platform).
  • Updating your running application with new nodes or improved conditional logic without needing a person to manually intervene.
  • Monitoring the health and performance of your deployed graph with tools like LangSmith.

Platforms like GitHub Actions, GitLab CI/CD, and Jenkins are popular for setting up these automated pipelines. You would create a special configuration file (often a .yml file) in your project’s repository that tells the CI/CD system exactly what steps to take.

Example: Simple GitHub Actions Workflow (.github/workflows/main.yml)

This file goes into a .github/workflows folder in your project. It tells GitHub to run specific commands whenever someone pushes new code or creates a pull request.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
name: LangGraph CI/CD

on:
  push:
    branches:
      - main # Run on pushes to the main branch
  pull_request:
    branches:
      - main # Run on pull requests targeting the main branch

jobs:
  build-and-test:
    runs-on: ubuntu-latest # Use a fresh Linux environment for each run
    steps:
    - name: Checkout code
      uses: actions/checkout@v3 # Get your code from the repository

    - name: Set up Python
      uses: actions/setup-python@v4 # Install Python
      with:
        python-version: '3.11' # Specify the Python version you use

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt # Install all packages listed in your requirements.txt
        # Also explicitly install LangGraph and related tools if not in requirements.txt
        pip install langgraph langchain-core langchain pytest tenacity python-dotenv

    - name: Run Pytest tests
      run: |
        pytest # This command will find and run all your test_*.py files
      env:
        LANGCHAIN_TRACING_V2: "true"
        # Access your API key securely from GitHub Secrets
        LANGCHAIN_API_KEY: $ 
        LANGCHAIN_PROJECT: "LangGraph-CI-Project" # A name for traces in LangSmith

    # You could add more steps here, for example, for linting, security scans, or deployment
    # - name: Run Linting (e.g., Flake8)
    #   run: |
    #     pip install flake8
    #     flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
    # - name: Deploy to staging environment (example, this would be more complex)
    #   if: success() && github.ref == 'refs/heads/main' # Only deploy if tests pass and on main branch
    #   run: |
    #     echo "Tests passed! Deploying LangGraph application to staging..."
    #     # Your deployment commands would go here, e.g., using Docker, serverless CLI, etc.
    #     # For example: aws lambda update-function-code --function-name MyLangGraphAgent --zip-file fileb://package.zip

This workflow automatically runs your tests whenever you push code or open a pull request. This is a robust way to ensure the quality of your debugging testing langgraph python 2026 applications before they reach users. You can find more details on setting up GitHub Actions on their official documentation [https://docs.github.com/en/actions]. For more advanced CI/CD patterns specifically for AI agents, you might want to read our article on [internal link to your CI/CD for AI Agents blog post].

Keeping Your LangGraph Healthy in 2026 and Beyond

You’ve learned about many powerful ways to debug and test your LangGraph applications. From simple print statements and interactive debuggers to visual tracing with LangSmith, and automated testing with pytest and CI/CD pipelines, each tool helps you build more reliable and robust AI agents. Remember, continuous testing and mindful debugging are keys to long-term success.

Tips for Future-Proofing Your LangGraph

  • Stay Updated: The LangGraph library and its surrounding ecosystem are evolving fast. Keep an eye on new features, bug fixes, and best practices. Regularly update your libraries.
  • Modular Design: Design your LangGraph nodes to be small, focused, and do one thing well. This makes them much easier to test in isolation, reuse in other graphs, and understand when debugging.
  • Document Everything: Write down how your graph works, what each node does, what inputs it expects, and how to test it. Good documentation helps others (and your future self!) quickly understand and maintain the code.
  • Monitor in Production: Don’t stop testing and debugging once your application is live. Use tools like LangSmith for continuous tracing and monitoring to keep an eye on your live applications. Spot problems, performance issues, and unexpected behaviors before your users do.

By following these steps, you’ll be well-equipped to build, debug testing langgraph python 2026, and maintain powerful, intelligent LangGraph applications for years to come.

Leave a comment