
An Evolving DAG for the LLM world - Julia Schottenstein of LangChain at Small Data SF 2024

2024/09/24

Building Agentic Systems with LangChain: From DAGs to Directed Cyclic Graphs

LangChain has emerged as a popular open-source framework for Python and TypeScript developers looking to build agentic systems that combine the power of Large Language Models (LLMs) with organizational data. The framework addresses a fundamental challenge in AI application development: while LLMs possess incredible capabilities, they lack context about specific businesses, applications, and recent events beyond their training cutoff dates.

Augmenting LLMs with Context and Tools

The core value proposition of LangChain lies in helping developers augment LLMs with:

Private documents and data for domain-specific reasoning
Tool usage through APIs with defined instructions
Up-to-date information to overcome training data limitations

This augmentation typically happens through chains: discrete, ordered steps that resemble the directed acyclic graphs (DAGs) used in data pipelines. The most common implementation is the Retrieval-Augmented Generation (RAG) chain, where:

A question enters the system
Relevant context is retrieved from a vector database
The question, context, and prompt instructions are sent to the LLM
The LLM generates a context-aware response
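The four RAG steps above can be sketched in plain Python. This is a minimal illustration, not LangChain's API: the `DOCUMENTS` list, the word-overlap `retrieve` function (a stand-in for vector similarity search), and the `fake_llm` stub are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical in-memory documents; a real system would query a vector database.
DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
    "Orders ship from the Denver warehouse.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question (stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine instructions, retrieved context, and the question into one prompt."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def rag_chain(question: str, llm: Callable[[str], str]) -> str:
    """Run the fixed sequence: retrieve, build the prompt, call the model."""
    context = retrieve(question, DOCUMENTS)
    prompt = build_prompt(question, context)
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stub model: returns the first context line instead of calling a real LLM."""
    return prompt.splitlines()[1]


answer = rag_chain("How long do refunds take?", fake_llm)
```

Each step runs exactly once and in a fixed order, which is what makes the chain behave like a DAG.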

The Evolution from Chains to Agents

While chains provide reliable, predetermined workflows, the future of AI applications increasingly points toward agents. In technical terms, an agent represents a system where the LLM decides the control flow dynamically rather than following predefined code paths. This fundamental shift transforms traditional DAGs into directed graphs that can include cycles.

The ability to iterate and learn from failures becomes crucial in agent design. Unlike deterministic code that produces identical results on repeated execution, LLMs can improve their performance on subsequent attempts by understanding what went wrong in previous iterations. This capability is exemplified by sophisticated code generation agents that:

Reflect on problems before execution
Generate and test multiple solution approaches
Iteratively refine outputs based on test results
Create cycles in their execution graphs for continuous improvement
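A generate, test, and refine cycle of this kind might look like the sketch below. The `fake_generate` function is a hypothetical stand-in for an LLM call: it returns a wrong solution first and a corrected one once it receives failure feedback. The test cases and attempt limit are likewise assumptions for illustration.

```python
from typing import Callable, Optional


def run_tests(candidate: Callable[[int], int]) -> Optional[str]:
    """Return None if all tests pass, otherwise a failure message to feed back."""
    for x, expected in [(0, 0), (2, 4), (3, 9)]:
        got = candidate(x)
        if got != expected:
            return f"f({x}) returned {got}, expected {expected}"
    return None


def fake_generate(feedback: Optional[str]) -> Callable[[int], int]:
    """Stand-in for an LLM: the first attempt doubles, the retry squares."""
    if feedback is None:
        return lambda x: x * 2
    return lambda x: x * x


def code_agent(max_attempts: int = 3) -> tuple[Callable[[int], int], int]:
    """Loop generate -> test -> refine until tests pass or attempts run out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = fake_generate(feedback)
        feedback = run_tests(candidate)  # the edge back to generation is the cycle
        if feedback is None:
            return candidate, attempt
    raise RuntimeError(f"no passing solution after {max_attempts} attempts")


solution, attempts_used = code_agent()
```

The attempt limit is a design choice worth keeping in any real version: without it, a cycle in the graph can run indefinitely.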

Key Challenges in Building Reliable Agents

Planning and Reflection

Research demonstrates that agents perform significantly better when given explicit planning and reflection steps. Like a rock climber surveying potential routes before ascending, agents benefit from evaluating possible paths before execution. This pre-processing phase allows for more strategic decision-making and improved task completion rates.

Memory Management

Complex agent systems often involve multiple specialized sub-agents collaborating on tasks. This kind of design, referred to as a cognitive architecture, requires sophisticated memory management to:

Maintain shared state between agents
Preserve context across multiple sessions
Enable agents to learn from previous attempts
Facilitate collaboration in multi-agent workflows

Reliability Concerns

Agent reliability faces several obstacles:

LLM non-determinism in response generation
Task ambiguity from natural language inputs
Tool misuse when agents get stuck in repetitive patterns or select inappropriate APIs
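One way to contain the tool-misuse problem is a small guard that sits between the agent and its tools. The sketch below is an assumption about how such a guard could work, not something described in the talk: the `ToolGuard` class, its allow-list, and the repeat limit are hypothetical names and choices.

```python
import json
from collections import Counter


class ToolGuard:
    """Rejects calls to unknown tools and identical calls repeated too often."""

    def __init__(self, allowed_tools: set[str], max_repeats: int = 2) -> None:
        self.allowed_tools = set(allowed_tools)
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def check(self, tool: str, arguments: dict) -> str:
        """Return "ok" or a rejection reason for the proposed tool call."""
        if tool not in self.allowed_tools:
            return "reject: unknown tool"
        # Serialize arguments deterministically so identical calls share one key.
        key = (tool, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return "reject: repeated call"
        return "ok"


guard = ToolGuard({"search"}, max_repeats=2)
results = [guard.check("search", {"q": "weather"}) for _ in range(3)]
```

A rejection reason can be returned to the model as feedback so that it tries a different tool or different arguments on the next step.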

Balancing Flexibility and Control with LangGraph

LangGraph represents LangChain's solution to the flexibility-reliability trade-off. This orchestration framework introduces several key innovations for agent development:

Controllability

The framework supports both explicit and implicit workflows, allowing developers to define guardrails while maintaining agent autonomy. This hybrid approach enables more predictable behavior without sacrificing the adaptive capabilities that make agents powerful.
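The mix of explicit and model-chosen control flow can be illustrated with a tiny graph runner. This is a plain-Python sketch and not LangGraph's API: the node names, `FIXED_EDGES`, `ALLOWED_CHOICES`, and the `choose` callback (standing in for an LLM routing decision) are all assumptions.

```python
from typing import Callable

# Explicit edges: the developer fixes what comes next.
FIXED_EDGES = {"plan": "act", "act": "review"}
# Decision points: the model chooses, but only among these options.
ALLOWED_CHOICES = {"review": {"act", "finish"}}


def next_node(current: str, choose: Callable[[str, set[str]], str]) -> str:
    """Follow a fixed edge if one exists; otherwise let the model pick a route."""
    if current in FIXED_EDGES:
        return FIXED_EDGES[current]
    options = ALLOWED_CHOICES[current]
    choice = choose(current, options)
    # Guardrail: an off-graph choice falls back to finishing.
    return choice if choice in options else "finish"


def run(choose: Callable[[str, set[str]], str], max_steps: int = 10) -> list[str]:
    """Walk the graph from "plan" until "finish" or the step limit is reached."""
    path = ["plan"]
    while path[-1] != "finish" and len(path) < max_steps:
        path.append(next_node(path[-1], choose))
    return path


# A scripted "model" that loops back once, then finishes.
decisions = iter(["act", "finish"])
path = run(lambda node, options: next(decisions))
```

Here the `review` to `act` edge is the cycle, and the allow-list plus the step limit are the guardrails around it.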

Persistence Layer

A robust persistence layer provides shared memory and state management, essential for both individual agent sessions and multi-agent collaboration scenarios. This ensures continuity and context preservation across complex workflows.
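The idea behind a persistence layer can be sketched as a checkpoint store keyed by a conversation identifier. The example below is a simplified in-memory illustration under stated assumptions: `CheckpointStore`, `handle_turn`, and the `thread_id` key are hypothetical names, and a production system would write to a database rather than a dictionary.

```python
import json


class CheckpointStore:
    """Saves and restores agent state per thread, serialized as JSON."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._data[thread_id] = json.dumps(state)

    def load(self, thread_id: str) -> dict:
        raw = self._data.get(thread_id)
        # A thread with no checkpoint starts from an empty state.
        return json.loads(raw) if raw is not None else {"messages": []}


def handle_turn(store: CheckpointStore, thread_id: str, user_message: str) -> dict:
    """Load the thread's state, record the new message, and checkpoint it."""
    state = store.load(thread_id)
    state["messages"].append(user_message)
    store.save(thread_id, state)
    return state


store = CheckpointStore()
handle_turn(store, "thread-a", "Book a flight to Denver")
state_a = handle_turn(store, "thread-a", "Make it a morning flight")
state_b = handle_turn(store, "thread-b", "What is my order status?")
```

Because state is checkpointed after every turn, a later session (or a different agent given the same thread identifier) can pick up where the previous one stopped.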

Human-in-the-Loop Capabilities

Recognizing that fully autonomous agents may struggle with complex tasks, LangGraph incorporates human steering mechanisms. This allows users to guide agents when they encounter difficulties or make routing errors, improving overall task completion rates.
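A human steering step can be as simple as a checkpoint where a person accepts or overrides the route the agent proposes. The following sketch assumes a hypothetical `steer` function and a `review` callback; in an interactive application the callback might prompt a user, while here it is scripted.

```python
from typing import Callable, Optional


def steer(
    proposed: str,
    valid_routes: set[str],
    review: Callable[[str], Optional[str]],
) -> str:
    """Return the agent's proposed route unless the human reviewer overrides it."""
    override = review(proposed)
    if override is None:
        return proposed  # the human accepted the agent's choice
    if override not in valid_routes:
        raise ValueError(f"unknown route: {override}")
    return override


ROUTES = {"billing", "support"}

# The reviewer accepts the proposal by returning None.
accepted = steer("billing", ROUTES, lambda proposal: None)
# The reviewer corrects a routing error by naming a different route.
corrected = steer("billing", ROUTES, lambda proposal: "support")
```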

Streaming Support

To address latency concerns and improve user experience, the framework supports both token-by-token streaming and visibility into intermediate steps. This transparency helps users understand the agent's problem-solving process, which is particularly important when operations take a long time to complete.
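Both kinds of streaming can be modeled with a Python generator that yields events as they happen. This is an illustrative sketch only: the `stream_agent` function, the event tuple format, and the step names are assumptions, and the "tokens" are simply words from a canned reply.

```python
from typing import Iterator


def stream_agent(question: str) -> Iterator[tuple[str, str]]:
    """Yield ("step", name) for intermediate steps and ("token", text) for output."""
    yield ("step", "retrieving context")
    yield ("step", "generating answer")
    # Stand-in for model output arriving one token at a time.
    for token in f"You asked: {question}".split():
        yield ("token", token)


events = list(stream_agent("hi there"))
steps = [text for kind, text in events if kind == "step"]
reply = " ".join(text for kind, text in events if kind == "token")
```

A user interface would consume the generator lazily, showing step events as progress indicators and appending token events to the visible reply as they arrive.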

Real-World Agent Applications

Several production agents demonstrate the practical applications of these concepts:

Roblox Studio AI creates entire virtual worlds from natural language prompts, generating scripts and assets automatically
TripAdvisor's travel agent builds personalized itineraries based on user preferences, group size, and travel dates
Replit's coding agent generates code, creates tests, and automates pull request creation

These applications showcase how agents can move beyond simple chat interfaces to become sophisticated task-completion systems that understand context, iterate on solutions, and deliver tangible value to users.

The Future of AI Orchestration

The evolution from traditional DAGs to directed cyclic graphs represents a fundamental shift in how we approach AI application development. While DAGs remain valuable for deterministic data pipelines, the ability to incorporate cycles opens new possibilities for building intelligent systems that can plan, reflect, and improve through iteration. As agent technology continues to mature, frameworks like LangChain and LangGraph provide the necessary tools to build reliable, flexible, and powerful agentic applications that can tackle increasingly complex real-world problems.
