dummy_agent_library - different results each time

#112
by vsrinivas - opened

The last cell's output differs each time, even though we run all the code with the same system prompt every time. It does not give the intended output of the get_weather() function; it keeps hallucinating. Does anyone else see this inconsistency? Try running the code from the system prompt repeatedly.

vsrinivas changed discussion status to closed
vsrinivas changed discussion status to open

The non-determinism you're seeing in dummy_agent_library is expected behavior and actually a useful learning artifact for the course. The default temperature settings on most LLM backends mean you'll get different tool call sequences, different intermediate reasoning, and sometimes different final answers even on identical prompts. This is a feature of the probabilistic nature of the underlying model, not a bug in the notebook scaffolding.

That said, there's a meaningful distinction worth understanding between acceptable variance (different phrasings of the same correct answer) and problematic variance (different tool invocations leading to inconsistent outcomes). If you're seeing the latter, where the agent takes structurally different action paths, that's worth logging carefully. Keep in mind that in multi-step agent loops the variance compounds: a slightly different first tool call can cascade into a completely different execution trajectory by step 3 or 4. Setting temperature=0 in your LLM call will give you more reproducible behavior for debugging purposes, though it won't eliminate all non-determinism if you're hitting APIs with their own sampling or batching layers underneath.
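To see concretely why temperature=0 makes output reproducible, here's a small self-contained sketch of temperature sampling. The logits and token indices are made up for illustration; real backends do this inside the inference server, and this is not code from dummy_agent_library:

```python
# Why temperature=0 is reproducible: greedy decoding always picks the
# argmax token, while temperature sampling draws from a softened
# distribution, so different random states give different tokens.
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at the given temperature."""
    if temperature == 0:
        # Greedy decoding: deterministic argmax, same result every run.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature scaling, then sample stochastically.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.3]  # pretend scores for three candidate tokens

# 20 "runs" with different random states:
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(20)}
sampled = {sample_token(logits, 1.0, random.Random(i)) for i in range(20)}
print(greedy)   # always {0}: one token, every run
print(sampled)  # typically several distinct tokens across runs
```

The same principle applies when you pass temperature through whatever config the notebook's LLM client exposes; the sampling just happens server-side, which is also why batching effects can still leak in a little non-determinism even at temperature 0.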

One thing that becomes increasingly important as you move from toy agents to production systems is tracking which agent instance produced which output, especially when you're chaining agents or comparing runs. This is where agent identity infrastructure matters: tools like AgentGraph are built around the idea that each agent invocation should carry a verifiable identity and a trace, so you can audit why two runs diverged. For the course notebooks this is overkill, but it's worth keeping in mind as the complexity scales. The PostHog team wrote recently about this class of debugging problem in their "what we wish we knew about building AI agents" post; reproducibility and observability are consistently the pain points people underestimate early on.
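A minimal version of that tracing idea is easy to sketch. Everything below (TraceRecorder, first_divergence, the tool names) is illustrative, not an API from dummy_agent_library or AgentGraph:

```python
# Minimal per-run tracing: tag every tool call with a run id so two
# stochastic runs of the same prompt can be diffed after the fact.
import uuid

class TraceRecorder:
    """Records tool calls for one agent run under a unique run id."""
    def __init__(self):
        self.run_id = uuid.uuid4().hex  # unique identity per agent run
        self.events = []

    def record(self, step, tool, args, result):
        self.events.append({"run_id": self.run_id, "step": step,
                            "tool": tool, "args": args, "result": result})

def first_divergence(trace_a, trace_b):
    """Return the first step where two runs invoked different tool calls."""
    for ea, eb in zip(trace_a.events, trace_b.events):
        if (ea["tool"], ea["args"]) != (eb["tool"], eb["args"]):
            return ea["step"]
    return None  # traces agree step-for-step (over their common prefix)

# Two hypothetical runs of the same prompt that diverge at step 2:
a, b = TraceRecorder(), TraceRecorder()
a.record(1, "get_weather", {"city": "London"}, "15C")
b.record(1, "get_weather", {"city": "London"}, "15C")
a.record(2, "final_answer", {"text": "15C"}, None)
b.record(2, "search", {"q": "weather London"}, "...")
print(first_divergence(a, b))  # -> 2
```

Even this toy version answers the practical question from the original post: not just "did the runs differ" but "at which step did they start to differ", which is usually where the debugging begins.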

