Tool Usage Comparison: Negentropy vs Qwopus
I feel that compared to 'Qwopus3.5' previously released by Jackrong, this model seems less capable with tool usage. Should we not consider this as part of the same series of models?
Thank you for the feedback!
The core goal of the Negentropy-claude-opus-4.7 series is reasoning / trace inversion: it mainly attempts to reconstruct compressed reasoning bubbles into more complete CoT traces, and uses these reconstructed reasoning trajectories to improve the model's logical reasoning capabilities.
It was not specifically optimized for tool usage or function calling. So if your main use case is tool usage, it is entirely possible that Qwopus3.5 performs better in that area.
🧪 Negentropy is best treated as an experimental model.
Thank you for your testing and feedback!
yo! vibe slopper here;
testing this model for various debugging tasks; it is far superior in terms of raw reasoning capacity, especially on edge cases. In Pi, running on the latest llama.cpp turboquant pre-build, you can configure multiple models on different ports and switch the model you're using mid-conversation (I'm running Qwopus3.5-9B → Negentropy-Claude, Q5_K_M) if you run into a reasoning-complexity bottleneck; the existing tool calls in the conversation are enough to coerce the model into continuing with relative ease. A rough sketch of that setup is below.
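To make the port-switching workflow concrete, here is a minimal sketch assuming two llama.cpp server instances are already running and exposing their OpenAI-compatible `/v1/chat/completions` endpoint. The ports, model nicknames, and prompts are illustrative assumptions, not the original poster's exact configuration:

```python
# Hypothetical sketch: two llama.cpp server instances on assumed ports,
# each exposing the OpenAI-compatible /v1/chat/completions endpoint.
import requests

ENDPOINTS = {
    "qwopus": "http://127.0.0.1:8080/v1/chat/completions",      # e.g. Qwopus3.5-9B
    "negentropy": "http://127.0.0.1:8081/v1/chat/completions",  # e.g. Negentropy-Claude Q5_K_M
}

def chat(model_key: str, messages: list[dict]) -> str:
    """Send the full conversation so far to whichever server is selected."""
    resp = requests.post(ENDPOINTS[model_key], json={"messages": messages})
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Start on the lighter model...
history = [{"role": "user", "content": "Fix this IndexError in my parser."}]
history.append({"role": "assistant", "content": chat("qwopus", history)})

# ...then hand the same history (including any earlier tool-call turns) to the
# heavier reasoning model when you hit a complexity bottleneck mid-conversation.
history.append({"role": "user", "content": "Now trace why the off-by-one reappears on empty input."})
history.append({"role": "assistant", "content": chat("negentropy", history)})
```

Because both servers speak the same chat-completions format, the accumulated history carries over unchanged; that is what lets the second model pick up from the existing tool calls without any special handoff.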
You do have to do some surgical-style prompt smithing, however. This is what worked for me:
`// like, 25 different debug errors from IDE
────────┴─────────────────────────────────────────┴───────────────────────────────
Please make small, individual, atomized changes to ensure tool call parity.
This usually means fixing one error at a time.
Do not consider queued errors while working on the in-progress fix.
Be sure to disregard irrelevant information from already-fixed tasks.
Do not make the exact same tool call twice in a row.`
And it's converged on solutions much more effectively than running Qwopus alone, although it still needs occasional steering.
edit: also, a slight limitation of the Negentropy model architecture/training paradigm: I've noticed that the model will occasionally 'over-reconstruct' already-established thinking blocks as if encountering them for the first time, possibly as some form of reward hacking.