tero (@tero@rukii.net)

How do you get LLMs to perform with high reliability, without mistakes?

It's not really about hallucinations; it's mostly about other kinds of mistakes. LLMs are stochastic, so they inherently have an uncontrollable random component.

You can set the temperature to zero, but you will still suffer from randomness: OpenAI models aren't actually fully deterministic even then, and temperature only makes a single, fixed prompt more deterministic (you would get the same result with a cache). It doesn't remove random variation between different prompts, and you rarely run inference on exactly identical prompts repeatedly anyhow.
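As a minimal sketch of pinning those knobs down with the OpenAI Python SDK (the model name and prompt are placeholders, and even a fixed seed is only best-effort reproducible):

```python
# Minimal sketch, assuming the official OpenAI Python SDK; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this ticket: ..."}],
    temperature=0,        # reduces sampling randomness for this one prompt
    seed=42,              # best-effort determinism only; not guaranteed
)
print(response.choices[0].message.content)
```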

You can destructure the task to make it easier for the model and get radically improved performance, but this starts giving diminishing returns, especially on complex tasks. Destructuring the task will also show you which specific things the LLM actually struggles with, but that's a different topic.
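As a rough sketch of what destructuring might look like in practice (the contract-review subtasks and the `call_llm` helper are hypothetical, not from the post):

```python
# Hypothetical sketch: split one big "review this contract" prompt into focused subtasks.
def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client you use."""
    raise NotImplementedError

def review_contract(contract_text: str) -> dict:
    # Each subtask gets its own small prompt instead of one giant instruction.
    parties = call_llm(f"List the parties in this contract:\n{contract_text}")
    obligations = call_llm(f"List each party's obligations:\n{contract_text}")
    risks = call_llm(
        "Given these obligations, list risky or unusual clauses:\n" + obligations
    )
    return {"parties": parties, "obligations": obligations, "risks": risks}
```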

Finally, you can set up validation feedback processes. These can bring the reliability of LLM systems as near 100% as you want.

How do you build these effectively?

First of all, don't just add a review step. A review step is useful, but it's not enough on its own. LLMs typically won't highlight small errors if you just add a generic step asking them to criticize how the task was performed. You need to destructure this step as well for maximum effect.

Give the chatbot a checklist to run against the output. Make it really clear that it should look at the output from a critical angle, possibly even playing different roles while checking; stereotypical characters from media work well.
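One way this could look, as a sketch (the checklist items and the critic persona are made up for illustration):

```python
# Hypothetical sketch of a destructured review step: one verdict per checklist item.
CHECKLIST = [
    "Every claim in the answer is supported by the provided source text.",
    "All numbers and units are copied correctly from the input.",
    "The requested output format is followed exactly.",
]

def build_review_prompt(task_output: str) -> str:
    items = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(CHECKLIST))
    return (
        "You are a pedantic, skeptical senior reviewer.\n"
        "Check the output below against each checklist item separately.\n"
        "For each item, answer PASS or FAIL with a one-line justification.\n\n"
        f"Checklist:\n{items}\n\nOutput to review:\n{task_output}"
    )
```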

Then, after all checklist items have been checked, you can have the chatbot produce a final evaluation result, an evaluation summary, suggestions for prompt improvements (these aren't great yet, but they will get better with new models and can already give clues), highlighted ambiguous parts, and anything else that gives you tools to improve the process.
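Asking for that final evaluation in a small structured shape makes it easy to act on programmatically; the field names here are just one possible layout, not anything prescribed by the post:

```python
# Hypothetical shape for the final evaluation the reviewer model is asked to emit.
from dataclasses import dataclass, field

@dataclass
class ReviewResult:
    passed: bool                                   # overall verdict after all checks
    summary: str                                   # short evaluation summary
    failed_checks: list[str] = field(default_factory=list)
    ambiguous_parts: list[str] = field(default_factory=list)
    prompt_improvement_ideas: list[str] = field(default_factory=list)
```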

Give the model all the supporting information you can for this feedback step. Your job is to make everything easy for the LLM.

When an error has been detected (you shouldn't see these too often; if it's one error in five or more, your problem should be fixed by task destructuring rather than validation feedback), you can retry a couple of times and then raise an alarm if the output still fails.
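A retry-then-alarm loop might look roughly like this sketch (the `run_task`, `validate`, and `raise_alarm` functions are placeholders for your own task, review, and escalation steps):

```python
# Hypothetical retry loop: re-run the task a couple of times, then escalate.
MAX_ATTEMPTS = 3

def run_with_validation(task_input: str) -> str:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        output = run_task(task_input)      # placeholder: the main LLM call
        review = validate(output)          # placeholder: the checklist review step
        if review.passed:
            return output
    raise_alarm(task_input, review)        # placeholder: page a human / log an incident
    raise RuntimeError("LLM task failed validation after retries")
```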

High-reliability systems can be built with LLMs, but you need to build them in specific, task-dependent ways.
