[R] QuIP#: SOTA 2 bit LLMs

We're pleased to introduce QuIP#, a new SOTA LLM quantization method that uses incoherence processing from QuIP (the paper) & lattices to achieve 2 bit LLMs with near-fp16 performance! Now you can run LLaMA 2 70B on a 24G GPU w/out offloading!

QuIP# crushes all publicly available 2 bit PTQ methods on language modeling & zero shot tasks while being conceptually clean and simple. We’ve released quantized LLaMA, Mistral, and OpenHermes models, and a full codebase at https://github.com/Cornell-RelaxML/quip-sharp

More information on how QuIP# works here https://cornell-relaxml.github.io/quip-sharp/

  • All
  • Subscribed
  • Moderated
  • Favorites
  • machinelearning
  • DreamBathrooms
  • ngwrru68w68
  • tester
  • magazineikmin
  • thenastyranch
  • rosin
  • khanakhh
  • InstantRegret
  • Youngstown
  • slotface
  • Durango
  • kavyap
  • mdbf
  • tacticalgear
  • megavids
  • osvaldo12
  • normalnudes
  • cubers
  • cisconetworking
  • everett
  • GTA5RPClips
  • ethstaker
  • Leos
  • provamag3
  • anitta
  • modclub
  • JUstTest
  • lostlight
  • All magazines