joneskind (edited)
@joneskind@lemmy.world

Most 7B–8B models run just fine at 4-bit quantization and won’t use more than 4 or 5 GB of VRAM.
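
The back-of-the-envelope math checks out: at 4 bits each parameter takes half a byte, plus some overhead for the KV cache and runtime buffers. A rough sketch (the 20% overhead factor is my assumption, not a fixed number, and real usage varies with context length):

```python
# Rough VRAM estimate for a 4-bit quantized model (hypothetical helper).
# Assumes ~0.5 bytes per parameter at 4-bit, plus ~20% overhead for the
# KV cache and runtime buffers.
def vram_estimate_gb(params_billions: float, bits: int = 4, overhead: float = 1.2) -> float:
    return params_billions * (bits / 8) * overhead

for size in (7, 8):
    print(f"{size}B @ 4-bit: ~{vram_estimate_gb(size):.1f} GB")
# 7B @ 4-bit: ~4.2 GB
# 8B @ 4-bit: ~4.8 GB
```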

The one metric that really matters is the amount of VRAM, because the model has to be loaded into VRAM for fast inference.

You can fall back to CPU and system RAM, but it is painfully slow.
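
For example, with llama-cpp-python the `n_gpu_layers` parameter decides how much of the model lives in VRAM (a minimal sketch; the GGUF path is a placeholder):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU, so the model sits in
# VRAM; n_gpu_layers=0 keeps everything on CPU/RAM -- it works, but is
# much slower, as noted above.
llm = Llama(
    model_path="./model-7b-q4_k_m.gguf",  # placeholder: any 4-bit GGUF
    n_gpu_layers=-1,
    n_ctx=4096,
)

out = llm("Q: Why does VRAM matter for inference? A:", max_tokens=64)
print(out["choices"][0]["text"])
```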

If you have an Apple Silicon Mac it’s even simpler: unified memory means the GPU shares system RAM, so there’s no separate VRAM limit to worry about.
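
If you’d rather keep it minimal, something like the ollama Python client is about as simple as it gets (assuming Ollama is installed and the model tag below has been pulled; both are examples, not recommendations):

```python
import ollama

# Ollama picks a quantization and handles Metal offload on Apple
# Silicon automatically; "llama3:8b" is an example model tag.
response = ollama.chat(
    model="llama3:8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```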
