stungeye,

Have you seen LLaVA?

The Large Language and Vision Assistant is a multimodal (image & text) #ai model.

It's an open-source approach to visual & language prompting, combining a #ml vision encoder & a large language model (#Vicuna #LLaMA #llm).

It's surprisingly good!

🧵1/n

AngryAnt,
@AngryAnt@mastodon.gamedev.place

@stungeye The integration documentation is somewhat messy, but once we got it running on our systems, it has been a very nice component in our projects.

stungeye,

@AngryAnt Nice! What have you been using it for?

AngryAnt,
@AngryAnt@mastodon.gamedev.place

@stungeye Just test projects so far - mapping new tech. One example is extracting more queryable & indexable data from, for example, wiki pages or slide decks - content where the full meaning is a combination of text and images, and where looking only at the text leaves a lot on the table.

We're primarily interested in solutions where our dependencies are fully self-hosted or something which can be treated as a commodity. Building a business on someone's service API... No thanks ;)
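For anyone curious what self-hosted use looks like in practice: a common way to run LLaVA locally is through Ollama, whose `/api/generate` endpoint accepts base64-encoded images alongside a text prompt. This is a minimal sketch (the model name, prompt, and file paths are illustrative, not from the thread) of building such a request to pull indexable text out of a slide image:

```python
# Hedged sketch: build a multimodal request for a locally hosted LLaVA
# served by Ollama. Assumes Ollama is running with the "llava" model pulled;
# the prompt and image path below are hypothetical examples.
import base64
import json


def build_llava_request(prompt, image_path, model="llava"):
    """Build the JSON payload Ollama's /api/generate expects for image+text."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # Ollama takes base64-encoded images here
        "stream": False,        # ask for a single JSON response, not a stream
    }


if __name__ == "__main__":
    # e.g. extract indexable text from one slide of a deck:
    payload = build_llava_request(
        "List the headings and key facts visible on this slide.",
        "slide_01.png",
    )
    # POST this payload to your own instance, e.g.:
    # import urllib.request
    # req = urllib.request.Request(
    #     "http://localhost:11434/api/generate",
    #     data=json.dumps(payload).encode(),
    #     headers={"Content-Type": "application/json"},
    # )
    # print(json.loads(urllib.request.urlopen(req).read())["response"])
```

Because everything runs against localhost, the only dependency is the model weights themselves - no third-party service API involved.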
