#javascript - kbin.social

joe, 5 hours ago to ai

LLaVA (Large Language-and-Vision Assistant) was updated to version 1.6 in February. I figured it was time to look at how to use it to describe an image in Node.js. LLaVA 1.6 is a vision-language model built for multi-modal tasks: it takes an image and a text prompt together and answers in text. Last month, we looked at how to use the official Ollama JavaScript Library. We are going to use the same library today.

Basic CLI Example

Let’s start with a CLI app. For this example, I am using my remote Ollama server. If you don’t have a remote server, install Ollama locally and replace const ollama = new Ollama({ host: 'http://100.74.30.25:11434' }); with const ollama = new Ollama({ host: 'http://localhost:11434' });
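A minimal sketch of what such an app.js could look like, assuming the model was pulled under the tag llava and that a one-line prompt is enough. The names MODEL, PROMPT, buildDescribeRequest, and describeImage are illustrative and not from the original post; the only library calls used are the Ollama constructor and its generate() method. The imports are done dynamically inside main() so the helper functions can be loaded without the ollama package installed.

```javascript
// app.js - minimal sketch; run with: node app.js <image filename>
const MODEL = 'llava'; // assumed tag; use whichever LLaVA build you pulled
const PROMPT = 'Describe this image.'; // assumed prompt, change to taste

// Build the payload for generate(); images are passed as base64 strings.
function buildDescribeRequest(imageBase64) {
  return { model: MODEL, prompt: PROMPT, images: [imageBase64], stream: false };
}

// Send one image to any client that exposes generate(), e.g. an Ollama instance.
async function describeImage(client, imageBytes) {
  const imageBase64 = Buffer.from(imageBytes).toString('base64');
  const result = await client.generate(buildDescribeRequest(imageBase64));
  return result.response; // the generated text
}

async function main(imagePath) {
  const { readFile } = await import('node:fs/promises');
  const { Ollama } = await import('ollama');
  // Point this at your own server if Ollama is not running locally.
  const ollama = new Ollama({ host: 'http://localhost:11434' });
  console.log(await describeImage(ollama, await readFile(imagePath)));
}

// Only talk to the model when a filename was passed on the command line.
if (process.argv[2]) {
  main(process.argv[2]).catch((err) => {
    console.error(err.message);
    process.exitCode = 1;
  });
}
```

Passing the client into describeImage() is a design choice of this sketch: it keeps the request-building logic separate from the network call, so you can swap the host or try the function with a stub.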

To run it, first install the library with npm i ollama and make sure that you have "type": "module" in your package.json. Then start it from the terminal with node app.js <image filename>. Let’s take a look at the result.

The Image	The Description
https://i0.wp.com/jws.news/wp-content/uploads/2024/05/window-sign-580x423.jpg?resize=580%2C423&ssl=1	https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-18-at-1.06.55%E2%80%AFPM.png?resize=669%2C502&ssl=1
https://i0.wp.com/jws.news/wp-content/uploads/2024/05/sandwiches-580x423.jpg?resize=580%2C423&ssl=1	https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-18-at-1.12.21%E2%80%AFPM.png?resize=669%2C502&ssl=1
https://i0.wp.com/jws.news/wp-content/uploads/2024/05/concert-580x423.jpg?resize=580%2C423&ssl=1	https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-18-at-1.18.06%E2%80%AFPM.png?resize=669%2C502&ssl=1

Its ability to describe an image is pretty awesome.

Basic Web Service

So, what if we wanted to run it as a web service? Running Ollama locally is cool and all but it’s cooler if we can integrate it into an app. If you install Express with npm install express, you can run this as a web service.

The web service accepts POST requests at http://localhost:4040/describe-image with a binary body that contains the image you want described. It then returns a JSON object containing the description.
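Here is a rough sketch of a service with that shape, not the original listing. It assumes the llava model tag, a local Ollama host, a 10 MB body limit, and a JSON field named description; the helper names toGenerateRequest, makeDescribeHandler, and startServer, the error responses, and the --serve flag are all made up for this example. It relies on express.raw() to read the binary body into a Buffer and on the Ollama library's generate() method.

```javascript
// server.js - minimal sketch; start with: node server.js --serve
const PORT = 4040;

// Turn the raw request body into a generate() payload, or null if unusable.
function toGenerateRequest(body) {
  if (!Buffer.isBuffer(body) || body.length === 0) return null;
  return {
    model: 'llava', // assumed tag
    prompt: 'Describe this image.', // assumed prompt
    images: [body.toString('base64')],
    stream: false,
  };
}

// Build the route handler around any client that exposes generate().
function makeDescribeHandler(client) {
  return async (req, res) => {
    const request = toGenerateRequest(req.body);
    if (!request) {
      return res.status(400).json({ error: 'Expected a binary image body.' });
    }
    try {
      const result = await client.generate(request);
      res.json({ description: result.response });
    } catch (err) {
      // The model server was unreachable or returned an error.
      res.status(502).json({ error: err.message });
    }
  };
}

async function startServer() {
  const { default: express } = await import('express');
  const { Ollama } = await import('ollama');
  const app = express();
  // Read any content type as raw bytes so req.body is a Buffer.
  app.use(express.raw({ type: '*/*', limit: '10mb' }));
  const ollama = new Ollama({ host: 'http://localhost:11434' });
  app.post('/describe-image', makeDescribeHandler(ollama));
  app.listen(PORT, () => console.log(`Listening on http://localhost:${PORT}`));
}

// Hypothetical flag so that loading this file does not open a port by accident.
if (process.argv.includes('--serve')) {
  startServer().catch((err) => {
    console.error(err.message);
    process.exitCode = 1;
  });
}
```

With the server running, something like curl -X POST --data-binary @photo.jpg http://localhost:4040/describe-image (where photo.jpg is any local image) should come back with a JSON object of the form {"description": "..."}.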

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-18-at-1.41.20%E2%80%AFPM.png?resize=1024%2C729&ssl=1

Have any questions or comments? Feel free to drop a comment below.

https://jws.news/2024/how-can-you-use-llava-and-node-js-to-describe-an-image/

#AI #JavaScript #LLaVA #LLM #NodeJs #Ollama

tripleo, 11 hours ago to random

All you nutcases still using #Perl, what's actually wrong with it?

aka What are the sharp edges?

mjgardner, 56 seconds ago

@tripleo I would also be remiss not to mention #Perl's included perltrap manual page, which notes both the strict and warnings pragmas and also has nice lists of things for those coming from other #programming languages and tools like #AWK, #C and #CPlusPlus, #JavaScript, #sed, and #shell.

https://perldoc.perl.org/perltrap

ovid, 11 hours ago to Lisp

#Perl, #Smalltalk, and #Lisp are three powerful programming languages that share a common feature.

Nobody knows how the hell to capitalize them.

#programming #software

ovid, 11 hours ago

@tripleo I always have to look up the capitalization of Smalltalk because I get it wrong every time.

Hmm ... #JavaScript should be in that list.

ecmascript_news, 21 hours ago to javascript

ECMAScript 2025 feature: duplicate named capturing groups for regular expressions
@rauschma
https://2ality.com/2024/05/proposal-duplicate-named-capturing-groups.html

#ECMAScript #JavaScript

zalasur, 21 hours ago to javascript

It's been almost a decade since I've done a live coding stream. This will be fun!

Today I'll be migrating my website from React to Lit, which is a lightweight framework built around web components. I have the scaffolding mostly set up, so now it's time to get this done.

Come watch. Ask questions in chat! You don't need to create an account, just a username is needed to participate.

https://video.surazal.net/w/5S7FPXJMZh1i1eqZLY9mcV

#Javascript #Node #Development #PeerTube #Stream #Streaming

ecmascript_news, 23 hours ago to javascript

Node v22.2.0 (current)
@targos @nodejs
https://nodejs.org/en/blog/release/v22.2.0

#ECMAScript #JavaScript #NodeJS

ecmascript_news, 23 hours ago to javascript

esbuild v0.21.3: decorator metadata and more
@evanw
https://github.com/evanw/esbuild/releases/tag/v0.21.3

#ECMAScript #JavaScript #esbuild
