joe,

A few weeks back, I thought about getting an AI model to return the “Flavor of the Day” for a Culver’s location. If you ask Llama 3:70b “The website https://www.culvers.com/restaurants/glendale-wi-bayside-dr lists “today’s flavor of the day”. What is today’s flavor of the day?”, it doesn’t give a helpful answer.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.29.28%E2%80%AFPM.png?resize=1024%2C690&ssl=1

If you ask ChatGPT 4 the same question, it gives an even less useful answer.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.33.42%E2%80%AFPM.png?resize=1024%2C782&ssl=1

If you check the website, today’s flavor of the day is Chocolate Caramel Twist.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.41.21%E2%80%AFPM.png?resize=1024%2C657&ssl=1

So, how can we get a proper answer? Ten years ago, when I wrote “The Milwaukee Soup App”, I used the Kimono (which is long dead) to scrape the soup of the day. You could also write a fiddly script to scrape the value manually. It turns out that there is another option, though. You could use Scrapegraph-ai. ScrapeGraphAI is a web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites, documents, and XML files. Just say which information you want to extract and the library will do it for you.

Let’s take a look at an example. The project has an official demo where you need to provide an OpenAI API key, select a model, provide a link to scrape, and write a prompt.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-12.35.29%E2%80%AFPM.png?resize=1024%2C660&ssl=1

As you can see, it reliably gives you the flavor of the day (in a nice JSON object). It will go even further, though because if you point it at the monthly calendar, you can ask it for the flavor of the day and soup of the day for the remainder of the month and it can do that as well.

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-1.14.43%E2%80%AFPM.png?resize=1024%2C851&ssl=1

Running it locally with Llama 3 and Nomic

I am running Python 3.12 on my Mac but when you run pip install scrapegraphai to install the dependencies, it throws an error. The project lists the prerequisite of Python 3.8+, so I downloaded 3.9 and installed the library into a new virtual environment.

Let’s see what the code looks like.

You will notice that just like in yesterday’s How to build a RAG system post, we are using both a main model and an embedding model.

So, what does the output look like?

https://i0.wp.com/jws.news/wp-content/uploads/2024/05/Screenshot-2024-05-09-at-2.28.10%E2%80%AFPM.png?resize=1024%2C800&ssl=1

At this point, if you want to harvest flavors of the day for each location, you can do so pretty simply. You just need to loop through each of Culver’s location websites.

Have a question, comment, etc? Please feel free to drop a comment, below.

https://jws.news/2024/how-to-use-ai-to-make-web-scraping-easier/

  • All
  • Subscribed
  • Moderated
  • Favorites
  • ai
  • ngwrru68w68
  • rosin
  • modclub
  • Youngstown
  • khanakhh
  • Durango
  • slotface
  • mdbf
  • cubers
  • GTA5RPClips
  • kavyap
  • DreamBathrooms
  • InstantRegret
  • magazineikmin
  • JUstTest
  • osvaldo12
  • tester
  • tacticalgear
  • ethstaker
  • Leos
  • thenastyranch
  • everett
  • normalnudes
  • anitta
  • megavids
  • cisconetworking
  • provamag3
  • lostlight
  • All magazines