oliverandrich,
@oliverandrich@fosstodon.org

Perfect, the combination of playwright, trafilatura and scrapinghub extracts all the information I need from a webpage. My little #python project evolves nicely.

oliverandrich,
@oliverandrich@fosstodon.org

The screenshot shows the power of trafilatura. With a single statement it creates wonderful output from any webpage it can extract content from: it extracts links, keeps the basic markup and structure, and dereferences relative links.
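
For anyone who cannot see the screenshot, here is a rough sketch of what such a single statement might look like. It is not the code from the post: the helper names `resolve_link` and `extract_page` are made up for illustration, and the choice of options is an assumption. `resolve_link` only shows, with the standard library, what dereferencing a relative link means; it is not how trafilatura does it internally.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_link(base_url: str, href: str) -> str:
    """Illustrative helper: turn a relative href into an absolute URL."""
    return urljoin(base_url, href)


def extract_page(html: str, page_url: str) -> Optional[str]:
    """Hypothetical wrapper: extract the main content of an HTML page,
    keeping links and basic formatting. Returns None if nothing is found."""
    # Imported lazily so resolve_link works without trafilatura installed.
    import trafilatura

    return trafilatura.extract(
        html,
        url=page_url,             # URL the HTML was fetched from
        include_links=True,       # keep link targets in the output
        include_formatting=True,  # keep basic markup such as emphasis
    )
```

With playwright (or anything else) delivering the rendered HTML, a call like `extract_page(html, "https://example.org/blog/post.html")` would then be the one statement doing the extraction.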
