Need help saving Reddit threads (for post-blackout reasons) to Obsidian

AI-TRIGGER WARNING: I asked ChatGPT to revise my writing because it was ass (writing a stream of coherent-looking text is not my forte). Proceed at your own discretion.

Yes, the emoji are all on me. I've been influenced too much by Bing Chat lately; even ChatGPT took them out, but then I pestered it to put them back.

Below this line it's all text that has been retouched by AI 😱:


Title: Archiving Reddit Threads During Protests: Suggestions Needed

Body:

Hello everyone,

As many of you are aware, numerous Reddit subreddits are temporarily closed due to the ongoing protest. While I completely support this action, it is causing some issues with my hobby research. Many posts are being deleted or replaced with placeholder scripts, leading to a loss of valuable information. Source: https://lemmy.ml/post/1259772

In an effort to address this, I have been using a script to save Reddit threads that I find interesting to my Personal Knowledge Management system: https://www.reddit.com/r/ObsidianMD/comments/104k0om/script_save_reddit_posts_to_obsidian/ . I have managed to successfully use it, but since I don't have a strong understanding of Ruby code 😅, I'm worried about its future functionality, especially if it depends on the Reddit API.

I recently discovered a thread discussing Reddit dumps: https://lemmy.nz/post/52092 . This discovery made me curious if it would be possible to modify the Ruby script to work with a local version of Reddit or even directly with the Reddit logs. To my understanding, these logs are in JSON format, but I haven't downloaded them yet.
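
For anyone curious what working from a dump might look like, here is a rough Python sketch (not the Ruby script linked above). It assumes the dump has already been decompressed into a newline-delimited JSON file with one submission per line, and that each record carries the fields `id`, `title`, `selftext`, `author`, `subreddit`, `permalink` and `created_utc`. Those field names and the helper names (`slugify`, `submission_to_markdown`, `export_dump`) are assumptions for illustration, so check them against the actual files before relying on this.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(text, max_len=60):
    """Turn a title into a filesystem-safe file name stem."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()
    return slug[:max_len].rstrip("-") or "untitled"


def submission_to_markdown(sub):
    """Render one submission dict as a Markdown note with front matter."""
    # created_utc is assumed to be a Unix timestamp (number or numeric string).
    created = datetime.fromtimestamp(
        int(float(sub.get("created_utc", 0))), tz=timezone.utc
    )
    lines = [
        "---",
        f"source: https://www.reddit.com{sub.get('permalink', '')}",
        f"author: {sub.get('author', 'unknown')}",
        f"subreddit: {sub.get('subreddit', '')}",
        f"created: {created:%Y-%m-%d}",
        "---",
        "",
        f"# {sub.get('title', 'Untitled')}",
        "",
        sub.get("selftext", ""),
        "",
    ]
    return "\n".join(lines)


def export_dump(dump_path, vault_dir, subreddit=None):
    """Write one note per submission; optionally keep a single subreddit."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    written = []
    with open(dump_path, encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue  # skip blank lines
            sub = json.loads(raw)
            if subreddit and sub.get("subreddit", "").lower() != subreddit.lower():
                continue
            name = f"{slugify(sub.get('title', ''))}-{sub.get('id', 'noid')}.md"
            note = vault / name
            note.write_text(submission_to_markdown(sub), encoding="utf-8")
            written.append(note)
    return written
```

Calling something like `export_dump("submissions.ndjson", "MyVault/Reddit", subreddit="ObsidianMD")` would then write one Markdown file per matching submission into the vault folder (both paths are placeholders). Comments would need a similar pass over their own file, which this sketch does not cover.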

Additionally, I've come across the concept of vector embeddings and a tool called Pinecone. Would it be more straightforward to use this tool to extract the necessary information, as opposed to manually searching through the data? Ideally, I would like to create a local search function, similar to Google, specifically for this dataset dump. However, I'm unsure of how to search a local database of Reddit submissions. I have found potential solutions such as Semantra and Qdrant, but I'm uncertain if these are the best tools for this task. Perhaps there is a more suitable option?
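
As a baseline before reaching for an embedding tool, a plain keyword search can be sketched in a few lines of standard-library Python. To be clear, this is TF-IDF ranking with cosine similarity, not vector embeddings, so it only matches words that literally appear in the text. The class name `TfidfIndex` and the `{doc_id: text}` input shape are made up for this example, and it keeps everything in memory, so it suits a filtered subset rather than a full dump.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Tiny in-memory TF-IDF index over a {doc_id: text} mapping."""

    def __init__(self, docs):
        counts_by_doc = {d: Counter(tokenize(t)) for d, t in docs.items()}
        doc_freq = Counter()
        for counts in counts_by_doc.values():
            doc_freq.update(counts.keys())
        n = len(docs)
        # Smoothed inverse document frequency: rarer terms weigh more.
        self.idf = {
            term: math.log((1 + n) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        self.vectors = {d: self._weigh(c) for d, c in counts_by_doc.items()}

    def _weigh(self, counts):
        """Build a length-normalised TF-IDF vector; unknown terms are dropped."""
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, top_k=5):
        """Return up to top_k (doc_id, score) pairs, best match first."""
        q = self._weigh(Counter(tokenize(query)))
        scores = []
        for doc_id, vec in self.vectors.items():
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scores.append((doc_id, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Usage would be along the lines of `TfidfIndex({"abc123": "title plus body text", ...}).search("obsidian ruby script")`. If keyword matching turns out to be too literal, that is the point where embeddings and a vector store such as the tools mentioned above start to pay off.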

I will be honest, I don't have a strong background in technology, and this problem is proving to be quite complex. But I'm willing to tackle it. I would greatly appreciate any input or suggestions that you could provide.

Thank you in advance, everyone! 😊

trijste,

What about an intermediary, like one of the Chrome extensions that let you select text and create a note in your vault? Are you doing this sporadically, or automatically?

Yodadidas,

Can you see your 3 reply duplicates? If so, can you please delete them?

grabyourmotherskeys,

Hi, I removed this and two other comments that were duplicates. I left the one that was replied to.

trijste,

Thanks!

dethb0y,

I'd be curious myself whether there's a good way to do this.
