fooderick, to random

What is a good way of crawling and archiving a complete website for offline viewing? A service holding a lot of data that's important to me is shutting down without any option for archiving it. It has a pretty JavaScript-heavy UI and is protected by a login page with MFA. Ideally I'd be able to save it to a nice format like WARC.
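
A possible starting point, if the pages can be fetched with plain HTTP once you have a logged-in session cookie, is warcio's capture_http, which records requests made with the requests library into a WARC file. The URLs and cookie name below are placeholders. For a heavily JavaScript-driven UI, a headless-browser crawler that writes WARCs (such as Webrecorder's browsertrix-crawler, which supports logged-in browser profiles) is likely a better fit.

```python
# Minimal sketch: record authenticated fetches into a WARC with warcio.
from warcio.capture_http import capture_http
import requests  # note: must be imported after capture_http for capture to work

# Placeholder: copy a logged-in session cookie from your browser's dev tools.
COOKIES = {"session": "PASTE-SESSION-COOKIE-HERE"}

with capture_http("site-archive.warc.gz"):
    for url in [
        "https://service.example/account/data",   # placeholder URLs
        "https://service.example/account/files",
    ]:
        requests.get(url, cookies=COOKIES)
```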

morgandawn, to Hololive

I love (my corner of is the part that reads & writes & edits). Think

Anyhow here is a post of gratitude to all the who showed up, stayed & did the work.

archiving & documenting them on , the fan-run

older & current fans for our

rescuing & our

links to in the replies

1/n

danie10, to random
@danie10@mastodon.social

What’s the Best Way to Store Data for Decades or Centuries? Bottom line: No Technology that is Easy or Practical

The concern keeps coming up (I've also been pondering it a lot and posted last week about what I've been doing).

The linked article sums up the essentials very well, and helps illustrate why this is a challenge for 20 or 60 years or especially ...continues

See https://gadgeteer.co.za/whats-the-best-way-to-store-data-for-decades-or-centuries-bottom-line-no-technology-that-is-easy-or-practical/

#archiving #technology

jasonnab, to programming


Is there any tool to download the public posts, boosts, and reply toots of an inactive Mastodon account? I forgot to back up my posts before migrating, and I don't know of a method that doesn't involve reactivating the account (and I'm not sure what that would do).
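
If the account's posts are public, one hedged approach is the home instance's public REST API rather than the account export: resolve the account ID, then page through /api/v1/accounts/:id/statuses (boosts and replies are included by default). The instance URL and username below are placeholders.

```python
# Sketch: page through a public account's statuses via the Mastodon API.
import requests

INSTANCE = "https://mastodon.example"  # placeholder: the account's home instance
ACCT = "someuser"                      # placeholder: bare username on that instance

# Resolve the account's numeric ID from its username.
account = requests.get(f"{INSTANCE}/api/v1/accounts/lookup",
                       params={"acct": ACCT}).json()

params = {"limit": 40}  # API maximum page size
while True:
    page = requests.get(f"{INSTANCE}/api/v1/accounts/{account['id']}/statuses",
                        params=params).json()
    if not page:
        break
    for status in page:
        print(status["created_at"], status["url"])
    params["max_id"] = page[-1]["id"]  # continue with older posts
```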

james, to mastodon
@james@dice.camp

Some folks on Mastodon delete posts after a period, sometimes for privacy, sometimes to save server space.

Is there a nice way to download a thread and archive it?

I don't want to distribute them; I think there are two cases:

1. I like having an archive of stuff I've said so I can look at it years later.
2. People have good advice/essays and I'd like to read them in the future.

#mastodon #selfhosted #scripting #archiving #datahoarder
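
For a single public thread, one possibility is the instance's context endpoint, which returns a post's ancestors and descendants. The instance URL and status ID below are placeholders taken from a post's URL.

```python
# Sketch: save one public thread as JSON via the context endpoint.
import json
import requests

INSTANCE = "https://mastodon.example"  # placeholder: the thread's home instance
STATUS_ID = "109000000000000000"       # placeholder: ID from the post's URL

status = requests.get(f"{INSTANCE}/api/v1/statuses/{STATUS_ID}").json()
context = requests.get(f"{INSTANCE}/api/v1/statuses/{STATUS_ID}/context").json()

# Ancestors, the post itself, then replies: the whole publicly visible thread.
thread = context["ancestors"] + [status] + context["descendants"]
with open(f"thread-{STATUS_ID}.json", "w") as f:
    json.dump(thread, f, indent=2)
```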

BBCRadio4, to DoctorWho

Matthew Sweet tells the extraordinary story of Doctor Who's hiatus between 1989 and 2005.

Doctor Who: The Wilderness Years

https://bbc.in/3QOIRZv

#DoctorWho #TV #ScienceFiction #drama #serial

BBCRadio4,

What happened to the 108 episodes of Doctor Who that were wiped in the 1960s?

An investigation, on BBC Sounds

https://bbc.in/3sQDJvW

#DoctorWho #TV #ScienceFiction #drama #serial #archiving

jasonnab, to music

#MUSIC #IDENTIFICATION HELP WANTED!
I've finally had an open reel transferred that I bought from a recycling shop in #Toronto, ON, many, many years ago.
Unfortunately I have no idea of the artist, album, or song names!
Do you recognize any of the music, vocals, or instrumentation in this sample I've prepared?

  • Four tracks total
  • Written on the box: NEON Rough Mixes
  • AMPEX 456 Master Reel

Thanks!

#Preservation #archival #archiving #digitization #reelToReel #musicID #Help #askFedi #rock #folk #ID

stefan, to VideoGames
@stefan@stefanbohacek.online

"In the in-depth study that was published in partnership with the Software Preservation Network, it was revealed that just 13% of all games released before 2010 are commercially or readily available today."

https://insider-gaming.com/study-classic-games-unavailable

#videogames #archiving #history #preservation

amberage, to random
@amberage@eldritch.cafe

So uh. Best software to rip DVDs? I've tried VLC, but it spent half an hour going through the entire 2-hour movie and then rendered only the 8-second intro to a file 😬

I don't need the whole menu and all, but I need to be able to get the video, the right audio track, and the right subtitle track. I've got a bunch of old DVDs here, some of them well over 10 years old, that I'd like to archive before bitrot sets in.

#archiving
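
Not a definitive answer, but one common route is HandBrakeCLI, which can scan a disc's titles and then rip a chosen title with specific audio and subtitle tracks. The device path and track numbers below are assumptions; run the scan step first to find the right ones. (For a bit-exact copy rather than a re-encode, MakeMKV is another frequently used option.)

```python
# Sketch: drive HandBrakeCLI from Python to rip one DVD title.
import subprocess

DEVICE = "/dev/sr0"  # assumption: typical DVD device path on Linux

# 1. Scan the disc: title 0 means "scan all titles" and prints
#    their durations plus the available audio/subtitle tracks.
subprocess.run(["HandBrakeCLI", "--input", DEVICE, "--title", "0", "--scan"])

# 2. Rip the main feature with the tracks chosen from the scan output.
subprocess.run([
    "HandBrakeCLI",
    "--input", DEVICE,
    "--title", "1",        # assumption: title 1 is the main feature
    "--audio", "1",        # chosen audio track number
    "--subtitle", "1",     # chosen subtitle track number
    "--encoder", "x264",
    "--quality", "20",     # constant-quality RF value; lower = better
    "--output", "movie.mkv",
])
```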

sqrtminusone, to orgmode
@sqrtminusone@emacs.ch

I've got an org-mode question.

I have an org file for a long-running project. It's getting hard to manage because there are lots of different tasks, events, etc.

I think I want to create an "archive version" of that file, which would have the same structure but store items, say, with a timestamp older than 2 months. That would require two basic steps:

  • extracting a subtree from the original file;
  • merging the extracted subtree into the archived version.

I could implement that, but I wonder if there is an existing way to do this? Or some other approach that would address the same issue?

sqrtminusone,
@sqrtminusone@emacs.ch

Thanks Amy @grinn for pointing me to the necessary pieces of org-refile! It would have taken much longer to figure out otherwise.

I've made a function that org-refiles the entry at point into "archive/<file-name>.org", preserving the header structure. I only had to implement creating nonexistent headers because `org-refile' can create just one level out-of-the-box.

And another function that performs that operation on all entries found by `org-ql'.

The code is here: https://sqrtminusone.xyz/configs/emacs/#archiving-records

voltagex, to random
@voltagex@aus.social

The #Xbox360 Store goes offline next year. Get #archiving, folks.

remixtures, to ai Portuguese
@remixtures@tldr.nettime.org

#AI #GenerativeAI #HistoricalPreservation #Archiving #DataProtection #Cybersecurity #Privacy: "The National Archives and Records Administration (NARA) told employees Wednesday that it is blocking access to ChatGPT on agency-issued laptops to “protect our data from security threats associated with use of ChatGPT,” 404 Media has learned.

“NARA will block access to commercial ChatGPT on NARANet [an internal network] and on NARA issued laptops, tablets, desktop computers, and mobile phones beginning May 6, 2024,” an email sent to all employees, and seen by 404 Media, reads. “NARA is taking this action to protect our data from security threats associated with use of ChatGPT.”

The move is particularly notable considering that this directive is coming from, well, the National Archives, whose job is to keep an accurate historical record. The email explaining the ban says the agency is particularly concerned with internal government data being incorporated into ChatGPT and leaking through its services."

https://www.404media.co/national-archives-bans-employee-use-of-chatgpt/

cdfinder, to music
@cdfinder@techhub.social

Peter Gabriel has released his new #album i/o, and we love this #music so much!

Peter is very interested in long-term archival storage of audio data, and we have another connection we cannot disclose here.

Anyway, the new album is really great, please buy it from Bandcamp so the artist actually gets paid for it:

https://petergabriel.bandcamp.com/album/i-o

#audio #HighResAudio #ProgRock #PeterGabriel #Archiving #MusicLibrary #NeoFinder #macOS

danie10, to music
@danie10@mastodon.social

What’s the Value of 3 Million LPs in a Digital World? Easy! They can still be Played in 50+ Years’ Time!

The ARChive of Contemporary Music has one of the largest collections of vinyl records in the world and is in danger of losing its home. Its champions are making a case for the future of physical media.

If someplace like a university starts a digitization p ...continues

See https://gadgeteer.co.za/whats-the-value-of-3-million-lps-in-a-digital-world-easy-they-can-be-played-still-in-50-years-time/

#archiving #music #technology #vinyl

g3om4c, to ai

Harvard Library Innovation Lab: WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI

"...an open-source, highly-customizable Retrieval Augmented Generation tool the web archiving community can use to explore the intersection between web and . WARC-GPT allows for creating custom chatbots that use a set of files as their knowledge base, letting users explore collections through conversation." 👏

https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring-web-archives-with-ai/
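
This isn't WARC-GPT's code, just a sketch of the first step such a tool needs: pulling the HTML responses out of a WARC file (here with warcio) so their text can go into a knowledge base.

```python
# Sketch: iterate the HTML response records stored in a WARC file.
from warcio.archiveiterator import ArchiveIterator

with open("collection.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type != "response":
            continue  # skip request records, metadata, etc.
        content_type = record.http_headers.get_header("Content-Type") or ""
        if "text/html" not in content_type:
            continue  # skip images, scripts, stylesheets
        url = record.rec_headers.get_header("WARC-Target-URI")
        html = record.content_stream().read()
        print(url, len(html), "bytes")  # candidate text for the RAG index
```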

mhucka, to conservative

Occasional reminder that the Internet Archive provides a number of tools and browser plugins to let you send pages to the Wayback Machine (as well as check if a given page has been saved):

https://help.archive.org/help/save-pages-in-the-wayback-machine/

#InternetArchive #Archiving #WebArchiving #Preservation
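
Both operations the help page covers also have plain HTTP endpoints, which is handy for scripting: the public availability API and Save Page Now. A minimal sketch:

```python
# Sketch: check for an existing snapshot, then request a fresh capture.
import requests

url = "https://example.com/some-page"  # placeholder

# Availability API: returns the closest existing snapshot, if any.
info = requests.get("https://archive.org/wayback/available",
                    params={"url": url}).json()
closest = info.get("archived_snapshots", {}).get("closest")
print(closest["url"] if closest else "no snapshot yet")

# Save Page Now: asks the Wayback Machine to capture the page.
requests.get(f"https://web.archive.org/save/{url}")
```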

geffrey, to markdown
@geffrey@pkm.social

I (and ChatGPT) have written a script that exports tasks from macOS’s Things 3 to Markdown and to todo.txt.

Soon, I'll post the script and a guide on https://plaintextjournal.com.

Let me know if you’d like to be kept in the loop.

#plaintext #things3 #archiving #todotxt #markdown
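
This isn't the author's script, but for the curious, here is a rough sketch of one way such an export can work. The database path and the TMTask table/column names are assumptions about Things 3's internal SQLite store and may differ between versions.

```python
# Hypothetical sketch: read open tasks from Things 3's SQLite store
# and print them as todo.txt lines. Path and schema are assumptions.
import sqlite3
from pathlib import Path

DB = Path.home() / (
    "Library/Group Containers/JLMPQHK86H.com.culturedcode.ThingsMac/"
    "Things Database.thingsdatabase/main.sqlite"
)

con = sqlite3.connect(DB)
# Assumed schema: TMTask holds tasks; status 0 means "open".
for (title,) in con.execute("SELECT title FROM TMTask WHERE status = 0"):
    print(title)  # one todo.txt line per open task
```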

a, to internet
@a@paperbay.org

Checked my 6,921 bookmarks on Pinboard.in: 3,462 hit dead ends with 404s or expired domains, and many of the 3,459 left show fake content or parking pages. Only 21% from the last 2 years still work as expected. The lifespan of URLs is definitely shrinking.

Luckily some (~20%) are archived @internetarchive

What are we actually doing to avoid this?

#internet #archiving #archive #decay
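
A sketch of the kind of liveness check described above, assuming a plain list of bookmark URLs (e.g. parsed from a Pinboard export). As the post notes, parked domains often still answer 200, so a status check alone undercounts the rot; catching those requires inspecting the returned content.

```python
# Sketch: count bookmarks that no longer resolve or return errors.
import requests

urls = ["https://example.com/old-post"]  # placeholder: your exported bookmarks

dead = 0
for url in urls:
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        if r.status_code >= 400:
            dead += 1
    except requests.RequestException:
        dead += 1  # DNS failure, timeout, expired domain, ...
print(f"{dead} of {len(urls)} bookmarks no longer resolve")
```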

remixtures, to random Portuguese
@remixtures@tldr.nettime.org

#Archiving #AcademicPublishing #DigitalPreservation: "When Eve broke down the results by publisher, less than 1 percent of the 204 publishers had put the majority of their content into multiple archives. (The cutoff was 75 percent of their content in three or more archives.) Fewer than 10 percent had put more than half their content in at least two archives. And a full third seemed to be doing no organized archiving at all.

At the individual publication level, under 60 percent were present in at least one archive, and over a quarter didn't appear to be in any of the archives at all. (Another 14 percent were published too recently to have been archived or had incomplete records.)

The good news is that large academic publishers appear to be reasonably good about getting things into archives; most of the unarchived issues stem from smaller publishers.

Eve acknowledges that the study has limits, primarily in that there may be additional archives he hasn't checked. There are some prominent dark archives that he didn't have access to, as well as things like Sci-hub, which violates copyright in order to make material from for-profit publishers available to the public. Finally, individual publishers may have their own archiving system in place that could keep publications from disappearing."

https://arstechnica.com/science/2024/03/study-finds-that-we-could-lose-science-if-publishers-go-bankrupt/

realsimon, (edited) to random German
@realsimon@mastodon.green

I'd like to help archive websites.

With the Wayback Machine you can only save a single web page, not recursively the whole website, right?

What would be the best way to do it locally?

wget does the trick, but I haven't figured out how to also download all the media with it.

WebHTTrack is weird.

#archiving #waybackmachine
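
For the media problem specifically, wget's --page-requisites flag fetches the images, CSS, and JS each page needs. A hedged sketch of a full mirror run follows; add --span-hosts together with --domains if the media lives on a separate CDN host.

```python
# Sketch: a polite wget mirror of one site for offline viewing.
import subprocess

subprocess.run([
    "wget",
    "--mirror",            # recursive download with timestamping
    "--page-requisites",   # also fetch images, CSS and JS each page needs
    "--convert-links",     # rewrite links so the local copy works offline
    "--adjust-extension",  # save pages with .html extensions
    "--no-parent",         # don't climb above the starting directory
    "--wait=1",            # pause between requests to be polite
    "https://example.com/",  # placeholder site
])
```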
