#Archiving - kbin.social

fooderick, 10 months ago to random

What is a good way to crawl and archive a complete website for offline viewing? One of the services that holds a lot of data that is important to me is shutting down without offering any way to archive my data. It has a pretty JavaScript-heavy UI and is protected by a login page that includes MFA. Ideally I'd be able to save it in a nice format like WARC.
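
On the WARC part of the question: a minimal standard-library sketch of what a single WARC/1.1 `response` record looks like. It assumes you already have the raw HTTP response bytes (status line, headers, body) for each page, for example captured from a logged-in browser session. It does not handle the login, MFA, or JavaScript rendering, and the function name `warc_response_record` is hypothetical. Real crawlers usually also write `request` records and gzip each record individually (`.warc.gz`), which this sketch leaves out.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_response_record(target_uri: str, http_response: bytes,
                         when: Optional[datetime] = None) -> bytes:
    """Wrap raw HTTP response bytes in one WARC/1.1 'response' record."""
    when = when or datetime.now(timezone.utc)
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http;msgtype=response"),
        # Content-Length counts only the content block, not the WARC header.
        ("Content-Length", str(len(http_response))),
    ]
    header = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields) + "\r\n"
    # Every record ends with two CRLFs after its content block.
    return header.encode("utf-8") + http_response + b"\r\n\r\n"
```

Records can simply be appended one after another to a `.warc` file opened in binary append mode (`open("capture.warc", "ab")`).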

#archiving #archivist #web #InternetArchive #scraping

Need help on saving reddit threads (for post-blackout reasons) to Obsidian

AI-TRIGGER WARNING: I've asked ChatGPT to revise my writing because it was ass (writing a stream of coherent-looking text is not my forte). Proceed at your own discretion....

morgandawn, 1 year ago to Hololive

I love #FandomHistory (my corner of #fandom is the part that reads & writes #fanfiction & edits #fanvids). Think #ArchiveOfOurOwn #AO3

Anyhow, here is a post of gratitude to all the #volunteers who showed up, stayed & did the work.

#fanzine #digitizing #archiving

#fanvid archiving & documenting them on #fanlore, the fan-run #wiki

#interviewing older & current fans for our #OralHistory

rescuing & #preserving our #FanArt

#introduction #intro

links to #projects in the replies

1/n

danie10, 1 year ago to random

What’s the Best Way to Store Data for Decades or Centuries? Bottom line: No Technology that is Easy or Practical

The concern keeps coming up (I’ve also been pondering it a lot and posted last week about what I’ve been doing).

This linked article does sum up the essentials very well, and this helps illustrate why this is a challenge for 20 or 60 years or especially ...continues

See https://gadgeteer.co.za/whats-the-best-way-to-store-data-for-decades-or-centuries-bottom-line-no-technology-that-is-easy-or-practical/

#archiving #technology

jasonnab, 1 year ago to programming

#MastodonHelp
Is there a tool to download the public posts, retoots, and reply toots of an inactive Mastodon account? I forgot to back up my posts before migrating, and I don't know a method that doesn't involve disabling the inactive status of the account (and I'm not sure what that would do).
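
One possible approach, assuming the old server still serves the account's public posts to anonymous API requests (some servers require authentication, and a moved account may behave differently), is to page through Mastodon's `GET /api/v1/accounts/:id/statuses` endpoint, which includes boosts and replies unless `exclude_reblogs`/`exclude_replies` are set. The account ID can be resolved with `GET /api/v1/accounts/lookup?acct=`. A standard-library sketch; the helper names are hypothetical, and the `fetch` parameter only exists so the paging logic can be tested offline.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url: str):
    """GET a URL and decode its JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def lookup_account_id(instance: str, acct: str, fetch=fetch_json) -> str:
    """Resolve a local username to the server's internal account ID."""
    query = urllib.parse.urlencode({"acct": acct})
    return fetch(f"https://{instance}/api/v1/accounts/lookup?{query}")["id"]


def iter_statuses(instance: str, account_id: str, fetch=fetch_json):
    """Yield every public status (posts, boosts, replies), newest first."""
    max_id = None
    while True:
        params = {"limit": "40"}  # 40 is the per-page maximum
        if max_id is not None:
            params["max_id"] = max_id  # only statuses older than this ID
        url = (f"https://{instance}/api/v1/accounts/{account_id}/statuses?"
               + urllib.parse.urlencode(params))
        page = fetch(url)
        if not page:  # an empty page means we reached the oldest status
            return
        yield from page
        max_id = page[-1]["id"]
```

Usage would look like `json.dump(list(iter_statuses("old.server.example", lookup_account_id("old.server.example", "jasonnab"))), f)` with `f` an open file; the server and username there are placeholders.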

#mastodonBackup #backups #archival #archiving #python #bash #shell #storage #datahoarder

james, 9 months ago to mastodon

Some folks on mastodon delete posts after a period, sometimes for privacy, sometimes to save server space.

Is there a nice way to download a thread/archive it?

I don’t want to distribute them. I think there are two cases:

1. I like having an archive of stuff I’ve said so I can look at it years later.
2. People have good advice/essays and I’d like to read them in the future.
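
For saving a thread, a sketch along these lines could write it out as plain text. It assumes you know the server and the ID of one post in the thread, and it uses Mastodon's `GET /api/v1/statuses/:id` and `GET /api/v1/statuses/:id/context` endpoints; the latter returns the post's `ancestors` and `descendants`. Function names are hypothetical and the HTML-to-text conversion is deliberately crude.

```python
import json
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text from status HTML, turning <br> and </p> into line breaks."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)  # entities like &amp; are already decoded


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts).strip()


def render_thread(status: dict, context: dict) -> str:
    """Order the thread as ancestors, the post itself, then its descendants."""
    posts = context["ancestors"] + [status] + context["descendants"]
    blocks = [f"{p['account']['acct']} ({p['created_at']}):\n{html_to_text(p['content'])}"
              for p in posts]
    return "\n\n---\n\n".join(blocks)


def fetch_thread(instance: str, status_id: str) -> str:
    def get(suffix: str):
        url = f"https://{instance}/api/v1/statuses/{status_id}{suffix}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    return render_thread(get(""), get("/context"))
```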

#mastodon #selfhosted #scripting #archiving #datahoarder

BBCRadio4, 6 months ago to DoctorWho

Matthew Sweet tells the extraordinary story of Doctor Who's hiatus between 1989 and 2005.

Doctor Who: The Wilderness Years

https://bbc.in/3QOIRZv

#DoctorWho #TV #ScienceFiction #drama #serial

BBCRadio4, 6 months ago

What happened to the 108 episodes of Doctor Who that were wiped in the 1960s?

An investigation, on BBC Sounds

https://bbc.in/3sQDJvW

#DoctorWho #TV #ScienceFiction #drama #serial #archiving

jasonnab, 9 months ago to music

#MUSIC #IDENTIFICATION HELP WANTED!
I've finally transferred an open reel I bought from a recycling shop in #Toronto, ON, many, many years ago.
Unfortunately, I have no idea what the artist, album, or song names are!
Do you recognize any of the music, vocals or instrumentation in this sample I've prepared?

Four tracks total

Written on the box: NEON Rough Mixes

AMPEX 456 Master Reel

Thanks!

#Preservation #archival #archiving #digitization #reelToReel #musicID #Help #askFedi #rock #folk #ID

OC If you want to save the existing reddit content for future off-reddit use, you should get involved with Archiveteam

Archiveteam's Reddit project is working to save reddit content from the hungry maw of corporate destruction....

stefan, 11 months ago to VideoGames

"In the in-depth study that was published in partnership with the Software Preservation Network, it was revealed that just 13% of all games released before 2010 are commercially or readily available today."

https://insider-gaming.com/study-classic-games-unavailable

#videogames #archiving #history #preservation

amberage, 3 months ago to random

So uh. Best software to rip DVDs? I've tried with VLC, but it spent half an hour going through the entire 2-hour movie and then rendered only the 8-second intro to file 😬

I don't need the whole menu and all, but I need to be able to get the video, the right audio track, and the right subtitle track. I've got a bunch of old DVDs here, 10+ years old and sometimes more, that I'd like to archive before bitrot sets in.

#archiving

sqrtminusone, 1 month ago to orgmode

I've got an #orgmode question.

I have an org file for a long-running project. It's getting hard to manage because there are lots of different tasks, events, etc.

I think I want to create an "archive version" of that file, which would have the same structure but store items, say, with a timestamp older than 2 months. That would require two basic steps:

1. extracting a subtree from the original file;
2. merging the extracted subtree into the archived version.

I could implement that, but I wonder if there is an existing way to do it, or some other approach that would address the same issue?
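
The two steps can be illustrated outside Emacs. This is a Python sketch of the idea only, not Emacs Lisp and not the org-refile approach mentioned later in the thread. Its assumptions: an entry counts as "old" when the newest active or inactive timestamp in its own heading line or body is before a cutoff date, headings never skip levels, and property drawers, tags, and other Org syntax get no special treatment. All names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

HEADING = re.compile(r"(\*+) (.*)")
# Active <2024-05-20 Mon> or inactive [2024-01-05 Fri] Org timestamps.
TIMESTAMP = re.compile(r"[<\[](\d{4})-(\d{2})-(\d{2})")


@dataclass
class Node:
    level: int
    title: str
    body: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def parse(lines: List[str]) -> Node:
    """Build a heading tree; level 0 is a virtual root for text before any heading."""
    root = Node(0, "")
    stack = [root]
    for line in lines:
        m = HEADING.match(line)
        if m:
            node = Node(len(m.group(1)), m.group(2))
            while stack[-1].level >= node.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].body.append(line)
    return root


def render(node: Node) -> List[str]:
    out = [] if node.level == 0 else ["*" * node.level + " " + node.title]
    out += node.body
    for child in node.children:
        out += render(child)
    return out


def newest_date(node: Node) -> Optional[date]:
    """Newest timestamp in the heading line or its own body (not its children)."""
    found = [date(*map(int, m.groups()))
             for line in [node.title, *node.body] for m in TIMESTAMP.finditer(line)]
    return max(found, default=None)


def extract_old(parent: Node, cutoff: date, path: Tuple[str, ...] = (),
                out: Optional[list] = None) -> List[Tuple[Tuple[str, ...], Node]]:
    """Step 1: detach subtrees whose newest own timestamp is before `cutoff`,
    remembering the heading path each one was found under."""
    out = [] if out is None else out
    kept = []
    for child in parent.children:
        newest = newest_date(child)
        if newest is not None and newest < cutoff:
            out.append((path, child))
        else:
            kept.append(child)
            extract_old(child, cutoff, path + (child.title,), out)
    parent.children = kept
    return out


def merge(archive: Node, path: Tuple[str, ...], subtree: Node) -> None:
    """Step 2: attach `subtree` under the same heading path, creating missing headings."""
    node = archive
    for level, title in enumerate(path, start=1):
        match = next((c for c in node.children if c.title == title), None)
        if match is None:
            match = Node(level, title)
            node.children.append(match)
        node = match
    node.children.append(subtree)
```

Note that a whole subtree moves together, so a child with a newer date under an old parent is archived with it; an existing archive file can be `parse`d first so that `merge` reuses its headings.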

#emacs

sqrtminusone, 1 month ago

Thanks Amy @grinn for pointing me to the necessary pieces of org-refile! It would have taken much longer to figure out otherwise.

I've made a function that org-refiles the entry at point into "archive/<file-name>.org", preserving the header structure. I only had to implement creating nonexistent headers because `org-refile' can only create one level out of the box.

And another function that performs that operation on all entries found by `org-ql'.

The code is here: https://sqrtminusone.xyz/configs/emacs/#archiving-records

#emacs #orgmode

voltagex, 9 months ago to random

The #Xbox360 Store goes offline next year. Get #archiving, folks.

kdecherf, 5 months ago to random

🔗 Où il est question de conservation ("On the subject of preservation") https://linuxfr.org/news/ou-il-est-question-de-conservation

#Archiving

remixtures, 1 month ago to ai Portuguese

#AI #GenerativeAI #HistoricalPreservation #Archiving #DataProtection #Cybersecurity #Privacy: "The National Archives and Records Administration (NARA) told employees Wednesday that it is blocking access to ChatGPT on agency-issued laptops to “protect our data from security threats associated with use of ChatGPT,” 404 Media has learned.

“NARA will block access to commercial ChatGPT on NARANet [an internal network] and on NARA issued laptops, tablets, desktop computers, and mobile phones beginning May 6, 2024,” an email sent to all employees, and seen by 404 Media, reads. “NARA is taking this action to protect our data from security threats associated with use of ChatGPT.”

The move is particularly notable considering that this directive is coming from, well, the National Archives, whose job is to keep an accurate historical record. The email explaining the ban says the agency is particularly concerned with internal government data being incorporated into ChatGPT and leaking through its services."

https://www.404media.co/national-archives-bans-employee-use-of-chatgpt/

cdfinder, 6 months ago to music

Peter Gabriel has released his new #album i/o, and we love this #music so much!

Peter is very interested in long term archival storage of audio data, and we have another connection we cannot disclose here.

Anyway, the new album is really great, please buy it from Bandcamp so the artist actually gets paid for it:

https://petergabriel.bandcamp.com/album/i-o

#audio #HighResAudio #ProgRock #PeterGabriel #Archiving #MusicLibrary #NeoFinder #macOS

aradiacollective, 5 months ago to archive

MERRY CHRISTMAS, DID YOU SAY YOU WANTED AN ARCHIVAL PROJECT FOR THE SEASON?! :thinkergunsunglasses: :blobwizard: 🎄

Yes! Our "Aradia Archives" has finally arrived! Check it out here: https://aradiacollective.com/grimoire

Read more on our post here: https://aradiacollective.dreamwidth.org/840.html

Please enjoy!

#MagicalGirls #MagicalGirl #MagicalBoys #Archiving #Archive #comics #comic #webcomics #webcomic #Preservation

A screenshot of the main page of the archive. It features a sidebar full of pertinent links to categories and further explanations about the database. There is a small update text section with today's date!

danie10, 5 months ago to music

What’s the Value of 3 Million LPs in a Digital World? Easy! They can be Played still in 50+ Years’ Time!

The ARChive of Contemporary Music has one of the largest collections of vinyl records in the world and is in danger of losing its home. Its champions are making a case for the future of physical media.

If someplace like a university starts a digitization p ...continues

See https://gadgeteer.co.za/whats-the-value-of-3-million-lps-in-a-digital-world-easy-they-can-be-played-still-in-50-years-time/

#archiving #music #technology #vinyl

g3om4c, 3 months ago to ai

Harvard Library Innovation Lab: WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI

"...an open-source, highly-customizable Retrieval Augmented Generation tool the web archiving community can use to explore the intersection between web #archiving and #AI. WARC-GPT allows for creating custom chatbots that use a set of #web #archive files as their knowledge base, letting users explore collections through conversation." 👏

https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring-web-archives-with-ai/ #webarchiving #digipres #digital

mhucka, 3 months ago to conservative

Occasional reminder that the Internet Archive provides a number of tools and browser plugins to let you send pages to the Wayback Machine (as well as check if a given page has been saved):

https://help.archive.org/help/save-pages-in-the-wayback-machine/
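
For scripts rather than browser plugins, the Wayback Machine also exposes an availability API at `https://archive.org/wayback/available?url=<page>` that returns the closest snapshot as JSON under `archived_snapshots.closest`. A minimal standard-library sketch of checking whether a page has already been saved; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def availability_url(page: str) -> str:
    """Query URL for the Wayback Machine availability API."""
    return "https://archive.org/wayback/available?" + urllib.parse.urlencode({"url": page})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def is_archived(page: str) -> Optional[str]:
    with urllib.request.urlopen(availability_url(page), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```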

#InternetArchive #Archiving #WebArchiving #Preservation

ErikJonker, 3 months ago to AdobePhotoshop Dutch

Preservation challenges in the digital age.
https://www.nature.com/articles/d41586-024-00616-5?s=09
#archiving #digital #preservation

geffrey, 6 months ago to markdown

I (and ChatGPT) have written a script that exports tasks from Things 3 on macOS to Markdown and to todo.txt.

Soon, I'll post the script and a guide on https://plaintextjournal.com.

Let me know if you’d like to be kept in the loop.
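
As an illustration of the two target formats only (not the script mentioned above, which hasn't been posted here), a sketch that turns one task into a todo.txt line and a Markdown checkbox. Getting tasks out of Things 3 is out of scope, and the parameters, including mapping Things tags to todo.txt `@contexts`, are hypothetical. It follows the todo.txt conventions: `x` plus the completion date first for done tasks, then the creation date, `+project` and `@context` tokens, and the widely used `due:` key/value extension.

```python
from datetime import date
from typing import Iterable, Optional


def _token(text: str) -> str:
    """todo.txt +project/@context tokens cannot contain spaces."""
    return "-".join(text.split())


def to_todotxt(title: str, done: bool = False, created: Optional[date] = None,
               completed: Optional[date] = None, project: Optional[str] = None,
               tags: Iterable[str] = (), due: Optional[date] = None) -> str:
    parts = []
    if done:
        parts.append("x")  # completed tasks start with a lowercase x
        if completed:
            parts.append(completed.isoformat())  # completion date comes first...
    if created:
        parts.append(created.isoformat())  # ...then the creation date
    parts.append(title)
    if project:
        parts.append("+" + _token(project))
    parts += ["@" + _token(tag) for tag in tags]
    if due:
        parts.append("due:" + due.isoformat())
    return " ".join(parts)


def to_markdown(title: str, done: bool = False) -> str:
    return f"- [{'x' if done else ' '}] {title}"
```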

#plaintext #things3 #archiving #todotxt #markdown

a, 3 months ago to internet

Checked my 6,921 bookmarks on Pinboard.in: 3,462 hit dead ends with 404s or expired domains, and many of the 3,459 left show fake content or parking pages. Only 21% from the last 2 years still work as expected. The lifespan of URLs is definitely shrinking.

Luckily, some (~20%) are archived by @internetarchive.
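
A check like the one described can be sketched with the standard library: send a `HEAD` request per bookmark and classify the outcome. This cannot detect parking pages or fake content, which usually answer with HTTP 200, and some servers reject `HEAD`, so treat the result as a first pass. The function names and categories are hypothetical; the `opener` parameter exists so the logic can be tested offline.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Iterable


def check_link(url: str, opener=urllib.request.urlopen) -> str:
    """Classify a bookmark as 'ok', 'gone' (404/410), 'http-<code>', or 'unreachable'.
    A parked domain that answers 200 still counts as 'ok' here."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "bookmark-check/0.1"})
    try:
        with opener(req, timeout=15):
            return "ok"  # urlopen follows redirects and raises on 4xx/5xx
    except urllib.error.HTTPError as err:
        return "gone" if err.code in (404, 410) else f"http-{err.code}"
    except OSError:
        # URLError (DNS failure, expired domain, refused connection) and
        # timeouts are all OSError subclasses.
        return "unreachable"


def summarize(urls: Iterable[str], opener=urllib.request.urlopen) -> Counter:
    return Counter(check_link(u, opener) for u in urls)
```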

What are we actually doing to avoid this?

#internet #archiving #archive #decay

remixtures, 3 months ago to random Portuguese

#Archiving #AcademicPublishing #DigitalPreservation: "When Eve broke down the results by publisher, less than 1 percent of the 204 publishers had put the majority of their content into multiple archives. (The cutoff was 75 percent of their content in three or more archives.) Fewer than 10 percent had put more than half their content in at least two archives. And a full third seemed to be doing no organized archiving at all.

At the individual publication level, under 60 percent were present in at least one archive, and over a quarter didn't appear to be in any of the archives at all. (Another 14 percent were published too recently to have been archived or had incomplete records.)

The good news is that large academic publishers appear to be reasonably good about getting things into archives; most of the unarchived issues stem from smaller publishers.

Eve acknowledges that the study has limits, primarily in that there may be additional archives he hasn't checked. There are some prominent dark archives that he didn't have access to, as well as things like Sci-hub, which violates copyright in order to make material from for-profit publishers available to the public. Finally, individual publishers may have their own archiving system in place that could keep publications from disappearing."

https://arstechnica.com/science/2024/03/study-finds-that-we-could-lose-science-if-publishers-go-bankrupt/

realsimon, 23 days ago (edited 23 days ago) to random German

I'd like to help archive websites.

With the Wayback Machine you can only save a single web page, not the whole website recursively, right?

What would be the best way to do it locally?

wget does the trick but I haven't figured out how to also download all media with it.

WebHTTrack is weird.
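
On the media question: `--page-requisites` asks wget to fetch everything needed to display each page (images, CSS, scripts), and files hosted on another domain, one likely reason media goes missing, are only fetched when `--span-hosts` is combined with a `--domains` allow-list. A minimal Python sketch that assembles such a command; the function names and the CDN host in the example are hypothetical, while the flags are standard wget options. Media that is only loaded by JavaScript will still not be found by wget.

```python
import subprocess
import urllib.parse
from typing import Iterable, List


def wget_mirror_cmd(url: str, extra_domains: Iterable[str] = (),
                    wait_seconds: int = 1) -> List[str]:
    cmd = [
        "wget",
        "--mirror",             # recursive, infinite depth, timestamping
        "--page-requisites",    # also fetch images/CSS/JS needed to render pages
        "--convert-links",      # rewrite links so the local copy works offline
        "--adjust-extension",   # add .html/.css extensions where missing
        "--no-parent",          # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to be polite
    ]
    domains = list(extra_domains)
    if domains:
        # Allow other hosts (e.g. a CDN serving the media), but only these.
        host = urllib.parse.urlsplit(url).hostname
        cmd += ["--span-hosts", "--domains=" + ",".join([host, *domains])]
    cmd.append(url)
    return cmd


def mirror(url: str, extra_domains: Iterable[str] = ()) -> None:
    subprocess.run(wget_mirror_cmd(url, extra_domains), check=True)
```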

#archiving #waybackmachine
