- home - photo - video - archivify
- blog (3 popular posts ⬇️)- the odds of running into someone you know in nyc - ii: donating to qri today is like buying bitcoin in 2010 - super free will - books, essays, and other readings - things - people who interest me - people with great websites - how to make (almost) anything [MIT 6.943]
Published January 7, 02020, Page Last Modified January 28, 02020
pip install beautifulsoup4
pip install archivenow
python archivify.py input.html
While the script works just fine, the Internet Archive sometimes throws errors when trying to archive a link (403 Forbidden, 502 Server Error: Bad Gateway for url). In these cases, the script doesn’t add an ‘(a)’.
import sys from bs4 import BeautifulSoup from archivenow import archivenow filename = sys.argv f_in = open(filename, "r") text = f_in.read() soup = BeautifulSoup(text.decode('UTF-8'), 'html.parser') for a in soup.find_all('a'): link = a.get('href') # Save link to Internet Archive archived_link = archivenow.push(link, "ia") print('archived_link') # If there was no error archiving if not archived_link.startswith('Error'): # Add ' (a)' with archived link after linked text archived_tag = soup.new_tag("a", href=archived_link) archived_tag.append("a") a.insert_after(")") a.insert_after(archived_tag) a.insert_after(" (") # Write contents to new file filename_out = filename.split('.') + '-archived.' + filename.split('.') f_out = open(filename_out, "w") f_out.write(soup.encode('UTF-8'))
Zuckerman, Andrew, "Archivify", January 7, 02020, http://andzuck.com/projects/archivify/
Get notified when I write new essays
Comments powered by Talkyard.