
waybackpy

Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily.

Description

<!-- markdownlint-disable MD033 MD041 -->
<div align="center">
<img src="https://raw.githubusercontent.com/akamhy/waybackpy/master/assets/waybackpy_logo.svg"><br>
<h3>A Python package & CLI tool that interfaces with the Wayback Machine API</h3>
</div>
<p align="center">
<a href="https://github.com/akamhy/waybackpy/actions?query=workflow%3ATests"><img alt="Unit Tests" src="https://github.com/akamhy/waybackpy/workflows/Tests/badge.svg"></a>
<a href="https://codecov.io/gh/akamhy/waybackpy"><img alt="codecov" src="https://codecov.io/gh/akamhy/waybackpy/branch/master/graph/badge.svg"></a>
<a href="https://pypi.org/project/waybackpy/"><img alt="pypi" src="https://img.shields.io/pypi/v/waybackpy.svg"></a>
<a href="https://pepy.tech/project/waybackpy?versions=2*&versions=1*&versions=3*"><img alt="Downloads" src="https://pepy.tech/badge/waybackpy/month"></a>
<a href="https://app.codacy.com/gh/akamhy/waybackpy?utm_source=github.com&utm_medium=referral&utm_content=akamhy/waybackpy&utm_campaign=Badge_Grade_Settings"><img alt="Codacy Badge" src="https://api.codacy.com/project/badge/Grade/6d777d8509f642ac89a20715bb3a6193"></a>
<a href="https://github.com/akamhy/waybackpy/commits/master"><img alt="GitHub latest commit" src="https://img.shields.io/github/last-commit/akamhy/waybackpy?color=blue&style=flat-square"></a>
<a href="#"><img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/waybackpy?style=flat-square"></a>
<a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
</p>

---

# <img src="https://github.githubassets.com/images/icons/emoji/unicode/2b50.png" width="30"></img> Introduction

Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine APIs.

The Wayback Machine has three client-side APIs:

- SavePageNow or Save API
- CDX Server API
- Availability API

These three APIs can be accessed through waybackpy, either by importing it in a Python file or module or from the command-line interface.

## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f3d7.png" width="20"></img> Installation

**Using [pip](https://en.wikipedia.org/wiki/Pip_(package_manager)), from [PyPI](https://pypi.org/) (recommended)**:

```bash
pip install waybackpy
```

**Using [conda](https://en.wikipedia.org/wiki/Conda_(package_manager)), from [conda-forge](https://anaconda.org/conda-forge/waybackpy) (recommended)**:

See also the [waybackpy feedstock](https://github.com/conda-forge/waybackpy-feedstock); its maintainers are [@rafaelrdealmeida](https://github.com/rafaelrdealmeida/), [@labriunesp](https://github.com/labriunesp/) and [@akamhy](https://github.com/akamhy/).

```bash
conda install -c conda-forge waybackpy
```

**Install directly from [this git repository](https://github.com/akamhy/waybackpy) (NOT recommended)**:

```bash
pip install git+https://github.com/akamhy/waybackpy.git
```

## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f433.png" width="20"></img> Docker Image

Docker Hub: [hub.docker.com/r/secsi/waybackpy](https://hub.docker.com/r/secsi/waybackpy)

The Docker image is automatically updated on every release by [Regularly and Automatically Updated Docker Images](https://github.com/cybersecsi/RAUDI) (RAUDI). RAUDI is a tool by [SecSI](https://secsi.io), an Italian cybersecurity startup.

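For orientation, the three Wayback Machine APIs named in the introduction correspond to public HTTP endpoints. The following is a minimal sketch that only builds request URLs with the standard library; it does not use waybackpy and sends no requests. The endpoint constants reflect the Internet Archive's publicly documented URLs as best understood here, and the helper names (`availability_url`, `cdx_url`, `save_url`) are hypothetical, not part of waybackpy.

```python
from urllib.parse import urlencode

# Assumed public endpoints of the three Wayback Machine APIs.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_url(url, timestamp=None):
    """Build an Availability API query URL (hypothetical helper)."""
    params = {"url": url}
    if timestamp is not None:
        # Timestamp in YYYYMMDDhhmmss form (a prefix such as YYYYMMDD also works).
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def cdx_url(url, limit=None):
    """Build a CDX Server API query URL (hypothetical helper)."""
    params = {"url": url}
    if limit is not None:
        params["limit"] = limit
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def save_url(url):
    """Build a SavePageNow URL by appending the target to the save endpoint."""
    return SAVE_ENDPOINT + url


print(availability_url("https://github.com", "20220118"))
print(cdx_url("https://google.com", limit=5))
print(save_url("https://github.com"))
```

In practice waybackpy wraps these calls, adds a user agent, and parses the responses, so the sketch is only meant to show what each API name refers to.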
## <img src="https://github.githubassets.com/images/icons/emoji/unicode/1f680.png" width="20"></img> Usage

### As a Python package

#### Save API aka SavePageNow

```python
>>> from waybackpy import WaybackMachineSaveAPI
>>> url = "https://github.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> save_api = WaybackMachineSaveAPI(url, user_agent)
>>> save_api.save()
https://web.archive.org/web/20220118125249/https://github.com/
>>> save_api.cached_save
False
>>> save_api.timestamp()
datetime.datetime(2022, 1, 18, 12, 52, 49)
```

#### CDX API aka CDXServerAPI

```python
>>> from waybackpy import WaybackMachineCDXServerAPI
>>> url = "https://google.com"
>>> user_agent = "my new app's user agent"
>>> cdx_api = WaybackMachineCDXServerAPI(url, user_agent)
```

##### oldest

```python
>>> cdx_api.oldest()
com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
>>> oldest = cdx_api.oldest()
>>> oldest
com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
>>> oldest.archive_url
'https://web.archive.org/web/19981111184551/http://google.com:80/'
>>> oldest.original
'http://google.com:80/'
>>> oldest.urlkey
'com,google)/'
>>> oldest.timestamp
'19981111184551'
>>> oldest.datetime_timestamp
datetime.datetime(1998, 11, 11, 18, 45, 51)
>>> oldest.statuscode
'200'
>>> oldest.mimetype
'text/html'
```

##### newest

```python
>>> newest = cdx_api.newest()
>>> newest
com,google)/ 20220217234427 http://@google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 563
>>> newe
```
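
The snapshot attributes shown in the `oldest` example (`urlkey`, `timestamp`, `original`, `mimetype`, `statuscode`, `archive_url`, `datetime_timestamp`) can be read as fields of a space-separated CDX line. The sketch below reproduces that mapping with the standard library only. It is not waybackpy's implementation: `CdxRecord` and `parse_cdx_line` are hypothetical names, and the last two field names (`digest`, `length`) are assumed from the usual CDX column order.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CdxRecord:
    """One CDX line split into its seven space-separated fields."""

    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str  # assumed field name
    length: str  # assumed field name

    @property
    def datetime_timestamp(self):
        # The 14-digit timestamp is YYYYMMDDhhmmss.
        return datetime.strptime(self.timestamp, "%Y%m%d%H%M%S")

    @property
    def archive_url(self):
        # Archive URLs combine the timestamp with the original URL.
        return f"https://web.archive.org/web/{self.timestamp}/{self.original}"


def parse_cdx_line(line):
    """Split a CDX line on whitespace and validate the field count."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return CdxRecord(*fields)


line = (
    "com,google)/ 19981111184551 http://google.com:80/ "
    "text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381"
)
record = parse_cdx_line(line)
print(record.archive_url)
print(record.datetime_timestamp)
```

Note that every field stays a string, which matches the quoted `'200'` returned by `oldest.statuscode` in the example session.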

Release History

| Version | Changes | Urgency | Date |
| --- | --- | --- | --- |
| 3.0.6 | Imported from PyPI (3.0.6) | Low | 4/21/2026 |
| 3.0.5 | What's Changed:<br>- undo drop python3.6 by @akamhy in https://github.com/akamhy/waybackpy/pull/163<br>**Full Changelog**: https://github.com/akamhy/waybackpy/compare/3.0.4...3.0.5<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/3.0.5/v3.0.5.zip/download) | Low | 2/18/2022 |
| 3.0.4 | What's Changed:<br>- Move metadata from `__init__.py` into setup.cfg by @eggplants in https://github.com/akamhy/waybackpy/pull/153<br>- add sort param support in CDX API class by @akamhy in https://github.com/akamhy/waybackpy/pull/156<br>- Add sort, use_pagination and closest by @akamhy in https://github.com/akamhy/waybackpy/pull/158<br>- Cdx based oldest newest and near by @akamhy in https://github.com/akamhy/waybackpy/pull/159<br>**Full Changelog**: https://github.com/akamhy/waybackpy/compare/3.0.3. | Low | 2/18/2022 |
| 3.0.3 | What's Changed:<br>- Dropped Python 3.4 to 3.6, both inclusive.<br>- Catch 429 and 509 status codes for the SavePageNow API.<br>- Increase the default CDX limit from 5000 to 25000 records per API call.<br>- Added type hints.<br>- The package will now close the sessions explicitly.<br>- Removed useless code.<br>- Added docstrings.<br>New Contributors:<br>- @eggplants made their first contribution in https://github.com/akamhy/waybackpy/pull/124<br>- @deepsource-autofix made their first contribution in https://github. | Low | 2/9/2022 |
| 3.0.2 | Nothing changed with respect to the previous version; this release was created for conda-forge. Replace the non-ASCII character figlet with an ASCII character figlet. See https://github.com/conda-forge/staged-recipes/pull/17643<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/3.0.2/v3.0.2.zip/download) | Low | 1/25/2022 |
| 3.0.1 | What's Changed:<br>- escape '.' before 'archive.org' by @akamhy in https://github.com/akamhy/waybackpy/pull/112<br>- Update setup.py by @rafaelrdealmeida in https://github.com/akamhy/waybackpy/pull/114<br>- do not use f-strings in setup.py by @akamhy in https://github.com/akamhy/waybackpy/pull/115<br>New Contributors:<br>- @rafaelrdealmeida made their first contribution in https://github.com/akamhy/waybackpy/pull/114<br>See also https://github.com/conda-forge/staged-recipes/pull/17634 and https:// | Low | 1/25/2022 |
| 3.0.0 | What's Changed:<br>- The 3 different APIs now have 3 different classes: WaybackMachineCDXServerAPI, WaybackMachineSaveAPI and WaybackMachineAvailabilityAPI.<br>- CLI now supports the CDX API.<br>- The old Url class will continue to be supported, so existing code will not break.<br>- Get is now deprecated; it was a bad idea to even try to add tasks meant for urllib.<br>**Full Changelog**: https://github.com/akamhy/waybackpy/compare/2.4.4...3.0.0 | Low | 1/18/2022 |
| 2.4.4 | - When the response code is 509, raise an error with an explanation (based on the actual error message contained in the response HTML).<br>- Fix typo<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.4.4/v2.4.4.zip/download) | Low | 9/3/2021 |
| 2.4.3 | - Fix redirect issues with HTTP and HTTPS redirection<br>- More stable archiving<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.4.3/v2.4.3.zip/download) | Low | 4/2/2021 |
| 2.4.2 | - Added the CLI argument `--file`; if this argument is not used with known URLs, then waybackpy will not save the output URLs to a file.<br>- Added the `cached_save` flag on the waybackpy URL object; if the returned saved archive is older than 3 minutes the flag is true, else false.<br>- Bug fix: the CLI `--json` argument was returning a JSON-loaded Python dict instead of valid JSON. This is now fixed.<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.4.2/v2.4.2.zip/ | Low | 1/24/2021 |
| 2.4.1 | - Change str repr of cdxsnapshot to cdx line<br>- Support Unix timestamp as an argument in near<br>- Don't fetch more pages if >=2 pages are empty, Pagination API<br>- Don't use pagination API if total pages <= 2<br>- The Cdx method get() now gets the last fetched archive by default<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.4.1/v2.4.1.zip/download) | Low | 1/12/2021 |
| 2.4.0 | - Cdx API now fully supported<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.4.0/v2.4.0.zip/download) | Low | 1/10/2021 |
| 2.3.3 | - Added support for querying CDX Pagination API<br>- Cdx class is publicly available to be used in third-party code.<br>- Some methods of Url now use the Cdx Pagination API<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.3.3/v2.3.3.zip/download) | Low | 1/4/2021 |
| 2.3.2 | - Better error messages for CLI users.<br>- Fixed bug: removed code from `__init__` that was fetching the availability API without instruction.<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.3.2/v2.3.2.zip/download) | Low | 1/2/2021 |
| 2.3.1 | - Fixed bug: `Url.__init__()` was making unnecessary requests to the availability checking API.<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.3.1/v2.3.1.zip/download) | Low | 1/1/2021 |
| 2.3.0 | - Now using the requests package instead of urllib.request. The requests package is better at handling unusual redirects and other issues.<br>- Now using threading for checking live URLs.<br>- Improve code quality and formatting.<br>- And now we also have a new cool logo.<br>- Docs are no longer hosted on readthedocs, but at https://akamhy.github.io/waybackpy/<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.3.0/v2.3.0.zip/download) | Low | 12/13/2020 |
| 2.2.0 | Changes:<br>- Added `archive_url` and `--archive_url` in the wrapper and CLI respectively. This is just an alias for the `newest` method.<br>- Archive URLs are no longer returned as strings but as instances of the Url class.<br>- Added `JSON` and `--json` in the wrapper and CLI respectively. Used to read the API response of the availability API.<br>- The `len()` method on Url objects will now return the age of the archive.<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button | Low | 10/17/2020 |
| 2.1.9 | 2.1.9<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.1.9/v2.1.9.zip/download) | Low | 10/2/2020 |
| 2.1.8 | 1) New feature: known URLs list<br>2) Updated Readme<br>[![Download waybackpy](https://a.fsdn.com/con/app/sf-download-button)](https://sourceforge.net/projects/waybackpy/files/2.1.8/v2.1.8.zip/download) | Low | 10/2/2020 |
| 2.1.7 | New regex added for parsing the archive URL. | Low | 8/9/2020 |
| 2.1.6 | - fix issues with CLI | Low | 7/24/2020 |
| 2.1.5 | - minor bug fixes | Low | 7/24/2020 |
| 2.1.4 | - removed duplicate method which should improve the error handling | Low | 7/23/2020 |
| 2.1.3 | - Support CLI<br>- Code refactoring<br>- bug fixes<br>- better exceptions | Low | 7/22/2020 |
| 2.1.2 | - Minor bug fixes.<br>- Updated index.rst<br>- 2 new tests introduced | Low | 7/20/2020 |
| 2.1.1 | - Minor bug fixes<br>- Example replit links changed to my account. | Low | 7/19/2020 |
| 2.1.0 | - Updates for recent API changes<br>- Updated documentation | Low | 7/19/2020 |
| 2.0.2 | Release 2.0.2 | Low | 7/18/2020 |
| 2.0.1 | No timeout for the final save() try. | Low | 7/18/2020 |
| 2.0.0 | OOP based | Low | 7/18/2020 |
| v1.6 | Release v1.6 | Low | 5/7/2020 |
| v1.4 | Release v1.4 | Low | 5/5/2020 |
| v1.3 | Release v1.3 | Low | 5/5/2020 |
| v1.2 | Support for get(). Fix bug with near(). | Low | 5/5/2020 |
| v1.1 | First release of waybackpy! | Low | 5/4/2020 |
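
The 2.4.2 entry describes the `cached_save` flag as true when the returned archive is older than 3 minutes. That rule can be sketched as a simple time comparison. This is an illustration of the stated rule only, not waybackpy's code: `is_cached_save` and `CACHE_THRESHOLD` are hypothetical names, and both datetimes are assumed to be naive UTC values.

```python
from datetime import datetime, timedelta

# Threshold taken from the 2.4.2 release note ("older than 3 mins").
CACHE_THRESHOLD = timedelta(minutes=3)


def is_cached_save(archive_time, now):
    """Return True if the archive predates `now` by more than the threshold.

    An archive that old was likely served from an earlier capture rather
    than created by the save request that was just made.
    """
    return now - archive_time > CACHE_THRESHOLD


archive_time = datetime(2022, 1, 18, 12, 52, 49)

# Checked one minute after the capture: treated as a fresh save.
print(is_cached_save(archive_time, datetime(2022, 1, 18, 12, 53, 49)))

# Checked about seven minutes later: treated as a cached save.
print(is_cached_save(archive_time, datetime(2022, 1, 18, 13, 0, 0)))
```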

