File | Last commit message | Last commit date
.gitignore | dag wrap | 2024-09-24T04:29:07+09:00
Cargo.lock | first commit | 2024-09-22T11:37:46+09:00
Cargo.toml | dag wrap | 2024-09-24T04:29:07+09:00
LICENSE | first commit | 2024-09-22T11:37:46+09:00
README.md | dag wrap | 2024-09-24T04:29:07+09:00
async_web_mirror.py | dag wrap | 2024-09-24T04:29:07+09:00
main.py | fix typo | 2024-09-22T16:47:18+09:00
requirements.txt | first commit | 2024-09-22T11:37:46+09:00
src | dag wrap | 2024-09-24T04:29:07+09:00
A very fast website-copying script built on a cuckoo hash table, xxhash, and a DAG. Many problems remain.
I feel sad about disappearing websites, and I’m thinking of ways to save them even faster.
Websites are our memories.
Let everyone rise up and preserve disappearing historical websites, leaving them for the future.
For all geeks and for those who love the internet. If you find an interesting website, please contact me.
Furthermore, with the -w option, you can set higher priorities based on the URL. I don't think other website mirroring software has this feature.
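As a rough illustration of URL-based prioritization, here is a minimal sketch of a crawl queue that fetches preferred URLs first. It is not taken from main.py: the substring matching rule, the `url_priority` helper, the `PriorityFrontier` class, and the example URLs are all assumptions made for illustration, and the real -w option may interpret its values differently.

```python
import heapq
from itertools import count


def url_priority(url, weights):
    """Return a sort key for a URL; lower values are fetched first.

    ASSUMPTION: `weights` is an ordered list of substrings and earlier
    entries rank higher. URLs matching nothing go behind all weighted ones.
    """
    for rank, keyword in enumerate(weights):
        if keyword in url:
            return rank
    return len(weights)


class PriorityFrontier:
    """Hypothetical crawl frontier that pops the best-ranked URL first."""

    def __init__(self, weights):
        self._weights = list(weights)
        self._heap = []
        self._tie = count()  # preserves insertion order among equal ranks

    def push(self, url):
        key = (url_priority(url, self._weights), next(self._tie), url)
        heapq.heappush(self._heap, key)

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


frontier = PriorityFrontier(weights=["/archive/", ".html"])
for link in [
    "https://example.com/about",
    "https://example.com/index.html",
    "https://example.com/archive/1999/",
]:
    frontier.push(link)

order = [frontier.pop() for _ in range(len(frontier))]
```

With these example weights, the `/archive/` URL is popped first, the `.html` page second, and the unweighted URL last. A binary heap keeps both push and pop at O(log n), which matters when the queue holds many discovered links.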
Hash values are generated by the ultra-fast xxhash, and collisions are avoided by the cuckoo hash table.
Each key is hashed with both xxh32 and xxh64, which give it two different hash values.
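To show how two hash values per key fit together in cuckoo hashing, here is a minimal pure-Python sketch of a cuckoo hash set for visited URLs. It is not the project's Rust implementation: the `CuckooSet` class, its capacity, the kick limit, and the grow-by-doubling policy are assumptions. It uses the `xxhash` Python package when available and otherwise falls back to `hashlib.blake2b` stand-ins, which are not real xxHash and exist only so the sketch runs anywhere.

```python
import hashlib

try:
    import xxhash  # third-party package: pip install xxhash

    def h32(data: bytes) -> int:
        return xxhash.xxh32(data).intdigest()

    def h64(data: bytes) -> int:
        return xxhash.xxh64(data).intdigest()

except ImportError:
    # Stand-ins (NOT xxHash) so the sketch still runs without the package.
    def h32(data: bytes) -> int:
        d = hashlib.blake2b(data, digest_size=4, person=b"h32").digest()
        return int.from_bytes(d, "little")

    def h64(data: bytes) -> int:
        d = hashlib.blake2b(data, digest_size=8, person=b"h64").digest()
        return int.from_bytes(d, "little")


class CuckooSet:
    """Hypothetical two-table cuckoo hash set keyed by strings."""

    def __init__(self, capacity=16, max_kicks=32):
        self._capacity = capacity
        self._max_kicks = max_kicks
        self._tables = [[None] * capacity, [None] * capacity]
        self._size = 0

    def _slot(self, which, key):
        data = key.encode("utf-8")
        return (h32(data) if which == 0 else h64(data)) % self._capacity

    def __contains__(self, key):
        # A key can only ever live in one of its two candidate slots.
        return any(self._tables[t][self._slot(t, key)] == key for t in (0, 1))

    def __len__(self):
        return self._size

    def add(self, key):
        """Insert key; return False if it was already present."""
        if key in self:
            return False
        self._insert(key)
        self._size += 1
        return True

    def _insert(self, key):
        which = 0
        for _ in range(self._max_kicks):
            i = self._slot(which, key)
            # Place the key and pick up whatever occupied the slot.
            key, self._tables[which][i] = self._tables[which][i], key
            if key is None:
                return
            which ^= 1  # the evicted key moves to its slot in the other table
        self._grow(key)

    def _grow(self, pending):
        # Too many evictions: double both tables and reinsert everything.
        old = [k for table in self._tables for k in table if k is not None]
        self._capacity *= 2
        self._tables = [[None] * self._capacity, [None] * self._capacity]
        for k in old + [pending]:
            self._insert(k)


seen = CuckooSet()
urls = [f"https://example.com/page/{n}" for n in range(200)]
added = [seen.add(u) for u in urls]
duplicate = seen.add(urls[0])
```

Because every key has exactly two candidate slots, a membership check inspects at most two cells no matter how full the set is, which suits the "have I already fetched this URL?" check of a crawler.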
deps
pip install maturin
pip install -r requirements.txt
Build the Rust implementation of CuckooHashtables with maturin and install the resulting wheel with pip so that it can be called from your Python code. If you prefer not to install it globally, you can install it inside a virtual environment instead.
maturin build
pip install target/wheels/your_package_name.whl
chmod +x main.py
or
pip install target/wheels/your_package_name.whl --force-reinstall
chmod +x main.py
python3 ./main.py
usage: main.py [-h] [-c CONNECTIONS] [-w WEIGHTS [WEIGHTS ...]]
[-v EXCLUDE [EXCLUDE ...]]
url output_dir
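For reference, an argparse parser shaped like the following would accept the same arguments as the usage line above. This is a sketch, not the code in main.py: the long option names, the types, the default values, and the help strings are assumptions, and only the short flags and positional names come from the usage output.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage line of main.py."""
    parser = argparse.ArgumentParser(prog="main.py")
    # ASSUMPTION: integer type and a default of 8 are illustrative only.
    parser.add_argument("-c", "--connections", type=int, default=8,
                        help="number of concurrent connections")
    # One or more values, matching "WEIGHTS [WEIGHTS ...]".
    parser.add_argument("-w", "--weights", nargs="+", default=[],
                        help="URL patterns to prioritize")
    # One or more values, matching "EXCLUDE [EXCLUDE ...]".
    parser.add_argument("-v", "--exclude", nargs="+", default=[],
                        help="URL patterns to skip")
    parser.add_argument("url", help="start URL to mirror")
    parser.add_argument("output_dir", help="directory to write the mirror to")
    return parser


# Positional arguments come first so the multi-value options
# cannot swallow them.
args = build_parser().parse_args([
    "https://example.com/", "mirror",
    "-c", "16",
    "-w", "/archive/", ".html",
    "-v", "/login",
])
```

With a parser like this, options declared with `nargs="+"` consume every following non-option argument, so placing `url` and `output_dir` before `-w` and `-v` avoids a "the following arguments are required" error.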
BSD 3-Clause License

Copyright (c) 2024, haturau

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Creates fast website mirrors using xxhash and cuckoo hash tables.