haturatu / cuckooget

Rust (36.31%)
Python (31.96%)
Other (31.74%)

File                  Last commit    Date
.gitignore            dag wrap       2024-09-24T04:29:07+09:00
Cargo.lock            first commit   2024-09-22T11:37:46+09:00
Cargo.toml            dag wrap       2024-09-24T04:29:07+09:00
LICENSE               first commit   2024-09-22T11:37:46+09:00
README.md             dag wrap       2024-09-24T04:29:07+09:00
async_web_mirror.py   dag wrap       2024-09-24T04:29:07+09:00
main.py               fix typo       2024-09-22T16:47:18+09:00
requirements.txt      first commit   2024-09-22T11:37:46+09:00
src                   dag wrap       2024-09-24T04:29:07+09:00

cuckooget

What

A fast website mirroring script built on a cuckoo hash table, xxhash, and a DAG. It still has many known problems.
It saddens me to see websites disappear, and I keep looking for ways to save them even faster.

Websites are our memories.
Let everyone rise up and preserve disappearing historical websites, leaving them for the future.
For all geeks and for those who love the internet. If you find an interesting website, please contact me.

With the -w option, you can also give higher priority to pages based on their URL. As far as I know, no other website mirroring software has this feature.
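To make the idea of URL-based priority concrete, here is a minimal sketch of a crawl queue that pops higher-weighted URLs first. It is not taken from main.py: the names `url_priority` and `PriorityFrontier`, the keyword-to-weight mapping, and the substring matching rule are all assumptions made for illustration, and the real meaning of the WEIGHTS arguments may differ.

```python
import heapq
from itertools import count


def url_priority(url: str, weights: dict[str, int]) -> int:
    """Sum the weights of every keyword that appears in the URL (assumed rule)."""
    return sum(w for keyword, w in weights.items() if keyword in url)


class PriorityFrontier:
    """Hypothetical crawl queue that returns the highest-priority URL first."""

    def __init__(self, weights: dict[str, int]):
        self.weights = weights
        self._heap = []
        self._order = count()  # tie-breaker: equal priorities pop in insertion order

    def push(self, url: str) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        priority = url_priority(url, self.weights)
        heapq.heappush(self._heap, (-priority, next(self._order), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


frontier = PriorityFrontier({"blog": 10, "archive": 5})
for link in ("https://example.com/about",
             "https://example.com/archive/2001",
             "https://example.com/blog/post"):
    frontier.push(link)
popped = [frontier.pop() for _ in range(len(frontier))]
```

With these example weights, the blog page is fetched before the archive page, and the unweighted page comes last.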

Hash values are generated with the ultra-fast xxhash, and collisions are resolved by the cuckoo hash table.
The table uses xxh32 and xxh64 as its two hash functions, so each key gets two different hash values.
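The following is a minimal Python sketch of how a cuckoo hash set with two hash functions can track visited URLs. It is an illustration only, not the Rust CuckooHashtables implementation in src: the class name `CuckooSet`, the table sizes, the eviction limit, and the grow-on-failure policy are assumptions. It uses `xxh32` and `xxh64` from the Python `xxhash` package when available, and falls back to `hashlib` stand-ins otherwise so the sketch still runs.

```python
import hashlib

try:
    import xxhash  # third-party package, assumed to be installed

    def h1(data: bytes) -> int:
        return xxhash.xxh32(data).intdigest()

    def h2(data: bytes) -> int:
        return xxhash.xxh64(data).intdigest()
except ImportError:
    # Stand-in hashes (not xxhash) so the sketch runs without the package.
    def h1(data: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")

    def h2(data: bytes) -> int:
        digest = hashlib.blake2b(data, digest_size=8, person=b"cuckoo").digest()
        return int.from_bytes(digest, "big")


class CuckooSet:
    """Toy cuckoo hash set: every key lives in one of two candidate slots."""

    def __init__(self, capacity: int = 16, max_kicks: int = 32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slots(self, key: str):
        data = key.encode("utf-8")
        return h1(data) % self.capacity, h2(data) % self.capacity

    def __contains__(self, key: str) -> bool:
        # A lookup inspects at most two slots, one per table.
        i, j = self._slots(key)
        return self.tables[0][i] == key or self.tables[1][j] == key

    def add(self, key: str) -> bool:
        """Insert key; return False if it was already present."""
        if key in self:
            return False
        current = key
        for _ in range(self.max_kicks):
            for t in (0, 1):
                idx = self._slots(current)[t]
                # Place the key and pick up whatever occupied the slot before.
                current, self.tables[t][idx] = self.tables[t][idx], current
                if current is None:
                    return True
        # Too many evictions: grow both tables and reinsert everything.
        self._grow(current)
        return True

    def _grow(self, pending: str) -> None:
        old = [k for table in self.tables for k in table if k is not None]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k in old + [pending]:
            self.add(k)


seen = CuckooSet(capacity=4)
first_time = seen.add("https://example.com/")   # newly inserted
second_time = seen.add("https://example.com/")  # already present
```

Because a key can only ever sit in one of its two slots, membership checks stay constant-time, which is what makes this structure attractive for deduplicating URLs during a crawl.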

Install

deps

pip install maturin
pip install -r requirements.txt

Build the CuckooHashtables extension, which is implemented in Rust, with maturin and install the resulting wheel with pip so that your Python code can call it. If you prefer not to install it globally, install it inside a virtual environment instead.

maturin build
pip install target/wheels/your_package_name.whl

chmod +x main.py

or, to overwrite a previously installed build:

pip install target/wheels/your_package_name.whl --force-reinstall

chmod +x main.py

Usage

python3 ./main.py

usage: main.py [-h] [-c CONNECTIONS] [-w WEIGHTS [WEIGHTS ...]]
               [-v EXCLUDE [EXCLUDE ...]]
               url output_dir
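For readers who want to see how a command line of this shape is typically declared, here is a sketch of an `argparse` parser that accepts the same arguments as the usage line above. It is not copied from main.py: the long option names, the `int` type for CONNECTIONS, the defaults, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the usage line; details are assumed."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Long option names, types, defaults, and help texts below are assumptions.
    parser.add_argument("-c", "--connections", type=int, default=8,
                        help="number of concurrent connections")
    parser.add_argument("-w", "--weights", nargs="+", default=[],
                        help="one or more values used to prioritize URLs")
    parser.add_argument("-v", "--exclude", nargs="+", default=[],
                        help="one or more patterns to exclude")
    parser.add_argument("url", help="start URL of the site to mirror")
    parser.add_argument("output_dir", help="directory to write the mirror to")
    return parser


# Positional arguments come first so the multi-value options cannot absorb them.
args = build_parser().parse_args(
    ["https://example.com", "mirror", "-c", "16", "-w", "blog", "archive", "-v", "login"]
)
```

Since `-w` and `-v` accept one or more values, placing `url` and `output_dir` before them avoids any ambiguity about where an option's value list ends.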
BSD 3-Clause License

Copyright (c) 2024, haturau

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

About this repository

Creates fast website mirrors using xxhash and cuckoohashtables.
