|
||
---|---|---|
src | ||
LICENSE | ||
README.md | ||
generate |
README.md
wibu-rss
⚠ The regex used to parse HTML content may break if the site owners decided to radically change their content layout for some reason. If it happens, let me know.
Generates RSS feeds from the following Indonesian otaku/weeb news sites:
- www.risamedia.com
- kanau.org
- mediaformasi.com, they have their own RSS feed. Well, this doesn't hurt.
- tirto.id, not actually an otaku site, but hope it's helpful.
If you just want to consoom the feeds, you can get them here.
Tools
shup
, a POSIX HTML parser (here)curl
sed
,grep
,awk
- and other tools from the GNU Core Utilities.
Usage
The entry point is ./generate
which will source files/functions inside the src
folder. Outputs are in the feeds
folder (will be created by the script if doesn't exist).
Notes:
- Since each site has different structures (wacky HTML + JavaScript), each will have their unique file/function.
- The actual xml file isn't reverse chronological, but most RSS readers are smart enough to sort them reverse chronologically.
Before running a cronjob:
- Make sure to tell cron the path to
shup
, e.g. by definingPATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
, or wherevershup
is installed, in the crontab. - Customize/edit the
dir
variable ingenerate
to make the directory absolute.