summaryrefslogtreecommitdiffstatshomepage
path: root/docs/usage/scraping.rst
diff options
context:
space:
mode:
authorWolfgang Müller2024-03-05 18:08:09 +0100
committerWolfgang Müller2024-03-05 19:25:59 +0100
commitd1d654ebac2d51e3841675faeb56480e440f622f (patch)
tree56ef123c1a15a10dfd90836e4038e27efde950c6 /docs/usage/scraping.rst
downloadhircine-0.1.0.tar.gz
Initial commit0.1.0
Diffstat (limited to 'docs/usage/scraping.rst')
-rw-r--r--docs/usage/scraping.rst90
1 files changed, 90 insertions, 0 deletions
diff --git a/docs/usage/scraping.rst b/docs/usage/scraping.rst
new file mode 100644
index 0000000..37bae98
--- /dev/null
+++ b/docs/usage/scraping.rst
@@ -0,0 +1,90 @@
+Scraping
+========
+
+**hircine** comes with a generic scraper interface that allows scraping comic
+metadata from virtually any source. A number of scrapers for common file
+formats and websites are :ref:`included <builtin-scrapers>` in the base
+installation. Refer to :doc:`/plugins/index` if you want to write your own.
+
+
+Scraper sources
+---------------
+
+Usually, a scraper will access a location on the web or a local file on your
+disk. The former may be an online API, whilst the latter may be a `JSON
+<https://www.json.org/json-en.html>`_ file like `gallery-dl
+<https://github.com/mikf/gallery-dl>`_'s ``info.json``.
+
+For local files, two locations are considered. The comic's archive may contain
+this file, or it may be stored as sidecar file alongside the archive in the
+``content/`` directory.
+
+.. _sidecar-files:
+
+Archive & sidecar files
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Sidecar files need to be prefixed with the full name of the archive. For
+example, if a scraper accesses a file named ``info.json`` for an archive
+``Hoshiiro GirlDrop Comic Anthology.zip``, the following locations will be
+considered:
+
++----------+-------------------------------------------------------------+
+| Location | Name |
++==========+=============================================================+
+| Archive | ``info.json`` |
++----------+-------------------------------------------------------------+
+| Sidecar | ``content/Hoshiiro GirlDrop Comic Anthology.zip.info.json`` |
++----------+-------------------------------------------------------------+
+
+.. note::
+
+ If a file exists in both locations, the sidecar file is preferred.
+
+.. _scraper-interface:
+
+Scraper interface
+-----------------
+
+If a comic has scrapers available, they will be shown in the *Scrape* tab.
+Selecting the desired scraper and clicking on the *Scrape* button will start
+the scraping process.
+
+.. image:: /_images/scraper.jpg
+ :align: center
+ :alt: Scraping a comic.
+
+Once the scraper has returned results, they are shown in the pane below. Only
+results that differ from existing comic metadata will be displayed.
+
+Metadata that should not be kept may be deselected. For groups with a larger
+set of entries, the selection may be inverted to quickly deselect the whole
+group, or to only select a few entries. Pressing the *Merge* button will update
+the comic with the selected metadata.
+
+Options
+^^^^^^^
+
+By default, **hircine** does not automatically create missing metadata entries.
+This can be controlled using the *Create missing items* option.
+
+.. note::
+
+ Scrapers always return :term:`qualified tags <qualified tag>` (the namespace
+ is set to ``none`` if it could not be determined). When requested to create
+ a missing qualified tag, the namespace and tag will be created (if needed),
+ and the tag will be marked as applicable to the namespace.
+
+ A qualified tag is considered to be missing if any of the following apply:
+
+ 1. The namespace does not exist.
+ 2. The tag does not exist.
+ 3. The tag is not applicable to the namespace.
+
+
+Modifying scraper results
+-------------------------
+
+**hircine** allows modifying results that are returned by a scraper without
+having to change the scraper logic. Refer to the documentation on
+:doc:`/plugins/index` for more.