From d1d654ebac2d51e3841675faeb56480e440f622f Mon Sep 17 00:00:00 2001
From: Wolfgang Müller
Date: Tue, 5 Mar 2024 18:08:09 +0100
Subject: Initial commit

---
 docs/usage/scraping.rst | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 docs/usage/scraping.rst

(limited to 'docs/usage/scraping.rst')
diff --git a/docs/usage/scraping.rst b/docs/usage/scraping.rst
new file mode 100644
index 0000000..37bae98
--- /dev/null
+++ b/docs/usage/scraping.rst
@@ -0,0 +1,90 @@
+Scraping
+========
+
+**hircine** comes with a generic scraper interface that allows scraping comic
+metadata from virtually any source. A number of scrapers for common file
+formats and websites are :ref:`included <builtin-scrapers>` in the base
+installation. Refer to :doc:`/plugins/index` if you want to write your own.
+
+
+Scraper sources
+---------------
+
+Usually, a scraper will access a location on the web or a local file on your
+disk. The former may be an online API, whilst the latter may be a `JSON
+<https://www.json.org/json-en.html>`_ file like `gallery-dl
+<https://github.com/mikf/gallery-dl>`_'s ``info.json``.
+
+For local files, two locations are considered. The comic's archive may contain
+the file, or it may be stored as a sidecar file alongside the archive in the
+``content/`` directory.
+
+.. _sidecar-files:
+
+Archive & sidecar files
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Sidecar files need to be prefixed with the full name of the archive, including
+its extension. For example, if a scraper accesses a file named ``info.json``
+for an archive ``Hoshiiro GirlDrop Comic Anthology.zip``, the following
+locations will be considered:
+
++----------+-------------------------------------------------------------+
+| Location | Name                                                        |
++==========+=============================================================+
+| Archive  | ``info.json``                                               |
++----------+-------------------------------------------------------------+
+| Sidecar  | ``content/Hoshiiro GirlDrop Comic Anthology.zip.info.json`` |
++----------+-------------------------------------------------------------+
+
+.. note::
+
+   If a file exists in both locations, the sidecar file is preferred.
+
+.. _scraper-interface:
+
+Scraper interface
+-----------------
+
+If a comic has scrapers available, they will be shown in the *Scrape* tab.
+Selecting the desired scraper and clicking on the *Scrape* button will start
+the scraping process.
+
+.. image:: /_images/scraper.jpg
+   :align: center
+   :alt: Scraping a comic.
+
+Once the scraper has returned results, they are shown in the pane below. Only
+results that differ from existing comic metadata will be displayed.
+
+Metadata that should not be kept may be deselected. For groups with a larger
+set of entries, the selection may be inverted to quickly deselect the whole
+group, or to only select a few entries. Pressing the *Merge* button will update
+the comic with the selected metadata.
+
+Options
+^^^^^^^
+
+By default, **hircine** does not automatically create missing metadata entries.
+This can be controlled using the *Create missing items* option.
+
+.. note::
+
+   Scrapers always return :term:`qualified tags <qualified tag>` (the namespace
+   is set to ``none`` if it could not be determined). When requested to create
+   a missing qualified tag, the namespace and tag will be created (if needed),
+   and the tag will be marked as applicable to the namespace.
+
+   A qualified tag is considered to be missing if any of the following apply:
+
+   1. The namespace does not exist.
+   2. The tag does not exist.
+   3. The tag is not applicable to the namespace.
+
+
+Modifying scraper results
+-------------------------
+
+**hircine** allows modifying results that are returned by a scraper without
+having to change the scraper logic. Refer to the documentation on
+:doc:`/plugins/index` for details.
-- 
cgit v1.2.3-2-gb3c3