diff options
Diffstat (limited to 'docs/usage/scraping.rst')
-rw-r--r-- | docs/usage/scraping.rst | 90 |
1 files changed, 90 insertions, 0 deletions
diff --git a/docs/usage/scraping.rst b/docs/usage/scraping.rst new file mode 100644 index 0000000..37bae98 --- /dev/null +++ b/docs/usage/scraping.rst @@ -0,0 +1,90 @@ +Scraping +======== + +**hircine** comes with a generic scraper interface that allows scraping comic +metadata from virtually any source. A number of scrapers for common file +formats and websites are :ref:`included <builtin-scrapers>` in the base +installation. Refer to :doc:`/plugins/index` if you want to write your own. + + +Scraper sources +--------------- + +Usually, a scraper will access a location on the web or a local file on your +disk. The former may be an online API, whilst the latter may be a `JSON +<https://www.json.org/json-en.html>`_ file like `gallery-dl +<https://github.com/mikf/gallery-dl>`_'s ``info.json``. + +For local files, two locations are considered. The comic's archive may contain +this file, or it may be stored as sidecar file alongside the archive in the +``content/`` directory. + +.. _sidecar-files: + +Archive & sidecar files +^^^^^^^^^^^^^^^^^^^^^^^ + +Sidecar files need to be prefixed with the full name of the archive. For +example, if a scraper accesses a file named ``info.json`` for an archive +``Hoshiiro GirlDrop Comic Anthology.zip``, the following locations will be +considered: + ++----------+-------------------------------------------------------------+ +| Location | Name | ++==========+=============================================================+ +| Archive | ``info.json`` | ++----------+-------------------------------------------------------------+ +| Sidecar | ``content/Hoshiiro GirlDrop Comic Anthology.zip.info.json`` | ++----------+-------------------------------------------------------------+ + +.. note:: + + If a file exists in both locations, the sidecar file is preferred. + +.. _scraper-interface: + +Scraper interface +----------------- + +If a comic has scrapers available, they will be shown in the *Scrape* tab. +Selecting the desired scraper and clicking on the *Scrape* button will start +the scraping process. + +.. image:: /_images/scraper.jpg + :align: center + :alt: Scraping a comic. + +Once the scraper has returned results, they are shown in the pane below. Only +results that differ from existing comic metadata will be displayed. + +Metadata that should not be kept may be deselected. For groups with a larger +set of entries, the selection may be inverted to quickly deselect the whole +group, or to only select a few entries. Pressing the *Merge* button will update +the comic with the selected metadata. + +Options +^^^^^^^ + +By default, **hircine** does not automatically create missing metadata entries. +This can be controlled using the *Create missing items* option. + +.. note:: + + Scrapers always return :term:`qualified tags <qualified tag>` (the namespace + is set to ``none`` if it could not be determined). When requested to create + a missing qualified tag, the namespace and tag will be created (if needed), + and the tag will be marked as applicable to the namespace. + + A qualified tag is considered to be missing if any of the following apply: + + 1. The namespace does not exist. + 2. The tag does not exist. + 3. The tag is not applicable to the namespace. + + +Modifying scraper results +------------------------- + +**hircine** allows modifying results that are returned by a scraper without +having to change the scraper logic. Refer to the documentation on +:doc:`/plugins/index` for more. |