Scraping
========

**hircine** comes with a generic scraper interface that allows scraping comic
metadata from virtually any source. A number of scrapers for common file
formats and websites are :ref:`included ` in the base installation. Refer to
:doc:`/plugins/index` if you want to write your own.

Scraper sources
---------------

Usually, a scraper will access a location on the web or a local file on your
disk. The former may be an online API, whilst the latter may be a `JSON `_ file
like `gallery-dl `_'s ``info.json``.

For local files, two locations are considered. The comic's archive may contain
this file, or it may be stored as a sidecar file alongside the archive in the
``content/`` directory.

.. _sidecar-files:

Archive & sidecar files
^^^^^^^^^^^^^^^^^^^^^^^

Sidecar files need to be prefixed with the full name of the archive. For
example, if a scraper accesses a file named ``info.json`` for an archive
``Hoshiiro GirlDrop Comic Anthology.zip``, the following locations will be
considered:

+----------+-------------------------------------------------------------+
| Location | Name                                                        |
+==========+=============================================================+
| Archive  | ``info.json``                                               |
+----------+-------------------------------------------------------------+
| Sidecar  | ``content/Hoshiiro GirlDrop Comic Anthology.zip.info.json`` |
+----------+-------------------------------------------------------------+

.. note::

   If a file exists in both locations, the sidecar file is preferred.

.. _scraper-interface:

Scraper interface
-----------------

If a comic has scrapers available, they will be shown in the *Scrape* tab.
Selecting the desired scraper and clicking on the *Scrape* button will start
the scraping process.

.. image:: /_images/scraper.jpg
   :align: center
   :alt: Scraping a comic.

Once the scraper has returned results, they are shown in the pane below. Only
results that differ from existing comic metadata will be displayed.
Metadata that should not be kept may be deselected. For groups with a larger
set of entries, the selection may be inverted to quickly deselect the whole
group, or to only select a few entries. Pressing the *Merge* button will
update the comic with the selected metadata.

Options
^^^^^^^

By default, **hircine** does not automatically create missing metadata entries.
This can be controlled using the *Create missing items* option.

.. note::

   Scrapers always return :term:`qualified tags ` (the namespace is set to
   ``none`` if it could not be determined). When requested to create a missing
   qualified tag, the namespace and tag will be created (if needed), and the
   tag will be marked as applicable to the namespace.

   A qualified tag is considered to be missing if any of the following apply:

   1. The namespace does not exist.
   2. The tag does not exist.
   3. The tag is not applicable to the namespace.

Modifying scraper results
-------------------------

**hircine** allows modifying results that are returned by a scraper without
having to change the scraper logic. Refer to the documentation on
:doc:`/plugins/index` for more.
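
Returning to the *Create missing items* option: the three conditions under
which a qualified tag counts as missing can be illustrated with a small Python
sketch. This is not **hircine**'s actual implementation. The ``Library`` class
and the ``is_missing`` and ``create_missing`` functions are hypothetical names
introduced only for this example, and the in-memory sets merely stand in for
whatever storage is really used.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical stand-in for the metadata that already exists."""

    namespaces: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)
    # (namespace, tag) pairs where the tag is applicable to the namespace
    applicable: set[tuple[str, str]] = field(default_factory=set)


def is_missing(library: Library, namespace: str, tag: str) -> bool:
    """Return True if any of the three "missing" conditions applies."""
    return (
        namespace not in library.namespaces  # 1. namespace does not exist
        or tag not in library.tags  # 2. tag does not exist
        or (namespace, tag) not in library.applicable  # 3. not applicable
    )


def create_missing(library: Library, namespace: str, tag: str) -> None:
    """Create namespace and tag if needed, then mark the tag as applicable."""
    library.namespaces.add(namespace)
    library.tags.add(tag)
    library.applicable.add((namespace, tag))


# Namespace and tag both exist, but the tag is not yet applicable to it.
library = Library(namespaces={"none"}, tags={"anthology"})
needs_creation = is_missing(library, "none", "anthology")
create_missing(library, "none", "anthology")
still_missing = is_missing(library, "none", "anthology")
```

In this sketch the tag ``anthology`` in the namespace ``none`` is reported as
missing even though both already exist, because the third condition applies.
After creation, none of the conditions hold any longer.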