Big Structural Overhaul #8

Open
opened 9 months ago by zocker · 0 comments
Owner

These changes are co-dependent. Leaving one out is either impossible or makes the remaining changes, or the omitted one itself, unreasonably more expensive later on. A complete rewrite might be the best solution. Introducing them all at once feels harder, but it is most probably easier in the long term.

Concept

The database needs a major overhaul that introduces the following changes:

  • ExId support (some is written)
    • URIs become ExIds as well
  • Table with extraction state for (exid, extractor) with properties
    • last updated successfully
    • last time tried
    • failed last time (bool)
    • needs explicit retry (e.g. requested by user, aside of normal refreshing)
    • block further retries (e.g. because deleted)
    • error message
  • Unify MediaElement / MediaCollection / Tag into a single table
    • move properties to their own table
      • storing their source ((extractor, exid)?)
      • storing last change date
      • allowing multiple entries if from different sources
    • how to differentiate elements & collections from tags?
      • collections have sources, tags not
      • collections may have an order (see below)
      • elements have playable uris/exids
  • Assignments of elements need
    • source (extractor; manual / confirmed)
      • extractor data can be confirmed by hand
      • manual/confirmed can be disabled (not deleted!) by extractors for later review
    • order for collections
      • seasons will be their own element in the future to avoid duplicated orders
    • tags are like properties, collections are solely defined by a single “number deciding the order of all children of that parent” representation (be it manual/extracted)
  • extractor overhaul
    • allow plugins/extractors DB caching against exid/uris
    • new differentiation, but can be grouped together (for cache):
      • metadata extractor: collects properties & tag assignments
      • collection extractor: builds tree of elements (e.g. show, seasons, episodes)
        • can define objects as fully defined (saying this one extractor is its only source; to allow for object removals & detection of duplicates)
      • watch extractor: decides if & how an element can be watched
    • remove aggregated type as that can be achieved directly with recursion
  • regular DB checks resulting in reports (failures of these indicate a software bug)
    • are caches up to date? (if not, do fix)
    • check if URIs can become proper ExIds (ask metadata extractors & Wikidata)
    • in parent-child element relations
      • different children must not have the same order number (unless it's 0)
      • either all children or no children have order number 0
      • depending on relation type, order must be 0
      • per relation type, detect cycles
      • find nonsensical combinations of relation types
        • consists & part-of (equal & reversed)
        • consists & blocking (equal & reversed)
      • detect blocking cycles (consists with in-order & blocking combined)
      • detect collections declared as fully defined that have children from different sources
    • in sibling element relations
      • avoid same & reversed sibling relations (by id1 < id2)
      • links with likeness 0 (check dropped: they make sense for overwriting likeness)
      • TODO discuss; both elements aren’t tags on relation 1 (otherwise they may be merged)
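
A few of the parent-child checks above can be sketched without any database at all. The following is only an illustration: the `Relation` record, its field names and the report strings are assumptions made for this example, and grouping the order rules per (parent, relation type) is a guess at what is intended.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical parent-child assignment row; field names are assumptions."""
    parent: int
    child: int
    rel_type: str
    order: int = 0  # 0 is treated as "no order"


def check_order_numbers(relations):
    """Report parents whose children break the order-number rules."""
    by_parent = defaultdict(list)
    for rel in relations:
        by_parent[(rel.parent, rel.rel_type)].append(rel.order)
    reports = []
    for (parent, rel_type), orders in sorted(by_parent.items()):
        non_zero = [o for o in orders if o != 0]
        # different children must not share a non-zero order number
        if len(non_zero) != len(set(non_zero)):
            reports.append(f"duplicate order number under parent {parent} ({rel_type})")
        # either all children or no children have order number 0
        if non_zero and len(non_zero) != len(orders):
            reports.append(f"mixed zero and non-zero order numbers under parent {parent} ({rel_type})")
    return reports


def has_cycle(relations, rel_type):
    """Depth-first search for a cycle among relations of one type."""
    graph = defaultdict(list)
    for rel in relations:
        if rel.rel_type == rel_type:
            graph[rel.parent].append(rel.child)
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return False
        if state.get(node) == "visiting":
            return True  # reached a node that is still on the current path
        state[node] = "visiting"
        found = any(visit(child) for child in graph.get(node, ()))
        state[node] = "done"
        return found

    return any(visit(node) for node in list(graph))
```

Each function returns plain data, so a scheduled job could collect the results into a report instead of raising. The recursive search is fine for a sketch, but very deep trees would probably need an iterative variant.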

Steps

  • check how other ORMs work differently from PonyORM
    • no need to move away from Pony now, but keep the option open
    • but if others seem way better, consider migrating away now
    • check if implicit transaction handling is supported by others as well
  • branch off with a second testing setup with a fully clean DB
  • write & use interfaces for everything
    • DB manager protocol which handles creation & saving (adding to sessions)
      • if possible, make it mostly independent from the ORM used
    • translate concepts from here to Protocols
    • migrate extractors to Protocols (make them independent from DB)
    • migrate templates to Protocols
  • find a way for Flask request handling code to be mostly DB independent
  • migrate Flask request handling to Protocols
  • make more steps
  • run database checks with reporting regularly
  • fix bugs which make database checks fail
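
To make the "interfaces for everything" step more concrete, here is a minimal sketch of what a DB manager protocol could look like. All names (`Element`, `DbManager`, `MemoryDbManager`, `render_title`) and method signatures are hypothetical and not taken from the existing code base; the in-memory class merely stands in for an ORM-backed implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, Optional, Protocol


class Element(Protocol):
    """What templates and extractors may rely on; attributes are assumptions."""
    id: int
    title: str


class DbManager(Protocol):
    """Creation & saving, independent of the ORM behind it."""
    def create_element(self, title: str) -> Element: ...
    def get_element(self, element_id: int) -> Optional[Element]: ...
    def save(self) -> None: ...


@dataclass
class MemoryElement:
    id: int
    title: str


@dataclass
class MemoryDbManager:
    """In-memory stand-in, e.g. for a testing setup with a clean DB."""
    _elements: dict = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=lambda: count(1))
    saved: bool = False

    def create_element(self, title: str) -> MemoryElement:
        element = MemoryElement(id=next(self._ids), title=title)
        self._elements[element.id] = element
        return element

    def get_element(self, element_id: int) -> Optional[MemoryElement]:
        return self._elements.get(element_id)

    def save(self) -> None:
        self.saved = True  # an ORM-backed manager would commit here


def render_title(db: DbManager, element_id: int) -> str:
    """Request-handling style code that only sees the protocol."""
    element = db.get_element(element_id)
    return element.title if element else "not found"
```

Because `Protocol` uses structural typing, a Pony-backed manager would not need to inherit from `DbManager`; it only needs matching methods, which is what would keep the way to another ORM open.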

Database Migration

Luckily, I’m the only one known to run this system. These changes also need to be reflected in the database for existing items:

  • URI to ExId migration (to .uri type, everything afterwards should be checked regularly)
  • MediaElement / MediaCollection / Tag to Elements
  • remap tagkey & mediaelement_mediaelement against new ids
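
The remapping step could work roughly as follows, assuming the old tables use independent id sequences, so the same numeric id may appear in `mediaelement`, `mediacollection` and `tag`. The helper names and the tuple-based row format are made up for this sketch; a real migration would read and write through the ORM.

```python
def build_id_map(old_rows):
    """Assign one unified id per (old_table, old_id) pair.

    old_rows is an iterable of (table_name, old_id) tuples.
    """
    id_map = {}
    for new_id, key in enumerate(sorted(set(old_rows)), start=1):
        id_map[key] = new_id
    return id_map


def remap_links(links, id_map, table):
    """Rewrite (id1, id2) pairs of one old table against the unified ids."""
    return [(id_map[(table, a)], id_map[(table, b)]) for a, b in links]


# Example rows: old ids collide across tables, unified ids do not.
old_rows = [
    ("mediaelement", 1),
    ("mediaelement", 2),
    ("mediacollection", 1),
    ("tag", 1),
]
id_map = build_id_map(old_rows)
element_links = remap_links([(1, 2)], id_map, "mediaelement")
```

Keeping the map keyed by `(table, old_id)` means it can be reused to relink `tagkey` and `mediaelement_mediaelement` after the unified table has been filled.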

Rationale for all existing tables

| Table | Migration? | Reasoning |
| --- | --- | --- |
| collectionurimapping | yes | URIs to ExIds |
| element_lookup_cache | no | needs overhaul afterwards |
| mediacollection | yes | property unification & element unification |
| mediacollectionlink | yes | to assignments |
| mediacollection_tag | yes | to assignments |
| mediaelement | yes | to assignments |
| mediaelement_mediaelement | TODO | to assignments |
| mediaelement_tag | yes | to assignments |
| mediathumbnail | no | gets only linked by unified properties |
| mediathumbnailcache | no | see mediathumbnail |
| mediaurimapping | yes | to assignments |
| tag | yes | property unification & element unification |
| tagkey | partly | only needs relinking against unified tags |
| tag_tag | yes | to assignments |
zocker self-assigned this 9 months ago
zocker added this to the Releasable Alpha State milestone 9 months ago
zocker changed title from Big Database Overhaul to Big Structural Overhaul 8 months ago
Reference: zocker/streamlined#8