- It uses the current directory to determine the location of your documents. If executed from another directory, it might access other files and store scans elsewhere.
- label the binder with the range of IDs it will contain (starting at 1 for the first binder & section)
- (optionally) each binder should have separators which you can label with the IDs they (will) contain
2. Create a directory on your computer for your scanned PDFs and copy the `./maintain.py` script into it
- make it executable with `chmod +x ./maintain.py`
- you can create the subdirectories for your categories up front (they can be nested as well), or create them later as needed
3. Customize the configuration to your liking
- for now, the configuration is stored inside the script at the beginning
- especially configure the scan sources of your scanner (`USE_ADF_BY_DEFAULT` / `ADF_SCAN_SOURCE` / `FLATBET_SCAN_SOURCE`)
4. Migrate your old documents into the system by applying the steps below per document / page
- the order is irrelevant, but I recommend chronological order because future documents will be inserted in the same order
#### Example
- I decided to hold up to 300 pages (600 sides or IDs) per binder
- Each binder is divided by separators into 6 sections of 50 pages each (100 sides or IDs)
- If a document spans multiple sections or binders and therefore gets split up, I still file every page by its ID because I honor my system more than the "integrity" of the document
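
The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration and not part of `maintain.py`: the helper `locate`, the `Location` tuple and the two constants are hypothetical names, and the sketch assumes that IDs start at 1 and that the odd ID of a pair is the front and the even ID the back of a page.

```python
from typing import NamedTuple

# Values taken from the example above; adjust them to your own binders.
PAGES_PER_SECTION = 50
SECTIONS_PER_BINDER = 6


class Location(NamedTuple):
    binder: int   # 1-based binder number
    section: int  # 1-based section within the binder
    page: int     # 1-based page (sheet) number across all binders
    side: str     # "front" or "back"


def locate(doc_id: int) -> Location:
    """Map a side ID to binder, section, page and side (hypothetical helper)."""
    if doc_id < 1:
        raise ValueError("IDs start at 1")
    page = (doc_id + 1) // 2  # IDs 1 and 2 belong to page 1, and so on
    side = "front" if doc_id % 2 else "back"
    section_index = (page - 1) // PAGES_PER_SECTION  # 0-based, across binders
    binder = section_index // SECTIONS_PER_BINDER + 1
    section = section_index % SECTIONS_PER_BINDER + 1
    return Location(binder, section, page, side)


print(locate(600))  # last side of the first binder
print(locate(601))  # first side of the second binder
```

With these numbers, ID 600 is the back of page 300 in binder 1, section 6, and ID 601 is the front of page 301 in binder 2, section 1.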
### How To Add New Documents
At the beginning, I recommend following the steps one by one for each page so the order does not get mixed up.
Once you feel confident, you can start to batch the tasks when you insert multiple pages at once.
*I separate each document into its pages so it can be scanned automatically if possible; I even remove staples if required.*
Per page:
1. Scan both sides with `./maintain.py scan`
- either scan both sides or rerun `./maintain.py scan` after each page; otherwise `scanimage` will mix up the IDs because it is not aware that each ID pair must match the front and back of one page
- with `--adf` you can force the use of the [ADF][wiki-adf], and it will continue to scan all available pages
- with `--flatbed` you can force the use of the flatbed (e.g. for "special" documents)
- by default, it will automatically apply OCR and convert the documents to PDFs
- to speed up the conversion of multiple pages with `parallel`, add `--skip-convert` and execute `./maintain.py convert --output-commands | parallel` after scanning
- after scanning, you may remove empty back pages; the script will still select the next ID correctly (see `./maintain.py next-id`)
2. Add ring holes using a hole punch if required
- OR insert the document into a plastic sleeve with ring holes
3. Insert the document at the end of the latest binder
- Check the assigned IDs to see whether the document belongs behind the next separator or in the next binder
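
Step 1 notes that the next ID stays correct even after empty back pages were removed. How `./maintain.py next-id` does this internally is not described here. The following sketch shows one plausible rule, assumed purely for illustration: take the highest existing ID, round it up to the back side of its page, and continue with the next front side. The function `next_front_id` is a hypothetical name, and the sketch assumes odd IDs are fronts and even IDs are backs.

```python
from typing import Iterable


def next_front_id(existing_ids: Iterable[int]) -> int:
    """Return the front-side ID for the next scanned page (assumed rule)."""
    highest = max(existing_ids, default=0)
    # Round up to the back-side (even) ID of the last page, even if that
    # back side was deleted, then step to the following front side.
    last_back = highest + (highest % 2)
    return last_back + 1


# The empty back side 4 was deleted, so the highest remaining ID is 3.
print(next_front_id([1, 2, 3]))  # prints 5
```

Under this rule a deleted back page leaves a gap in the IDs instead of shifting later pages, so the front and back pairs stay aligned.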
### How To Sort & Combine
By default, the documents are named `outXXXX.jpg` or `outXXXX.png`.
If you want to add date & title to your document or sort it into a category,
you can use `./maintain.py merge --id <IDs>`.
`<IDs>` can be a comma-separated list of entries, each of which can be
- a single ID, e.g. `123`
- a single ID with its counterpart ID (the other side), suffix `+`, e.g. `453+,88+` == `453,454,87,88`
- a single ID with its counterpart and both sides of the following page, suffix `++`, e.g. `869++` == `869+,871+` == `869,870,871,872`
- an ID range, start and end separated by `-`, e.g. `123-128` == `123,124,125,126,127,128`
- a suffix of `#` (compatible with `+`, `++` and ranges) also selects all "context pages", by default ±10 pages, e.g. `100#` == `90-110` (not useful for merge but for other commands)
The order of the IDs determines the order of the pages in the resulting document. For merge, however:
- single IDs are completed to both sides so that front and back end up in the same PDF
- missing IDs are ignored (so missing back pages do not cause an error)
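
To make the ID syntax concrete, here is a minimal sketch of how such a list could be expanded. It is not the parser used by `maintain.py`: the names `expand_ids`, `expand_entry` and `pair` are hypothetical, and it assumes that `#` is written after `+` or `++`, that `+` applied to a range pairs every ID in it, and that context is counted in IDs as in the `100#` example. The merge-specific behavior (completing single IDs, ignoring missing ones) is not modeled.

```python
import re
from typing import List

CONTEXT = 10  # "by default ±10", as in the `100#` example

# One entry: start ID, optional "-end", optional "+" or "++", optional "#".
ENTRY = re.compile(r"^(\d+)(?:-(\d+))?(\+{1,2})?(#)?$")


def pair(doc_id: int) -> List[int]:
    """Return both sides of the page that doc_id belongs to, front first."""
    front = doc_id if doc_id % 2 else doc_id - 1
    return [front, front + 1]


def expand_entry(entry: str) -> List[int]:
    """Expand one comma-separated entry into a list of IDs."""
    match = ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"cannot parse ID entry: {entry!r}")
    start, end, plus, context = match.groups()
    first, last = int(start), int(end or start)
    if last < first:
        raise ValueError(f"range end before start: {entry!r}")
    ids = list(range(first, last + 1))
    if plus:
        expanded: List[int] = []
        for doc_id in ids:
            expanded += pair(doc_id)
            if plus == "++":
                expanded += pair(doc_id + 2)  # both sides of the next page
        ids = list(dict.fromkeys(expanded))  # drop duplicates, keep order
    if context:
        ids = list(range(max(1, min(ids) - CONTEXT), max(ids) + CONTEXT + 1))
    return ids


def expand_ids(spec: str) -> List[int]:
    """Expand a full comma-separated ID list, keeping the given order."""
    result: List[int] = []
    for entry in spec.split(","):
        result += expand_entry(entry)
    return result


print(expand_ids("453+,88+"))  # [453, 454, 87, 88]
print(expand_ids("869++"))     # [869, 870, 871, 872]
```

The entries are expanded left to right, so the result preserves the order in which the IDs were given.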
You can append `--view` to see the resulting document and verify it.
If you abort the process before answering the last question, e.g. by using `CTRL+C`, nothing will be changed.
First, it asks for the date of the document.
By default, the current date will be proposed.
Using the arrow keys, you can select any of the dates found inside the document.
Second, it asks for the title of the document.
To assist you, the most frequently used words of each page are displayed above.
Because each side has its own ID and each document its own date, the titles are not required to be unique.
I even recommend using the same title for documents of the same kind.
Finally, it asks where to put the document.
If one of the documents was already sorted into a category, that category will be proposed.
You can browse through all categories using the arrow keys and search through them using `CTRL+R`.
Because the script keeps no database, you can rename the files manually as well.
## My Use Case
I am kind of a perfectionist and a lazy person,
which resulted in me throwing every paper document I received into a single ring binder.
I never came up with a "perfect" list of categories and a way to distribute them across different binders to
- minimize the space (binders) required to hold all (important) documents
- allow each category to hold all documents I might receive in the future
- be able to find a required document quickly
I also want all my documents to be accessible on all my devices.
This is easy to accomplish with already digital documents,
however our world requires real paper documents, especially in Germany.
So I wanted to scan every document to be able to store them on my personal cloud.
However, the documents there would also need to be sorted appropriately.
At least the digital world has the advantage that re-sorting documents into new categories scales a lot better and might also be automatable.
Still, this would require me to keep both worlds, the analog and the digital, sorted, which means more work.
To solve both problems in an easy way,
I introduced a system to "store" all my paper documents in binders so I only need to sort them in the digital world.
If I ever need the original paper document, I can search for it on my computer and look up where it is stored.
## Other Projects
- see [Awesome-Selfhosted][awesome-selfhosted]
<!-- References (sorted alphabetically) -->
[awesome-selfhosted]: https://github.com/awesome-selfhosted/awesome-selfhosted#document-management= "Document Management on Awesome-Selfhosted"
[ocrmypdf-github]: https://github.com/jbarlow83/OCRmyPDF "OCRmyPDF on GitHub"