paper{s}pace documentation
Features
Start page
Document view
Works with your existing files
paper{s}pace works with your existing file structure. You only need to point it to your documents folder and paper{s}pace will do its job. It doesn’t matter in which subfolder the file exists; as long as it is supported, it will be indexed and searchable within a couple of seconds.
Full-text search for your files
After paper{s}pace has found your files, it will recognize the available text, and you can search for the content through its full-text search. Combine this with your tags, and you will never miss an important document again.
Tagging your PDFs
Adding tags to your files is as easy as entering a name. paper{s}pace will manage all tags on its own. No need for complicated tag management.
If you wish, you can also add search patterns to your tags, and paper{s}pace will automatically apply these tags to matching documents.
Ingest from your email provider
paper{s}pace can watch your mail inboxes for incoming documents and automatically ingest their attachments together with information from the email.
If the email contains a PDF and passes all filters, the attachment along with the email content is stored in paper{s}pace.
Reminds you of upcoming tasks
paper{s}pace supports a special type of document called a task. For each task, it reminds you by email one day in advance that you have to take action.
Simple edit functionality
Currently only supported for PDFs
A simple way to re-sort, rotate, or delete pages.
Supported file formats
- PDF
- Images (jpg/png)
Installation
There are two ways of installing or using paper{s}pace. You can install it on a server by hand or use Docker for an easy installation. Only Docker is officially supported.
Docker
The easiest installation method is checking out or downloading one of the samples in examples and starting the application with docker-compose.
There are two files:
- complete-with-ftp.yml
Includes the whole application and an additional FTP server, which you can point your document scanner to and then upload documents directly into paper{s}pace
- minimal-docker-compose.yml
Includes only the application
First decide which one you want. Copy it to your hard drive and rename it to docker-compose.yml. Then configure the properties you need and want in the environment block.
APPLICATION_HOST: 'REPLACE WITH HOSTNAME OR IP' #what is the domain for paper{s}pace. Defaults to http://localhost:8080
OCR_LANGUAGE: '<IN WHICH LANGUAGE ARE YOUR DOCUMENTS FOR EXAMPLE deu? [string]>'
ENABLE_MAIL: '<SHOULD WE SEND MAILS? [true|false]>'
MAIL_TO_ADDRESS: '<WHO SHOULD RECEIVE MAILS [string]>'
MAIL_FROM_ADDRESS: '<THE SENDER OF THE MAILS [string]>'
MAIL_ATTACH_DOCUMENTS: '<SHOULD WE ATTACH THE DOCUMENTS TO THE MAIL? [true|false]>'
MAILING_HOST: '<YOUR MAIL HOST THE APPLICATION SHOULD CONNECT TO. [string]>'
MAILING_PORT: '<THE PORT FOR YOUR MAIL HOST [integer]>'
MAILING_PROTOCOL: '<WHICH MAIL PROTOCOL SHOULD WE USE [smtp|pop3]>'
MAILING_SMTP_AUTH: '<SHOULD WE AUTHENTICATE? [true|false]>'
MAILING_SMTP_USE_STARTTLS: '<SHOULD WE USE STARTTLS? [true|false]>'
MAILING_USERNAME: '<WHICH USER SHOULD CONNECT TO YOUR MAIL PROVIDER? [string]>'
MAILING_PASSWORD: '<THE PASSWORD OF THE MAIL USER [string]>'
CHECK_MAIL: 'SHOULD WE CHECK INCOMING EMAILS [true|false] default: false'
MAIL_IMAP_HOST: 'REPLACE WITH IMAP SERVER [string]'
MAIL_IMAP_PORT: 'REPLACE WITH IMAP SERVER PORT [int] default: 993'
MAIL_IMAP_USERNAME: 'REPLACE WITH IMAP USERNAME [string]'
MAIL_IMAP_PASSWORD: 'REPLACE WITH IMAP PASSWORD [string]'
MAIL_IMAP_FOLDERS: 'REPLACE WITH IMAP FOLDERS TO WATCH [string] default: INBOX'
MAIL_IMAP_MARK_AS_READ: 'AFTER PROCESSING, SHOULD WE MARK THE MAIL AS READ [true|false] default: false'
If you don’t want paper{s}pace to send out emails at all, remove every key that starts with MAIL_ or MAILING_, and either set ENABLE_MAIL: 'false' or remove that key as well.
If you have chosen complete-with-ftp.yml, you also have to set the environment variable PASV_ADDRESS to the IP address or hostname where the FTP server is reachable, for example PASV_ADDRESS: 192.168.1.111. You should probably also change the password in VSFTPD_USER_1: 'scanner:password:9876:'. If you change the username of the FTP user, please also change the mount point under volumes. For a complete list of configuration options, please go to wildscamp/vsftpd.
After you have made the changes to the configuration, you can start paper{s}pace with Docker Compose. Open a terminal, navigate to the folder where your docker-compose.yml resides, and execute:
docker-compose up -d
This will start paper{s}pace locally, and you can open it by navigating your browser to http://localhost:8080.
Using existing documents
In the sample configurations, we work with a named volume. If you already have a folder with your documents you want to use, you have to mount this into the container. paper{s}pace expects the following folders to be writable for the user with the UID 9876.
/storage/tasks #default location for task documents
/storage/documents #default location for documents
/storage/binary #storage location for preview images of documents
/storage/database #location of the database file
To work with your existing folders, you have to change the volumes section in the docker-compose.yml under the service api, and if you have chosen the compose file with the FTP server, you also have to change the mount point in the ftp service.
Let’s assume your current documents reside under /home/paperspace/documents/ and your tasks will be stored under /home/paperspace/tasks/. Then you have to change the volumes section in your docker-compose.yml like this:
version: "3.4"
services:
  api:
    ...
    volumes:
      - paperspace:/storage
      - /home/paperspace/tasks/:/storage/tasks
      - /home/paperspace/documents:/storage/documents
    ...
  ftp:
    ...
    volumes:
      - /home/paperspace:/home/virtual/scanner/data
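Before mounting existing folders, it can help to check that they exist and belong to the UID the container runs as. The following is a minimal sketch and not part of paper{s}pace: the function name check_storage and its parameters are made up for illustration, and the check is only a heuristic (owner UID plus owner write bit) that ignores group permissions and ACLs.

```python
import os
import stat

# Sub-folders of /storage listed in this section.
EXPECTED = ("tasks", "documents", "binary", "database")


def check_storage(root, names=EXPECTED, uid=9876):
    """Return {folder_name: problem} for folders that look unusable.

    Hypothetical helper. A folder passes if it exists, is owned by
    `uid`, and has the owner write bit set. Group/other permissions
    and ACLs are deliberately ignored.
    """
    problems = {}
    for name in names:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems[name] = "missing"
            continue
        info = os.stat(path)
        if info.st_uid != uid:
            problems[name] = "owned by uid %d" % info.st_uid
        elif not info.st_mode & stat.S_IWUSR:
            problems[name] = "not writable by owner"
    return problems
```

Calling it on the host directory you plan to mount, with names limited to the folders you actually mount (for example tasks and documents), returns an empty dict when nothing looks wrong.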
Bare Metal Installation from Source
Install required software
This tutorial assumes a system based on Ubuntu 20.04 LTS. If you are running a different distribution, please adapt the commands.
All following commands assume that your documents are in German. If you have a different language, please change deu to your language code. For example, for Spanish, instead of installing tesseract-ocr-deu you would install tesseract-ocr-spa.
sudo apt-get install tesseract-ocr tesseract-ocr-deu openjdk-11-jdk-headless git npm python3-pip && sudo pip3 install stapler
Fetch the latest source code
git clone https://gitlab.com/dedicatedcode/paperspace.git
Install Solr
cd /opt
sudo wget https://archive.apache.org/dist/lucene/solr/8.3.1/solr-8.3.1.tgz
sudo tar xzf solr-8.3.1.tgz solr-8.3.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-8.3.1.tgz
Create the Solr data directory referenced in search/config/conf/core.properties. You can change this to your preferred location.
sudo mkdir -p /data/solr/
sudo chown solr:solr /data/solr
Copy the Solr configuration:
sudo cp -r paperspace/search/config/conf /var/solr/data/core_documents
Restart Solr:
sudo systemctl restart solr.service
Build and install paper{s}pace
Install app
- Build the app
cd paperspace/api && ./gradlew build
- Create user, folder, and copy the app
sudo useradd -M -s /bin/false paperspace
sudo mkdir /opt/paperspace-app
sudo cp paperspace/api/build/libs/api.jar /opt/paperspace-app/app.jar
sudo chown -R paperspace:paperspace /opt/paperspace-app
- Create application.properties with this content
database.location=database/paperspace.db
search.host=localhost
search.port=8983
ocr.language=deu
app.host=http://localhost:8080
ocr.datapath=/usr/share/tesseract-ocr/4.00/tessdata
storage.folder.tasks=/storage/tasks
storage.folder.documents=/storage/documents
storage.folder.binaries=binary
# REMOVE IF YOU DON'T WANT TO USE MAILS ###
email.enabled=true
email.target-address=
email.sender-address=
email.attach_documents=false
spring.mail.host=
spring.mail.port=
spring.mail.protocol=smtp
spring.mail.test-connection=false
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
spring.mail.username=
spring.mail.password=
# REMOVE IF YOU DON'T WANT TO CHECK MAILS FOR PDFS ###
email.imap.enabled=true
email.imap.host=
email.imap.port=
email.imap.username=
email.imap.password=
email.imap.folders=INBOX
email.imap.mark_as_read=false
Adjust the ocr.language property to your language code and the app.host property to the address where the application is reachable. Replace /storage/tasks and /storage/documents with the paths to your documents. If you want paper{s}pace to email you about upcoming tasks or new documents, fill in the properties starting with email and spring.mail. If you don’t need emailing, simply delete the whole block from application.properties. After this, the /opt folder should look like this:
/opt/paperspace-app/
├── app.jar
└── application.properties
- Create service files
sudo tee <<EOF /etc/systemd/system/paperspace-app.service >/dev/null
[Unit]
Description=paperspace-app
After=syslog.target
[Service]
User=paperspace
WorkingDirectory=/opt/paperspace-app
ExecStart=java -jar /opt/paperspace-app/app.jar
SuccessExitStatus=143
[Install]
WantedBy=multi-user.target
EOF
Reload systemd configurations:
sudo systemctl daemon-reload
- Start services
sudo systemctl enable paperspace-app
sudo systemctl start paperspace-app
If everything is set up, the app should now be accessible at http://localhost:8080 and should greet you with an empty result. You can start dropping PDFs into the documents or tasks folder now.
Update
If you are using the bare-metal installation, please make sure that you are always using the latest Solr schema files. Otherwise, some features will not work or may break.
Copy the Solr configuration:
sudo cp -r paperspace/search/config/conf /var/solr/data/core_documents
Restart Solr:
sudo systemctl restart solr.service
Usage
Most of the functionality paper{s}pace provides should be self-explanatory. Below are explanations for a few features.
Search
The search field in the top-right corner is your main entry point for finding your documents. You can search with a simple string like invoice, and this will return all documents that contain the word invoice in either the document text, the description, or the title. If you tag your documents, you can narrow the results down by clicking one or multiple tags in the left area.
You can prefix a word with + or - to require or exclude it. For example, searching for invoice +2020 returns every document that contains the words invoice and 2020, while searching for invoice -2020 returns every document that contains the word invoice but not 2020.
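The effect of the two prefixes can be illustrated with a small toy model. This is only a sketch of the behaviour described above, not how paper{s}pace implements search: the real search runs against its search index and may tokenize text differently. The function name matches is hypothetical, and the sketch assumes that unprefixed terms are required as well, as in the invoice +2020 example.

```python
def matches(query, text):
    """Toy model of the + and - search prefixes (hypothetical helper).

    Assumptions: plain and '+' terms must all occur in the text,
    '-' terms must not occur, and matching is done on lower-cased,
    whitespace-separated words.
    """
    words = set(text.lower().split())
    for term in query.lower().split():
        excluded = term.startswith("-")
        word = term.lstrip("+-")
        if not word:
            continue  # ignore a stray '+' or '-'
        # Fail if an excluded word is present or a required word is absent.
        if (word in words) == excluded:
            return False
    return True
```

With this model, matches("invoice -2020", "Invoice for March 2020") is rejected because the excluded word is present, while the same query accepts a text that mentions only invoice and 2019.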
Tag management
On the management > tags page, you can add search patterns to your tags. If an incoming document contains a string you entered, the tag is automatically attached to that document.
You can add multiple patterns to a tag separated by ,. If you want to negate a pattern, add a ! before it. For example:
- 2024,2025 will match any documents that contain the words 2024 and 2025.
- Daniel,!Daniela will match any documents that contain the word Daniel but must not contain the word Daniela.
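The two examples above can be modelled in a few lines. This is a sketch under stated assumptions rather than paper{s}pace's actual matching code: the function name tag_applies is hypothetical, patterns are treated as case-sensitive substrings, and every non-negated pattern is assumed to be required, which is how the examples read.

```python
def tag_applies(patterns, text):
    """Decide whether a tag's pattern list matches a document text.

    Hypothetical helper. Assumed semantics: every plain pattern must
    occur in the text as a substring, and no '!'-prefixed pattern may
    occur. Matching is case-sensitive.
    """
    for raw in patterns.split(","):
        pattern = raw.strip()
        negated = pattern.startswith("!")
        needle = pattern.lstrip("!")
        if not needle:
            continue  # skip empty entries such as a trailing comma
        # Fail if a negated pattern is found or a required one is missing.
        if (needle in text) == negated:
            return False
    return True
```

Because the sketch matches substrings, a text containing Daniela also contains Daniel, which is why the negated pattern is needed to keep such documents out.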
Email management
On the management > email filters page, you can adjust how incoming emails are processed.
Subject filter (whitelist)
Here you can add regular expressions that are matched against the subject of the email. If any are set, at least one has to match for the email to be considered for further processing.
Attachment filter (blacklist)
If the email passed the subject filter and has attachments, it will be handled by paper{s}pace; however, sometimes there is not only the invoice attached to the email but also other PDFs.
With this filter, you can filter out unwanted types of attachments.
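The interplay of the two filters can be sketched as follows. This is an illustration only: the function names are hypothetical, the attachment filter is assumed to be a regex applied to the attachment's file name, and the sketch uses a partial match (re.search), whereas paper{s}pace may match differently.

```python
import re


def accept_email(subject, subject_filters):
    """Whitelist check (hypothetical helper).

    With no filters configured every email passes; otherwise at
    least one regex has to match the subject.
    """
    if not subject_filters:
        return True
    return any(re.search(p, subject) for p in subject_filters)


def keep_attachments(filenames, attachment_filters):
    """Blacklist check (hypothetical helper).

    Drops every attachment whose file name matches any of the
    configured regexes and returns the remaining names in order.
    """
    return [
        name
        for name in filenames
        if not any(re.search(p, name) for p in attachment_filters)
    ]
```

For instance, with a subject filter of invoice and an attachment filter of ^terms, an email titled "Your invoice 4711" passes, and of its attachments invoice.pdf and terms-and-conditions.pdf only the former is kept.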
FAQ
Why don’t I see the edit button on my documents?
Please make sure that Stapler is available for the user who runs your installation. At the moment, only PDF editing is enabled.