paper{s}pace

This is the documentation for version 2.+ of paper{s}pace. Please find the current version under latest documentation.

Documentation

Features

main view of paper{s}pace

works with your existing files

paper{s}pace works with your existing file structure. You only need to point it to your documents folder and paper{s}pace will do its job. It doesn't matter in which subfolder a file lives: as long as its format is supported, it will be indexed and searchable within a couple of seconds.

full-text searching your files

After paper{s}pace has found your files, it recognizes the available text, and you can search the content through its full-text search. Combine this with your tags and you will never miss an important document again.

tagging your PDFs

Adding tags to your files is as easy as entering a name. paper{s}pace manages all tags on its own, so there is no need for complicated tag management.

reminds you of upcoming tasks

paper{s}pace supports a special type of document called a task. For each task, paper{s}pace will remind you by email one day in advance that you have to take action.

supported file formats

  • pdf
  • images (pdf/png)

Usage

Most of the functionality paper{s}pace provides should be fairly self-explanatory, so only some of the features are explained here.

Search

The search field in the top right corner is your main entry point for finding documents. You can search for a simple string like invoice, which returns all documents that contain the word invoice in the document text, the description or the title. If you have tagged your documents, you can narrow the results down by clicking on one or more of the tags in the left area.

You can prefix a word with + or - to include or exclude results containing that search term. For example, searching for invoice +2020 returns every document that contains the words invoice AND 2020. Searching for invoice -2020 returns every document that contains the word invoice BUT NOT 2020.
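
The prefix rules above can be illustrated with a small shell sketch. It is not how paper{s}pace implements its search. The function name matches_query is hypothetical, and both the case-insensitive substring matching and the treatment of unprefixed words as required are assumptions made for illustration only.

```shell
# Hypothetical helper, not part of paper{s}pace: succeeds when TEXT satisfies
# every term. "+word" and plain "word" must occur, "-word" must not occur.
# Assumption: matching is a case-insensitive substring test.
matches_query() {
  text="$1"
  shift
  for term in "$@"; do
    case "$term" in
      -?*)
        # Excluded term: fail as soon as it is found.
        if printf '%s\n' "$text" | grep -qiF -- "${term#-}"; then
          return 1
        fi
        ;;
      *)
        # Required term: strip an optional leading "+" and fail if missing.
        if ! printf '%s\n' "$text" | grep -qiF -- "${term#+}"; then
          return 1
        fi
        ;;
    esac
  done
  return 0
}

matches_query "Invoice from March 2020" invoice +2020 && echo "invoice +2020: match"
matches_query "Invoice from March 2020" invoice -2020 || echo "invoice -2020: no match"
```

The two calls at the end mirror the examples from the text: the first query matches the sample text, the second one rejects it because the excluded word 2020 is present.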

Installation

Attention: There is no upgrade path from version 1 of paper{s}pace. Please uninstall all previous installations.

There are two ways of installing paper{s}pace: you can install it on a server by hand, or you can use Docker for an easier installation. Only Docker is officially supported.

Docker

The easiest installation method is to check out or download one of the samples in examples and start the application with docker-compose.

There are two files:

complete-with-ftp.yml
Includes the whole application and an additional FTP server. You can point your document scanner to this server and upload documents directly into paper{s}pace.
minimal-docker-compose.yml
Includes only the application.

First decide which one you want. Copy it to your hard drive and rename it to docker-compose.yml. Then configure the properties you need in the environment block.

OCR_LANGUAGE: '<IN WHICH LANGUAGE ARE YOUR DOCUMENTS, FOR EXAMPLE deu? [string]>'
APPLICATION_HOST: 'REPLACE WITH HOSTNAME OR IP' # the address under which paper{s}pace is reachable. Defaults to http://localhost:8080
ENABLE_MAIL: '<ENABLE MAILS? [true|false]>'
MAIL_TO_ADDRESS: '<WHO SHOULD RECEIVE MAILS [string]>'
MAIL_FROM_ADDRESS: '<THE SENDER OF THE MAILS [string]>'
MAIL_ATTACH_DOCUMENTS: '<SHOULD WE ATTACH THE DOCUMENTS TO THE MAIL? [true|false]>'
MAILING_HOST: '<YOUR MAIL HOST THE APPLICATION SHOULD CONNECT TO [string]>'
MAILING_PORT: '<THE PORT FOR YOUR MAIL HOST [integer]>'
MAILING_PROTOCOL: '<WHICH MAIL PROTOCOL SHOULD WE USE [smtp|pop3]>'
MAILING_SMTP_AUTH: '<SHOULD WE AUTHENTICATE? [true|false]>'
MAILING_SMTP_USE_STARTTLS: '<SHOULD WE USE STARTTLS? [true|false]>'
MAILING_USERNAME: '<WHICH USER SHOULD CONNECT TO YOUR MAIL PROVIDER? [string]>'
MAILING_PASSWORD: '<THE PASSWORD OF THE MAIL USER [string]>'

If you don't want paper{s}pace to send out emails at all, remove every key that starts with MAIL_ or MAILING_ and either set ENABLE_MAIL: 'false' or remove that key as well.

If you have chosen complete-with-ftp.yml, you also have to set the environment variable PASV_ADDRESS to the IP address or hostname under which the FTP server is reachable, for example PASV_ADDRESS: 192.168.1.111. You should probably also change the password in VSFTPD_USER_1: 'scanner:password:9876:'. If you also want to change the username of the FTP user, change the mount point under volumes as well. For a complete list of configuration options, see wildscamp/vsftpd.

After you have made the changes to the configuration, you can start paper{s}pace with docker-compose. Open a terminal, navigate to the folder in which your docker-compose.yml resides and execute:

docker-compose up -d

This will start paper{s}pace locally. You can open it by navigating your browser to http://localhost:8080.

using existing documents

In the sample configurations we work with a named volume. If you already have a folder with documents that you want to use, you have to mount it into the container. paper{s}pace expects the following folders to be writable for the user with the uid 9876:

/storage/tasks #default location for task documents
/storage/documents #default location for documents
/storage/binary #storage location for preview images of documents
/storage/database #location of the database file

To work with your existing folders, you have to change the volumes section of the service api in the docker-compose.yml. If you have chosen the compose file with the FTP server, you also have to change the mount point of the service ftp.

Let's assume your current documents reside under /home/paperspace/documents/ and your tasks will be stored under /home/paperspace/tasks/. Then you have to change the volumes sections in your docker-compose.yml like this:

version: "3.4"
services:
  api:
    # ...
    volumes:
      - paperspace:/storage
      - /home/paperspace/tasks:/storage/tasks
      - /home/paperspace/documents:/storage/documents
  # ...
  ftp:
    # ...
    volumes:
      - /home/paperspace:/home/virtual/scanner/data
Bare Metal Installation from Source

Install required software

This tutorial assumes a system based on Ubuntu 20.04 LTS. If you are running a different distribution, please adapt the commands.

All following commands assume that your documents are in German. If your documents are in a different language, please change deu to your language code. For Spanish, for example, you would install tesseract-ocr-spa instead of tesseract-ocr-deu.

sudo apt-get install tesseract-ocr tesseract-ocr-deu openjdk-11-jdk-headless git npm && pip install stapler
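
If your documents are not in German, the language-specific package name can be derived from the language code. A small sketch, assuming a variable named OCR_LANG (a name chosen here for illustration) that holds the Tesseract language code:

```shell
# Assumption: OCR_LANG holds a Tesseract language code such as deu or spa.
OCR_LANG="${OCR_LANG:-deu}"
PACKAGES="tesseract-ocr tesseract-ocr-${OCR_LANG} openjdk-11-jdk-headless git npm"

# Print the command instead of running it, so it can be reviewed first.
echo "sudo apt-get install ${PACKAGES}"
```

Running it with OCR_LANG=spa set in the environment prints the install command with tesseract-ocr-spa in place of tesseract-ocr-deu.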

fetch latest source code

git clone https://gitlab.com/dedicatedcode/paperspace.git

Install Solr

cd /opt
sudo wget https://archive.apache.org/dist/lucene/solr/8.3.1/solr-8.3.1.tgz
sudo tar xzf solr-8.3.1.tgz solr-8.3.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-8.3.1.tgz

create the solr data directory referenced in search/config/conf/core.properties. You can change this to your preferred location.

sudo mkdir -p /data/solr/
sudo chown solr:solr /data/solr

copy solr configuration

sudo cp -r paperspace/search/config/conf /var/solr/data/core_documents

restart solr

sudo systemctl restart solr.service

Build and install paper{s}pace

install app
  1. build app

    # build in a subshell so the working directory stays unchanged for the next steps
    (cd paperspace/api && ./gradlew build)
  2. create user, folder and copy app

    sudo useradd -M -s /bin/false paperspace
    sudo mkdir /opt/paperspace-app
    sudo cp paperspace/api/build/libs/api.jar /opt/paperspace-app/app.jar
    sudo chown -R paperspace:paperspace /opt/paperspace-app
  3. create application.properties with this content

    database.location=database/paperspace.db
    search.host=localhost
    search.port=8983
    ocr.language=deu
    app.host=http://localhost:8080
    ocr.datapath=/usr/share/tesseract-ocr/4.00/tessdata
    storage.folder.tasks=/storage/tasks
    storage.folder.documents=/storage/documents
    storage.folder.binaries=binary
    ### REMOVE IF YOU DONT WANT TO USE MAILS ###
    email.enabled=true
    email.target-address=
    email.sender-address=
    email.attach_documents=false
    spring.mail.host=
    spring.mail.port=
    spring.mail.protocol=smtp
    spring.mail.test-connection=false
    spring.mail.properties.mail.smtp.auth=true
    spring.mail.properties.mail.smtp.starttls.enable=true
    spring.mail.username=
    spring.mail.password=

    Adjust the property ocr.language to your language code and set the property app.host to the address under which the application is reachable. Replace /storage/tasks and /storage/documents with the paths to your documents. If you want paper{s}pace to send you an email about upcoming tasks or new documents, fill in the properties starting with email and spring.mail. If you don't need emails, simply delete the whole block from the application.properties. After this, the opt folder should be populated like this:

    /opt/paperspace-app/
    ├── app.jar
    └── application.properties
  4. create service files

    sudo tee <<EOF /etc/systemd/system/paperspace-app.service >/dev/null
    [Unit]
    Description=paperspace-app
    After=syslog.target
    
    [Service]
    User=paperspace
    WorkingDirectory=/opt/paperspace-app
    ExecStart=java -jar /opt/paperspace-app/app.jar
    SuccessExitStatus=143
    
    [Install]
    WantedBy=multi-user.target
    EOF

    reload systemd configurations

    sudo systemctl daemon-reload
  5. start services

    sudo systemctl enable paperspace-app
    sudo systemctl start paperspace-app

    If everything is set up correctly, the app should now be accessible at http://localhost:8080 and should greet you with an empty result list. You can start dropping PDFs into the documents or tasks folder now.
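
Step 3 above suggests deleting the mail block from application.properties when no emails are wanted. A minimal sketch of that edit, assuming the mail settings are exactly the lines starting with email. or spring.mail. plus the ### REMOVE marker comment. The function name strip_mail_properties is hypothetical.

```shell
# Hypothetical helper: print a properties file without the mail settings.
# Assumption: mail settings are the lines starting with "email." or
# "spring.mail.", plus the "### REMOVE ..." marker comment.
strip_mail_properties() {
  grep -v -E '^[[:space:]]*(email\.|spring\.mail\.|### REMOVE)' "$1"
}
```

One way to use it is to write the result to a new file, for example strip_mail_properties application.properties > application.properties.nomail, review it, and then replace the original file with it.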