Building a Document Management System: Part 1

I'm building a system to manage the documents that accumulate in our company, starting with internal (paper-based) documents. We generate a few hundred of them every day.

The strategy is: scan them, throw the paper away, and keep the scanned data forever. Then OCR each scan well enough to find out which document it actually is (see here, here and here, all in German) and drop it with appropriate metadata into a permanent store.

So how do we construct that storage? It should be networked, and it should be able to spread over several hard disks and servers. So we probably need a client-server architecture and a network protocol. We use HTTP because this Web thingy is all around, totally rocks and was recently upgraded to version 2.0. Apparently RESTful application design and the Atom Publishing Protocol are the way to go for content if you want to play with the cool kids.

All in all, the idea of Atom fits well with a document store: Atom is about documents that have an author, a publication date and so on.

So let’s get wild and just code away. We post new documents to /documents/:

POST /documents/ HTTP/1.1
Host: 127.0.0.1:8000
Content-type: text/plain
Content-Length: 9

blablafoo
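
For those who prefer a script over raw HTTP, here is a minimal sketch of the same request in Python, using only the standard library. The function names build_upload_request and upload are made up for this example, and the base URL is simply the host and port from the request above.

```python
import urllib.request

# Host and port taken from the example request; adjust for your own setup.
BASE_URL = "http://127.0.0.1:8000"


def build_upload_request(data: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for a new plain-text document (hypothetical helper)."""
    return urllib.request.Request(
        url=base_url + "/documents/",
        data=data,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def upload(data: bytes, base_url: str = BASE_URL) -> bytes:
    """Send the document and return the raw response body (the Atom XML)."""
    # urlopen fills in Content-Length from len(data) on its own.
    with urllib.request.urlopen(build_upload_request(data, base_url)) as response:
        return response.read()
```

With a store listening on 127.0.0.1:8000, calling upload(b"blablafoo") should send the same request as shown above and hand back the server's answer as bytes.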

If we do so, we get back an Atom document describing the newly created entry:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Document 31 (2007-06-21)</title>
  <id>tag:id.23.nameu,2007-05-01:/f/1a3...b73</id>
  <author><name>HUODORA DoDoStore</name></author>
  <link href="http://.../document/1...3/metadata.atom" rel="self"/>
  <entry>
    <title>Document 31 (2007-06-21)</title>
    <id>tag:id.23.nameu,2007-05-01:/e/1a3e270d7e8b73/</id>
    <published>2007-06-21T21:58:41Z</published>
    <link href="http://.../document/1a3...b73/" type="text/plain"/>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml" name="1a3...b73">
      </div>
    </content>
  </entry>
  <updated>2007-06-21T21:58:41Z</updated>
</feed>
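
A client does not have to read that XML by eye. The following is a small sketch of how the document link could be pulled out of such a response with Python's xml.etree.ElementTree. The function name document_url is my own invention, and the sample below is a trimmed response in which "abc123" and the host are made-up stand-ins for the parts elided above.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so lookups must be qualified.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Trimmed response in the shape shown above. The id "abc123" and the host
# are hypothetical placeholders, not values returned by the real store.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Document 31 (2007-06-21)</title>
  <entry>
    <title>Document 31 (2007-06-21)</title>
    <published>2007-06-21T21:58:41Z</published>
    <link href="http://127.0.0.1:8000/document/abc123/" type="text/plain"/>
  </entry>
</feed>"""


def document_url(atom_xml: str) -> str:
    """Return the href of the first link inside the first entry."""
    root = ET.fromstring(atom_xml)
    link = root.find("atom:entry/atom:link", ATOM_NS)
    if link is None or link.get("href") is None:
        raise ValueError("no entry link found in Atom response")
    return link.get("href")
```

Here document_url(SAMPLE) gives back the href of the entry's link element. The feed-level link with rel="self" is ignored on purpose, since the lookup only searches inside the entry.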

Based on the Atom entry we now know where to request the document we have just posted:

$ curl http://127.0.0.1:8000/document/1a3...b73/
blablafoo

Voilà! New documents are POSTed to /documents/ and afterwards you can GET them from /document/{id}/.
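
To make the two URLs concrete, here is a toy server sketch in Python that speaks the same interface. This is not the actual store: it keeps everything in memory, invents ids with uuid4, answers with status 201 and returns a heavily trimmed Atom feed that lacks the required title, id and updated elements. All of these are assumptions for the sake of illustration, as are the names STORE, DocumentHandler and make_server.

```python
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for the permanent store (assumption: the real one is on disk).
STORE: dict[str, bytes] = {}


class DocumentHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, content_type: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        if self.path != "/documents/":
            self._reply(404, "text/plain", b"not found\n")
            return
        length = int(self.headers.get("Content-Length", 0))
        doc_id = uuid.uuid4().hex  # hypothetical id scheme
        STORE[doc_id] = self.rfile.read(length)
        host = self.headers.get("Host", "localhost")
        # Trimmed feed: just enough to tell the client where the document lives.
        atom = (
            '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
            f'<link href="http://{host}/document/{doc_id}/" type="text/plain"/>'
            "</entry></feed>"
        )
        self._reply(201, "application/atom+xml", atom.encode("utf-8"))

    def do_GET(self) -> None:
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "document" and parts[1] in STORE:
            # The posted Content-Type is not remembered; everything is served as text.
            self._reply(200, "text/plain", STORE[parts[1]])
        else:
            self._reply(404, "text/plain", b"not found\n")

    def log_message(self, format, *args) -> None:
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a server bound to 127.0.0.1."""
    return HTTPServer(("127.0.0.1", port), DocumentHandler)
```

Calling make_server().serve_forever() starts it on port 8000, after which the POST and curl examples above should work against it. Everything is lost on restart, so this only illustrates the URL layout, not the storage itself.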
