|

WORLDOX®
Enterprise Document Manager® employs a two-tiered architecture
in the implementation of its back-end document profile database
technology. This approach combines the best aspects of a failure-resistant
distributed database with the speed of access inherent in a centralized
data repository. Thus WORLDOX satisfies customer requirements for
quick, global access to document repositories, and provides the
added protection of a redundant, fault-tolerant database structure
as well.
Document
management software is designed to coordinate and control the documents
created, maintained, and used within a firm or organization. Virtually
all electronic document management systems also offer additional
functionality as a rule, including version control, document
archiving, full-text indexing, content-based retrieval, network
mirroring, workflow, and so forth. But the heart and soul of document
management consists of cataloging and tracking documents.
This
means that the document manager must somehow extract or derive "information
about the documents," often referred to as document profiles
or metadata, keeping it separate from the information in
the documents themselves. It is critical to distinguish document
profile information from document content. The critical
role that documents serve within an organization cannot be overstated.
Documents are a firm's intellectual assets, in much the same
way that staff are human assets. In order to manage employees effectively,
firms maintain human resource records, frequently handled by
an entire department dedicated to that function. A document management
system performs an analogous function by maintaining document
resource records.
The
document manager must have the means to store, edit and retrieve
the document resource records, or profile information, for the documents
under its control. The obvious solution is to use a database
system of some kind. Storing profile information in a database affords
all the advantages inherent in database technology to the document
manager. By means of a database the document manager can handle
a vast amount of information that can be stored, organized, and
searched quickly and efficiently by large numbers of users. The
database contributes an element of structure in parallel with the
document repository, which consists of largely unstructured data.

In
determining an approach to implementing database technology for
a document management system (DMS), the essential questions must
revolve around how best to leverage:
- the
informational requirements inherent in managing a large document
set
- available
system resources at a DMS client site
- the
skill level of information system staff (to support a particular
database implementation)
- the
day-to-day realities of document flow and usage experienced by
users of the DMS solution.
Keep
in mind that document management system databases are not generally
end-user facing, meaning that most users do not have any direct
experience with the database in typical usage scenarios. No more,
say, than most automobile drivers have direct experience of their
car's engine while driving. When it works as designed, the driver
can disregard it and concentrate on getting from one place to another.
As with automobile engines, the database engine of a DMS should
be unobtrusive, efficient, powerful, and reliable. It should enable
customers to achieve desired outcomes at an acceptable performance
level without draining resources excessively or breaking down when
needed most.
A
document management system that employs a central database stores
all document profile information in a single, monolithic database.
Typically this is a relational database management system (RDBMS)
that resides on a dedicated server attached to the network. As users
of the document management system work with documents across the
network, the DMS routes all document profile updates and information
requests to the server housing the central database.
Typically
the database is an adjunct to the documents, which tend to remain
distributed throughout the network. Database records therefore contain
a pointer of some sort that "attaches" the record to the
corresponding document's physical location. An extreme application
of a centralized database approach to DMS, however, forgoes this
level of abstraction and actually stores the documents within the
database structure, often as a binary large object, or BLOB. While
highly secure, this approach can also increase the risk of lost documents
or impeded productivity during a hardware or software breakdown.
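
The difference between a pointer record and a BLOB record can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and the sample network path are hypothetical and are not taken from any particular DMS product.

```python
import sqlite3

# Hypothetical schema for illustration only; no real DMS uses these names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_by_pointer ("
    " doc_id INTEGER PRIMARY KEY, title TEXT, author TEXT, location TEXT)"
)
conn.execute(
    "CREATE TABLE profile_with_blob ("
    " doc_id INTEGER PRIMARY KEY, title TEXT, author TEXT, content BLOB)"
)

# Pointer style: the record only says where the document lives on the network.
conn.execute(
    "INSERT INTO profile_by_pointer VALUES (?, ?, ?, ?)",
    (1, "Lease agreement", "jdoe", r"\\fileserver\clients\acme\00001234.doc"),
)

# BLOB style: the document bytes are stored inside the database itself.
conn.execute(
    "INSERT INTO profile_with_blob VALUES (?, ?, ?, ?)",
    (1, "Lease agreement", "jdoe", b"...document bytes..."),
)


def locate(doc_id):
    """Return the stored file location for a document, or None if unknown."""
    row = conn.execute(
        "SELECT location FROM profile_by_pointer WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


def load_blob(doc_id):
    """Return the document bytes held in the database, or None if unknown."""
    row = conn.execute(
        "SELECT content FROM profile_with_blob WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None
```

In the pointer style the document remains reachable through the file system even if the database is offline; in the BLOB style the database is the only way to reach it.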

A
centralized database offers several key benefits:
- controlled
access to the document profile repository
- quick,
efficient searching
Many
DMS vendors have constructed systems that use SQL database technology.
Ostensibly, this is to enhance interoperability and to take advantage
of the client organization's existing expertise with the technology,
if present. SQL databases offer these benefits:
- some
sites can leverage existing SQL proficiency (if present)
- scalable
support for Wide Area Networks
- an
opportunity to consolidate data stores.
On
the down side, a single, centralized database containing the entire
body of information about a firm's document repository presents
a single point of failure. Without a properly defined and strictly
enforced backup regime, the failure or corruption (for any reason) of
a centrally stored document profile repository can be catastrophic.
If the sole document profile database is corrupted, restoring any
lost information becomes a drop-everything, mission-critical operation.
Even with current backups, at a minimum the DMS (and probably the
network as well) will have to be "brought down," bringing
productivity to a standstill.
While
the potential for a "doomsday" DMS scenario is not necessarily
high in well-implemented and well-maintained systems, it does
happen from time to time. Stories abound of firms having to
stop all work and search for the latest backup tapes or, worse, having
to recreate document profiles from scratch.
The
more likely scenario on the downside is that the server hosting
the centralized DBMS may go down. Without a redundant system in
place, such as mirroring, the loss of the server will generally
force a network-wide work stoppage.
In
large organizations with the staff and the know-how, an SQL back-end
to a DMS can prove beneficial on several fronts. However, in smaller
organizations, or those lacking in-house SQL expertise, a dedicated
SQL database can prove to be a serious resource drain, both on the
network and on budgets that are bound to expand in order to accommodate
outsourcing SQL database expertise. SQL databases are powerful,
but that power comes at a cost.
Be
assured that installing an SQL database is in no way a turnkey operation.
It takes studied expertise to set up the database, and ongoing tuning,
backups, and continual upkeep in order to keep it running within
acceptable performance limits.
It
is the presence of these all too real concerns that has led to another
approach: the distributed database.
A
distributed database offers an antidote to "putting all your
eggs in one basket." The key difference between a centralized
and distributed database is where the information is stored.
A document management system using a distributed database stores
all the necessary document profiling information dispersed throughout
the network. The information is stored at various points, or nodes.
The storage nodes may be based on the network architecture or on
disk structure. Though the data is stored in multiple physical locations,
the distributed database is centrally managed. Distributing the
database compartmentalizes the information, greatly reducing the
chance of losing the entire database.
A
typical implementation in a DMS might allocate the profile database
along the lines of the logical structure shared by the documents
within the network. For example, each system folder (directory)
containing profiled documents will have a corresponding DMS data
set.
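
A minimal sketch of the "one data set per folder" idea is shown below. It assumes a JSON file as a stand-in for the per-folder data set; the file name `profiles.json` and both helper functions are hypothetical and say nothing about how any real DMS lays out its files.

```python
import json
from pathlib import Path

DATASET_NAME = "profiles.json"  # hypothetical stand-in for a per-folder data set


def save_profile(doc_path, profile):
    """Record a document's profile in the data set of the folder that holds it."""
    doc_path = Path(doc_path)
    dataset = doc_path.parent / DATASET_NAME
    records = json.loads(dataset.read_text()) if dataset.exists() else {}
    records[doc_path.name] = profile
    dataset.write_text(json.dumps(records, indent=2))


def load_profile(doc_path):
    """Look up a document's profile in its own folder's data set."""
    doc_path = Path(doc_path)
    dataset = doc_path.parent / DATASET_NAME
    if not dataset.exists():
        return None
    return json.loads(dataset.read_text()).get(doc_path.name)
```

Because each folder carries its own data set, reading or writing a profile touches only that folder, and copying or backing up the folder carries the profiles along with the documents.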

The
distributed data approach offers several advantages:
- There
is no single point of failure (with respect to data loss)
- Updates
happen close to where the work occurs and may therefore happen
faster (less latency in the system)
- Certain
operations, such as "cloning" profiles, can be faster
- No
special procedures are required to back up and restore document
profile information; it gets backed up during the standard backup
routine
- The
distributed approach is inherently scalable
- A
distributed database should optimize processor use, because the individual
data sets required by processing functions will be smaller.
As
with the centralized database, the nature of the distributed database
is transparent to the end user. A user simply works with the document
management software to save documents, retrieve documents, etc.
The DMS handles whatever mediation is necessary with respect to
the profile database.
There
are, to be sure, some disadvantages to using a distributed database
within a document management system:
- Where
databases share media with the profiled documents, database resources
may be unprotected (e.g. users may delete database files)
- Data
synchronization must be managed by the software, which requires
processing overhead
- Searching
across data sets can be slow.
Rather
than accept the intrinsic limitations of one particular database
model over all others, the designers of WORLDOX have chosen to combine
the robustness and reliability of a distributed data model with
the speed and efficiency inherent in a centralized database. This
unique approach enables WORLDOX to exploit the benefits of each
model, while circumventing the various drawbacks.
The
implementation of a two-tiered database architecture sets WORLDOX
apart from other document managers on several fronts:
Access to files when the central database or network is unavailable
The
primary benefit is to productivity: with WORLDOX,
users maintain access to their documents along with the profile
information even when the central database is unavailable for any
reason. With WORLDOX's mirroring technology, users maintain full
access even in the event of a network shutdown. Mirroring, coupled
with the WORLDOX distributed database, maintains local copies of
work files that users can work with while off-line. When network
connections are restored, WORLDOX automatically synchronizes local
documents with their network counterparts.
The central database can be rebuilt from distributed databases
If
the central WORLDOX database becomes corrupted for any reason, it
can be rebuilt directly from the information stored in the local
databases. This ensures that the database contains the latest information
available about the documents under DMS control. Restoring a database
from an overnight backup, on the other hand, will fail to include
information updated subsequent to the time of the backup creation.
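
The rebuild idea can be sketched in a few lines. This is not the WORLDOX rebuild procedure; it assumes, purely for illustration, that each folder holds a JSON data set named `profiles.json` (a hypothetical stand-in) and that the "central database" is an in-memory dictionary.

```python
import json
from pathlib import Path

DATASET_NAME = "profiles.json"  # hypothetical per-folder data set


def rebuild_central_index(root):
    """Recreate a central lookup table by scanning every per-folder data set."""
    central = {}
    for dataset in sorted(Path(root).rglob(DATASET_NAME)):
        try:
            records = json.loads(dataset.read_text())
        except (OSError, ValueError):
            # An unreadable data set affects only its own folder; skip it.
            continue
        for name, profile in records.items():
            central[str(dataset.parent / name)] = profile
    return central
```

Since the per-folder data sets are the source of truth in this sketch, the rebuilt table reflects whatever was last written locally, rather than the state captured by an earlier backup.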
Distributed databases are included in routine backup procedures
Each
distributed WORLDOX database is backed up as part of routine network
backup procedures. The databases, in essence, "live with the
documents," and therefore remain tightly integrated with them.
The distributed databases compartmentalize document profile information
If
a database becomes corrupted somehow, the damage is localized, allowing
quick and easy recovery without impacting the entire network. There
is no need to take down the entire network, or to limit access to
the document management system while restoring the local database
affected.
WORLDOX
stores profile data in a distributed database consisting of linked
pairs of data files residing in each directory containing profiled
documents. Each distributed data set consists of two files, XNAME.LIB
and XNAME.CRS, which are described briefly in the following table.
| Filename | Description |
| XNAME.LIB | Contains document numbers (DOS names), extended names, and file security information. |
| XNAME.CRS | Contains custom profile field and version control information. |
In
many customer sites (smaller installations for the most part), the
WORLDOX distributed database fully satisfies the database requirements
of the DMS and delivers acceptable search performance. In such sites
WORLDOX works exclusively with its distributed database.
This
highlights another advantage that the WORLDOX dual-database architecture
affords: it can adapt to the requirements of an organization's
computing and business environment. WORLDOX does not present a "one
size fits all" architecture that forces organizations to accede
to its needs.
In
larger sites, however, which make up the majority of WORLDOX installations, documents
typically number in the hundreds of thousands. Such sites are
able to reap the full benefit of the WORLDOX dual-database organization
by implementing a central profile data structure to achieve optimal
search performance.
The
central database duplicates the information contained in the distributed
database, but in a form that is optimized for search speed and efficiency.
The central database is created and maintained by a dedicated computer
called an Index Server (or Indexer, for short). An installation
may have one or more Index Servers, depending upon the number of
documents that are profiled, whether full-text searching is implemented,
and other network configuration factors.

From
the users' perspective, WORLDOX databases essentially fade into
the background. XNAME files are generally configured to be hidden
system files so that users can neither see them nor accidentally
delete them. The central database resides on a file server on the
network insulated from direct user access.
As
users work with documents within the WORLDOX desktop, their actions
affect the local database. WORLDOX client software does not directly
contact the central database except when conducting searches. Updates
to the central database occur by means of queued change files that
each WORLDOX desktop posts to the Indexer.
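
The queued-update pattern described above can be illustrated with a small sketch. The change-file format, the `.chg` and `.tmp` extensions, and both function names are assumptions made for this example; the actual format of WORLDOX change files is not described in this paper.

```python
import json
import time
import uuid
from pathlib import Path


def post_change(queue_dir, doc_path, profile):
    """Client side: drop one change file into the shared queue directory."""
    queue_dir = Path(queue_dir)
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp prefix keeps files roughly in posting order; uuid avoids clashes.
    stem = f"{time.time_ns():020d}-{uuid.uuid4().hex}"
    tmp = queue_dir / f"{stem}.tmp"
    tmp.write_text(json.dumps({"doc": str(doc_path), "profile": profile}))
    final = queue_dir / f"{stem}.chg"
    tmp.rename(final)  # rename last so the indexer never reads a half-written file
    return final


def drain_queue(queue_dir, central):
    """Indexer side: apply queued changes to the central table, then delete them."""
    applied = 0
    for change_file in sorted(Path(queue_dir).glob("*.chg")):
        change = json.loads(change_file.read_text())
        central[change["doc"]] = change["profile"]
        change_file.unlink()
        applied += 1
    return applied
```

The point of the design is that the client never waits on the central database: it writes a small file and moves on, and the indexer catches up whenever it next processes the queue.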
The WORLDOX Central Profile Database
As
a general rule, a WORLDOX site will implement a central profile
database for each volume or repository containing documents. All
central database files are grouped in a shared directory which is
located beneath the WORLDOX program directory by default. This directory
must be visible to all WORLDOX users on the network. The default
directory naming convention places each database in a separate subdirectory
as shown in the following example:
- The
profile database for a volume labeled 'X' is located in F:\WORLDOX\ISYS\PROFX.
- The
profile database for a volume labeled 'Q' is located in F:\WORLDOX\ISYS\PROFQ.
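
The naming convention in the two examples above amounts to appending the volume label to the prefix `PROF` beneath the shared ISYS directory. A small sketch, assuming the default root `F:\WORLDOX\ISYS` from the examples (the function name is hypothetical, and sites may configure a different location):

```python
from pathlib import PureWindowsPath


def central_profile_dir(volume_label, isys_root=r"F:\WORLDOX\ISYS"):
    """Build the default central profile directory for a volume label."""
    label = volume_label.strip().upper()
    if not label:
        raise ValueError("volume label must not be empty")
    return PureWindowsPath(isys_root) / f"PROF{label}"
```

For instance, `central_profile_dir("X")` yields the `PROFX` directory shown in the first example.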
The
WORLDOX central profile database uses the Isys search and retrieval
engine from Odyssey Development Corporation. The following table
provides brief descriptions of the files that make up the WORLDOX
central profile database.
| Filename | Description |
| EAFILE00.FTA | List of directories under WORLDOX control. |
| EAFILE00.FTB | The primary central database file, containing file records from the XNAME files. |
| EAFILE00.IXA | Index to FTA |
| EAFILE00.IXB | Index to FTB |
| ISYS.IXA | Isys Index A |
| ISYS.IXB | Isys Index B |
| ISYS.IXC | Isys Index C |
| ISYS.CFG | Isys database configuration file |
All
changes to document profiles made by the client software (that is,
the WORLDOX Desktop) are written directly to the local distributed
database (the XNAME files in the document's directory). This reinforces
overall system reliability in that all changes are saved immediately,
meaning that profile "transactions" are not subject to
network outages or latency. As changes are made throughout the
network, the WORLDOX Index Servers identify the changes and post
them to the central database.
In
addition to continuously polling the network, WORLDOX Index Servers
can follow a scheduled update process programmed by the site administrator.
This allows each unique installation to implement a central database
update strategy that is optimized for that site.
If
you have any questions about the information presented in this
paper, or would like more information, please contact World
Software Corporation via electronic mail at worldox@worldox.com.
This paper is available at the official WORLDOX web site at
www.worldox.com.
World Software Corporation
124 Prospect St.
Ridgewood, NJ USA 07450
201-444-3228
www.worldox.com
|