Administration
Easy Installation and Administration
The DocuQuest installation process is quick and easy, allowing
you to be up and running within minutes. Integration with your
existing document collections is straightforward, and DocuQuest
requires minimal training to use and deploy.
Leverages Existing Security Policies
DocuQuest Personal uses the security policies of
the local computer to allow or deny access to local files.
DocuQuest Network lets users on a network share DocuQuest libraries, eliminating the
need for the document collections to be on individual computers.
To accomplish this, DocuQuest is designed to work in conjunction
with network security policies. Network administrators establish at
least two Group accounts to be associated with DocuQuest:
- DocuQuest Administrators
Members assigned to this Group account are granted permissions
to create, delete, rename and modify DocuQuest libraries. They can
also index and search libraries.
- DocuQuest Users
Members assigned to this Group account can only search DocuQuest
libraries and view results.
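The Group arrangement above amounts to a mapping from Group accounts to permitted operations. The following Python sketch is purely illustrative: the two group names come from the text, but the `Permission` names, the `GROUP_PERMISSIONS` table and `is_allowed` are hypothetical and do not describe how DocuQuest or Windows actually check access.

```python
from enum import Enum, auto


class Permission(Enum):
    # Hypothetical names for the capabilities listed in the text.
    CREATE = auto()
    DELETE = auto()
    RENAME = auto()
    MODIFY = auto()
    INDEX = auto()
    SEARCH = auto()
    VIEW_RESULTS = auto()


# Group names are from the text; this mapping is an illustrative assumption.
GROUP_PERMISSIONS = {
    "DocuQuest Administrators": set(Permission),  # every capability
    "DocuQuest Users": {Permission.SEARCH, Permission.VIEW_RESULTS},
}


def is_allowed(user_groups, permission):
    """Return True if any Group account the user belongs to grants `permission`."""
    return any(permission in GROUP_PERMISSIONS.get(group, set())
               for group in user_groups)


print(is_allowed(["DocuQuest Users"], Permission.DELETE))  # False
```

Keeping permissions on groups rather than on individual users mirrors the text's approach: administrators change a user's rights by changing Group membership.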
Low Total Cost of Ownership
DocuQuest provides an extremely low total cost of ownership
through its reasonable initial price, its low demand on computer
resources, its ease of use, and the substantial increase in
organization and user productivity it brings.
Extensive Help and Tips Support
It's often said, "No one reads the documentation!" Is
this because it's hard to find and impossible to understand?
Because DocuQuest is designed to be distributed over the
Internet, we have focused on building a system that bridges the
gap between hardcopy documentation and software. If you are
using the DocuQuest software, you are in the documentation.
All DocuQuest windows have Help enabled on their controls, and an
extensive Help system is always available from the menu bar.
For new and casual
users, DocuQuest includes an extensive set of Tips or hints on using
the software. The Tips are especially useful the first few
times you use the DocuQuest system. They present helpful information
in the context of where you are as you operate the software. As you
use the software you will have less and less need for the Tips.
Consequently, you can turn the Tips on or off as needed.
Seamless Integration with Microsoft Office Systems, Corel WordPerfect and Adobe Reader and Acrobat
One of the most striking features of DocuQuest is its use of
the Automation interfaces provided by modern office applications
such as Microsoft Office, WordPerfect Office and Adobe
Acrobat. Through a software process called Automation, these
vendors allow their software's features to be seamlessly
accessed and used by other programs. For example, if a software
developer needs a Spell Checker in its software, it can use the
Automation interface of Microsoft Word to gain this capability
without having to write the facility itself.
DocuQuest takes advantage of the Automation interfaces
of Microsoft Word, Excel and PowerPoint, Corel WordPerfect and
Adobe Acrobat to tightly integrate its index and search facilities
with the document processing facilities of these applications.
For DocuQuest users, this means they can use DocuQuest to find
lost or misplaced documents and switch directly to the application
that was used to create the document. Using Automation, the
selected document is opened and positioned at the search criteria,
which greatly helps in recognizing the document.
Since the Automation interface remains open until
the user is finished with the document, it's possible to use
DocuQuest to search for other occurrences of the search criteria,
select a different document from the search results list, refine
the current search, or make a new search, all without closing
either application.
Index
Documents Referenced by Libraries
Documents can be physically stored almost anywhere on a
computer's storage system: on removable drives (diskettes,
CD-ROMs, etc.), on hard drives, or anywhere on network servers.
At issue is how to collect these documents together for indexing
and searching. DocuQuest takes the view that the documents need
not, and therefore should not, be moved. Instead, a logical entity
is needed for collecting documents together, and these entities
should be named according to the specific purpose they serve for
the user. Hence, DocuQuest evolved the notion of the DocuQuest
library.
In its simplest form, a library is nothing more than a named
collection of places where documents are stored. The name should
identify the collection and give some suggestion of its purpose.
For example, a user's correspondence may be stored in several
separate folders on several hard drives, yet the user may want to
bring all of these disparate document collections together under
one name. The documents are not physically moved; the library
simply collects their path names.
The DocuQuest library is also an appropriate point for specifying
control parameters for how the document collection will be
managed. DocuQuest permits different kinds of document collections
to be managed differently. For example, a library's Scope can be
defined as (1) Removable Media, (2) Local Hard Drives, or
(3) Network Resources. This setting significantly affects system
operations involving user prompting, security and other operational
concerns. Other library parameters control indexing, searching and
viewing.
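A library, as described above, can be pictured as a small record: a name, a Scope, and a list of folder paths, with the documents themselves left where they are. The sketch below assumes Windows-style paths; the `Library` class, its fields and the `contains` helper are hypothetical illustrations, not DocuQuest's actual data format.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PureWindowsPath


class Scope(Enum):
    # The three Scope values named in the text.
    REMOVABLE_MEDIA = 1
    LOCAL_HARD_DRIVES = 2
    NETWORK_RESOURCES = 3


@dataclass
class Library:
    """A named collection of folder paths; no document is ever moved."""
    name: str
    scope: Scope
    folders: list = field(default_factory=list)

    def add_folder(self, path: str) -> None:
        # Only the path name is recorded; the files stay where they are.
        if path not in self.folders:
            self.folders.append(path)

    def contains(self, file_path: str) -> bool:
        """True if file_path lies under one of the library's folders."""
        p = PureWindowsPath(file_path)
        return any(p.is_relative_to(PureWindowsPath(f)) for f in self.folders)


letters = Library("Correspondence", Scope.LOCAL_HARD_DRIVES)
letters.add_folder(r"C:\Letters")
letters.add_folder(r"D:\Archive\Mail")
print(letters.contains(r"D:\Archive\Mail\2003\smith.doc"))  # True
```

`PurePath.is_relative_to` requires Python 3.9 or later. The folder names are made up for the example.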
Thus, the DocuQuest library is the unit that
the user selects to focus a search on a particular set of
documents. Clicking the Library button brings up a control panel
listing all previously defined libraries with their name, owner,
scope, creation date and size. From this control panel you can
select a library, set it up (change the library's parameters), add,
delete or rename libraries, and index and search them. Each library
maintains its own query memory of its most recent searches.
High Performance Index Algorithm
Most Full-Text indexing systems use an algorithm that produces what
is called an Inverted Index. This algorithm extracts each word
or term in the document collection and builds a list of the documents
that contain the term. For each document, the algorithm notes how
often the term occurs and the position of each occurrence (needed
only if proximity queries will be supported). Some Inverted Index
algorithms express the position as the document, section, paragraph,
sentence, and location within the sentence where the term occurs.
These position records greatly increase the size of the Inverted
Index, which can result in indexes larger than the document
collection itself. They also make updates costly.
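For comparison, a conventional Inverted Index with word positions can be sketched in a few lines of Python. This is a generic illustration of the technique described above, not DocuQuest's Hyperdex. It records only word offsets; systems that also record section, paragraph and sentence inflate the index further.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [word positions]}.

    The term frequency in a document is len(positions); the positions
    themselves are only needed to support proximity queries.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {1: "The cat sat on the mat", 2: "The dog"}
index = build_inverted_index(docs)
print(index["the"])  # {1: [0, 4], 2: [0]}
```

Every occurrence of every term adds an entry, which is why the position lists dominate the size of such an index and why changing one document means touching the lists of every term it contains.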
To overcome this size inflation and to provide greater flexibility
in what can be searched for, DocuQuest uses a novel, proprietary
indexing technology called Hyperdex. Hyperdex does not use words
or terms as its basic units for indexing. Instead, it uses
3-character patterns (character trigrams) and indexes them into a
compressed bitmap structure chosen for an optimal index space-time
tradeoff. The result is a concise, highly accurate index that
permits searching for any 3-character pattern, regardless of where
it appears in a word or phrase. Thus a search for the word "symmetry"
becomes a search for "sym", "ymm", "mme", "met", "etr" and "try."
Interestingly, this technique can overcome problems with term
prefixes and suffixes (see Word Stemming, Truncation and
Fuzzy Search below), allowing the same search to find
"antisymmetry" and "asymmetry" as well. Leaving off the
trailing "y" would also find "symmetrical", "symmetrization",
etc.
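The 3-character decomposition can be illustrated with a minimal Python sketch. It assumes a plain dictionary of sets in place of Hyperdex's compressed bitmaps, whose details are proprietary and not described here. Because a document can contain all of a term's trigrams without containing the term itself, the sketch confirms each candidate with a direct substring check.

```python
def trigrams(term):
    """Overlapping 3-character patterns of a term, in order."""
    return [term[i:i + 3] for i in range(len(term) - 2)]


def build_trigram_index(docs):
    """Map each trigram to the set of document ids whose text contains it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in set(trigrams(text.lower())):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, docs, term):
    """Intersect the document sets of the term's trigrams, then verify each candidate."""
    term = term.lower()
    grams = trigrams(term)
    if not grams:  # fewer than 3 characters: nothing to look up
        return set()
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return {d for d in candidates if term in docs[d].lower()}


docs = {1: "Asymmetry in crystals", 2: "antisymmetry of tensors",
        3: "symmetrical design", 4: "metric spaces"}
index = build_trigram_index(docs)
print(trigrams("symmetry"))  # ['sym', 'ymm', 'mme', 'met', 'etr', 'try']
```

Here `search(index, docs, "symmetry")` returns `{1, 2}`, and dropping the trailing "y" (`"symmetr"`) also picks up document 3.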
Small Index Size
DocuQuest's Hyperdex technology also makes possible the smallest
reasonable index size for a given volume of text. For example, the
word "symmetry" requires 8 characters of 8 bits each, or 64 bits,
for its storage. Using the Hyperdex approach, the index bitmap
requires only 6 bits (one for each of the word's six 3-character
patterns) to index this word. Judiciously choosing how to store
these bits in the index bitmap can result in DocuQuest indexes no
more than 25 percent of the size of the total document text.
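The arithmetic above can be checked with a short sketch. It assumes that each of a word's six 3-character patterns sets one bit in a per-document bitmap; the `bit_of` table that assigns bit positions to patterns is a hypothetical stand-in, since the actual Hyperdex layout and compression are not described.

```python
def raw_bits(word, bits_per_char=8):
    """Storage for the word itself: one byte per character."""
    return len(word) * bits_per_char


def index_word(word, bitmap, bit_of):
    """Set one bit per distinct trigram of `word` in a document's bitmap.

    bit_of is a hypothetical trigram -> bit-position table that grows as new
    trigrams are seen; it is not how Hyperdex actually lays out its bitmap.
    """
    for i in range(len(word) - 2):
        trigram = word[i:i + 3]
        bit = bit_of.setdefault(trigram, len(bit_of))
        bitmap |= 1 << bit
    return bitmap


bit_of = {}
bitmap = index_word("symmetry", 0, bit_of)
print(raw_bits("symmetry"), bin(bitmap).count("1"))  # 64 6
```

Indexing "asymmetry" into the same bitmap afterwards sets only one new bit ("asy"), because its other trigrams are already present; shared patterns are one reason such an index can stay small.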
Manual and Scheduled Index Updates
A DocuQuest library can have its Index parameter set to Manual or
Scheduled. When it is set to Manual, the user decides when to
reindex the associated document collection. Indexing can be total
(all documents indexed) or incremental, in which only documents
added, modified or deleted since the last indexing run are processed.
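Incremental indexing depends on knowing which files were added, modified or deleted since the last run. The text does not say how DocuQuest detects changes; the sketch below assumes a comparison of file modification times between two snapshots, and `snapshot` and `classify_changes` are hypothetical names.

```python
import os


def snapshot(folders):
    """Record the last-modified time of every file under the given folders."""
    times = {}
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                times[path] = os.path.getmtime(path)
    return times


def classify_changes(previous, current):
    """Compare two {path: mtime} snapshots and return (added, modified, deleted)."""
    added = set(current) - set(previous)
    deleted = set(previous) - set(current)
    modified = {path for path in set(current) & set(previous)
                if current[path] != previous[path]}
    return added, modified, deleted
```

Only the paths in these three sets need work: added and modified files are (re)indexed and deleted files are dropped from the index, while every other file is left alone.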
The Network version of DocuQuest also supports a stand-alone,
background Index Server that can be used to periodically index
libraries. DocuQuest libraries whose Index parameter is set to
Scheduled are automatically indexed when the Index Server runs.
The Index Server can be added as a task in the Windows Task
Scheduler, where it can run hourly, daily, or on any schedule
desired. Only documents added, modified or deleted since the last
run are indexed. Because the indexing algorithm is very fast,
indexing is effectively transparent to users, and subsequent search
results reflect the latest state of the document collection.
Search
By Single-word or Multiple-words
A DocuQuest search can be for a single word like "computer" or a
phrase like "ad hoc committee." Case sensitivity is controlled in
DocuQuest by using uppercase characters wherever you want a specific
test for uppercase to be made. Search criteria typed entirely in
lowercase return hits in either case. For example, a search for
"apple" will find documents containing "apple", "Apple" and even
"APPLE", but if you only wanted "Apple", you would search for
"Apple."
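The case rule can be read per character: a lowercase query character matches either case, while an uppercase one matches only uppercase. The sketch below implements that reading over plain text; `char_matches` and `find_case_rule` are hypothetical helpers, not DocuQuest's matcher.

```python
def char_matches(q, t):
    """An uppercase query character demands uppercase; any other matches either case."""
    if q.isupper():
        return q == t
    return q == t.lower()


def find_case_rule(query, text):
    """Return the start offsets at which `query` matches `text` under the case rule."""
    n = len(query)
    return [i for i in range(len(text) - n + 1)
            if all(char_matches(q, t) for q, t in zip(query, text[i:i + n]))]


text = "apple Apple APPLE"
print(find_case_rule("apple", text))  # [0, 6, 12]
print(find_case_rule("Apple", text))  # [6, 12]
```

Under this per-character reading, "Apple" rejects "apple" but still matches "APPLE", because only the first character is tested for uppercase.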
A single-word or phrase search is the simplest kind of search.
A search using the boolean AND, OR and NOT operators, or a
Proximity search, can be more complicated. Many search tools that
accommodate such a range of search capabilities complicate their
search dialog by placing controls for every search capability they
possess on one window. It's hard to know which options can be used
with which, and it quickly gets confusing.
DocuQuest eases search operations by presenting a tabbed search
dialog with three search tabs: Simple Full-Text Search, Advanced
Full-Text Search and Document Parameters Search. If you just want
to do a single-word or multiple-word search, select the first tab.
If you want to do a boolean or proximity search, select the second
tab. The Document Parameters Search tab can be used alone or with
either of the other search tabs to further refine the search using
document parameters.
Boolean (AND, OR, NOT) Support
DocuQuest provides a special facility for formulating more complex
search expressions using the boolean operators AND, OR and NOT.
Rather than having to type a formula, you simply type a search word
or phrase and then click one of the AND, OR or NOT buttons provided.
You continue by typing a second word or phrase, and so on. As you
type and click, the search statement is built for you. When you are
finished, simply click the OK button to perform the search.
If a search produces more hits than you want to review, you can select
DocuQuest's Refine mode. This action will automatically return you to
the Advanced Full-Text Search tab where the previous search will be parsed and
displayed in the dialog. You can then click the AND or NOT button to
further narrow the search.
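Building a statement term by term, and then refining it, can be sketched as a left-to-right evaluation over sets of matching document ids. This assumes that operators are applied in the order they are clicked, with NOT meaning AND NOT; the `evaluate`, `statement` and `refine` helpers and the sample data are hypothetical.

```python
def evaluate(steps, lookup):
    """Evaluate a statement built one term at a time, strictly left to right.

    steps is [first_term, (op, term), (op, term), ...] with op in
    {"AND", "OR", "NOT"}; "NOT" is treated as AND NOT.
    lookup(term) returns the set of document ids matching that term.
    """
    result = set(lookup(steps[0]))  # copy, so the index itself is never modified
    for op, term in steps[1:]:
        hits = lookup(term)
        if op == "AND":
            result &= hits
        elif op == "OR":
            result |= hits
        elif op == "NOT":
            result -= hits
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


def statement(steps):
    """Render the statement that the dialog builds as the user types and clicks."""
    text = f'"{steps[0]}"'
    for op, term in steps[1:]:
        text = f'({text} {op} "{term}")'
    return text


def refine(steps, op, term):
    """Refine mode: keep the previous search and narrow it with AND or NOT."""
    if op not in ("AND", "NOT"):
        raise ValueError("refining narrows a search, so use AND or NOT")
    return steps + [(op, term)]


postings = {"budget": {1, 2, 3}, "2004": {2, 3, 4}, "draft": {3}}


def lookup(term):
    return postings.get(term, set())


steps = ["budget", ("AND", "2004")]
refined = refine(steps, "NOT", "draft")
print(statement(refined))         # (("budget" AND "2004") NOT "draft")
print(evaluate(refined, lookup))  # {2}
```

Because refining only appends a step, the previous search never has to be retyped, which is the convenience the Refine mode provides.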
By Document Parameters
Document parameters such as name, size and date can be added to
either a Simple or Advanced search to further refine its results.
Knowing any part of a document's name will restrict the search
results accordingly. Specifying document sizes greater than or less
than a given value can likewise narrow a search. You can also choose
to include documents created within a given date range, or to select
documents dated during (or prior to) a given interval specified in
months or days.
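A sketch of parameter filtering in Python, assuming a case-insensitive name match, strict size bounds and an inclusive date range. The `DocInfo` and `ParameterFilter` classes, their fields and `during_last` are hypothetical and only mirror the options described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DocInfo:
    name: str
    size: int     # bytes
    dated: date   # document date used for date filtering


@dataclass
class ParameterFilter:
    """Hypothetical fields mirroring the Document Parameters Search tab."""
    name_part: Optional[str] = None   # any part of the document name
    larger_than: Optional[int] = None
    smaller_than: Optional[int] = None
    start: Optional[date] = None      # inclusive date range
    end: Optional[date] = None

    def accepts(self, doc: DocInfo) -> bool:
        if self.name_part and self.name_part.lower() not in doc.name.lower():
            return False
        if self.larger_than is not None and doc.size <= self.larger_than:
            return False
        if self.smaller_than is not None and doc.size >= self.smaller_than:
            return False
        if self.start is not None and doc.dated < self.start:
            return False
        if self.end is not None and doc.dated > self.end:
            return False
        return True


def during_last(days: int, today: date) -> ParameterFilter:
    """Documents dated during the last `days` days, up to and including today."""
    return ParameterFilter(start=today - timedelta(days=days), end=today)


docs = [DocInfo("Budget2004.xls", 50_000, date(2004, 3, 1)),
        DocInfo("notes.txt", 900, date(2004, 6, 1)),
        DocInfo("budget_draft.doc", 120_000, date(2003, 12, 1))]
big_budgets = ParameterFilter(name_part="budget", larger_than=10_000)
print([d.name for d in docs if big_budgets.accepts(d)])
```

Applied after a full-text search, such a filter simply discards hits whose parameters fall outside the chosen limits; used alone, it can run over every document in the library.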
Proximity Search
Using DocuQuest's Proximity search facility, you can look for terms
based on whether they occur (or do not occur) within a specified
number of words of each other in a document.
A Proximity search is considered a more advanced type of search and
is included on the Advanced Full-Text Search tab, where it operates
like the boolean search operators.
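A proximity test can be sketched by comparing word offsets. This assumes single-word terms and measures distance as the difference between word positions; `word_positions` and `within` are hypothetical helpers. A "does not occur within" search is simply the negation of `within`.

```python
import re


def word_positions(text, term):
    """Offsets of every occurrence of a single-word term, counted in words."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, word in enumerate(words) if word == term.lower()]


def within(text, a, b, distance):
    """True if some occurrence of `a` lies within `distance` words of some `b`."""
    return any(abs(i - j) <= distance
               for i in word_positions(text, a)
               for j in word_positions(text, b))


text = "The ad hoc committee met to review the annual budget"
print(within(text, "committee", "budget", 6))  # True
print(within(text, "committee", "budget", 5))  # False
```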
Refine Search
Quite often when searching a library, the number of resulting files
can be far greater than would be practical to review individually.
To reduce the number of document hits, one could always start a new
search, reenter the previous search criteria, use the boolean AND or
NOT operator, and add another search term to cut down the list. This
can become both confusing and time-consuming.
DocuQuest eliminates this complexity with its Refine Search mode.
Just click the Refine Search button on the toolbar and DocuQuest
does everything for you except entering the next search term.
Word Stemming, Truncation and Fuzzy Search
Full-Text search engines that are based on Inverted Indexes often
embody complex algorithms that remove particular suffixes and/or
prefixes before terms are added to their index. This eliminates
many different forms of the same term. Called word stemming
(arriving at a common root form), this process makes searching for
many words much easier because it isn't necessary to consider every
variant of a word when trying to find it. For example, word stemming
would remove the suffix "ing" from terms like "indexing" or
"computing," but what would it do to a term like "king"?
If an Inverted Index supports only suffix truncation (a search such
as "comput*"), the index can be used unchanged, because all the words
covered by a particular truncated term are adjacent in the
alphabetically sorted index. Prefix truncation (a search such as
"*trust") is much more complicated; in some cases inverse
alphabetizing is used, so that the word "antitrust" becomes
"tsurtitna" and the rules for suffix truncation can then be applied.
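Both kinds of truncation on a sorted term list can be sketched with Python's `bisect` module. This illustrates the general Inverted Index technique described above, not DocuQuest; the term list and the `starting_with` helper are made up for the example.

```python
from bisect import bisect_left


def starting_with(sorted_terms, prefix):
    """All terms that begin with `prefix`; in a sorted list they are adjacent."""
    start = bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


terms = ["antitrust", "compute", "computer", "computing", "distrust", "king", "trust"]
forward = sorted(terms)
backward = sorted(t[::-1] for t in terms)  # inverse alphabetizing: "antitrust" -> "tsurtitna"

# Suffix truncation, "comput*": a single range scan of the ordinary index.
print(starting_with(forward, "comput"))  # ['compute', 'computer', 'computing']

# Prefix truncation, "*trust": scan the reversed index, then undo the reversal.
print(sorted(t[::-1] for t in starting_with(backward, "trust"[::-1])))
# ['antitrust', 'distrust', 'trust']
```

The cost of supporting "*trust" this way is a second, reversed copy of the term list that must be kept in step with the first.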
The designers of DocuQuest felt that many of these techniques are
arbitrary and capricious and generally beyond the understanding and
control of the user. As discussed above under High Performance Index
Algorithm, the unique way that DocuQuest indexes document files means
that none of this should matter, as long as you know what you are
looking for. Search for "analy*" and you will find "analysis",
"analyzer" and "analyzing." Search for "*symmetry" and you will find
"symmetry", "asymmetry" and "antisymmetry." Likewise, search for
"*psych*" and you will find "parapsychology", "psychiatrist",
"psycho", etc.
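One way to answer such wildcard searches with a 3-character index is to require every trigram of each literal fragment, then confirm the candidates with a pattern match. This is a minimal sketch under that assumption, not Hyperdex itself; `build_index`, `wildcard_search` and the sample documents are hypothetical.

```python
import re


def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in trigrams(text.lower()):
            index.setdefault(gram, set()).add(doc_id)
    return index


def wildcard_search(docs, index, pattern):
    """Return {doc_id: sorted matching words} for a pattern such as 'analy*'."""
    fragments = pattern.lower().split("*")
    required = set().union(*(trigrams(f) for f in fragments))
    candidates = (set.intersection(*(index.get(g, set()) for g in required))
                  if required else set(docs))
    # '*' stands for any run of word characters; everything else is literal.
    regex = re.compile(r"\b" + r"\w*".join(re.escape(f) for f in fragments) + r"\b")
    results = {}
    for doc_id in candidates:
        words = sorted(set(regex.findall(docs[doc_id].lower())))
        if words:
            results[doc_id] = words
    return results


docs = {1: "An analysis of the analyzer", 2: "Asymmetry and antisymmetry",
        3: "The psychiatrist studied parapsychology", 4: "Symmetry groups"}
index = build_index(docs)
print(wildcard_search(docs, index, "analy*"))  # {1: ['analysis', 'analyzer']}
```

No stemming rules are involved: a leading, trailing or embedded "*" only changes which literal fragments must be present.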
Integrated Spell Checker
DocuQuest's Spell Checker tool can be used when you are
entering search criteria on any of its search screens or when
you are using its built-in Word Processor. You will be shown each
misspelled word along with a list of suggested spellings. The Spell
Checker tool is implemented through Microsoft Word. If you do not
have Microsoft Office or Microsoft Word installed on your computer
then the Spell Checker tool will not be available.
Integrated Thesaurus
DocuQuest's Synonyms tool can be used when you are
entering search criteria on any of its search screens or when you
are using its built-in Word Processor. The Synonyms tool is
implemented through Microsoft Word. If you do not have Microsoft
Office or Microsoft Word installed on your computer, the Synonyms
tool will not be available.
Recent Query Memory
For each of
its libraries DocuQuest maintains a query memory of the most recent
searches made on the respective library. The number of queries
that can be saved is configurable.
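A per-library query memory with a configurable size maps naturally onto a bounded deque. In this hypothetical sketch, `QueryMemory` is an illustrative name, and both the default capacity of 10 and the choice to move a re-run query to the front rather than store it twice are assumptions.

```python
from collections import deque


class QueryMemory:
    """Most recent searches for one library, newest first; capacity is configurable."""

    def __init__(self, capacity=10):
        self._queries = deque(maxlen=capacity)

    def remember(self, query):
        # Re-running a remembered query moves it to the front instead of duplicating it.
        if query in self._queries:
            self._queries.remove(query)
        # When full, appendleft silently drops the oldest query from the right end.
        self._queries.appendleft(query)

    def recent(self):
        return list(self._queries)


memories = {"Correspondence": QueryMemory(capacity=3)}
for q in ["apple", "budget", "symmetry", "king"]:
    memories["Correspondence"].remember(q)
print(memories["Correspondence"].recent())  # ['king', 'symmetry', 'budget']
```

Keeping one `QueryMemory` per library, as in the `memories` dictionary, matches the text's point that each library remembers its own searches.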
Result Views
File Hits View
DocuQuest's Files view shows the first results of a
DocuQuest search. The display shows the path where a document
is stored, the document name, its size, and when the document
was last stored or changed.
Double-clicking a document in this view sets in motion automatic
logic to select a viewer for the document. If the document type is
one for which DocuQuest has an Automation interface, the appropriate
application is called and passed the search criteria for
positioning. If no Automation interface is available, the document
file is opened using its Windows file association, when one exists.
If there is no Windows association, or if the file is a text file,
DocuQuest's built-in Word Processor is used to view the file.
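The viewer-selection fallback can be sketched as a short decision function. The extension sets are hypothetical: the text names the applications DocuQuest automates but not the file types each one handles, and `Viewer` and `choose_viewer` are illustrative names.

```python
from enum import Enum


class Viewer(Enum):
    AUTOMATION = "application via Automation"
    ASSOCIATION = "Windows file association"
    BUILT_IN = "built-in Word Processor"


# Hypothetical extension sets; the text does not list them.
AUTOMATION_TYPES = {".doc", ".xls", ".ppt", ".wpd", ".pdf"}
TEXT_TYPES = {".txt"}


def choose_viewer(extension, associated_types):
    """Apply the fallback order: Automation, then association, then built-in."""
    ext = extension.lower()
    if ext in AUTOMATION_TYPES:
        return Viewer.AUTOMATION
    # Text files always go to the built-in Word Processor, even if associated.
    if ext in TEXT_TYPES or ext not in associated_types:
        return Viewer.BUILT_IN
    return Viewer.ASSOCIATION


print(choose_viewer(".html", {".html"}))  # Viewer.ASSOCIATION
```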
Customizable View
The Files view shows the results of a search and is customizable.
DocuQuest includes a formatter that lets users choose which columns
to include in or exclude from the Files view. Other features of the
formatter let the user combine hit sentences, extracted from
selected files, with other columns of interest.
Built-in Word Processor
As its default document viewer and as a user
application for working with many types of
text files, DocuQuest provides a built-in Word Processor. This
Word Processor is similar in function to the Windows WordPad program
and can be used to create or edit text
files that contain formatting or graphics.