TeraText

Managing and Searching Collections of Text

Most information in organisations resides in semi-structured, primarily textual documents rather than in structured organisational repositories. In large organisations, the volume of reports, submissions, emails, contracts, policy documents and similar material is beyond the capacity of most systems.

The TeraText suite of products provides a solution for large-volume, high-complexity collections of documents. TeraText products support diverse systems, including those that manage and assemble technical documents and those that run legislation websites. They also support back-end processes for drafting and publishing legislation and related documents, as well as dictionaries, email and document archives, and other large-scale collections of complex documents or metadata.

Technology

The TeraText suite of products includes technologies that solve complex text-oriented problems. These include:

  • TeraText Database System (DBS) - a high-performance repository for text-rich assets.
  • TeraText Document Management System (DMS) - augments the DBS with business process management and document and component version management capabilities.
  • TeraText for Legislation - adds a set of tools to the DMS to help manage the process of drafting and publishing legislation.

Outstanding Performance, Scalability And Reliability
  • Instant access to information - Information inserted into the database becomes instantly available for search and retrieval. There is no downtime while the database is being updated.

  • Unsurpassed indexing and retrieval speed for structured text documents - TeraText DBS scales to support over a thousand interactive updates per second while still allowing thousands of end users to access the collection. Structured documents encoded as XML (eXtensible Markup Language) are stored natively, eliminating the time-consuming process of document decomposition and reconstruction.

  • Scales to index & query text collections from gigabytes to multiple terabytes - TeraText DBS was designed to support distributed search and retrieval over both small and very large text collections, handling static and real-time collections alike. It presents a single logical view over the physical databases that make up a collection. Large collections are generally distributed across many smaller physical databases, which can either be appended together to form one database or aliased together as one collection. This allows you to create, manage and search collections of many terabytes or more.

    Note: Our largest deployed system currently holds several billion XML documents (8+ terabytes). In this implementation, TeraText DBS inserts and indexes up to 1,000 documents per second, and the information is immediately searchable by the end user. A complex full-text search across the entire collection can be completed in seconds. We have a team of experienced developers who will work with you to deliver total solutions, and we offer a full training package to get your own developers up to speed.

  • Survives server failures - TeraText DBS is designed to automatically recover from unexpected problems. Power failure? OS crash? No problem! The TeraText DBS will restart without losing a single record, ready to resume normal operations.
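The single logical view and immediate searchability described above can be illustrated with a small sketch. It assumes a simple in-memory inverted index per physical database; the `Shard` and `LogicalCollection` classes are hypothetical and are not the TeraText DBS API. Each insert updates the index straight away, so a new document is searchable as soon as `insert` returns, and a search against the logical collection is sent to every shard with the hits gathered into one result list.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lower-cased runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class Shard:
    """Hypothetical physical database: an in-memory inverted index (term -> doc ids)."""

    def __init__(self, name: str):
        self.name = name
        self.docs: dict[str, str] = {}
        self.postings: dict[str, set[str]] = defaultdict(set)

    def insert(self, doc_id: str, text: str) -> None:
        # Index at insert time, so the document is searchable as soon as this returns.
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, term: str) -> set[str]:
        return set(self.postings.get(term.lower(), ()))


class LogicalCollection:
    """Hypothetical single logical view that aliases several physical shards."""

    def __init__(self, shards: list[Shard]):
        self.shards = shards

    def search(self, term: str) -> list[tuple[str, str]]:
        # Send the query to every shard, then gather (shard name, doc id) hits.
        hits = []
        for shard in self.shards:
            hits.extend((shard.name, doc_id) for doc_id in sorted(shard.search(term)))
        return hits


db_2023, db_2024 = Shard("db2023"), Shard("db2024")
collection = LogicalCollection([db_2023, db_2024])
db_2023.insert("r1", "Budget submission for 2023")
db_2024.insert("r7", "Revised budget submission")
print(collection.search("budget"))  # [('db2023', 'r1'), ('db2024', 'r7')]
db_2024.insert("r8", "Budget estimates")  # searchable immediately, no rebuild step
print(collection.search("budget"))  # [('db2023', 'r1'), ('db2024', 'r7'), ('db2024', 'r8')]
```

This sketch queries shards one after another and returns unranked hits; a real distributed system would query shards in parallel and merge ranked results.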

Minimises Storage Requirements

TeraText DBS uses sophisticated compression techniques. Compressing the text minimises the size of the data files, and specialised index compression techniques enable ultra-fast text searching. In many instances, the combined storage required for the indices and documents is no larger than that of the original collection.
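TeraText's own compression schemes are not described here. As a hedged illustration of the general idea behind index compression, the sketch below uses a common textbook technique: delta (gap) encoding of a sorted posting list, followed by variable-byte encoding of each gap. The function names are hypothetical.

```python
def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta-encode ascending doc ids, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("doc ids must be non-negative and in ascending order")
        prev = doc_id
        # Emit 7 bits per byte, low-order bits first; a set high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Reverse encode_postings: rebuild each gap, then add it to the running total."""
    doc_ids: list[int] = []
    prev = current = shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += current
            doc_ids.append(prev)
            current = shift = 0
    return doc_ids


ids = [3, 7, 150, 152, 100000]
packed = encode_postings(ids)
print(len(packed), "bytes vs", 4 * len(ids), "bytes as 32-bit integers")  # 8 bytes vs 20 bytes ...
print(decode_postings(packed) == ids)  # True
```

Because the gaps between neighbouring document ids are usually small, most of them fit in a single byte; here five ids take 8 bytes instead of 20.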

Flexible Integration With A Modular, Standards-Based System

TeraText DBS components are modular and can be installed as a suite or as individual modules to work with existing database management and document-authoring systems.

  • Supports XML, SGML, Unicode, Z39.50, HTTP and other industry standards - TeraText DBS is based on open standards. Leading text and document standards are supported to ensure that TeraText-based solutions have a long life and can co-exist with current and future infrastructure.

  • Unique applications server provides immediate access to any TeraText Database - TeraText DBS supports plug-and-play modules for complex value-added web services.

  • Built on the Z39.50 standard, the ANSI/NISO protocol for information retrieval maintained by the Library of Congress - This is the only worldwide industry-standard protocol for information retrieval in a distributed environment. This protocol allows TeraText DBS to scale to support multi-terabyte collections.

  • Provides a rich development environment that includes Java, C++, and .NET® APIs - Custom applications are a breeze thanks to an extensive suite of libraries that provide ingest, indexing, searching, retrieval and many other capabilities.

Comprehensive Security Features

TeraText DBS provides role-based access to data at the field, record and database levels. This enables an administrator to restrict access to sensitive data down to the level of specific XML nodes. TeraText DBS has a very strict security model, designed to prevent unauthorised users from even being aware that sensitive data exists. Other security features include support for Lightweight Directory Access Protocol (LDAP), Kerberos and the Generic Security Service (GSS) to identify, authenticate and authorise users, and Secure Sockets Layer/Transport Layer Security (SSL/TLS) to protect and encrypt sensitive information.
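A minimal sketch of node-level access control, assuming a hypothetical `requires-role` attribute that names the role allowed to see an element. Unauthorised nodes are removed entirely rather than masked, so a user without the role sees no trace of them. This illustrates the idea only; it is not how TeraText DBS stores or enforces its permissions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ROLE_ATTR = "requires-role"  # hypothetical attribute naming the role allowed to see a node


def _allowed(elem: ET.Element, roles: set[str]) -> bool:
    required = elem.get(ROLE_ATTR)
    return required is None or required in roles


def _prune(parent: ET.Element, roles: set[str]) -> None:
    # Iterate over a copy because children are removed while walking.
    for child in list(parent):
        if _allowed(child, roles):
            _prune(child, roles)
        else:
            parent.remove(child)  # note: this also drops any tail text after the child


def redact(xml_text: str, roles: set[str]) -> Optional[str]:
    """Return the document as this user may see it, or None if the whole record is hidden."""
    root = ET.fromstring(xml_text)
    if not _allowed(root, roles):
        return None
    _prune(root, roles)
    return ET.tostring(root, encoding="unicode")


doc = (
    "<case><title>Appeal 42</title>"
    '<notes requires-role="counsel">Privileged advice</notes>'
    '<party requires-role="registry"><name>J. Citizen</name></party></case>'
)
print(redact(doc, {"counsel"}))
# <case><title>Appeal 42</title><notes requires-role="counsel">Privileged advice</notes></case>
print(redact(doc, set()))  # <case><title>Appeal 42</title></case>
```

A complete design would apply the same filter to search results and index lookups, so that hidden nodes cannot be inferred from hit counts either.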

XML Capable

TeraText DBS was designed as an XML-capable product to store, retrieve and manipulate semi-structured text. Because it stores native XML (and its predecessor, SGML), you get back exactly what you put in. No time-consuming document decomposition or reconstruction is required, and documents remain intact for faster updates and quicker access. The system also indexes all or part of each document using XML standards, enabling complex and comprehensive searching. In addition to native XML, TeraText DBS can store other fielded data alongside it, such as filenames, timestamps and arbitrary binary data (for example, the native Word or PDF document from which the XML content was derived). This allows applications to take advantage of powerful XML capabilities without altering authoritative XML data created in other environments or tools.
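The sketch below illustrates the storage pattern described above under stated assumptions: the XML string is kept verbatim, fielded metadata and the original binary file sit alongside it, and element paths are indexed without modifying the stored XML. The `Record` class and `index_paths` function are hypothetical names, not part of any TeraText API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record: verbatim XML plus fielded data and the original file."""
    xml: str                                              # stored exactly as supplied
    fields: dict[str, str] = field(default_factory=dict)  # e.g. filename, timestamp
    original: bytes = b""                                 # e.g. source Word or PDF file


def index_paths(record: Record) -> dict[str, list[str]]:
    """Map each element path to the text it contains, leaving record.xml untouched."""
    index: dict[str, list[str]] = {}

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        if elem.text and elem.text.strip():
            index.setdefault(path, []).append(elem.text.strip())
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(record.xml), "")
    return index


source = (
    "<act><title>Privacy Act</title>"
    "<section><heading>Definitions</heading></section>"
    "<section><heading>Scope</heading></section></act>"
)
rec = Record(source, {"filename": "privacy-act.docx"}, original=b"...")
print(index_paths(rec))
# {'act/title': ['Privacy Act'], 'act/section/heading': ['Definitions', 'Scope']}
print(rec.xml == source)  # True: building the index did not alter the stored XML
```

Keeping the index separate from the stored string is what lets the authoritative XML round-trip unchanged.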

Supports Complex Searches

TeraText DBS has integrated support for an extensive array of search capabilities including:

  • Full text and fielded
  • Proximity operators (near, order)
  • Text structure operators (with: in the same paragraph; same: in the same sentence)
  • Range operators (string, numeric)
  • Fuzzy match, stemming, weighted
  • Limit operations
  • Custom case folding, punctuation stripping, transformations, expansions, etc.
  • Boolean operators (and, or, not, xor)
  • Wildcards for characters and words (#, #n, ?, ?n)
  • Relevance ranked search
  • Index scan operations to search the index
  • Hit highlighting
  • Saved searches
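Several of the operators listed above can be sketched over a positional inverted index, which records the word positions of each term in each document. In this hypothetical illustration, Boolean operators map onto Python set operations, `near` matches two terms within k words of each other (`ordered=True` additionally requires the first term to come first, an assumed reading of the order operator), and `highlight` brackets matching words. TeraText's own query syntax and semantics may differ.

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Hypothetical positional inverted index: term -> doc id -> word positions."""

    def __init__(self) -> None:
        self.docs: dict[int, str] = {}
        self.positions: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            self.positions[word][doc_id].append(pos)

    def term(self, word: str) -> set[int]:
        # Documents containing the word; combine with &, |, - and ^ for and/or/not/xor.
        return set(self.positions.get(word.lower(), {}))

    def near(self, w1: str, w2: str, k: int, ordered: bool = False) -> set[int]:
        """Docs where w1 and w2 occur within k words of each other (w1 first if ordered)."""
        p1 = self.positions.get(w1.lower(), {})
        p2 = self.positions.get(w2.lower(), {})
        hits = set()
        for doc_id in p1.keys() & p2.keys():
            for i in p1[doc_id]:
                for j in p2[doc_id]:
                    d = j - i
                    close = (0 < d <= k) if ordered else (0 < abs(d) <= k)
                    if close:
                        hits.add(doc_id)
        return hits

    def highlight(self, doc_id: int, words: set[str]) -> str:
        # Wrap each matching word in brackets as a stand-in for hit highlighting.
        targets = {w.lower() for w in words}
        return re.sub(
            r"\w+",
            lambda m: f"[{m.group(0)}]" if m.group(0).lower() in targets else m.group(0),
            self.docs[doc_id],
        )


idx = PositionalIndex()
idx.add(1, "The tax bill passed the Senate")
idx.add(2, "The Senate rejected a bill on tax")
idx.add(3, "Tax rates for the new financial year")

print(idx.term("tax") & idx.term("bill"))        # and -> {1, 2}
print(idx.term("tax") - idx.term("senate"))      # not -> {3}
print(idx.term("tax") ^ idx.term("bill"))        # xor -> {3}
print(idx.near("tax", "bill", 1))                # {1}
print(idx.near("bill", "tax", 2, ordered=True))  # {2}
print(idx.highlight(2, {"tax", "bill"}))         # The Senate rejected a [bill] on [tax]
```

Storing positions rather than just document ids is what makes proximity and highlighting possible; the nested loops in `near` are kept simple for readability rather than speed.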