
TeraText
Managing and Searching Collections of Text
Most information in organisations resides in semi-structured, primarily textual documents, not in structured, organisational repositories. The volume of reports, submissions, emails, contracts, policy documents and similar documents in most large organisations is beyond the capacity of most systems.
The TeraText suite of products provides a solution for large-volume, high-complexity collections of documents. TeraText products support diverse systems, including those that manage and assemble technical documents and legislation websites. These products also support back-end processes for drafting and publishing legislation and related documents, dictionaries, email and document archives, and other large-scale collections of complex documents or metadata.
Technology
The TeraText suite of products includes technologies that solve complex text-oriented problems. These include:
- TeraText Database System (DBS) - a high-performance repository for text-rich assets.
- TeraText Document Management System (DMS) - augments the DBS with business process management and document and component version management capabilities.
- TeraText for Legislation - adds a set of tools to the DMS to help manage the process of drafting and publishing legislation.
Read more on each of our products below.
TeraText Database System (DBS)
Outstanding Performance, Scalability And Reliability
Instant access to information - Information inserted into the database becomes instantly available for search and retrieval. There is no down-time while the database is being updated.
Unsurpassed indexing / retrieval speed for structured text documents - TeraText DBS scales to support over a thousand interactive updates per second while continuing to allow thousands of end users to access the collection. Structured documents encoded as XML (eXtensible Markup Language) are stored natively to eliminate the time-consuming process of document decomposition and reconstruction.
Scales to index and query text collections from gigabytes to multiple terabytes - TeraText DBS was designed to support distributed search and retrieval across small to very large text collections, handling both static and real-time collections. TeraText DBS uses a single logical view to provide access to the physical collection of databases. For large collections, the database is generally distributed across many smaller physical databases. These databases can either be appended together to form one database, or the collection can be aliased together. This allows you to create, manage and search multiple terabytes of information or more.
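The idea of a single logical view over several physical databases can be illustrated with a minimal Python sketch. The class names `PhysicalDatabase` and `LogicalCollection` are hypothetical and are not part of any TeraText API; the sketch only shows a query being fanned out to each physical database and the hits merged into one result.

```python
from typing import Dict, List


class PhysicalDatabase:
    """Hypothetical stand-in for one physical database holding part of a collection."""

    def __init__(self, name: str, docs: Dict[str, str]):
        self.name = name
        self.docs = docs  # document id -> text

    def search(self, term: str) -> List[str]:
        # Naive case-insensitive substring match; a real system would use an index.
        term = term.lower()
        return [doc_id for doc_id, text in self.docs.items() if term in text.lower()]


class LogicalCollection:
    """Presents several physical databases as one searchable collection."""

    def __init__(self, databases: List[PhysicalDatabase]):
        self.databases = databases

    def search(self, term: str) -> List[str]:
        # Fan the query out to every physical database and merge the hits.
        hits: List[str] = []
        for db in self.databases:
            hits.extend(db.search(term))
        return sorted(hits)


shard_a = PhysicalDatabase("shard-a", {"doc1": "Annual budget report", "doc2": "Contract terms"})
shard_b = PhysicalDatabase("shard-b", {"doc3": "Budget submission 2020"})
collection = LogicalCollection([shard_a, shard_b])
results = collection.search("budget")
```

The caller sees only `collection.search`; how many physical databases sit behind it is a deployment detail.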
Note: Our largest deployed system currently holds several billion XML documents (8+ terabytes). In this implementation, the TeraText DBS inserts and indexes up to 1,000 documents per second. The information is immediately searchable by the end user. A complex full text search across the entire collection can be accomplished in seconds. We have a team of experienced developers who will work with you to deliver total solutions, and we offer a full training package to bring your own developers up to speed.
Survives server failures - TeraText DBS is designed to automatically recover from unexpected problems. Power failure? OS crash? No problem! The TeraText DBS will restart without losing a single record, ready to resume normal operations.
Minimises Storage Requirements
TeraText DBS uses sophisticated compression techniques. Compressing the text minimises the size of the data files, and specialised index compression techniques enable ultra-fast text searching. In many instances, the storage requirements for the indices and documents are no larger than those of the original collection.
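As a general illustration of index compression, the sketch below uses gap encoding with variable-length bytes, a widely used textbook technique for compressing lists of document ids. It is not a description of the compression TeraText DBS actually uses, which the text above does not specify, and the function names are chosen for this example only.

```python
from typing import List


def encode_postings(doc_ids: List[int]) -> bytes:
    """Store sorted, non-negative document ids as gaps, each gap as a variable-length byte run."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        previous = doc_id
        # Emit 7 bits per byte, low-order group first; a set high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Reverse encode_postings: rebuild each gap, then add it to the running document id."""
    doc_ids: List[int] = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            doc_ids.append(current)
            gap = 0
            shift = 0
    return doc_ids


postings = [3, 7, 8, 1000, 1005, 200000]
encoded = encode_postings(postings)
```

Because neighbouring ids in a sorted list tend to be close together, most gaps fit in a single byte, so the encoded list is much smaller than one fixed-width integer per id.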
Flexible Integration With A Modular, Standards-Based System
TeraText DBS components are modular and can be installed as a suite or as individual modules to work with existing database management and document-authoring systems.
Supports XML, SGML, Unicode, Z39.50, HTTP and other industry standards - TeraText DBS is based on open standards. Leading text and document standards are supported to ensure that TeraText-based solutions have a long life and can co-exist with current and future infrastructure.
Unique applications server provides immediate access to any TeraText Database - TeraText DBS supports plug-and-play modules for complex value-added web services.
Built on the Z39.50 Standard — the Library of Congress Standard Protocol for Information Retrieval - This is the only worldwide industry standard protocol for information retrieval in a distributed environment. This protocol allows TeraText DBS to scale to support multi-terabyte collections.
Provides a rich development environment that includes Java, C++, and .NET® APIs - Custom applications are a breeze thanks to an extensive suite of libraries that provide ingest, indexing, searching, retrieval and many other capabilities.
Comprehensive Security Features
TeraText DBS provides role-based access to data at the field, record and database levels. This enables an administrator to restrict access to sensitive data down to the level of specific XML nodes. TeraText DBS has a very strict security model, designed to prevent unauthorised users from even being aware of the existence of sensitive data. Other security features include support for Lightweight Directory Access Protocol (LDAP), Kerberos and the Generic Security Service (GSS), and Secure Sockets Layer/Transport Layer Security (SSL/TLS) to identify, authenticate, and authorise users and protect and encrypt sensitive information.
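Node-level restriction can be pictured with a short Python sketch using the standard `xml.etree.ElementTree` module. The `access` attribute and the role names are assumptions made for this example, not TeraText configuration; the point is only that restricted nodes are removed entirely, so an unauthorised reader cannot tell they exist.

```python
import xml.etree.ElementTree as ET
from typing import Set


def redact(element: ET.Element, roles: Set[str]) -> None:
    """Remove, in place, every descendant whose 'access' attribute names a role the user lacks."""
    for child in list(element):
        required = child.get("access")  # hypothetical attribute naming the required role
        if required is not None and required not in roles:
            element.remove(child)
        else:
            redact(child, roles)


record = ET.fromstring(
    "<report>"
    "<title>Quarterly review</title>"
    "<body><para>Public summary.</para>"
    "<para access='analyst'>Sensitive detail.</para></body>"
    "</report>"
)
redact(record, roles={"reader"})
visible_text = "".join(record.itertext())
```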
XML Capable
TeraText DBS, as an XML-capable product, was designed to store, retrieve and manipulate semi-structured text. By storing native XML (and its predecessor SGML), you get back what you put in. There is no time-consuming document decomposition or reconstruction required. Documents remain intact for faster updates and quicker access. The system also indexes all or part of the document using XML standards, enabling complex and comprehensive searching. In addition to storing XML natively, TeraText DBS can store other fielded data alongside that XML, such as filenames, time stamps, and arbitrary binary data (for example, a native Word or PDF document from which the XML content was derived). This allows applications to take advantage of powerful XML capabilities without altering authoritative XML data that is created in other environments or tools.
Supports Complex Searches
TeraText DBS has integrated support for an extensive array of search capabilities including:
- Full text and fielded
- Proximity operators (near, order)
- Text structure operators (with [in same paragraph], same [in same sentence])
- Range operators (string, numeric)
- Fuzzy match, stemming, weighted
- Limit operations
- Custom case folding, punctuation stripping, transformations, expansions, etc.
- Boolean operators (and, or, not, xor)
- Wildcards for characters and words (#, #n, ?, ?n)
- Relevance ranked search
- Index scan operations to search the index
- Hit highlighting
- Saved searches
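To make the proximity operators in the list above concrete, here is a minimal Python sketch of a "near" test with an optional ordering constraint. The function `near` and its parameters are illustrative only and do not reproduce TeraText query syntax; a production engine would answer this from positional index entries instead of rescanning the text.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def near(text: str, first: str, second: str, distance: int, ordered: bool = False) -> bool:
    """True if the two words occur within `distance` word positions of each other.

    With ordered=True, `first` must also appear before `second`.
    """
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_positions:
        for j in second_positions:
            gap = j - i
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= distance:
                return True
    return False


sentence = "The committee tabled its report on the amending Bill."
unordered_hit = near(sentence, "report", "committee", 3)
ordered_hit = near(sentence, "report", "committee", 3, ordered=True)
```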
TeraText Document Management System (DMS)
Built On A Solid Structured Text Database Foundation
TeraText Document Management System (DMS) is built on the TeraText Database System (DBS) and inherits many of the capabilities of the DBS, including:
- Outstanding performance and scalability
- Minimal storage requirements
- Flexible integration with an emphasis on standards
- Comprehensive security
- Native storage of XML
- Support for complex text search
- Role-based security model with very fine control
Authoritative Repository To Manage Authoring And Publication Process
TeraText DMS supports an authoritative document repository to manage documents from draft through QA to final published baseline. TeraText DMS aids the document creation process with standards-based versioning and workflow. It helps improve quality by keeping audit trails and tracking dependencies.
Advanced Versioning Capabilities
TeraText DMS offers extremely powerful versioning capabilities for complex, structured and semi-structured documents in SGML or XML format. Whole documents can be versioned as a single series, a set of branches with alternatives, or threaded with draft and published threads. A single version can collect multiple renditions — XML, Word, PDF and HTML. This allows powerful XML search capabilities to be used to retrieve matching PDF or Word documents. Documents can be fragmented using XML markup and the individual components versioned independently of the documents to which they belong. This allows component reuse and the ability to track in which documents and versions of documents a component is used. It is even possible to search and retrieve a document as it once existed at a particular point in time.
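Independent component versioning and "where used" tracking can be sketched in a few lines of Python. The `Repository` class and its methods are hypothetical and much simpler than a real document management system; the sketch assumes each document version simply records which version of each component it contains.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Repository:
    # component id -> successive contents (version 1 is stored at index 0)
    components: Dict[str, List[str]] = field(default_factory=lambda: defaultdict(list))
    # (document id, document version) -> ordered (component id, component version) pairs
    documents: Dict[Tuple[str, int], List[Tuple[str, int]]] = field(default_factory=dict)

    def save_component(self, component_id: str, content: str) -> int:
        """Add a new version of a component and return its version number."""
        self.components[component_id].append(content)
        return len(self.components[component_id])

    def save_document(self, doc_id: str, parts: List[Tuple[str, int]]) -> int:
        """Add a new version of a document built from specific component versions."""
        version = 1 + sum(1 for (d, _) in self.documents if d == doc_id)
        self.documents[(doc_id, version)] = list(parts)
        return version

    def where_used(self, component_id: str, component_version: int) -> Set[Tuple[str, int]]:
        """Return every (document, version) that contains this exact component version."""
        wanted = (component_id, component_version)
        return {key for key, parts in self.documents.items() if wanted in parts}

    def render(self, doc_id: str, version: int) -> str:
        """Assemble a document version from the component versions it recorded."""
        return "\n".join(self.components[c][v - 1] for c, v in self.documents[(doc_id, version)])


repo = Repository()
repo.save_component("intro-a", "Manual for pump A.")
repo.save_component("intro-b", "Manual for pump B.")
repo.save_component("warning", "Disconnect power before servicing.")
repo.save_document("manual-a", [("intro-a", 1), ("warning", 1)])
repo.save_document("manual-b", [("intro-b", 1), ("warning", 1)])
repo.save_component("warning", "Isolate and lock out power before servicing.")
repo.save_document("manual-a", [("intro-a", 1), ("warning", 2)])
```

Because document versions refer to component versions instead of copying them, an older document version can still be rendered exactly as it was after a shared component changes.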
Standards-Based Service Oriented Architecture
TeraText DMS delivers a set of web services using the W3C's SOAP and WSDL framework, so TeraText DMS can be used from most modern programming environments, including .NET, Java/J2EE, Python, and PHP. TeraText DMS capabilities can be delivered in a default web interface (which provides a worklist interface) or via custom interfaces using browsers or directly in authoring environments. Customers are currently accessing TeraText DMS from authoring environments, including Microsoft Word, Adobe FrameMaker, PTC/Arbortext Editor, and WordPerfect.
Document versioning capabilities are based on the complete Document Management Alliance object model, architecture and application programming interfaces (APIs), adapted for web services. Workflow and business process management capabilities are based on the Workflow Management Coalition set of standards and APIs, adapted for web services.
Proven in Complex Document Environments
The TeraText DMS has been applied to managing the authoring and publication of technical documentation and complex legal documents such as legislation.
TeraText for Legislation
Overview
TeraText for Legislation adds a set of tools to the TeraText Document Management System (DMS) to help manage the process of drafting and publishing legislation. Governments (including those of the United Kingdom, Scotland, Northern Ireland, the Falkland Islands, Singapore, Canada, and Australia) use it to manage and even automate many of the drafting and publishing steps for these highly important documents.
This tool set is the basis of the highly successful EnAct® system deployed in Tasmania and the Legislation Information System (LEGIS) deployed in New South Wales and the Australian Federal Parliament. Additional instances are deployed as the Lawmaker system for the United Kingdom (including the devolved parliaments in Scotland and Northern Ireland), the ADAPT system for New South Wales, the QuILLS system for Queensland, and the Legislation Editing and Authentic Publishing (LEAP) system for the Singapore government.
Amendments To Legislation
An important characteristic of legislation is that it changes over time. Amendments, sections or even larger units can be added, removed or altered. Although new laws are created, more often existing legislation is altered or amended. This must be done by creating amending laws with specific editing instructions, perhaps changing the wording of one or two sections, or replacing complete sections, or even removing or inserting whole parts or chapters.
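The kinds of editing instruction described above can be modelled with a small Python sketch. The `Instruction` type, the four action names and the sample wording are all invented for illustration; real amending laws are far richer, and this is not how TeraText for Legislation represents them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instruction:
    action: str   # "substitute", "replace", "repeal" or "insert" (hypothetical vocabulary)
    section: str  # section number the instruction targets
    old: str = ""
    new: str = ""


def apply_amendments(act: Dict[str, str], instructions: List[Instruction]) -> Dict[str, str]:
    """Return a consolidated copy of `act` with each instruction applied in order."""
    result = dict(act)
    for ins in instructions:
        exists = ins.section in result
        if ins.action == "insert":
            if exists:
                raise ValueError(f"section {ins.section} already exists")
            result[ins.section] = ins.new
        elif not exists:
            raise ValueError(f"section {ins.section} not found")
        elif ins.action == "substitute":
            if ins.old not in result[ins.section]:
                raise ValueError(f"words {ins.old!r} not found in section {ins.section}")
            result[ins.section] = result[ins.section].replace(ins.old, ins.new)
        elif ins.action == "replace":
            result[ins.section] = ins.new
        elif ins.action == "repeal":
            del result[ins.section]
        else:
            raise ValueError(f"unknown action {ins.action!r}")
    return result


# Invented sample text, used only to exercise the function.
principal = {
    "1": "This Act may be cited as the Example Act.",
    "2": "A dog must be registered within 30 days.",
    "3": "The fee is prescribed by regulation.",
}
amending = [
    Instruction("substitute", "2", old="30 days", new="14 days"),
    Instruction("repeal", "3"),
    Instruction("insert", "2A", new="A registered dog must wear a tag."),
]
consolidated = apply_amendments(principal, amending)
```

The principal Act is left untouched, which mirrors the point that the original law and the amending law remain the authoritative texts while the consolidation is derived from them.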
Legislation's Temporal Nature
Although only the original law and the amending laws have legal force, lawyers and legal researchers need access to the law as it was during the time period relevant to their particular problem. From time to time, governments authorise an agency to issue publications that consolidate particular laws. These consolidations include codes, reprints, and revisions. A consolidation represents current law, presenting the law as modified by all relevant amending laws — with all additions, deletions, and changes to wording applied, and with all new components inserted. However, lawyers are often interested in the state of the law at times other than those for which officially released consolidations are available. Ideally, they would like to access consolidations of the law at any and every arbitrary point in time.
Key Challenges
A number of key challenges exist in delivering an effective legislative management system:
- Representing and maintaining the structure inherent in legislative documents
- Providing the ability to search legislation databases at an arbitrary point in time
- Enforcing rigid legislative and regulatory procedures, where strict compliance is required for a document to be accepted as valid law (e.g. a Bill must be read three times in each chamber and must pass by a majority in each chamber)
- Accommodating the numerous optional events and activities relating to Bills, which can take place at multiple stages during the law-making process (e.g. committee hearings, accepting or rejecting proposed amendments)
- Meeting the complex security requirements for managing the content and status of an unusual mixture of important, highly public documents and highly sensitive confidential documents
Representing Structure With XML
The use of XML solves the problem of how to represent the structured text inherent in legislation. XML defines an abstract grammar for the representation and exchange of text with tags interspersed throughout the text. A DTD (document type definition) or Schema is a particular XML grammar describing which document components are valid and what sub-components the document can contain. Bills or resolutions from a given jurisdiction can be stored in XML in a format satisfying a particular structural requirement (e.g. every Bill must contain sections, each section must contain text or two or more subsections, and so on).
One would then describe how to display a particular Bill that satisfies the structural requirement by describing the presentation in terms of the DTD or Schema. A number of different presentation schemes can be described for a single DTD, so that one might specify a presentation that displays only the table of contents to a specified depth as well as a presentation for the whole Bill or resolution. This is one of the advantages most often cited for using XML: the ability to reuse the same information for multiple purposes.
TeraText for Legislation supports the international standard for representing legislative documents in XML (the OASIS LegalDocML standard, otherwise known as Akoma Ntoso) as well as a number of pre-existing custom schemas (e.g. QuILLS XML, ADAPT XML, LEAP XML).
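The two ideas in this section, a structural rule and several presentations of the same source, can be sketched in Python with the standard `xml.etree.ElementTree` module. The element names below are invented for the example and are not Akoma Ntoso or any TeraText schema, and the hand-written check stands in for what a DTD or Schema validator would normally do.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented markup for illustration only.
BILL_XML = """
<bill>
  <title>Example Bill</title>
  <section num="1"><heading>Short title</heading><text>This Act is the Example Act.</text></section>
  <section num="2"><heading>Definitions</heading>
    <subsection num="1"><text>In this Act, a term has its ordinary meaning.</text></subsection>
    <subsection num="2"><text>A reference to a person includes a body corporate.</text></subsection>
  </section>
</bill>
"""


def structure_errors(bill: ET.Element) -> List[str]:
    """Check the rule: a Bill has sections; each section has text or at least two subsections."""
    errors: List[str] = []
    sections = bill.findall("section")
    if not sections:
        errors.append("bill has no sections")
    for section in sections:
        has_text = section.find("text") is not None
        if not has_text and len(section.findall("subsection")) < 2:
            errors.append(f"section {section.get('num')} needs text or at least two subsections")
    return errors


def table_of_contents(bill: ET.Element, depth: int = 1) -> List[str]:
    """Presentation 1: headings only, optionally descending to subsection level."""
    lines: List[str] = []
    for section in bill.findall("section"):
        lines.append(f"{section.get('num')}. {section.findtext('heading')}")
        if depth >= 2:
            for sub in section.findall("subsection"):
                lines.append(f"  ({sub.get('num')})")
    return lines


def full_text(bill: ET.Element) -> str:
    """Presentation 2: the whole Bill as indented plain text."""
    lines = [bill.findtext("title")]
    for section in bill.findall("section"):
        lines.append(f"{section.get('num')}. {section.findtext('heading')}")
        for text in section.iter("text"):
            lines.append("    " + text.text)
    return "\n".join(lines)


bill = ET.fromstring(BILL_XML)
```

Both presentations read the same parsed tree, which is the reuse benefit the text describes: the source is stored once and each output is derived from it.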
Supporting Point-In-Time Search
TeraText for Legislation augments the TeraText DMS with tools to help automate the drafting of amendments in such a way that the creation of amending laws and the production of various types of consolidation of those amending laws can be largely automated. When the workflow process completes the enactment or making of a law, the consolidation process applies the amendments to produce a new consolidated version of the text.
Whether these tools are used (as in EnAct) or a manual consolidation process is applied (as in LEGIS), TeraText for Legislation uses the TeraText DMS and the TeraText Database System (DBS) to present these consolidations, allowing users of the law to search and browse the entire collection of laws as they were at a particular time. Accessing databases of laws involves more than viewing text. Collections of laws contain a large number of interrelated documents, ideally linked by hyperlinks. Viewing a consolidation of laws at a particular point in time involves retrieving the correct text as well as the correct hyperlinks at one time.
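One simple way to picture point-in-time access is a dated history per provision, looked up with a binary search. The sketch below is an assumption-laden illustration, not the TeraText data model: the provision identifiers, dates and wording are invented, and a `None` text stands for a repealed provision. Resolving a hyperlink is then just a lookup of the target provision at the same date.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Optional, Tuple

# provision id -> (commencement date, text) pairs sorted by date; text None means repealed
History = Dict[str, List[Tuple[date, Optional[str]]]]


def as_at(history: History, provision: str, when: date) -> Optional[str]:
    """Return the text of a provision in force on `when`, or None if not in force."""
    versions = history.get(provision, [])
    dates = [commenced for commenced, _ in versions]
    index = bisect_right(dates, when)  # number of versions commenced on or before `when`
    if index == 0:
        return None
    return versions[index - 1][1]


# Invented sample data.
history: History = {
    "act-a/s2": [(date(2001, 1, 1), "See section 5 of Act B.")],
    "act-b/s5": [
        (date(2000, 7, 1), "Original wording."),
        (date(2010, 3, 1), "Amended wording."),
        (date(2015, 1, 1), None),
    ],
}

# Following the cross-reference from act-a/s2 to act-b/s5 as the law stood in mid-2005.
linked_text_2005 = as_at(history, "act-b/s5", date(2005, 6, 30))
```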
Example Site: Tasmanian Legislation
This site gives free public access to the Tasmanian Legislative Acts in consolidated form. This site also features advanced searching and browsing capabilities with all cross-references and amendment history stored as electronic hyperlinks.
Managing Legislative Process
Managing the legislative process requires an extremely flexible workflow and metadata management platform to support a mixture of rigid and ad hoc events and tasks associated with a single Bill or a set of related Bills, a flexible document versioning environment, and an intimate understanding of the legislative process with strong security credentials to ensure appropriate security is applied.
TeraText for Legislation supports Bills before the legislature as Projects. The metadata, lifecycle stages, and business rules of a Project are configured in a Workflow Process Definition. In addition to the more rigid process rules, TeraText for Legislation also captures events, for example an amendment motion or a committee hearing. Events can accurately capture process details (such as the date of a committee hearing or the resulting report) that can repeat or can occasionally happen in different orders, without losing or overwriting data fields. For instance, proposed changes from one chamber may be referred back to the responsible Standing Committee in the other chamber after their initial report has been presented at second reading. Events can also be linked to specific versions of a document, allowing tracking of which documents are actually used for various purposes. Each document or document version can be associated with one or more other documents, document versions, events or processes.
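The mix of rigid stages and repeatable events can be sketched in Python as a fixed stage sequence plus an append-only event log. The `Project` and `Event` classes, the stage names and the sample dates are hypothetical and do not reflect an actual Workflow Process Definition.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical rigid lifecycle: stages must be reached in exactly this order.
STAGES = ["introduced", "first reading", "second reading", "third reading", "passed"]


@dataclass(frozen=True)
class Event:
    kind: str                               # e.g. "committee hearing", "amendment motion"
    on: date
    document_version: Optional[str] = None  # version of the Bill the event relates to


@dataclass
class Project:
    title: str
    stage: str = STAGES[0]
    events: List[Event] = field(default_factory=list)

    def advance(self, new_stage: str) -> None:
        """Rigid rule: only the next stage in STAGES is allowed."""
        position = STAGES.index(self.stage)
        if position + 1 >= len(STAGES) or STAGES[position + 1] != new_stage:
            raise ValueError(f"cannot move from {self.stage!r} to {new_stage!r}")
        self.stage = new_stage

    def record(self, event: Event) -> None:
        """Ad hoc rule: any event may be recorded at any stage; earlier events are never overwritten."""
        self.events.append(event)

    def events_of(self, kind: str) -> List[Event]:
        return [e for e in self.events if e.kind == kind]


bill = Project("Example Bill")
bill.advance("first reading")
bill.record(Event("committee hearing", date(2024, 3, 4), "v1"))
bill.advance("second reading")
bill.record(Event("amendment motion", date(2024, 4, 2), "v2"))
bill.record(Event("committee hearing", date(2024, 4, 9), "v2"))
```

Storing events as a list instead of as single fields on the Project is what lets a second committee hearing be recorded without replacing the first.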
The web-based interface to these processes and events can be customised for different users, roles or responsibilities. A simple web report interface allows authorised users to create standard and ad hoc reports over the whole repository or selected subsets.
Managing Access and Security
TeraText for Legislation is built on TeraText DBS which has a security model developed for the intelligence community supporting role-based access to data at the field, record, and database levels. This enables an administrator to restrict access to sensitive data down to the level of specific XML nodes. TeraText for Legislation can manage high-performance access to public documents, restrict access to confidential documents to authorised users, and move documents automatically between repositories dedicated to public or private access as required. Access to change the content or status of a document is restricted to authenticated and authorised users. What a user is authorised to do with a document depends on the roles of that user, the workflow status, metadata stored in or with the document or document version (such as the date of tabling, or the date for public release), and metadata associated with the workflow (such as the type of Bill, the sponsor, the assigned standing committee, etc). Other security features include support for Lightweight Directory Access Protocol (LDAP) and Active Directory, the Generic Security Service (GSS), and Secure Sockets Layer (SSL) to handle encrypted sensitive information.
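An authorisation decision that combines a user's roles with workflow status and document metadata might look like the following Python sketch. The role names, status values and the two rules are assumptions made for illustration; actual policies would be configured per jurisdiction.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass
class Document:
    status: str                     # hypothetical workflow status, e.g. "draft" or "tabled"
    public_release: Optional[date]  # metadata: date from which the public may read it


def can_read(roles: Set[str], doc: Document, today: date) -> bool:
    """Drafters can always read; everyone else only on or after the public release date."""
    if "drafter" in roles:
        return True
    return doc.public_release is not None and today >= doc.public_release


def can_edit(roles: Set[str], doc: Document) -> bool:
    """Only drafters may change content, and only while the document is still a draft."""
    return "drafter" in roles and doc.status == "draft"


draft_bill = Document(status="draft", public_release=date(2024, 7, 1))
tabled_bill = Document(status="tabled", public_release=None)
```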
An Ideal Partnership
TeraText for Legislation provides a number of tools that augment the underlying TeraText DMS and DBS capabilities with the ability to provide access to the correct state of the law at any point in time. By using XML to represent the structured text inherent in legislation, TeraText for Legislation solves the challenge of maintaining a consistent structural representation for the documents, while allowing the database designer to reuse the same information for multiple purposes. By using a combination of XML and sophisticated search and indexing capabilities, TeraText DBS effectively and efficiently supports point-in-time access to legislation. The TeraText DMS provides powerful fragmentation capabilities, allowing large pieces of legislation to be managed both as whole documents and as individual components, such as separate sections, that are versioned independently of the whole document. A combination of flexible and rigid workflow capabilities and a robust security environment provides a powerful platform for a variety of solutions in the legislative and regulatory environment.
Leidos has over 15 years' experience developing solutions for the legislative environment using this platform and can provide a cost-effective application customised to each jurisdiction's requirements.