A Multimedia Information System for Governmental Historical Documents

Nelson Spangler de Andrade
PRODEMGE - Companhia de Processamento de Dados do Estado de Minas Gerais


Arnaldo de Albuquerque Araújo
Universidade Federal de Minas Gerais


Cleber Hostalácio de Melo


Multimedia environments are transforming human-computer interactions and allowing the creation of a new family of products that could very well be the catalyst for launching the second information revolution. This new generation of products is helping not only to integrate multimedia into existing environments, but to reengineer work processes. At the very least, multimedia and imaging applications are enriching existing applications by integrating images, voice, and video. More important, they are helping us to rethink information processing in various applications and, with the introduction of multimedia computing, to revolutionize business, art, science, engineering, and manufacturing processes.

The two basic objectives of historical archives (the conservation and the dissemination of their holdings) drive the entire project: the rapid growth in the number of researchers who repeatedly consult the Archive's papers has subjected the documentation to accelerated deterioration in recent years. Likewise, the demand for more complete and more easily retrievable information is now growing in the field of archives just as in other fields of information. These are two urgent archival problems for which the development of a computer system can be a solution that both curbs the serious risk caused by the handling of documents and answers the researcher's need for broader, deeper and faster information.

This project is expected to bring important benefits for the conservation of the documents, since once they are digitized the need to handle them will decrease appreciably. At the same time, the possibilities for disseminating the archive's documentary holdings will grow: faster location of documents, their direct display on screen, the possibility of remote consultation of the textual database, and the ease of obtaining copies of documents on paper or on digital media will improve their use by archivists and researchers around the world.

1 - Introduction

This work describes the development of a multimedia information system for researching and disseminating, through the World Wide Web, collections of historical documents belonging to the Arquivo Público Mineiro (Minas Gerais Public Archive).

The main objective of the system is to advance the computerization of the Arquivo Público Mineiro, which stores about 1600 linear meters of administrative and historical documentation on the state of Minas Gerais, Brazil.

This paper addresses the use of new computer science technologies for storing and processing multimedia data such as images, video, audio and free text (GHAFOOR, 1995), since these forms of information representation generally make up the collections of a public archive.

2 - Complex Data and Multimedia Information Systems

Today we live surrounded by many forms of communication and information, such as image, sound, movement and free text, that are deeply tied to the senses, the spirit, history and human knowledge. These types of data are extremely rich, capable of expressing quantitative or qualitative information in a friendly way that the user perceives immediately. They are, most of the time, atavistic and universal, and understanding them is within reach of any human being, regardless of social, political, ethnic and cultural factors.

In entertainment, learning, the arts, communication, trade and the sciences, these forms of information play important and prominent roles.

From their beginning in the 1950s until today, most automated information systems have used a restricted data type: classic or conventional data, composed of short strings of alphanumeric symbols representing names, codes, measures, amounts or values (ELSMARI, 1994; KORTH, 1995).

Forms of information representation such as image, audio and video have a much more complex structure than short strings of letters and numbers. Hence the term "complex data", generally used in Computer Science to refer to these types of data.

Complex data are still poorly supported by the storage and management structures and the processing tools available on the commercial computing market, especially for information systems that must handle large amounts of such data. This remains a segment dominated by academic research, prototype construction and the first commercial products from industry.

The inclusion, although incipient, of these "new data" in computing has made the development of information systems less restricted and closer to the real world, easing human interaction with the machine and providing a better level of information, knowledge and understanding of reality. It is not for nothing that the popular saying goes: "a picture is worth more than a thousand words".

Multimedia is a new branch of Computer Science. Its beginning can be dated to the late 1980s, and its real technological and commercial evolution to 1993 onward (YOSHIDA, 1994). It is therefore premature to demand precision in its concepts and maturity in its methods and tools. No precise definition of multimedia exists; a great deal of uncertainty and confusion remains in this segment (RODRIGUEZ, 1995). The term, which until a few years ago had no meaning, has become too all-encompassing (SUTHERLAND, 1995). Also according to RODRIGUEZ (1995), multimedia has expanded into a field that defies rigorous definition. For NEWTON (1997), many people still see multimedia as a diverse group of technologies in search of a purpose.

As the name itself reveals, multimedia involves several media types, that is, several ways to represent and disseminate information. These media types include image, graphics, animation, video, free text and audio, each with its specific properties. However, multimedia data share a common characteristic in their computer representation: the need for considerable storage volume (DAVID, 1996). Such data make intensive use of primary and secondary storage: volatile memory, magnetic disks and optical disks. As these types of data increasingly travel over widely distributed networks such as the Internet, powerful compression algorithms and high-performance network systems are necessary (KHOSHAFIAN, 1996).

Regardless of whether an accurate definition exists, multimedia, as HOLSINGER (1996) affirms, is one of the most powerful ways at humanity's disposal to communicate ideas, present information and experiment with new concepts. KHOSHAFIAN (1996) has a similar view: "multimedia is the richest and expressive form to represent and interact with the information". He adds that multimedia is an irreversible trend in Computer Science and will, in the future, be responsible for a dramatic revolution in the interaction between man and machine. For MORAN (1995), multimedia technology is capable of modifying our relationship with the world, our perception of reality and the integration of time and space. Communication becomes more sensory, multidimensional and non-linear. There is a re-enchantment with this technology because it allows a much more intense interaction between the real and the virtual. RODRIGUEZ (1995) predicts that we are destined to become a society that uses (and, perhaps, depends on) a plethora of multimedia applications that will run on personal or professional computers and on television equipment. The views of these and other authors (DAVID, 1996; ROSEMBORG, 1993) clearly show the importance of multimedia computing.

Multimedia information systems use several types of multimedia data concurrently and are capable of organizing, synchronizing and presenting this complex and comprehensive body of information interactively (DAVID, 1996; KHOSHAFIAN, 1996; GROSKY, 1994, 1997). According to ADJEROH (1997), these systems are characterized by the integration of different types of multimedia data originating from several sources. MARCUS (1996) emphasizes that, although a considerable volume of work on multimedia has been produced in recent years, only a small part refers specifically to multimedia information systems.

These information systems are not limited to any application type or specific area of knowledge. Multimedia applications are useful to many types of users and professionals: students, educators, doctors, economists, engineers, executives, artists, researchers, scientists, etc. They are also important for entertainment. The evolution of multimedia is thus of interest to all segments.

Multimedia applications are found wherever complex data must be managed. Classic examples include education (local and distance training, digital libraries), health (medical image databases), entertainment (games, video on demand, interactive TV) and business (videoconferencing, electronic commerce).

The Internet and multimedia are heading toward an inseparable partnership. A variety of tools and techniques are being developed to support multimedia in network systems (EARNSHAW, 1997). A good example is INTERNET-2, an alternative network for high-speed multimedia applications, already under way in the United States. Languages such as Java and VRML (Virtual Reality Modeling Language) are well suited as platforms for developing sophisticated applications based on the World Wide Web (WWW).

The association of multimedia database management systems with the Internet is of special interest for information systems aimed at museums and other institutions in charge of safeguarding and disseminating works of art and historical documents (LINS, 1995; LANZELLOTE). Combined, these technologies have great potential to broaden and democratize access to humanity's cultural heritage, as BESSER (1995) asserts: "few technologies have offered as much potential to change research and teaching in the arts and humanities as digital imaging. The possibility of examining rare and unique objects outside the secure, climate-controlled environments of museums and archives liberates collections from around the world breaks down physical barriers to access, and the potential of reaching audiences across social and economic boundaries blurs the distinction between the privileged few and the general public".

3 - Public Archive: Concept and Challenges

A public archive is defined as the set of documents produced or received by government institutions in the course of their specific administrative, judicial or legislative functions (Arquivo ..., 1996). According to the same source, a document is a record of information, regardless of the physical medium that contains it.

The growing demand for complete and easily retrievable information from large archives has prompted the emergence of advanced methods and technologies for the digitization, storage, retrieval and presentation of images and other types of historical documents (Arquivo ..., 1995).

Public archives and other institutions that maintain collections face several problems, generally due to the great accumulation of documents and their fragility. Chief among them are the risk of degradation of the originals through direct and frequent handling, and the difficulty that researchers and the general public have in accessing the information (Arquivo ..., 1995; ARAÚJO, 1992). As GARCIA (1994) remarks on the digitization work of the Archivo General de Indias in Spain: "free access to the papers has led to what is called 'user inflation' in the reading rooms, an inflation that is visibly producing more damage to the documents than that produced up to now by the simple passage of time. In the Archivo General de Indias there are documents that can be handled more than fifty times, for different purposes, over the course of a year. What would happen to them if the appropriate measures were not taken?"

4 - The Arquivo Público Mineiro

The use of data processing systems is increasingly pronounced in the various activities of federal, state and municipal public administrations in Brazil, with the aim of delivering efficient and effective results for administrative purposes and of serving citizens' needs and their social and political rights.

The Arquivo Público Mineiro (APM) is a century-old institution founded on July 11, 1895 by state law number 126. It operated in the historical city of Ouro Preto until 1902, when it was transferred to Belo Horizonte, the recently built capital of the state of Minas Gerais (Arquivo ..., 1996). Today the Arquivo Público Mineiro is attached to the Secretaria da Cultura and is housed in a listed heritage building (figure 1).

Its objective is "to collect, safeguard and conserve the documents produced and accumulated by the bodies of the state public administration, guaranteeing citizens full access to them" (Arquivo ..., 1996).

Its holdings comprise textual and special documents. The latter include proceedings, clippings, posters, films, pictures, maps and plans. The collection includes documents from the 18th, 19th and 20th centuries (Arquivo ..., 1996).

APM now holds about ten million document pages, classified as follows (Arquivo ..., 1995):

Public documentation (95% of the total)

  • Bound (80%)
  • Still in sheets (15%)
Private documentation (3%)
Special documentation (2%)

The main services, provided exclusively on its premises, are (Arquivo ..., 1996):

  • Reading room for consulting documents, with an average of a thousand consultations of catalogs and documents per month;
  • Support library for users;
  • Reproduction of documents, subject to authorization;
  • Publications about its activities and collections.

5 - Object Model of the Arquivo Público Mineiro

The "acervo" (set of collections) of the Arquivo Público Mineiro is divided into collections, or fonds. Several types of collections exist: textual, photographic, etc.

To facilitate research, and in accordance with its cataloguing, each collection is divided into series and subseries, the division being used for chronological series. The collection, as well as its series and subseries, is composed of documents. A document is formed by one or more "document components", each of which can be a text, an image, a video, an audio recording or any type of information, regardless of its medium.

The object diagram shown in figure 2, based on the object-oriented methodology OMT (RUMBAUGH, 1994), illustrates this structure. The multimedia information system implemented from this model aims to provide great flexibility in the consultation and retrieval of information about the collections of a public archive.
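
The hierarchy described in this section can be sketched in a few lines of code. The following is only a minimal illustration in Python, not the Jasmine schema used in the prototype: the class names follow the terms of the text, while the attribute names (media_type, content_ref, kind) and the sample data are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentComponent:
    """One piece of a document: a text, image, video, audio recording, etc."""
    media_type: str   # hypothetical label such as "text" or "image"
    content_ref: str  # hypothetical reference to the stored content


@dataclass
class Document:
    title: str
    components: List[DocumentComponent] = field(default_factory=list)


@dataclass
class Subseries:
    name: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class Series:
    name: str
    subseries: List[Subseries] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    kind: str  # e.g. "textual" or "photographic"
    series: List[Series] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)

    def all_documents(self) -> List[Document]:
        """Gather documents held directly, in series, and in subseries."""
        found = list(self.documents)
        for s in self.series:
            found.extend(s.documents)
            for sub in s.subseries:
                found.extend(sub.documents)
        return found


@dataclass
class Acervo:
    """The set of collections held by the archive."""
    collections: List[Collection] = field(default_factory=list)


# Invented sample data, for illustration only
photo = Document("Portrait", [DocumentComponent("image", "img/0001")])
letter = Document("Letter", [DocumentComponent("text", "txt/0001"),
                             DocumentComponent("image", "img/0002")])
sub = Subseries("1918-1922", [photo])
series = Series("Photographs", [sub], [letter])
collection = Collection("Sample collection", "photographic", [series])
acervo = Acervo([collection])
titles = [d.title for d in collection.all_documents()]
```

Keeping the document component separate from the document is what lets a single document combine, for example, a scanned image and its transcription.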

6 - The Multimedia Information System

The prototype was developed to motivate a computerization project for the Arquivo Público Mineiro. It seeks to reconcile new technologies in Computer Science, mainly those based on multimedia databases and multimedia information systems, with the specific needs of a public archive: conserving the documentary collections in its custody while making them available to the public.

The authors sought to model a system oriented toward ease of use and flexibility in retrieving historical documents and related information. Such a system can serve both the researcher with deep knowledge of the subject and the layperson curious about a certain topic.

The Internet, owing to its extraordinary reach, is considered an excellent means of disseminating knowledge and democratizing access to information. For this reason the application adopts the WWW as the main access path to the historical documents.

The tools for storing, managing and processing the digitized documents were selected for being technologically up to date, for their ability to handle complex data such as free text and images, and for their integration with the Internet through browsers such as Netscape from Netscape Communications Corp. and MS-Explorer from Microsoft Corp.

Thus, after a market survey, the following set of software was chosen:

  • Object-oriented database management system (OODBMS) Jasmine, supplied by Computer Associates International Inc. (CAI) and Fujitsu Inc., in its evaluation period (McCRIGHT, 1997). It was chosen because it represents the state of the art in DBMSs (its commercial version was only released in January of this year), is completely object oriented, has Web support and can work with large volumes of complex data (CRAIG, 1997; FRANK, 1997);
  • Jasmine Development Environment (JADE), an application generator for Jasmine, from CAI (BOOKER, 1997);
  • Java from Sun Microsystems Inc., a programming language designed primarily for writing software for the WWW (NEWTON, 1997);
  • Java Proxies (JP), supplied by Technology Deployment International Inc. Developed in partnership with CAI, this product works as the middleware between the Jasmine DBMS and the Java interface;
  • User interface written in the Java language, chosen for its portability and Web support;
  • Visual Cafe Pro from Symantec Corp., a rapid application development (RAD) tool for Java (MARTIN, 1997);
  • Digital image processing system developed by the Digital Image Processing Nucleus (NPDI) of the Federal University of Minas Gerais (UFMG);
  • Web browser, such as Netscape or MS-Explorer, with Java support (DeVONEY, 1997).

The use of an OODBMS is justified because the object-oriented data model is better suited to representing complex data than other data models, such as the relational model usually implemented in DBMSs (FOLEY, 1996; PAZANDAK, 1997; GROSKY, 1989).

On the other hand, OODBMSs are still emerging in the commercial software market, so observing the behavior of such a tool in a practical application, as in this work, is a useful investigation (FRANK, 1995). Jasmine implements the basic concepts of object orientation, such as encapsulation, polymorphism, inheritance, reuse and aggregation.

Using a database management system also makes the application more dynamic, since each new document added to the database becomes immediately available for consultation.

Several search methods were implemented using the capabilities of the database management system:

  • Searches through the series and subseries according to which a collection is classified. The researcher navigates the system through virtual catalogs, selecting the documents of interest;
  • Keyword searches that guide the user to a certain subject, event, person, etc.;
  • Textual searches for any word or expression present in one or more document descriptions.

Besides the search methods, topics with multimedia and hypertext support were incorporated into the information system, containing additional information, biographies, bibliographies and a glossary on themes related to the searched collection.
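
The three search methods listed above can be illustrated with a small in-memory sketch. It is written in Python for brevity and does not reproduce the Jasmine queries of the prototype; the Record class, its fields, the function names and the sample entries are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Record:
    """Hypothetical catalog entry for one digitized document."""
    title: str
    series: str
    subseries: str
    description: str
    keywords: Set[str] = field(default_factory=set)


def browse_catalog(records: List[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group titles by series and subseries, like a virtual catalog."""
    catalog: Dict[str, Dict[str, List[str]]] = {}
    for r in records:
        catalog.setdefault(r.series, {}).setdefault(r.subseries, []).append(r.title)
    return catalog


def keyword_search(records: List[Record], keyword: str) -> List[str]:
    """Return titles whose assigned keywords include the given one."""
    wanted = keyword.lower()
    return [r.title for r in records if wanted in {k.lower() for k in r.keywords}]


def text_search(records: List[Record], expression: str) -> List[str]:
    """Return titles whose description contains the word or expression."""
    wanted = expression.lower()
    return [r.title for r in records if wanted in r.description.lower()]


# Invented sample data, for illustration only
records = [
    Record("Photo A", "Photographs", "1918-1922",
           "Official portrait taken in Belo Horizonte", {"portrait"}),
    Record("Letter B", "Correspondence", "1922-1926",
           "Letter about a state visit", {"visit", "politics"}),
]
catalog = browse_catalog(records)
by_keyword = keyword_search(records, "Politics")
by_text = text_search(records, "belo horizonte")
```

Catalog browsing follows the archive's own classification, whereas the keyword and textual searches cut across it, which is why the two kinds of access complement each other.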

Figure 3 schematically illustrates the browsing levels allowed by the system.

In summary, implementing this computerization project is justified for several reasons:

  • It preserves the original collection of a public archive, avoiding direct handling of the documents and their misplacement;
  • It facilitates the consultation of digitized documents through different search methods, allowing simultaneous access by several users in geographically different places;
  • It makes it possible to improve the quality of the documents presented to the user and to enhance aspects of interest with digital image processing (DIP) techniques, such as brightness and contrast control and edge enhancement, without altering the original digitized document;
  • It implements alternative search methods, such as textual and keyword search, in addition to the customary use of catalogs;
  • It allows the use of hypertext, making searches more dynamic and friendly;
  • It allows several methods of remote and local access, such as Internet, intranet, CD-ROM, DVD, workstations and local networks.
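
The brightness and contrast control mentioned in the list above can be sketched as a simple per-pixel transformation that returns a new image and leaves the stored one untouched. This is a minimal Python illustration under stated assumptions (8-bit grayscale pixels, a linear formula centered on mid-gray 128, hypothetical function names); it is not the NPDI image processing system used in the prototype.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def clamp(value: float) -> int:
    """Round to the nearest integer and keep it inside the 0..255 range."""
    return max(0, min(255, int(round(value))))


def adjust(image: Image, brightness: float = 0.0, contrast: float = 1.0) -> Image:
    """Return a new image; the input image is left untouched.

    Each pixel p becomes contrast * (p - 128) + 128 + brightness.
    """
    return [[clamp(contrast * (p - 128) + 128 + brightness) for p in row]
            for row in image]


# Invented 2 x 3 sample image, for illustration only
original = [[100, 128, 200],
            [0, 255, 64]]
brighter = adjust(original, brightness=20)
sharper = adjust(original, contrast=1.5)
```

Because the adjustment is computed on demand from the stored pixels, each user can tune the display without changing the digitized document held in the database.
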
7 - Arthur Bernardes Historical Collection

As a prototype, the system currently covers just one of the historical collections of the Arquivo Público Mineiro: the collection of Dr. Arthur da Silva Bernardes.

Arthur Bernardes, a Brazilian statesman born in 1875 who died in 1955, was President of the State of Minas Gerais from 1918 to 1922, President of Brazil from 1922 to 1926, senator of the Republic and several times federal deputy. He was one of the most important figures of early Brazilian Republican history in the first decades of this century (MAGALHÃES, 1973; MONTEIRO, 1994; AMORA, 1964).

The choice was due, besides the figure's relevance, to the fact that his collection is rich, completely classified and constantly consulted. It comprises a variety of handwritten and printed historical documents, such as pamphlets, clippings, pictures, correspondence and even films, making it a good representative of an archival collection.

Since the collection is very extensive, a selection was made among the documents, choosing the most important ones so as to portray the figure's entire biography and the historical period in which he lived, and to include all types of documents. Emphasis was given to the photographic collection: about 300 photos, dated from 1893 to 1955, were selected from a universe of more than one thousand two hundred pictures.

To avoid their degradation, the selected original documents were photographed at the Arquivo Público Mineiro (figure 4); the photographs were then scanned with flatbed or handheld scanners and stored in the database system.

Figure 5 shows some screens of the system, illustrating the various options for information retrieval.


The prototype of the multimedia information system is fulfilling its main purpose of being a catalyst for a larger computerization process at the Arquivo Público Mineiro. The institution is now looking for partnerships to implement a similar system that initially covers all of its pictures and plans, with more than 20,000 documents.

The use of object-oriented multimedia database technology in conjunction with the World Wide Web proved pertinent and interesting for this type of application, despite some restrictions and incompatibilities associated with the incipient stage of many of the tools used, which may be overcome in the future.


The authors are grateful to Capes, CNPq and Prodemge for financial support and to the Arquivo Público Mineiro for technical support.

The authors would also like to thank Leonardo Kenji Shikida and Frederico Braga Torres Paulino, Computer Science students at UFMG, for their participation in building the multimedia information system.


