Renardus - The Academic Subject Gateway Service in Europe

 

Cross-browsing and cross-searching in a distributed network of subject gateways: Architecture, data model, and classification

Heike Neuroth & Traugott Koch

 


Table of Contents

1. Introduction
2. Quality-controlled Subject Gateway (definition and elements) and Renardus broker
3. Renardus Application Profile (RENAP)
4. Renardus Collection Level Description (RCLD)
5. Renardus Technical Approach
6. DDC Mapping for Cross-Browsing

References


1. Introduction

The EU project Renardus, an Academic Subject Gateway Service in Europe, is funded (January 2000 – June 2002) through the Information Society Technologies (IST) Programme 'Promoting a User-friendly Information Society'. This is a major theme of the European Union's 5th Framework Programme.

The twelve Renardus partners are drawn from European library and other information-related communities in countries including Finland, Denmark, Sweden, Great Britain, The Netherlands, France, and Germany. They work at the forefront of developments in quality-controlled subject gateways, providing access to selected quality resources for the academic and research communities. The aim of the Renardus project is to provide users with integrated access, through a single interface, to these and other Internet-based, distributed services. The approach being taken is to provide access to distributed quality-controlled subject gateways (high quality metadata collections) that will allow integrated searching and browsing of distributed resource collections. Further goals are to develop and define organisational models, business models, technical solutions and metadata standards (Renardus Application Profile, Renardus Namespaces, Renardus Collection Level Description).

Renardus intends to develop cross-searching and cross-browsing options for distributed subject gateways with the help of a common metadata profile and a common classification system to which all locally used classification systems are mapped.

The following table presents a list of all participating high quality-controlled subject gateways:

Name | Acronym | URL
Dutch Electronic Subject Service | DutchESS | http://www.kb.nl/dutchess/
Nordic Gateway to Information in Forestry, Veterinary and Agricultural Sciences | NOVAGate | http://novagate.nova-university.org/
Engineering Electronic Library, Sweden | EELS | http://eels.lub.lu.se/
Deutsches Agrarinformationsnetz | DAINet | http://www.dainet.de/
The Finnish Virtual Library - Virtuaalikirjasto | FVL | http://www.jyu.fi/library/virtuaalikirjasto/
Resource Discovery Network (with, e.g., SOSIG, OMNI, Humbul etc.) | RDN | http://www.rdn.ac.uk/
SonderSammelGebiets-FachInformationsführer | SSG-FI | http://www.sub.uni-goettingen.de/ssgfi/
- VLib History Guide | SSG-FI | http://www.HistoryGuide.de
- VLib Anglistik Guide | SSG-FI | http://www.AnglistikGuide.de
- MathGuide | SSG-FI | http://www.MathGuide.de
- Geo-Guide | SSG-FI | http://www.Geo-Guide.de
Archivserver DEPOSIT.DDB.DE | - | http://deposit.ddb.de/
Potential future partners:
Danish Electronic Research Library | DEF fagportal | http://www.deff.dk/vejviser/index.zap?sprog=eng
Les Signets de la Bibliothèque nationale de France | - | http://www.bnf.fr/web-bnf/liens/
BIBSYS | - | http://www.bibsys.no/english.html

 

2. Quality-controlled Subject Gateway (definition and elements) and Renardus broker

Subject Gateways:
Subject gateways are usually subject-based Internet services which provide links to resources such as documents, databases or objects and in this way support systematic resource discovery. Although these resources are distributed, users can search across them via a single interface. Each resource is described with a minimal set of metadata such as title, identifier, description and subject. The option to search or browse by subject is a common element of subject gateways.

Quality-controlled subject gateways:
Quality-controlled subject gateways are characterised by high standards of quality control and a rich set of metadata which enables users to cross-search several metadata elements. Koch (2000) defines a quality-controlled subject gateway as follows: "Quality-controlled subject gateways are Internet-services which apply a rich set of quality measures to support systematic resource discovery. Considerable manual effort is used to secure a selection of resources which meet quality criteria and to display a rich description of these resources with standards-based metadata. Regular checking and updating ensure good collection management. A main goal is to provide a high quality of subject access through indexing resources using controlled vocabularies and by offering a deep classification structure for advanced searching and browsing."

The following elements can be used to define a quality-controlled subject gateway:

Renardus Broker:
The Renardus broker, as a resource discovery broker service, is a third party creating a broker service based on the resources selected and described by individual quality-controlled subject gateways. Renardus is a multidisciplinary broker service based on subject gateways that cover different subject areas such as Agriculture, Engineering, Earth Sciences, Mathematics, History, Literature and Social Sciences. The Renardus broker pilot will not be restrictive in terms of collections, resources, granularity, subject areas, language of resources/metadata, etc. The target user group of the pilot system will be the common target audience of the participating subject gateways: the higher education and academic research community. Indexing and searching facilities in the pilot system will be based on a common set of metadata elements and main subject classes. Mappings from the local sets of metadata and subject classes to the common Renardus metadata set for cross-searching and to the common classification system for cross-browsing are necessary. The broker system does not restrict the types of resources. However, the basic requirement is that the resources brokered to should be accessible online and in most cases freely accessible.

 

3. Renardus Application Profile

To provide access via cross-searching and cross-browsing to distributed high quality metadata collections, a common core set of metadata must be defined. Each participating partner of the Renardus broker service is responsible for mapping its metadata format to the common Renardus metadata format (see chapter 5). Defining such a core set of metadata requires information about the partners' metadata formats:

First, some general thoughts were necessary about which elements are meaningful for a service like Renardus. Subject is of course an important element, giving users the possibility to search and browse for their respective interests. Title, identifier, description and language are also meaningful because of the multilingual coverage of the resources. In addition, a language tag that indicates the language of the metadata might be useful. Some partners create metadata in their native language, sometimes in addition to an English version (e.g. for keywords or description). Users should have the possibility to select their preferred language for cross-searching purposes.
Some filter options for the search or some sorting processes after a search are also useful.

The information about partners' metadata formats was gathered via a questionnaire developed in the context of workpackage 6 in Renardus. The results led to a first definition of metadata elements without any semantic or syntactic information.

The common core set of metadata consists of the following elements:

dc:title, dc:creator, dc:description, dc:subject, dc:identifier, dc:language, and country.

A second survey gathered information about qualifiers, codes, rules, semantic and syntactic definitions of each element. For each metadata element a table sheet was formulated which informs about the following:


Name: Name of the metadata field.
Namespace: DCMES version 1.1; DCMES Qualifiers (2000-07-11); Renardus Metadata Element Set = RMES version 0.1; or Renardus Metadata Element Set Qualifiers = RMES Qualifiers version 0.1.
Refinement(s): Element refinements used in Renardus. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope.
R Refinement(s): Renardus specific element refinements, see above.
DC Encoding Scheme(s): These qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent, the value may still be useful to a human reader.
R Encoding Scheme(s): Renardus specific encoding scheme, see above.
Form of Obligation: In the Renardus data model the obligation can be mandatory (M), strongly recommended (R) or optional (O). Mandatory ensures that some of the elements are always supported. An element with a mandatory obligation must have a value. The strongly recommended and the optional elements should be filled with a value if the information is appropriate to the given resource or provided by a subject gateway, but if not, they can be left blank.
Repeatable: Whether the metadata field is repeatable (yes or no).
LQ "LANG": Language Qualifier "LANG", giving information about the language of the content of a metadata field (ISO Code 639, two letter): yes, no (or possible: prototype system).
DC Definition: Dublin Core definition of the metadata field.
DC Comment: Dublin Core comments on this metadata field.
R Definition: Renardus definition of the metadata field.
R Comment: Renardus comments on this metadata field.

 

Very early in the discussion of a Renardus data model it was clear that the data model should be based on Dublin Core as far as possible. Only one Renardus "content" element is neither a DC element nor a DC based element and this is "Country". All other elements and qualifiers (element refinements and value encoding schemes) are based on Dublin Core where possible. In case no encoding scheme or refinement from Dublin Core can be used, the definition is a Renardus qualifier.

The Renardus broker consists of the content databases (decentralised and accessed via Z39.50, see chapter 5) with the agreed eight elements and two administrative elements, and the Collection Level Description database (see chapter 4). The content databases contain the metadata records extracted from the individual service providers' databases in accordance with the Renardus data model (common core set of metadata). The Collection Level Description database contains the collection description of each participating subject gateway.

The Renardus Application Profile consists of four namespaces:

The following table provides an overview of all metadata elements and qualifiers which are part of the Renardus Application Profile. Future elements like publisher, rights, or format are under discussion. A first final version of the Renardus Application Profile encoded in RDF/XML will be ready in July 2001.

 

Obligation: M = mandatory, R = strongly recommended, O = optional. Repeatable: R = repeatable, NR = not repeatable. LQ: language qualifier "LANG" applicable (yes or no).

Metadata Element | Obligation | Repeatable | LQ | Namespace | Comments
dc:title | M | NR | yes | DCMES | -
dc:title.alternative | O | R | yes | DCMES Qualifiers | -
dc:creator | R | R | no | DCMES | -
dc:creator (R Qualifiers) | R | R | no | RMES Qualifiers | Creators are persons who are responsible for the intellectual content of the document(s); webmasters, for example, are not creators. If this field is applicable it is strongly recommended to provide the creator. For the Renardus normalization process it is strongly recommended that last name and first name are clearly distinguishable.
dc:description | M | R | yes | DCMES | For cross-search reasons the description field must contain free text.
dc:subject | M | R | yes | DCMES | In the prototype system there will be no further distinction between the several kinds of subject (keywords, classification system) and the provision of keywords is strongly recommended. In the final system the provision of keywords is required.
dc:subject | R | R | yes | DCMES Qualifiers | -
dc:subject (R Qualifiers) | M | R | yes | RMES Qualifiers | All other encoding schemes used by the partners, see Acronym & Abbreviation list at http://renardus.sub.uni-goettingen.de/renap/racr.html.
dc:subject.Ren-DDC | M | R | no | RMES Qualifiers | DDC 21: adapted DDC version for cross-browsing purposes. Only captions and not notations will be displayed.
dc:identifier | M | R | no | DCMES Qualifiers | In the prototype system no distinction will be made between the resource URL, URL(s) of mirrored or copied resources and URL(s) for archive reasons.
dc:identifier (R Qualifiers) | O | R | yes | RMES Qualifiers | Renardus refinements for translated sites and/or mirrored or copied sites will be realized in the final version.
dc:language | R | R | no | DCMES Qualifiers | The language code is the ISO 639-2 three-letter code. A mapping between the two-letter (ISO 639-1) and three-letter language codes can be found on the LoC site: http://lcweb.loc.gov/standards/iso639-2/englangn.html
dc:type | R | R | no | DCMES | Subject gateways should provide their original types without encoding scheme.
dc:type.DCT1 | R | R | no | DCMES Qualifiers | A mapping from partners' type lists to the DCMI Type Vocabulary (DCT1) is strongly recommended.
Country | R | NR | no | RMES Qualifiers | ISO 3166-1 (two-letter code)
Full Record URL | R | NR | no | RMES Qualifiers | A URL that leads to a detailed display of each record at the originating service site.
SBIG ID | M | NR | no | RMES Qualifiers | A stable unique acronym, also defined in the Collection Level Description.
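The obligation and repeatability columns of the table above lend themselves to a mechanical check of normalized records. The following is a minimal sketch in Python, assuming a record is held as a dictionary from element name to a list of values; the `ElementRule` class, the `PROFILE` subset and the `validate` function are hypothetical names used for illustration and are not part of any Renardus software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRule:
    obligation: str   # "M" mandatory, "R" strongly recommended, "O" optional
    repeatable: bool  # False corresponds to "NR" in the table


# Illustrative subset of the profile table; values copied from the table above.
PROFILE = {
    "dc:title": ElementRule("M", False),
    "dc:creator": ElementRule("R", True),
    "dc:description": ElementRule("M", True),
    "dc:subject": ElementRule("M", True),
    "dc:identifier": ElementRule("M", True),
    "dc:language": ElementRule("R", True),
    "Country": ElementRule("R", False),
    "SBIG ID": ElementRule("M", False),
}


def validate(record):
    """Return a list of problems found in `record` (element -> list of values)."""
    problems = []
    for name, rule in PROFILE.items():
        # Blank strings do not count as values.
        values = [v for v in record.get(name, []) if v.strip()]
        if rule.obligation == "M" and not values:
            problems.append(f"missing mandatory element: {name}")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {name}")
    for name in record:
        if name not in PROFILE:
            problems.append(f"unknown element: {name}")
    return problems
```

A gateway could run such a check before exporting records to its local Renardus database; elements with obligation R or O are deliberately allowed to be absent, as described in the table sheet definition.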

 

The Renardus Application Profile will be reworked over time. At the moment we distinguish between the:
It is not yet clear when we will switch from the prototype to the operational system. This will probably be an ongoing process and will depend on the results of the partners' normalization/harmonization processes (see chapter 5).

 

4. Renardus Collection Level Description (RCLD)

The Collection Level Description schema is a simple format to describe collections, locations and related people or organisations. This schema was developed in the context of the RSLP (Research Support Libraries Programme) project. The goal is to describe collections in a consistent and machine-readable way by using the Resource Description Framework (RDF). A simple Web-based tool was also developed to enable project partners to describe their collections. Renardus adapted this tool to provide users with information about the subject gateways participating in Renardus. The aim of the Renardus Collection Description is:


The format of the Renardus Collection Level Description is based on RSLP (UKOLN) and consists of metadata elements from the Dublin Core format, the CLD format, and a Renardus specific format. The following table gives an overview of RCLD:

 

Renardus Collection Level Description
Attribute | RDF property | Definition
Dublin Core (based) elements:
Title | dc:title | The name of the collection.
Identifier | dc:identifier | An unambiguous reference to the collection within a given context (encoding scheme: URI).
Description | dc:description | An account of the content of the collection. Comment: Renardus will provide a standardized structure of the content of description with information about granularity of collected resources, type of subject indexing, etc. in context of D6.5.
Language | dc:language | The main language(s) of the metadata in the collection with quantitative indication. Syntax: Free text.
Publisher | dc:publisher | An entity responsible for making the collection available. Comment: The organization etc. that is responsible for the intellectual (not technical) distribution of the collection.
Format.Extent | dc:format dcq:extent | The size of the collection. Comment: It is recommended to provide the number of records as follows: about x records.
Date.Issued | dc:date dcq:issued | Date of formal issuance (e.g. publication) of the collection.
Subject | dc:subject | The topic of the content of the collection. Syntax: Main DDC captions for the subjects represented in the subject gateway.
Relation | dc:relation dcq:hasPart dcq:isPartOf | A reference to a related resource. Syntax: Acronym followed by a blank character must precede other describing text for every related subject gateway. Comment: At the moment only used by RDN and its member subject gateways.
Collection Level Description elements based on RSLP schema:
Country | cld:country | The country in which the collection is physically located. Syntax: Free text.
Renardus specific Collection Level Description elements:
Subject Notation | rencld:subjectNotation | The topic of the content of the collection. Syntax: Main DDC notations and captions for the subjects represented in the subject gateway: DDC notation1 – DDC caption1; DDC notation2 – DDC caption2 etc. Comment: Element content not displayed in human readable Collection Level Descriptions. For technical and license reasons this element is declared as a Renardus CLD element instead of a DC element.
Acronym | rencld:acronym | The acronym of the collection.
Resource Language | rencld:resourceLanguage | Language(s) of the described resources. Syntax: Free text.
DDC mapping URL | rencld:ddcMapping | URL of local DDC mapping information in Renardus format. Comment: Element content not displayed in human readable Collection Level Descriptions.
Z39.50 Location | rencld:Z3950Location | The online location of the Z39.50 server of the subject gateway. Syntax: machine name; port number; database name. Comment: Element content not displayed in human readable Collection Level Descriptions.
Logo URL | rencld:logoURL | The URL of the logo (image) of the subject gateway. Comment: Element content not displayed in human readable Collection Level Descriptions.
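Two of the Renardus specific elements in the table above carry a small machine-readable syntax: Z39.50 Location ("machine name; port number; database name") and Subject Notation ("DDC notation1 – DDC caption1; DDC notation2 – DDC caption2"). A minimal Python sketch of how a consumer might parse these values follows. The function names, the host name and the sample captions are hypothetical, and the sketch assumes that captions contain no semicolons and that notation and caption are separated by a dash surrounded by spaces.

```python
import re
from typing import NamedTuple


class Z3950Location(NamedTuple):
    host: str
    port: int
    database: str


def parse_z3950_location(value):
    """Parse 'machine name; port number; database name'."""
    parts = [p.strip() for p in value.split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected three ';'-separated fields, got: {value!r}")
    host, port, database = parts
    return Z3950Location(host, int(port), database)


def parse_subject_notation(value):
    """Parse 'notation1 – caption1; notation2 – caption2' into (notation, caption) pairs."""
    pairs = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first dash (en dash or hyphen) that is surrounded by spaces.
        notation, caption = re.split(r"\s+[–-]\s+", item, maxsplit=1)
        pairs.append((notation, caption))
    return pairs
```

Both functions raise `ValueError` on input that does not follow the documented syntax, which makes malformed collection descriptions visible when the broker gathers them.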

 

The RCLD tool allows all Renardus partners to create the description of their metadata collection. It is possible to complete and update the description at any time, because the XML file is saved locally by each partner. The Renardus broker gathers these XML files to provide users with a well-structured description of all subject gateways at the Renardus WWW sites.

 

5. Renardus Technical Approach

Renardus uses a decentralized architectural model where major subject gateway services across Europe can be searched and browsed together through a single interface provided by the Renardus pilot broker. It is based on a generic broker-architecture and a common data-model.

In the following we describe the main steps in the developmental process, preparation and implementation, and the main elements of the technical solution.

 

PREPARATION

The technical solutions are preceded and defined by the results of a number of investigations and enquiries into available standards and technologies, functional requirements of users and service providers, the common datamodel and the overall architectural design. For the pilot, several time, economic and technical limitations had to be taken into consideration, however.

Investigation of available standards and technologies
Information about relevant, selected Internet standards, protocols and practices has been collected as a background for the decision on potential Renardus solutions. Among the candidates considered were HTTP, Z39.50, LDAP, Whois++, CIP, Dublin Core, IAFA Templates, RDF and XML.

Investigation of functional and user requirements
User requirements for the Renardus broker system were collected in two ways: a) participating service providers described and ranked their requirements for the functionality and b) use case scenarios were constructed from the end user's perspective. The outcome directed the technical solutions both directly and via the data modelling work. Essential requirements are implemented in the Renardus pilot system.

Some of the most important outcomes were: the preference for a distributed architectural model, daily updates, Dublin Core semantics and RDF/XML syntax for the metadata records, and Z39.50 as the retrieval protocol. Both searching and browsing should be supported, and elaborate use of metadata elements for searching and filtering needs to be offered. Mappings between classification schemes are also needed so that the cross-browsing functionality can be implemented. Resources presented as search results in Renardus should always be linked back to the gateway which selected and provided the description. Customization and local user interfaces to Renardus should be possible.

The Use Case scenarios are a means of formally specifying what the final Renardus service will do. They describe how various players will actually use the service, without specifying any technical solutions for how this functionality will be achieved. Functions such as: performing a simple search, cross-browsing by subject, or displaying results are covered. From the perspective of service administrators, other use cases cover requirements such as: maintaining metadata indexes, assuring data quality, or inserting data into Renardus. The use cases and activity diagrams are based on the Unified Modelling Language (UML).

Development of the datamodel
The datamodel takes these requirements into consideration when harmonizing the metadata models in order to provide semantic interoperability between the subject gateways. A common list of Renardus metadata elements with all the necessary specifications about their definition, qualifiers, repeatability, encoding, level of obligation etc. is defined, specifying used namespaces and presented as an application profile for the Renardus project.

Renardus chose a distributed architectural model without any central metadata repository according to the preferences of the participating subject gateways. The main reasons are to leave full local control over the content and to prevent intellectual property rights problems. In order to improve performance and to allow advanced functionality a few indices might be kept centrally in the future, however.

The distributed search and browse broker allows an integration of information from several suppliers into one single user interface. The Renardus architecture consists of a number of interoperable databases simultaneously searchable over the Internet. End user access to the merged data is done through a Renardus WWW user interface. Each participant or group of participants is required to set up and maintain a Renardus server with a content database and administrative information.

Z39.50 is used as the search and retrieval protocol, supporting complex queries against highly structured data. Adding other retrieval protocols might be a future option. The Renardus Z39.50 profile, formalizing the way the protocol is used, is compliant with most of the Bath profile, which is designed for library applications.
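The distributed search model described above, one query sent to several servers and the answers merged into one result list, can be illustrated with a short Python sketch. It does not speak Z39.50: each server is represented by a plain callable standing in for a Z39.50 search, and `cross_search`, the `gateway` key and the sort order are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def cross_search(query, servers, timeout=10.0):
    """Query all servers in parallel and merge their hits.

    `servers` maps a gateway acronym to a callable taking the query and
    returning a list of record dictionaries (a stand-in for a Z39.50 search).
    Returns (merged_hits, failed_gateways).
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {acr: pool.submit(fn, query) for acr, fn in servers.items()}
        for acronym, future in futures.items():
            try:
                for record in future.result(timeout=timeout):
                    # Tag every hit with its gateway so results can link back to it.
                    merged.append({**record, "gateway": acronym})
            except Exception:
                # An unreachable or slow gateway must not break the whole search.
                failed.append(acronym)
    merged.sort(key=lambda r: r.get("dc:title", "").lower())
    return merged, failed
```

Keeping the gateway acronym on every hit reflects the requirement that results are always linked back to the gateway which selected and described the resource.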

 

IMPLEMENTATION

Data normalization
The Renardus technical approach and architecture require all subject gateway partners to normalize their resource metadata according to the data model and all other requirements. This may involve extracting the relevant fields from their existing databases, adapting them to the semantics, encoding schemes and syntax chosen in Renardus, running a classification mapping script etc. The normalized data is then exported into a local Z39.50 Renardus database. Certain administrative information has to be provided too. A generic normalization toolkit with Z39.50 configuration files and a conversion script has been provided to assist in this process.
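The field extraction and renaming step of such a normalization can be sketched as follows. This is not the Renardus normalization toolkit: the function `normalize`, the local field names and the field map in the usage note are hypothetical, and the only conventions taken from the data model are the element names and the mandatory SBIG ID.

```python
def normalize(local_record, field_map, gateway_id):
    """Map a local record onto Renardus element names.

    `local_record` maps local field names to a string or a list of strings;
    `field_map` maps local field names to Renardus element names.
    Local fields without a mapping are dropped.
    """
    normalized = {"SBIG ID": [gateway_id]}
    for local_name, value in local_record.items():
        target = field_map.get(local_name)
        if target is None:
            continue  # field is not part of the common data model
        values = value if isinstance(value, list) else [value]
        # Collapse internal whitespace and discard empty values.
        cleaned = [" ".join(str(v).split()) for v in values if str(v).strip()]
        if cleaned:
            normalized.setdefault(target, []).extend(cleaned)
    return normalized
```

For example, a gateway with German field names might call `normalize(record, {"titel": "dc:title", "url": "dc:identifier"}, "GEO")`; encoding scheme conversion and classification mapping would be separate steps.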

Renardus developed an RDF structure with XML syntax to store the metadata for each resource provided to Renardus. The conversion to this format is supported by the normalization toolkit as well. Unfortunately, there is no fully agreed, standardised approach for doing this with qualified Dublin Core metadata.
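For unqualified Dublin Core elements, an RDF/XML serialization of a record can be sketched with the Python standard library. The exact Renardus RDF structure is not reproduced here; the sketch only assumes the standard RDF and Dublin Core 1.1 namespace URIs and a record held as a dictionary of element names to value lists. The function name `record_to_rdfxml` is hypothetical, and qualified elements are skipped because, as noted above, their encoding was not settled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def record_to_rdfxml(about, record):
    """Serialize the unqualified dc:* elements of `record` as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, values in record.items():
        prefix, _, local = name.partition(":")
        # Skip non-DC elements and qualified names such as dc:title.alternative.
        if prefix != "dc" or not local or "." in local:
            continue
        for value in values:
            ET.SubElement(description, f"{{{DC_NS}}}{local}").text = value
    return ET.tostring(root, encoding="unicode")
```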

In order to support cross-browsing, all local classifications are being mapped to DDC as the common browsing system. The mapping work between DDC classes and local classes is supported by a tool adapted from the German CARMENx project (using MySQL, PHP, JavaScript) and capable of working in a distributed way. A classification mapping script uses the mapping file output from the mapping tool to create DDC class mappings from the local classification for every resource and includes them in the resource metadata on the local Renardus server.
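The job of the classification mapping script can be sketched in a few lines of Python. The real mapping file format is not shown in this text, so the sketch assumes the mapping is available as (DDC class, local class) pairs; the function names and the `local_class` field are hypothetical, while `dc:subject.Ren-DDC` is the element named in the application profile.

```python
from collections import defaultdict


def build_local_to_ddc(mapping_pairs):
    """Index (ddc_class, local_class) pairs by local class, keeping pair order."""
    index = defaultdict(list)
    for ddc_class, local_class in mapping_pairs:
        if ddc_class not in index[local_class]:
            index[local_class].append(ddc_class)
    return dict(index)


def add_ddc_classes(record, local_to_ddc, local_field="local_class"):
    """Return a copy of `record` with dc:subject.Ren-DDC derived from its local classes."""
    ddc_classes = []
    for local_class in record.get(local_field, []):
        for ddc_class in local_to_ddc.get(local_class, []):
            if ddc_class not in ddc_classes:
                ddc_classes.append(ddc_class)
    enriched = dict(record)
    if ddc_classes:
        enriched["dc:subject.Ren-DDC"] = ddc_classes
    return enriched
```

Because one local class may be mapped from several DDC classes, a resource can receive more than one DDC class; resources whose local class has no mapping are left unchanged.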

Individual subject gateway collection descriptions are important in a decentralized service. They can be used to support the (in the future even automated) selection of individual gateways for searching and to provide background information for human users. A tool [F: http://renardus.lub.lu.se/cld-tool/], adapted from the UK RSLP Collection Description effort, implements the Renardus Collection Level Description schema based on Dublin Core and assists in creating consistent and machine-readable subject gateway descriptions. These descriptions are used by the broker to provide information about the gateways and to find administrative information incl. the local Renardus servers and their logos.

Creation of participant's Renardus servers
Each participant or group of participants (like in the case of the Finnish Virtual Library) is required to set up and maintain a Renardus server which contains their content normalized to the Renardus datamodel and Z39.50 profile. Renardus Server Utilities are provided to make this task easier.

In the standard solution the service makes its content directly available through a Z39.50 interface conforming to the Renardus profile. Most participants are using the Z'mbol information system from IndexData. A Z'mbol configuration file has to be written and the Z'mbol indexer has to be run on the records/files generated.

Alternative server solutions are possible following the pilot period: e.g. a server with protocol conversion where a Z39.50 front-end (conforming to the Renardus profile) is interfaced directly to one or several native database servers (all using the same protocol) using a protocol conversion tool.

Implementation of broker software and functionality
The Renardus service uses a Z39.50 to WWW gateway capable of simultaneously cross-searching all the Renardus servers. A gateway of this type has to make a compromise when interfacing Z39.50, a stateful protocol, with HTTP, which is stateless. We use the Zebril gateway package, which is based on Europagate. It supports basic Z39.50 Search, Browse, Present (MARC and GRS-1), parallel search, sessions and reuse of Z39.50 associations.
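One common way to bridge a stateful protocol and stateless HTTP requests is to keep open associations in a table keyed by the web session and to reuse them until they have been idle for too long. The Python sketch below illustrates that general idea only; it is not a description of how Zebril works, and the class `AssociationPool`, its parameters and the `connect` callable are hypothetical.

```python
import time


class AssociationPool:
    """Reuse one association per (web session, server) until it has been idle too long."""

    def __init__(self, connect, idle_timeout=300.0, clock=time.monotonic):
        self._connect = connect          # callable(server) -> association object
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._entries = {}               # (session_id, server) -> (association, last_used)

    def get(self, session_id, server):
        key = (session_id, server)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._idle_timeout:
            association = entry[0]       # still fresh: reuse across HTTP requests
        else:
            association = self._connect(server)
        self._entries[key] = (association, now)
        return association
```

Each HTTP request carrying the same session identifier then continues on the same association, so result sets held by the server remain available between a search and a later present request.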

Access control for the gateway will use regular HTTP methods, making it possible to use any available standard package for user administration. Access to the Renardus Z39.50 servers will be under the control of the individual services or groups of services.

The technical specifications for the pilot broker as published in the documentation [F: Deliverable D2.2/2.3, Ch. 2.1.1] are preliminary and may change during the implementation, due to new functional and/or performance requirements.

The mapping tool produces a Renardus mapping format as output from a set of mySQL databases each containing a pair of mappings between a local system and DDC. Subject cross-browsing is then provided by the Renardus broker in the common system, by creating web pages on the fly using the DDC classification structure and by adding links to mapped related classes from the local gateways exploring the mapping data.
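Generating a browsing page on the fly from the DDC structure and the mapping data can be sketched as below. The data layout (dictionaries for captions, child classes and mapped gateway classes), the function name `render_browse_page` and the link format are assumptions made for illustration. In line with the profile, only captions appear as visible text; notations serve as internal keys in link targets.

```python
from html import escape


def render_browse_page(ddc_class, captions, children, mappings):
    """Build an HTML fragment for one DDC class.

    captions: notation -> caption
    children: notation -> list of child notations
    mappings: notation -> list of (gateway acronym, local class label, URL)
    """
    parts = [f"<h1>{escape(captions[ddc_class])}</h1>", "<ul>"]
    for child in children.get(ddc_class, []):
        parts.append(
            f'<li><a href="?class={escape(child)}">{escape(captions[child])}</a></li>'
        )
    parts.append("</ul>")
    parts.append("<ul>")
    for gateway, label, url in mappings.get(ddc_class, []):
        parts.append(
            f'<li><a href="{escape(url)}">{escape(gateway)}: {escape(label)}</a></li>'
        )
    parts.append("</ul>")
    return "\n".join(parts)
```

The first list lets the user descend the common hierarchy, while the second jumps to the related classes in the local gateways.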

User interface implementation
The Renardus user interface is built on the use cases (see above). These are translated into a set of initial screen specifications [F: Deliverable D2.2/2.3, Ch. 3.2] for the following screens: Homepage, Advanced search screen, Index scan window, Advanced search page after index scan and selection, Browse by subject screen, (Preliminary) Result screen, Sorted result screen, Participating gateways screen and Help (index) screen.

To support layout solutions, the Zebril gateway is driven by screen templates with special tags embedded in HTML files. Record presentation is handled through TCL scripting, which is capable of advanced query formatting, support for multiple profiles, and result set merging. TCL scripts deal with the Renardus Z39.50 profile and all other advanced functional requirements of the user interface.

 

6. DDC Mapping for Cross-Browsing

In order to accomplish subject cross-browsing between all resources of the participating gateways, a feature the Renardus partners agreed upon, the different local classification systems need to be mapped to a common classification system. This task comprises some theoretical investigations, a review of existing similar applications, the formulation of guidelines for the work of Renardus and finally the mapping effort itself.

A detailed description of the effort and the working guidelines are available from Renardus on request. (DDC Mapping Report and DDC Mapping Guidelines)

Subject cross-browsing and classification:
Quality-controlled subject gateways are services that provide access to selected Internet resources based on a rich set of metadata (see chapter 2). These services typically offer hierarchical browsing structures based on subject classification systems. The classification scheme is being used for collecting related resources into groups sharing the same topic. This is the case even for a service like Renardus which brings together many different subject gateways for cross-browsing.

Following Koch, Day, et al. (1997), a site that organises knowledge with a classification scheme has several distinct advantages over sites that do not:

The searching is enhanced by the following advantages:

DDC versus other classification systems:
The Renardus service will give access to resources from all kinds of subjects, published world-wide and in many languages and it is intended to be offered to an international multi-disciplinary community of users. Considering these issues an existing universal classification system should be selected to build the common browsing structure in Renardus.

DDC and UDC both have a good multilingual capability due to the fact that the codes they produce are numerical and their schedules have been widely translated (into up to 30 different languages).
Furthermore, universal classification systems like DDC and UDC are used by many Internet services and are readily available in machine-readable form. Using such well-known and internationally distributed classification systems guarantees maintenance by the owner of the classification system.

A narrower investigation revealed important advantages of DDC as compared to UDC for an application like the cross-browsing in Renardus:

When it comes to digital library applications and especially to the classification mapping task in Renardus, the DDC and its development efforts are clearly superior to the UDC.

DDC research license:
The basis for the usage of the DDC in the Renardus project is a research agreement with OCLC Forest Press, the owner of the DDC. It allows Renardus to use the full enhanced DDC classification system to construct and offer common Renardus cross-browsing pages.

DDC notations shall not be displayed. Renardus can adapt captions of DDC classes to European vocabulary as long as they express the same coverage. According to the agreement only the English language version of the DDC can be used in the Renardus browsing pages.

Analysis of partners' classification systems:
The classification and browsing solutions in the participating gateways are very heterogeneous. In order to prepare the mapping effort a thorough analysis was necessary.
In Renardus, 11 subject gateways so far will need to map their classification system to DDC. Some organisations will map their classification system to DDC on a broker level, at least initially, although they offer individual access to several subject gateways with different classification systems as well (e.g. FVL, RDN with SOSIG, OMNI etc.).

Most of the subject gateways use a thematic classification system; only 5 of 11 gateways support a universal system. Only one subject gateway (DAINet) has no classification system; it uses subject headings for a browsing structure presented as an alphabetical list. The majority of classification systems are more or less local or national. Among the international systems are adaptations of the Basisclassificatie (BC) and the Dewey Decimal Classification (DDC).

Several gateways are about to change to DDC or to adapt it:
From January 2001 the DDB (Deposit Server) also catalogues its online dissertations directly with DDC. The SSG-FI subject gateway History Guide will probably change its primary classification system from GOK to DDC in the context of co-operation with another German history subject gateway. A DDC adaptation is used for the subject gateway Les Signets (future partner).

Nearly all subject gateways offer an English classification language and thus an English browsing structure. Only the Deposit Server (furthermore SWD is only in German) and DAINet offer no hierarchically structured browsing page.

Some of the thematic systems are very specialised and have a deep structure, like those of EELS or most of the SSG-FI subject gateways. EELS (Ei) and MathGuide (MSC) use an international thematic classification system. All other thematic systems (NOVAGate and FVL) are not so extensive: with one or two levels at most, up to 60 classes in total need to be mapped to DDC. For EELS about 300 classes, structured in 5 levels, have to be mapped. For the SSG-FI subject gateways between 200 and 400 classes, structured hierarchically in up to 5 levels, have to be mapped. The mapping depth and policy have to be adapted to the specific conditions of each gateway.

Mapping approaches and issues:
A few practical principles are needed to keep the mapping work consistent and the resulting Renardus browsing pages balanced.
The mapping relationship is expressed between a pair of classes and not between a (DDC) class and individual resources. It is carried out in one direction only, from the DDC classification to the local classifications (browsing systems).

When treating a certain class one should look at the same level of specificity of the subject content and try to find a fully equivalent class first, then look for true subset or superset classes and finally watch for overlapping situations (cf. the mapping relationships below).

The mapping process, as reflected on the Renardus browsing pages, should be completely finished for the top level of the hierarchy before it moves downwards in the local hierarchy, to ensure a balanced Renardus service at all times. The final goal is, of course, to map all local classes to the DDC. The priority, however, is to map the well-used classes in the local gateway.
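The ordering principle described here can be pictured as a simple work queue. The following Python sketch is only an illustration: the field names `level` and `resources`, and the use of the number of resources as a measure of how well used a class is, are assumptions and not part of any Renardus specification.

```python
def mapping_order(local_classes):
    """Order local classes for mapping work.

    Classes nearer the top of the local hierarchy come first; within one
    level, classes holding more resources (assumed here to be the
    "well-used" ones) come first.
    """
    return sorted(local_classes, key=lambda c: (c["level"], -c["resources"]))


# Hypothetical local classes of one gateway.
classes = [
    {"name": "Algebra", "level": 2, "resources": 40},
    {"name": "Mathematics", "level": 1, "resources": 300},
    {"name": "Analysis", "level": 2, "resources": 120},
    {"name": "Physics", "level": 1, "resources": 500},
]
queue = [c["name"] for c in mapping_order(classes)]
```

With such a queue the complete top level is always handled before any deeper class, which keeps the browsing pages balanced while the work is in progress.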

Many approaches and issues need to be discussed and decided upon, and sometimes revised as well. Examples are: the specifics of the usage of DDC in Renardus; the consequences of the usage and display of the mapping in Renardus; the depth of the mapping on both sides (the DDC and the local systems); how to treat local classes which contain both generalities and specialities; the exclusion of non-topical classes (auxiliary tables); the average number of allowed mappings; etc.

Organisational issues are how to assure quality control and coordination of the mapping work and how to update the mappings.
A practical problem Renardus is facing requires an investigation of the degree to which mapping between thesaurus descriptors or subject headings and the DDC can be used as a replacement for mapping between classifications.
A permanent and very important issue is how to find the best trade-off between consistency, accuracy and usability in the Renardus cross-browsing service.

Mapping relationships:
Many traditional and less advanced mapping projects, like conversions or concordances between pairs of classification systems for usage in OPACs or union catalogues, limit themselves to just establishing a connection between pairs of classes. They leave open what the character and degree of the indicated equivalence is.

The cross-browsing service in Renardus aims to mediate between many different and heterogeneous classification systems using the DDC classification as a common "switching language" and browsing structure. The structure and level of detail, the vocabulary, the language and the cultural context differ greatly between these locally applied classification systems and the universal DDC.

Therefore we expect a straight and full equivalence between the content of two classes to be a rather rare situation. The same judgement has been made by other related projects like CARMEN.

In the Renardus Subject browsing pages the user needs to be told that certain links from a DDC class point to a class in a local gateway containing a broader or narrower area of content, showing major or minor overlap with the DDC class. That is especially true since there quite often will be mapping links to several different classes in different subject gateways. One might be fully equivalent, another only showing a minor overlap.

The need for a more detailed specification of the degree of equivalence is even greater when the mapping between the local class and the DDC classes is used in the Renardus advanced (subject) searching feature. The result list could be ranked according to the degree of relationship between the individual resource's local class and the DDC class used for searching.
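Such a ranking could look roughly like the following Python sketch. It is not the Renardus implementation: the numeric weights, the order of narrower before broader classes, and the `Hit` record are illustrative assumptions; the text only states that a ranking by degree of relationship is possible.

```python
from dataclasses import dataclass

# Hypothetical weights; only the idea of ranking by degree of
# relationship comes from the text, the values are assumptions.
RELATIONSHIP_WEIGHT = {
    "fully equivalent": 1.0,
    "narrower equivalent": 0.8,
    "broader equivalent": 0.6,
    "major overlap": 0.4,
    "minor overlap": 0.2,
}


@dataclass
class Hit:
    title: str
    gateway: str
    relationship: str  # relationship of the resource's local class to the searched DDC class


def rank_hits(hits):
    """Return hits ordered from the strongest to the weakest relationship."""
    # sorted() is stable, so hits with the same relationship keep their input order.
    return sorted(hits, key=lambda h: RELATIONSHIP_WEIGHT[h.relationship], reverse=True)


hits = [
    Hit("Resource A", "Gateway 1", "minor overlap"),
    Hit("Resource B", "Gateway 2", "fully equivalent"),
    Hit("Resource C", "Gateway 1", "narrower equivalent"),
]
ranked = rank_hits(hits)
```

Whether narrower classes should rank above broader ones is a design decision that would need to be checked with real users.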

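The following are the mapping relationships we are using.
The local class is either fully equivalent, narrower equivalent, broader equivalent, showing major overlap or showing minor overlap compared with the DDC class.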

In Renardus we do uni-directional mapping only, from the DDC classification to the local classification(s).
The three types of equivalence require that one of the two classes is a true subset of the other and that it does not have to be mapped to any other class of the comprising classification; full equivalence is the intermediate situation where both classes are basically 100% equivalent.
The two overlapping relationships require that parts of both classes clearly do not belong to the subject content of the other class (see the illustration at http://renardus.sub.uni-goettingen.de/conferences/elag2001/mapping_language.gif).
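The definitions above can be illustrated with sets. In the following Python sketch each class is modelled as a set of topics, which is a strong simplification: in Renardus the relationships are assigned intellectually by the mappers, not computed. The function name `classify` and the `major_threshold` parameter are assumptions introduced for the example.

```python
def classify(local, ddc, major_threshold=0.5):
    """Classify the relationship of a local class to a DDC class.

    Both classes are modelled as sets of topics. Returns None when the
    classes share nothing and should not be mapped at all.
    """
    shared = local & ddc
    if not shared:
        return None
    if local == ddc:
        return "fully equivalent"
    if local < ddc:  # the local class is a true subset of the DDC class
        return "narrower equivalent"
    if local > ddc:  # the local class is a true superset of the DDC class
        return "broader equivalent"
    # Parts of both classes lie outside the other class: an overlap.
    # The threshold separating major from minor overlap is an assumption.
    share = len(shared) / len(local | ddc)
    return "major overlap" if share >= major_threshold else "minor overlap"
```

For example, `classify({"limits"}, {"limits", "series"})` yields "narrower equivalent", while two classes that each contain topics foreign to the other fall into one of the two overlap categories.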

Technical solution:
The sources which are used for the classification effort are the local classification systems and the enhanced DDC as presented in OCLC's CORC Web-Dewey.

To support the practical effort, Renardus has adapted a mapping tool from the German CARMENx project. The tool is Web-based and requires the free database software MySQL, an Apache web server, JavaScript, and PHP scripts on the server side. Classification systems and mapping information are kept separately on different servers, partly for legal reasons.

The user interface consists of three main windows: one displays and navigates the origin classification (DDC), the second displays the local target classification, and the third receives and displays the mapping information, including the relationships and notes. Mapping relationships are displayed as links in both classification windows.

The tool has been adapted to create and store the mapping information in a MySQL database in the syntax specified by Renardus. This information is imported with Perl scripts by the main Renardus software, in order to create the mapping links on the subject browsing pages, and by the local normalization scripts, in order to generate a DDC mapping for every resource in the local gateway's Renardus databases.
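The normalization step, deriving a DDC mapping for every resource from the class-level mappings, can be sketched as follows. This is a Python illustration, not the project's Perl scripts; the record layout (DDC caption, gateway, local class, relationship), the gateway names and the resource fields are all hypothetical, since the Renardus storage syntax is not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping records: (DDC caption, gateway, local class, relationship).
MAPPINGS = [
    ("Mathematics", "GatewayA", "math", "fully equivalent"),
    ("Analysis", "GatewayA", "math.analysis", "fully equivalent"),
    ("Analysis", "GatewayB", "calculus", "narrower equivalent"),
]


def index_by_local_class(mappings):
    """Invert the DDC-to-local records into a lookup keyed by (gateway, local class)."""
    index = defaultdict(list)
    for ddc, gateway, local_class, relationship in mappings:
        index[(gateway, local_class)].append((ddc, relationship))
    return index


def add_ddc_mappings(resources, mappings):
    """Return copies of the resources with a 'ddc' list derived from their local classes."""
    index = index_by_local_class(mappings)
    enriched = []
    for res in resources:
        ddc = []
        for local_class in res["local_classes"]:
            # Unmapped local classes simply contribute nothing.
            ddc.extend(index.get((res["gateway"], local_class), []))
        enriched.append({**res, "ddc": ddc})
    return enriched


resources = [
    {"title": "X", "gateway": "GatewayB", "local_classes": ["calculus"]},
    {"title": "Y", "gateway": "GatewayA", "local_classes": ["unmapped"]},
]
enriched = add_ddc_mappings(resources, MAPPINGS)
```

Because the mapping is recorded between pairs of classes, each resource inherits its DDC classes, together with the relationship type, from the local classes it is assigned to.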

The enhanced DDC is delivered by OCLC in several XML-encoded data files with an XML DTD, tag/attribute information and additional hierarchy information. They contain 25 500 main schedule entries (notations) and 35 700 different records.

Using these files, an initial complete hierarchical set of web pages is generated, allowing a user to navigate through the DDC hierarchy.
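How such a page set might be generated is sketched below in Python. The XML layout (`class` elements with `id`, `parent` and `caption` attributes) is purely hypothetical; the real OCLC files follow their own DTD, which is not shown here. In line with the research agreement, the generated pages display captions only.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record layout; the real OCLC files follow their own DTD.
SAMPLE = """
<classes>
  <class id="c1" caption="Natural sciences and mathematics"/>
  <class id="c2" parent="c1" caption="Mathematics"/>
  <class id="c3" parent="c2" caption="Analysis"/>
</classes>
"""


def build_pages(xml_text):
    """Return {class id: HTML page} with one page per class linking to its children."""
    root = ET.fromstring(xml_text)
    captions, children = {}, {}
    for el in root.iter("class"):
        cid = el.get("id")
        captions[cid] = el.get("caption")
        children.setdefault(cid, [])
        parent = el.get("parent")
        if parent is not None:
            children.setdefault(parent, []).append(cid)
    pages = {}
    for cid, caption in captions.items():
        links = "".join(
            f'<li><a href="{escape(kid)}.html">{escape(captions[kid])}</a></li>'
            for kid in children[cid]
        )
        # Only captions are shown; the pages never display notations.
        pages[cid] = f"<h1>{escape(caption)}</h1><ul>{links}</ul>"
    return pages


pages = build_pages(SAMPLE)
```

Each page lists the captions of the direct subclasses as links, so a user can descend through the hierarchy one level at a time.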

Usage of the DDC mapping in Renardus:
The DDC mapping is used in different ways in the Renardus prototype. The main usage is to allow the user to browse through the subject hierarchies of the DDC classification at Renardus and to "jump" from many classes to related (mapped) classes and directories in the local subject gateways. This type of navigation can be called "browse and jump". Renardus displays the different equivalences and degrees of overlap separately (cf. the Renardus Subject browsing pages).
This approach allows the user to see the resources in the context of their local browsing structures and to continue browsing there. All added information and the full local service context are available.
The following example shows the results from participating subject gateways when browsing Mathematics and then Analysis.

[Screenshot: Renardus browsing page]

Most probably, Renardus will offer "virtual browsing" as well, from every DDC class on the Renardus browsing pages. This saves time and may better serve users who do not care to see the local context and additional information. All resources from all local gateways that are mapped to a certain DDC class are merged and listed in one result list.
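The merging step of "virtual browsing" can be sketched in a few lines of Python. The record fields (`title`, `url`, `ddc`), the gateway names and the example.org URLs are hypothetical, and sorting the merged list by title is an assumption made only to give the example a defined order.

```python
def virtual_browse(ddc_class, gateway_records):
    """Merge all resources mapped to one DDC class into a single result list.

    gateway_records maps a gateway name to its list of resource records;
    each record carries a 'ddc' list of the DDC classes it is mapped to.
    """
    merged = []
    for gateway, records in gateway_records.items():
        for rec in records:
            if ddc_class in rec["ddc"]:
                merged.append({"gateway": gateway, "title": rec["title"], "url": rec["url"]})
    # Sort by title so the merged list does not depend on gateway order.
    return sorted(merged, key=lambda r: r["title"].lower())


records = {
    "GatewayA": [
        {"title": "Real analysis notes", "url": "http://example.org/a1", "ddc": ["Analysis"]},
        {"title": "Group theory", "url": "http://example.org/a2", "ddc": ["Algebra"]},
    ],
    "GatewayB": [
        {"title": "Calculus tutorial", "url": "http://example.org/b1", "ddc": ["Analysis"]},
    ],
}
result = virtual_browse("Analysis", records)
```

Keeping the gateway name in each merged record lets the user still see where a resource comes from, even though the local browsing context is skipped.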

The DDC mapping is used in the Renardus advanced search service as well (cf. the Renardus Advanced search pages).
Apart from the subject element, which combines all local subject information (uncontrolled keywords, controlled keywords from thesauri and subject heading systems, and classification captions and notations), the captions of the mapped DDC classes are offered for searching in the element named DDC classification.
A search normally opens an index scan window that allows the user to select among the subject entries hit by the search.

This way of directly searching a given DDC class and everything mapped to it in Renardus will most probably be offered, in addition, at every subject browsing page.

Layout and user interface solutions still need to be optimised, based on user evaluations and usability research. According to the agreement with OCLC only the DDC captions can be displayed, not the notations. Renardus is not allowed, at the moment, to make changes in the DDC captions other than adaptations to specific European terminology. Multilingual captions are not available to the project yet.
When the mapping work is finished, the completely empty branches in the lower part of the hierarchy will be removed from the display, provided they are not needed as transitional steps during browsing. Renardus is investigating if and how unnecessary intermediate steps can be reduced.
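The removal of empty branches can be expressed as a small recursive function. The Python sketch below assumes a hypothetical tree of dictionaries with `caption`, `mappings` and `children` keys; unlike the plan described in the text, it prunes at every level rather than only in the lower part of the hierarchy.

```python
def prune(node):
    """Return a copy of the tree without branches that contain no mappings.

    A node is a dict with 'caption', 'mappings' (list) and 'children' (list).
    A class without mappings of its own is kept when a descendant has
    mappings, because it is needed as a transitional browsing step.
    """
    kept_children = [c for c in (prune(child) for child in node["children"]) if c is not None]
    if not node["mappings"] and not kept_children:
        return None
    return {"caption": node["caption"], "mappings": node["mappings"], "children": kept_children}


tree = {
    "caption": "Mathematics",
    "mappings": [],
    "children": [
        {"caption": "Analysis", "mappings": ["GatewayA: math.analysis"], "children": []},
        {"caption": "Geometry", "mappings": [], "children": []},
    ],
}
pruned = prune(tree)
```

Here "Mathematics" has no mappings of its own but survives as a transitional step towards "Analysis", while the empty "Geometry" branch disappears from the display.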

Future:
The Renardus classification mapping for cross-browsing is a large-scale experiment and cannot build very much on existing theory or practical experience. One has to hope that it can be continued as a combined development and research effort influencing future solutions.

An immediate side-effect of the work is the possibility to gain experience and to make recommendations for improved subject access efforts in gateways and brokers. The focus on interoperability and the Renardus cross-browsing service will hopefully stimulate changes at the local level. They might greatly improve the consistency, accuracy and usability of the mapping effort and of Renardus as a whole. Potential areas of improvement might involve:

The most immediate steps would be to further develop the mapping methodology, theory and practical solutions, and to investigate the usability of the browsing service with real users. User interfaces for browsing in large and distributed digital services are neither common nor well developed.

Opportunities for further research into methods that support classification and classification mapping with automated techniques would be very valuable and could reduce the very expensive intellectual effort needed today. Further future goals are to convince the owners of established classification systems to provide mappings themselves and to maintain them as part of their vocabulary services. This would make the task sustainable. Co-operative development and standardisation efforts are needed in order to prepare vocabularies for distributed usage over the net with suitable encoding, identification and appropriate search protocols.

A more detailed account of the Renardus DDC mapping work will be published in the Proceedings of the IFLA Satellite Conference "Subject retrieval in a networked world" (Koch, Neuroth, Day 2001).
 

References

Bath Profile: International Z39.50 Profile for Library Application, Release 1.1 June 2000 http://www.ukoln.ac.uk/interop-focus/bath/ or http://www.nlc-bnc.ca/bath/bp-current.htm, see also: The Bath Profile - UKOLN Interoperability Focus http://www.ukoln.ac.uk/interop-focus/bath/

CLD Collection Level Description http://ukoln.ac.uk/metadata/cld/

CORDIS: 5th Framework Programme http://www.cordis.lu/fp5/home.html

Dublin Core Metadata Element Set, Version 1.1: Reference Description http://www.dublincore.org/documents/dces/

Dublin Core Qualifiers http://www.dublincore.org/documents/dcmes-qualifiers/

Elements of a quality-controlled subject gateway are based on Traugott Koch (2000), Renardus deliverables "Scoping Document (D6.2)" http://www.renardus.org/deliverables/d6_2/D6_2summ.html and Questionnaire Gateway Survey for the Evaluation Report of Existing Data Models D6.1 http://renardus.sub.uni-goettingen.de/wp6/d6.1/questionnaires/index.html

Koch, T., 2000, Quality-controlled subject gateways: definitions, typologies, empirical overview. Online Information Review - The International Journal of Digital Information Research and Use, Volume 24, Number 1, 2000: 26. http://www.mcb.co.uk/oir.htm

Koch, T., Day, M., 1997, The role of classification schemes in Internet resource description and discovery. DESIRE deliverable D3.2 (3). http://www.ukoln.ac.uk/metadata/desire/classification/

Koch, T., Neuroth, H., Day, M., 2001 (forthcoming), RENARDUS: Cross-browsing European subject gateways via a common classification system (DDC). In: Proceedings of the IFLA Satellite Conference "Subject retrieval in a networked world", Dublin, OH, USA, Aug 14-16, 2001.

Report on DDC Mapping and DC.Type Mapping WP7, D7.4 http://renardus.sub.uni-goettingen.de/wp7/d7.4/index.html

Renardus http://www.renardus.org

Renardus Deliverables available at: http://www.renardus.org/deliverables/


Renardus Project at SUB Goettingen http://renardus.sub.uni-goettingen.de/

RSLP Collection Description http://www.ukoln.ac.uk/metadata/rslp/

RSLP Collection Description Tool http://www.ukoln.ac.uk/metadata/rslp/tool/

The Z39.50 Document http://lcweb.loc.gov/z3950/agency/document.html


Written by Heike Neuroth (SUB) & Traugott Koch (NetLab/DTV)
Created: 19 May 2001
Last update: 19 May 2001

 
