2003 Dublin Core Conference:
Supporting Communities of Discourse and Practice—Metadata Research & Applications
September 28–October 2, 2003
What is Dublin Core Metadata?
In the early days of the web, several institutions were keenly interested in sharing information about networked resources. Together with the Online Computer Library Center (OCLC), representatives from various areas of library science and information publishing gathered to establish a common semantic protocol for describing resources so that information could be shared. This initial meeting, held in 1995 in Dublin, Ohio, laid the foundation for what is now known as the Dublin Core Metadata Initiative (DCMI). Today, 15 elements make up the simple Dublin Core metadata element set.
Dublin Core (DC) has recently been incorporated into several national and international standards. It has been adopted as CEN Workshop Agreement CWA 13874 and as National Information Standards Organization (NISO) Standard Z39.85-2001, and it has been submitted to ISO as draft standard DIS 15836.
Initially, the meetings were annual opportunities for the working groups to meet face to face at locations around the world. These workshops comprised tutorials, discussions of refinements and changes to the metadata elements, and working sessions. Only recently have peer-reviewed papers, poster sessions, and, this year, pre-conference workshops been included to promote advances in the scholarship and application of metadata, and more specifically of the Dublin Core Metadata standard.
The conference opening this year brought together the various committee chairs, who acknowledged the many people involved in organizing the conference. There were kind words for Stuart Weibel, the former DCMI director, who came to the stage to speak briefly, take a photo of the whole crowd, and reflect on how DCMI had advanced over the past year.
The conference kicked off with two day-long pre-conference workshops, Metadata Primer and Metadata and Search, both of which were enthusiastically attended.
Pre-Conference Workshop: Metadata Primer
This session covered the basics of metadata and laid the groundwork for attendees who were not fully familiar with all aspects of metadata and Dublin Core. Following is a summary of the terms discussed in the Metadata Primer, their meaning, and how they relate to metadata or Dublin Core.
Metadata: Information about information; information or names we come up with to encapsulate and define information to understand it and to use it.
- Metadata lives in databases, spreadsheets, or XML (Extensible Markup Language) structures. Databases have tables, fields, and values; metadata in XML has elements, attributes, and values. XML is good for giving metadata a tree structure, which offers a different way of looking at information than a spreadsheet does.
- Metadata can identify, manage, and describe. It makes information usable by others (both machines and humans).
- Dublin Core Metadata provides a standard wrapper for a resource—a rich description. It provides information such as what a resource is and what’s in it.
- A metadata schema creates the rules by which certain metadata are used on an information collection.
- An XML schema defines the rules for how elements, such as the Dublin Core elements, may be used in a document. An XML instance is the document you actually write, containing the values.
- Uniform Resource Identifiers (URIs) are strings of characters that identify resources on a network. A URL is one kind of URI, used on the web; other URI schemes serve other applications.
- XML carries the structure of information, along with its content, when it is exchanged over the web.
- A namespace declares the vocabulary (system) in use and points to a resource that provides a “key” for deciphering that system.
- RDF, or Resource Description Framework, supports a system of relationships between objects, allowing for the management of meaning, not just data. RDF is a particular way of expressing metadata.
- An application profile is a metadata schema that is created or tailored for a specific use. In XML/RDF notation, it typically lists the metadata schemas it draws on at the top of the file, followed by the elements it uses and their descriptions.
- Registries are databases of metadata schemas.
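To make a few of these terms concrete, here is a minimal Python sketch that builds a simple Dublin Core record as XML. It shows elements and their values, plus a namespace declaration that points at the DCMI element set URI (http://purl.org/dc/elements/1.1/). The wrapper element name `metadata`, the function `build_dc_record`, and the sample values are hypothetical. A real application profile or XML schema would define the actual container element and constraints.

```python
import xml.etree.ElementTree as ET

# Namespace URI for the 15 Dublin Core elements (DCMI Element Set 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(values: dict[str, list[str]]) -> ET.Element:
    """Wrap a resource's descriptive values in Dublin Core elements.

    `values` maps an element name (e.g. "title") to one or more values,
    since DC elements are optional and repeatable.
    """
    # "metadata" is a hypothetical container element, not a DCMI term.
    root = ET.Element("metadata")
    for element, items in values.items():
        for item in items:
            # Each child is a namespaced element whose text is the value.
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = item
    return root


record = build_dc_record({
    "title": ["Example report"],
    "creator": ["Example Author", "Second Author"],
    "date": ["2003"],
    "language": ["en"],
})
# The namespace is declared once on the root; elements use the dc: prefix.
print(ET.tostring(record, encoding="unicode"))
```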
After completing this workshop, most attendees were well-versed in metadata, its uses, and some of the issues to be addressed at the conference to follow.
Pre-Conference Workshop: Metadata and Search
This pre-conference workshop was a great mix of industry experts, case study presentations, and vendors talking about their wares. Because it was hosted by the DC-Global Corporate Circle working group, many of the presenters came from corporate settings. The most dynamic and charged portions of the workshop were the questions and answers regarding the whole lifecycle of metadata: establishment, quality, tools, workflow, and usage. Many of the concerns voiced during these question and answer sessions echoed concerns raised at previous IA Summits about value-added service, ROI, and marketing/selling value. It was not surprising that such concerns, related to information architecture and better information practices, would spill over into this type of forum.
Notes from the workshop: http://www.dublincore.org/groups/corporate/Seattle/
Mary Lee Kennedy, Director of the Knowledge Network Group, Microsoft
The opening plenary was an interesting overview of the inner workings of Microsoft’s Knowledge Network. Mary Lee Kennedy, the director of the Knowledge Network group, presented the group’s work to meet employees’ information needs through an integrated enterprise information architecture. Those needs include:
- The right quantity and quality of information
- Relevance and sufficient context of information
- A mechanism to establish trust in the information authority
- A way to identify who else has knowledge
- The ability to gather information from many starting points in the system
- The ability to learn about new resources as they become relevant and available
Many aspects of this project have been documented in previous conferences, such as IA Summit 2001.
Encoding DC in (X)HTML, XML, RDF
Andy Powell, UKOLN
The purpose of this tutorial was to introduce attendees who were new to Dublin Core to the options for encoding the metadata standard in (X)HTML, XML, and RDF. Andy Powell reviewed some basic information about DC, though the tutorial assumed that attendees were familiar with the proper usage of DC. Powell briefly reviewed the concept of the “abstract model,” which was discussed further at the Infrastructure Working Group.
Powell noted that your choice of encoding basically depends on the application that will be using the metadata. Without making any judgment on the encoding scheme, he reviewed how DC would be set up in each encoding type and offered caveats, limitations, and tips for how it would be handled. He gave lots of great examples in the presentation and provided some sample code in the appendix of his handout. Powell led the group through some of these samples, including simple errors that would be easy to miss when encoding DC. His recommendations for approaching the encoding were very helpful and very practical.
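As a rough illustration of how one description can be encoded in different ways, the sketch below renders the same record both as (X)HTML `<link>`/`<meta>` elements, using the `schema.DC` and `DC.element` naming convention from DCMI's guidance on expressing Dublin Core in HTML, and as RDF/XML. It is not taken from Powell's handout. The function names, sample values, and resource URI are hypothetical. The sketch escapes values because an unescaped `&` or quotation mark in a value is one easy-to-miss way to produce malformed output.

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_xhtml_meta(record: dict[str, list[str]]) -> str:
    """Encode DC as XHTML <link>/<meta> elements for a page's <head>."""
    # The link element binds the "DC" prefix used in the meta names.
    lines = [f'<link rel="schema.DC" href="{DC_NS}" />']
    for element, values in record.items():
        for value in values:
            # quote=True escapes &, <, >, and both quote characters.
            content = html_escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}" />')
    return "\n".join(lines)


def to_rdf_xml(about: str, record: dict[str, list[str]]) -> str:
    """Encode DC as RDF/XML describing the resource identified by `about`."""
    about_attr = xml_escape(about, {'"': "&quot;"})
    lines = [
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">',
        f'  <rdf:Description rdf:about="{about_attr}">',
    ]
    for element, values in record.items():
        for value in values:
            lines.append(f"    <dc:{element}>{xml_escape(value)}</dc:{element}>")
    lines += ["  </rdf:Description>", "</rdf:RDF>"]
    return "\n".join(lines)


record = {"title": ["Metadata & Search"], "creator": ["Example Author"]}
print(to_xhtml_meta(record))
print(to_rdf_xml("http://example.org/report", record))
```

The record is kept as one neutral dictionary so that the choice of encoding stays separate from the description itself. This echoes Powell's point that the encoding depends on the application that will consume the metadata.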
Working Group: Global Corporate Circle
Joseph A. Busch
The Global Corporate Circle Working Group has evolved as Dublin Core has been implemented more widely. Dublin Core guidelines are pretty well established, but they are not always clear to the average practitioner. This panel discussed what additional features would be necessary to apply Dublin Core to specific domains, a topic that is still largely open to discussion.
The panel put forth some suggestions:
- Individuals working with metadata in corporate environments should share elements and values for those elements. The panel noted that there is a need to manage both internal and external information.
- Practitioners must be able to speak to application developers, both internal developers and vendors. (A number of vendors at the conference were trying to understand how Dublin Core is being used, as well as what their constituents and customers need.)
- Best practices should be shared around how Dublin Core is being implemented in the corporate environment and what people have learned as they have taken on and accomplished these implementation projects.
- It is important to define the unique needs of the corporate community that distinguish it from other communities who use and implement metadata.
Based on the panel’s discussion, the group decided to:
- Gather some best practices and use cases.
- Sponsor a conference track at next year’s DCMI conference.
- Put together a workshop similar to the Metadata & Search workshop that was held at this conference.
At the end of the session, officers for the working group were identified and tasks were assigned.
Neil McLean, Metadata for education
Gottfried Zimmermann, Metadata for appliances
Wendy Chisholm, Metadata for accessibility
Eric Miller, Metadata and semantics
Stephen Stead, Metadata and ontologies
This special session looked at a few non-traditional uses of DC. Areas represented included education, museums, and hardware technology. In these disciplines, metadata, both simple and complex, has been used to describe materials, describe transactions, and semantically link resources. The first presenter, speaking on metadata for education, raised several considerations:
- Expressing the complexities of material for both dynamic and static content.
- Realizing that content has both a permanent and a transient nature.
- Noting that content and services differ, and that both people and content need to be expressed in the metadata.
The next presenter looked at the potential use of DC in a museum setting, in which objects are viewed from a conceptual reference. He proposed a more event-centric model, bringing together relational history and object models with the use of ontologies. The presenter mentioned that DC was limited because it could not support the dynamic nature of ontologies. By using RDF and ontologies, he hoped to dynamically create conceptual references among a body of museum objects as a way to provide a better context for the objects.
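The event-centric idea can be sketched with plain subject-predicate-object triples, the same shape RDF statements take. In this minimal Python sketch, offered only as an illustration, objects are linked to events rather than directly to each other, so context emerges from shared events. The identifiers and predicate names (`participated_in`, `carried_out_by`, `took_place_at`) are hypothetical and are not the presenter's ontology.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("vase_42", "participated_in", "excavation_1921"),
    ("coin_7", "participated_in", "excavation_1921"),
    ("excavation_1921", "carried_out_by", "archaeologist_a"),
    ("excavation_1921", "took_place_at", "site_x"),
    ("vase_42", "participated_in", "acquisition_1925"),
    ("acquisition_1925", "carried_out_by", "museum_b"),
]


def events_of(obj: str, triples: list[Triple]) -> set[str]:
    """Events that an object took part in."""
    return {o for s, p, o in triples if s == obj and p == "participated_in"}


def related_objects(obj: str, triples: list[Triple]) -> list[str]:
    """Other objects that share at least one event with `obj`."""
    events = events_of(obj, triples)
    return sorted(
        {s for s, p, o in triples if p == "participated_in" and o in events and s != obj}
    )


def event_context(obj: str, triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """For each event involving `obj`, the statements made about that event."""
    return {
        event: [(p, o) for s, p, o in triples if s == event]
        for event in sorted(events_of(obj, triples))
    }


print(related_objects("vase_42", TRIPLES))  # ['coin_7']
print(event_context("vase_42", TRIPLES))
```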
Gottfried Zimmermann presented on the use of DC as the base metadata for physical hardware, enabling various appliances to communicate with each other. His example was a handheld device able to communicate with an ATM over an open protocol that used DC as its metadata standard. One advantage of this type of protocol was that devices could openly communicate with each other when the need arose, rather than relying on proprietary protocols limited to certain vendors and products.
More information can be found about this draft ANSI (American National Standards Institute) standard here: http://www.ncits.org/study/docs/ita00006.htm.
Paper Session: Use—Access, Needs & Users
Sheila Denn, University of North Carolina
Chen, National Digital Archives Program
Francesconi and Peruginelli
The focus of this paper session was how user-centered design could inform the development of metadata systems. The papers ranged from user needs analysis to how systems were used and how they are changing to reflect what users need and want to support their work in education, legal research, statistical analysis, and historical archives research.
Statistical Metadata Needs during Integration Tasks
Sheila Denn from the University of North Carolina presented on the use of metadata to manage statistical data. The research project Denn described involved government data, with components for user interface, information management, and usability of the data for all users, not just information professionals. The project team had to address both data experts and typical end-users visiting the government website. The method for this project involved understanding users’ needs by identifying use case scenarios, gaining domain knowledge of how the information could be used, understanding the coverage of the data, interpreting the data, and evaluating the navigation and layout of information.
The study found that there was not enough information to allow users to use the statistics properly in an integrated setting. For instance, when data from various repositories were brought together to tell a story, users needed to know how each source covered definitions, concepts, sources, location, time, units of measurement, and demographics. The project team discovered that discrepancies in these facets affected the analysis of the information.
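One way to surface discrepancies like these before combining datasets is to compare their descriptive metadata facet by facet. The sketch below is a hypothetical illustration, not the project team's tool. The dataset names, facet keys, and values are invented, and a real system would draw this information from a standard such as DDI rather than from plain dictionaries.

```python
from collections import defaultdict

# Facets named in the study as affecting integrated analysis (keys are illustrative).
FACETS = ("definition", "concept", "source", "location", "time", "unit", "demographic")


def find_discrepancies(
    datasets: dict[str, dict[str, str]],
    facets: tuple[str, ...] = FACETS,
) -> dict[str, dict[str, list[str]]]:
    """Report facets on which datasets disagree or lack documentation.

    Returns {facet: {value: [dataset names]}} for each problem facet;
    a missing value is grouped under "<undocumented>".
    """
    report: dict[str, dict[str, list[str]]] = {}
    for facet in facets:
        values = {name: meta.get(facet) for name, meta in datasets.items()}
        distinct = set(values.values())
        # Flag the facet if values differ or any dataset leaves it blank.
        if len(distinct) > 1 or None in distinct:
            grouped: dict[str, list[str]] = defaultdict(list)
            for name, value in values.items():
                grouped[value if value is not None else "<undocumented>"].append(name)
            report[facet] = dict(grouped)
    return report


surveys = {
    "survey_a": {"unit": "dollars", "time": "2000", "location": "state"},
    "survey_b": {"unit": "thousands of dollars", "time": "2000", "location": "county"},
}
for facet, groups in find_discrepancies(surveys, ("unit", "time", "location")).items():
    print(facet, groups)
```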
The project team faced other challenges beyond data integration:
- Information about the data was lacking.
- There were no means of describing limitations for how data could be analyzed.
- Users’ knowledge was limited.
- There were inconsistencies in interface design.
- Data labeling was inconsistent.
The team aimed to use metadata, such as DDI (Data Documentation Initiative), to help wrangle the data. Over time they plan to include additional domain-specific elements to refine their metadata system.
Functional Requirements of Metadata Systems: From User Needs Perspective
Chen from Taiwan’s National Digital Archives Program looked at three disciplines of current interest to her institution: arts, sciences, and biodiversity. Her organization is involved in the technical aspects of metadata and digital libraries. The program’s project was to establish user-centered requirements to develop a metadata repository. These requirements, and the system they would produce, would align with four dimensions for users: the object, the experts, the lifecycle, and the use.
The group realized that their needs went beyond a conventional relational database. Their mission was to identify the needs for a metadata system, bridge the relationship between content (creators, managers) and technology (tools, process, software, hardware), and determine how the two fit together as a unified system.
From the group’s perspective, there were practical operational relationships among the software, the system, and the users that needed to be considered. Through a series of interviews and questionnaires, they were able to establish a baseline for needs, as well as forge relationships with stakeholders. By analyzing the needs, they identified six categories of metadata with 32 required elements. (More details can be found in their paper.) They were able to establish this core set of metadata for the three disciplines. The group treated an element as required if it was deemed absolutely necessary in more than a certain number of projects.
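The selection rule described above, in which an element becomes required when it is deemed necessary in more than some number of projects, can be sketched as a simple count. The threshold, project names, and element names below are hypothetical. The paper's actual cutoff and its six categories of 32 elements are not reproduced here.

```python
from collections import Counter


def required_elements(necessary: dict[str, set[str]], min_projects: int) -> list[str]:
    """Elements judged necessary in more than `min_projects` projects.

    `necessary` maps each surveyed project to the set of elements its
    stakeholders said were absolutely necessary.
    """
    counts = Counter()
    for elements in necessary.values():
        counts.update(elements)  # each project counts an element at most once
    return sorted(element for element, n in counts.items() if n > min_projects)


survey = {
    "arts_project": {"title", "creator", "date", "rights"},
    "science_project": {"title", "creator", "date", "specimen_id"},
    "biodiversity_project": {"title", "date", "taxon"},
}
print(required_elements(survey, min_projects=1))  # ['creator', 'date', 'title']
```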
The group is still trying to resolve some challenges, such as coverage of functions, gaps between system implementation and requirements, metadata systems integration with external sources of content, and the exchange or transfer of metadata across institutions.
Access to Italian Legal Literature: Integration between Structured Repositories and Web Documents
Francesconi and Peruginelli presented the latest developments of the Institute of Legal Information Theory and Technologies, Italian National Research Council. The Institute is in the process of establishing a portal for both structured information and web documents. The portal serves as a repository of legal material that integrates the access and presentation of the information.
Part of the group’s process was mapping bibliographic records to Simple DC. They developed modules to populate web documents with Dublin Core. With the metadata in place, they were able to build a fairly sophisticated classifier tool based on Bayes’ theorem to organize information in the portal. Much of Francesconi and Peruginelli’s presentation revolved around measuring how successfully this classifier correlated the documents. They used Dublin Core as a common view of the disparate data and content. They also looked to the Open Archives Initiative for gathering metadata from various legal repositories outside the portal. Using the Bayesian classifier and the metadata standard, they were able to create a federated portal for legal content.
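For readers unfamiliar with Bayesian text classification, here is a minimal multinomial naive Bayes sketch that classifies records by the words in their DC title, subject, and description fields. It is a generic textbook technique with add-one smoothing, not the Institute's classifier. The choice of fields, the category labels, and the training records are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# DC fields used as classifier input (an assumption for this sketch).
FIELDS = ("title", "subject", "description")


def tokens(record: dict[str, str]) -> list[str]:
    """Lowercased alphabetic tokens drawn from the chosen DC fields."""
    text = " ".join(record.get(field, "") for field in FIELDS)
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, records: list[dict[str, str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for record, label in zip(records, labels):
            self.word_counts[label].update(tokens(record))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, record: dict[str, str]) -> str:
        total = sum(self.class_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokens(record) if w in self.vocab]
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(word | label)
            score = math.log(n / total)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training = [
    ({"title": "Contract law and commercial obligations"}, "civil"),
    ({"title": "Liability in sales contracts", "subject": "contract"}, "civil"),
    ({"title": "Criminal procedure and sentencing"}, "criminal"),
    ({"title": "Penalties for fraud", "subject": "criminal offence"}, "criminal"),
]
model = NaiveBayes().fit([r for r, _ in training], [label for _, label in training])
print(model.predict({"title": "Breach of contract remedies"}))  # civil
```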
Metadata for the Project Management Community
Dr. Robert Allen, University of Maryland, College of Information Studies
Barbara Richards, Food and Agriculture Organization of the United Nations
David Briggs, Boeing Company
This group was initially brought together to discuss the possibility of formulating some metadata elements that could be used for project management. Three case studies were presented, and there was a discussion following.
The session began with the premise that some project management metadata could be defined and used across different information domains. However, as each presenter spoke, it became apparent that the metadata necessary for each project was closely tied to its subject domain (aerospace, engineering, and agriculture). It did not seem practical to try to identify common metadata that could be leveraged across all the projects.
The ultimate goals of the project were to create a single repository of projects, to support organizational learning and reduce duplication of effort, and to manage resources.
The NASA environment is highly distributed, with no single repository for project information. This presentation focused on the attempt to collect, in a single database, information about all projects at NASA. This was an impossible task; even collecting all projects just within the engineering group was a huge project. The task was difficult because of the complexity of defining a “project.”
Projects spawn other projects. Some projects last for months, some for years. There is a parent/child relationship to many projects. Also, security is an issue — there are many levels of secure access to sensitive information. In the NASA culture, there is a belief that if each project keeps its own information, it will better allow for control over the sensitive stuff. The presenter believes that this is actually not true.
There was a concerted effort to educate the organization about the idea of a central repository. A central repository with well-defined levels of access and clear definitions of the sensitivity of, and access to, information was a more practical approach from both a resource management and a security perspective.
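A central repository with defined access levels and parent/child projects can be sketched as a small data model. This is a hypothetical illustration only. The access levels, field names, and `Repository` class are assumptions and do not describe NASA's system. A real deployment would enforce access in the storage layer rather than in a filter function.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    """Hypothetical sensitivity levels; higher values are more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass
class Project:
    project_id: str
    title: str
    access: Access
    parent_id: Optional[str] = None  # projects can spawn child projects


class Repository:
    """A single store of project records, filtered by the reader's clearance."""

    def __init__(self) -> None:
        self._projects: dict[str, Project] = {}

    def add(self, project: Project) -> None:
        self._projects[project.project_id] = project

    def visible_to(self, clearance: Access) -> list[Project]:
        # A reader sees every project at or below their clearance level.
        return [p for p in self._projects.values() if p.access <= clearance]

    def children(self, project_id: str, clearance: Access) -> list[str]:
        """IDs of visible child projects of `project_id`."""
        return [p.project_id for p in self.visible_to(clearance) if p.parent_id == project_id]


repo = Repository()
repo.add(Project("P1", "Wind tunnel upgrade", Access.PUBLIC))
repo.add(Project("P1.1", "Sensor redesign", Access.INTERNAL, parent_id="P1"))
repo.add(Project("P1.2", "Test results", Access.RESTRICTED, parent_id="P1"))
print(repo.children("P1", Access.INTERNAL))  # ['P1.1']
```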
Food and Agriculture Organization
A central repository for all projects in the UN’s Food and Agriculture Organization, from both developing and developed countries, proved to be immensely valuable, especially for developing countries. Each project captured its methodology as well as lessons learned that other developing countries could use when planning future projects. Leveraging existing knowledge was key. The metadata required for this repository was presented. The overall message seemed to be that the needs of the intended audience drove what information was captured and how it was managed with metadata. Dublin Core was mentioned mainly as a starting point that was then augmented as needed for the project’s information management needs.
For Boeing, project management-related information was crucial for a number of reasons. First, government regulations required that information about all phases of a project be retained for the life of the project (which could easily be 70-100 years). There were also considerations around gathering and keeping information for use in any legal matters. One of Boeing’s key challenges was changing technology: because products remain in service for so long, technology can change multiple times within a single product’s life span. There was a move, therefore, toward digitizing all information. It was a costly effort, but ultimately necessary because of the complexity of Boeing’s information management issues.
The three presenters agreed that all projects, regardless of their subject domain, have some similarities. Projects are types of events, defined as an individual or collaborative enterprise, carefully planned with the purpose of achieving a particular goal.
The presenters also agreed that each project used a similar process for the development, identification, and application of metadata. The information covered in this session could be gathered and posted as a set of “best practices” for using Dublin Core metadata in a project management capacity.
The group agreed that the similarities lie within the project methodology rather than similar metadata for the subject domain of a project. Gathering project methodology information around metadata would be of benefit to the greater community to provide learning around project management.
Based on the discussion that followed the presentations, it seemed that most, if not all, projects were quite different in their goals as well as their subject domain. However, similar project methodology was used for all three projects. It might serve as a useful exercise to examine some of the popular project management software to glean some potential project metadata suggestions and compile those suggestions to be circulated within the special interest group for comments and refinement. The group could then present these suggestions to the larger Dublin Core community to provide a basis for some best practices for project management.
Metadata Value Spaces
Marcia Zeng, Kent State University
Rachel Heery, UKOLN (UK Office for Library and Information Networking)
Traugott Koch, Lund University NetLab, Knowledge Technologies Group
Increasingly, different groups are using different sets of values for the same metadata elements. The discussion focused on two main questions: What should be explored? What’s out there? Work is under way to establish standards in different information domains, and some systems of use are already established.
A single controlled vocabulary (CV) or thesaurus is not adequate for everyone’s use. There’s an overwhelming level of complexity that makes a single taxonomy unwieldy. Also, terms in different domains are used differently. It is important to match thesauri to users’ terms, hence the need for multiple CVs and thesauri, even within a single organization or field of practice.
The session presenters recommended using controlled vocabularies, even where a very large vocabulary is possible. They suggested drawing from the following sources when creating a CV:
- Established CVs or classification schemas (LCSH, Dewey Decimal, etc.)
- Standardized vocabularies (e.g., for DC Type, Format, and Language)
- Name authority files (e.g., the Getty Thesaurus of Geographic Names)
- Controlled lists
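In practice, drawing on a controlled list usually means validating incoming values and mapping users' own terms to preferred terms. The sketch below uses a small subset of the DCMI Type Vocabulary as the controlled list. The entry-term mappings (for example, treating "spreadsheet" as Dataset) and the function name are hypothetical choices made for illustration.

```python
# A small subset of the DCMI Type Vocabulary, used here as the controlled list.
DC_TYPES = {"Dataset", "Image", "Software", "Sound", "Text"}

# Hypothetical entry terms: words users might type, mapped to preferred terms.
ENTRY_TERMS = {
    "article": "Text",
    "report": "Text",
    "photo": "Image",
    "picture": "Image",
    "audio": "Sound",
    "spreadsheet": "Dataset",
}


def normalize_type(value: str) -> str:
    """Return the preferred controlled term for `value`, or raise ValueError."""
    if value in DC_TYPES:
        return value
    preferred = ENTRY_TERMS.get(value.strip().lower())
    if preferred is None:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return preferred


print(normalize_type("Text"))     # Text
print(normalize_type(" Photo "))  # Image
```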
One problem the group noted is that many established CVs are not created for networked environments. The presenters also discussed the need for extensibility and scalability in CVs.
For syntax, there is the pre-coordination vs. post-coordination issue: Does the system match up terms for complex meaning, or do we leave that to the user doing the searching?
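The pre-coordination versus post-coordination trade-off can be shown with a toy index. In the pre-coordinated index, compound headings are assigned when documents are indexed. In the post-coordinated index, documents carry single-concept terms and the searcher combines them. The document IDs, terms, and "--" heading syntax are invented for illustration.

```python
# Pre-coordinated: the indexer assigns compound headings up front.
PRE_INDEX: dict[str, set[str]] = {
    "doc1": {"Metadata--Standards"},
    "doc2": {"Metadata--Education"},
    "doc3": {"Standards--Education"},
}

# Post-coordinated: each document carries single-concept terms.
POST_INDEX: dict[str, set[str]] = {
    "doc1": {"metadata", "standards"},
    "doc2": {"metadata", "education"},
    "doc3": {"standards", "education"},
}


def search_pre(heading: str) -> list[str]:
    """Match only documents indexed under exactly this compound heading."""
    return sorted(doc for doc, headings in PRE_INDEX.items() if heading in headings)


def search_post(*terms: str) -> list[str]:
    """Match documents carrying all requested terms (combined at search time)."""
    wanted = set(terms)
    return sorted(doc for doc, doc_terms in POST_INDEX.items() if wanted <= doc_terms)


print(search_pre("Metadata--Standards"))     # ['doc1']
print(search_pre("Standards--Metadata"))     # [] (combination not anticipated)
print(search_post("metadata", "standards"))  # ['doc1']
print(search_post("metadata"))               # ['doc1', 'doc2']
```

The pre-coordinated index answers only the combinations the indexer anticipated, while the post-coordinated one shifts the work of combining concepts to the person searching.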
The thing to remember, the presenters said, is that we are dealing with words, concepts, and their relationships. How we represent, standardize, and try to control them for organization purposes is constantly changing.
Working Group: Education
This working group is actually a neat cross-pollination opportunity for metadata and its use in education. While many in the audience came from the K-12 arena, there were definitely many who were involved in establishing relationships with other organizations also trying to tackle metadata for learning objects, learning activities, and content sharing.
The main focus of this session was the purpose of the working group, and the discussion led to an agreement that the group’s mission needed to be re-evaluated. Because the group already had an application profile in development, in collaboration with the IMS/IEEE Learning Object Metadata initiative and other learning-related metadata efforts, it agreed to continue discussions around this type of work.
Links to abstracts related to posters: http://dc2003.ischool.washington.edu/program.html#posters
Overall, the conference showed great enthusiasm for using standardized metadata to support future interoperability among institutions such as libraries, government agencies, and corporations. It was interesting to hear members of the DC community acknowledge that they needed more awareness of how DC was being used in non-traditional settings and ways, such as in corporations, with hardware, and in conjunction with other standards. It will be exciting to see Dublin Core and other metadata standards find common ground with the information architecture community.
Next year Dublin Core will be taking its show to Asia. See you in Shanghai, China, in the fall of 2004. Stay tuned for opportunities related to calls for participation.
- Dublin Core
- Dublin Core 2003 Conference
- Dublin Core 2003 Proceedings
- More information about Dublin Core 2004 can be found at the main Dublin Core website.
- Madonnalisa Gonzales-Chan wrangles the volunteers for Boxes & Arrows. During the day she is Metadata Services Manager at Stanford Graduate School of Business. Her primary role is to provide guidance on information management best practices for applications and websites developed at Stanford GSB. She is also involved in metadata and taxonomy development initiatives at GSB.
Before joining Stanford, she was an Information Architect at AltaVista, developing information and interaction architectures for AltaVista’s LIVE! portal and search services. Her past work experience includes a short stint as a volunteer on the re-architecture of the San Jose Repertory website, Academic Technology Specialist for the English Department at Stanford University, and Systems Librarian at Innovative Interfaces Inc.
Madonnalisa has an M.S. in Library and Information Science from the University of Illinois, Urbana-Champaign and a B.A. in English Literature from the University of California, Irvine.
In her free time she moonlights as a karaoke singer, rides “shotgun” on the tracks of Laguna Seca Raceway, and dabbles in finding the meaning of life through calculus. Oh, by the way, don’t hesitate to ask her about her name.
- Sarah Rice has her own consulting practice, Seneb Consulting, and has worked with clients such as Sun Microsystems, PeopleSoft, VeriSign, and National Semiconductor. She specializes in information complexity and regularly applies information science principles and methodologies to reduce complexity in content-heavy information environments. She has a degree in Library and Information Science and has been practicing information architecture since 1995. Her website is www.seneb.com.
When not practicing IA, Sarah spends her spare time chasing after two small kids and attempting to reduce complexity on a whole different level.