Can We Agree?

One of the interesting ingredients for success in several current metadata projects is agreement across communities about what metadata are important for various use cases. In an earlier blog, I introduced the idea that metadata recommendations provide descriptions of which documentation concepts communities or organizations believe are important. These recommendations provide an opportunity to identify similarities and differences between community beliefs.

We have collected recommendations from 10-20 organizations and communities as part of an NSF Project aimed at evaluating metadata collections in various dialects with respect to these recommendations. Many of these recommendations include several levels with names that reflect the importance of the concept. Our collection of 73 recommendations includes 26 that are mandatory, required, or core and 35 that are recommended, suggested, or optional. These recommendations are listed in Tables 1 and 2 along with the number of concepts that they include. More details and crosswalks of the recommendations to various dialects are available on the Earth Science Information Partners (ESIP) wiki.

Recommendation Name Count
OGC Catalog Services for the Web (CSW) Core Queryables 10
OGC Catalog Services for the Web (CSW) Core Returnable Properties 15
NASA/ESDIS Common Metadata Repository (CMR) - Collection Required 15
NASA/ESDIS Common Metadata Repository (CMR) - Common Required 13
NASA/ESDIS Common Metadata Repository (CMR) - Granule Required 3
NASA/ESDIS Common Metadata Repository (CMR) - Variable Required 12
DataCite Metadata Schema for the Publication and Citation of Research Data - Mandatory 8
DataCite_4_Mandatory 7
Dataset Descriptions: HCLS Community Profile – Distribution - Required 7
Dataset Descriptions: HCLS Community Profile – Summary - Required 5
Dataset Descriptions: HCLS Community Profile – Version - Required 7
DCAT for Data Discovery - Mandatory 8
Directory Interchange Format for Data Discovery - Required 8
Dryad Metadata Application Profile - Data File Module - Required 13
Dryad Metadata Application Profile - Data Package Module - Required 10
NASA EOS Clearing House (ECHO) for Data Discovery - Mandatory 9
FGDC for Data Discovery - Mandatory 17
FGDC for Data Discovery - Mandatory if Applicable 8
FGDC for Data Understanding - Mandatory if Applicable 24
Interdisciplinary Earth Data Alliance (IEDA) Recommendation 4
ISO-1 for Data Discovery - Mandatory 6
ISO-1 for Service Discovery - Mandatory 6
Service Entry Resource Format (SERF) for Service Discovery - Required 6
NASA/ESDIS Unified Metadata Model (UMM) - Collection Required 14
NASA/ESDIS Unified Metadata Model (UMM) - Common Required 18
NASA/ESDIS Unified Metadata Model (UMM) - Granule Required 5

Table 1. Required / Mandatory / Core Recommendations

Recommendation Name Count
Attribute Convention for Data Discovery - Highly Recommended 3
Attribute Convention for Data Discovery - Recommended 27
Attribute Convention for Data Discovery - Suggested 5
NASA/ESDIS Common Metadata Repository (CMR) - Collection Recommended 18
NASA/ESDIS Common Metadata Repository (CMR) - Granule Recommended 28
DataCite Metadata Schema for the Publication and Citation of Research Data - Recommended 11
DataCite_4_Recommended 11
Dataset Descriptions: HCLS Community Profile - Distribution - Suggested 6
Dataset Descriptions: HCLS Community Profile - Summary - Suggested 2
Dataset Descriptions: HCLS Community Profile - Version - Suggested 5
Directory Interchange Format for Data Discovery - Highly Recommended 20
NASA EOS Clearing House (ECHO) for Data Discovery - Recommended 18
EOS Core System for Data Discovery - Recommended 39
Interdisciplinary Earth Data Alliance (IEDA) Recommendation 12
Service Entry Resource Format (SERF) for Service Discovery - Highly Recommended 15
NASA/ESDIS Unified Metadata Model (UMM) - Collection Highly Recommended 3
NASA/ESDIS Unified Metadata Model (UMM) - Collection Recommended 24
NASA/ESDIS Unified Metadata Model (UMM) - Granule Recommended 11
WSDL for Web Service Description 6

Table 2. Recommended / Suggested Recommendations

This is clearly a mixed bag of recommendations from many sources, developed for different dialects with many goals. Many, but not all, were developed with discovery of Earth science (or other) datasets in mind. Those listed in italics in Table 1 seemed different enough to drop them from further consideration.

There is a lot of wiggle room when it comes to translating these recommendations to a unified set of concepts. Some concepts are easy, e.g. resource title, and some are very fuzzy or implemented in very different ways, e.g. resource quality. Despite these obvious challenges, we forged forward.

Together, the required/mandatory/core recommendations include sixty-five (65) concepts and the recommended/suggested/optional recommendations include 148. This is consistent with the general practice of minimizing the number of required metadata elements in recommendations. We are interested in how many of these concepts are shared across recommendations, i.e. what level of agreement exists across the communities that created the recommendations.

Figure 1 shows the percentage of recommendations that share concepts. The small green bars between 70% and 100% indicate that one of the sixty-five (65) required concepts (Resource Title) is shared by 100% of the required/mandatory/core recommendations and one other (Abstract) is shared by 76%. Other than these two, every concept is shared by less than 50% of the recommendations (those in the 30-40% and 40-50% ranges are shown). The general pattern is the same for recommended concepts (light green bars), and the largest number of concepts in both groups occurs in less than 10% of the recommendations.

Figure 1 - Concepts
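
For readers who want to try this kind of comparison on their own recommendations, here is a minimal sketch of how the percentages behind a figure like this might be computed. It is an illustration only, not the code used in this project. The recommendation names, the concept sets, and the function names concept_share and histogram are all made up for the example, and it assumes each recommendation has already been translated to a common set of concept names.

```python
from collections import Counter


def concept_share(recommendations):
    """Return {concept: percent of recommendations that include it}."""
    total = len(recommendations)
    # Count each concept once per recommendation that lists it.
    counts = Counter(
        concept
        for concepts in recommendations.values()
        for concept in set(concepts)
    )
    return {concept: 100.0 * n / total for concept, n in counts.items()}


def histogram(shares, bin_width=10):
    """Count concepts in each percentage bin (0-10%, 10-20%, ...).

    A concept shared by exactly 100% is placed in the top bin.
    """
    n_bins = 100 // bin_width
    bins = [0] * n_bins
    for pct in shares.values():
        index = min(int(pct // bin_width), n_bins - 1)
        bins[index] += 1
    return bins


# Hypothetical toy data: four made-up recommendations, not the real collection.
recs = {
    "Rec A": {"Resource Title", "Abstract", "Spatial Extent"},
    "Rec B": {"Resource Title", "Abstract"},
    "Rec C": {"Resource Title", "Resource Quality"},
    "Rec D": {"Resource Title", "Abstract", "Keyword"},
}

shares = concept_share(recs)
bins = histogram(shares)

for concept, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{concept}: {pct:.0f}%")
print(bins)
```

In this toy example Resource Title is shared by all four recommendations and Abstract by three of the four, while the remaining concepts each appear in only one. Most of the work in practice is in the translation to common concept names, which is where the wiggle room described above comes in.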

Even given the many significant caveats described above, this result suggests that cross-community consensus about what metadata are most important (i.e. required) may be elusive. Of course, documentation concepts that are not included in mandatory recommendations can still be included in the dialects associated with those recommendations, even though they are not recommended for some reason. Some of these may be “recommended” instead of “required”, although the data show a similar pattern for recommended concepts. Some elements may not make sense across disciplines, e.g. spatial extent is critical for geospatial datasets, but may not be meaningful in lab experiment results.

Connecting metadata across disciplines and communities is an important step towards the Metadata 2020 goals. This blog introduces the idea that understanding recommendations and connections between them is a helpful part of that process. It is also important to understand how these recommendations might affect communities. There are many questions that we might explore using a diverse collection of community recommendations. If your community has recommendations and you are interested in comparing them to those in this collection, or if you have specific questions, please let us know.


About the author

Dr. Ted Habermann worked for years leading a variety of data management and access projects at NOAA’s National Geophysical Data Center. He is now the Director of Earth Science at The HDF Group. Ted is a well-known advocate for integrated data and metadata standards and leads ISO development efforts in metadata and data quality. He works with NASA’s Earth Science Data and Information System Project and many others on technical and organizational adoption of data and metadata standards. He is the Principal Investigator on an NSF Data Infrastructure Building Block project working with communities to help evaluate and improve metadata. Ted has been active in ESIP for many years, leading the Documentation Cluster and many sessions.