By Ted Habermann | Thu, Sep 21, 2017
Nomenclature – the devising or choosing of names for things; the body or system of names in a field.
For me, it started with earthquake prediction – trying to find signals – real signals. I needed consistent long-term datasets. I needed to find changes and then figure out if they were real… Turned out that most signals were changes in how things were measured… I needed good documentation.
Then came the web… Suddenly more data than we could imagine. “Metadata” was invented. At the Data Center, we told them that their data would be discovered if they gave us metadata…
Then came standards… and the xkcd story… CSDGM, DIF, ECHO, EML, ISO, DCAT… Since 2007, I have worked with NOAA, NASA and others on evolving and adopting the ISO TC211 Metadata Standards.
Now, working across communities… I have been working on an NSF Project aimed at helping communities evaluate and improve metadata in multiple dialects.
I’m excited about Metadata 2020. We all benefit from richer metadata. I thought it might be useful to start with nomenclature, the names of things.
Documentation – everything needed to reproduce a result.
Metadata – the structured and standardized subset of documentation. Used for discovery, access (humans and machines), use (humans and machines), and understanding of the data it describes.
Metadata dialects – concepts, definitions, and representations for metadata developed by communities and organizations. Often synonymous with standards, but the term emphasizes that many communities are really speaking the same language.
Concept – a general, dialect-independent term for describing a documentation entity, typically an element or attribute defined in the dialect representation. For example, title is a general documentation concept defined in many dialects and represented in different ways. Typically, the communities or organizations that develop dialects also develop metadata recommendations.
Recommendation – a set of guidance for metadata content created by a community or organization… Recommendations and dialects are different but, in practice, they are frequently related because groups create dialects and recommendations together.
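The definitions above can be sketched as a small crosswalk: each concept is dialect-independent, and each dialect represents it in its own way. This is only an illustrative sketch. The concept list, the paths, and the `translate` helper are hypothetical and are not taken from any real dialect; Dialect1 and Dialect2 are placeholder names.

```python
# Hypothetical crosswalk: concept -> where each dialect represents it.
# Dialect names and paths are invented for illustration only.
CROSSWALK = {
    "title": {"Dialect1": "/record/citation/title", "Dialect2": "/dataset/name"},
    "abstract": {"Dialect1": "/record/description/abstract", "Dialect2": "/dataset/summary"},
    "lineage": {"Dialect1": "/record/quality/lineage"},  # no Dialect2 representation
}


def translate(record, source, target):
    """Translate a flat {path: value} record from one dialect to another.

    Returns (translated_record, lost_paths), where lost_paths lists the
    source paths that have no representation in the target dialect.
    """
    # Reverse lookup: source-dialect path -> concept name.
    path_to_concept = {
        paths[source]: concept
        for concept, paths in CROSSWALK.items()
        if source in paths
    }
    translated, lost = {}, []
    for path, value in record.items():
        concept = path_to_concept.get(path)
        target_path = CROSSWALK.get(concept, {}).get(target)
        if target_path is None:
            lost.append(path)  # concept unknown or not shared with the target
        else:
            translated[target_path] = value
    return translated, lost


record = {
    "/record/citation/title": "Seismicity catalog",
    "/record/quality/lineage": "Compiled from regional networks",
}
translated, lost = translate(record, "Dialect1", "Dialect2")
```

In this made-up case the title carries over to Dialect2, while the lineage path is reported as lost because the crosswalk has no Dialect2 representation for it.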
The relationship between dialects and recommendations is illustrated in Figure 1. A community creates a dialect (Dialect1) and three recommendations (R1, R2, R3) that serve different needs or use cases, e.g. discovery, use, and understanding, or different levels, e.g. mandatory, recommended, and suggested. These recommendations contain different numbers of elements (mandatory recommendations are typically the smallest), and all the concepts in the recommendations are included in the dialect.
When another community creates a second dialect (Dialect2) with recommendations at two levels, e.g. mandatory and recommended (R4 and R5), there is typically overlap between the dialects (most often for discovery content) and between the recommendations, e.g. R1 and R4. Overlapping concepts can usually be translated from one dialect to another with minimal loss.
Figure 1. Metadata dialects and recommendations across communities
Finally, other organizations can create recommendations (R6) that are independent of dialect (i.e. purely conceptual). This recommendation includes concepts from Dialect1, Dialect2, R1, and R5.
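The relationships described around Figure 1 can be expressed as simple set operations. The sketch below is a minimal illustration under stated assumptions: the concept names placed in each dialect and recommendation are invented, and only the set relationships (recommendations contained in their dialect, overlap between dialects, a dialect-independent R6) follow the text.

```python
# Hypothetical concept sets; only the relationships mirror Figure 1.
DIALECT1 = {"title", "abstract", "keyword", "spatial extent", "lineage", "contact"}
DIALECT2 = {"title", "abstract", "keyword", "author", "license"}

# Recommendations created alongside Dialect1.
R1 = {"title", "abstract"}            # e.g. mandatory / discovery
R2 = {"keyword", "spatial extent"}    # e.g. recommended / use
R3 = {"lineage", "contact"}           # e.g. suggested / understanding

# Recommendations created alongside Dialect2.
R4 = {"title", "author"}              # e.g. mandatory
R5 = {"abstract", "keyword", "license"}  # e.g. recommended

# Dialect-independent recommendation drawing on both dialects.
R6 = {"title", "abstract", "lineage", "license"}

# Every concept in a community's recommendations is part of its dialect.
dialect1_covers_recs = all(r <= DIALECT1 for r in (R1, R2, R3))
dialect2_covers_recs = all(r <= DIALECT2 for r in (R4, R5))

# Overlap between dialects and between recommendations.
shared_concepts = DIALECT1 & DIALECT2
shared_mandatory = R1 & R4

# R6 fits in neither dialect alone, only in their union.
r6_in_union = R6 <= (DIALECT1 | DIALECT2)
r6_in_single_dialect = R6 <= DIALECT1 or R6 <= DIALECT2

print(sorted(shared_concepts))
print(sorted(shared_mandatory))
```

The shared concepts are the candidates for translation between dialects, while anything outside the intersection is where loss can occur.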
This nomenclature has been helpful while working with multiple Earth Science metadata communities over the last several years. It helps connect communities to a bigger picture through concepts that translate across dialects. It shows that these different communities share documentation needs and goals and that sharing metadata and ultimately data may be within reach. There are many examples described and connected on the Earth Science Information Partners (ESIP) wiki (Documentation Connections).
We are interested in helping communities find good metadata examples and using those examples as guidance for improvements across collections. We are looking for partners in continuing this work. If you are interested, please comment on this blog or send me an email (thabermann at HDFgroup.org). And please involve yourself in the Metadata 2020 campaign.
About the author
Dr. Ted Habermann worked for years leading a variety of data management and access projects at NOAA’s National Geophysical Data Center. He is now the Director of Earth Science at The HDF Group. Ted is a well-known advocate for integrated data and metadata standards and leads ISO development efforts in metadata and data quality. He works with NASA’s Earth Science Data And Information Systems Project and many others on technical and organizational adoption of data and metadata standards. He is the Principal Investigator on an NSF Data Infrastructure Building Block project working with communities to help evaluate and improve metadata. Ted has been active in ESIP for many years, leading the Documentation Cluster and many sessions.