The Metadata 2020 Personas
From its start, Metadata 2020 has engaged with the various communities that participate in metadata: Researchers; Publishers; Librarians; Data Publishers & Repositories; Services, Platforms & Tools; and Funders. In considering the flow of metadata between individuals and systems, we recognized that within these communities, people take on one or more personas depending on what they are doing at any given moment. Moreover, each persona's actions are much the same regardless of the community it belongs to. This role-based set of personas is therefore a useful construct for applying our project work.
Metadata 2020 defines four personas:
Creators provide descriptive information (metadata) about research and scholarly outputs.
Curators classify, normalize, and standardize this descriptive information to increase its value as a resource.
Custodians store and maintain this descriptive information and make it available to consumers.
Consumers knowingly or unknowingly use the descriptive information to find, discover, connect, and cite research and scholarly outputs.
The Metadata Personas
Some of the best creators of metadata are the people who created the content to which the metadata refers, i.e., the researchers. They know their work better than anyone else, so they are best placed to choose keywords. They also know most about how, when, and where their work was carried out; who else was involved; what resources (equipment, artifacts, etc.) were used; and more. But researchers aren't the only stakeholders who create metadata. Publishers also contribute by adding persistent identifiers (PIDs) for authors, reviewers, funders, and other organizations. In the future, grant identifiers could also be included, as well as identifiers for research resources such as laboratory equipment or special collections. In addition, publishers typically provide structural metadata (page numbers, type of work, publication dates, etc.) and work with third-party services to register DOIs (Digital Object Identifiers) for their content.
Collecting the data that makes up metadata is only part of the solution. Making it meaningful is just as important, and that includes making it consistent. PIDs play a role here again, as do taxonomies and standards, though neither is easy or straightforward to apply. For example, while some fields have well-established keyword taxonomies, others do not; and even where such taxonomies exist, they may not be embedded in research workflows. Even then, keywords alone can mislead: if a keyword list contains 'mouse' and 'human', how is one to know which is the subject of study? Standards are equally challenging. Getting all stakeholders to agree on even the most basic elements of a standard can take years, and getting that standard widely adopted can take even longer. Yet doing so could significantly improve metadata quality, and might even speed up the dissemination of knowledge.
Creation and curation of metadata are two key steps in the scholarly research pipeline. Once metadata has its home in databases, repositories, and catalogs, it comes under the stewardship of custodians: libraries, archives, repositories, and library service providers. Also included are the systems that enable creators and custodians to maintain the metadata they have contributed. Custodians are tasked with keeping this metadata and other information current, accessible, and discoverable, so custodians and curators must work in tandem. The bottom line for any metadata custodian is that their systems must be set up to ingest metadata correctly, distribute it efficiently, and ensure that any changes they make neither degrade its quality nor inhibit its use by consumers.
Whether we use metadata directly as individuals or consume it via artificial intelligence, we have a collective responsibility to ensure that it is the best it can be. This requires us to develop better ways of adding missing information and correcting erroneous metadata. We need to encourage our community to improve all metadata, both by making it easier to do so and by building a wider understanding of why this matters.