A new collaboration for richer, connected, and reusable metadata: welcome to Metadata 2020

I’ve been thinking and talking about Metadata 2020 for well over a year now, and we’ve run lots of workshops and met several times with a team of advisors, so this is a bit of a weird post to write. (And a bit nerve-wracking now that we’re making it official: there’s even a news release about the launch!) But here we are with three or four events under our belts and more planned, numerous interviews giving clear insights, dozens of supporters with plans for thousands, and some very ambitious goals:

To demonstrate why richer metadata should be the scholarly community’s top priority, how we can all evaluate ourselves and improve, and what can be achieved when we work harder, and work together.

What could we achieve by working together?

When I started talking to my colleagues and to others outside Crossref and publishing, I realised there is only so much a single part of the community can do alone. Various areas of research communications have collaborated before by founding and contributing to organizations like Crossref and DataCite. And there is a lot of support for standards bodies such as NISO (USA), organizations like Jisc (UK), and technical metadata groups like Dublin Core.

Each group (publishers, data repositories, funders, librarians, platforms, and researchers) has its own metadata initiatives: some technical, some talkative; some ongoing, some project-based; some global, some national. It’s clearly a challenge; the Digital Library Federation even runs a Metadata Support Group, acknowledging that “metadata is hard”.

But such groups don’t really tend to talk to each other either technically or strategically. As one of our interviewees said:

Standards are like toothbrushes. Everybody needs one but they prefer to use their own.

Standardizing standards?

Source: xkcd.com/927 (CC BY-NC 2.5)

Meanwhile, there are huge gaps in the metadata, and they propagate through thousands of downstream systems; we all suffer from mistyped, misplaced, or just plain missing metadata. We initiated Metadata 2020 to bring together all the relevant parties from around the world, air the grievances, understand the barriers, and then make it easier to reach and evaluate research outputs through better metadata.

At Crossref we’ve always introduced individual pieces of metadata under separate banners, asking publishers to do more each time; they in turn ask researchers and systems to do more. Funding data is important, but it gives just one route into content, and supports measurement mainly through a funder lens. License data is also important, but again it only helps those interested in text mining or in measuring trends in open access. No one is telling the fuller story. We have to emphasise the value of the interconnected whole. I know it’s the same for data, and for libraries, and for the research world outside of Crossref and publishers.

It’s sometimes been quite hard to take off my Crossref hat, but this collaboration has continually reminded me that there is a world outside publishers and outside journals and books. There are the data repositories, the funding agencies, the library discovery services, and the services and platforms that help (or hinder) the sharing of research. And there are the research creators themselves.

Metadata pretty much all starts and ends with the researcher. If something is good for them then it’s good for all of us.

So Metadata 2020 is an attempt to get everyone using the same language about the value of metadata: to encourage richer, better-quality metadata, to move it up publishers’ development agendas, and to demonstrate the cost of low-quality metadata. And to explain to researchers and platforms why they should be delighted to have to provide so much additional information. We also propose to make it easier for them to do so, not to add more obstacles.

What have we learned so far?

We spent about nine months interviewing people from across the community, many of whom have since formed an advisory group. These interviewees have helped clarify our goals and mission, and we also have a core team working on all the practical stuff to drive things forward. In general, people agreed on the following positions:

Richer metadata is a strategic priority.
It’s a big, complicated ask.
No one entity can tackle this problem alone.

We have a stack of slides and videos from the research, which we’ll share in the coming months, but here are some of my favourite comments from the interviews:

If I’m talking to a researcher their automatic, deer in the headlight look is that you’re asking me to do more stuff. You want to cast it such that we’re very careful what we’re asking for so that it’s clear that there are bigger payoffs.

You’ve got a legacy that’s very comfortable with the status quo. The existing stack is very entrenched. So where does the motivation come from?

If you were reinventing this, what would you do differently? We’re not really investigating our own behaviours. We would, as a community, redeem ourselves if we were able to use tools like metadata and infrastructure to improve people’s lives.

We need to be clear on that distinction between the means and the end. Metadata is the engine, it’s the means, but metadata by itself is not the goal.

Why do you need richer metadata? And what do you need in order to level up?

Initially, Metadata 2020 will offer an understanding of how different groups think about metadata and what they see as the obstacles and opportunities, and will then develop some best practices. Phase one of the collaboration is to gather stories about metadata, good and bad. We want to air them all, and we’ll be sharing templates and forms for people to contribute their success stories (and their horror stories). We’ll also continue our workshops to gather stories and create resources. Resources and events pages will be added to this site soon.

Community sub-groups of “metadata champions” will contribute to building business cases and developing a Metadata Maturity Model by which content creators will be able to measure themselves and improve. The aim is to create awareness and resources, such as business cases, for everyone who has a stake in creating and using scholarly metadata. These community champion groups will be for:

Data
Libraries
Funders
Platforms & tools
Publishers
Researchers

So please consider this an open invitation to get involved!

If you’d like to spend some time on one of the community groups, contribute to the maturity model, and help develop business cases for richer metadata for your part of the community, then please get in touch.

About the author

Ginny founded Metadata 2020 out of frustration with the disparate conversations about connecting research outputs, a desire to explain why metadata is a strategic issue and not an operational afterthought, and a wish to help people improve their metadata management. She is a director at Crossref, where she leads its development with new geographies and new communities, and works to improve the member experience and communications.