Metadata 2020 Charleston Workshop

First let me say what a pleasure it is for me to write this blog post. Thanks to the Metadata 2020 team for giving me this opportunity.

On the 7th of November, I had the pleasure of attending the workshop given by the Metadata 2020 group at the Charleston Conference. Metadata 2020 is a collaboration between librarians, publishers and service providers that advocates the creation of better metadata.

The attendees included librarians, publishers and other information professionals. The presentations both introduced the project and offered new ideas that everyone who creates and supplies metadata should take into consideration.

Panel Presentations

First, Jennifer Kemp, Head of Business Development at Crossref, introduced the project, stressing the importance of richer and better metadata and the need for coordinated, well-defined collaboration between all the people and institutions involved in creating and supplying metadata: “Richer metadata fuels discoverability and innovation. Connected metadata bridges the gaps between systems and communities. Reusable metadata eliminates duplication of effort.”

The project is divided into two phases. The first involves gathering stories about metadata, both good and bad, and holding more workshops to reach as many people as possible who can contribute their experiences. The second involves creating community groups, which will put together business cases demonstrating the need for richer metadata, and developing a metadata maturity model “by which content creators will be able to measure themselves and improve”.

Maryann Martone, Researcher at the University of California San Diego, gave the researcher’s perspective on metadata. Data needs to be FAIR: findable, accessible, interoperable and reusable. Two things are fundamental to meeting these requirements: a globally unique and persistent identifier; and rich metadata that includes the identifier, is registered or indexed in searchable resources, includes verified references to other data and metadata, and meets international community standards.
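To make these requirements concrete, here is a minimal sketch of how a record could be checked against them. It is only an illustration: the field names (`identifier`, `indexed_in`, `references`, `standard`) and the sample values are assumptions made for this example, not part of any real schema or of the presentation.

```python
# Hypothetical field names mapped to the requirement each one stands for.
# None of these names comes from a real metadata schema.
REQUIRED_FIELDS = {
    "identifier": "globally unique and persistent identifier",
    "indexed_in": "registration or indexing in a searchable resource",
    "references": "references to other data and metadata",
    "standard": "community standard the record conforms to",
}


def missing_fair_elements(record):
    """Return descriptions of the required elements that are absent or empty."""
    return [
        description
        for field, description in REQUIRED_FIELDS.items()
        if not record.get(field)
    ]


# A made-up record: the identifier and registry name are placeholders.
record = {
    "identifier": "doi:10.1234/example",
    "indexed_in": ["Example Registry"],
    "references": [],
    "standard": "Dublin Core",
}

for problem in missing_fair_elements(record):
    print("Missing:", problem)
```

A check like this only confirms that the elements are present; whether the values are accurate still needs a trained eye.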

This is essential to changing the current situation, in which more than 50% of research resources do not carry enough metadata to be identified. Researchers (authors) are asked to enter basic metadata for their work, but doing so requires a good knowledge of exactly what is needed, as well as a unique identifier for each researcher (the use of ORCID has never been more necessary).
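As a small illustration of why identifiers such as ORCID lend themselves to automation: the last character of an ORCID iD is a check digit calculated with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can often be caught before it enters a record. The sketch below is not an official ORCID tool, and the function names are my own. It checks only the format and the check digit, not whether the iD is actually registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string has the shape and check digit of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


# 0000-0002-1825-0097 is the sample iD ORCID uses in its own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

The first call passes the check and the second fails it, because changing the final digit breaks the checksum.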

In conclusion, better and richer metadata calls for better tools and interoperable workflows that avoid repetition and duplicated processes; authors should be relieved of having to enter metadata so they can devote more time to their work; and cooperation between humans and machines needs to be developed and used to best effect.

Jennifer Kemp used the example of ORCID, which provides unique identifiers for authors, enables them to show their affiliations, and offers automated processes that are easier and therefore more likely to be used. ORCID demonstrates the need for a solid, complex and complete infrastructure to support the rich metadata that catalogues require. Organizations have to build infrastructure that can support full and complete metadata, follows community standards and is interoperable with other organizations’ infrastructure, in order to ensure proper maintenance and innovation.

Lettie Y. Conrad, a Publishing and Product Development Consultant, gave the publisher’s perspective. Publishers need to invest in resources and expertise, whether internal or external. This is necessary in order to produce rich metadata that matches how it will be used, modified and distributed, and that conforms to the most up-to-date international standards. One of the ideas suggested was to get rid of the peer review metadata system. Since metadata needs to be produced according to specific standards, it is not useful to collect it from people who do not have the proper training and skills. Doing so wastes their time as well as the time of those who need to process the metadata to make it useful.

The last (but not least) presentation was given by Michelle Brewer, Librarian/Market Intelligence Manager at Wolters Kluwer. Hers was a stimulating introduction to one of the “hottest” topics in the field: metadata and machine learning. After a very informative overview of the concepts of machine learning, programming and AI (which connects to the field through classification and text generation), she showed us some of the most important projects in this area.

Among them were: the “World Checklist of Selected Plant Families” from the Royal Botanic Gardens, Kew; Amazon ebook recommendations; and a project by the University of Nottingham, which “used machine learning with 2005 patient data to predict which patients would have their first cardiovascular event in next 10 years and exceeded ACC/AHA Guidelines by 7.6 percent and identified several risk factors not on the ACC/AHA guidelines (16.)”.

The future is indeed going where we think it is: machines programmed to produce rich metadata. Is it far off? No, it is not. Machines can be programmed to answer queries, read and summarize texts, and produce keywords. It has become clear that librarians and information professionals need to acquire a new set of skills: programming. We can give power to the machines, but we cannot be controlled by them.

Jennifer Kemp and Lettie Y. Conrad concluded with short slides on recently developed infrastructure (OrgIDs, Grant IDs, Conference IDs and Event Data by Crossref; Scholia, Wikidata, oaDOI & Unpaywall by other organizations) and on metadata collaboration projects (Holdings APIs, NISO standards, the W3C community).

Discussion

The workshop was extremely interesting and useful, not only because it gave us attendees new ideas, but also because during and after the presentations we had the opportunity to discuss what we had just learned and to share stories of old and new projects. This, in my opinion, was one of its best features. When attending an international workshop, attendees should be given ample space to talk about what has been discussed, to share experiences and ideas and, why not, even to laugh together. Our community is made up of millions of individuals, but we have to work as one.

Metadata 2020 is a project I am proud to be a part of. It is clear in my mind that librarians, publishers, vendors and anybody else involved in the creation and supply of metadata need to work together toward a common goal.

And as a publisher, I want to say this to other publishers: do not wait for others to come to you. Do not wait for requests to arrive, and do not wait for standards to be created. Move first. Think first about how to reach your objectives. And share. Share your knowledge, your ideas and your developments by sending short summaries of metadata challenges you are facing, or have faced and managed to overcome, and/or ideas for collaborating on new solutions, to Clare Dean at cdean@metadata2020.org.

About the author

Concetta completed a BA in Conservation of Cultural Heritage in 2007 at the University of Messina. During her university years she worked in various libraries and cultural institutions in Italy. She completed an MA in Archival and Library Science at the University of Rome ‘La Sapienza’ in 2009.

In 2011 she moved to the UK, where she worked as a Cataloguer for Blackwell, Baker & Taylor and YBP/EBSCO, gaining extensive experience in cataloguing books and ebooks. In September 2015 she moved to Cambridge University Press where, as Library Data Analyst, she uses her knowledge of cataloguing and metadata to improve the quality of the metadata supplied to customers and third parties.

You could say she has worked on every side of the library field!