Need help managing your data?
Contact the library: ask@uml.libanswers.com
This guide was developed in Fall 2024 by Bari Pender (Ph.D.; M.L.S. expected Spring 2025) and Veronica Chea (B.S. Public Health, expected Spring 2026), with inspiration and content from:
You are welcome to reuse any content from this guide without contacting us, but please credit the authors and the UMass Lowell Library when you do.
Documenting your dataset with metadata helps ensure that your research remains usable and understandable weeks, months, and years into the future.
This matters not only when you revisit your own older data, but also for others who discover and potentially reuse a dataset you have made available online.
MIT lists important things to document about your data, including:
Metadata standards specify which pieces of information to include in the metadata.
Visit the Metadata Standards Catalog to search for metadata standards by discipline or by scheme name.
Metadata must also be encoded (formatted) in a way that makes it machine-readable and searchable. A robust repository will format your metadata for you. Some common formats are:
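As a rough illustration of what machine-readable encoding looks like, the Python sketch below writes a few Dublin Core elements as XML using the standard library's `xml.etree.ElementTree`. Dublin Core is used here only as a familiar, general-purpose example scheme; the `dublin_core_xml` function and the record values are hypothetical, and in practice your repository will usually generate this encoding for you in whatever format it requires.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (a general-purpose metadata scheme)
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields: dict[str, str]) -> str:
    """Encode simple name/value metadata as Dublin Core XML elements (hypothetical helper)."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        # Each field becomes an element such as <dc:title>...</dc:title>
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values, for illustration only
record = {
    "title": "Example Survey Responses, 2024",
    "creator": "Doe, Jane",
    "date": "2024-10-01",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}
print(dublin_core_xml(record))
```

Because the output is structured XML rather than free text, software such as a repository's search index can reliably pick out, for example, the creator or the date.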
One of the most important metadata elements for a dataset is a globally unique, persistent identifier (PID), which allows research datasets to be discovered and cited directly. DataCite is a global not-for-profit membership organization that ensures "that research outputs and resources are openly available and connected", specifically by assigning a digital object identifier (DOI) to research datasets. DOIs are assigned through DataCite or through member institutions with repositories. For example, Dryad is a member institution and can register DOIs for your datasets.
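The Python sketch below shows the basic structure of a DOI: a prefix beginning with "10." that identifies the registrant, a slash, and a suffix that identifies the item. Prepending the https://doi.org/ resolver turns a DOI into a link that others can cite and follow. The DOI `10.1234/example-dataset` and the function names are hypothetical, and the check is deliberately loose; it is not a full validation of DOI syntax.

```python
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its prefix (registrant) and suffix (item) parts."""
    # Accept a bare DOI, or one pasted with a resolver link or "doi:" in front
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
    prefix, sep, suffix = doi.partition("/")
    # A DOI prefix starts with "10." and is followed by "/" and a non-empty suffix
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a resolvable https://doi.org/ link for citing a dataset."""
    prefix, suffix = split_doi(doi)
    return DOI_RESOLVER + prefix + "/" + suffix


# Hypothetical DOI, for illustration only
print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the resolver link stays the same even if the dataset moves to a new web address, citing the DOI link is more durable than citing the repository page directly.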
README files are documentation files that describe a file, folder, or dataset so that others can understand and interpret what it contains.
Data dictionaries and codebooks define the elements of a dataset so that you and others can understand and use it in the future. The two terms are largely interchangeable, though "codebook" is more often associated with survey data.
Data dictionaries and codebooks include information such as:
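A data dictionary can itself be kept in a structured, machine-readable form, which also makes it possible to check a dataset against it. The Python sketch below assumes a hypothetical three-column survey file and records a name, description, expected type, units, and allowed values for each variable; these fields, the `Variable` class, and the `check_rows` helper are illustrative assumptions, not a required template.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One data dictionary entry describing a single column of a dataset."""
    name: str
    description: str
    dtype: type                                # expected Python type of the values
    units: str = ""                            # empty when the variable has no units
    allowed: set = field(default_factory=set)  # empty set means "any value"


# Hypothetical data dictionary for a small survey dataset
DICTIONARY = [
    Variable("respondent_id", "Anonymous respondent number", int),
    Variable("age", "Age at time of survey", int, units="years"),
    Variable("smoker", "Self-reported smoking status", str, allowed={"yes", "no"}),
]


def check_rows(csv_text: str, dictionary: list[Variable]) -> list[str]:
    """Return a list of problems found when checking CSV rows against the dictionary."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [v.name for v in dictionary]
    if reader.fieldnames != expected:
        problems.append(f"columns {reader.fieldnames} do not match {expected}")
        return problems
    # Line 1 is the header (assumes no quoted values spanning several lines)
    for line_no, row in enumerate(reader, start=2):
        for var in dictionary:
            raw = row[var.name]
            try:
                value = var.dtype(raw)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {var.name}={raw!r} is not {var.dtype.__name__}")
                continue
            if var.allowed and value not in var.allowed:
                problems.append(f"line {line_no}: {var.name}={raw!r} not in {sorted(var.allowed)}")
    return problems


# Hypothetical sample data: the second row has two deliberate errors
sample = "respondent_id,age,smoker\n1,34,no\n2,abc,maybe\n"
for problem in check_rows(sample, DICTIONARY):
    print(problem)
```

Keeping the dictionary alongside the data in this way means the same definitions serve as human-readable documentation and as a simple quality check.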
Resources