Need help managing your data?
Contact the library: ask@uml.libanswers.com
This guide was developed in Fall 2024 by Bari Pender (Ph.D., M.L.S. expected Spring 2025) and Veronica Chea (B.S. Public Health, expected Spring 2026), with inspiration and content from:
You are invited to re-use any content from this guide without needing to contact us, but please credit the authors and UMass Lowell Library when re-using.
At the beginning of any new project, consider the types of data that you will generate and think about how you or someone else might look for a file a year from now.
[Project] / [Experiment] / [Instrument or Type of file]
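The folder pattern above can be set up in one step at the start of a project. The following is a minimal sketch, not a prescribed tool: the function name `make_project_tree` and the example folder names are hypothetical, chosen only to illustrate the [Project] / [Experiment] / [Instrument or Type of file] pattern.

```python
from pathlib import Path


def make_project_tree(root, project, experiments, file_types):
    """Create [Project]/[Experiment]/[Instrument or Type of file] folders under root."""
    created = []
    for experiment in experiments:
        for file_type in file_types:
            folder = Path(root) / project / experiment / file_type
            # parents=True builds intermediate folders; exist_ok=True makes reruns safe
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# Hypothetical example: two experiments, each with the same two file-type folders
# make_project_tree("research", "SoilStudy", ["Exp01", "Exp02"], ["Microscopy", "Spectra"])
```

Creating the same subfolders under every experiment keeps the layout predictable, so a file can be located by its experiment and type without searching.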
Below are several pitfalls to avoid, from FASEB's Structuring File Folders Effectively
Dryad's Good Data Practices gives an example of two different ways to think about file organization:
DatasetA.tar.gz
|- Data/
|  |- Processed/
|  |- Raw/
|- Results/
|  |- Figure1.tif
|  |- Figure2.tif
|  |- Models/
DatasetB.tar.gz
|- Figure1/
|  |- Data/
|  |- Results/
|  |  |- Figure1.tif
|- Figure2/
|  |- Data/
|  |- Results/
|  |  |- Figure2.tif
Establishing file naming conventions at the outset of a research project keeps data files organized and makes them easier to retrieve and share. It is easy to underestimate how many data files a project will generate, even on a daily basis.
FASEB's File Naming Best Practices highlights important reasons to establish a file naming schema, as it will help you:
Below are guidelines for file naming best practices:
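A naming schema is easier to apply consistently when file names are generated rather than typed by hand. The sketch below is one possible schema, not the one recommended by FASEB or this guide: the element order (date, project, description, version), the YYYYMMDD date stamp, the two-digit version number, and the function name `build_file_name` are all assumptions made for illustration.

```python
import re
from datetime import date


def build_file_name(project, description, when, version, extension):
    """Assemble a name such as 20240915_Soil-Study_pH-readings_v02.csv (hypothetical schema)."""

    def clean(part):
        # Replace spaces with hyphens, then drop anything that is not a letter, digit, or hyphen
        part = part.strip().replace(" ", "-")
        return re.sub(r"[^A-Za-z0-9-]", "", part)

    stamp = when.strftime("%Y%m%d")  # YYYYMMDD sorts chronologically as plain text
    suffix = extension.lstrip(".")   # accept "csv" or ".csv"
    return f"{stamp}_{clean(project)}_{clean(description)}_v{version:02d}.{suffix}"


print(build_file_name("Soil Study", "pH readings", date(2024, 9, 15), 2, ".csv"))
# prints 20240915_Soil-Study_pH-readings_v02.csv
```

Putting the date first and zero-padding the version means an ordinary alphabetical sort lists files in chronological and version order.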
As technology evolves, software and hardware that exist today can become obsolete, and data files saved in proprietary formats tied to that technology may become unusable. Storing your data in robust, open file formats keeps it accessible and usable to you and others in the future.
While you may need to collect data from an instrument in a default, proprietary file type, it is important to export data intended for storage and sharing to a more preservable format. Below are preferable formats that are non-proprietary, common, and accessible:
File type | Preferred format |
---|---|
Text | Plain text, ASCII (.txt); Portable Document Format (.pdf); Extensible Markup Language (.xml) |
Tabular | Comma separated values (.csv) |
Image | Tagged Image File Format (.tif or .tiff); JPEG 2000 (.jp2); Portable Network Graphics (.png) |
Document | Portable Document Format (.pdf, PDF/A, PDF/UA) |
Video | MPEG-4 (.mp4); Material Exchange Format (.mxf) |
Web data / Data exchange | JavaScript Object Notation (.json); Extensible Markup Language (.xml) |
Geospatial data | ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn) |
Formats like CSV, TXT, and JSON are widely used due to their simplicity, versatility, and ease of use across different platforms and programming environments.
Each format serves specific needs and has unique advantages, depending on the nature of the data, storage requirements, and the types of analyses or collaborations anticipated.
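As an illustration of exporting data to open formats, the sketch below writes the same small set of records to both CSV and JSON using only Python's standard library. It is a minimal example under stated assumptions: the function name `export_records`, the sample field names, and the requirement that every record has the same keys are hypothetical choices, not part of any instrument's export workflow.

```python
import csv
import json
from pathlib import Path


def export_records(records, out_dir, stem):
    """Write a list of dicts to <stem>.csv and <stem>.json inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    json_path = out_dir / f"{stem}.json"

    fieldnames = list(records[0].keys())  # assumes every record has the same keys
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)

    return csv_path, json_path


# Hypothetical sample data
# export_records([{"sample": "A1", "ph": 6.8}, {"sample": "A2", "ph": 7.1}], "export", "readings")
```

One difference worth keeping in mind when choosing between the two: CSV stores every value as text, so numbers come back as strings when the file is read, whereas JSON preserves the distinction between numbers and strings.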