Data Management

From Digital Scholarship Group

What is a Data Management Plan?

A data management plan (DMP) is a written document describing how project or research data will be collected, organized, shared, maintained, and preserved.

Why Manage Your Data?

  • Fulfill requirements
  • Improve project efficiency
  • Organize large sets of data
  • Preserve data
  • Enable reuse
  • Promote research

How Do I Create a DMP?

  • Establish data management goals
  • Consult funding agency guidelines (NSF, NEH, IMLS)
  • Review checklists of recommended data management topics
  • Use a data management planning tool, like DMPTool or DMPonline (UK)

Managing Your Data

  • Analyze the data (what kind(s) of data? how much data? who needs your data? how will it be used in the future?)
  • Organize the data (decide on file naming conventions, directory structures, metadata standards, data formats)
  • Decide how the data can be accessed (where will it be stored? what will be shared? how will it be shared? when will it be shared?)
  • Assign responsibility for the data (who is responsible for your data?)
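To make the file-naming decision concrete, here is a minimal Python sketch that checks a project folder against one possible convention. The pattern `project_YYYYMMDD_description_vNN.ext` and the names `NAME_PATTERN` and `nonconforming_files` are hypothetical examples, not a DSG standard. A project should substitute whatever convention it agrees on.

```python
import re
from pathlib import Path

# Hypothetical convention (an assumption for illustration, not a DSG standard):
#   <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
# e.g. "diaries_20240315_transcript-smith_v01.xml"
NAME_PATTERN = re.compile(r"^[a-z0-9]+_\d{8}_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$")


def nonconforming_files(root: str) -> list[str]:
    """Return relative paths of files under root whose names break the convention."""
    base = Path(root)
    return sorted(
        str(p.relative_to(base))
        for p in base.rglob("*")
        if p.is_file() and not NAME_PATTERN.match(p.name)
    )


if __name__ == "__main__":
    # List every file in the current directory tree that should be renamed.
    for name in nonconforming_files("."):
        print("Rename:", name)
```

Running a check like this periodically can catch drift from the agreed convention before a data set grows large.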

Working with Projects and Data

Data Interviews

  1. Invite project representatives to answer data management questions using the DSG template in DMPTool
  2. Ask them to keep track of questions that are difficult to answer
  3. Meet with project representatives to discuss those questions and provide guidance on challenging areas of data management

DMPTool

  • What questions do we want to ask?
  • How do we want to organize the questions?
  • What is the end result?

Possible DM Questions

Data and Project Materials

  • What kinds of data do you have? (genres, file formats)
  • How much data is there?
  • Who is the audience for your data?
  • How might your data be reused?
  • What will be needed to reuse your data?

Organization and Standards

  • How are your files named?
  • How is your file directory structured?
  • Are you using a metadata standard?
  • What data formats are you using?
  • How are you documenting your data? (wiki, codebook)

Data Access, Sharing, and Re-use Policies

  • What do you plan to share?
  • How will users access the shared data?
  • When will users have access?
  • Can data be redistributed?
  • Can other works be derived from your data?
  • Are there ethical or legal restrictions on access and use?
  • How will restrictions be handled?
  • How will you guarantee that the data stays safe and untampered?
  • Where is your data stored?
  • What is the life span of your stored data?

Roles and Responsibilities

  • Who is responsible for metadata and documentation?
  • Who secures the data?
  • Who ensures data is backed up and not corrupted?
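One common answer to "how will you guarantee safe, untampered data?" is fixity checking. You record a checksum for every file, then recompute and compare it after backups or transfers. The Python sketch below assumes SHA-256 and a simple in-memory manifest. The function names are hypothetical, and a repository may run its own fixity checks in addition to anything a project does.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    base = Path(root)
    return {
        str(p.relative_to(base)): sha256_of(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def changed_files(root: str, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or whose checksum differs."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

The manifest could be saved (for example as JSON) alongside each backup copy and re-checked on a schedule. This sketch flags missing or altered files but not newly added ones.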



New Questions (Inspired by the ODH DMP Tool)

Introduction (One Page)

Thank you for taking the time to fill out this data management plan. Remember that data management is an ongoing process that should continue throughout the life of the project. Short-term plans address the needs of a project during its active period; long-term plans address its needs beyond that period (for example, what happens to the project data once the project leads move on to new projects? Is there a plan to archive it?).

  • Will this be a short-term or long-term plan?
  • Who will be responsible for data management and for monitoring this data management plan?
  • How will adherence to this data management plan be checked or demonstrated?

Data Description and Retention Expectations (Page One)

First, describe your data as thoroughly as possible.

  • What is the approximate amount of your data, or what is the estimated amount you will produce?
  • How much do you expect your data to grow, on a monthly or yearly basis? (Use the measure that seems most applicable to your project.)
  • What types of data do you have? Data types could include XML files, spreadsheets, interview transcripts, text files, historical documents, diaries, field notes, geospatial data, citations, software code, algorithms, etc.
  • What are your methods for collecting data and how are you recording that information?
  • What data will be preserved and shared later, and what data will be discarded?

Data Description and Retention Expectations (Page Two)

  • If you will be using existing data, please describe how you acquired the data and its provenance. If you have multiple data sets with different origins, what is the relationship between the data you are collecting and the existing data?
  • Where (physically) and on what media will you store the data during the project's lifetime?
  • How will you back up the data during the project's lifetime, and how often will backups be made?
  • How long will the original data collector/creator/principal investigator retain the right to use the data before opening it up to wider use? (Explain any embargo periods for political, commercial, patent, or publisher reasons. Explain any policies that may restrict the distribution of your data, and describe how you will ensure timely access to the data.)

Roles & Responsibilities (One Page)

Explain how responsibilities for managing your data will be delegated. Try to include time allocations, project management of technical aspects, training requirements, and contributions of non-project staff; name individuals where possible. Remember that those responsible for long-term decisions about your data will likely be the people we work with when managing your data in the DRS.

  • Who secures the data?
  • Who ensures data is backed up and not corrupted?
  • Who is responsible for metadata and documentation?
  • What process is in place for transferring responsibility for the data once the project is no longer active, or when there are personnel changes?

Sensitive Data and Secure Access (One Page)

File Formats (One Page)

  • Which file formats will you or do you already use for your data, and why?
  • What transformations (to more shareable formats) will be necessary to prepare data for preservation and data sharing?

Metadata (One Page)

  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?
  • How have you documented your data? For example, how would you describe your data to another researcher getting ready to use it?
  • How is your metadata stored? For example, in XML files or Excel spreadsheets?
  • If you are actively creating your data, how will you create or capture your metadata?
  • Which metadata standards will you use, and why have you chosen them? (e.g., MODS, Dublin Core, or TEI)
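For projects that choose Dublin Core, the sketch below shows what a simple record might look like when serialized as XML with Python's standard `xml.etree.ElementTree`. The `<record>` wrapper element, the function name, and the sample values are hypothetical. The 15 element names and the namespace URI come from the Dublin Core Metadata Element Set, version 1.1.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)  # serialize with the conventional "dc:" prefix


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize Dublin Core fields as XML; any element may repeat."""
    root = ET.Element("record")  # hypothetical wrapper element for this sketch
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example values.
    print(dublin_core_record({
        "title": ["Field notes, summer survey"],
        "creator": ["Doe, Jane"],
        "date": ["2024-03-15"],
        "format": ["text/xml"],
    }))
```

Rejecting unknown element names is a design choice that catches typos such as "author" (Dublin Core uses "creator"). Richer standards like MODS or TEI would need their own schemas.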

Sharing (Page One)

  • Who will have access to your data? Will any permission restrictions need to be placed on the data?
  • If your data will be made available, what is the process by which others will gain access to it? Will you provide an API or downloadable zip files? Will access be via a website or another system? Include the resources needed to make the data available: hardware, software, types of expertise, etc.
  • Will the data be available for open download, behind an account system, or through some other system?
  • When will you make the data available? Are there deadlines you need to meet?
  • What other types of information should be shared regarding the data? Will other users need to know how it was generated, or how you decided to organize it, or what algorithms were used to analyze it?

Long-term Storage and Access (Page One)

Describe your long-term strategy for storing, archiving and preserving the data you will generate or use. Consider the following:

  • Who will maintain, curate and preserve data over the long-term? Will you deposit it into a database or a repository (such as the DRS)?
  • What procedures does your intended long-term data storage facility have in place for preservation and backup?

Long-term Storage and Access (Page Two)

  • What is most important to preserve beyond the life of your project?
  • How will you choose what to submit for long-term preservation?
  • What metadata/documentation will be needed in order to make the data reusable?

Brief Version: Interview Questions

Interview Questions and Suggestions:

- Possible Questions:

   - What data will be preserved and shared later, and what data will be discarded?
   - Describe your data: amount, formats, how much do you expect it to grow?
   - What are your methods for collecting data, and how are you recording them? (This gets at copyright questions.)
   - How will you back up the data during the project's lifetime, and how often will backups be made?
   - What process is in place for transferring responsibility for the data once the project is no longer active, or when there are personnel changes?
   - Will you be collecting data that is sensitive, or could be considered sensitive? For example, are you collecting personally identifying information, like names and Social Security numbers?
   - What contextual details (metadata) are needed to make the data you capture or collect meaningful, and how are you collecting them?
   - How have you documented your data? For example, how would you describe your data to another researcher getting ready to use it?
   - If you are actively creating your data, how will you create or capture your metadata?
   - Who will have access to your data, and how? Will any permission restrictions need to be placed on the data? What happens to the data if those people leave?
   - Is portability important to you? If so, what provisions have you made, and does your platform support export?
   - What is important to preserve beyond the life of this project? What needs to be maintained?

- Suggested Practices

   - Make someone responsible for data management and for monitoring this data management plan.
   - Select open formats rather than closed.
   - Add recommendations for gathering metadata and building that into project workflow.
   - Add recommendations on long-term preservation.

Resources