Crystallographic Data Repository

From CCP4 wiki

Objectives

The idea is to save raw diffraction data related to all PDB depositions from an institute or lab. Below is a list of what this project aims to achieve.

  • Easy upload of data
  • Mandatory metadata submission
  • Off-site backup of data
  • Easy download of data (download manager to restart interrupted downloads)
  • Comparability – should automatic processing be run to provide datasets with comparable statistics?
  • Secure storage with an access level (public, password, temporary access, private) set by the user
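The restartable download mentioned above could, for example, rely on HTTP range requests. The following is only a minimal sketch: it assumes the repository would serve files over HTTP and honour the standard Range header, neither of which has been decided, and the helper name resume_headers is hypothetical.

```python
import os


def resume_headers(partial_path):
    """Build the request headers needed to resume an interrupted download.

    Assumption: the server supports HTTP range requests and answers with
    206 Partial Content. This helper is illustrative, not part of any
    existing repository software.
    """
    # Bytes already on disk tell us where the download stopped.
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if offset == 0:
        # Nothing downloaded yet: request the whole file.
        return {}
    # An open-ended range asks for everything from byte `offset` onward.
    return {"Range": f"bytes={offset}-"}
```

A client would send these headers with its next request and append the response body to the partial file. If the server replies 200 instead of 206, it has ignored the range and the download has to start again from the beginning.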

Should each dataset get a digital object identifier (DOI)? What is the procedure for obtaining one? If there were one, it should surely be linked to the PDB.

Necessary metadata

To make the data useful in the future, the following metadata should be associated with each dataset:

  • PDB code – this is the main connection with the PDB.
  • Responsible crystallographer – whom to contact with technical questions
  • Responsible PI – person who is ultimately responsible for data (Use World Directory of Crystallographers ID? e-mail addresses change...)
  • Publication details
  • Data source – beamline where data were collected

It might be possible to extract many of these details from the headers of the image files or the PDB.
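To make the list above concrete, here is one possible shape for a metadata record with a basic completeness check. It is a sketch under stated assumptions: the class name, field names and validate function are hypothetical, every field is treated as mandatory, and the pattern only covers the classic four-character PDB codes.

```python
import re
from dataclasses import dataclass, fields

# Assumption: classic four-character PDB codes (a digit followed by three
# alphanumerics). Other identifier formats would need a different pattern.
PDB_CODE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


@dataclass
class DatasetMetadata:
    """Hypothetical record mirroring the metadata items listed above."""
    pdb_code: str
    crystallographer: str        # contact for technical questions
    principal_investigator: str  # person ultimately responsible for the data
    publication: str
    data_source: str             # beamline where the data were collected


def validate(record):
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    for f in fields(record):
        if not getattr(record, f.name).strip():
            problems.append(f"{f.name} is empty")
    code = record.pdb_code.strip()
    if code and not PDB_CODE.match(code):
        problems.append(f"pdb_code {code!r} is not a four-character PDB code")
    return problems
```

An upload form could call validate before accepting a dataset and refuse the submission while the returned list is non-empty, which would enforce the mandatory metadata objective.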

People

This is a list of people who've shown interest in working on this project.

  • Adam Ralph - Irish Centre for High End Computing
  • Nicola McDonnell - Irish Centre for High End Computing
  • Kashif Iqbal - Irish Centre for High End Computing
  • Marco Grossi - Irish Centre for High End Computing
  • Andreas Forster - Imperial College London
  • Jon Agirre - University of York

People to contact:

  • Erica Yang and Brian Matthews at Rutherford might have solutions almost ready, according to Martyn Winn

Funding

Nothing is possible without money. Here are options for funding:

  • Horizon 2020 funding - One of the strands is Research Infrastructures.
  • Instruct
  • EUDAT - Martyn Winn says this is not a possibility.
  • Joint Information Systems Committee (JISC) - Mainly funds UK projects but has funded UK/Ireland ones in the past. It is partially funding the Rutherford project.
  • ARCHER - UK Research Data Facility. It is funded by NERC and EPSRC, so the UK Xtal community should be able to use it. It is hosted and run by EPCC (Edinburgh Parallel Computing Centre), which is a sister organization of ICHEC.