Options for collaboration
Sharing raw data and other analysis-relevant files between collaborators can lead to amazing insights and time-saving distributions of effort and expertise, but can also lead to:
- Frustrating and time-consuming confusion around things like the versions of files
- Compromised data security
- Enormous email attachments
The technological tools for collaboration are discussed below, but regardless of how you share access to your data, it’s vital to have a clear plan for file sharing that is effectively communicated to everyone in the project. Ideally, this would be part of your data management plan and so would include naming and directory structure conventions for the project.
Your best options for giving collaborators access to your data depend on where your collaborators are, the size of the data you are working with, and how you want to share it.
If you want to provide a copy of your data (or otherwise transfer it)
For smaller amounts of data:
- scp or sftp are generally good options
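As a rough illustration of the scp option, the sketch below builds and runs an `scp` command from Python. The user name, host name, and paths are hypothetical placeholders, not real Princeton endpoints, and the sketch assumes an OpenSSH `scp` client is installed and that you already have SSH access to the remote machine.

```python
import subprocess
from typing import List


def build_scp_command(local_path: str, user: str, host: str, remote_dir: str) -> List[str]:
    """Return the argument list for copying one local file to a remote directory."""
    # -p asks scp to preserve modification times and file modes,
    # which helps when collaborators compare file versions later.
    return ["scp", "-p", local_path, f"{user}@{host}:{remote_dir}"]


def copy_to_remote(local_path: str, user: str, host: str, remote_dir: str) -> None:
    """Run scp and raise CalledProcessError if the copy fails."""
    subprocess.run(build_scp_command(local_path, user, host, remote_dir), check=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only; nothing is transferred here.
    command = build_scp_command("results.csv", "alice", "cluster.example.org", "/scratch/shared/")
    print(" ".join(command))
```

Calling `copy_to_remote` with real values performs the transfer; printing the command first is a cheap way to check the destination before any data moves.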
For larger amounts of data, or if the network connection is slow or unreliable:
- Globus is probably the best option because it can gracefully handle problems such as recovering when a transfer is interrupted. You can find more information about how to use Globus at Princeton here.
- Keep in mind that data transfers take time; if you want to transfer more than 10 GB of data in a short timeframe, you should email firstname.lastname@example.org to find out what your best options are.
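To get a feel for how long a transfer might take, a back-of-the-envelope estimate can help. The sketch below assumes a constant sustained throughput, which real networks rarely deliver, so treat the result as an optimistic lower bound. The function name and the example throughput are illustrative assumptions.

```python
def estimate_transfer_seconds(size_gb: float, throughput_mbps: float) -> float:
    """Estimate transfer time in seconds.

    size_gb is in gigabytes (10**9 bytes); throughput_mbps is the assumed
    sustained rate in megabits per second.
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    size_megabits = size_gb * 1000 * 8  # gigabytes -> megabytes -> megabits
    return size_megabits / throughput_mbps


if __name__ == "__main__":
    # Assumed example: 10 GB over a link sustaining 100 Mbit/s.
    seconds = estimate_transfer_seconds(10, 100)
    print(f"{seconds / 60:.1f} minutes")
```

Under these assumptions, 10 GB takes at least 800 seconds, or a little over 13 minutes, and a slow or shared connection can stretch that considerably.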
If you want collaborators at different institutions to access data at Princeton
You can sponsor them to get limited access to our systems. If you are using Research Computing resources to work with your data (e.g. Tiger or Della), the principal investigator of the project should contact email@example.com to set up an account. If you are using departmentally hosted resources, contact your departmental administrator to set up an account.
A note about using email to transfer files
Transferring data by email is not ideal -- it can create versioning headaches and is insecure unless you take extra steps to encrypt your data. But it is easy to use, and when you are facing a deadline it can seem like the best option. If you do use email for file sharing, here are some things you can do to avoid common pitfalls:
- Have one person be responsible for keeping track of the latest version of the file(s)
- Keep a running document with the file names/versions, creators, and dates
- Encrypt sensitive information (email is not secure!)
- Detach/delete attachments in your email outbox (and inbox) after the transfer
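One way to keep the running document of file names, versions, creators, and dates is a small CSV log. The sketch below is a minimal example under assumptions of its own: the column names, the use of a SHA-256 checksum to identify each version, and the file names are illustrative choices, not an established convention.

```python
import csv
import hashlib
from datetime import date
from pathlib import Path

FIELDS = ["file_name", "sha256", "creator", "date"]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_version(manifest: Path, data_file: Path, creator: str) -> dict:
    """Append one row describing data_file to the manifest CSV and return it."""
    row = {
        "file_name": data_file.name,
        "sha256": sha256_of(data_file),
        "creator": creator,
        "date": date.today().isoformat(),
    }
    write_header = not manifest.exists()
    with manifest.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()  # only for a brand-new manifest
        writer.writerow(row)
    return row
```

For example, `log_version(Path("manifest.csv"), Path("survey_v2.csv"), "A. Researcher")` appends one row (both file names are hypothetical). Because the checksum changes whenever the file contents change, two attachments with the same name but different checksums are different versions.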