DataSpace Help


How to Submit a Dataset to DataSpace

Preparation

Data and Metadata

Data must be free of any information that is unsuitable for public release (e.g., sensitive personal information). Data files should be accompanied by adequate documentation. There are also some practical limitations to what DataSpace can accommodate, as well as best practices of open research that DataSpace strives to uphold. Please review the DataSpace Policies and Guidelines.

Deposit License

Contributors must have the right to distribute their data, and they must grant Princeton University re-distribution rights. The DataSpace Deposit License is found within the Policies and Guidelines.

Access and Login

Contact the DataSpace administrators directly to gain access for your first submission. Then you can sign in to DataSpace as often as needed to make new submissions and/or revise submissions following feedback from the curators.

The Submission Web Portal

Choosing a Collection

The DataSpace repository is organized into “communities,” which roughly represent academic/administrative units within Princeton University. Communities are further subdivided into “collections,” which may represent administrative sub-units, named research projects, or types of items. As you begin a new submission, the first step is to choose your community and collection. If you are unsure, if you lack permission to submit to the appropriate collection, or if you would like to create a new community or collection, please contact the DataSpace curators.

Describing an Item

All items in DataSpace must include sufficient information to identify, attribute, and describe the research data. Depending on the community and collection, the web submission forms may show or hide particular metadata fields, and may mark them as required or optional. Definitions for the most common research data fields across form variants are listed below, in alphabetical order. For further guidance on metadata, please contact the DataSpace curators.

Abstract
Please enter a summary of the item as a whole in this field. The focus should be on the nature of the data included in the item (scope, purpose, methods, etc.), as opposed to substantive claims made in related publications.
Alternative Title
In some cases, it may be helpful to supplement an item’s title proper with additional titling information. For example, an “alternative title” could include a translation from a foreign language, a title used in an older version of the item, or an abbreviation used in technical references. (Click “+ Add More” as needed.)
Author(s)
An “author” is a person responsible for the composition, collection, generation, and/or compilation of the item. Provide the complete first and last names of all persons who should be credited for any file included in the item, or in relation to the item as a whole. (Click “+ Add More” as needed.)
Citation
If you have a preferred format for a human-readable bibliographic citation for the item, enter it in this field.
Data Creator
Like an “author,” a “data creator” is a person responsible for the composition, collection, generation, and/or compilation of the item. If no “author” field is available in a form for a given collection, please use the “data creator” fields to provide the complete first and last names of all persons who should be credited for any file included in the item, or in relation to the item as a whole. (Click “+ Add More” as needed.)
Data Publisher
The “data publisher” refers to the entity responsible for making the item publicly available--as opposed to the publisher of an article or book in which data included in the item are referenced. In the case of DataSpace, Princeton University is the institution responsible for item publication in most cases. In some cases, the item may be publicly released by another entity before DataSpace (e.g., on a special project website or in a data paper), and then the entity responsible for the item’s initial public release should be entered as the “data publisher.” To credit departments, labs, or other bodies responsible for the creation of the item (as opposed to its publication), please contact the DataSpace curators.
Dataset Description (abstract)
If available, please enter a summary of the item as a whole in this field. The focus should be on the nature of the data included in the item (scope, purpose, methods, etc.), as opposed to substantive claims made in related publications, and as distinct from an outline of the item’s included files.
Dataset Description (TOC)
If available, please enter an outline of the item’s included files in this field. Further data documentation may be provided within a “README.txt” file. (Specific descriptions of individual files can also be added upon upload.)
Date of Creation
Enter the date that should be associated with the creation of the item, following the ISO 8601 standard: YYYY-MM-DD. This is typically earlier than the “date submitted” and the “date of issue.” For archival/historical records, the “date of creation” is not the same as the time period from which the records originate--which can be captured by a “temporal coverage” field. Instead, the “date of creation” pertains to the date the researcher(s) collected or compiled the data for the present item being submitted.
Date of Issue
Enter the date that should be associated with this item’s public release, as may be used in citations of the data. At minimum, the year must be entered (YYYY), but the full date is strongly recommended, following the ISO 8601 standard: YYYY-MM-DD. Typically, the “date of issue” is the current date at the time of submission, as the curators review items for publication immediately (and the curators can update dates if review takes longer). If the item has already been published somewhere before the present DataSpace submission (e.g., on a special project website or in a data paper), then the “date of issue” should match the date of the item’s initial public release. In the case of limited or delayed release, the “date of issue” may be later, but you must contact the DataSpace curators to make such arrangements.
Date Submitted
In some cases, it may make sense to distinguish the item’s “date submitted” from its “date of issue” (e.g., when the item is embargoed for a period). If applicable, please enter the current date at the time of submission in the “date submitted” field, following the ISO 8601 standard: YYYY-MM-DD.
Depositor
In some cases, there may be a “depositor” distinct from the authors, creators, and/or contributors of the item. If required for a given collection, enter the full name of the person responsible for depositing the item to DataSpace.
Description
As a supplement to the “abstract,” please enter technical information and other important descriptive information in the “description” field. If a “README.txt” or similar files are included, please reference those files specifically. (Specific descriptions of individual files can also be added upon upload.)
Funder
In some cases, it may be appropriate to name a particular funding agency and/or provide a particular grant number associated with the item. Information entered in the “funder” field should be considered descriptive of the item. If you need to credit outside organizations for direct contributions to an item, please contact the DataSpace curators.
Identifiers
If the item itself (as opposed to any related publications) has other persistent identifiers besides the ARK that is automatically generated by DataSpace, please provide them, each according to its standardized context from the drop-down menu (e.g., ISSN). (Click “+ Add More” as needed.)
Language
DataSpace tracks the language of the main content of an item, following the ISO 639 standards. Please select the best-fitting language for the item as a whole from the drop-down list. The default is “English (United States).”
Other Titles
Like the “alternative title” field, the “other titles” field can supplement an item’s title proper with additional titling information. For example, an “other title” could include a translation from a foreign language, a title used in an older version of the item, or an abbreviation used in technical references. (Click “+ Add More” as needed.)
Publication Citation
If the item is referenced by any specific publication(s), please enter the full citation(s) for the publication(s) in this field. Persistent URLs, such as a DOI, are preferred to identify referencing publications. (Click “+ Add More” as needed.)
Publisher
The “publisher” refers to the entity responsible for making the item publicly available--as opposed to the publisher of an article or book in which the data included in the item are referenced. In the case of DataSpace, Princeton University is the institution responsible for item publication in most cases. In some cases, the item may be publicly released by another entity before DataSpace (e.g., on a special project website or in a data paper), and then the entity responsible for the item’s initial public release should be entered as the “publisher.” To credit departments, labs, or other bodies responsible for the creation of the item (as opposed to its publication), please contact the DataSpace curators.
Princeton Project Grant Number
If applicable, please use this field to record the Princeton project grant number that should be charged for the submission.
Relation (Is Part of Series)
If the item itself (as opposed to any related publications) is part of a named series of resources, please provide both the name of the series and the item’s number within that series.
Relation (Is Referenced By)
If the item is referenced by any specific publication(s) and the publication(s) have persistent URL(s), such as a DOI, please enter the persistent URL(s) for the publication(s) in this field. (Click “+ Add More” as needed.)
Relation (Is Replaced By)
If the item is supplanted, displaced, or superseded by another resource, please provide the complete, canonical citation for the replacing resource in this field. Persistent URL(s), such as a DOI, are preferred for references to replacing resources.
Relation (Is Version Of)
If the item as a whole is a version, edition, or adaptation of some other resource, please provide the complete, canonical citation for the parent version in this field. Persistent URL(s), such as a DOI, are preferred for references to parent versions.
Relation (Replaces)
If the item supplants, displaces, or supersedes another resource, please provide the complete, canonical citation for the replaced resource in this field. Persistent URL(s), such as a DOI, are preferred for references to replaced resources.
Series/Report No.
If the item itself (as opposed to any related publications) is part of a named series of resources, please provide both the name of the series and the item’s number within that series.
Source
If the item, in whole or in part, is derived from some other resource(s), but not a version of the other resource(s), please provide the complete, canonical citation for the source(s) in this field. Persistent URL(s), such as a DOI, are preferred for references to sources. (Click “+ Add More” as needed.)
Spatial Coverage
If appropriate, a location or region of spatial coverage may be specified for the data contained in the item (e.g., the place where an experiment was conducted, the geographic scope of a population represented by a sample, or the astronomical region observed). DataSpace allows named geographic locations and regions (e.g., a city, state, country, or continent), as well as geolocation points, boxes, and polygons. If you need to reference a specialty set of standards to define the spatial coverage accurately, please contact the DataSpace curators.
Sponsors
In some cases, it may be appropriate to name a particular sponsoring agency and/or provide a particular grant number associated with the item. Information entered in the “sponsors” field should be considered descriptive of the item. If you need to credit outside organizations for direct contributions to an item, please contact the DataSpace curators.
Subject Keywords
Please provide keywords to indicate the topic and/or subject matter of the item. (Click “+ Add More” as needed.)
Supersedes
If the item supplants, displaces, or supersedes another resource, please provide the complete, canonical citation for the superseded resource in this field. Persistent URL(s), such as a DOI, are preferred for references to superseded resources.
Table of Contents
If available, please enter an outline of the item’s included files in the “table of contents” field. Further data documentation may be provided within a “README.txt” file. (Specific descriptions of individual files can also be added upon upload.)
Temporal Coverage
If appropriate, a period of time may be specified for the data contained in the item (e.g., the window of observation, the historical period, or the geological era). For dates and times, DataSpace conforms to the DCMI Period Encoding Scheme, which draws from the W3C-DTF specification, which in turn conforms to the ISO 8601 standard. This means the temporal coverage may be specified within particular time zones, down to fractions of a second, as YYYY-MM-DDThh:mm:ss.sTZD. For coarser specifications, truncate the unnecessary components of the date/time string. Use “start=” and “end=” to designate the opening and closing of a range, delimited by a semicolon. For example, an item covering data collected from February 3, 2020 through March 2, 2020 would have the following “temporal coverage” (entered without the quotation marks): “start=2020-02-03; end=2020-03-02”.
Title
Provide an informative and distinguishing title for the item itself (as opposed to a related publication), with public dissemination in mind. The title will be presented to DataSpace users while browsing, supplied to web harvesting and indexing services, and used in data citations.
Type
Select the type of item from the drop-down menu (e.g., “Dataset”). For items with multiple files to be uploaded, or with different types of data contained in the files, select as many types as apply to the submission as a whole. (More details about the types and descriptions for individual files can also be added on the upload page.)
Version
If the item has a particular version number or name, please enter it in this field.
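
The date conventions described for the “date” fields and for “temporal coverage” can be checked before typing values into the form. The following is only a minimal sketch in Python, not a DataSpace tool: the function names are hypothetical, and it covers day-level granularity only (no times or time zones).

```python
from datetime import date


def is_iso_date(text: str) -> bool:
    """Return True if text is a valid calendar date written as YYYY-MM-DD."""
    try:
        # Round-tripping rejects other ISO 8601 spellings such as 20200203.
        return date.fromisoformat(text).isoformat() == text
    except ValueError:
        return False


def temporal_coverage(start: date, end: date) -> str:
    """Build a "start=...; end=..." range string at day-level granularity."""
    if end < start:
        raise ValueError("end date must not precede start date")
    # date.isoformat() yields the ISO 8601 calendar form YYYY-MM-DD.
    return f"start={start.isoformat()}; end={end.isoformat()}"


if __name__ == "__main__":
    # Reproduces the range used in the "Temporal Coverage" example.
    print(temporal_coverage(date(2020, 2, 3), date(2020, 3, 2)))
```

A value that fails is_iso_date (for example, a date with a one-digit month) would need to be rewritten before it is entered in a “date of creation,” “date of issue,” or “date submitted” field.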

Uploading Files

You must upload at least one file to proceed with your submission.

Due to the practical limitations of the web interface, the DataSpace administrators recommend that you do not attempt to upload individual files larger than 150 MB. DataSpace does accommodate larger files, and they can be transferred by more efficient and reliable means than the web portal (e.g., Globus). Please contact the DataSpace administrators for assistance while preparing large files for submission.
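
Before starting a submission, it may help to find out which files exceed the recommended web-upload size. The sketch below is an illustration only: the function and constant names are hypothetical, and it assumes that “150 MB” means 150,000,000 bytes, which this page does not specify.

```python
from pathlib import Path

# Assumption: "150 MB" is read here as 150 * 1000 * 1000 bytes.
WEB_UPLOAD_LIMIT_BYTES = 150 * 1000 * 1000


def files_over_limit(folder: str, limit: int = WEB_UPLOAD_LIMIT_BYTES) -> list[Path]:
    """Return files under folder (searched recursively) larger than limit bytes."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.stat().st_size > limit
    )


if __name__ == "__main__":
    # Lists candidates for transfer by other means in the current directory.
    for path in files_over_limit("."):
        print(path, path.stat().st_size)
```

Any file this reports is a candidate for transfer outside the web portal, in consultation with the DataSpace administrators.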

DataSpace does not restrict file types, but if the system does not automatically recognize an upload, you will need to manually enter the file type by clicking the “Change” button under “File Format.” You may also add some brief information about each file by clicking the “Change” button under “Description.”

Accompanying the data file(s), include with your item a “README.txt” file, which may contain additional metadata, restrictions or guidelines for data use and re-use, technical information about the data files, and any other documentation necessary to interpret the data. (For submissions including large data files, the README may be the only file uploaded through the web portal.)

Verifying Entries

Upon upload, individual files may be verified using checksums. After all of the metadata forms are complete and all of the files are uploaded, you will be asked to verify your submission as a whole. You will have the option at that point to go back and correct metadata entries and add or remove files.
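
To compare an uploaded file against your local copy, you can compute a checksum locally. The following is a minimal sketch using Python’s standard hashlib module. The function name is hypothetical, and the choice of SHA-256 as the default is an assumption: this page does not state which algorithm the portal reports, so pass whichever algorithm name matches the checksum you are shown.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.new(algorithm)  # raises ValueError for unknown algorithm names
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the locally computed value and the value reported after upload are identical for the same algorithm, the file contents match; a difference suggests the upload should be repeated.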

Affirming the Distribution License

The final step before submission is to affirm that you grant the DataSpace distribution license (text provided within the web portal). If you click “I Do Not Grant the License,” then the system will automatically delete your submission. So if you are unsure, do not click either button; instead, contact the DataSpace curators with any questions you have.

Additional Metadata

The DataSpace web portal is designed to streamline the submission process, and as such, it only offers entry fields for certain required and recommended metadata categories. However, DataSpace can accommodate a much wider variety of metadata values, which may improve data discoverability, usage, and attribution. For example, distinctions among contributors may be made according to their specific role; ORCID iDs and other personal identifiers may be associated with contributors; multiple provenance dates may be specified; and elaborate references among published datasets and articles may be defined. Only the DataSpace curators may add or modify metadata outside the web portal; please contact them with any questions or requests.

Curatorial Review

Once you submit an item, it goes to the DataSpace curators for review. Curatorial review is geared toward discoverability, re-usability, and long-term preservation; it does not involve a substantive evaluation of the data contained in a submission. The curators may offer suggestions to improve a submission’s alignment with the best practices of open research, but final decisions about the content of a submission always rest with the contributors. The curators make every effort to review submissions in a timely manner, typically maintaining a turnaround of two business days (though large and/or complex submissions may take longer). They will contact you directly with any questions or suggested revisions, and you will receive email notifications for any status updates on your submissions.


Terminology

DataSpace is an implementation of the DSpace platform, which entails some specialty terminology for the structure of contents in the repository and the persons responsible for contributions. Terminology for persons responsible is also complicated by differing metadata standards for research data. This section clarifies the key terms for DataSpace users.

Terms for the Structure of Contents

DataSpace maintains a strict hierarchy of contents, ordered as follows:

Community

A “community” in DataSpace is a top-level category for content, roughly representing academic/administrative units within Princeton University. Every item in the repository must be assigned to exactly one community.

Collection

A “collection” in DataSpace is a second-level category for content within communities, representing administrative sub-units, named research projects, or types of items (e.g., datasets). Every collection must be assigned to exactly one community, and every item must be assigned to exactly one collection.

Item

An “item” in DataSpace is the basic unit for content--equivalent to a “resource” by Dublin Core and DataCite metadata standards. Every item must be assigned to exactly one collection, within exactly one community. Items may contain any number of files, and copies of files may be assigned to more than one item. Each item bears a unique, persistent identifier to distinguish it from other items. Additionally, metadata typically distinguish items--for example, by contributors, date, and title.

File

A “file” in DataSpace is much the same as a file on Windows, macOS, and Unix-based platforms, except that DataSpace does not have a file folder structure (i.e., DataSpace is a flat-file system). Files in DataSpace exist only as components of items. DataSpace can accommodate any file type, and it does not restrict duplicate file names. A file may bear values for “Description” and “File Format” in addition to its item’s metadata. (On the backend of the DSpace software that runs DataSpace, a file is called a “bitstream,” but users would only see this terminology if they access the REST API or OAI endpoint.)
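
The hierarchy described above (community, collection, item, file) can be pictured as nested containers. The sketch below is only a conceptual model in Python, not the DSpace data model or API: all class, field, and function names are hypothetical and chosen to mirror the terms on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class File:
    name: str
    description: str = ""   # optional per-file "Description"
    file_format: str = ""   # optional per-file "File Format"


@dataclass
class Item:
    title: str
    identifier: str  # unique, persistent identifier for the item
    files: list[File] = field(default_factory=list)  # flat list: no folders


@dataclass
class Collection:
    name: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: list[Collection] = field(default_factory=list)


def locate_item(community: Community, identifier: str) -> Optional[tuple[str, str]]:
    """Return (community name, collection name) for an item, or None if absent."""
    for collection in community.collections:
        for item in collection.items:
            if item.identifier == identifier:
                return community.name, collection.name
    return None
```

Because each item sits in exactly one collection and each collection in exactly one community, a lookup by identifier yields a single community and collection pair. Files hang directly off the item, which reflects the absence of folders.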

Terms for Persons Responsible

The terminology for persons responsible for content in DataSpace may vary by context. Some terms are used for metadata fields and have technical definitions established by metadata standards, while others relate to more general usage within this help page for DataSpace.

Administrator

An “administrator” is not a special term defined by DSpace or by Dublin Core or DataCite metadata standards. For the purposes of DataSpace documentation, an “administrator” is a Princeton staff member with administrative access to the backend of DataSpace.

Author

An “author” is a type of “contributor,” used by DSpace (and DataSpace by extension) as the default metadata field for naming persons primarily responsible for items. It is not a standard term used in metadata for research data outside DSpace implementations, but it maps to the “contributor” element in the Dublin Core metadata standards and the “creator” element in the DataCite standards. The term “author” is avoided for data submissions in the DataSpace documentation, in favor of the more general and more widely standardized term, “contributor.”

Contributor

A “contributor” is a person or organization responsible for making contributions to DataSpace. The term “contributor” is the general term used for persons responsible for submitting content throughout the DataSpace policies and guidelines and documentation. It maps to the “contributor” element in the Dublin Core metadata standards and the “creator” element in the DataCite standards (not the “contributor” element in the DataCite standards).

Creator

A “creator” is a person or organization primarily responsible for making a DataSpace item and/or its component files. Because DSpace (and DataSpace by extension) uses “author” as the default term for naming persons primarily responsible for items, the “data creator” field is typically unavailable during the submission process. Some collections in DataSpace do have specialty submission forms with “data creator” fields instead of “author” fields. Depending on the context and the other metadata available, a “creator” in DataSpace may map to either the “creator” or the “contributor” element in Dublin Core standards. However, a “creator” in DataSpace always maps to the “creator” element by DataCite standards.

Curator

A “curator” is not a special term defined by DSpace or by Dublin Core or DataCite metadata standards. For the purposes of DataSpace documentation, a “curator” is a Princeton staff member responsible for content control in DataSpace, including especially the vetting of submissions to ensure their authenticity, file integrity, and compliance with metadata standards.

 
