Git access to OpenNeuro datasets
The underlying format for each dataset is a git-annex repository created with DataLad. Every dataset tracks changes as commits, and snapshots are stored as git tags.
Datasets support clones, pushes, and other git operations. All datasets currently require authentication for access, but public datasets will be available via anonymous clones in a future release.
The dataset must always be at the root level of the repository.
Before accepting a git push, OpenNeuro checks the size of regular (non-annexed) git files and runs a subset of BIDS validation. Make sure large files are annexed in every commit you intend to push.
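One way to catch oversized regular files before pushing is to list the files in the latest commit that are not annexed and exceed a size limit. The following is a minimal sketch, not part of any OpenNeuro tooling: the function name list_large_regular_files is hypothetical, the 1 MiB default is an arbitrary choice rather than a documented OpenNeuro threshold, and it assumes annexed files are committed as symlinks (git mode 120000), which is how git-annex stores locked files.

```shell
# Hypothetical helper: print regular (non-symlink) files in HEAD larger than a limit.
# Usage: list_large_regular_files [limit_in_bytes]
list_large_regular_files() {
  limit=${1:-1048576}  # assumed default of 1 MiB; not an official OpenNeuro threshold
  # `git ls-tree -r -l` prints "<mode> <type> <object> <size><TAB><path>" for each file.
  git ls-tree -r -l HEAD | awk -F'\t' -v limit="$limit" '{
    split($1, meta, /[ ]+/)   # meta[1] = mode, meta[4] = size in bytes
    if (meta[1] != "120000" && meta[4] + 0 > limit) print $2
  }'
}
```

Running list_large_regular_files inside a dataset prints one path per line. An empty result means no regular file in the latest commit exceeds the chosen limit. It only inspects HEAD, so earlier commits would need to be checked separately.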
.bidsignore must always be a regular file. Some features are only available for regular files (such as diffing) and textual data is generally best kept as regular git objects.
The following .gitattributes configuration lets git-annex automatically annex the correct files when using git add or datalad save:
* annex.largefiles=largerthan=1mb
*.bval annex.largefiles=nothing
*.bvec annex.largefiles=nothing
*.json annex.largefiles=nothing
*.tsv annex.largefiles=nothing
.bidsignore annex.largefiles=nothing
CHANGES annex.largefiles=nothing
README annex.largefiles=nothing
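The effect of these patterns can be inspected without adding any files, because git check-attr reports the attribute value that would apply to a given path. The sketch below wraps that command in a small function. The name annex_rule_for is hypothetical, and it assumes it is run inside a repository that already contains the .gitattributes file.

```shell
# Hypothetical helper: print the annex.largefiles rule that applies to one path.
# The path does not need to exist; only .gitattributes is consulted.
annex_rule_for() {
  # git check-attr prints "<path>: annex.largefiles: <value>"; keep only <value>.
  git check-attr annex.largefiles -- "$1" | sed 's/^.*: annex\.largefiles: //'
}
```

With the configuration above, annex_rule_for dataset_description.json prints nothing (the literal rule name, meaning the file is never annexed), while a path such as sub-01/anat/sub-01_T1w.nii.gz falls through to the default rule and prints largerthan=1mb.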
Using openneuro-cli, git can be configured to automatically use your OpenNeuro credentials to allow access to datasets. This is the preferred method for authenticating regular git access. An advanced method of issuing a key is documented below if you cannot use the git credential helper for your use case.
Once you have openneuro-cli installed and have logged in with openneuro login, you can configure git to automatically use your login.
# This allows the helper to identify which dataset you are accessing automatically and issue a key for that dataset
git config credential.useHttpPath true
# Point git at the openneuro-cli tool (this must be an absolute path)
git config credential.helper "/path/to/openneuro git-credential"
Alternatively, openneuro-cli can be given the name git-credential-openneuro, and then this shorter command will work.
git config credential.helper "openneuro"
These commands configure the options for a single repository.
To enable them for all OpenNeuro repositories, add this to your git configuration file.
[credential "https://openneuro.org"]
    useHttpPath = true
    helper = "/path/to/openneuro git-credential"
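The same section can also be written with git config instead of editing the file by hand, since git accepts a URL as the subsection of a credential key. The sketch below is one possible wrapper. The function name configure_openneuro_helper is hypothetical, and the default target of ~/.gitconfig is an assumption about where your user-level configuration lives.

```shell
# Hypothetical helper: write the OpenNeuro credential settings into a git config file.
# $1: absolute path to the openneuro-cli executable
# $2: config file to edit (assumed default: the user-level ~/.gitconfig)
configure_openneuro_helper() {
  cli_path=$1
  config_file=${2:-"$HOME/.gitconfig"}
  # Keys of the form credential.<url>.<name> apply only to URLs matching <url>.
  git config --file "$config_file" credential.https://openneuro.org.useHttpPath true
  git config --file "$config_file" credential.https://openneuro.org.helper "$cli_path git-credential"
}
```

Calling configure_openneuro_helper /path/to/openneuro would apply the settings for every repository of the current user. Passing a second argument writes to a different file, which is useful for trying the settings out first.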
Most datalad or git operations will work as expected, but there are a few limitations. Force pushes and pushes with unrelated history will be rejected. Annexed data is accepted, but only via the git transport; using other annexes will result in unreachable files or failed validation due to missing data.
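Because force pushes and unrelated histories are rejected, it can help to confirm locally that a push would be a plain fast-forward. The following sketch uses git merge-base --is-ancestor for that check. The function name is_fast_forward_of is hypothetical, and the remote-tracking ref you pass to it (for example origin/master) is an assumption about how your clone is set up.

```shell
# Hypothetical pre-push check: succeeds only if the given ref is an ancestor of HEAD,
# meaning HEAD adds commits on top of it instead of rewriting or replacing history.
# Usage: is_fast_forward_of <remote-tracking-ref>
is_fast_forward_of() {
  git merge-base --is-ancestor "$1" HEAD
}
```

A typical use would be to fetch first and then run is_fast_forward_of origin/master. A non-zero exit status suggests the local branch has diverged and should be reconciled with a merge rather than pushed.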
To download a new dataset using the credential helper you can start with an empty repo and then configure that repo.
mkdir ds000001
cd ds000001
git init
git remote add origin https://openneuro.org/git/0/ds000001
# Follow the above steps to set up the credential helper
git pull origin master
git pull origin git-annex:git-annex
# From here you can treat this like a datalad dataset and export back to OpenNeuro to deploy changes
When you are ready to push changes, make sure to validate them before attempting to push. OpenNeuro will reject some invalid pushes but cannot run the full bids-validator until after your changes have been pushed.
If you cannot use the credential helper, you can manually generate a short-lived key and pass it as the password for git operations. Substitute path= with the repository path for the dataset being accessed.
openneuro git-credential fill <<EOF
protocol=https
host=openneuro.org
path=/git/0/ds000001
EOF
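To use the key in a script, only its value needs to be pulled out of the command's output. The sketch below assumes the output follows the usual git credential format of key=value lines including a password= line. That is an assumption about openneuro-cli and is not confirmed here, and the function name extract_password is hypothetical.

```shell
# Hypothetical helper: read git-credential style "key=value" lines on stdin
# and print only the value of the password line.
extract_password() {
  sed -n 's/^password=//p'
}
```

Piping the output of the command above into extract_password would leave just the key, which can then be supplied when git prompts for a password.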
Snapshots are represented as git tags. To use all OpenNeuro features, you must have at least one tag with a valid semantic versioning name. This is typically 1.0.0 but can be any valid numeric semantic version.
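A tag name can be checked against this rule before creating it. The sketch below treats "numeric semantic version" as MAJOR.MINOR.PATCH with no prefix, no pre-release suffix, and no leading zeros. That reading is an assumption, and the exact rule OpenNeuro applies may differ. The function name is_numeric_semver is hypothetical.

```shell
# Hypothetical check: succeeds if the argument looks like MAJOR.MINOR.PATCH,
# where each part is a non-negative integer without leading zeros.
is_numeric_semver() {
  printf '%s\n' "$1" | grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}
```

For example, is_numeric_semver 1.0.0 succeeds, while is_numeric_semver v1.0.0 and is_numeric_semver 1.0 both fail.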
Direct access can clone, push, and pull dataset contents, but it does not transfer annexed objects on its own. For public datasets, the annexed objects are available from a preconfigured remote on S3 shortly after the dataset is made public or, if the dataset is already public, shortly after a new snapshot is created.
For private datasets or to add new data with DataLad or git-annex, a special remote is available to push data directly to OpenNeuro.
Obtain the URL from the dataset page and run initremote (or enableremote if you need to update it).
# Make sure openneuro-cli is installed and available in your path
# You should see 'VERSION 1' 'EXTENSIONS' if this is working
echo "EXTENSIONS" | git-annex-remote-openneuro
# Configure the remote with the URL for your dataset
git annex initremote openneuro type=external externaltype=openneuro encryption=none url=https://openneuro.org/git/0/ds000001
After this you can use regular git-annex or datalad commands to upload or download any annexed files by using the openneuro remote.
# To upload any annexed objects to the remote
git annex copy --to openneuro