Go to the content
or

Debian

 Go back to Planet Debian
Full screen Suggest an article

Benjamin Mako Hill: The Community Data Science Collective Dataverse

June 18, 2017 2:35 , by Planet Debian - 0no comments yet | No one following this article yet.
Viewed one time

I’m pleased to announce the Community Data Science Collective Dataverse. Our dataverse is an archival repository for datasets created by the Community Data Science Collective. The dataverse won’t replace work that collective members have been doing for years to document and distribute data from our research. What we hope it will do is get our data — like our published manuscripts — into the hands of folks in the “forever” business.

Over the past few years, the Community Data Science Collective has published several papers where an important part of the contribution is a dataset. These include:

Recently, we’ve also begun producing replication datasets to go alongside our empirical papers. So far, this includes:

In the case of each of the first groups of papers where the dataset was a part of the contribution, we uploaded code and data to a website we’ve created. Of course, even if we do a wonderful job of keeping these websites maintained over time, eventually, our research group will cease to exist. When that happens, the data will eventually disappear as well.

The text of our papers will be maintained long after we’re gone in the journal or conference proceedings’ publisher’s archival storage and in our universities’ institutional archives. But what about the data? Since the data is a core part — perhaps the core part — of the contribution of these papers, the data should be archived permanently as well.

Toward that end, our group has created a dataverse. Our dataverse is a repository within the Harvard Dataverse where we have been uploading archival copies of datasets over the last six months. All five of the papers described above are uploaded already. The Scratch dataset, due to access control restrictions, isn’t listed on the main page but it’s online on the site. Moving forward, we’ll be populating this new datasets we create as well as replication datasets for our future empirical papers. We’re currently preparing several more.

The primary point of the CDSC Dataverse is not to provide you with way to get our data although you’re certainly welcome to use it that way and it might help make some of it more discoverable. The websites we’ve created (like for the ones for redirects and for page protection) will continue to exist and be maintained. The Dataverse is insurance for if, and when, those websites go down to ensure that our data will still be accessible.


This post was also published on the Community Data Science Collective blog.


Source: https://mako.cc/copyrighteous/cdsc-dataverse

0no comments yet

Post a comment

The fields are mandatory.

If you are a registered user, you can login and be automatically recognized.