BitCurator

Software produced for the BitCurator family of research projects (BitCurator, BitCurator Access, BitCurator NLP).

The BitCurator group on GitHub is the primary store for all source code and development documentation produced as part of the Andrew W. Mellon Foundation-funded BitCurator NLP (2016-2018), BitCurator Access (2014-2016), and BitCurator (2011-2014) projects.

Additional information, user documentation, and community contributions to the BitCurator Environment can be found on the BitCurator Environment wiki. Ongoing support for the BitCurator Environment is provided by members of the BitCurator Consortium.

Projects

BitCurator Environment

The BitCurator Environment (Live CD / Installation CD) is built from several repositories, two of which (bitcurator-distro-adduser and bitcurator-distro-metadata) are optional. End users should not clone or attempt to build these repositories; the current release can be found at http://wiki.bitcurator.net/. Developers should first consult the README in bitcurator-distro for information regarding contributions (test builds, pull requests, and other modifications). Note that some older repositories are deprecated.

Current repositories:

https://github.com/BitCurator/bitcurator-distro
https://github.com/BitCurator/bitcurator-distro-salt
https://github.com/BitCurator/bitcurator-distro-adduser
https://github.com/BitCurator/bitcurator-distro-metadata

Older (deprecated) repositories:

https://github.com/BitCurator/bitcurator-distro-main
https://github.com/BitCurator/bitcurator-distro-bootstrap

BitCurator Access Webtools

The BitCurator Access Webtools project consists of a single repository. The README provides instructions for both end users and developers to clone and build from source.

https://github.com/BitCurator/bitcurator-access-webtools

BitCurator Access Redaction Tools

The BitCurator Access Redaction Tools project consists of a single repository. The README provides instructions for both end users and developers to clone and build from source.

https://github.com/BitCurator/bitcurator-access-redaction

BitCurator NLP Tools

The BitCurator NLP project includes several repositories. The topic model generation environment (bitcurator-nlp-gentm) automatically extracts text from heterogeneous document collections within disk images and generates topic models that users can browse in a web browser. The disk browsing environment (bitcurator-access-webtools) provides full-text browsing of documents contained within disk images, along with analysis (in progress) of entities identified within those documents. Various command-line tools are provided in another repository (bitcurator-nlp-entspan).

https://github.com/BitCurator/bitcurator-nlp-gentm
https://github.com/BitCurator/bitcurator-access-webtools
https://github.com/BitCurator/bitcurator-nlp-entspan

Dockerization scripts

A script to dockerize bulk_extractor for use in an iRODS-related workflow was produced in 2015. It can be found in the following repository.

https://github.com/BitCurator/bitcurator-docker-builds

Contributors

Members of the BitCurator group on GitHub are listed below (alphabetical order). Interested in being added as a member? Message the group!

Greg Jansen (@gregjansen) - Independent contractor

Sunitha Misra (@sunithamisra) - Software developer, BitCurator

Carl Wilson (@carlwilson) - Technical Lead, Open Preservation Foundation

Kam Woods (@kamwoods) - Technical Lead, BitCurator