An overview of the RDKit

What is it?

Open source toolkit for cheminformatics

  • Business-friendly BSD license
  • Core data structures and algorithms in C++
  • Python 3.x wrappers generated using Boost.Python
  • Java and C# wrappers generated with SWIG
  • 2D and 3D molecular operations
  • Descriptor generation for machine learning
  • Molecular database cartridge for PostgreSQL
  • Cheminformatics nodes for KNIME (distributed from the KNIME community site: https://www.knime.com/rdkit)

Operational:

  • http://www.rdkit.org
  • Supports Mac/Windows/Linux
  • Releases every 6 months
  • Web presence:
    • Homepage: http://www.rdkit.org Documentation, links
    • Github (https://github.com/rdkit) Downloads, bug tracker, git repository
    • Sourceforge (http://sourceforge.net/projects/rdkit) Mailing lists
    • Blog (https://rdkit.blogspot.com) Tips, tricks, random stuff
    • Tutorials (https://github.com/rdkit/rdkit-tutorials) Jupyter-based tutorials for using the RDKit
    • KNIME integration (https://github.com/rdkit/knime-rdkit) RDKit nodes for KNIME
  • Mailing lists at https://sourceforge.net/p/rdkit/mailman/, searchable archives available for rdkit-discuss and rdkit-devel
  • Social media:
    • Twitter: @RDKit_org
    • LinkedIn: https://www.linkedin.com/groups/8192558
    • Slack: https://rdkit.slack.com (invite required, contact Greg)

History:

  • 2000-2006: Developed and used at Rational Discovery for building predictive models for ADME, Tox, biological activity
  • June 2006: Open-source (BSD license) release of software, Rational Discovery shuts down
  • to present: Open-source development continues, use within Novartis, contributions from Novartis back to open-source version

Integration with other open-source projects

  • KNIME: Workflow and analytics tool
  • PostgreSQL: Extensible relational database
  • Django: “The web framework for perfectionists with deadlines”
  • SQLite - “The most used database engine in the world”
  • Lucene: Text-search engine 1

Usage by other open-source projects

This will, inevitably, be out of date. If you know of others, please let us know or submit a pull request!

  • stk (docs, paper) - a Python library for building, manipulating, analyzing and automatic design of molecules.
  • gpusimilarity - A Cuda/Thrust implementation of fingerprint similarity searching
  • Samson Connect - Software for adaptive modeling and simulation of nanosystems
  • mol_frame - Chemical Structure Handling for Dask and Pandas DataFrames
  • RDKitjs - port of RDKit functionality to JavaScript
  • DeepChem - python library for deep learning for chemistry
  • mmpdb - Matched molecular pair database generation and analysis
  • CheTo (paper)- Chemical topic modeling
  • OCEAN (paper)- Optimized cross reactivity estimation
  • ChEMBL Beaker - standalone web server wrapper for RDKit and OSRA
  • myChEMBL (blog post, paper) - A virtual machine implementation of open data and cheminformatics tools
  • ZINC - Free database of commercially-available compounds for virtual screening
  • sdf_viewer.py - an interactive SDF viewer
  • sdf2ppt - Reads an SDFile and displays molecules as image grid in powerpoint/openoffice presentation.
  • MolGears - A cheminformatics tool for bioactive molecules
  • PYPL - Simple cartridge that lets you call Python scripts from Oracle PL/SQL.
  • shape-it-rdkit - Gaussian molecular overlap code shape-it (from silicos it) ported to RDKit backend
  • WONKA - Tool for analysis and interrogation of protein-ligand crystal structures
  • OOMMPPAA - Tool for directed synthesis and data analysis based on protein-ligand crystal structures
  • OCEAN - web-tool for target-prediction of chemical structures which uses ChEMBL as datasource
  • chemfp - very fast fingerprint searching
  • rdkit_ipynb_tools - RDKit Tools for the IPython Notebook
  • Vernalis KNIME nodes
  • Erlwood KNIME nodes
  • AZOrange

The Contrib Directory

The Contrib directory, part of the standard RDKit distribution, includes code that has been contributed by members of the community.

Footnotes

1: These implementations are functional but are not necessarily the best, fastest, or most complete.

License

This document is copyright (C) 2013-2018 by Greg Landrum

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

The intent of this license is similar to that of the RDKit itself. In simple words: “Do whatever you want with it, but please give us some credit.”