Some cool open-source Python packages for Machine Learning Ep 1


The Python ecosystem of ML libraries is very rich. Here is a list of some actively maintained, open-source packages that may be useful in day-to-day ML work. The list is far from exhaustive and will need to evolve as fast as the Python ecosystem does. We deliberately exclude from it:

  • main ML algorithm frameworks (scikit-learn, LightGBM, PyTorch, ...),
  • famous user-friendly libraries built on top of deep-learning libraries (fastai, Keras, ...),
  • specific application-oriented libraries (spaCy, scikit-image, StellarGraph, ...),
  • packages for the general data/analytics environment (JupyterLab, Pandas, Dask, Conda, ...) that are also used in many other domains. That said, some of the tools listed below sit closer to data engineering than to ML.

We hope you will find this list informative!

Data cleaning


  • Featuretools - a library for automated feature engineering.
  • TPOT - an automated tool that optimizes ML pipelines using genetic programming.
  • Scikit-Optimize - a simple and efficient library to minimize expensive and noisy black-box functions.
  • Randopt - a package for ML experiment management, hyper-parameter optimization, and results visualization.
  • Optuna - an automatic hyper-parameter optimization software framework, particularly designed for ML.
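
To make the idea of hyper-parameter optimization concrete, here is a minimal sketch of the simplest strategy, random search over a black-box objective. It uses only the standard library and is not the API of Scikit-Optimize, Randopt or Optuna, which offer smarter samplers and much more. The names random_search and toy_objective, and the "lr" and "reg" parameters, are made up for illustration.

```python
import random


def random_search(objective, space, n_trials=200, seed=0):
    """Minimize `objective` by sampling each parameter uniformly from `space`."""
    rng = random.Random(seed)  # seeded so that the search is repeatable
    best_params, best_value = None, float("inf")
    for _ in range(n_trials):
        # Draw one candidate configuration from the search space.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    # Hypothetical stand-in for an expensive validation loss,
    # minimal at lr=0.1 and reg=1.0.
    return (params["lr"] - 0.1) ** 2 + (params["reg"] - 1.0) ** 2


best_params, best_value = random_search(
    toy_objective, {"lr": (0.0, 1.0), "reg": (0.0, 5.0)}
)
print(best_params, best_value)
```

In practice the objective would train and validate a model, which is exactly why the libraries above try to find good values in as few evaluations as possible.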

Dimension reduction and visualization

  • UMAP - Uniform Manifold Approximation and Projection is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction.

Model analysis

  • ELI5 - a library for visualizing and debugging various ML models through a unified API.
  • Yellowbrick - a suite of visual diagnostic tools called "Visualizers" that extend the scikit-learn API to allow human steering of the model selection process.
  • SHAP - SHapley Additive exPlanations is a unified approach to explain the output of any ML model.
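
The Shapley values behind SHAP can be illustrated on a tiny example. The sketch below computes them exactly by averaging each feature's marginal contribution over all feature orderings, which is only feasible for a handful of features. It is not the SHAP library's API or its algorithms; shapley_values, toy_model and the all-zero baseline are assumptions made for the illustration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contributions over all orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start with every feature "switched off"
        previous_output = model(current)
        for i in order:
            current[i] = instance[i]  # switch feature i on
            output = model(current)
            totals[i] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]


def toy_model(x):
    # Hypothetical model: linear terms plus an interaction between features 0 and 1.
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[1] + 3.0 * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The contributions sum to the difference between the model output on the instance and on the baseline, and the interaction term is shared equally between the two features involved.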

Experimentation frameworks and tools

  • Guild AI - a toolkit that automates and optimizes ML experiments.
  • ModelChimp - an experiment tracker for Deep Learning and ML experiments.
  • Sacred - a tool to help you configure, organize, log and reproduce experiments.
  • SKLL - SciKit-Learn Laboratory provides command-line utilities to make it easier to run ML experiments with scikit-learn.
  • DVC - Data Version Control is a version control tool for the data, models and pipelines of data science and ML projects.
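
What "configure, log and reproduce" means in practice can be sketched with the standard library alone: store the configuration next to the metrics, and fix the random seed so that a rerun gives the same result. This is a rough illustration, not how any of the tools above is implemented; run_experiment, the JSON layout and the fake "training" step are all hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def run_experiment(config, log_dir):
    """Run one toy experiment and store its config and metrics as a JSON record."""
    rng = random.Random(config["seed"])  # fixed seed makes the run reproducible
    # Hypothetical "training": the score is just a noisy function of the config.
    score = 1.0 - abs(config["lr"] - 0.1) + rng.gauss(0, 0.01)
    record = {"config": config, "metrics": {"score": score}}
    path = Path(log_dir) / f"run_seed{config['seed']}.json"
    path.write_text(json.dumps(record, indent=2))
    return path, record


log_dir = tempfile.mkdtemp()
path, first = run_experiment({"lr": 0.1, "seed": 42}, log_dir)
_, second = run_experiment({"lr": 0.1, "seed": 42}, log_dir)
print(path, first["metrics"], second["metrics"])  # both runs report the same score
```

Dedicated tools add what this sketch lacks, such as run comparison, dashboards, and tracking of code and data versions.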

Model export

  • ONNXMLTools - enables you to convert models from different ML toolkits into the ONNX (Open Neural Network Exchange) format.


  • MLflow - a platform to streamline ML development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
  • Kubeflow - a Cloud Native platform for ML based on Google's internal ML pipelines.

Data pipelines

  • Kedro - a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned.
  • Dagster - a system for building modern data applications.
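
The core idea shared by pipeline tools, running steps in an order derived from their data dependencies, can be sketched with the standard library's graphlib module (Python 3.9+). This is not Kedro's or Dagster's API; the node names, the catalog dictionary and run_pipeline are hypothetical and only illustrate the principle.

```python
from graphlib import TopologicalSorter

# Each hypothetical node: name -> (function, input dataset names, output dataset name).
nodes = {
    "clean": (lambda raw: [x for x in raw if x is not None], ["raw"], "cleaned"),
    "scale": (lambda cleaned: [x / max(cleaned) for x in cleaned], ["cleaned"], "scaled"),
    "summarize": (
        lambda cleaned, scaled: {"n": len(cleaned), "max_scaled": max(scaled)},
        ["cleaned", "scaled"],
        "summary",
    ),
}


def run_pipeline(nodes, catalog):
    """Run nodes in dependency order, reading and writing datasets in `catalog`."""
    producers = {output: name for name, (_, _, output) in nodes.items()}
    # A node depends on whichever nodes produce its inputs.
    graph = {
        name: {producers[i] for i in inputs if i in producers}
        for name, (_, inputs, _) in nodes.items()
    }
    for name in TopologicalSorter(graph).static_order():
        func, inputs, output = nodes[name]
        catalog[output] = func(*[catalog[i] for i in inputs])
    return catalog


catalog = run_pipeline(nodes, {"raw": [4, None, 2, 8]})
print(catalog["summary"])
```

Because the execution order comes from the declared inputs and outputs rather than from the order in which nodes are written, steps can be added or rearranged without rewiring the whole pipeline.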