Some cool open-source Python packages for Machine Learning Ep 2
August 8, 2019
By François PACULL
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some “active”, open-source packages that may be useful for ML day-to-day activities.
This post follows on from the first episode of this series.
Database connectivity
- Turbodbc - a module to access relational databases via the Open Database Connectivity (ODBC) interface.
- ibis - a toolbox to bridge the gap between local Python environments, remote storage, execution systems like Hadoop components (HDFS, Impala, Hive, Spark) and SQL databases.
Data description
- Pandas Profiling - Generates profile reports from a pandas DataFrame.
Data preparation
- Snorkel - a system for quickly generating training data with weak supervision.
- imbalanced-learn - a package for tackling the curse of imbalanced datasets in machine learning.
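To make the imbalanced-dataset problem concrete, here is a minimal sketch of random oversampling, one of the simplest rebalancing strategies. It uses only the standard library and is not imbalanced-learn's API; the function name `random_oversample` and the toy data are made up for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy dataset (hypothetical): four samples of class 0, one of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]
new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
```

After the call, both classes hold the same number of samples. In practice, resampling should only be applied to the training split, never to the validation or test data.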
Feature engineering
- dirty_cat - helps with machine learning on non-curated categories by providing encoders that are robust to morphological variants, such as typos, in the category strings.
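The idea of an encoder that tolerates typos can be sketched with character n-grams: each category string is encoded by its similarity to a few reference strings rather than by exact match. This is a standard-library illustration of the general principle, not dirty_cat's implementation; the function names and the choice of Jaccard similarity over 3-grams are assumptions made here.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lowercased string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its vector of similarities to the prototypes."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical categories; "enginer" is a typo of "engineer".
prototypes = ["engineer", "teacher"]
encoded = similarity_encode(["engineer", "enginer", "teacher"], prototypes)
```

With a one-hot encoding, "enginer" would become an unrelated category. Here its vector stays close to the one for "engineer", which is the property that makes this kind of encoding useful on dirty data.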
Dimension reduction
- ivis - a machine learning algorithm for reducing dimensionality of very large datasets.
Auto-ML
- auto-sklearn - an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
- Auto-Keras - an open source software library for automated machine learning.
- Keras Tuner - a hyperparameter tuner for Keras.
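What these tools automate can be illustrated with the simplest search strategy, random search over a hyperparameter space. The sketch below is plain Python and does not use any of the libraries above; `toy_objective` is a hypothetical stand-in for a validation loss, and the search space is invented for the example.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameters at random and keep the lowest-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical validation loss, minimal at learning_rate=0.01, units=64."""
    return (params["learning_rate"] - 0.01) ** 2 + abs(params["units"] - 64) / 1000


space = {"learning_rate": [0.1, 0.01, 0.001], "units": [16, 32, 64, 128]}
best_params, best_score = random_search(toy_objective, space, n_trials=200)
```

In a real setting the objective would train a model and return its validation metric, which is why the number of trials is usually the main cost to budget for.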
Model analysis
- Skater - a unified framework to enable model interpretation for all forms of models.
Workflow management
- prefect - a workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
- papermill - a tool for parameterizing, executing, and analyzing Jupyter Notebooks.
Model management
- Studio - a model management framework written in Python to help simplify and expedite your model building experience.
Data visualization
- kepler.gl - a powerful open-source geospatial analysis tool for large-scale datasets, with a Jupyter widget to render large-scale interactive maps in Jupyter Notebook.
- glue - a library to explore relationships within and among related datasets.
- KeplerMapper - an implementation of the TDA Mapper algorithm for visualization of high-dimensional data.
Models
- pytorch-transformers - a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
- spacy-pytorch-transformers - provides spaCy model pipelines that wrap Hugging Face's pytorch-transformers package, so you can use them in spaCy.
Time series
- STUMPY - a powerful and scalable library that can be used for a variety of time series data mining tasks.
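STUMPY is built around the matrix profile: for each subsequence of a time series, the distance to its nearest neighbor elsewhere in the series. The brute-force sketch below shows the concept with the standard library only. It is not STUMPY's API or algorithm (which is far more efficient), and the function names and the exclusion-zone width are choices made for this illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """For each length-m window, return the z-normalized Euclidean
    distance to its nearest non-overlapping-start neighbor (O(n^2 * m))."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = max(1, m // 4)  # skip trivial matches near the window itself
    profile = []
    for i, wi in enumerate(windows):
        best = math.inf
        for j, wj in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(wi, wj)))
            best = min(best, dist)
        profile.append(best)
    return profile


# Toy periodic series (hypothetical) with one anomalous spike at index 7.
series = [0, 1, 0, 1, 0, 1, 0, 9, 0, 1, 0, 1, 0, 1, 0]
profile = naive_matrix_profile(series, m=4)
discord_index = max(range(len(profile)), key=profile.__getitem__)
```

Low values in the profile mark repeated patterns (motifs), while the highest value marks the window least like any other (a discord). Here the discord falls on one of the windows that contain the spike.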