Datascience
Using Tableau to detect outlying trends and ruptures
Tableau detects outlying trends using TabPy. To do that, we are going to get the outliers of a dataset, the change points and a piecewise approximation. For detailed information about the source code, you can look at this article:…
Applying a row-wise function to a Pandas dataframe
More than 3 years ago, we posted a comparative study about looping over Pandas data using a CPU. Because a lot has changed since 2018, this post is a kind of update. For example, the Pandas version tag was 0.23.3 at that time; it is now 1.4.0.…
Python Spatial Join with GeoPandas (and GEOS)
Updated Sep 13, 2021. The purpose of this post is to perform an "efficient" spatial join in Python. What is a spatial join? Here is the definition from wiki.gis.com: A Spatial join is a GIS operation that affixes data from one feature layer’s…
Goldbach's Comet with Numba and Datashader
Updated Jul 29, 2021. This Python notebook is about computing and plotting the Goldbach function. It requires some basic mathematical knowledge, nothing fancy! The main point is to perform some computations with Numba and some efficient plotting with…
Quick data exploration with pandas, matplotlib and seaborn
In this JupyterLab Python notebook we are going to look at the rate of coronavirus [COVID-19] cases in French departments [administrative divisions of France]. The data source is the French government's open data. We are going to perform a few…
Optuna and XGBoost on a tabular dataset
Updated Sep 16, 2021, following a comment by @k_nzw about XGBoostPruningCallback. The purpose of this Python notebook is to give a simple example of hyperparameter optimization [HPO] using Optuna and XGBoost. We are going to perform a regression…
Saving a tf.keras model with data normalization
Training a DL model might take some time and require special hardware. So you may want to save the model to use it later, or on another computer. In this short Python notebook, we are going to create a very simple tensorflow.keras…
Benford's law and the population of French cities
In this Python notebook, we are going to look at Benford's law, which predicts the leading digit distribution, when dealing with some real-world collections of numbers. This distribution usually occurs when the numbers are rather smoothly…
Logistic regression with JAX
JAX is a Python package for automatic differentiation from Google Research. It is a really powerful and efficient library. JAX can automatically differentiate some Python code [it supports both reverse- and forward-mode]. It can also speed up the…
Lunch break, ridge plots with Bokeh
Bokeh is a great Python visualization library. In this short post, we are going to use it to create a ridge plot. For that purpose, we use the COVID-19 death data from Johns Hopkins University, and plot the daily normalized death rate (100000 *…
Lunch break, fetching AROME temperature forecast in Lyon
Since a "small" heat wave is coming, I would like to get a temperature forecast for the next few hours in my neighborhood, from my JupyterLab notebook. We are going to fetch the results from the Météo-France AROME 0.01 model. Here is the AROME item…
Outlier and Change Point Detection in Data Integration Performance Metrics
Data integration involves combining data residing in different sources, and providing users with a unified view of them. In this post, we are interested in detecting performance drift of large and complex daily data integration processes performed…
Some cool open-source Python packages for Machine Learning Ep 5
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some “active”, open-source packages that may be useful for ML day-to-day activities. Previous post list: Some cool open-source Python packages for Machine…
Lunch break, plotting excess death in French department zones with Python
Daily death data are provided by INSEE - the national institute of statistics and economic studies. Here is the link to the page displaying these data, and here is a short description: During the Covid-19 pandemic, INSEE is reporting the number of…
A Quick study of air quality in Lyon with Python
The aim of this post is to use Python to fetch air quality data from a web service and to create a few plots. We are going to: look at some daily data from earlier this year, before and after the lockdown, in the city of Lyon, France; compare 2020…
Fitting a logistic curve to time series in Python
In this notebook we are going to fit a logistic curve to time series stored in Pandas, using a simple linear regression from scikit-learn to find the coefficients of the logistic curve. Disclaimer: although we are going to use some COVID-19 data in…
Fetching AROME weather forecasts and plotting temperatures
Accurate weather forecasts might be very useful for various types of models. In this post, we are going to download the latest available weather forecasts for France and plot some temperature fields, using different Python libraries: Arome is…
Some cool open-source Python packages for Machine Learning Ep 4
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some “active”, open-source packages that may be useful for ML day-to-day activities. This post is following these ones: Some cool open-source Python packages…
Some cool open-source Python packages for Machine Learning Ep 3
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some “active”, open-source packages that may be useful for ML day-to-day activities. This post is following these ones: Some cool open-source Python packages…
First try of auto-sklearn
Since we are big users of scikit-learn and XGBoost, we wanted to try a package that would automate the process of building a machine learning model with these tools. Here is the introduction to auto-sklearn from its github.io website:…
Some cool open-source Python packages for Machine Learning Ep 2
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some “active”, open-source packages that may be useful for ML day-to-day activities. This post is following that one: Some cool open-source Python packages for…
Loading data into a Pandas DataFrame - a performance study
Because doing machine learning implies trying many options and algorithms with different parameters, from data cleaning to model validation, Python programmers will often load a full dataset into a Pandas dataframe, without actually…
Some cool open-source Python packages for Machine Learning Ep 1
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some "active", open-source packages that may be useful for ML day-to-day activities. Of course, this list is far from being exhaustive and should evolve as fast…
Pandas Time Series example with some historical land temperatures
Monthly averaged historical temperatures in France and over the global land surface. The aim of this notebook is just to play with time series along with a couple of statistical and plotting libraries. Imports: %matplotlib inline import pandas as pd…
Lyon DataVis and AI mini-conference
Yesterday I went to this mini-conference at ENS Lyon and enjoyed it very much. Here is a very short description of the speakers/subjects: Lane Harrison: Lane is studying how people interact with data visualization: how do they explore a complex…