Pandas

Download some benchmark road networks for Shortest Paths algorithms

Updated September 26, 2022 (bugfix). The goal of this Python notebook is to download and prepare a suite of benchmark networks for some shortest path algorithms. We would like to…


Testing DuckDB with Discogs data

This notebook is a small example of using DuckDB with the Python API. What is DuckDB? DuckDB is an in-process SQL OLAP Database Management System. It is a relational DBMS that supports SQL. OLAP stands for…


Reading a SQL table by chunks with Pandas

In this short Python notebook, we want to load a table from a relational database and write it into a CSV file. In order to do that, we temporarily store the data in a Pandas dataframe. Pandas is used to load…


Loading data from PostgreSQL to Pandas with ConnectorX

ConnectorX is a library, written in Rust, that enables fast and memory-efficient data loading from various databases to different dataframes. We refer to this interesting paper, in which the…


Applying a row-wise function to a Pandas dataframe

More than 3 years ago, we posted a comparative study about Looping over Pandas data using a CPU. Because a lot has changed since 2018, this post is a kind of update. For example, the Pandas version was 0.23.3 at that time; it is now 1.4.0.…


Quick data exploration with pandas, matplotlib and seaborn

In this JupyterLab Python notebook, we are going to look at the rate of coronavirus [COVID-19] cases in French departments [administrative divisions of France]. The data source is the French government's open data. We are going to perform a few…


Loading data into a Pandas DataFrame - a performance study

Because doing machine learning implies trying many options and algorithms with different parameters, from data cleaning to model validation, Python programmers will often load a full dataset into a Pandas dataframe, without actually…


GPU Analytics Ep 3, Apply a function to the rows of a dataframe

The goal of this post is to compare the execution time between Pandas (CPU) and RAPIDS (GPU) dataframes, when applying a simple mathematical function to the rows of a dataframe. Since the row-wise applied function is a re-projection of geographical…


Looping over Pandas data

I recently stumbled on this interesting post on RealPython (excellent website by the way!): "Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects". This post covers different subjects related to Pandas: creating a datetime column…


Pandas Time Series example with some historical land temperatures

Monthly averaged historical temperatures in France and over the global land surface. The aim of this notebook is just to play with time series along with a couple of statistical and plotting libraries. Imports: %matplotlib inline; import pandas as pd…