Pandas

Calculating daily mean temperatures with scikit-learn

The goal of this post is to predict the daily mean air temperature TAVG from the following climate data variables: maximum and minimum daily temperatures and daily precipitation, using Python and some machine learning techniques available in…


Vector similarity search with pgvector

In the realm of vector databases, pgvector emerges as a noteworthy open-source extension tailored for Postgres databases. This extension equips Postgres with the capability to efficiently perform vector similarity searches, a powerful technique…


Using a local sentence embedding model for similarity calculation

A simple yet powerful use case of sentence embeddings is computing the similarity between different sentences. By representing sentences as numerical vectors, we can leverage mathematical operations to determine the degree of similarity. For the…
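As a minimal sketch of the idea in the excerpt above, similarity between two embedding vectors can be computed with cosine similarity. Cosine similarity is an assumption here (a common choice; the excerpt does not name the measure), and the three short vectors are hypothetical stand-ins for real model outputs, which usually have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional "embeddings", for illustration only
emb_a = [0.2, 0.1, 0.9, 0.4]
emb_b = [0.25, 0.05, 0.85, 0.5]   # points in nearly the same direction as emb_a
emb_c = [-0.7, 0.6, 0.0, -0.1]    # points in a very different direction

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
```

With these made-up vectors, `sim_ab` is larger than `sim_ac`, which is the behavior expected for a pair of similar sentences versus a pair of unrelated ones.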


Python plot - Antarctic sea ice extent

Data source: https://ads.nipr.ac.jp/vishop/#/extent, with REGION SELECTOR = Antarctic. At the bottom of the page: Download the sea ice extent (CSV file) - seasonal dataset. From the National Institute of Polar Research (Japan) website: The sea-ice extent…


Python plot - North Atlantic daily water surface temperature


Visualizing some polynomial roots with Datashader

Last weekend I found this interesting tweet by sara: The above figure shows all the complex roots from the various polynomials of degree 10 with coefficients in the set $\left\{ -1, 1 \right\}$. It made me think of Bohemian matrix…


Forward and reverse stars in Cython

This notebook follows up on a previous one, where we looked at the forward and reverse star representations of a sparse directed graph in pure Python: Forward and reverse star representation of a digraph. The motivation is to access the…


Forward and reverse star representation of a digraph

In this Python notebook, we are going to focus on one representation of directed graphs: the forward star representation [and its opposite, the reverse star]. The motivation here is to access a network topology and associated data efficiently,…
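To illustrate the forward star idea mentioned above, here is a minimal pure-Python sketch, not the notebook's own implementation. It assumes edges are given as `(tail, head, weight)` triples with vertices numbered from 0; the function names and the tiny example graph are hypothetical.

```python
def forward_star(n_vertices, edges):
    """Build pointer/head/weight arrays so that the edges leaving vertex v
    are stored contiguously in heads[pointers[v]:pointers[v + 1]]."""
    # Count the outgoing edges of each vertex
    out_degree = [0] * n_vertices
    for tail, _, _ in edges:
        out_degree[tail] += 1

    # Cumulative sum of the out-degrees gives the start offset of each vertex
    pointers = [0] * (n_vertices + 1)
    for v in range(n_vertices):
        pointers[v + 1] = pointers[v] + out_degree[v]

    # Place each edge in the next free slot of its tail vertex
    heads = [0] * len(edges)
    weights = [0.0] * len(edges)
    next_slot = pointers[:-1].copy()
    for tail, head, weight in edges:
        slot = next_slot[tail]
        heads[slot] = head
        weights[slot] = weight
        next_slot[tail] += 1
    return pointers, heads, weights


def out_edges(pointers, heads, weights, v):
    """Return the (head, weight) pairs of the edges leaving vertex v."""
    start, end = pointers[v], pointers[v + 1]
    return list(zip(heads[start:end], weights[start:end]))


# Hypothetical 3-vertex digraph
edges = [(0, 1, 1.0), (2, 0, 4.0), (0, 2, 2.5), (1, 2, 0.5)]
pointers, heads, weights = forward_star(3, edges)
```

Under the same assumptions, a reverse star can be obtained by swapping the roles of tail and head when building the arrays, which gives contiguous access to the edges entering a vertex.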


Download some benchmark road networks for Shortest Paths algorithms

Updated September 26, 2022 (bugfix). The goal of this Python notebook is to download and prepare a suite of benchmark networks for some shortest path algorithms. We would like to experiment with some simple directed graphs with non-negative weights. We…


Testing DuckDB performance with Discogs data

This notebook is a small example of using DuckDB with the Python API. What is DuckDB? DuckDB is an in-process SQL OLAP database management system. It is a relational DBMS that supports SQL. OLAP stands for online analytical processing,…


Reading a SQL table by chunks with Pandas

In this short Python notebook, we want to load a table from a relational database and write it into a CSV file. In order to do that, we temporarily store the data in a Pandas dataframe. Pandas is used to load the data with read_sql() and later to…
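The chunked approach described above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: it uses an in-memory SQLite database in place of the real relational database, and the table name `measurements`, the helper `table_to_csv_by_chunks`, and the demo data are all hypothetical. The `chunksize` argument of `pandas.read_sql` makes it return an iterator of dataframes instead of a single one.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def table_to_csv_by_chunks(conn, table_name, csv_path, chunksize=1000):
    """Stream a SQL table into a CSV file, one dataframe chunk at a time."""
    # table_name is interpolated directly: only use trusted names here
    query = f"SELECT * FROM {table_name}"
    n_rows = 0
    for i, chunk in enumerate(pd.read_sql(query, conn, chunksize=chunksize)):
        # Write the header once, then append the following chunks
        chunk.to_csv(
            csv_path,
            mode="w" if i == 0 else "a",
            header=(i == 0),
            index=False,
        )
        n_rows += len(chunk)
    return n_rows


# Hypothetical demo data in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(i, i * 0.5) for i in range(2500)],
)
conn.commit()

csv_path = os.path.join(tempfile.mkdtemp(), "measurements.csv")
n_written = table_to_csv_by_chunks(conn, "measurements", csv_path, chunksize=1000)
conn.close()
```

Only one chunk is held in memory at a time, so peak memory use depends on `chunksize` rather than on the size of the table.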


Loading data from PostgreSQL to Pandas with ConnectorX

ConnectorX is a library, written in Rust, that enables fast and memory-efficient data loading from various databases to different dataframes. We refer to this interesting paper, in which the authors provide a detailed analysis of the pandas.read_sql…


Applying a row-wise function to a Pandas dataframe

More than 3 years ago, we posted a comparative study about Looping over Pandas data using a CPU. Because a lot has changed since 2018, this post is something of an update. For example, the Pandas version was 0.23.3 at that time; it is now 1.4.0.…


Quick data exploration with pandas, matplotlib and seaborn

In this JupyterLab Python notebook we are going to look at the rate of coronavirus [COVID-19] cases in French departments [administrative divisions of France]. The data source is the French government's open data. We are going to perform a few…


Loading data into a Pandas DataFrame - a performance study

Because doing machine learning implies trying many options and algorithms with different parameters, from data cleaning to model validation, Python programmers will often load a full dataset into a Pandas dataframe, without actually…


GPU Analytics Ep 3, Apply a function to the rows of a dataframe

The goal of this post is to compare the execution time between Pandas (CPU) and RAPIDS (GPU) dataframes, when applying a simple mathematical function to the rows of a dataframe. Since the row-wise applied function is a re-projection of geographical…


Looping over Pandas data

I recently stumbled on this interesting post on RealPython (excellent website by the way!): Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects. This post covers different subjects related to Pandas: creating a datetime column…


Pandas Time Series example with some historical land temperatures

Monthly averaged historical temperatures in France and over the global land surface. The aim of this notebook is just to play with time series along with a couple of statistical and plotting libraries. Imports: %matplotlib inline import pandas as pd…