Blog | Architecture & Performance

Databaseen13 juil. 2026

Fast Geographic Data Extraction from SQL Server to Parquet with FastBCP

SQL Server holds spatial data in geography and geometry columns.

#FastBCP#ArrowTDS#SQL Server

Data Visualizationfr4 juil. 2026

Diurnal cycle of summer temperature at Lyon-Bron, by decade

Météo-France publishes its hourly baseline climatological data as open data on data.gouv.fr.

#Python#pandas#Matplotlib

Pythonen29 mai 2026

German-style strings in Apache Arrow

Version 1.4 of the Apache Arrow Columnar Format added a new way to store variable-length values, the Variable-size Binary View Layout.

#Apache Arrow#StringView#BinaryView

SQL Serveren26 mai 2026

PerformanceStudio: Better Query Store Analysis for SQL Server

PerformanceStudio is an open-source GUI for SQL Server performance troubleshooting.

#SQL Server

SQL Serveren1 mai 2026

Database Engines Trends Dashboard

DB-Engines.com is a website that references and ranks hundreds of database engines, from the most famous ones like Oracle, MySQL, SQL Server, PostgreSQL…

#database#Dataviz#Tableau

Machine Learning & AIen27 avr. 2026

Running a private Qwen3.6 Coding Agent on Scaleway

This is the Scaleway sequel to two previous posts on self-hosted coding agents: the AWS run where we deployed a 27B model on a g5.xlarge, and the…

#LLM#Qwen#llama.cpp

Machine Learning & AIen18 avr. 2026

Running a Local LLM on a Consumer GPU [8 GB VRAM]

In a previous post, we ran Qwen3.5-27B on an AWS g5.xlarge with an NVIDIA A10G (24 GB VRAM).

#LLM#Qwen#llama.cpp

Machine Learning & AIen11 avr. 2026

Running a Local LLM Coding Agent on AWS

We deploy Qwen3.5-27B on a single GPU instance, serve it with llama.cpp, wire it to OpenCode as a coding agent frontend, and run ToolCall-15 to measure…

#LLM#Qwen#llama.cpp

Performanceen25 mars 2026

Summation of Floating-Point Numbers in Cython

Adding up a list of floating-point numbers: seems simple. IEEE 754 double-precision floats (float64) carry about 15 to 17 significant decimal digits, and…

#Python#Cython#floating-point

Technologyen4 mars 2026

My First Lean 4 Proof: the Irrationality of √2

Formal verification uses computers to automatically check whether proofs are correct.

#Lean#Lean 4#formal verification

SQL Serveren31 janv. 2026

Improve SQL Server Performance with sp_UpdateStats2 Statistics Maintenance

Updating statistics is an important aspect of SQL Server performance tuning.

#SQL Server#dba

SQL Serveren18 janv. 2026

SQL Server Cumulative Updates Dashboard

Since SQL Server 2017, Microsoft no longer ships Service Packs, only Cumulative Updates.

#SQL Server

Databaseen27 nov. 2025

An example ETL Pipeline with dlt + SQLMesh + DuckDB

In this post, we walk through building a basic ETL (Extract-Transform-Load) pipeline.

#DuckDB#SQLMesh#dlt

SQL Serveren27 nov. 2025

Get your embeddings on SQL Server 2025 with AI_GENERATE_EMBEDDINGS and EXTERNAL MODEL using OLLAMA local and your GPU

SQL Server 2025 introduces a powerful new feature: native vector embeddings generation using the AI_GENERATE_EMBEDDINGS function.

#Embeddings#SQL Server#vector

Data Visualizationfr20 nov. 2025

Evolution of real estate prices in France - DVF at the IRIS level

By cross-referencing the DVF (Demandes de Valeurs Foncières) data, now available as open data, we built this dashboard.

#Business Intelligence#Dataviz#Spatial

SQL Serveren10 nov. 2025

sp_CompareTables : compare tables at column level in SQL Server

sp_CompareTables is a stored procedure for SQL Server that compares two tables (or views) defined in the configuration table T_COMPARE_CONFIG.

#SQL Server#data quality

Databaseen25 oct. 2025

Performance Analysis of Parallel Data Replication Between Two PostgreSQL 18 Instances on OVH

Parallel data replication between PostgreSQL instances presents unique challenges at scale, particularly when attempting to maximize throughput on…

#FastTransfer#PostgreSQL 18#Performance analysis

Databaseen4 oct. 2025

PG_FASTBCP Postgres Extension

Moving large volumes of data in and out of a PostgreSQL database has always been a critical challenge for developers, data engineers, and database…

#FastBCP#Performance#export

Databaseen29 sept. 2025

Data Loading from S3 Parquet Files to PostgreSQL on OVH Cloud

Loading data efficiently from cloud storage into databases remains a critical challenge for data engineers.

#FastTransfer#S3 Parquet files#PostgreSQL bulk loading

Databaseen29 sept. 2025

FastTransfer Performance with Citus Columnar Storage in PostgreSQL

Data migration between database systems often becomes a bottleneck in modern data pipelines, particularly when dealing with analytical workloads.

#FastTransfer#Citus PostgreSQL#Columnar storage

Databaseen29 sept. 2025

High-Speed PostgreSQL Replication on OVH with FastTransfer

PostgreSQL-to-PostgreSQL replication at scale requires tools that can fully leverage modern cloud infrastructure and network capabilities.

#FastTransfer#PostgreSQL replication#OVH

Data Visualizationen25 juil. 2025

Le Tour de France History Dashboard

After scraping historical data from the "Tour de France" website, we will leverage this data to build a beautiful dashboard.

#Tableau#Dataviz#web scraping

Databaseen14 juil. 2025

Parallel Data Movement Made Simple: FastTransfer & FastBCP Explained

In this article, we’ll show you why traditional tools are no longer fit for today’s servers — and how FastTransfer and FastBCP were built from the ground…

#FastTransfer#FastBCP

Data Visualizationen12 juil. 2025

A Git commit temporal analysis

In this Python notebook, we are going to analyze git commit timestamps across multiple repositories to identify temporal patterns in a git user coding…

#Python#git#Numba

SQL Serveren1 juil. 2025

SQL Server Features Comparison and Evolution

Let's concentrate on the last 3 versions to see how features evolved and how editions compare.

#database#SQL Server#Dataviz

Databaseen30 juin 2025

Oracle to PostgreSQL Data Transfer : pg_fasttransfer vs oracle_fdw

Migrating large tables from Oracle to PostgreSQL can be challenging when performance and reliability matter.

#FastTransfer#pg_fasttransfer#Oracle

Databaseen26 juin 2025

PostgreSQL Data Transfer: pg_fasttransfer vs postgres_fdw

Transferring large volumes of data between PostgreSQL databases can be challenging when performance, security, and simplicity matter.

#PostgreSQL#COPY#FastTransfer

SQL Serveren25 juin 2025

SQL Server Schema Restore : structures and data

On GitHub: (https://github.com/aetperf/dbatoolsScripts/blob/main/DBA/Restore/restoreSchema.ps1) All objects within a specific schema in a target database…

#SQL Server#dba

Data Visualizationen23 mai 2025

Computing and Visualizing Billions of Bohemian Eigenvalues with Python

In our case, we sample 5x5 matrix entries from the discrete set {-1, 0, 1}.

#Python#Numba#Datashader

Machine Learning & AIen11 mai 2025

Zipf's Law on "La Comédie humaine"

In this Python notebook, we're going to explore Zipf's law as applied to words. This is the definition of Zipf's law from its Wikipedia page.

#scraping#Python#scikit-learn

SQL Serveren25 avr. 2025

FastTransfer VS BCP

In data engineering and database administration, efficiently transferring large volumes of data between SQL Server instances is a common and critical…

#FastTransfer#dba#SQL Server

Technologyen4 avr. 2025

Powershell IOMeter Parser

When it comes to benchmarking disks, choosing the right tool can be a challenging task.

SQL Serveren2 avr. 2025

Licence to sp_kill on SQL Server

When it comes to administering SQL Server, you sometimes need to terminate (“kill”) an active session or process.

#SQL Server

Data Visualizationen31 mars 2025

Arnold tongues with Numba, Numba CUDA and Datashader

This Python notebook explores Arnold tongues. Here is a short description from Wikipedia: In simpler terms, Arnold tongues represent regions where two…

#numba-cuda#Datashader#Python

Databaseen4 mars 2025

Spatial queries in DuckDB with R-tree and H3 indexing

This Python notebook explores the use of two indexing techniques, R-tree and H3, for performing spatial queries in DuckDB.

#Python#DuckDB#Spatial

Databaseen1 mars 2025

In-memory cosine distance calculation, NumPy vs DuckDB

This notebook compares the performance of NumPy and DuckDB for computing cosine distances on the same dataset.

#Numpy#Cosine distance#vector

Databaseen18 févr. 2025

Querying Private s3 Buckets with DuckDB

DuckDB is an in-process SQL database that allows you to query data from various sources, including private AWS S3 buckets.

#Python#DuckDB#secrets

Data Visualizationen12 déc. 2024

Pizzerias Around the World

In November 2024, Foursquare released a new dataset called Foursquare Open Source Places.

#Python#Datashader#Geospatial

Data Visualizationen24 nov. 2024

Visualizing the Ulam Spiral with Python

The Ulam spiral arranges integers in a spiral and highlights prime numbers, revealing some diagonal alignment patterns.

#Python#Cython#Matplotlib

Machine Learning & AIen10 nov. 2024

Analyzing tramontane wind frequency

The tramontane wind, a defining feature of the Pyrénées-Orientales region, is known for sweeping through the south of France with its characteristic force and…

#Python#Pandas#regression

SQL Serveren6 nov. 2024

Beyond LAST_VALUE IGNORE NULLS: Efficient Alternatives for Handling Missing Data in SQL Server

When working with SQL Server, managing missing data efficiently is crucial, especially in large datasets.

#SQL Server#Performance#SQL

Databaseen4 nov. 2024

Using DuckDB with Tableau Desktop on Windows

DuckDB is an in-process SQL OLAP database management system. This database is very fast for OLAP queries and has several advantages: There is also…

#DuckDB#database#OLAP

SQL Serveren23 oct. 2024

LAST_VALUE performance issue on SQL Server

The story begins after migrating table structures and data from Netezza to SQL Server.

#database#SQL Server

Data Visualizationen5 août 2024

Finding the most concentrated areas of bakeries in Lyon

In this blog post, we'll use Python to analyze the distribution of bakeries in Lyon, France.

#Gaussian KDE#Kernel Density Estimation#Python

Databaseen30 mai 2024

A Hybrid information retriever with DuckDB

When it comes to information retrieval, vector search methods have demonstrated some good performance, especially when the embedding models have been…

#Python#DuckDB#Hybrid information retriever

Technologyen12 mai 2024

Databases Dashboard V2

A new version of the databases dashboard that lets you explore nearly 1000 databases, written in nearly 40 programming languages, across 55 countries and more…

Technologyfr12 mai 2024

Data protection: part 1 - the data

In today's digital age, data is the black gold of our time; the global volume of data has been multiplied by 30 over the last 10…

#protection#data

Geospatial & Graphsen8 avr. 2024

Enrich a digital terrain model

In this blog post, we will explore how to enrich a digital terrain model [DTM] using Python.

#RichDem#Geospatial#Digital Terrain Model

Machine Learning & AIen1 mars 2024

Calculating walking isochrones with Python

In this blog post, we'll explore how to calculate walking isochrones using Python, taking into account the slope of the terrain.

#Python#Isochrones#Spatial analysis

Performanceen29 févr. 2024

Fully Amortized Loan simulation with Numba and IPyWidgets

In this blog post, we will show how to use Python to simulate the amortization of a fully amortized loan, such as a mortgage or a car loan.

#Python#Numba#IPyWidgets

Geospatial & Graphsen31 janv. 2024

Create a routable pedestrian network with elevation

In this blog post, we will explore how to create a routable pedestrian network with elevation using Python and OSMnx.

#Python#OSM#OSMnx

Databaseen8 janv. 2024

Streaming data from PostgreSQL to a CSV file as fast as possible

In this post, we explore the process of streaming data from a PostgreSQL database to a CSV file using Python.

#Python#SQL#PostgreSQL

Geospatial & Graphsen26 déc. 2023

Lyon's Digital Terrain Model with IGN Data

In this post, we explore how to extract and merge data from a French high-resolution Digital Terrain Model [DTM].

#Python#Geospatial#raster

SQL Serveren22 déc. 2023

SQL Server AlwaysOn Feature Tutorial : Part 3 - install SQL Server on a domain VM

In these articles, we’ll look at how to set up high availability on one or more databases with the AlwaysOn feature, using the « dbatools » PowerShell…

#SQL Server

Technologyen22 déc. 2023

SQL Server AlwaysOn Feature Tutorial : Part 4 - Install and configure a Windows Failover Cluster

In these articles, we’ll look at how to set up high availability on one or more databases with the AlwaysOn feature, using the « dbatools » PowerShell…

Technologyen22 déc. 2023

SQL Server AlwaysOn Feature Tutorial : Part 5 - Activate AlwaysOn on SQL Server

In these articles, we’ll look at how to set up high availability on one or more databases with the AlwaysOn feature, using the « dbatools » PowerShell…

Technologyen12 déc. 2023

SQL Server AlwaysOn Feature Tutorial: Part 1 - Create a Windows Server 2022 Virtual Machine

In these articles, we'll look at how to set up high availability on one or more databases with AlwaysOn functionality, using the "dbatools" PowerShell…

Geospatial & Graphsen11 déc. 2023

A Transit graph for static macro assignment

This post is a description of a graph structure for a transit network, used for static, link-based, frequency-based assignment.

#Python#Graph#Transit network

Databaseen15 nov. 2023

Methane Gas Emissions

The Emissions Database for Global Atmospheric Research (EDGAR) provides independent estimates of the global anthropogenic emissions and emission trends…

#database#GHG#OpenData

Databaseen15 nov. 2023

Parquet file sorting test

Some time ago, we came across an intriguing Parquet sorting test shared by Mimoune Djouallah on Twitter @mim_djo.

#Python#TPC-H#Benchmark

Pythonen10 nov. 2023

Installing your Python package on a Windows machine that does not have internet access

Suppose you've developed a Python package called MyPackage on Linux, with specific package requirements, and need to install it on a Windows machine that…

#Python#Python Installation#Windows

SQL Serveren19 oct. 2023

How minimal is SQL Server Minimal Logging

One day, one of my clients complained about the size of his SQL Server log. He mentioned that it had reached 1GB for just 1 million inserted rows.

#database#SQL Server

Databaseen12 oct. 2023

The Database of Databases Dashboard

After extensively delving into the database world through dbdb.io and enjoying numerous online courses that delve deep into databases (a subject I'm truly…

#database

Databaseen7 oct. 2023

ROLLBACK DELETE vs TRUNCATE vs DROP in several RDBMS

What are the differences between DELETE, TRUNCATE and DROP? The first one is DML whereas the last two are DDL, but which can be rolled back and which are…

#database

Databaseen25 sept. 2023

Oracle Database Features Evolution

From version 11.2 to version 23c, Oracle Database has more than 1600 features.

#database#Oracle

Machine Learning & AIen15 sept. 2023

Calculating daily mean temperatures with scikit-learn

The goal of this post is to predict the daily mean air temperature TAVG from the following climate data variables: maximum and minimum daily…

#Python#supervised learning#regression

Databaseen13 sept. 2023

Oracle Database Editions Comparator

Oracle provides a lot of information about which features are available or not on its licensing page…

#database#Oracle

Databaseen30 août 2023

Vector similarity search with pgvector

In the realm of vector databases, pgvector emerges as a noteworthy open-source extension tailored for Postgres databases.

#Python#vector database#pgvector

Data Visualizationfr25 août 2023

Dashboard on companies in Geneva, Switzerland

Using the canton of Geneva's open data on companies, I managed to build a dashboard that I hope will be useful to you.

#Dashboard#OpenData#Suisse

Data Visualizationen16 août 2023

Using a local sentence embedding model for similarity calculation

A simple yet powerful use case of sentence embeddings is computing the similarity between different sentences.

#Python#sentence embedding#BAAI/bge-base-en

Data Visualizationen14 août 2023

Python plot - Antarctic sea ice extent

Data source: https://ads.nipr.ac.jp/vishop/#/extent REGION SELECTOR = Antarctic At the bottom of the page: Download the sea ice extent (CSV file) -…

#Python#Pandas#Matplotlib

SQL Serveren24 juin 2023

Semantic Vector Search with SQL Server and the help of KMeans indexing

In the previous article, Semantic Vector Search with SQL Server, we discussed the possibility of doing semantic search with vectors that come from…

#SQL Server#semantic search#vector

SQL Serveren19 juin 2023

SQL Server vector search

Some time ago I read a blog article from a senior Microsoft Azure programmer, Davide Mauri. The article interested me because the current "semantic search"…

#Performance#SQL Server#SQL

Data Visualizationen13 juin 2023

Python plot - North Atlantic daily water surface temperature

Data source: https://climatereanalyzer.org [NOAA Optimum Interpolation SST [OISST] dataset version 2.1] In the present dataset, the surface temperature is…

#Python#Pandas#Matplotlib

Databaseen13 mai 2023

PostgreSQL Features Matrix GPT-Optimized

PostgreSQL is a hip and happening DBMS that's gaining more and more popularity.

#PostgreSQL#database

Geospatial & Graphsen10 mai 2023

Hyperpath routing in the context of transit assignment

How do transit passengers choose their routes in a complex network of lines and services?

#Python#Graph#Network

Databaseen27 avr. 2023

TPC-H benchmark of DuckDB and Hyper on native files

In this blog post, we examine the performance of two popular SQL engines for querying large files: These engines have gained popularity due to their…

#TPC-H#Benchmark#SQL

Databaseen4 avr. 2023

TPC-H benchmark of Hyper and DuckDB on Windows and Linux OS

In this blog post, we explore the use of two SQL engines, and specifically their Python API, for querying files.

#TPC-H#Benchmark#SQL

Databaseen3 avr. 2023

TPCH with Snowflake : SF100

Snowflake is a fantastic data warehouse and data lake SaaS solution!

#database#Benchmark#TPC-H

Databaseen30 mars 2023

TPC-H benchmark of Hyper, DuckDB and Datafusion on Parquet files

In this blog post, we focus on directly querying Parquet files using three different SQL engines, and more specifically their Python API: The TPC-H…

#TPC-H#Benchmark#SQL

SQL Serveren23 mars 2023

TPCH SF10 : Query 13 and SQL Server Collations Performance Impact

After benchmarking several cloud databases (Snowflake, BigQuery, SingleStore, Databricks) using TPCH SF10 data, after benchmarking DuckDB and Tableau…

#TPC-H#SQL Server#SQL

Databaseen20 mars 2023

TPCH SF10 : Tableau Hyper Engine vs DuckDB vs Snowflake vs BigQuery vs Databricks vs SingleStore

After a first try with TPCH SF10 using DuckDB on 2 different laptops, comparing Parquet storage vs native storage (see TPCH SF10 using DuckDB vs…

#database#Benchmark#DuckDB

Databaseen12 mars 2023

TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks

I was very interested in 2 articles by Mimoune Djouallah (aka mim) that compare Snowflake, BigQuery, Databricks, SingleStore, PowerBI Datamart and DuckDB…

#database#Benchmark

Data Visualizationen8 mars 2023

Gender Equality Indexes : 2023 Edition

The Gender Equality Index is an indicator created by EIGE. It measures the progress of gender equality in European Union countries.

#Dataviz

SQL Serveren8 mars 2023

SQL Server Extended Events Dashboard

SQL Server Extended Events is a powerful feature in Microsoft SQL Server that enables database administrators to capture and analyze events that occur…

#database#SQL Server#Performance

SQL Serveren18 févr. 2023

SQL Server editions comparison dashboard by domains features and scale capacities

This article lets you use the SQL Server editions comparison dashboard directly. You can select the version of SQL Server. You can select a group of features…

#database#SQL Server

Data Visualizationfr17 févr. 2023

A Tour de France of French incomes

Using the tax data (FiLoSoFi) provided by INSEE as open data, we wanted to study the median incomes of the French through a…

#Dataviz#FiLoSoFi#OpenData

Data Visualizationfr15 févr. 2023

FiLoSoFi mapping: data on median incomes in France

As stated on its website, INSEE provides socio-economic information on nearly 30 million households.

#Tableau#Dashboard#OpenData

Performanceen27 déc. 2022

Dijkstra's algorithm in Cython, part 3/3

Running time of Dijkstra's algorithm on DIMACS networks with various implementations in Python.

#Python#Dijkstra#Shortest path

Performanceen21 déc. 2022

Dijkstra's algorithm in Cython, part 1/3

In this post, we are going to present an implementation of Dijkstra's algorithm in Cython. Dijkstra's algorithm is a shortest path algorithm.

#Python#Dijkstra#Shortest path

Performanceen21 déc. 2022

Dijkstra's algorithm in Cython, part 2/3

This post is the second part of a three-part series. In the first part, we looked at the Cython implementation of Dijkstra's algorithm.

#Python#Dijkstra#Shortest path

Performanceen23 nov. 2022

A Cython implementation of a priority queue

Credit: Musée de l'illusion, Lyon [picture taken by myself] In this post, we describe a basic Cython implementation of a priority queue.

#Python#Cython#Priority queue

Data Visualizationen15 nov. 2022

Visualizing some polynomial roots with Datashader

Last week-end I found this interesting tweet by sara: The above figure shows all the complex roots from the various polynomials of degree 10 with…

#Python#Polynomial#Datashader

Performanceen4 nov. 2022

Forward and reverse stars in Cython

This notebook follows up on a previous one, where we looked at the forward and reverse star representations of a sparse directed graph in pure…

#Python#Graph#Cython

Geospatial & Graphsen21 oct. 2022

Forward and reverse star representation of a digraph

In this Python notebook, we are going to focus on a graph representation of directed graphs: the forward star representation [and its opposite, the…

#Python#Graph#Network

Performanceen8 oct. 2022

Cloud Comparator : CPU, RAM, Price of VMs

Cloud-Mercato.com is a mine of information about pricing in the cloud world.

#Performance#cloud#finops

Databaseen4 oct. 2022

Query Parquet files with DuckDB and Tableau Hyper engines

In this notebook, we are going to query some Parquet files with the following SQL engines: Both of these tools are optimized for Online analytical…

#Python#SQL#DuckDB

Databaseen23 sept. 2022

Download some benchmark road networks for Shortest Paths algorithms

The goal of this Python notebook is to download and prepare a suite of benchmark networks for some shortest path algorithms.

#Python#Networks#Pandas

Performanceen19 sept. 2022

Intel Processors Comparator

Intel processors have several characteristics beyond frequency and number of cores.

#Performance#Hardware

Performanceen26 juil. 2022

Euler's number and the uniform sum distribution

Last year I stumbled upon this tweet from @fermatslibrary: I find it a little bit intriguing for Euler's number e to appear here!

#Python#Numba#Probability theory

Databaseen6 juil. 2022

Trying DuckDB with Discogs data

This notebook is a small example of using DuckDB with the Python API. DuckDB is a relational DBMS that supports SQL.

#Python#SQL#Pandas

Databaseen20 juin 2022

Talend Quick Tips : Increase BigQuery Talend timeout in components

In this article I will show you how to change a critical setting in Talend: the timeout.

#Talend#BigQuery#Performance

SQL Serveren13 juin 2022

Insert data from SQL Server into Excel crosstab

In this article, I will show you how to insert data into an Excel spreadsheet and safely keep your content.

#SQL Server#Excel#powershell

Data Engineeringen6 juin 2022

Dynamic TaskGroup in Airflow 2.0

In this article, we will uncover a way to use Airflow's new TaskGroup feature, which allows you to manage your dependencies dynamically.

#Airflow#Scheduling#Python

Data Engineeringen2 juin 2022

Dynamic TaskGroup Scalability in Airflow 2.0

In the previous article, I showed you how to instantiate TaskGroup dynamically.

#Airflow#Scheduling#Python

SQL Serveren28 mai 2022

SQL Server Standard vs Enterprise Edition Features History

Over time, Microsoft has improved performance and added features to SQL Server.

#SQL Server

Pythonen16 mai 2022

Reading a SQL table by chunks with Pandas

In this short Python notebook, we want to load a table from a relational database and write it into a CSV file.

#Python#Pandas#SQL

Data Engineeringen9 mai 2022

Apache Airflow 2.0 : How it works

In recent years, Airflow has become a major player for scheduling a wide variety of actions.

#Airflow#Scheduling#Python

Data Visualizationen9 mai 2022

Loading data from CSV files into a Tableau Hyper extract

Hyper is Tableau’s in-memory data engine technology, designed for fast data ingest and analytical query processing on large or complex data sets.

#Python#Tableau#Hyper

Performanceen26 avr. 2022

More Heapsort in Cython

This post/notebook is the follow-up to a recent one: Heapsort with Numba and Cython, where we implemented heapsort in Python/Numba and Cython and compared…

#Python#Performance#Benchmark

Databaseen20 avr. 2022

Loading data from PostgreSQL to Pandas with ConnectorX

ConnectorX is a library, written in Rust, that enables fast and memory-efficient data loading from various databases to different dataframes.

#Python#Performance#Benchmark

Databaseen16 avr. 2022

Export data as fast as possible : from HANA to CSV

I want to export from HANA to CSV: the source is a table or an SQL query, the target is an external client, and of course it should be as fast as possible, using a command line…

#hana#database#Performance

Performanceen14 avr. 2022

Heapsort with Numba and Cython

Heapsort is a classical sorting algorithm. We are going into a little bit of theory about the algorithm, but refer to Cormen et al.

#Python#Performance#Dev

Data Visualizationen8 avr. 2022

Using Tableau to detect outliers and ruptures

To do that, we are going to get the outliers of a dataset, the change points and a piecewise approximation.

#Tableau#Python#Dataviz

Data Visualizationen1 avr. 2022

Tableau 2022 Optimize Workbook Feature : first test

"Optimize Workbook" is a new feature of Tableau 2022... but unfortunately the link redirects to the tableau.com root website, where we would expect a technical…

#Tableau#Dataviz#Performance

Data Visualizationen28 mars 2022

Tableau Performance Tips #8 : Avoid using Tableau Groups

Tableau allows users to build their own custom groups very easily. This is a convenient way to regroup data on elements that users want to see regrouped.

#Tableau#Dataviz#Performance

Data Visualizationen27 mars 2022

Gartner Magic Quadrant BI 15 years History

Business Intelligence (BI) tools have been competing for years. The Gartner Magic Quadrant is one of the best-known benchmarks that classify mainstream BI tools.

#Dataviz#Business Intelligence#Benchmark

SQL Serveren25 mars 2022

SQL Server Trace Flags Classification

Trace flags are used to set specific server characteristics or to alter a particular behavior.

#database#Performance#SQL Server

Data Visualizationen23 mars 2022

Tableau Performance Tips #7 : Prefer to use a one row datasource to display static metadata informations

Tableau does not have truly customisable buttons. I mean, you can put an image on your dashboard, but you cannot display the text you want with the formatting…

#Dataviz#Tableau#Performance

Data Visualizationen21 mars 2022

Plotting population density with datashader

In this short post, we are using the Global Human Settlement Layer from the European Commission: The downloaded file has a worldwide resolution of 250m…

#Dataviz#Python#Spatial

Data Visualizationen20 mars 2022

Tableau Performance Tips #6 : Avoid using NOW() for filtering or selecting against a fact datasource

Some databases implement a result cache (or query cache, depending on the name): a cache for the results of some queries.

#Dataviz#Tableau#Performance

Data Visualizationen17 mars 2022

Tableau Performance Tips #5 : Sort only with element(s) in the view

A little-known performance issue is that sorting in Tableau is done not by the datasource but by Tableau itself.

#Dataviz#Tableau#Performance

SQL Serverfr16 mars 2022

XML data compression in SQL Server

We store XML data in a table that is starting to grow seriously, and it is becoming important to think about solutions to try to save…

#SQL Server#database#Performance

Data Visualizationen14 mars 2022

Tableau Performance Tip #4 : Avoid using a big datasource to display semi-constant informations

More often than not, you will want to display semi-constant information like the current timestamp, the Tableau Username, a chosen currency or a simple single…

#Dataviz#Tableau#Performance

Data Visualizationen11 mars 2022

Tableau Performance Tips #3 : Avoid small list of values to be in the context

Let's begin with a definition of what a list of values is. A list of values is linked to a filter. The filter can be in the context or not. Hum...

#Dataviz#Tableau#Performance

Data Visualizationfr8 mars 2022

8 March 2022: International Women's Rights Day

After a few weeks of studying open data on gender inequality, here is a visualization of EIGE's Gender Equality Index…

#Dataviz

Data Visualizationen8 mars 2022

Tableau Performance Tips #2 : Avoid total and sub-totals when using a count distinct metric aggregate

If you use a total (or worse, sub-totals) with one or more count distinct metrics, this will lead to a second (or several) sequential pass for each…

#Dataviz#Tableau#Performance

Performanceen3 mars 2022

Applying a row-wise function to a Pandas dataframe

More than 3 years ago, we posted a comparative study about Looping over Pandas data using a CPU.

#Datascience#Python#Performance

Data Visualizationen26 févr. 2022

Tableau Performance Tips #1 : Tableau Performance Recording

Introduction to Tableau Performance Recording. In this article, we will discover how to diagnose performance issues with Tableau performance recording.

#Dataviz#Tableau#Performance

Performanceen15 févr. 2022

A Parallel loop in Python with Joblib.Parallel

The goal of this post is to perform an embarrassingly parallel loop in Python, with the same code running on different platforms [Linux and Windows].

#Python#Performance

Technologyfr2 févr. 2022

DataOops: the French-language podcast on data and DevOps

Join Eric Duquesnoy for news and explanations about DevOps, Fabien Beaumont to sharpen your knowledge of modelling…

#Dataoops

SQL Serveren3 janv. 2022

150 T-SQL Bad Practices

Many times I have been asked to give best practices for T-SQL code, but I don't like best-practice advice.

#database#SQL#SQL Server

Data Visualizationen4 déc. 2021

Tableau Server performance impacted by version history depth of datasources and workbooks

After several tests on a real-world Tableau production environment (1000+ workbooks, 100+ shared datasources), we discovered that object version history has an…

#Dataviz#Performance#Tableau

Geospatial & Graphsen10 sept. 2021

Python Spatial Join with GeoPandas (and GEOS)

The purpose of this post is to perform an "efficient" spatial join in Python. What is a spatial join?

#Datascience#Python#Spatial

Machine Learning & AIen3 sept. 2021

Built-in Expectations in Great Expectations

Great Expectations is a Python tool for data testing, documentation, and profiling.

#Python#Data testing#MLOps

Data Visualizationen26 août 2021

COVID Deaths Worldwide Evolution Dashboard

This dashboard shows the key figures for COVID deaths worldwide and the evolution of the number of deaths.

#Business Intelligence#Dataviz#Santé

Data Visualizationfr7 août 2021

Tests, incidence, hospitalisations, réanimations et décès du au COVID en France par départements et par Régions

Filtering via the region and department maps showing the (weekly) incidence of COVID in France for the last full week (updated on…

#Business Intelligence#Dataviz#Santé

Data Visualizationfr29 juil. 2021

Incidence COVID en France par départements et par tranches d'âge dernière semaine

Using the data provided by SI-DEP, here is an analysis of COVID incidence in France by department and age group for the last week…

#Business Intelligence#Dataviz#Santé

Data Visualizationen22 juil. 2021

Goldbach's Comet with Numba and Datashader

This Python notebook is about computing and plotting the Goldbach function. It requires some basic mathematical knowledge, nothing fancy!

#Datascience#Python#Datashader

Pythonen2 juil. 2021

Le tour de france history web scraping

This file downloads raw data about every rider of every Tour de France (from 1903 up to 2020).

#Python#web scraping#Sport

Data Visualizationen9 juin 2021

Impact of food on CO2 production : Eat Better - Eat Local

A viz showing the impact of food on CO2 production: how many kg of CO2 are produced per kg of food.

#Business Intelligence#Dataviz#Santé

Data Visualizationfr31 mai 2021

COVID Hospitalisations, Réanimations et Décès en France par départements

To produce the visualization showing COVID hospitalizations, intensive care admissions, and deaths in France by department, I used a web data…

#Business Intelligence#Dataviz#Santé

Data Visualizationfr26 mai 2021

Part des personnes testées positives au COVID en France

After the analysis of COVID data by region and age group, I produced a new visualization, again based on the SI-DEP data on…

#Business Intelligence#Dataviz#Santé

Data Visualizationen4 mai 2021

Stack Overflow Trends

Starting from Brent Ozar's Stack Overflow database extract, I tried to build a dashboard that shows the evolution of tag trends over time and, if possible…

#Dataviz#Benchmark

Data Visualizationfr26 avr. 2021

Incidence COVID en France par départements et par tranches d'âge

After the analysis of COVID data by region and age group, I retrieved the latest data (updated once a week) and detailed…

#Business Intelligence#Santé#Dataviz

Data Visualizationfr15 avr. 2021

Incidence COVID en France par région et par tranche d'age

After the analysis of COVID data for France by department and age group started by François, I decided to take up the challenge and build a dashboard…

#Business Intelligence#Dataviz

Pythonen1 avr. 2021

Some Pre-commit git hooks for Python

Pre-commit hooks are a great way to automatically check and clean the code. They are executed when committing changes to git.

#Dev#Python

Data Visualizationen24 mars 2021

Quick data exploration with pandas, matplotlib and seaborn

In this JupyterLab Python notebook, we are going to look at the rate of coronavirus [COVID-19] cases in French departments [administrative divisions of…

#Datascience#Dataviz#Python

Data Visualizationen28 févr. 2021

Gartner analytics and Business Intelligence tools comparator

As promised, here is an article on the Gartner® analytics and Business Intelligence tools comparator, based on the capabilities and use cases of the tools.

#Business Intelligence#Dataviz#Benchmark

Performanceen16 févr. 2021

Optuna and XGBoost on a tabular dataset

The purpose of this Python notebook is to give a simple example of hyperparameter optimization [HPO] using Optuna and XGBoost.

#Datascience#Python#Performance

SQL Serverfr9 févr. 2021

SQL Server CLR Functions vs SQL 2019 Function Inlining

This tutorial shows all the steps. It also covers the new SQL Server 2019 feature, "function inlining", and how to enable it.

#SQL#Performance#SQL Server

Machine Learning & AIen29 janv. 2021

Saving a tf.keras model with data normalization

Training a DL model might take some time, and make use of some special hardware.

#Datascience#Python#Machine Learning

Geospatial & Graphsen20 janv. 2021

Benford's law and the population of french cities

In this Python notebook, we are going to look at Benford's law, which predicts the leading digit distribution, when dealing with some real-world…

#Datascience#Python#Spatial

Performanceen30 déc. 2020

Merge Sort with Cython and Numba

In this post, we present an implementation of the classic merge sort algorithm in Python on NumPy arrays, and make it run reasonably "fast" using Cython…

#Performance#Python#Sorting

Pythonen21 sept. 2020

Logistic regression with JAX

JAX is a Python package for automatic differentiation from Google Research. It is a really powerful and efficient library.

#Datascience#Python

Machine Learning & AIen28 août 2020

Minimizing continuous non-convex functions with Optuna

In this post, we are going to deal with single-objective continuous optimization problems, using the open-source Optuna Python package.

#Python#Optuna#Optimization

Pythonen27 août 2020

Lunch break, fetching AROME temperature forecast in Lyon

Since a "small" heat wave is coming, I would like to get some temperature forecast for the next hours in my neighborhood, from my JupyterLab notebook.

#Datascience#Python

Data Visualizationen27 août 2020

Lunch break, ridge plots with Bokeh

Bokeh is a great visualization Python library. In this short post, we are going to use it to create a ridge plot.

#Datascience#Python#Dataviz

Pythonen3 juil. 2020

Outlier and Change Point Detection in Data Integration Performance Metrics

Data integration involves combining data residing in different sources, and providing users with a unified view of them.

#Datascience#Python#Time Series

Machine Learning & AIen23 juin 2020

Some cool open-source Python packages for Machine Learning Ep 5

There is a very rich ecosystem of Python libraries related to ML.

#Datascience#Machine Learning#MLOps

Data Visualizationen1 juin 2020

Lunch break, plotting excess death in french department zones with Python

Daily deaths data are provided by INSEE - the national institute of statistics and economic studies.

#Datascience#Dataviz#Python

Pythonen22 mai 2020

A Quick study of air quality in Lyon with Python

The aim of this post is to use Python to fetch air quality data from a web service and to create a few plots.

#Datascience#Python

Pythonen12 avr. 2020

Fitting a logistic curve to time series in Python

In this notebook we are going to fit a logistic curve to time series stored in Pandas, using a simple linear regression from scikit-learn to find the…

#Datascience#Python#Time Series

Performanceen4 avr. 2020

Cython and Numba applied to a simple algorithm: Insertion sort

The aim of this notebook is to show a basic example of Cython and Numba, applied to a simple algorithm: Insertion sort.

#Performance#Python#Sorting

Data Visualizationen13 févr. 2020

Lunch break: plotting traffic injuries with datashader

Well I love the datashader Python package and I am always happy to use it on some interesting datasets.

#Dataviz#Python#Datashader

Data Visualizationen12 févr. 2020

Gartner MQBI (Magic Quadrant BI) History

We have compiled Gartner BI Magic Quadrant data since 2007 into several dashboards that show the evolution of the Gartner MQBI history with BI vendors…

#Business Intelligence#Dataviz

Machine Learning & AIen20 janv. 2020

Fetching AROME weather forecasts and plotting temperatures

Accurate weather forecasts might be very useful for various types of models.

#Datascience#Python#Machine Learning

Machine Learning & AIen8 janv. 2020

Some cool open-source Python packages for Machine Learning Ep 4

There is a very rich ecosystem of Python libraries related to ML.

#Datascience#Machine Learning#Python

Machine Learning & AIen23 oct. 2019

Some cool open-source Python packages for Machine Learning Ep 3

There is a very rich ecosystem of Python libraries related to ML.

#Machine Learning#Datascience#Python

Machine Learning & AIen9 août 2019

First try of auto-sklearn

Since we are big users of scikit-learn and XGBoost, we wanted to try a package that would automate the process of building a machine learning model with…

#Datascience#Machine Learning#Python

Machine Learning & AIen8 août 2019

Some cool open-source Python packages for Machine Learning Ep 2

There is a very rich ecosystem of Python libraries related to ML.

#Machine Learning#Datascience#Python

Performanceen30 juil. 2019

Loading data into a Pandas DataFrame - a performance study

Because doing machine learning implies trying many options and algorithms with different parameters, from data cleaning to model validation, the Python…

#Datascience#Performance#Python

Machine Learning & AIen11 juil. 2019

Some cool open-source Python packages for Machine Learning Ep 1

There is a very rich ecosystem of Python libraries related to ML.

#Machine Learning#Datascience#Python

Performanceen12 juin 2019

GPU Analytics Ep 3, Apply a function to the rows of a dataframe

The goal of this post is to compare the execution time between Pandas (CPU) and RAPIDS (GPU) dataframes, when applying a simple mathematical function to…

#Performance#GPU#Python

Databaseen6 mai 2019

GPU Analytics Ep 2, Load some data from OmniSci into a GPU dataframe

Although the post title is about loading some data from a GPU database into a GPU dataframe, most of it is about running JupyterLab on a GPU AWS instance…

#SQL#Performance#GPU

Databaseen24 avr. 2019

GPU Analytics Ep 1, GPU installation of OmniSci on AWS

In this post, we are going to install the OmniSci 4.6 GPU database on an Ubuntu 18.04 AWS instance.

#GPU#SQL#Performance

Data Visualizationen19 sept. 2018

Nighttime Lights with Rasterio and Datashader

In this post, we are going to plot some satellite GeoTIFF data in Python.

#Dataviz#Python#Datashader

Data Visualizationen8 sept. 2018

Symmetric Chaos with Datashader and Numba

Map equation and coefficient values are taken from here. Some mathematical explanations can be found here, by Mike Field and Martin Golubitsky.

#Dataviz#Python#Datashader

Data Visualizationen29 août 2018

Plotting Hopalong attractor with Datashader and Numba

What is an attractor? Definition from Wikipedia: Most of the following code comes from James Bednar's notebook about 2D strange attractor plotting with…

#Dataviz#Python#Datashader

Performanceen3 juil. 2018

Looping over Pandas data

I recently stumbled on this interesting post on RealPython (excellent website by the way!): Fast, Flexible, Easy and Intuitive: How to Speed Up Your…

#Performance#Python#Pandas

Pythonen24 mai 2018

Pandas Time Series example with some historical land temperatures

The aim of this notebook is just to play with time series along with a couple of statistical and plotting libraries.

#Datascience#Python#Pandas

Technologyen18 mai 2018

Lyon DataVis and AI mini-conference

Yesterday I went to this mini-conference at ENS Lyon and enjoyed it very much.

#Datascience

Blog and News

All articles

Fast Geographic Data Extraction from SQL Server to Parquet with FastBCP

Cycle diurne de la température d'été à Lyon-Bron, par décennie

German-style strings in Apache Arrow

PerformanceStudio: Better Query Store Analysis for SQL Server

Database Engines Trends Dashboard

Running a private Qwen3.6 Coding Agent on Scaleway

Running a Local LLM on a Consumer GPU [8 GB VRAM]

Running a Local LLM Coding Agent on AWS

Summation of Floating-Point Numbers in Cython

My First Lean 4 Proof: the Irrationality of √2

Improve SQL Server Performance with sp_UpdateStats2 Statistics Maintenance

SQL Server Cumulative Updates Dashboard

An example ETL Pipeline with dlt + SQLMesh + DuckDB

Get your embeddings on SQL Server 2025 with AI_GENERATE_EMBEDDINGS and EXTERNAL MODEL using OLLAMA local and your GPU

Evolution des prix de l'immobilier en France - DVF à l'iris

sp_CompareTables : compare tables at column level in SQL Server

Performance Analysis of Parallel Data Replication Between Two PostgreSQL 18 Instances on OVH

PG_FASTBCP Postgres Extension

Data Loading from S3 Parquet Files to PostgreSQL on OVH Cloud

FastTransfer Performance with Citus Columnar Storage in PostgreSQL

High-Speed PostgreSQL Replication on OVH with FastTransfer

Le Tour de France History Dashboard

Parallel Data Movement Made Simple: FastTransfer & FastBCP Explained

A Git commit temporal analysis

SQL Server Features Comparison and Evolution

Oracle to PostgreSQL Data Transfer : pg_fasttransfer vs oracle_fdw

PostgreSQL Data Transfer: pg_fasttransfer vs postgres_fdw

SQL Server Schema Restore : structures and data

Computing and Visualizing Billions of Bohemian Eigenvalues with Python

Zipf's Law on "La Comédie humaine"

FastTransfer VS BCP

Powershell IOMeter Parser

Licence to sp_kill on SQL Server

Arnold tongues with Numba, Numba CUDA and Datashader

Spatial queries in DuckDB with R-tree and H3 indexing

In-memory cosine distance calculation, NumPy vs DuckDB

Querying Private s3 Buckets with DuckDB

Pizzerias Around the World

Visualizing the Ulam Spiral with Python

Analyzing tramontane wind frequency

Beyond LAST_VALUE IGNORE NULLS: Efficient Alternatives for Handling Missing Data in SQL Server

Using DuckDB with Tableau Desktop on Windows

LAST_VALUE performance issue on SQL Server

Finding the most concentrated areas of bakeries in Lyon

A Hybrid information retriever with DuckDB

Databases Dashboard V2

La protection des données : partie 1 - les données

Enrich a digital terrain model

Calculating walking isochrones with Python

Fully Amortized Loan simulation with Numba and IPyWidgets

Create a routable pedestrian network with elevation

Streaming data from PostgreSQL to a CSV file as fast as possible

Lyon's Digital Terrain Model with IGN Data

SQL Server AlwaysOn Feature Tutorial : Part 3 - install SQL Server on a domain VM

SQL Server AlwaysOn Feature Tutorial : Part 4 - Install and configure a Windows Failover Cluster

SQL Server AlwaysOn Feature Tutorial : Part 5 - Activate AlwaysOn on SQL Server

SQL Server AlwaysOn Feature Tutorial: Part 1 - Create a Windows Server 2022 Virtual Machine

A Transit graph for static macro assignment

Methane Gaz Emissions

Parquet file sorting test

Installing your Python package on a Windows machine that does not have internet access

How minimal is SQL Server Minimal Logging

The Database of Databases Dashboard

ROLLBACK DELETE vs TRUNCATE vs DROP in several RDBMS

Oracle Database Features Evolution

Calculating daily mean temperatures with scikit-learn

Oracle Database Editions Comparator

Vector similarity search with pgvector

Dashboard sur les entreprises de Genève en Suisse

Using a local sentence embedding model for similarity calculation

Python plot - Antarctic sea ice extent

Semantic Vector Search with SQL Server and the help of KMeans indexing

SQL Server vector search

Python plot - North Atlantic daily water surface temperature

PostgreSQL Features Matrix GPT-Optimized

Hyperpath routing in the context of transit assignment

TPC-H benchmark of DuckDB and Hyper on native files

TPC-H benchmark of Hyper and DuckDB on Windows and Linux OS