Benchmark
TPC-H benchmark of DuckDB and Hyper on native files
In this blog post, we examine the performance of two popular SQL engines for querying large files: Tableau Hyper (proprietary license) and DuckDB (MIT license). These engines have gained popularity due…
TPC-H benchmark of Hyper and DuckDB on Windows and Linux OS
Update Apr 12, 2023: It seems that Windows 11's poor performance may be due to conflicting BIOS/OS settings when dual-booting. We are investigating... Additionally, I have corrected the…
TPCH SF100 with Snowflake
Snowflake is a fantastic data warehouse and data lake SaaS solution! It is very easy to use, scale, and develop with, and continuously integrating data into its central storage is quite fun. But what about performance? Snowflake has a really good reputation…
TPC-H benchmark of Hyper, DuckDB and DataFusion on Parquet files
Update Apr 14, 2023: An issue has been opened on the DataFusion GitHub repository regarding its poor reported performance compared to DuckDB and Hyper in this specific case: #5942.…
TPCH SF10 : Tableau Hyper Engine vs DuckDB vs Snowflake vs BigQuery vs Databricks vs SingleStore
After a first try with TPCH SF10 using DuckDB on 2 different laptops, comparing Parquet storage vs native storage (see TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks), I would like to try the Hyper Engine used by Tableau…
TPCH SF10 using DuckDB vs SnowFlake, Bigquery, SingleStore and Databricks
I was very interested in 2 articles by Mimoune Djouallah (aka mim) that compare Snowflake, BigQuery, Databricks, SingleStore, Power BI Datamart and DuckDB using the TPCH benchmark with different scale factors (TPCH-SF10 then SF100). The TPCH benchmark is a…
Database Engines Trends Dashboard
DB-Engines.com is a website that references and ranks hundreds of database engines, from the most famous ones like Oracle, MySQL, SQL Server, PostgreSQL, MongoDB, Redis to some lesser-known ones like Hazelcast…
More Heapsort in Cython
This post/notebook is the follow-up to a recent one: Heapsort with Numba and Cython, where we implemented heapsort in Python/Numba and Cython and compared the execution time with NumPy heapsort. However, heapsort in NumPy…
Loading data from PostgreSQL to Pandas with ConnectorX
ConnectorX is a library, written in Rust, that enables fast and memory-efficient data loading from various databases to different dataframes. We refer to this interesting paper, in which the…
Export data as fast as possible : from HANA to CSV
What is the fastest method to export HANA data (table or query result) to CSV? I use a HANA 2.0 database. I want to export a table or a SQL query from the database to an external client as fast as possible, using a command line (I’m on…
Gartner Magic Quadrant BI 15 years History
Business Intelligence (BI) tools have been competing for years. The Gartner Magic Quadrant is one of the famous benchmarks that classify mainstream BI tools. We have retrieved old MQBI (Magic Quadrant Business Intelligence) benchmarks and compiled…
Applying a row-wise function to a Pandas dataframe
More than 3 years ago, we posted a comparative study about Looping over Pandas data using a CPU. Because a lot of things have evolved since 2018, this post is kind of an update. For example, the Pandas version tag was 0.23.3 at that time; it is now 1.4.0.…
Stack Overflow Trends
How I built a Stack Overflow trends dashboard. Starting from Brent Ozar's Stack Overflow database extract, I tried to build a dashboard that shows the evolution of tag trends over time and, if possible, compares the trends of several tags. I first…
Gartner analytics and Business Intelligence tools comparator
As promised, here is an article on the Gartner® analytics and Business Intelligence tools comparator based on the capabilities and use cases of the tools. There are interesting differences in the final scores compared to the Magic Quadrant which…
Loading data into a Pandas DataFrame - a performance study
Because doing machine learning implies trying many options and algorithms with different parameters, from data cleaning to model validation, Python programmers will often load a full dataset into a Pandas dataframe, without actually…
Looping over Pandas data
I recently stumbled on this interesting post on RealPython (excellent website by the way!): Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects. This post covers different subjects related to Pandas: creating a datetime column…