Updated April 20, 2022 following useful comments by @scoder. Thank you very much for your pull requests!

Heapsort is a classical sorting algorithm. We will go over a little bit of theory about the algorithm, but refer to Cormen et al. [1] for more details, or to the heapsort Wikipedia page.

In this post, we are going to implement the classical heapsort in Python, Python/Numba and Cython. The regular implementation is array-based and performed in-place. We use 0-based indices. Note that heapsort is not a stable sorting method [a stable sort keeps items with the same key in their original order].

Imports

from itertools import cycle

from binarytree import build
import cython
import matplotlib.pyplot as plt
from numba import njit
import numpy as np
import perfplot
%load_ext cython

SD = 124  # random seed
rng = np.random.default_rng(SD)  # random number generator

Language/package versions:

Python implementation: CPython
Python version       : 3.9.12
IPython version      : 8.2.0
binarytree           : 6.5.1
matplotlib           : 3.5.1
perfplot             : 0.10.2
numpy                : 1.21.5
cython               : 0.29.28
numba                : 0.55.1

Float array creation

We assume that we want to sort a NumPy array of 64-bit floating-point numbers:

n = 10
A = rng.random(n, dtype=np.float64)
A
array([0.78525311, 0.78585936, 0.96913602, 0.74805977, 0.65555081,
       0.93888454, 0.17861445, 0.58864721, 0.44279917, 0.34884712])

Binary trees

Heapsort is based on the binary heap data structure, which relies on a nearly complete binary tree:

a [nearly] complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.

In a binary heap, the elements of an array are mapped to the nodes of a virtual binary tree. Assuming that we know the maximum number of elements in the heap, the array representation of the tree is more convenient than a linked structure. Note that it is also possible to use an array that can grow or shrink dynamically when threshold sizes are reached. Here is a figure showing the array elements and the corresponding tree nodes:

binary tree

So we can see the array as a representation of the tree (implicit data structure): the node at the top [A[0]] of the tree is called the root, and the ones at the bottom, without children, are called the leaves [A[5], A[6], A[7], A[8], A[9]]. An advantage of binary trees is that it is easy to navigate among the nodes. Tree node indexing goes from the root to the leaves and from left to right. Level $k \geq 0$ goes from node $2^k - 1$ to node $2 (2^k - 1)$. Within a level, you can go left by decrementing, and right by incrementing the node index by one. A node may have two children: the left and right ones. Given a node index i, the parent node index can easily be found as (i - 1) // 2:

def parent(i):
    assert i > 0
    return (i - 1) // 2
parent(1)
0
parent(8)
3

The left child has index 2 * i + 1 and the right child 2 * (i + 1):

def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * (i + 1)
left_child(0)
1
right_child(0)
2
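
The level formula given above can be turned into two small helpers [a quick sketch; these names are not used in the rest of the post]:

def level_start(k):
    # first node index of level k
    return 2**k - 1


def level_end(k):
    # last node index of level k, when the level is completely filled
    return 2 * (2**k - 1)
level_start(2), level_end(2)
(3, 6)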

The depth of a node is the number of edges from the root node to the target node. The height $h$ of a binary tree is the number of edges from the root to the most distant leaf node [in the above example, the height is equal to 3]. We can bound the number of elements $n$ of the nearly complete tree using its height:

$$2^h = \sum_{k=0}^{h-1} 2^k +1 \leq n \leq \sum_{k=0}^h 2^k = 2^{h+1}-1$$

Taking the base-2 logarithm of $2^h \leq n < 2^{h+1}$ gives $h \leq \log_2 n < h + 1$, so that $h = \lfloor \log_2 n \rfloor$.
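
As a quick sanity check on the example above, where n = 10 and the height is 3:

int(np.log2(n))
3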

Binary heap

Here is the definition of a heap [2]:

A heap is a set of nodes with keys arranged in a complete heap-ordered binary tree, represented as an array.

A binary heap is a binary tree data structure that satisfies a heap property. There are two kinds of heaps: min-heaps and max-heaps, satisfying respectively a min-heap or a max-heap property. In our case, we are going to use a max-heap to easily implement the heapsort algorithm. A min-heap could be used to sort the array in descending order; using it to sort in ascending order would imply extra space or work.

max-heap property: the value of a given node i is not larger than the value of its parent node parent(i)

$$A[parent(i)] \geq A[i]$$

Thus the value of the root node A[0] of a max-heap is greater than or equal to all the values in the tree. This is also true for any sub-tree of the binary heap: the value of a sub-tree's root node A[i] is greater than or equal to all the values in this sub-tree.

An important point is that not all the array elements might be in the heap. We differentiate the heap size from the array length $n$. We have $0 \leq size \leq n$. The elements in the heap are the size leftmost elements of the array: A[0:size] [using Python slicing notation]. All remaining elements A[size:n] are implicitly not in the heap.
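
For instance, with a hypothetical heap size of 6, only A[0:6] would belong to the heap [the snippet below is purely illustrative and does not modify A]:

size = 6                 # hypothetical heap size, smaller than n = 10
in_heap = A[:size]       # elements in the heap
out_of_heap = A[size:]   # elements not in the heap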

Initially, given the above array A, it most likely does not satisfy the max-heap property, which would imply in our case: A[0] >= A[1], A[0] >= A[2], A[1] >= A[3], A[1] >= A[4], A[3] >= A[7], A[3] >= A[8], A[2] >= A[5], A[2] >= A[6]. Indeed we have:

A[0] >= A[1]
False
A[0] >= A[2]
False
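
To check the max-heap property over the whole array at once, we can use a small helper [not part of the reference algorithm, just for illustration]:

def is_max_heap(A, size):
    # every non-root node in the heap must not be larger than its parent
    return all(A[parent(i)] >= A[i] for i in range(1, size))
is_max_heap(A, len(A))
False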

We need to build the max-heap from an array by moving the elements around. In order to do that, we apply a function called max_heapify to the nodes from the leaves to the root of the tree, in a bottom-up manner. So let's implement this function.

Max_heapify

The max_heapify(A, size, i) function presupposes that the sub-trees rooted at the child nodes of i already satisfy the max-heap property. But the i-th node may violate it, i.e. A[i] < A[left_child(i)] or A[i] < A[right_child(i)], assuming that the children are in the heap. If this is the case, the i-th node is swapped with its child of largest value:

A[i], A[largest] = A[largest], A[i]

This process is then repeated from the largest child node. Eventually, the sub-tree rooted at i satisfies the max-heap property after a call to max_heapify(A, size, i).

This process is also referred to as sift down: move a value violating the max-heap property down the tree by exchanging it with the largest of its two child values. It would also be possible to use the exact opposite process: sift up, which starts from a leaf node and moves its value up the tree by exchanging it with the parent node value if it violates the max-heap property. However, building a max-heap with the sift-up process has a larger complexity than with sift down. I found this explanation by @alestanis on Stack Overflow really clear (the question was Why siftDown is better than siftUp in heapify?):

Sift up
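
For completeness, a sift-up step could look like the following sketch [not used in the rest of the post, shown only to illustrate the alternative]:

def sift_up(A, i):
    # move the value A[i] up the tree while it is larger than its parent's value
    while i > 0 and A[parent(i)] < A[i]:
        A[parent(i)], A[i] = A[i], A[parent(i)]
        i = parent(i)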

Back to sift down: here is a Python implementation of max_heapify:

def max_heapify(A, size, node_idx=0):

    largest = node_idx
    l = left_child(largest)
    r = right_child(largest)

    if (l < size) and (A[l] > A[largest]):
        largest = l

    if (r < size) and (A[r] > A[largest]):
        largest = r

    if largest != node_idx:
        A[node_idx], A[largest] = A[largest], A[node_idx]  # exchange 2 nodes
        max_heapify(A, size, largest)

Note that this function is recursive. Let's have a look at the tree values of our example:

tree_values = build(list(A.round(5)))
print(tree_values)
                            ___________________0.78525___________
                           /                                     \
             __________0.78586___________                   ___0.96914___
            /                            \                 /             \
     ___0.74806__                   ___0.65555         0.93888         0.17861
    /            \                 /
0.58865         0.4428         0.34885

This array is random, but we can observe that the max-heap property happens to be satisfied everywhere except at the root! The sub-trees rooted at nodes 1 and 2 already satisfy the max-heap property. So let's call the max_heapify function on the root node.

max_heapify(A, len(A), node_idx=0)
tree_values = build(list(A.round(5)))
print(tree_values)
                            ___________________0.96914___________
                           /                                     \
             __________0.78586___________                   ___0.93888___
            /                            \                 /             \
     ___0.74806__                   ___0.65555         0.78525         0.17861
    /            \                 /
0.58865         0.4428         0.34885

The root value A[0] = 0.78525 moved from node index 0 to 2, and then from 2 to 5. The values of nodes 2 and 5 each moved one step up in the process.

Steps:

  • node 0 is compared with nodes 1 and 2. Node 0 is smaller than one of its children [0.78525 < 0.96914]. Node 2 is the child with the largest value: exchange nodes 0 and 2.
  • node 2 is compared with nodes 5 and 6. Node 2 is smaller than one of its children [0.78525 < 0.93888]. Node 5 is the child with the largest value: exchange nodes 2 and 5.
  • Node 5 is a leaf, so stop.

Let's generate a new random array to see another example:

A = rng.random(n, dtype=np.float64)
A
array([0.3309295 , 0.15936868, 0.98946349, 0.25711078, 0.71576487,
       0.50588512, 0.66411132, 0.70234247, 0.05208023, 0.06009649])
tree_values = build(list(A.round(5)))
print(tree_values)
                             __________________0.33093___________
                            /                                    \
             ___________0.15937__________                   ___0.98946___
            /                            \                 /             \
     ___0.25711___                  ___0.71576         0.50589         0.66411
    /             \                /
0.70234         0.05208         0.0601

If we start from the leaves toward the root, we can notice a max-heap violation at node 3. So we call max_heapify on that node:

max_heapify(A, len(A), node_idx=3)
tree_values = build(list(A.round(5)))
print(tree_values)
                             __________________0.33093___________
                            /                                    \
             ___________0.15937__________                   ___0.98946___
            /                            \                 /             \
     ___0.70234___                  ___0.71576         0.50589         0.66411
    /             \                /
0.25711         0.05208         0.0601

Now let's take care of node 1:

max_heapify(A, len(A), node_idx=1)
tree_values = build(list(A.round(5)))
print(tree_values)
                             __________________0.33093___________
                            /                                    \
             ___________0.71576__________                   ___0.98946___
            /                            \                 /             \
     ___0.70234___                  ___0.15937         0.50589         0.66411
    /             \                /
0.25711         0.05208         0.0601

And finally we need to fix the root value:

max_heapify(A, len(A), node_idx=0)
tree_values = build(list(A.round(5)))
print(tree_values)
                             __________________0.98946___________
                            /                                    \
             ___________0.71576__________                   ___0.66411___
            /                            \                 /             \
     ___0.70234___                  ___0.15937         0.50589         0.33093
    /             \                /
0.25711         0.05208         0.0601

We actually built a max-heap manually. In the following, let's look at the general method to build it.

Build_max_heap

The method used to build the max-heap with a sift-down-based heapify is called Floyd's heap construction:

Floyd's algorithm starts with the leaves, observing that they are trivial but valid heaps by themselves, and then adds parents. Starting with element n/2 and working backwards, each internal node is made the root of a valid heap by sifting down. The last step is sifting down the first element, after which the entire array obeys the heap property.

This bottom-up heap construction technique runs in $O(n)$ time. Here is a Python implementation:

def build_max_heap(A):
    size = len(A)
    node_idx = size // 2 - 1  # last non-leaf node index
    for i in range(node_idx, -1, -1):
        max_heapify(A, size, node_idx=i)
    return size
A = rng.random(n, dtype=np.float64)
A
array([0.94535273, 0.25035043, 0.40579395, 0.27596342, 0.30065296,
       0.36667218, 0.14878984, 0.34834079, 0.59713033, 0.99416357])
tree_values = build(list(A.round(5)))
print(tree_values)
                             ___________________0.94535___________
                            /                                     \
             ___________0.25035___________                   ___0.40579___
            /                             \                 /             \
     ___0.27596___                   ___0.30065         0.36667         0.14879
    /             \                 /
0.34834         0.59713         0.99416
_ = build_max_heap(A)
tree_values = build(list(A.round(5)))
print(tree_values)
                             ___________________0.99416___________
                            /                                     \
             ___________0.94535___________                   ___0.40579___
            /                             \                 /             \
     ___0.59713___                   ___0.30065         0.36667         0.14879
    /             \                 /
0.34834         0.27596         0.25035
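
We can also check the result with the is_max_heap helper introduced earlier [again, purely for illustration]:

is_max_heap(A, len(A))
True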

Heapsort

The classical heapsort has two steps:

1 - build the heap
2 - destroy the heap by removing the root from the heap and moving it to the end of the heap, $n - 1$ times.

Step 2 corresponds to the following iterations:

  • swap the root [largest element] with the last leaf
  • remove the last leaf from the heap.
  • heapify the root on the reduced heap.

The average and worst-case performance are $O(n\log n)$. Here is a Python implementation:

def heapsort(A_in):

    A = np.copy(A_in)

    # build a max heap
    size = build_max_heap(A)

    for i in range(size - 1, 0, -1):

        # swap the root (largest element) with the last leaf
        A[i], A[0] = A[0], A[i]

        # removing largest element from the heap
        size -= 1

        # call max_heapify from the root on the heap with the remaining elements
        max_heapify(A, size, node_idx=0)

    return A
n = 10
A = rng.random(n, dtype=np.float64)
A
array([0.18525118, 0.99313433, 0.78561885, 0.44814329, 0.85044505,
       0.86088208, 0.96716993, 0.17096352, 0.69956773, 0.8288503 ])
A_sorted_python = heapsort(A)
A_sorted_python
array([0.17096352, 0.18525118, 0.44814329, 0.69956773, 0.78561885,
       0.8288503 , 0.85044505, 0.86088208, 0.96716993, 0.99313433])
A_ref = np.sort(A)
np.testing.assert_array_equal(A_sorted_python, A_ref)

Let's use Numba to make this Python code run fast.

Numba

@njit
def max_heapify_numba(A, size, node_idx=0):

    largest = node_idx
    left_child = 2 * node_idx + 1
    right_child = 2 * (node_idx + 1)

    if (left_child < size) and (A[left_child] > A[largest]):
        largest = left_child

    if (right_child < size) and (A[right_child] > A[largest]):
        largest = right_child

    if largest != node_idx:
        A[node_idx], A[largest] = A[largest], A[node_idx]  # exchange 2 nodes
        max_heapify_numba(A, size, largest)

@njit
def heapsort_numba(A_in):

    A = np.copy(A_in)

    # build a max heap
    size = len(A)
    node_idx = size // 2 - 1  # last non-leaf node index
    for i in range(node_idx, -1, -1):
        max_heapify_numba(A, size, node_idx=i)

    for i in range(size - 1, 0, -1):
        # swap the root (largest element) with the last leaf
        A[i], A[0] = A[0], A[i]

        # removing largest element from the heap
        size -= 1

        # call max_heapify_numba from the root on the heap with the remaining elements
        max_heapify_numba(A, size)
    return A
A_sorted_numba = heapsort_numba(A)
A_sorted_numba
array([0.17096352, 0.18525118, 0.44814329, 0.69956773, 0.78561885,
       0.8288503 , 0.85044505, 0.86088208, 0.96716993, 0.99313433])
np.testing.assert_array_equal(A_sorted_numba, A_ref)

Cython

%%cython --compile-args=-Ofast

# cython: boundscheck=False, initializedcheck=False, wraparound=False

import cython
import numpy as np

from cython import ssize_t, double


@cython.exceptval(check=False)
@cython.nogil
@cython.cfunc
def _max_heapify(A: double[::1], size: ssize_t, node_idx: ssize_t) -> cython.void:
    largest: ssize_t = node_idx

    left_child = 2 * node_idx + 1
    right_child = 2 * (node_idx + 1)

    if left_child < size and A[left_child] > A[largest]:
        largest = left_child

    if right_child < size and A[right_child] > A[largest]:
        largest = right_child

    if largest != node_idx:
        A[node_idx], A[largest] = A[largest], A[node_idx]
        _max_heapify(A, size, largest)


@cython.exceptval(check=False)
@cython.nogil
@cython.cfunc
def _heapsort(A: double[::1]) -> cython.void:
    i: cython.int
    size: ssize_t = len(A)
    node_idx: ssize_t = size // 2 - 1

    for i in range(node_idx, -1, -1):
        _max_heapify(A, size, i)

    for i in range(size - 1, 0, -1):
        A[i], A[0] = A[0], A[i]
        size -= 1
        _max_heapify(A, size, 0)


@cython.ccall
def heapsort_cython(A_in):
    A = np.copy(A_in)
    _heapsort(A)
    return A
A_sorted_cython = heapsort_cython(A)
A_sorted_cython
array([0.17096352, 0.18525118, 0.44814329, 0.69956773, 0.78561885,
       0.8288503 , 0.85044505, 0.86088208, 0.96716993, 0.99313433])
np.testing.assert_array_equal(A_sorted_cython, A_ref)

Performance comparison

We do not include the pure Python version, and compare the Numba and Cython versions with the NumPy heapsort implementation. It seems that this NumPy heapsort is written in C++ and is fairly well optimized.

out = perfplot.bench(
    setup=lambda n: rng.random(n, dtype=np.float64),
    kernels=[
        lambda A: heapsort_numba(A),
        lambda A: heapsort_cython(A),
        lambda A: np.sort(A, kind="heapsort"),
    ],
    labels=["Numba", "Cython", "NumPy"],
    n_range=[10**k for k in range(1, 9)],
)
out
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ n          Numba                Cython                 NumPy                  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10        │ 1.266e-06           │ 1.819e-06             │ 1.7500000000000002e-06 │
│ 100       │ 5.587e-06           │ 5.004000000000001e-06 │ 3.0430000000000003e-06 │
│ 1000      │ 0.000116266         │ 6.1561e-05            │ 6.2518e-05             │
│ 10000     │ 0.001819482         │ 0.0008792880000000001 │ 0.000939073            │
│ 100000    │ 0.028507956         │ 0.014659915           │ 0.013501216000000002   │
│ 1000000   │ 0.42401333100000005 │ 0.26564063200000004   │ 0.224838018            │
│ 10000000  │ 5.938197069         │ 5.285706523           │ 3.4547337810000003     │
│ 100000000 │ 78.544810064        │ 84.04674386900001     │ 49.002946636000004     │
└───────────┴─────────────────────┴───────────────────────┴────────────────────────┘
figsize = (14, 14)

labels = out.labels
ms = 10
fig = plt.figure(figsize=figsize)
ax = fig.add_subplot(1, 1, 1)
plt.loglog(
    out.n_range,
    out.n_range * np.log(out.n_range) * 5.0e-8,
    "o-",
    label=r"$c \; n \; \ln(n)$",
)
for i, label in enumerate(labels):
    plt.loglog(out.n_range, out.timings_s[i], "o-", ms=ms, label=label)
markers = cycle(("o", "v", "^", "<", ">", "s", "p", "P", "*", "h", "X", "D", "."))
for i, line in enumerate(ax.get_lines()):
    marker = next(markers)
    line.set_marker(marker)
plt.legend()
plt.grid(True)
_ = ax.set(
    title="Timing comparison between Numba, Cython and NumPy heapsort",
    xlabel="Array length (log scale)",
    ylabel="Elapsed time [s] (log scale)",
)

Timings

Conclusion

Going from Python to Numba is seamless and allows us to reach a level of efficiency similar to Cython's [which turned 20 years old last week, happy birthday!!]. We can observe that the NumPy heapsort implementation is faster on larger arrays, but we do not know which optimizations it implements.

References

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition [3rd. ed.]. The MIT Press.

[2] Robert Sedgewick. 2002. Algorithms in C [3rd. ed.]. Addison-Wesley Longman Publishing Co., Inc., USA.