In this post, we explore how to extract and merge data from a french high-resolution Digital Terrain Model [DTM]. This DTM is provided by the IGN [National Institute of Geographic and Forest Information]. It gives a detailed grid-based depiction of the topography of the entire French territory on a large scale. For our purposes, we will be working with the 5-meter resolution option, although a 1-meter resolution is also available. It can be found on this web page : https://geoservices.ign.fr/rgealti.

In the following we merge all the small files to generate a comprehensive raster. While this facilitates batch processing, it's important to note that the memory usage might be substantial. Here are the steps followed:

  • HTML Parsing with BeautifulSoup
  • Data Organization with Pandas
  • Bounding Box Definition
  • Intersected "Department" Zones List
  • Data Download
  • Conversion to GeoTIFF Format
  • Raster Mosaic
  • Raster Bounding Box Cropping
  • Point Queries

System and package versions

OS and package versions:

Python version       : 3.11.6  
OS                   : Linux  
Machine              : x86_64  

contextily           : 1.4.0
geopandas            : 0.14.1
matplotlib           : 3.8.2
numpy                : 1.26.2
pandas               : 2.1.3
py7zr                : 0.20.6
pyproj               : 3.6.1
rasterio             : 1.3.9
requests             : 2.31.0
bs4                  : 4.12.2
osgeo                : 3.7.3
shapely              : 2.0.2
tqdm                 : 4.66.1

Python imports

import glob
import os
import re
import contextily as cx
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import py7zr
import pyproj
import rasterio as rio
import requests
from bs4 import BeautifulSoup
from osgeo import gdal, osr
from rasterio.enums import Resampling
from rasterio.mask import mask
from rasterio.merge import merge
from rasterio.plot import show
from shapely.geometry import Point, Polygon
from shapely.ops import transform
from tqdm import tqdm
DATA_DIR_PATH = "/home/francois/Data/RGE_ALTI/5M/"
IGN_URL = "https://geoservices.ign.fr/rgealti"
DEPS_SHP_FP = "/home/francois/Data/RGE_ALTI/departements-20140306-50m-shp/departements-20140306-50m.shp"
tif_dir_path = os.path.join(DATA_DIR_PATH, "tif_files")

Parse the HTML from the IGN web page using BeautifulSoup

We specifically target and extract download links related to the 5-meter resolution, filtering out any links associated with the 1-meter resolution.

# Retrieve HTML content from IGN web page
response = requests.get(IGN_URL)
# Extract download links using BeautifulSoup
download_links = None
if response.status_code == 200:
    html_content = response.text
    soup = BeautifulSoup(html_content, "html.parser")
    pattern = re.compile(
        r"https://wxs\.ign\.fr/[^/]+/telechargement/prepackage/RGEALTI-5M_PACK_[^$]+\$[^/]+/file/[^/]+\.7z"
    )
    links = soup.find_all("a", href=pattern)
    download_links = [link["href"] for link in links]
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
# Display the number of download links found
if download_links:
    link_count = len(download_links)
else:
    link_count = 0
print(f"Found {link_count} download link(s)")
Found 105 download link(s)

Each download link is linked to its respective French "department" code.

Organize Data with Pandas

# Create Pandas DataFrame with download links
df = pd.DataFrame(data={"download_link": download_links})
df["compressed_file_name"] = df["download_link"].map(os.path.basename)
# Extract components and assert data integrity
def extract_components(s):
    base_name = s.split(".7z")[0]
    keys = ["dataset", "version", "res", "filetype", "crs", "dep", "date"]
    d = dict(zip(keys, base_name.split("_")))
    return pd.Series(d)

df[["dataset", "version", "res", "filetype", "crs", "dep", "date"]] = df[
    "compressed_file_name"
].apply(extract_components)
assert np.array_equal(df["dataset"].unique(), np.array(["RGEALTI"]))
assert np.array_equal(df["res"].unique(), np.array(["5M"]))
assert np.array_equal(df["filetype"].unique(), np.array(["ASC"]))
# Extract department codes and refine DataFrame
def get_dep_code(s):
    dep_code = s.removeprefix("D")
    if dep_code[0] == "0":
        dep_code = dep_code.removeprefix("0")
    return dep_code

df["dep_code"] = df["dep"].map(get_dep_code)
df = df.drop(["dataset", "version", "res", "dep", "filetype", "date"], axis=1)
df.head(3)
download_linkcompressed_file_namecrsdep_code
0https://wxs.ign.fr/9u5z4x13jqu05fb3o9cux5e1/te...RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D001_2023-08-0...LAMB93-IGN6901
1https://wxs.ign.fr/9u5z4x13jqu05fb3o9cux5e1/te...RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D002_2020-09-0...LAMB93-IGN6902
2https://wxs.ign.fr/9u5z4x13jqu05fb3o9cux5e1/te...RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D003_2023-08-1...LAMB93-IGN6903

Note that the Coordinate Reference System [CRS] is always Lambert-93 or EPSG:2154 for this data when dealing with regions in European France.

Bounding box definition

The first step involves defining a bounding box that encapsulates the region of interest: Lyon, france.

# Define bounding box coordinates in EPSG:4326
bbox = (4.346466, 45.463020, 5.340729, 46.030342)
# Define points to create a polygon
point1 = (bbox[0], bbox[1])
point2 = (bbox[0], bbox[3])
point3 = (bbox[2], bbox[3])
point4 = (bbox[2], bbox[1])
# Create a polygon from the points
polygon = Polygon([point1, point2, point3, point4])
# Display the polygon
polygon

Bounding box

# display the polygon with map tiles
bbox_gdf = gpd.GeoSeries([polygon], crs="EPSG:4326")
fig, ax = plt.subplots(1, figsize=(12, 12))
ax = bbox_gdf.plot(alpha=0.3, ax=ax)
cx.add_basemap(ax, crs=bbox_gdf.crs.to_string(), source=cx.providers.CartoDB.Positron)
ax.set_axis_off()

Bounding box with map tiles

Intersected "Department" zones list

To determine which French departments intersect with our defined bounding box, we use a shapefile containing the polygons of French department zones expressed in EPSG:4326 coordinates. The shapefile was downloaded from this page, specifically the Export de mars 2014 - vérifié et simplifié à 50m

zones = gpd.read_file(DEPS_SHP_FP)[["code_insee", "geometry"]]
zones.head(2)
code_inseegeometry
001POLYGON ((5.25559 45.78459, 5.23987 45.77758, ...
102POLYGON ((3.48175 48.86640, 3.48647 48.85768, ...

We can now identify which department zones intersect with the previously defined bounding box.

intersected_zones = zones.loc[zones.intersects(polygon)]
intersected_zones
code_inseegeometry
001POLYGON ((5.25559 45.78459, 5.23987 45.77758, ...
3838POLYGON ((5.44781 45.07178, 5.44982 45.07078, ...
4242POLYGON ((4.02072 45.32827, 4.01866 45.32764, ...
6969POLYGON ((4.87189 45.52749, 4.86173 45.51689, ...

We visualize these intersected zones along with the bounding box.

ax = intersected_zones.boundary.plot(alpha=0.2, label="insee_code")
ax = bbox_gdf.plot(alpha=0.3, ax=ax)

Intersected zones

Finally, we obtain the list of French department codes that intersect with the bounding box.

zone_codes = intersected_zones.code_insee.values.tolist()
zone_codes
['01', '38', '42', '69']

IGN data download

Now that we have identified the French department codes intersecting with the bounding box, let's proceed with downloading and organizing the relevant DTM data.

# Create a list of download items for the intersected zones
download_items = []
for zone_code in zone_codes:
    download_items.append(
        df.loc[
            df.dep_code == zone_code,
            ["download_link", "compressed_file_name", "crs"],
        ].to_dict(orient="records")[0]
    )

Next, we iterate through the download items, downloading and uncompressing the relevant data.

# List to store paths of downloaded and uncompressed files
dem_dir_paths = []
# Iterate through download items
for download_item in download_items:
    crs = download_item["crs"]
    # Define file paths
    dem_file_url = download_item["download_link"]
    dem_file_name = download_item["compressed_file_name"]
    dem_file_path = os.path.join(DATA_DIR_PATH, dem_file_name)
    dem_dir_path = os.path.splitext(dem_file_path)[0]
    # Download file if not already downloaded
    is_downloaded = os.path.isfile(dem_file_path) | os.path.isdir(dem_dir_path)
    if not is_downloaded:
        print(f"downloading file {dem_file_name}")
        r = requests.get(dem_file_url, stream=True)
        with open(dem_file_path, "wb") as fd:
            for chunk in r.iter_content(chunk_size=512):
                fd.write(chunk)
    # Uncompress file if not already uncompressed
    dem_dir_paths.append(dem_dir_path)
    is_uncompressed = os.path.isdir(dem_dir_path)
    if not is_uncompressed:
        print(f"uncompressing file {dem_file_name}")
        with py7zr.SevenZipFile(dem_file_path, "r") as archive:
            archive.extractall(path=DATA_DIR_PATH)
        if os.path.exists(dem_file_path):
            os.remove(dem_file_path)

The dem_dir_paths list now contains the paths of the downloaded and uncompressed files:

dem_dir_paths
['/home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D001_2023-08-08',
 '/home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D038_2020-11-13',
 '/home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D042_2023-08-10',
 '/home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D069_2023-08-10']

These directories contain the DTM data for the specified French departments, and we can proceed with the data conversion.

Convert .asc files to .tif files

The next step is to convert the .asc files to .tif files for compatibility with various geospatial tools and libraries.

# Create a directory to store the tif files if it doesn't exist
if not os.path.exists(tif_dir_path):
    os.makedirs(tif_dir_path)
    print(f"tif files directory : {tif_dir_path}")

The following functions and class, extracted from Guillaume Attard's great blog post [1], are used to handle the conversion from .asc to .tif.

def get_header_asc(filepath):
    """Function to read the header of an asc file and return the data as a dictionary.
    """
    file = open(filepath)
    content = file.readlines()[:6]
    content = [item.split() for item in content]
    return dict(content)

class RGEitem:
    """Class to handle RGE items.
    """
    def __init__(self, filepath):
        self.filename = os.path.basename(filepath)
        self.dir = os.path.dirname(filepath)
        self.data = np.loadtxt(filepath, skiprows=6)
        self.header = get_header_asc(filepath)
        self.ncols = int(self.header["ncols"])
        self.nrows = int(self.header["nrows"])
        self.xllc = float(self.header["xllcorner"])
        self.yllc = float(self.header["yllcorner"])
        self.res = float(self.header["cellsize"])
        self.zmin = float(self.data.min())
        self.zmax = float(self.data.max())
        self.novalue = -99999.0

def asc_to_tif(file, output_rasterpath, epsg):
    """Function to transform an .asc file into a geoTIFF
    """
    xmin = file.xllc
    ymax = file.yllc + file.nrows * file.res
    geotransform = (xmin, file.res, 0, ymax, 0, -file.res)
    # Open the file
    output_raster = gdal.GetDriverByName("GTiff").Create(
        output_rasterpath, file.ncols, file.nrows, 1, gdal.GDT_Float32
    )
    # Specify the coordinates.
    output_raster.SetGeoTransform(geotransform)
    # Establish the coordinate encoding.
    srs = osr.SpatialReference()
    # Specify the projection.
    srs.ImportFromEPSG(epsg)
    # Export the coordinate system to the file.
    output_raster.SetProjection(srs.ExportToWkt())
    # Writes the array.
    output_raster.GetRasterBand(1).WriteArray(file.data)
    # Set nodata value.
    output_raster.GetRasterBand(1).SetNoDataValue(file.novalue)
    output_raster.FlushCache()

Now, we find all .asc files in the downloaded directories and convert them to .tif.

# Function to find all asc files in a directory
def find_asc_files(directory):
    directory = os.path.join(directory, "**/*.asc")
    asc_file_paths = glob.glob(directory, recursive=True)
    return asc_file_paths
# List to store paths of all converted tif files
all_tif_file_paths = []
# Iterate through the downloaded directories
for dem_dir_path in dem_dir_paths:
    print(f"top directory : {dem_dir_path}")
    asc_file_paths = find_asc_files(dem_dir_path)
    asc_file_count = len(asc_file_paths)
    print(f"{asc_file_count} asc files")
    for asc_file_path in tqdm(asc_file_paths):
        asc_file_name = os.path.basename(asc_file_path)
        asc_file_name, asc_file_extension = os.path.splitext(
            os.path.basename(asc_file_name)
        )
        tif_file_path = os.path.join(tif_dir_path, asc_file_name + ".tif")
        all_tif_file_paths.append(tif_file_path)
        if not os.path.isfile(tif_file_path):
            file = RGEitem(asc_file_path)
            asc_to_tif(file, tif_file_path, epsg=2154)
top directory : /home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D001_2023-08-08
291 asc files
top directory : /home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D038_2020-11-13
383 asc files
top directory : /home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D042_2023-08-10
248 asc files
top directory : /home/francois/Data/RGE_ALTI/5M/RGEALTI_2-0_5M_ASC_LAMB93-IGN69_D069_2023-08-10
174 asc files

Here is an example of one of these small .tif files:

fig, ax = plt.subplots(1, figsize=(6, 6))
with rio.open(tif_file_path) as img:
    show(img, cmap="terrain", ax=ax)
ax.set_axis_off()

A small tif file

The total number of converted .tif files is:

len(all_tif_file_paths)
1096

Raster Mosaic with Python

The next step is to create a raster mosaic from these individual tiles.

# Define the path for the output mosaic
output_mosaic_path = os.path.join(tif_dir_path, "lyon_mosaic.tif")
# List to store raster objects for mosaic
raster_to_mosaic = []
for p in all_tif_file_paths:
    raster = rio.open(p)
    raster_to_mosaic.append(raster)

We use the merge function from rasterio to create the mosaic:

mosaic, output = merge(raster_to_mosaic)

Update the metadata for the output mosaic:

output_meta = raster.meta.copy()
output_meta.update(
    {
        "driver": "GTiff",
        "height": mosaic.shape[1],
        "width": mosaic.shape[2],
        "transform": output,
    }
)

Write the mosaic to a new GeoTIFF file:

with rio.open(output_mosaic_path, "w", **output_meta) as m:
    m.write(mosaic)
del mosaic

Now, let's downsample this mosaic in order to save some memory space, and plot it:

upscale_factor = 0.125
with rio.open(output_mosaic_path) as dataset:
    # resample data to target shape
    data = dataset.read(
        out_shape=(
            dataset.count,
            int(dataset.height * upscale_factor),
            int(dataset.width * upscale_factor),
        ),
        resampling=Resampling.bilinear,
    )
    data = np.where(data == -99999.0, np.nan, data)
fig, ax = plt.subplots(1, figsize=(16, 16))
show(data, cmap="terrain", ax=ax)
ax.set_axis_off()

Mosaic tif file

Raster Bounding Box Cropping with Rasterio

Now that we have our mosaic raster and a defined bounding box in the Lambert 93 [LAM93] projected CRS, let's proceed with cropping the raster to the extent of our bounding box. Remember that the bounding box is expressed in EPSG:4326 while the raster is in EPSG:2154.

# Transform the bounding box polygon to the mosaic's CRS (Lambert 93)
wgs84 = pyproj.CRS("EPSG:4326")
lam93 = pyproj.CRS("EPSG:2154")
project = pyproj.Transformer.from_crs(wgs84, lam93, always_xy=True).transform
polygon_lam93 = transform(project, polygon)
polygon_lam93

Bounding box on projected CRS

# Open the mosaic raster
mosaic_raster = rio.open(output_mosaic_path)
# Check the CRS of the mosaic raster
mosaic_raster.crs
CRS.from_epsg(2154)

The CRS of the mosaic raster is confirmed to be EPSG:2154, which corresponds to Lambert 93. Now, let's use Rasterio to crop the mosaic raster to the extent of the transformed bounding box:

# Crop the mosaic raster to the bounding box
out_image, out_transform = mask(mosaic_raster, [polygon_lam93], crop=True, filled=False)
out_meta = mosaic_raster.meta
out_meta.update(
    {
        "driver": "GTiff",
        "height": out_image.shape[1],
        "width": out_image.shape[2],
        "transform": out_transform,
    }
)

We've successfully cropped the mosaic raster to the extent of the defined bounding box. The resulting cropped raster is stored in out_image, and the associated metadata is stored in out_meta.

Save and plot

Now that we have successfully cropped the mosaic raster to the extent of our bounding box, let's save the resulting raster and create a visual representation.

# Save the cropped raster to a new GeoTIFF file
with rio.open("lyon_dem.tif", "w", **out_meta) as dest:
    dest.write(out_image)

Now, let's open the saved raster for visualization and analysis:

dem = rio.open("lyon_dem.tif")
# Plot the cropped raster
fig, ax = plt.subplots(1, figsize=(16, 16))
_ = show(dem, cmap="terrain", ax=ax)
ax.set_axis_off()

Cropped mosaic

The resulting plot showcases the elevation data within the cropped region.

Point queries

In this section, we perform point queries on the cropped elevation data to extract elevation values at randomly generated coordinates within the region of interest.

rng = np.random.default_rng(421)
# Set the number of random coordinates you want
num_points = 1000
# Generate random x, y coordinates within the bounding box
random_x = rng.uniform(bbox[0], bbox[2], num_points)
random_y = rng.uniform(bbox[1], bbox[3], num_points)
point_coords = [transform(project, Point(x, y)) for x, y in zip(random_x, random_y)]
point_coords = [(p.x, p.y) for p in point_coords]
# Create a GeoDataFrame to store the random points
points = pd.DataFrame(point_coords, columns=["x_lam93", "y_lamb93"])
points["x_wgs84"] = random_x
points["y_wgs84"] = random_y
points_gdf = gpd.GeoDataFrame(
    points,
    geometry=gpd.points_from_xy(points.x_lam93, points.y_lamb93),
    crs="EPSG:2154",
)
# Plot the elevation map with sample points
fig, ax = plt.subplots(1, figsize=(10, 8))
figure = show(dem, cmap="terrain", ax=ax)
im = figure.get_images()[0]
clb = fig.colorbar(im, ax=ax)
clb.ax.set_title("Elevation (m)", fontsize=8)
ax = points_gdf.plot(ax=ax, c="k", markersize=3, alpha=0.4)
_ = plt.axis("off")
_ = ax.set(title="Elevation map of the region of Lyon, FR")

Sample points queries mosaic

Now, let's perform point queries to extract elevation values at these random points:

%%time
points["elevation"] = [x[0] for x in dem.sample(point_coords)]
points.loc[points["elevation"] < 0.0, "elevation"] = np.nan
points.head(3)
CPU times: user 15.8 ms, sys: 2.31 ms, total: 18.1 ms
Wall time: 17.2 ms
x_lam93y_lamb93x_wgs84y_wgs84elevation
0818322.3022106.540550e+064.52768045.954361297.980011
1820552.2954446.510972e+064.54897445.687670709.400024
2826049.4495856.494117e+064.61515845.534934353.480011

The resulting GeoDataFrame points now includes the random points' coordinates in both Lambert 93 [x_lam93, y_lamb93] and WGS84 [x_wgs84, y_wgs84] CRS, along with the corresponding elevation values.

print(points["elevation"].max(), points["elevation"].min())
839.79 150.5

References

[1] Guillaume Attard (2022) https://medium.com/@gui.attard/pre-processing-the-dem-of-france-rge-alti-5m-for-implementation-into-earth-engine-de9a0778e0d9