Step-by-Step PDAL Python Workflow for LiDAR Data Processing

Anvita Shrivastava
11 hours ago

Light Detection and Ranging (LiDAR) has become a core technology for geospatial analysis, modeling, and data collection in urban planning, forestry, and infrastructure development. As LiDAR datasets grow in volume and complexity, well-organized workflows for extracting meaningful information from them matter more than ever.

The Point Data Abstraction Library (PDAL) is an open-source library for manipulating point cloud data. Coupled with Python, it lets developers, GIS professionals, and data scientists automate LiDAR processing tasks, create scalable workflows, and build point cloud analysis into larger geospatial applications.

In this article, we present a step-by-step workflow for LiDAR data processing using PDAL and Python. We cover how to install PDAL, inspect the contents of a LiDAR dataset, filter it, classify its points, and export the filtered and classified result to other formats.


What Is PDAL?

PDAL is an open-source toolset used to manipulate, translate, filter, and analyze point cloud data. It accommodates multiple data formats:

LAS
LAZ
E57
GeoTIFF
BPF
PLY

PDAL features a pipeline architecture allowing users to define their own sequence of operations when working with LiDAR data.

Benefits of Using PDAL

Actively developed and supported
Handles very large LiDAR datasets
Integrates with Python
Offers a comprehensive range of filters and classifiers
Fits into GIS and remote sensing workflows

Prerequisites

Before starting, ensure the following are installed:

Install PDAL

Using Conda:

conda install -c conda-forge pdal python-pdal

Verify installation:

python -c "import pdal; print(pdal.__version__)"

Required Python Libraries

pip install numpy pandas matplotlib

Step 1: Load a LiDAR Dataset

The first step is reading a LAS or LAZ file using PDAL.

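import json

import pdal

# A bare filename lets PDAL infer the reader from the file extension
pipeline_json = {
    "pipeline": [
        "sample.las"
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_json))
pipeline.execute()  # runs every stage and returns the number of points

# PDAL returns a list of structured NumPy arrays; a plain read yields one
arrays = pipeline.arrays
point_cloud = arrays[0]

print(point_cloud.dtype)  # available dimensions and their types
print(len(point_cloud))   # number of points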

What This Does

Loads the LiDAR file
Executes the pipeline
Stores point cloud data in a NumPy array.
Displays available dimensions and point count
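
Because the result is a NumPy structured array, ordinary NumPy operations apply to it. The sketch below uses a small synthetic array as a stand-in for point_cloud, so it runs without PDAL or a LAS file; the values and the summarize helper are illustrative assumptions, not part of PDAL.

```python
import numpy as np

# Synthetic stand-in for pipeline.arrays[0]; a real LAS file has more dimensions.
point_cloud = np.array(
    [(10.0, 20.0, 95.5), (11.0, 21.0, 101.2), (12.0, 22.0, 130.0)],
    dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")],
)


def summarize(points):
    """Return the point count and the (min, max) of each coordinate."""
    summary = {"count": len(points)}
    for dim in ("X", "Y", "Z"):
        summary[dim] = (float(points[dim].min()), float(points[dim].max()))
    return summary


print(summarize(point_cloud))
```

A summary like this is a quick way to confirm that the extent and elevation range look plausible before building longer pipelines.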

Step 2: Inspect Point Cloud Attributes

LiDAR datasets contain multiple attributes such as:

X, Y, Z coordinates
Intensity
Return number
Classification
Scan angle

View available fields:

print(point_cloud.dtype.names)

Example output (abbreviated; real files typically list more dimensions):

('X', 'Y', 'Z', 'Intensity', 'Classification')

Understanding these attributes helps determine the appropriate processing strategy.
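
One quick way to get a feel for a dataset is to count points per classification code. The following sketch does this with NumPy on a synthetic Classification array; the class_counts helper is hypothetical, and the name table covers only a few of the standard ASPRS LAS codes.

```python
import numpy as np

# A few standard ASPRS LAS classification codes; extend as needed.
CLASS_NAMES = {
    1: "Unclassified",
    2: "Ground",
    3: "Low Vegetation",
    4: "Medium Vegetation",
    5: "High Vegetation",
    6: "Building",
    7: "Low Point (Noise)",
}


def class_counts(classification):
    """Return a mapping of class name to number of points."""
    codes, counts = np.unique(classification, return_counts=True)
    return {
        CLASS_NAMES.get(int(code), f"Class {int(code)}"): int(count)
        for code, count in zip(codes, counts)
    }


# Synthetic stand-in for point_cloud['Classification']
classification = np.array([2, 2, 5, 1, 5, 5, 6], dtype=np.uint8)
print(class_counts(classification))
```

If nearly every point comes back as Unclassified, the file probably needs ground and vegetation classification before class-based filters are useful.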

Step 3: Create a Basic PDAL Pipeline

PDAL workflows are typically built using JSON pipelines.

Example:

pipeline_json = {
    "pipeline": [
        {
            "type": "readers.las",
            "filename": "sample.las"
        }
    ]
}

This simple pipeline reads a LAS file into memory.
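
Since a pipeline is just a list of stages, it can be assembled programmatically. The helper below is a minimal sketch using only the standard library; build_pipeline is a hypothetical name, and its check is deliberately stricter than PDAL itself, which can infer a stage type from a filename.

```python
import json


def build_pipeline(*stages):
    """Validate the stages and return a pipeline definition as a JSON string."""
    for stage in stages:
        # A bare string is treated as a filename; a dict must name its stage type.
        if isinstance(stage, str):
            continue
        if not (isinstance(stage, dict) and "type" in stage):
            raise ValueError(f"Invalid stage: {stage!r}")
    return json.dumps({"pipeline": list(stages)}, indent=2)


pipeline_str = build_pipeline(
    {"type": "readers.las", "filename": "sample.las"},
    {"type": "filters.range", "limits": "Z[100:]"},
)
print(pipeline_str)
```

The resulting string can be passed to pdal.Pipeline in the same way as the json.dumps output used throughout this article.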

Step 4: Filter LiDAR Data by Elevation

Filtering allows you to isolate specific points based on conditions.

Example: Keep points with Z at or above 100 (meters, if the dataset's vertical unit is meters).

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.range",
            "limits": "Z[100:]"
        }
    ]
}

Execute:

pipeline = pdal.Pipeline(json.dumps(pipeline_json))
pipeline.execute()
filtered_points = pipeline.arrays[0]
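
To make the range semantics concrete, here is the equivalent selection written directly in NumPy on a synthetic array. It is only a cross-check of what Z[100:] keeps, not a replacement for the PDAL filter; the sample values are made up.

```python
import numpy as np

points = np.array(
    [(1.0, 1.0, 99.9), (2.0, 2.0, 100.0), (3.0, 3.0, 150.0)],
    dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")],
)

# Square brackets in a PDAL range are inclusive, so Z[100:] keeps Z >= 100.
kept = points[points["Z"] >= 100.0]
print(len(kept), kept["Z"].tolist())
```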

Use Cases

Terrain analysis
Building extraction
Vegetation studies

Step 5: Remove Noise from Point Clouds

LiDAR datasets often contain outlier points caused by sensor errors or atmospheric interference.

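PDAL provides statistical outlier filtering. Note that filters.outlier only flags outliers as noise (Classification 7); the filters.range stage that follows is what actually removes them:

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.outlier",
            "method": "statistical",
            "mean_k": 8,
            "multiplier": 2.0
        },
        {
            "type": "filters.range",
            "limits": "Classification![7:7]"
        }
    ]
}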

Benefits include:

Improved surface models
Cleaner visualizations
More accurate classifications
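
The idea behind the statistical method can be sketched in a few lines of NumPy: compute each point's mean distance to its mean_k nearest neighbors, then flag points whose value exceeds the global mean plus multiplier times the standard deviation. This is a brute-force illustration for small arrays, not PDAL's implementation; the function name and the synthetic cloud are assumptions.

```python
import numpy as np


def statistical_outlier_mask(xyz, mean_k=8, multiplier=2.0):
    """Return a boolean mask that is True for points flagged as outliers."""
    # Pairwise Euclidean distances (O(n^2) memory, so small inputs only).
    diff = xyz[:, None, :] - xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # After sorting each row, column 0 is the distance to the point itself.
    nearest = np.sort(dist, axis=1)[:, 1:mean_k + 1]
    mean_dist = nearest.mean(axis=1)
    # Global threshold: mean + multiplier * standard deviation.
    threshold = mean_dist.mean() + multiplier * mean_dist.std()
    return mean_dist > threshold


rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 10.0, size=(200, 3))
# Append one point far away from the rest of the cloud.
cloud = np.vstack([cloud, [[500.0, 500.0, 500.0]]])

mask = statistical_outlier_mask(cloud)
print("far point flagged:", bool(mask[-1]))
```

A smaller multiplier flags more points, so it is worth trying a few values on a sample tile before settling on one.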

Step 6: Ground Classification

Ground classification separates terrain points from vegetation and structures. The SMRF filter (filters.smrf) used here labels ground points as Classification 2.

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.smrf",
            "scalar": 1.2,
            "slope": 0.2,
            "threshold": 0.45,
            "window": 16.0
        }
    ]
}

Why Ground Classification Matters

Ground points are essential for:

Digital Terrain Models (DTMs)
Flood modeling
Slope analysis
Contour generation

Step 7: Generate a Digital Terrain Model (DTM)

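After identifying ground points, create a rasterized terrain model. The filters.range stage keeps only ground points (Classification 2); without it, vegetation and buildings would also be written to the raster.

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.smrf"
        },
        {
            "type": "filters.range",
            "limits": "Classification[2:2]"
        },
        {
            "type": "writers.gdal",
            "filename": "dtm.tif",
            "resolution": 1.0,
            "output_type": "min"
        }
    ]
}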

Output:

dtm.tif

The generated GeoTIFF can be used in GIS software such as QGIS or ArcGIS.
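
To illustrate what a "min" raster at a given resolution means, the sketch below bins points into square cells and keeps the lowest Z per cell. It is a simplification under stated assumptions: each point contributes only to its own cell, row 0 is the southern edge (GeoTIFFs usually start at the northern edge), and grid_min is a hypothetical helper rather than anything PDAL provides.

```python
import numpy as np


def grid_min(x, y, z, resolution=1.0):
    """Return a 2D array holding the minimum z of the points in each cell."""
    # Cell indices measured from the lower-left corner of the data extent.
    col = np.floor((x - x.min()) / resolution).astype(int)
    row = np.floor((y - y.min()) / resolution).astype(int)
    # Start with NaN so that cells without points stay empty.
    grid = np.full((row.max() + 1, col.max() + 1), np.nan)
    # np.fmin ignores NaN, and ufunc.at applies it once per point,
    # so repeated (row, col) pairs accumulate the running minimum.
    np.fmin.at(grid, (row, col), z)
    return grid


x = np.array([0.2, 0.7, 1.5, 1.6])
y = np.array([0.1, 0.3, 0.2, 1.4])
z = np.array([5.0, 3.0, 7.0, 9.0])

dtm = grid_min(x, y, z, resolution=1.0)
print(dtm)
```

Cells that receive no points stay NaN here, which is a reminder that sparse ground returns leave holes in a terrain raster.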

Step 8: Extract Vegetation Points

Many forestry applications require identifying vegetation.

Example (assumes the input file already carries vegetation classes):

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.range",
            "limits": "Classification[3:5]"
        }
    ]
}

Common vegetation classes include:

Classification	Description
3	Low Vegetation
4	Medium Vegetation
5	High Vegetation

Applications:

Forest inventory
Biomass estimation
Canopy height analysis
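
For canopy height analysis, elevations are usually converted to heights above ground first. The sketch below builds a pipeline definition that adds PDAL's filters.hag_nn stage, which writes a HeightAboveGround dimension, and then keeps vegetation points above a minimum height. The function name and the 2.0 threshold are illustrative assumptions, and the input is assumed to contain both ground (class 2) and vegetation classes already.

```python
import json


def vegetation_height_pipeline(input_file, min_height=2.0):
    """Return a pipeline dict that keeps vegetation points above min_height."""
    return {
        "pipeline": [
            input_file,
            # Computes HeightAboveGround from nearby ground-classified points.
            {"type": "filters.hag_nn"},
            # Conditions on different dimensions are combined with AND.
            {
                "type": "filters.range",
                "limits": f"Classification[3:5],HeightAboveGround[{min_height}:]",
            },
        ]
    }


definition = vegetation_height_pipeline("sample.las", min_height=2.0)
print(json.dumps(definition, indent=2))
```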

Step 9: Visualize LiDAR Data in Python

Use Matplotlib for quick visual inspection.

import matplotlib.pyplot as plt

x = point_cloud['X']
y = point_cloud['Y']

plt.figure(figsize=(8,6))
plt.scatter(x, y, s=1)
plt.title("LiDAR Point Cloud")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

Visualization helps identify:

Data coverage
Noise
Classification errors
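
Plotting millions of points can be slow, so a random subsample colored by elevation is often enough for a first look. This is a minimal sketch: subsample and plot_by_elevation are hypothetical helpers, the demo array is synthetic, and Matplotlib is imported only when the plotting function is called.

```python
import numpy as np


def subsample(points, max_points=100_000, seed=0):
    """Return at most max_points randomly chosen points (no replacement)."""
    if len(points) <= max_points:
        return points
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[idx]


def plot_by_elevation(points):
    """Scatter plot of X/Y colored by Z."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(8, 6))
    sc = ax.scatter(points["X"], points["Y"], c=points["Z"], s=1, cmap="viridis")
    fig.colorbar(sc, ax=ax, label="Z")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_title("LiDAR Point Cloud (colored by elevation)")
    plt.show()


# Synthetic stand-in for point_cloud
rng = np.random.default_rng(1)
demo = np.zeros(1000, dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")])
demo["X"] = rng.uniform(0, 100, 1000)
demo["Y"] = rng.uniform(0, 100, 1000)
demo["Z"] = rng.uniform(90, 130, 1000)

small = subsample(demo, max_points=200)
print(len(small))
```

Calling plot_by_elevation(small) then draws the reduced cloud; with real data, pass subsample(point_cloud) instead.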

Step 10: Export Processed LiDAR Data

Save processed point clouds to a new LAS file.

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.smrf"
        },
        {
            "type": "writers.las",
            "filename": "processed.las"
        }
    ]
}

Execute:

pipeline = pdal.Pipeline(json.dumps(pipeline_json))
pipeline.execute()

Output:

processed.las

Building an Automated PDAL Workflow

For production environments, organize processing into reusable functions.

Example:

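import json

import pdal


def process_lidar(input_file, output_file):
    """Flag noise, classify ground, and write the result to output_file."""
    pipeline_json = {
        "pipeline": [
            input_file,
            # Flag statistical outliers as noise (Classification 7)
            {
                "type": "filters.outlier"
            },
            # Classify ground points, skipping the points flagged as noise
            {
                "type": "filters.smrf",
                "ignore": "Classification[7:7]"
            },
            {
                "type": "writers.las",
                "filename": output_file
            }
        ]
    }

    pipeline = pdal.Pipeline(json.dumps(pipeline_json))
    # execute() returns the number of points processed
    return pipeline.execute()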

Benefits include:

Automation
Scalability
Reproducibility
Reduced manual effort
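
Extending this to a whole directory might look like the sketch below. Planning (which files, which outputs, which pipeline) is kept separate from execution so the plan can be inspected first; plan_batch and run_batch are hypothetical helpers, the "_processed" naming is an assumption, and pdal is imported only inside run_batch.

```python
import json
import tempfile
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Map each LAS/LAZ file name to (output path, pipeline JSON string)."""
    jobs = {}
    for src in sorted(Path(input_dir).iterdir()):
        if src.suffix.lower() not in {".las", ".laz"}:
            continue
        dst = Path(output_dir) / f"{src.stem}_processed.las"
        stages = [
            str(src),
            {"type": "filters.outlier"},
            {"type": "filters.smrf"},
            {"type": "writers.las", "filename": str(dst)},
        ]
        jobs[src.name] = (dst, json.dumps({"pipeline": stages}))
    return jobs


def run_batch(jobs):
    """Execute every planned pipeline in turn."""
    import pdal  # imported lazily so planning works without PDAL installed

    for name, (dst, pipeline_json) in jobs.items():
        count = pdal.Pipeline(pipeline_json).execute()
        print(f"{name}: wrote {count} points to {dst}")


# Demonstrate planning with empty placeholder files (no PDAL needed).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.las", "notes.txt", "c.LAZ"):
        (Path(tmp) / name).touch()
    jobs = plan_batch(tmp, Path(tmp) / "out")

print(sorted(jobs))
```

With real data, create the output directory first and then call run_batch(plan_batch("tiles", "out")).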

Best Practices for PDAL LiDAR Processing

Optimization of Large Data Sets

Use LAZ compression where possible
Tile data for processing
Make use of parallel processing
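
Once data is tiled, the tiles can be processed concurrently. The following is a generic sketch built on Python's concurrent.futures; run_in_parallel and the stand-in worker are hypothetical, and in practice the worker would run a PDAL pipeline for one tile. Threads are used here for simplicity; whether threads or separate processes suit a given PDAL workload better is worth measuring.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(tiles, worker, max_workers=4):
    """Apply worker to each tile; return (results, errors) keyed by tile."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, tile): tile for tile in tiles}
        for future in as_completed(futures):
            tile = futures[future]
            try:
                results[tile] = future.result()
            except Exception as exc:
                # Keep going: one bad tile should not abort the whole batch.
                errors[tile] = exc
    return results, errors


def fake_worker(tile):
    """Stand-in for a function that would process one tile with PDAL."""
    if tile.startswith("bad"):
        raise RuntimeError("unreadable tile")
    return len(tile)


results, errors = run_in_parallel(["t1.laz", "t2.laz", "bad.laz"], fake_worker)
print(results, list(errors))
```

Collecting errors per tile, rather than stopping at the first failure, makes it easier to rerun only the tiles that need attention.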

Validate Coordinate System

Always validate:

EPSG codes
Vertical datums
Projection consistency

Maintain Metadata

Preserve:

Classification information
Return number
Date of acquisition

Test Pipeline in Stages

Validate the outputs of each processing stage before proceeding with large-scale jobs.
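
One way to test in stages is to run growing prefixes of the pipeline and inspect the result after each added stage. The generator below is a minimal sketch using only the standard library; staged_pipelines is a hypothetical helper, and each JSON string it yields would be passed to pdal.Pipeline for execution.

```python
import json


def staged_pipelines(stages):
    """Yield (label, pipeline JSON) for each growing prefix of stages."""
    for end in range(1, len(stages) + 1):
        prefix = stages[:end]
        last = prefix[-1]
        # Label each prefix by its final stage: a filename or a stage type.
        label = last if isinstance(last, str) else last["type"]
        yield label, json.dumps({"pipeline": prefix})


stages = [
    "sample.las",
    {"type": "filters.outlier"},
    {"type": "filters.smrf"},
]

for label, pipeline_json in staged_pipelines(stages):
    stage_count = len(json.loads(pipeline_json)["pipeline"])
    print(f"{label}: {stage_count} stage(s)")
```

Comparing point counts and class distributions between consecutive prefixes shows which stage introduced an unexpected change.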

Together, PDAL and Python form a powerful toolset for processing LiDAR data. With PDAL's pipeline architecture, you can load, filter, classify, visualize, and export point cloud data, and automate repetitive geospatial jobs.

PDAL helps improve accuracy, reproducibility, and efficiency when building Digital Terrain Models (DTMs), extracting vegetation, analyzing infrastructure, and preparing data for machine learning.

As LiDAR datasets become larger and more complex, becoming proficient in using PDAL and Python will be a valuable skill for GIS professionals, remote sensing experts, and geospatial software developers.
