
Step-by-Step PDAL Python Workflow for LiDAR Data Processing

  • Writer: Anvita Shrivastava
  • 11 hours ago

Light Detection and Ranging (LiDAR) has become a critical technology for geospatial analysis, modeling, and data collection in urban planning and construction, forestry, and infrastructure development. As LiDAR datasets grow in volume and complexity, well-organized, streamlined workflows for extracting meaningful information from them are more important than ever.


The Point Data Abstraction Library (PDAL) is an open-source library for manipulating point cloud data. Combined with Python, it lets developers, GIS professionals, and data scientists automate LiDAR processing tasks, build scalable processing workflows, and integrate point cloud analysis into larger geospatial applications.


In this article, we walk through a step-by-step LiDAR processing workflow with PDAL and Python: installing PDAL, loading and inspecting a LiDAR dataset, filtering points and removing noise, classifying ground points, generating a terrain model, extracting vegetation, visualizing the data, and exporting the results.



What Is PDAL?


PDAL is an open-source toolset for manipulating, translating, filtering, and analyzing point cloud data. It supports many data formats, including:


  • LAS

  • LAZ

  • E57

  • GeoTIFF

  • BPF

  • PLY


PDAL is built around pipelines: you define an ordered sequence of stages (readers, filters, and writers) that PDAL applies to your LiDAR data.
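Because a pipeline is just an ordered list of stages, it can be assembled programmatically. The sketch below is a minimal illustration; `build_pipeline` is a hypothetical helper, not part of PDAL. It only produces the JSON text that `pdal.Pipeline` accepts, so it runs without PDAL installed.

```python
import json


def build_pipeline(*stages):
    """Hypothetical helper: join stages (filenames or option dicts) into pipeline JSON."""
    for stage in stages:
        if not isinstance(stage, (str, dict)):
            raise TypeError(f"unsupported stage: {stage!r}")
        # A dict stage needs an explicit "type", or a "filename" PDAL can infer it from
        if isinstance(stage, dict) and "type" not in stage and "filename" not in stage:
            raise ValueError(f"stage needs 'type' or 'filename': {stage!r}")
    return json.dumps({"pipeline": list(stages)}, indent=2)


spec = build_pipeline(
    "sample.las",                                    # reader inferred from extension
    {"type": "filters.range", "limits": "Z[100:]"},  # keep Z >= 100
)
print(spec)
```

With PDAL installed, the resulting string can be passed to `pdal.Pipeline(spec)` and executed as in Step 1.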


Benefits of Using PDAL


  • Actively developed and supported

  • Handles very large LiDAR datasets

  • Integrates with Python

  • Offers a comprehensive range of filters and classifiers

  • Fits into standard GIS and remote sensing workflows


Prerequisites


Before starting, ensure the following are installed:


Install PDAL

Using Conda:

conda install -c conda-forge pdal python-pdal

Verify installation:

python -c "import pdal; print(pdal.__version__)"

Required Python Libraries

pip install numpy pandas matplotlib

Step 1: Load a LiDAR Dataset


The first step is reading a LAS or LAZ file using PDAL.

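import json

import pdal

# A bare filename is shorthand for a reader stage; PDAL picks the reader from the extension
pipeline_json = {
    "pipeline": [
        "sample.las"
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_json))
count = pipeline.execute()  # runs the pipeline and returns the number of points

arrays = pipeline.arrays    # list of NumPy structured arrays, one per point view
point_cloud = arrays[0]

print(point_cloud.dtype)    # available dimensions and their types
print(len(point_cloud))     # point count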

What This Does

  • Defines a pipeline whose only stage reads sample.las

  • Executes the pipeline, loading the points into memory

  • Exposes the points as a NumPy structured array

  • Prints the available dimensions and the point count


Step 2: Inspect Point Cloud Attributes


LiDAR datasets contain multiple attributes such as:

  • X, Y, Z coordinates

  • Intensity

  • Return number

  • Classification

  • Scan angle

View available fields:

print(point_cloud.dtype.names)

Example output (abbreviated; the actual fields depend on the file's LAS point format):

('X', 'Y', 'Z', 'Intensity', 'Classification')

Understanding these attributes helps determine the appropriate processing strategy.
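A quick numeric range for every dimension often reveals problems, such as implausible elevations, before any processing. The sketch below assumes the structured array returned by PDAL in Step 1; a tiny synthetic array with hypothetical values stands in for it, and `summarize` is an illustrative name.

```python
import numpy as np


def summarize(points):
    """Return {dimension: (min, max)} for every field of a structured array."""
    return {name: (points[name].min().item(), points[name].max().item())
            for name in points.dtype.names}


# Synthetic stand-in for pipeline.arrays[0] (hypothetical values)
sample_points = np.array(
    [(500.0, 4200.0, 101.5, 320, 2),
     (501.0, 4201.0, 115.0, 180, 5),
     (502.0, 4199.5, 99.8, 250, 2)],
    dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8"),
           ("Intensity", "u2"), ("Classification", "u1")],
)

for name, (lo, hi) in summarize(sample_points).items():
    print(f"{name:>15}: {lo} .. {hi}")
```

Note that `min()` fails on an empty array, so check the point count first when a filter may remove everything.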


Step 3: Create a Basic PDAL Pipeline


PDAL workflows are typically built using JSON pipelines.

Example:

pipeline_json = {
    "pipeline": [
        {
            "type": "readers.las",
            "filename": "sample.las"
        }
    ]
}

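This pipeline does the same thing as the one in Step 1. A bare filename is shorthand for a reader stage whose type PDAL infers from the file extension; writing the stage out explicitly lets you name the reader and set its options.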
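If you prefer explicit reader stages, a small lookup can generate them from the extension. This is a sketch: the mapping covers only some of the formats listed earlier, `reader_stage` is a hypothetical helper, and whether `readers.e57` is available may depend on your PDAL build.

```python
from pathlib import Path

# Extension -> PDAL reader; readers.las handles both LAS and compressed LAZ
READERS = {
    ".las": "readers.las",
    ".laz": "readers.las",
    ".e57": "readers.e57",
    ".ply": "readers.ply",
    ".bpf": "readers.bpf",
}


def reader_stage(filename, **options):
    """Hypothetical helper: build an explicit reader stage for a point cloud file."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no reader configured for {suffix!r}")
    return {"type": READERS[suffix], "filename": str(filename), **options}


print(reader_stage("sample.laz"))
```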


Step 4: Filter LiDAR Data by Elevation


Filtering allows you to isolate specific points based on conditions.

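Example: keep points with an elevation (Z) of at least 100, in the dataset's vertical units (often meters). In Z[100:] the lower bound is inclusive and the upper bound is open.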

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.range",
            "limits": "Z[100:]"
        }
    ]
}

Execute:

pipeline = pdal.Pipeline(json.dumps(pipeline_json))
pipeline.execute()
filtered_points = pipeline.arrays[0]
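When the points are already in a NumPy array (for example, `point_cloud` from Step 1), the same selection can be written as a boolean mask. The sketch below mirrors `Z[100:]`, including its inclusive lower bound; the synthetic points are hypothetical. Filtering inside PDAL is usually preferable for large files, because Python then only receives the points that pass the filter.

```python
import numpy as np

# Synthetic stand-in for a PDAL array (hypothetical values)
points = np.array(
    [(0.0, 0.0, 95.0), (1.0, 0.0, 100.0), (2.0, 0.0, 130.5)],
    dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")],
)

# Equivalent of filters.range with limits "Z[100:]": keep Z >= 100
above_100 = points[points["Z"] >= 100]
print(len(above_100), above_100["Z"].min())
```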

Use Cases

  • Terrain analysis

  • Building extraction

  • Vegetation studies


Step 5: Remove Noise from Point Clouds


LiDAR datasets often contain outlier points caused by sensor errors or atmospheric interference.

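PDAL provides statistical outlier filtering through filters.outlier. This filter does not delete points: it labels outliers as noise (Classification 7), so a filters.range stage follows it to drop them:

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.outlier",
            "method": "statistical",
            "mean_k": 8,
            "multiplier": 2.0
        },
        {
            "type": "filters.range",
            "limits": "Classification![7:7]"
        }
    ]
}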

Benefits include:

  • Improved surface models

  • Cleaner visualizations

  • More accurate classifications
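To show what `mean_k` and `multiplier` control, here is a brute-force NumPy illustration of the statistical outlier test: compute each point's mean distance to its `mean_k` nearest neighbors, then flag points whose mean distance exceeds the global mean plus `multiplier` standard deviations. It is a simplified sketch of the idea, not PDAL's implementation (which uses a spatial index and scales to large clouds); `statistical_outliers` is a hypothetical name.

```python
import numpy as np


def statistical_outliers(xyz, mean_k=8, multiplier=2.0):
    """Brute-force statistical outlier test; returns a boolean mask (True = noise)."""
    xyz = np.asarray(xyz, dtype=float)
    # All pairwise 3D distances: O(n^2) memory, so only suitable for small demos
    diff = xyz[:, None, :] - xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    k = min(mean_k, len(xyz) - 1)
    mean_dist = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    threshold = mean_dist.mean() + multiplier * mean_dist.std()
    return mean_dist > threshold


# 5 x 5 grid of points on a flat surface, plus one isolated point far above it
flat = [(i, j, 0.0) for i in range(5) for j in range(5)]
xyz = np.array(flat + [(100.0, 100.0, 100.0)])
print(statistical_outliers(xyz).nonzero()[0])
```

Raising `multiplier` makes the test more tolerant; raising `mean_k` smooths each point's distance estimate over more neighbors.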


Step 6: Ground Classification


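Ground classification separates terrain points from vegetation and structures. PDAL's filters.smrf (Simple Morphological Filter) assigns ASPRS class 2 to the points it identifies as ground. Because low outliers can be mistaken for ground, it is best run after the noise removal in Step 5.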

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.smrf",
            "scalar": 1.2,
            "slope": 0.2,
            "threshold": 0.45,
            "window": 16.0
        }
    ]
}

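Why Ground Classification Matters

Ground points are the basis for Digital Terrain Models (DTMs), generated in the next step, and for measuring heights above the terrain, such as vegetation canopy height.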
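After running SMRF, counting points per class is a quick sanity check: a ground share that looks implausible for the terrain suggests revisiting the parameters. The sketch assumes you pass the `Classification` field of the executed pipeline's array; the name table covers only a few standard ASPRS LAS codes, and `class_counts` is a hypothetical helper.

```python
import numpy as np

# A few standard ASPRS LAS classification codes
ASPRS_NAMES = {0: "Created, never classified", 1: "Unclassified", 2: "Ground",
               3: "Low Vegetation", 4: "Medium Vegetation", 5: "High Vegetation",
               6: "Building", 7: "Low Point (noise)", 9: "Water"}


def class_counts(classification):
    """Hypothetical helper: return {class name: point count} for classification codes."""
    codes, counts = np.unique(np.asarray(classification), return_counts=True)
    return {ASPRS_NAMES.get(int(c), f"Class {int(c)}"): int(n)
            for c, n in zip(codes, counts)}


# Synthetic stand-in for classified["Classification"] (hypothetical values)
codes = np.array([2, 2, 1, 1, 1, 2, 7], dtype=np.uint8)
print(class_counts(codes))
```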


Step 7: Generate a Digital Terrain Model (DTM)


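After identifying ground points, keep only those points (Classification 2) and rasterize them into a terrain model. Without the range filter, cells that contain no ground returns (under roofs, for example) would take the elevation of whatever other points fall there. The resolution is in the dataset's horizontal units.

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.smrf"
        },
        {
            "type": "filters.range",
            "limits": "Classification[2:2]"
        },
        {
            "type": "writers.gdal",
            "filename": "dtm.tif",
            "resolution": 1.0,
            "output_type": "min"
        }
    ]
}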

Output:

dtm.tif

The generated GeoTIFF can be used in GIS software such as QGIS or ArcGIS.
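To make `"output_type": "min"` concrete, here is a simplified NumPy illustration that bins points into square cells of `resolution` units and keeps the lowest Z in each cell. It is a sketch, not how writers.gdal works internally: writers.gdal gathers points within a search radius around each cell center (by default, resolution times the square root of 2), so its values can differ. `min_z_grid` and the sample coordinates are hypothetical.

```python
import numpy as np


def min_z_grid(x, y, z, resolution=1.0):
    """Simplified 'min' rasterization: each cell gets the lowest Z of the points inside it."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    minx, maxy = x.min(), y.max()
    cols = np.floor((x - minx) / resolution).astype(int)
    rows = np.floor((maxy - y) / resolution).astype(int)  # row 0 is the northern edge
    grid = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    # fmin ignores NaN, so empty cells stay NaN and filled cells keep their minimum
    np.fmin.at(grid, (rows, cols), z)
    return grid


grid = min_z_grid([0.2, 0.8, 1.5], [1.5, 1.5, 0.2], [10.0, 8.0, 12.0])
print(grid)
```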


Step 8: Extract Vegetation Points


Many forestry applications require identifying vegetation.

Example:

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.range",
            "limits": "Classification[3:5]"
        }
    ]
}

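The range Classification[3:5] is inclusive, so it keeps classes 3, 4, and 5. It assumes the file already carries vegetation classes (for example, from the data provider); filters.smrf only labels ground (class 2). The standard ASPRS vegetation classes are:

| Classification | Description |
|---|---|
| 3 | Low Vegetation |
| 4 | Medium Vegetation |
| 5 | High Vegetation |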

Applications:

  • Forest inventory

  • Biomass estimation

  • Canopy height analysis
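The range syntax can also select non-contiguous classes: in filters.range, ranges on the same dimension are combined with OR, while ranges on different dimensions are combined with AND. The hypothetical helper below builds such a `limits` string from class codes, for example to keep low and high vegetation while skipping medium vegetation.

```python
def classification_limits(codes):
    """Hypothetical helper: build a filters.range 'limits' string for the given class codes."""
    codes = sorted({int(c) for c in codes})
    if not codes:
        raise ValueError("at least one class code is required")
    # Same-dimension ranges are ORed by filters.range, so use one [c:c] range per code
    return ",".join(f"Classification[{c}:{c}]" for c in codes)


stage = {"type": "filters.range", "limits": classification_limits([5, 3])}
print(stage["limits"])  # Classification[3:3],Classification[5:5]
```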


Step 9: Visualize LiDAR Data in Python


Use Matplotlib for quick visual inspection.

import matplotlib.pyplot as plt

x = point_cloud['X']
y = point_cloud['Y']

plt.figure(figsize=(8,6))
plt.scatter(x, y, s=1)
plt.title("LiDAR Point Cloud")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

Visualization helps identify:

  • Data coverage

  • Noise

  • Classification errors
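Scatter-plotting millions of points is slow, so for a quick look it is common to plot a random subset. In the sketch below, `subsample` and its 200,000-point default are illustrative choices, not PDAL settings; inside a pipeline, PDAL's filters.decimation offers a similar thinning step.

```python
import numpy as np


def subsample(points, max_points=200_000, seed=0):
    """Return at most max_points randomly chosen points (without replacement)."""
    if len(points) <= max_points:
        return points
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[np.sort(idx)]  # keep the original point order


# Synthetic stand-in for point_cloud (hypothetical values)
demo = np.zeros(1_000, dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")])
demo["X"] = np.arange(1_000)
small = subsample(demo, max_points=100)
print(len(small))
```

With real data, `sub = subsample(point_cloud)` followed by `plt.scatter(sub["X"], sub["Y"], c=sub["Z"], s=1)` colors the plot by elevation.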


Step 10: Export Processed LiDAR Data


Save the processed point cloud (here, with ground points classified by SMRF) to a new LAS file.

pipeline_json = {
    "pipeline": [
        "sample.las",
        {
            "type": "filters.smrf"
        },
        {
            "type": "writers.las",
            "filename": "processed.las"
        }
    ]
}

Execute:

pipeline = pdal.Pipeline(json.dumps(pipeline_json))
pipeline.execute()

Output:

processed.las
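The same pipeline can also be saved to a file and run outside Python with the PDAL command-line tool (`pdal pipeline processed_pipeline.json`), which is convenient for scheduled jobs. In this sketch, `save_pipeline` and the file name are hypothetical; a temporary directory keeps the demo self-contained.

```python
import json
import tempfile
from pathlib import Path


def save_pipeline(spec, path):
    """Hypothetical helper: write a pipeline dict to disk as JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(spec, indent=2))
    return path


pipeline_json = {
    "pipeline": [
        "sample.las",
        {"type": "filters.smrf"},
        {"type": "writers.las", "filename": "processed.las"},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_pipeline(pipeline_json, Path(tmp) / "processed_pipeline.json")
    # From a shell, the saved file could be run with: pdal pipeline processed_pipeline.json
    reloaded = json.loads(saved.read_text())

print(reloaded == pipeline_json)
```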

Building an Automated PDAL Workflow


For production environments, organize processing into reusable functions.

Example:

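import json

import pdal


def process_lidar(input_file, output_file):
    """Denoise, classify ground, and write the result; returns the point count."""
    pipeline_json = {
        "pipeline": [
            input_file,
            {
                "type": "filters.outlier"
            },
            {
                # filters.outlier only flags noise (class 7); drop it before SMRF
                "type": "filters.range",
                "limits": "Classification![7:7]"
            },
            {
                "type": "filters.smrf"
            },
            {
                "type": "writers.las",
                "filename": output_file
            }
        ]
    }

    pipeline = pdal.Pipeline(json.dumps(pipeline_json))
    return pipeline.execute()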

Benefits include:

  • Automation

  • Scalability

  • Reproducibility

  • Reduced manual effort
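To run `process_lidar` over many files, it helps to plan input/output pairs first. The sketch below is one way to do that with the standard library; `plan_jobs` and the `_processed.las` suffix are hypothetical choices, and the temporary directory only makes the demo self-contained.

```python
import tempfile
from pathlib import Path


def plan_jobs(input_dir, output_dir, suffix="_processed.las"):
    """Hypothetical helper: pair every .las/.laz file in input_dir with an output path."""
    out_dir = Path(output_dir)
    sources = sorted(p for p in Path(input_dir).iterdir()
                     if p.is_file() and p.suffix.lower() in {".las", ".laz"})
    return [(src, out_dir / f"{src.stem}{suffix}") for src in sources]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("tile_a.las", "tile_b.LAZ", "notes.txt"):
        (Path(tmp) / name).touch()
    jobs = plan_jobs(tmp, Path(tmp) / "out")
    names = [(src.name, dst.name) for src, dst in jobs]

print(names)
```

Each pair can then be passed to `process_lidar(str(src), str(dst))`. For parallel runs, the standard library's `concurrent.futures.ProcessPoolExecutor` is one option, assuming PDAL is importable in every worker process.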


Best Practices for PDAL LiDAR Processing


Optimizing Large Datasets


  • Use LAZ compression where possible

  • Tile data before processing

  • Process tiles in parallel
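One way to tile inside PDAL is filters.splitter, which cuts the cloud into square tiles of a given `length` (in the dataset's horizontal units). A `#` in a writer's filename is replaced with a running number, so each tile lands in its own file. The sketch only builds the pipeline; `tiling_pipeline`, the 500-unit tile size, and the file names are hypothetical.

```python
import json


def tiling_pipeline(source, tile_length, pattern="tile_#.las"):
    """Hypothetical helper: split a cloud into square tiles, one output file per tile."""
    if tile_length <= 0:
        raise ValueError("tile_length must be positive")
    return {
        "pipeline": [
            source,
            {"type": "filters.splitter", "length": tile_length},
            # '#' in the filename is replaced by a running number per tile
            {"type": "writers.las", "filename": pattern},
        ]
    }


print(json.dumps(tiling_pipeline("sample.laz", 500), indent=2))
```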


Validate Coordinate Systems


Always check:


  • EPSG codes

  • Vertical datums

  • Projection consistency across input files
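When inputs disagree, filters.reprojection can bring them into a common horizontal coordinate system. The sketch uses EPSG:26915 (NAD83 / UTM zone 15N) purely as an example target; `reprojection_pipeline` and the file names are hypothetical, and vertical datum conversions may need extra care beyond this stage.

```python
import json


def reprojection_pipeline(source, out_srs, destination):
    """Hypothetical helper: reproject a point cloud to out_srs and write it out."""
    return {
        "pipeline": [
            source,
            {"type": "filters.reprojection", "out_srs": out_srs},
            {"type": "writers.las", "filename": destination},
        ]
    }


spec = reprojection_pipeline("sample.las", "EPSG:26915", "sample_utm15n.las")
print(json.dumps(spec, indent=2))
```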


Maintain Metadata


Preserve:


  • Classification information

  • Return numbers

  • Acquisition dates


Test Pipeline in Stages


Validate the output of each processing stage before proceeding with large-scale jobs.
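One way to test a pipeline in stages is to run progressively longer prefixes of it and inspect each result. The hypothetical generator below produces those prefixes; the stage list is an example built from stages used in this article.

```python
def staged_pipelines(stages):
    """Hypothetical helper: yield pipelines running stages[:1], stages[:2], ... in turn."""
    for end in range(1, len(stages) + 1):
        yield {"pipeline": list(stages[:end])}


stages = [
    "sample.las",
    {"type": "filters.outlier"},
    {"type": "filters.range", "limits": "Classification![7:7]"},
    {"type": "filters.smrf"},
]

for spec in staged_pipelines(stages):
    print(len(spec["pipeline"]), spec["pipeline"][-1])
```

With PDAL installed, `pdal.Pipeline(json.dumps(spec)).execute()` returns the point count for each prefix, so an unexpected drop points to the stage responsible.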


Together, PDAL and Python form a powerful toolset for processing LiDAR data. PDAL's pipeline architecture lets you load, filter, classify, and export point clouds, while Python handles inspection, visualization, and the automation of repetitive geospatial jobs.


PDAL helps improve accuracy, reproducibility, and efficiency in tasks such as generating Digital Terrain Models (DTMs), extracting vegetation, analyzing infrastructure, and preparing data for machine learning.


As LiDAR datasets grow larger and more complex, proficiency with PDAL and Python is a valuable skill for GIS professionals, remote sensing experts, and geospatial software developers.


For more information or any questions regarding the LizardTech suite of products, please don't hesitate to contact us at:



USA (HQ): (720) 702–4849


(A GeoWGS84 Corp Company)




