Aerial image quality control at machine speed: how synthetic defects solved the rare-error problem

Less than half a percent of aerial images contain defects – and that single statistic, not GPU power or model architecture, is what breaks the textbook ML approach to image quality control.

The bottleneck behind every orthophoto

Aerial image production has been automated at every step except the last one. Flight planning, sensor calibration, aerial triangulation and orthophoto mosaicking have moved from manual to algorithmic over the past two decades; quality control of the raw frames has not. In most production pipelines, an operator still flips through thousands of images one by one to flag clouds, blur, glare and other defects. The work is slow, expensive and prone to omission. As image volumes keep growing – driven by national orthophoto programs, infrastructure monitoring and dense urban capture – manual QC scales linearly with people while everything around it scales with software.

The data paradox

Three numbers explain why automating this step is harder than it looks. 99.5% of aerial images are correct. A single large-format frame can weigh up to 300 MB. A mid-sized aerial mapping operator captures several hundred thousand frames a year; the largest process several times more.

Each of those numbers breaks a different assumption that underpins standard supervised learning. The class imbalance is so extreme that even after years of operations a company holds at most a few thousand truly defective frames – too few for a model to generalise across cameras, GSDs, seasons and terrain types. Pixel-level masks for those few examples are costly to draw. And once you start streaming 300-megabyte frames through GPUs, disk and memory throughput become the limit, not network depth. The data problem itself had to be reframed.
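
A back-of-envelope calculation shows how the three numbers interact. The sketch below assumes 300,000 frames a year as one illustrative reading of "several hundred thousand", and treats 300 MB as an upper bound per frame, so the volume figure is a ceiling rather than a measurement.

```python
# Back-of-envelope sketch. FRAMES_PER_YEAR is an assumed example value,
# not a figure reported in the article.
FRAMES_PER_YEAR = 300_000   # assumption: "several hundred thousand"
DEFECT_RATE = 0.005         # 100% minus the 99.5% of frames that are correct
MAX_FRAME_MB = 300          # upper bound for a single large-format frame


def yearly_defective_frames(frames: int, defect_rate: float) -> int:
    """Expected number of defective frames captured in one year."""
    return round(frames * defect_rate)


def yearly_volume_tb(frames: int, frame_mb: int) -> float:
    """Upper bound on yearly data volume in decimal terabytes (1 TB = 1,000,000 MB)."""
    return frames * frame_mb / 1_000_000


defective = yearly_defective_frames(FRAMES_PER_YEAR, DEFECT_RATE)
volume_tb = yearly_volume_tb(FRAMES_PER_YEAR, MAX_FRAME_MB)
print(f"defective frames per year: {defective}")
print(f"data volume ceiling: {volume_tb:.0f} TB")
```

Under these assumptions a year of flying yields on the order of 1,500 defective frames spread across every defect type, against tens of terabytes of correct imagery.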

Inverting the problem: simulators, not datasets

If defects cannot be collected at scale, generate them. The approach taken at OPEGIEKA was to build a dedicated synthetic-data pipeline for each defect class and to train one convolutional neural network per class on samples produced on the fly during training.

For blur, randomised Gaussian kernels are convolved with correct images, with a small noise layer added back to preserve a natural appearance. The detector treats blur as an all-or-nothing property of the frame rather than a local one – a deliberate simplification that matches how blur behaves in aerial photography.
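
The blur generator can be sketched in a few lines of NumPy. This is a minimal illustration, not the production pipeline: the sigma range, the noise level and the single-band float image in [0, 1] are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalised 1-D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Blur a 2-D image with two 1-D passes (rows, then columns)."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img, radius, mode="reflect")

    def conv(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)
    return np.apply_along_axis(conv, 0, rows)


def make_blurred_sample(img, rng, sigma_range=(1.0, 3.0), noise_std=0.01):
    """Return (blurred image, frame-level label 1, sigma used).

    sigma_range and noise_std are illustrative values, not published settings.
    """
    sigma = float(rng.uniform(*sigma_range))
    blurred = gaussian_blur(img, sigma)
    # Add a little noise back so the result does not look unnaturally smooth.
    noisy = blurred + rng.normal(0.0, noise_std, size=img.shape)
    return np.clip(noisy, 0.0, 1.0), 1, sigma


rng = np.random.default_rng(0)
sharp = rng.random((64, 64))          # stand-in for a correct image tile
sample, label, sigma = make_blurred_sample(sharp, rng)
print(sample.shape, label, round(sigma, 2))
```

Because the label is attached to the whole frame, no pixel mask is needed for this class, which is part of what makes on-the-fly generation cheap.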

For glare, the generator builds random radial shapes, smooths them through morphological filters and pastes them onto random tiles. Detection runs through a U-Net. Only tiles with high pixel values are scored, so most of the image never reaches the network.
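
Both halves of that idea, generating a glare and gating tiles by brightness, fit in a short sketch. It is a simplification under stated assumptions: the radial shape gets its soft edge from a linear falloff rather than from morphological filters, the blend and the 0.8 brightness threshold are illustrative, and all function names are hypothetical.

```python
import numpy as np


def radial_glare(size: int, rng) -> np.ndarray:
    """Random radial blob in [0, 1]: bright centre, irregular soft edge."""
    yy, xx = np.mgrid[0:size, 0:size]
    centre = (size - 1) / 2.0
    r = np.hypot(yy - centre, xx - centre)
    theta = np.arctan2(yy - centre, xx - centre)
    lobes = int(rng.integers(2, 6))
    phase = rng.uniform(0.0, 2.0 * np.pi)
    # The outline radius varies with angle, so the blob is not a perfect disc.
    radius = size * 0.3 * (1.0 + 0.3 * np.sin(lobes * theta + phase))
    return np.clip(1.0 - r / radius, 0.0, 1.0)


def paste_glare(tile: np.ndarray, glare: np.ndarray, strength: float = 0.9):
    """Brighten the tile towards white; return the new tile and a binary mask."""
    out = tile + strength * glare * (1.0 - tile)
    return out, glare > 0.5


def bright_tiles(image: np.ndarray, tile: int, threshold: float = 0.8):
    """Origins (row, col) of tiles bright enough to be worth scoring."""
    h, w = image.shape
    hits = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            if image[y:y + tile, x:x + tile].max() >= threshold:
                hits.append((y, x))
    return hits


rng = np.random.default_rng(1)
image = np.full((128, 128), 0.2)              # dull stand-in scene
glare = radial_glare(32, rng)
patch, mask = paste_glare(image[32:64, 64:96], glare)
image[32:64, 64:96] = patch
candidates = bright_tiles(image, 32)
print(candidates)                             # only the tile holding the glare
```

In this toy scene one tile out of sixteen passes the gate, which is the mechanism that keeps most of a frame away from the network.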

For clouds, the source material is Sentinel-2 imagery – specifically scenes of partial cloud cover over open sea, where the uniform radiometric background of the water makes per-cloud segmentation clean. Each extracted cloud is composited onto correct aerial tiles with random rotation, scale and brightness. Cloud shadows follow the same logic in negative, with an offset that respects sun angle.
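
A minimal compositing sketch follows. It assumes the cloud has already been extracted as an opacity map the same size as a square tile, replaces free rotation and scaling with 90-degree turns and flips to stay dependency-free, and derives the shadow offset from flat-terrain geometry on a north-up image. The brightness range, shadow strength and function names are illustrative assumptions.

```python
import math
import numpy as np


def shadow_offset_px(sun_azimuth_deg, sun_elevation_deg, cloud_height_m, gsd_m):
    """(dy, dx) pixel offset of a cloud shadow on flat terrain, north-up image.

    Azimuth is clockwise from north; the shadow falls away from the sun.
    """
    dist = cloud_height_m / math.tan(math.radians(sun_elevation_deg)) / gsd_m
    az = math.radians(sun_azimuth_deg)
    return int(round(dist * math.cos(az))), int(round(-dist * math.sin(az)))


def shift(a: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2-D array by (dy, dx), filling vacated cells with zeros."""
    h, w = a.shape
    out = np.zeros_like(a)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = a[src_y, src_x]
    return out


def composite_cloud(tile, cloud_alpha, rng, shadow_offset=(6, 4), shadow_strength=0.4):
    """Paste a cloud and its shadow onto a correct tile; return image and masks."""
    alpha = np.rot90(cloud_alpha, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        alpha = np.fliplr(alpha)
    brightness = rng.uniform(0.85, 1.0)
    shadow = shift(alpha, *shadow_offset)
    # The shadow darkens the ground first; the cloud is then blended on top.
    darkened = tile * (1.0 - shadow_strength * shadow)
    out = darkened * (1.0 - alpha) + brightness * alpha
    cloud_mask = alpha > 0.5
    shadow_mask = (shadow > 0.5) & ~cloud_mask
    return out, cloud_mask, shadow_mask


rng = np.random.default_rng(2)
tile = np.full((64, 64), 0.4)                 # stand-in for a correct aerial tile
alpha = np.zeros((64, 64))
alpha[20:30, 20:30] = 1.0                     # stand-in for an extracted cloud
out, cloud_mask, shadow_mask = composite_cloud(tile, alpha, rng)
print(int(cloud_mask.sum()), int(shadow_mask.sum()))
```

The masks come for free with each composite, which is what removes the need to draw pixel-level labels by hand.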

The training set is effectively unlimited, and the model does not overfit to a particular scene.

What the model gets right, and where it draws the line

CertiflAI by OPEGIEKA currently covers six radiometric defect types – clouds, cloud shadows, blur, glare, discolorations and burned-out areas – five of them via dedicated CNNs and the last one via a rule based on contiguous high-value regions with low variance. A second control layer runs fast geometric checks across the block: endlap, sidelap, pixel size, omega/phi/kappa, sun height and AOI compliance.
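
The rule for burned-out areas is simple enough to sketch without a network. The version below is a block-wise approximation: it flags fixed-size blocks that are both near saturation and nearly flat, rather than tracing arbitrary connected regions. Block size and both thresholds are illustrative assumptions, not the product's settings.

```python
import numpy as np


def burned_out_mask(image, block=16, min_value=0.98, max_std=0.01):
    """Flag blocks that are both very bright and almost textureless.

    image is a 2-D float array scaled to [0, 1]. A block is flagged when its
    mean is at least min_value and its standard deviation is at most max_std.
    """
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = image[y:y + block, x:x + block]
            if patch.mean() >= min_value and patch.std() <= max_std:
                mask[y:y + block, x:x + block] = True
    return mask


rng = np.random.default_rng(3)
image = rng.random((64, 64)) * 0.5                        # ordinary textured scene
image[16:48, 16:48] = 1.0                                 # saturated, flat region
image[0:16, 48:64] = 0.9 + 0.1 * rng.random((16, 16))     # bright but textured roof
mask = burned_out_mask(image)
print(int(mask.sum()))
```

The variance test is what separates a clipped sensor response from a surface that is merely bright, such as the textured patch in the corner of this toy scene.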

Figure: Radiometric check verification panel, showing the cloud check result on images and the error mask

Figure: Geometric check results panel in map view, longitudinal coverage check
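
The longitudinal coverage (endlap) check rests on a standard photogrammetric relation: overlap equals one minus the air base divided by the along-track footprint length. The sketch below assumes flat terrain, nadir images and a footprint equal to image rows times GSD; the 60% minimum is a common illustrative requirement, not a value taken from CertiflAI, and the function names are hypothetical.

```python
import math


def footprint_along_track_m(rows_px: int, gsd_m: float) -> float:
    """Ground length covered by one frame in the flight direction."""
    return rows_px * gsd_m


def endlap_fractions(centres_m, footprint_m):
    """Overlap between consecutive exposures of one strip, as fractions.

    centres_m is a list of (easting, northing) projection centres in metres.
    """
    overlaps = []
    for (x0, y0), (x1, y1) in zip(centres_m, centres_m[1:]):
        base = math.hypot(x1 - x0, y1 - y0)
        overlaps.append(max(0.0, 1.0 - base / footprint_m))
    return overlaps


def failing_pairs(overlaps, minimum=0.60):
    """Indices of consecutive pairs whose overlap is below the minimum."""
    return [i for i, o in enumerate(overlaps) if o < minimum]


length = footprint_along_track_m(10_000, 0.10)            # about 1,000 m
strip = [(0, 0), (350, 0), (700, 0), (1200, 0)]           # made-up centres
overlaps = endlap_fractions(strip, length)
print([round(o, 2) for o in overlaps], failing_pairs(overlaps))
```

Checks of this kind need only exterior orientation and camera metadata, not pixels, which is consistent with a whole block clearing the geometric stage in seconds.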

In production the radiometric pipeline processes a medium-format frame in around 3 seconds and a large-format frame in around 5. A 1,631-image block from a recent campaign passed the geometric stage in roughly 90 seconds. Omission rates measured on internal benchmarks stay below 5% for the two critical defects – clouds and blur – and below 10% for the rest, with a tunable false-positive threshold typically set around 20%.
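
The trade-off between omissions and false positives can be made concrete with a small threshold search. The article does not define its false-positive figure precisely; the sketch assumes it means the share of flagged frames that turn out to be correct. Scores and labels are made-up toy data, and the function names are hypothetical.

```python
def omission_rate(scores, labels, threshold):
    """Share of truly defective frames scoring below the threshold (missed)."""
    defective = [s for s, y in zip(scores, labels) if y]
    missed = sum(1 for s in defective if s < threshold)
    return missed / len(defective)


def false_positive_share(scores, labels, threshold):
    """Share of flagged frames that are actually correct (assumed definition)."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        return 0.0
    return sum(1 for y in flagged if not y) / len(flagged)


def pick_threshold(scores, labels, max_fp_share=0.20):
    """Most sensitive threshold that keeps false positives within budget."""
    for t in sorted(set(scores)):
        if false_positive_share(scores, labels, t) <= max_fp_share:
            return t
    return None


# Toy detector scores; True marks a frame that really is defective.
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.90]
labels = [False, False, True, False, True, False, True, True, True, True]

threshold = pick_threshold(scores, labels)
print(threshold, round(omission_rate(scores, labels, threshold), 3))
```

Lowering the threshold trades operator review time for fewer missed defects, which is why a tunable value suits a workflow where flagged frames are reviewed by a person.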

No detector catches everything cleanly, and CertiflAI is no exception. The most common false positives observed in production are biogas plants, hot spots, smoke from small fires and fields in unusual states; smoke has emerged as a recurring class of artefact worth flagging. Candidates are surfaced for operator review rather than hidden, and the sensitivity threshold is exposed as a parameter.

Figure: Geometric check results panel in report view

Figure: Geometric check results panel in map view, photo check against the project area (AOI)

What changes for a national mapping agency

CertiflAI has been used in production at three national mapping agencies – GUGiK in Poland, ICGC in Catalonia and the Geodetic Institute of Slovenia – verifying more than one million aerial frames between them. Four things change. Days of operator time clear in hours of compute. The procedure is deterministic and parameterised rather than dependent on who is on shift. Every project carries a CSV or XLSX audit trail with thresholds embedded, fit for documentation regimes such as Poland’s PZGiK. And same-day detection tightens the feedback loop to acquisition. For public procurement that means QC thresholds can sit directly in tender documents, with vendors demonstrating compliance against a defined procedure rather than against assurances.
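
An audit trail with embedded thresholds can be as plain as one CSV row per frame and check. The sketch below uses Python's standard csv module; the column names, check names and threshold values are hypothetical and do not describe CertiflAI's actual export format.

```python
import csv
import io

# Hypothetical column layout; the real export format is not described here.
FIELDS = ["frame_id", "check", "score", "threshold", "status"]


def write_audit_csv(results, thresholds, stream):
    """Write one row per (frame, check) with the threshold that was applied.

    results is an iterable of (frame_id, check_name, score) tuples and
    thresholds maps each check name to its decision threshold.
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for frame_id, check, score in results:
        limit = thresholds[check]
        writer.writerow({
            "frame_id": frame_id,
            "check": check,
            "score": f"{score:.3f}",
            "threshold": f"{limit:.3f}",
            "status": "flagged" if score >= limit else "ok",
        })


thresholds = {"clouds": 0.30, "blur": 0.50}          # illustrative values
results = [("F0001", "clouds", 0.12), ("F0001", "blur", 0.71), ("F0002", "clouds", 0.45)]

buffer = io.StringIO()
write_audit_csv(results, thresholds, buffer)
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(rows[1])
```

Storing the threshold next to each decision is what lets a third party re-check a result later without knowing how the run was configured.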

The deployment problem

A model that works in the lab is not the same as a tool an agency will deploy. Geospatial data carries confidentiality constraints that rule out cloud SaaS for most national mapping bodies. CertiflAI is therefore packaged as an on-premises Windows application that runs on a CUDA-enabled GPU and needs no internet connectivity during operation.

The broader lesson

The instinct in machine learning is to fix data problems with more data. Aerial QC is a case where that instinct fails: the absence of defects is a property of the data itself, not a flaw to be patched. When the imbalance becomes that extreme, the simulator becomes the dataset – a pattern that already extends from raw aerial frames to orthophotos, and that travels naturally to oblique imagery and satellite scenes.

For more information on high-quality data collection and processing, contact Jakub Krawczyk, Head of R&D and Remote Sensing at OPEGIEKA.

Find more: www.opegieka.pl

OPEGIEKA, part of the Dephos Group consortium, is confirmed as an exhibitor at DroneShow, MundoGEO Connect, SpaceBR Show, and Expo eVTOL 2026, which will be held from June 16th to 18th at Expo Center Norte – Blue Pavilion, in São Paulo (SP).
