The problem
A client operating thousands of distributed industrial gateways had no unified way to monitor device health, push firmware updates, or react to field failures. Each site was a black box.
What we built
We engineered an end-to-end telemetry and management platform: hardened Embedded Linux images on the gateways, a store-and-forward MQTT pipeline resilient to flaky connectivity, and a cloud backend on AWS IoT for ingestion, dashboards, and over-the-air updates.
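The store-and-forward idea behind the MQTT pipeline above can be illustrated with a small gateway-side outbox. This is a minimal sketch under stated assumptions, not the platform's actual code. It assumes a SQLite file as the durable local store. `StoreAndForwardBuffer` and the `publish` callback are hypothetical names. The callback stands in for whatever MQTT client call returns once the broker has acknowledged a message.

```python
import json
import sqlite3
from typing import Callable


class StoreAndForwardBuffer:
    """Hypothetical durable FIFO outbox for telemetry on a gateway.

    Readings are always written locally first; flush() forwards them
    oldest-first and deletes each row only after publish reports success.
    """

    def __init__(self, db_path: str, publish: Callable[[str, bytes], bool]) -> None:
        # Assumption: publish(topic, payload) returns True once the broker
        # has acknowledged the message, and False if the link is down.
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "topic TEXT NOT NULL, payload BLOB NOT NULL)"
        )
        self._db.commit()
        self._publish = publish

    def enqueue(self, topic: str, reading: dict) -> None:
        # Persist before any network attempt so a reading survives outages and reboots.
        payload = json.dumps(reading).encode("utf-8")
        self._db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)", (topic, payload))
        self._db.commit()

    def pending(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, batch: int = 100) -> int:
        """Forward buffered messages in order; return how many were delivered."""
        sent = 0
        while True:
            rows = self._db.execute(
                "SELECT id, topic, payload FROM outbox ORDER BY id LIMIT ?", (batch,)
            ).fetchall()
            if not rows:
                return sent
            for row_id, topic, payload in rows:
                if not self._publish(topic, payload):
                    # Link is down: stop here and keep the rest for the next attempt.
                    return sent
                # Delete only after a successful publish.
                self._db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
                self._db.commit()
                sent += 1
```

Because a row is deleted only after the broker acknowledges it, a crash between the acknowledgement and the delete re-sends that message on restart. That makes the sketch at-least-once delivery, similar to MQTT QoS 1, so the backend should tolerate duplicates. The outbox is unbounded here. A real gateway would also cap its disk usage.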
Key decisions & tradeoffs
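We chose store-and-forward buffering at the edge over a purely streaming model. Telemetry recorded during connectivity gaps is held on the gateway and delivered once the link returns, rather than dropped, which is critical for industrial compliance. OTA updates use an A/B partition scheme with automatic rollback. Each update is written to the inactive partition. If the new image does not boot successfully, the gateway falls back to the previous known-good partition, so a bad update does not leave a field device unbootable.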
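The A/B-with-rollback decision can be modeled as a small boot-control state machine. This is a hedged sketch, not the platform's bootloader. It assumes a bounded number of trial boots for a new image. It also assumes userspace explicitly confirms the image once its health checks pass. `ABBootState`, `install_update`, `select_boot_slot`, and `confirm_healthy` are hypothetical names. In practice this state lives somewhere the bootloader can read and update atomically.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ABBootState:
    """Hypothetical persisted boot-control state for a two-slot (A/B) device."""
    active: str = "A"                # slot currently trusted as known-good
    pending: Optional[str] = None    # slot holding a new, unconfirmed image
    attempts_left: int = 0           # trial boots remaining before rollback
    images: Dict[str, str] = field(default_factory=lambda: {"A": "v1", "B": ""})


def other(slot: str) -> str:
    return "B" if slot == "A" else "A"


def install_update(state: ABBootState, version: str, max_attempts: int = 3) -> None:
    # Write the new image to the inactive slot; the running slot is never modified.
    target = other(state.active)
    state.images[target] = version
    state.pending = target
    state.attempts_left = max_attempts


def select_boot_slot(state: ABBootState) -> str:
    # Decision the bootloader makes on every boot.
    if state.pending is not None:
        if state.attempts_left > 0:
            state.attempts_left -= 1
            return state.pending      # trial boot of the new image
        state.pending = None          # trials exhausted: automatic rollback
    return state.active


def confirm_healthy(state: ABBootState, booted: str) -> None:
    # Called from userspace after health checks pass; commits the new slot.
    if booted == state.pending:
        state.active = booted
        state.pending = None
        state.attempts_left = 0
```

The key property is that the known-good slot stays untouched until the new image confirms itself. An image that hangs or crashes before calling `confirm_healthy` simply uses up its trial boots, and the device returns to the old slot with no remote intervention.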
Outcome
The platform now manages 12,000 devices at 99.95% fleet uptime, with sub-200 ms edge-to-cloud telemetry latency and zero-downtime firmware rollouts.