Why 60% of Your Operational Data Never Reaches Your AI and Analytics Platforms (And How to Fix It)
The invisible devices costing you insights, efficiency, and competitive advantage
If you've invested in AI-powered operations, digital twins, Databricks, or Palantir, you're expecting sharper decisions, predictive insights, and real-time control of your assets.
But there's a problem you might not fully realize yet: your platform is probably only seeing about 40% of your operational devices.
That's not a typo. Across Europe and worldwide, close to 60% of enterprise-class sensors, meters, and controllers remain effectively invisible to modern data platforms. Globally, we're talking about roughly 4.4-4.8 billion devices. In Europe alone, that's about 1.3 billion devices whose data streams never reach the analytics and AI systems that need them most.
The Devices You're Missing Are the Ones You Need Most
This isn't about a lack of devices. The sensors are already there—billions of them, installed across energy networks, factories, refineries, buildings, and infrastructure. The challenge is that most of these devices simply can't speak to modern platforms in a language they understand.
And here's what makes this particularly frustrating: the invisible devices aren't peripheral equipment. They're often exactly what your modern platforms need to deliver real value.
In Europe, the "invisible" estate includes:
Smart buildings (≈480 million devices, 37%) — HVAC systems, air quality sensors, lighting controls, elevators, and life-safety systems. Much of this communicates over BACnet, KNX, or serial loops. Data often stays trapped in building control systems or arrives upstream without standard names, units, or reliable timestamps.
Utilities and grids (≈430 million devices, 33%) — Over 200 million water meters, around 90 million electricity meters, more than 80 million gas meters, plus grid sensors at substations and feeders. Many are read intermittently or trapped in legacy head-ends. Even where smart meter penetration is high, enterprise-grade access beyond utility systems varies widely.
Manufacturing and industrial facilities (≈260 million devices, 20%) — Diverse machines and test equipment speaking RS-485, Modbus, and vendor-specific dialects. These produce the signals you need for safety, efficiency, and compliance. Even when readings are collected locally, they're rarely normalized across an entire estate.
Transport infrastructure (≈65 million devices, 5%) — Rail, ports, airports, and logistics facilities mixing long-lived serial links with newer IoT. Ownership is fragmented, and data surfaces inconsistently.
Cities and other infrastructure (≈65 million devices, 5%) — Street lighting, environmental monitoring, and water treatment facilities, often using LPWAN or proprietary systems with limited context.
These are precisely the data streams that digital twins, AI models, and enterprise analytics require. Without them, insights are partial, predictions are unreliable, and optimization is limited.
Your Databricks models can't predict equipment failures if they can't see the vibration sensors. Your Palantir workflows can't optimize energy use if they can't access submeter data. Your Autodesk or Bentley digital twin can't reflect reality if half the building systems aren't connected.
Why Modern Platforms Need Better Data (And Why It Matters More Now)
Over the past year or two, three major shifts have changed what's possible in enterprise operations:
Agentic AI systems can now reason about goals and execute multi-step workflows autonomously. Think Microsoft Copilot or Salesforce Einstein Copilot coordinating complex operational tasks.
Industrial SaaS platforms have embedded sophisticated automation and analytics directly into operational workflows—from customer operations to facilities management.
Digital twins create live, structured replicas of your assets and systems for optimization, planning, and prediction. Tools like Azure Digital Twins, Siemens Teamcenter, Bentley iTwin, Autodesk Tandem, and Dassault 3DEXPERIENCE are moving from pilot projects to production.
The promise? These systems work together to transform how you operate. The catch? They need timely, secure, well-labeled operational data from across your entire physical estate. Tools that once coped with patchy or delayed inputs now expect complete, quality data if they're to deliver reliable results.
And that's where most organizations hit a wall.
Why Your OT Data Is Fragmented (And Why That Won't Change)
Unlike the IT world, which converged around internet standards like TCP/IP and HTTP, operational technology evolved differently. Different industries developed their own specialized protocols over decades:
Factories speak Modbus, Profinet, and EtherNet/IP. Buildings use BACnet, KNX, and LON. Utilities rely on DLMS/COSEM, DNP3, and IEC 61850. Legacy systems still run on RS-232 and RS-485 serial connections. Newer transports include OPC UA, LoRaWAN, NB-IoT, and LTE-M.
Each protocol solved real-world constraints—deterministic timing, long cable distances, power budgets, harsh environments. But none of them unified the landscape.
Many of today's installed enterprise devices were designed without IP networking in mind. Their communications were tailored for very specific industrial environments. M-Bus and wM-Bus (common for European meters) were designed for low-power radios and wired loops, not TCP/IP networks. BACnet MS/TP in buildings often runs over RS-485 cabling—efficient for HVAC systems, but not IP-based. DLMS/COSEM in electricity and water meters encodes data in compact binary frames that mainstream IT systems can't interpret without translation.
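To make "compact binary frames" concrete, here is a minimal Python sketch of what decoding one involves. The frame layout below is purely illustrative, not an actual M-Bus or DLMS/COSEM encoding; real protocols define far richer structures, but the pattern is the same: verify integrity, unpack fixed binary fields, and scale raw values into engineering units.

```python
import struct

def decode_frame(frame: bytes) -> dict:
    """Decode a hypothetical 8-byte meter frame:
    2 bytes device id, 4 bytes reading (unsigned, in units of 0.01 kWh),
    1 byte status, 1 byte checksum (XOR of the preceding bytes)."""
    if len(frame) != 8:
        raise ValueError("unexpected frame length")
    checksum = 0
    for b in frame[:-1]:
        checksum ^= b
    if checksum != frame[-1]:
        raise ValueError("checksum mismatch")
    device_id, raw_reading, status = struct.unpack(">HIB", frame[:-1])
    return {
        "device_id": device_id,
        "energy_kwh": raw_reading / 100.0,  # scale to engineering units
        "status_ok": status == 0,
    }

# Build a sample frame for device 42 reporting 1234.56 kWh
payload = struct.pack(">HIB", 42, 123456, 0)
check = 0
for b in payload:
    check ^= b
frame = payload + bytes([check])
print(decode_frame(frame))  # → {'device_id': 42, 'energy_kwh': 1234.56, 'status_ok': True}
```

The point is that none of the meaning (field names, units, scaling) lives in the bytes themselves; it lives in the protocol knowledge applied during decoding.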
Two realities keep this fragmentation in place:
Long-lived assets. Industrial equipment is designed to last decades. A heat meter installed in 2005 may still be in service in 2035. You can't just swap out a production line or building management system because you want a different communication protocol. Replacing devices purely to change protocol is commercially and operationally unrealistic.
Brownfield predominance. Most modernization happens in existing facilities with established equipment and wiring. New systems must integrate with what's already there.
Walk through a typical Rolls-Royce aero-engine facility or Stellantis assembly plant, and you'll find CNC machines on Modbus or Profinet, robot cells on EtherNet/IP, environmental controls on BACnet, energy meters on DLMS/COSEM, and serial loops connecting instruments and drives. One building, dozens of protocols. That's not going away.
What About MQTT and Other Common Solutions?
Many organizations have turned to MQTT (originally MQ Telemetry Transport) as a bridge between OT and IT. It's lightweight, cloud-ready, and widely supported by AWS, Azure, and Google Cloud. For new, IP-enabled devices, MQTT works beautifully.
But here's what MQTT doesn't do:
It doesn't translate meaning. If one device publishes "temperature=20" and another sends "temp=68," MQTT will deliver both messages faithfully—but it won't tell you one is Celsius and the other is Fahrenheit. MQTT moves messages but doesn't define meaning.
It requires IP connectivity. MQTT relies on TCP/IP. Billions of devices on RS-485, M-Bus, DLMS/COSEM, BACnet MS/TP, or proprietary radio networks can't speak MQTT natively.
It doesn't provide governance. Enterprise requirements like consistent timestamps, quality flags, audit trails, and data validation aren't part of the MQTT specification.
Security is left to the endpoints. MQTT doesn't enforce Zero Trust; many legacy devices can't manage modern certificates or strong authentication.
MQTT is valuable for what it does. But it doesn't bring the legacy majority with it, and it doesn't make data meaningful or trustworthy by itself.
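A tiny sketch makes the semantic gap visible: a broker delivers both payloads below faithfully, but only explicit per-device metadata (here a hypothetical registry, which is not something MQTT itself provides) turns them into comparable readings.

```python
# Hypothetical per-device metadata: MQTT itself carries none of this.
DEVICE_REGISTRY = {
    "plant/zone1/sensor-a": {"field": "temperature", "unit": "C"},
    "plant/zone2/sensor-b": {"field": "temp", "unit": "F"},
}

def normalize(topic: str, payload: str) -> dict:
    """Relabel a raw key=value payload into a canonical Celsius reading."""
    meta = DEVICE_REGISTRY[topic]
    key, value = payload.split("=")
    if key != meta["field"]:
        raise ValueError(f"unexpected field {key!r} on {topic}")
    reading = float(value)
    if meta["unit"] == "F":
        reading = (reading - 32) * 5 / 9  # convert Fahrenheit to Celsius
    return {"topic": topic, "temperature_c": round(reading, 2), "unit": "C"}

print(normalize("plant/zone1/sensor-a", "temperature=20"))
print(normalize("plant/zone2/sensor-b", "temp=68"))
# Both resolve to 20.0 °C — identical conditions that raw MQTT would report
# as two unrelated numbers.
```

Maintaining that registry, at estate scale and under governance, is exactly the layer MQTT leaves to someone else.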
The Patchwork of Partial Solutions
Most enterprises operate a mosaic of approaches alongside MQTT:
Industrial historians like OSIsoft PI (now part of AVEVA) excel at collecting high-frequency equipment data at site level. Proven and effective for trending and near-plant analytics, but they're often proprietary and costly to extend across heterogeneous estates or share in cloud-native ways.
Protocol gateways and middleware translate between specific protocols (Modbus to BACnet, serial to IP), solving individual gaps quickly but creating a fragmented landscape of boxes and configurations to maintain at scale, with limited handling of semantics, security, and governance.
Cloud IoT hubs like AWS IoT Core and Azure IoT Hub ingest IP-device telemetry at internet scale and feed cloud analytics and AI services. They assume MQTT or HTTPS at the edge, so legacy estates still need upstream conversion and enrichment to make data usable.
Apache Kafka and event backbones are the standard IT event fabric for high-throughput, durable distribution. Kafka is not an OT edge connector. Translation, labeling, security, and governance must occur before data reach Kafka; otherwise raw inconsistencies are simply moved more efficiently.
Each solution works within its lane. None creates a unified, secure, governed fabric across your full, mixed-protocol OT estate.
In practice, this means in a large automotive plant with 8,000 smart devices, maybe 10-20% speak MQTT out of the box, another 30% can be pushed through gateways, and 50-60% remain outside without proper translation, labeling, and governance. Many of those invisible devices produce the core operational context your AI and digital twins need—energy data from DLMS/COSEM meters, environmental compliance data from BACnet systems, critical process signals from serial instruments.
Why This Problem Is Getting Worse, Not Better
You might think that as more IP-ready devices come to market, the problem would gradually solve itself. Unfortunately, it's not that simple.
The industrial world adds roughly 500-700 million enterprise-class devices every year. While a growing share offers IP options, most deployments still happen in brownfield environments that must remain compatible with existing wiring and protocols. The percentage of IP-ready endpoints edges up slowly, but the absolute number of "unsupported" devices continues to rise.
Meanwhile, demand is accelerating. Agentic AI, industrial SaaS, and digital twins dramatically increase the appetite for data. They need more granular, more diverse telemetry—energy patterns, environmental conditions, equipment utilization, quality metrics—precisely the signals often trapped in DLMS/COSEM, BACnet, and various serial protocols today.
It's a structural issue, not a temporary blip.
Introducing Altior: Middleware Purpose-Built for OT Data
This is why we built Altior. It's middleware that sits one step earlier in the data chain than most platforms—close to your devices—and transforms raw operational signals into clean, secure, enterprise-ready data.
Altior works in two complementary modes: curate or route. In both modes, our Aegis security framework provides Zero Trust protection and comprehensive audit trails.
Three Ways Altior Adds Value
1. Make invisible devices visible
Altior connects directly to hard-to-reach devices—DLMS/COSEM meters, M-Bus networks, BACnet systems, RS-485 serial loops, LoRaWAN sensors, NB-IoT devices—and turns their raw binary data into well-labeled, governed events with proper units, timestamps, identity, and quality flags.
A water meter using M-Bus, a gas submeter on RS-485, or a manufacturing line sensor on Modbus can all connect into Altior. We understand their specialist "languages" (protocols), decode the raw binary signals they produce, and turn them into labeled, structured readings.
Your data layer modernizes without ripping out and replacing operational assets. The devices your platform couldn't see suddenly become accessible.
2. Clean up or securely route what already flows
Where data streams already exist (from MQTT brokers, Kafka topics, cloud IoT hubs, or historians), Altior can either standardize, validate, and relabel those feeds against a clear schema, or act as a secure router when devices are already well-modeled upstream.
Every reading is validated: Does it make sense? Is it on time? Has it been duplicated? A quality flag is attached to each piece of data, marking its reliability and creating an audit trail.
Either way, Aegis security and audit apply end-to-end. This matters particularly in regulated industries where you need to prove data provenance and integrity—not just collect numbers.
3. Define device-level business logic at the edge
Altior can run lightweight, per-device logic close to the source to shape events before distribution—derived metrics, alerting rules, time alignment, local aggregation, metadata enrichment. All logic is schema-aware, versioned, and auditable, so outputs remain consistent and traceable across sites and partners.
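As an illustration of what per-device edge logic can look like (the field names, thresholds, and function shape below are invented for the example, not Altior's actual API), metadata enrichment, a derived metric, and a simple alert rule might be applied to each event before it is distributed:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    device_id: str
    flow_m3h: float      # raw flow reading
    pressure_bar: float  # raw pressure reading
    tags: dict = field(default_factory=dict)

def enrich(event: Event, site: str) -> Event:
    """Edge logic: add metadata, a derived metric, and an alert flag."""
    event.tags["site"] = site                       # metadata enrichment
    event.tags["specific_energy"] = round(
        event.pressure_bar / max(event.flow_m3h, 1e-9), 3
    )                                               # derived metric
    event.tags["alert"] = event.pressure_bar > 8.0  # simple alerting rule
    return event

e = enrich(Event("pump-7", flow_m3h=12.5, pressure_bar=9.2), site="berlin-01")
print(e.tags)  # alert fires because pressure exceeds the 8.0 bar threshold
```

Because such logic is versioned and schema-aware, the same rule produces the same tags at every site, which is what keeps downstream analytics comparable.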
How Altior Fits With Your Existing Stack
Altior is designed to be non-intrusive. It can read existing data streams, clean and label them, then publish a curated copy—or securely route them unchanged when they already meet your requirements. Your current producers and consumers continue unchanged; downstream teams can adopt curated streams when they're ready.
If you use MQTT (HiveMQ, EMQX, Mosquitto), Altior can subscribe to raw topics, apply curation, and republish to clean topics—or sit in-path to validate before forwarding.
If you use Kafka (including Confluent Platform), Altior reads from raw topics, applies schema and quality checks, and writes to curated topics. Schema registry integration means all teams see the same data shape.
If you use cloud IoT hubs (AWS IoT Core, Azure IoT Hub), Altior can clean data either upstream at the edge before it reaches the cloud, or downstream by normalizing hub messages before fanning them out to data lakes, MQTT, Kafka, and APIs.
If you rely on historians (PI, AVEVA, Aspen), Altior can read via APIs or exports, standardize names and units, add proper identity and timestamps, and publish governed feeds to your enterprise backbone.
If you use Databricks, you receive structured, validated data streams ready for analytics and AI—no more wrestling with inconsistent formats or missing timestamps.
If you use Palantir, you gain secure, governed data with full provenance, strengthening confidence in the decisions and workflows built on top.
If you use Autodesk or Bentley digital twins, you receive accurate live feeds from building systems and infrastructure, ensuring your models truly reflect reality rather than an idealized version.
You can deploy Altior at the edge on gateways or site servers for low-latency needs and data residency, on-premises in your data center, in your private cloud, or as a managed service in public cloud. Hybrid deployments are common—edge nodes handle translation and validation locally, then send curated streams to your central systems. Aegis maintains consistent identity, policy, and audit across the split.
How It Actually Works
Let's walk through what happens when a device sends data through Altior:
First, Aegis secures the path. Before accepting a single reading, Aegis establishes trust between all parties—devices, site nodes, Altior services, and downstream platforms. Every device, gateway, and server must mutually authenticate. All traffic is encrypted end-to-end, from the meter or sensor to your enterprise platform. Every action is logged in a tamper-evident audit trail, ensuring full traceability for regulators. Tenant separation and strict access policies protect shared infrastructure.
Next, Altior connects to authorized devices over M-Bus, RS-485 (via small gateways), LoRaWAN, NB-IoT, LTE-M, or Ethernet/Wi-Fi for IP-ready equipment. Connection parameters are governed and auditable.
Then it reads and decodes the binary data. Many devices send compact binary frames—efficient for machines, opaque for humans. Altior decodes these according to each protocol's rules (DLMS/COSEM OBIS codes for meters, BACnet objects for building systems, Modbus registers for instruments), producing structured readings with proper field names and values.
At onboarding, you define a schema—the "data contract" for each device type and instance. This specifies names and units, valid ranges, identity and location, timestamp and quality rules, and optional transforms. Templates assist with common protocols. Schemas are human-readable and versioned.
Every message is validated before it flows anywhere. Checks cover completeness, plausibility, timeliness, and duplication. A quality flag is added. Your policies determine whether to pass with a flag, quarantine, or buffer and retry.
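A compressed sketch of what such a data contract and validation pass might look like (the schema fields and flag names are illustrative, not Altior's actual format; a completeness check is omitted for brevity):

```python
import time

SCHEMA = {  # hypothetical "data contract" for one device type
    "field": "energy_kwh",
    "min": 0.0,
    "max": 100000.0,
    "max_age_s": 300,  # readings older than 5 minutes count as late
}

_seen = set()  # message ids already accepted, for duplicate detection

def validate(msg: dict, now: float, schema: dict = SCHEMA) -> str:
    """Return a quality flag: 'good', 'implausible', 'late', or 'duplicate'."""
    if msg["id"] in _seen:
        return "duplicate"
    _seen.add(msg["id"])
    if not (schema["min"] <= msg[schema["field"]] <= schema["max"]):
        return "implausible"
    if now - msg["ts"] > schema["max_age_s"]:
        return "late"
    return "good"

now = time.time()
print(validate({"id": 1, "energy_kwh": 512.3, "ts": now}, now))       # good
print(validate({"id": 1, "energy_kwh": 512.3, "ts": now}, now))       # duplicate
print(validate({"id": 2, "energy_kwh": -5.0, "ts": now}, now))        # implausible
print(validate({"id": 3, "energy_kwh": 10.0, "ts": now - 600}, now))  # late
```

The flag travels with the reading rather than silently dropping it, so downstream policy (pass, quarantine, retry) stays a separate, auditable decision.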
Finally, events are enriched, standardized, and routed one-to-many to MQTT, Kafka, and APIs. Where curation isn't necessary, Altior securely routes unchanged streams, applying Aegis controls and audit only.
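The one-to-many distribution step can be pictured as a routing table mapping each event type to several sinks. The sink interface here is a stand-in for illustration; real targets would be MQTT publishers, Kafka producers, or HTTP calls.

```python
from typing import Callable

# Stand-in sinks; in practice these would publish to MQTT, Kafka, or an API.
delivered: dict[str, list] = {"mqtt": [], "kafka": [], "api": []}

SINKS: dict[str, Callable[[dict], None]] = {
    "mqtt": delivered["mqtt"].append,
    "kafka": delivered["kafka"].append,
    "api": delivered["api"].append,
}

ROUTES = {  # which event types fan out to which sinks
    "meter_reading": ["kafka", "api"],
    "alert": ["mqtt", "kafka", "api"],
}

def route(event: dict) -> None:
    """Deliver one event to every sink registered for its type."""
    for sink_name in ROUTES.get(event["type"], []):
        SINKS[sink_name](event)

route({"type": "meter_reading", "value": 42.0})
route({"type": "alert", "detail": "pressure high"})
print({k: len(v) for k, v in delivered.items()})  # → {'mqtt': 1, 'kafka': 2, 'api': 2}
```

Keeping the routing declarative is what lets a single curated stream feed several consumers without the producer knowing about any of them.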
Once cleaned and verified, data flow into Databricks for analysis, into Palantir for orchestration, or into Autodesk and Bentley for digital twins. Your destination platforms don't need to change. Altior simply feeds them with data they would otherwise never see.
Why It Scales and Stays Reliable
Altior was built from the ground up for portfolio-scale operations:
Isolated micro-processes per device contain faults and prevent cascading failures. Built-in buffering and flow control prevent data loss during network bursts or temporary outages. Horizontal clustering lets you add nodes for near-linear capacity gains. Safe change management comes from health checks, metrics, rolling upgrades, and dual publishing during schema changes. Security keeps pace through Aegis: tenant isolation, certificate lifecycle management, least-privilege policies, and forensic audit at scale.
A Simple Question That Reveals the Scale
Organizations often discover the scope of this problem with a straightforward question:
"Are there any smart devices in your operational estate that are not currently ingested into your chosen enterprise platform—whether for cost, security, or protocol reasons?"
In almost every large enterprise, the answer is yes. And it's usually not a handful of devices, but thousands—often the meters, submeters, and manufacturing sensors that record energy, safety, and compliance signals.
This is why we can arrange proofs of concept at short notice. A pilot typically starts with a single site, connecting a small group of previously invisible devices, then showing their data appearing cleanly and securely in Databricks, Palantir, or your digital twin platform. From there, it's straightforward to extend the pattern across your entire estate.
The Bottom Line
Modern platforms—from agentic AI and industrial SaaS to Databricks, Palantir, Autodesk, and Bentley—promise a step change in enterprise intelligence. But their value depends entirely on the data they're fed.
Today, close to 60% of enterprise devices remain invisible, including many of the most critical: meters, submeters, building systems, and manufacturing line sensors. This estate isn't shrinking—it's growing, with 500-700 million devices added globally each year.
Altior was created to close this structural gap by bringing previously invisible devices into view, translating legacy and mixed-network signals into governed, labeled events—while also cleaning up or securely routing your existing data flows.
Because Altior fits alongside your existing stack and offers flexible deployment, you can improve data quality without disruption and while preserving compliance. Aegis ensures Zero Trust security, governance, and audit are built into the path from device to destination.
Without this step, modern platforms will always be working from an incomplete picture, no matter how sophisticated their analytics or AI capabilities. With Altior in place, they gain the solid foundation they need to deliver their full promise.