Demandbase

Introducing Warp Speed Loading

How we cut prediction load time from 48 hours to 1

December 1, 2025 | 8 minute read


Joshua Cason
Staff Software Engineer, Demandbase

At Demandbase, we love building the invisible systems that make B2B marketing smarter—and faster. Our latest breakthrough, Warp Speed Loading (WSL), is one of those behind-the-scenes upgrades that changes everything.

WSL slashes data load times from nearly two days to about an hour. That means our customers can react to opportunities and risks in real time instead of waiting for predictions to catch up. It’s not just a performance boost—it’s a major leap toward true predictive agility.

This project was also a masterclass in collaboration. Our Unified Data Platform team provided invaluable insights that made WSL possible, and our Developer Experiences team gave us the frameworks to move quickly and safely. Together, we turned a slow, shared data system into one of the fastest, most reliable pipelines at Demandbase.

Here’s how we reimagined our architecture, tackled long-standing bottlenecks, and made our platform faster, cleaner, and more resilient than ever.

Why speed is everything

Our approach to predictive accuracy is a collaborative, two-way street. We are continually refining and improving our machine learning (ML) models to ensure they deliver the most precise and reliable predictions possible. However, the true power of these models is unlocked when customers actively align their setup and configuration with their specific business goals.

This collaboration—merging our enhanced models with the customer’s specific implementation—accelerates results and cultivates a dynamic environment where customers can drive improvements through experimentation. They can quickly test diverse strategies, evaluate the impact of faster predictions, and continuously refine their methods, establishing an ongoing cycle of optimization and elevated performance. Intuitively, few will have the patience to improve the model setup if it takes two days to observe the impact of changes.

Moreover, imagine giving your competitors an extra two days to plan and address an account’s recent massive change in pipeline likelihood. The profound business impact of timely predictions cannot be overstated. When predictive models deliver results swiftly, customers gain the crucial ability to adapt their strategies in real time. This immediate responsiveness allows them to pivot quickly, mitigate potential threats before they escalate, and seize opportunities that might otherwise be missed.

Enter Warp Speed Loading.

The architecture we started with

Our previous data loading system faced several significant challenges. The aged and complex orchestration stack required urgent replacement and simplification. Loading data into PostgreSQL was inefficient due to a single JSON column shared by multiple teams. Customer workloads with millions of accounts often took up to 24 hours to process.

Furthermore, change data capture (CDC) via Debezium introduced substantial delays, with updates processed only once daily, potentially adding another 24-hour lag. The ML-Scheduler, built on a Quartz framework, initiated the ML-Loader via a Pulsar queue, after which the PostgreSQL loading process would begin.

Another complication arose from handling soft-deleted model IDs and the varying number of models per customer. All scores and associated metadata were stored within the aforementioned shared JSON column. Direct transmission to CDC was infeasible: if our data was not yet present in PostgreSQL when another team wrote, but arrived later, other users of the shared JSON column could overwrite our results. Moreover, end-user clients knew only about the single JSON column, so we could not simply add a concurrency-safe alternative.
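To make the hazard concrete, here is a minimal, purely illustrative sketch of the lost-update pattern on a shared JSON column. Two writers each read the column, merge in their own keys, and write the whole blob back; the names (`fields`, `read_modify_write`, the specific keys) are hypothetical, not Demandbase's actual schema.

```python
import json

def read_modify_write(row: dict, team_updates: dict, snapshot: str) -> None:
    """Write back a full JSON blob built from a stale snapshot plus new keys."""
    merged = json.loads(snapshot)       # state as of the team's earlier read
    merged.update(team_updates)
    row["fields"] = json.dumps(merged)  # overwrites everything in the column

row = {"fields": json.dumps({"firmographics": "v1"})}

# Both teams snapshot the shared column at the same moment...
snapshot_a = row["fields"]
snapshot_b = row["fields"]

# ...then write back at different times.
read_modify_write(row, {"ml_scores": [0.91]}, snapshot_a)
read_modify_write(row, {"intent": "high"}, snapshot_b)  # clobbers ml_scores

final = json.loads(row["fields"])
print("ml_scores" in final)  # False: the ML write was silently lost
```

Whichever write lands last wins the whole column, which is why serving our scores from a physically separate column (as described next) removes the race entirely.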

The advancements detailed in the following section ultimately resolved these issues. By leveraging Demandbase’s evolving data platform, we’ve reduced a worst-case load time of 48 hours to approximately one hour.

The turning point: enabling developments

The fascinating thing is that WSL was made possible by only two changes.

First, our internal semantic layer was updated so that teams can point end-user clients at any column on a data object to resolve inner JSON fields. Imagine column A holds JSON keys B1, B2, …, BN. A new metadata field on each Bi specifies the column where it can be found, in this case A. Previously, resolution searched the available columns for the field name, and if no column matched, the field was assumed to be a JSON key in the default column (fields). This gave us a way to serve our data from a column independent of other teams’ writes. Nevertheless, supporting loads of only some columns of a row in the same physical table would not be simple in our Iceberg- and Starrocks-centric platform.
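The resolution rule above can be sketched in a few lines. This is an illustrative reconstruction, not the real semantic layer: `SCHEMA`, `PHYSICAL_COLUMNS`, `resolve_field`, and the field names are all hypothetical.

```python
# Per-field metadata: which physical column serves each inner JSON key.
SCHEMA = {
    "pipeline_score": {"column": "ml_predictions"},  # new: explicit column pointer
    "industry": {},                                  # no metadata -> old fallback
}

PHYSICAL_COLUMNS = {"account_name", "ml_predictions", "fields"}

def resolve_field(field: str) -> str:
    """Return the physical column that serves a requested field."""
    # New behavior: metadata can point the field at any column.
    meta = SCHEMA.get(field, {})
    if "column" in meta:
        return meta["column"]
    # Previous behavior: match a column by name, else assume the field
    # is a JSON key inside the default shared column ("fields").
    if field in PHYSICAL_COLUMNS:
        return field
    return "fields"

print(resolve_field("pipeline_score"))  # ml_predictions
print(resolve_field("account_name"))    # account_name
print(resolve_field("industry"))        # fields
```

The key design point is that the new metadata short-circuits the old search-then-fallback path, so a team can relocate its fields without any client changes.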

Fortunately, with our platform’s recent migration to Starrocks, our colleagues were able to support serving our data as part of a view of the original table. Starrocks provides scalable and efficient mechanisms to make this approach feasible. Thus, our loading process is now simply a Spark job overwriting Iceberg partitions and a notification to the downstream data platform when the partitions are ready to use. Load times are exceptionally fast, and there is no CDC to wait on!
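The shape of that load step can be sketched as follows, under simplifying assumptions: a dict stands in for an Iceberg table keyed by partition, and `notify_downstream` stands in for whatever signal the data platform consumes. In production this would be a Spark job (e.g. Spark 3's `df.writeTo(...).overwritePartitions()`); everything below is illustrative.

```python
from datetime import date

iceberg_table: dict = {}     # (customer_id, run_date) -> rows, one partition each
notifications: list = []     # messages sent to the downstream data platform

def overwrite_partition(customer_id: str, run_date: date, rows: list) -> None:
    """Replace one partition's rows wholesale, as a partition overwrite does."""
    iceberg_table[(customer_id, run_date)] = rows

def notify_downstream(customer_id: str, run_date: date) -> None:
    """Tell the data platform the partition is ready to serve."""
    notifications.append((customer_id, run_date))

rows = [{"account_id": "a1", "score": 0.87}]
overwrite_partition("cust-42", date(2025, 12, 1), rows)
notify_downstream("cust-42", date(2025, 12, 1))

# Re-running the load is idempotent: the partition is replaced, not appended to.
overwrite_partition("cust-42", date(2025, 12, 1), rows)
print(len(iceberg_table[("cust-42", date(2025, 12, 1))]))  # 1
```

Overwriting whole partitions rather than streaming row-level changes is what removes CDC from the critical path: downstream readers switch to the new partition only after the readiness notification arrives.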

Warp Speed Loading (WSL) architecture

Our process, though still initiated by the ML-Scheduler, has undergone a significant streamlining to enhance efficiency and reliability. The updated workflow is as follows:

Data ingestion and processing: Upstream data science outputs are directly channeled into tabular data files stored in S3. This acts as a centralized and accessible data lake.

Workflow orchestration: A Temporal Workflow is then triggered, orchestrating a series of EMR jobs. This ensures a robust and fault-tolerant execution of the data processing pipeline.

Data transformation with EMR: Two distinct EMR jobs are executed:

  • Job 1 (Run-date level scores): This job loads into a granular, run-date level score table, which serves as the definitive source of truth for our predictive data.
  • Job 2 (Account-level aggregation): This job takes the run-date level scores and aggregates them into semantically split columns at the account level. This eliminates the previous use of a single JSON blob, providing a cleaner and more structured data representation.

Data loading and integration: The processed data is subsequently loaded into Starrocks, our high-performance analytical database. Here, it is seamlessly joined into the existing Account view, providing a comprehensive and unified perspective.
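The two EMR jobs can be sketched with toy in-memory data. Job 1 lands run-date level scores (the source of truth); Job 2 rolls them up to one row per account with semantically split columns in place of a single JSON blob. The model names and column names here are hypothetical.

```python
raw_scores = [
    {"account_id": "a1", "run_date": "2025-12-01", "model": "pipeline", "score": 0.87},
    {"account_id": "a1", "run_date": "2025-12-01", "model": "churn",    "score": 0.12},
    {"account_id": "a2", "run_date": "2025-12-01", "model": "pipeline", "score": 0.45},
]

# Job 1: granular run-date level score table, keyed by (account, run_date, model).
run_date_table = {
    (r["account_id"], r["run_date"], r["model"]): r["score"] for r in raw_scores
}

# Job 2: aggregate to account level, one named column per model
# instead of one shared JSON blob.
account_table: dict = {}
for (account_id, _run_date, model), score in run_date_table.items():
    account_table.setdefault(account_id, {})[f"{model}_score"] = score

print(account_table["a1"])  # {'pipeline_score': 0.87, 'churn_score': 0.12}
```

Keeping the run-date table as the source of truth means the account-level table can always be rebuilt from it, so a bad aggregation run is recoverable without re-ingesting upstream outputs.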

This refined process offers several key benefits:

  • Enhanced speed: The streamlined flow and optimized EMR jobs contribute to significantly faster data processing and delivery.
  • Improved data cleanliness: The transition to semantically split columns and the elimination of the single JSON blob result in a much cleaner and more manageable data structure.
  • Prevention of race conditions: Our data is physically separate from other columns in the Account view and can be loaded at any time independent of when other columns arrive.

Beyond speed: resilience, savings, and control

  • Reducing operational risk through infrastructure independence: By moving away from shared infrastructure components like Quartz, Pulsar, and Kafka, we significantly reduce our dependency on external systems. This “shift left” approach empowers our team to control our operational environment more directly, leading to fewer points of failure and a more resilient system. This independence allows for greater agility in troubleshooting and resolving issues, as we are no longer beholden to the maintenance schedules or potential outages of shared services.
  • Minimizing stress on core databases and CDC: Decoupling from shared infrastructure also alleviates considerable pressure on our tenant database (Postgres) and Change Data Capture (CDC) processes. The elimination of billions of writes that previously flowed through these shared systems dramatically reduces the load, improving performance, stability, and scalability. This reduction in stress translates to a more robust and efficient data layer.
  • Accelerating development and deployment with isolated release cycles: A key benefit of this strategic shift is the establishment of independent and isolated release cycles. Our team can take full ownership of our deployment pipeline, rolling out fixes, updates, and new features at our own pace, without being constrained by the release schedules of other teams or shared release paths. This autonomy not only accelerates our ability to respond to market demands and address critical issues but also mitigates the risk of our changes impacting other services. Owning this responsibility and control empowers our team to deliver value more quickly and reliably.

With Warp Speed Loading, we’ve transformed our data processing from a 24–48 hour cycle to near real time. That’s more than a technical upgrade—it’s a strategic shift.

Faster predictions mean faster reactions, smarter experimentation, and sharper competitive advantage. For our customers, this means better decisions and bigger wins.

And for us? It’s another step in our mission to make data intelligence truly instant.

Want to see how Warp Speed Loading can power your pipeline intelligence?
