
Key Takeaways
Engineers at Demandbase migrated their Firmographics Data Service from Elasticsearch to CelerData to enable cross-database joins and unify their data architecture. The migration was driven by the need to perform high-performance filtering across two datasets: the global Demandbase database and customer-specific account data.
By consolidating storage engines, the team eliminated scalability bottlenecks associated with cross-database processing while maintaining latency parity for critical search and autocomplete features.
The Firmographics Data Service is the backbone of the Demandbase ecosystem. It manages the essential “firmographic” data—the organizational equivalent of demographics—including industry, revenue, and employment data. This service powers core features of the DB1 platform, such as company and contact pages and list-building tools, as well as external offerings like Demandbase Data Integrity and Public APIs. It is responsible for handling high-volume read operations, including searches, fetches, bucket aggregations, and fuzzy searches on company and contact data.
The legacy architecture relied on a split storage model:
*Figure: a user fetching firmographics data from Elasticsearch and account-person data from CelerData*
Technical debt accrued when product requirements shifted to demand complex filtering across both datasets simultaneously—specifically, performing right or left joins to extract “company-only” data or “known contacts” intersecting both global and customer lists.
The engineering team evaluated three distinct approaches to solve this. Ultimately, they selected CelerData, the enterprise platform built on the open-source StarRocks engine, as the unified datastore, effectively moving from a search engine (Elasticsearch) to a real-time analytics database.
*Figure: a user fetching both firmographics and account-person data from CelerData*
The team moved Firmographics search from Elasticsearch (built for search) to StarRocks (built for real-time analytics). That meant identifying the key feature differences and building practical equivalents so the user experience stayed the same—without keeping Elasticsearch in the loop.
| Area | Elasticsearch Capability | How the Team Addressed It in StarRocks |
|---|---|---|
| Text Processing (Analyzers vs. Ingest-Time Tokenization) | Built-in analyzers automatically handle stopwords, diacritics, stemming, and token filters at index/query time. | Text processing moved to the ingestion layer. Strings are pre-tokenized and normalized in application code using Lucene, and “search-ready” versions are stored in StarRocks. |
| Exact Match + Sorting (Multi-fields vs. Multiple Columns) | Supports multiple “views” of the same field (analyzed for search, keyword for exact match, normalized for sorting) within a single mapping. | Modeled explicitly using separate columns: raw/exact value, sort-normalized value, and tokenized/search value. |
| Complex Data Structures (Nested + Relationships) | Native support for nested objects and parent/child relationships within the index. | Implemented using SQL-native modeling with normalized tables and joins for strict correctness; JSON/arrays used where appropriate. |
| Ranking and Relevance | Built-in relevance scoring, boosting, and ranking via Query DSL. | Ranking made explicit and application-driven. Elasticsearch-style boosting recreated in SQL using weighted expressions combining ngram_search similarity scores with popularity and business signals for deterministic ranking. |
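The first row of the table (ingest-time text processing) can be sketched as follows. The production pipeline uses Lucene analyzers in application code; this Python sketch approximates the same normalize-and-tokenize steps with the standard library, and the column names and stopword list are illustrative assumptions.

```python
import re
import unicodedata

# Illustrative stopword list; the real pipeline relies on Lucene analyzers.
STOPWORDS = {"inc", "corp", "llc", "ltd", "the"}

def normalize(text: str) -> str:
    """Lowercase, strip diacritics, collapse whitespace (sort-ready form)."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", text.lower()).strip()

def tokenize(text: str) -> list[str]:
    """Split on non-alphanumerics and drop stopwords (search-ready form)."""
    return [t for t in re.split(r"[^a-z0-9]+", normalize(text))
            if t and t not in STOPWORDS]

# One source string becomes three explicit columns in StarRocks.
record = {"company_name": "Café Motors, Inc."}
row = {
    "name_raw": record["company_name"],                         # exact match
    "name_sort": normalize(record["company_name"]),             # sorting
    "name_search": " ".join(tokenize(record["company_name"])),  # tokenized search
}
print(row["name_search"])  # → cafe motors
```

Storing all three columns trades extra storage for predictable query behavior: exact match, sorting, and token search each hit a column built for that purpose.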
Initial translations of Elasticsearch queries to CelerData caused latency regressions for complex search endpoints. These queries attempted to fetch all fields, apply all filters, and perform joins in a single pass—an execution pattern better suited to Elasticsearch’s document-oriented access model than a columnar OLAP engine.
Why this failed in CelerData:
CelerData is columnar and scan-optimized. Wide row fetches, early joins, and unnecessary column access significantly increased scan cost and memory pressure.
Solution: Two-phase query execution

The first phase applies the selective filters and joins while projecting only row identifiers; the second phase fetches the full set of display columns for just the matching rows.

Why this worked:

The expensive filtering pass touches only a few narrow columns, and the wide column fetch touches only a few rows, which is exactly the access pattern a columnar engine is optimized for.

Result:

Performance Improvement within CelerData: latency reduced from ~30s (initial, unoptimized CelerData queries) to ~1.2s after optimizations (~98% reduction).
Additional optimizations on top of the two-phase pattern further reduced scan volume and improved response times without sacrificing correctness.
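A minimal sketch of the two-phase pattern, with SQLite standing in for CelerData (the schema, table names, and filters below are hypothetical, not Demandbase's actual data model): phase one runs the selective filters and join but projects only primary keys; phase two fetches the wide columns for those keys alone.

```python
import sqlite3

# SQLite stands in for CelerData; schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT,
                        industry TEXT, revenue INTEGER);
CREATE TABLE account_persons (company_id INTEGER, account_id INTEGER);
INSERT INTO companies VALUES (1, 'Acme', 'software', 10),
                             (2, 'Globex', 'retail', 20),
                             (3, 'Initech', 'software', 30);
INSERT INTO account_persons VALUES (1, 42), (2, 42), (3, 99);
""")

def two_phase_search(industry: str, account_id: int) -> list:
    # Phase 1: selective filters + join, projecting ONLY the primary key,
    # so the scan stays narrow.
    ids = [r[0] for r in conn.execute(
        """SELECT c.company_id FROM companies c
           JOIN account_persons ap ON ap.company_id = c.company_id
           WHERE c.industry = ? AND ap.account_id = ?""",
        (industry, account_id))]
    if not ids:
        return []
    # Phase 2: fetch the wide display columns for just the matched keys.
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT company_id, name, revenue FROM companies "
        f"WHERE company_id IN ({placeholders})", ids).fetchall()

print(two_phase_search("software", 42))  # → [(1, 'Acme', 10)]
```

The single-pass version would join and project every display column at once; splitting it keeps the expensive step cheap and the cheap step small.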
Autocomplete was the biggest feature gap in the migration. Elasticsearch supports autocomplete-style search naturally because it is designed for text lookup. StarRocks did not originally have the n-gram style search needed for “type-ahead” experiences.
To close this gap, the team partnered with CelerData and helped get a native ngram_search capability added, including case-insensitive support. However, edge n-gram style behavior (commonly used in Elasticsearch for prefix autocomplete) still wasn’t available in StarRocks.
To mimic the same user experience, the team implemented a hybrid approach:
This combination allowed the team to preserve the existing autocomplete behavior while fully removing Elasticsearch from the serving path.
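The serving-side behavior can be illustrated with an application-level analogue. The real implementation runs inside StarRocks via `ngram_search`; the scoring function, n-gram size, and sample names below are assumptions made for the sketch. Prefix matches stand in for edge n-grams, and n-gram overlap supplies typo tolerance.

```python
def ngrams(s: str, n: int = 4) -> set[str]:
    """Character n-grams; short strings yield themselves as a single gram."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)} if len(s) >= n else {s}

def ngram_similarity(query: str, candidate: str) -> float:
    """Jaccard overlap of character n-grams, a rough stand-in for ngram_search."""
    q, c = ngrams(query), ngrams(candidate)
    return len(q & c) / len(q | c) if q | c else 0.0

def autocomplete(query: str, names: list[str], limit: int = 5) -> list[str]:
    """Hybrid ranking: exact-prefix matches first (mimicking edge n-grams),
    then fuzzy n-gram similarity for typo tolerance."""
    q = query.lower()
    scored = sorted(
        (not name.lower().startswith(q),      # False sorts before True
         -ngram_similarity(q, name.lower()),  # higher similarity first
         name)
        for name in names)
    return [name for _, _, name in scored[:limit]]

names = ["Demandbase", "Demand Metric", "Databricks", "DataDog"]
print(autocomplete("dem", names))  # prefix matches rank ahead of the rest
```

In SQL terms the same idea is a `LIKE 'prefix%'` predicate ranked ahead of an `ngram_search` similarity score, combined in one ordered result.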
For global search, joining across million-row tables proved expensive. Instead of large joins:
Performance Improvement within CelerData: latency reduced from ~6s (initial, unoptimized CelerData queries) to ~2.5s after optimizations (~60% reduction).
Several fields contained large arrays that were frequently queried using array_contains. In practice, this resulted in poor performance in StarRocks for high-cardinality arrays.
To resolve this:
Performance Improvement within CelerData: latency reduced from ~4s (initial, unoptimized CelerData queries) to ~1s after optimizations (~75% reduction).
While this required careful tokenization and boundary handling, the change resulted in dramatic query performance improvements, especially under high concurrency.
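One common reading of "tokenization and boundary handling" is the flatten-to-string pattern; the sketch below is an assumption about the general technique, not the team's exact schema. The array is joined into a single delimited string at ingest, and membership becomes a boundary-safe substring test (in SQL, roughly `packed_col LIKE '%|java|%'` instead of `array_contains`).

```python
DELIM = "|"

def pack_array(values: list[str]) -> str:
    """Flatten an array column into one delimited string at ingest time.
    Leading/trailing delimiters make every token boundary explicit."""
    return DELIM + DELIM.join(v.strip().lower() for v in values) + DELIM

def contains(packed: str, value: str) -> bool:
    """Boundary-safe membership test; the app-side analogue of
    packed_col LIKE '%|value|%' in SQL."""
    return (DELIM + value.strip().lower() + DELIM) in packed

tech = pack_array(["Java", "JavaScript", "Go"])  # "|java|javascript|go|"
print(contains(tech, "Java"))    # → True
print(contains(tech, "Script"))  # → False (no partial-token false positive)
```

Surrounding every token with delimiters is the boundary handling: without it, a search for "java" would also match "javascript".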
Aggregating list-type fields was another source of high latency.
The solution involved:
Performance Improvement within CelerData: latency reduced from ~40s (initial, unoptimized CelerData queries) to ~800ms after optimizations (~98% reduction).
This approach preserved correctness while keeping aggregation latency within acceptable bounds.
Direct infrastructure savings: Consolidating on CelerData eliminated the need for dedicated, self-managed Elasticsearch clusters on AWS, resulting in a significant reduction in infrastructure run-rate costs.
Operational bandwidth savings: Removing Elasticsearch reduced ongoing maintenance overhead (cluster operations, scaling, upgrades, on-call). This translated to ~8 weeks of one engineer’s time per year freed up from ES upkeep.
Workflow simplification savings: Decommissioning Elasticsearch also removed the separate ES indexing workflow (pipelines, reindex jobs, backfills), reducing both compute spend and the engineering time spent monitoring and troubleshooting indexing health.
Monitoring and reliability overhead reduction: Fewer moving parts (no dual datastore + indexing lag to watch) lowered the cost and effort spent on alerting, dashboards, and incident response tied to index freshness and search cluster stability.
The migration successfully unified data sources and enabled complex joins without application-side overhead.
Full traffic moved from Elasticsearch to CelerData on 20 Nov at 20:30 IST.

Notice that the average latency remained unaffected even with the traffic switch.

The retrospective highlighted several key outcomes:
This migration demonstrates that modern OLAP systems can successfully replace traditional search engines—if teams are willing to rethink query patterns, data modeling, and feature implementation.
Rather than attempting a one-to-one replacement of Elasticsearch, Demandbase leaned into the strengths of a columnar analytics engine. By redesigning queries, rebuilding search capabilities, and collaborating closely with their vendor, the team achieved both architectural simplification and performance gains.
The result is a more scalable, maintainable, and cost-efficient system that not only meets existing requirements but creates a stronger foundation for future innovation.