Vinted Search Scaling Chapter 8: Goodbye Elasticsearch, Hello Vespa Search Engine

Ever wondered what goes into making the search experience seamless at Vinted? In this post, our Search Platform team take you through their latest improvement. Discover how the team’s hard work and expertise allowed us to transition from Elasticsearch to Vespa, a cutting-edge search engine that’s set to elevate how our members find their next second-hand gem.

The search for a new solution

According to commit 9963ab0c171, Vinted started using Elasticsearch for item search back in May 2015. Before that, we used the Sphinx search engine, but that’s ancient history now.

Suffice it to say, Elasticsearch served us well for years. But as Vinted grew, so did our data and the complexity of the queries. Eventually, we started to hit the limits of what Elasticsearch could handle, so we set out to find a new, long-term, and scalable solution.

After exploring our options, we settled on Vespa, an open-source, fully-featured search engine and vector database. Originally built at Yahoo!, Vespa was spun out as a separate company last year. It supports vector search, lexical search, and searching within structured data, all in the same query. Plus, integrated machine-learned model inference allows us to apply AI to make sense of our data in real time.

Vespa is a battle-tested technology that can handle billions of documents and thousands of queries per second, and it’s used by some of the biggest companies in the world. Because of this, we did not consider purely vector or hybrid-only search databases to be viable alternatives. Our research found Vespa to be the ideal solution for handling the scale and complexity of our data.

At the time of writing, we have about 1 billion active searchable items. Query traffic peaks at about 20,000 requests per second, served under 150 ms at the 99th percentile. Feeding (indexing, in Vespa terms) happens in real time at around 10,300 RPS across update and remove operations. The time it takes for a single item update to travel from Apache Flink to Vespa is 4.64 seconds at the 99th percentile. We run bare-metal servers in our own data centres.

Before migrating, we had 6 Elasticsearch clusters with 20 data nodes each, plus dozens of client nodes running on virtual machines. Each server had 128 cores, 512GB RAM, 0.5TB SSD RAID1 disks and a 10Gbps network. You can read more about that setup here.

Today, for our item search, we’ve moved to 1 Vespa deployment (cluster) with 60 content nodes, 3 config nodes and 12 container nodes. Each content node has 128 cores, 512GB RAM, 3TB NVMe RAID1 disks, and a 10Gbps network. We use a HAProxy load balancer to route the traffic to the stateless Vespa container nodes, which run on virtual machines.

So, with all the specs and stats laid out, let’s dive into the migration process.

The migration process

We began the migration in May 2023. By November of the same year, we’d completely switched all item search traffic to Vespa. Then, in April 2024, the migration was wrapped up when we switched the facet traffic.

TL;DR

The migration was a roaring success. We managed to cut the number of servers we use in half (down to 60). The consistency of search results has improved since we’re now using just one deployment (or cluster, in Vespa terms) to handle all traffic. Search latency has improved by 2.5x and indexing latency by 3x. The time it takes for a change to be visible in search has dropped from 300 seconds (Elasticsearch’s refresh interval) to just 5 seconds. Our search traffic is stable, the query load is deterministic, and we’re ready to scale even further.

Load is now evenly distributed across all nodes, meaning no more “hot nodes”. We’ve also increased our ranking depth by more than 3 times, up to 200,000 candidate items, which had a significant business impact by making our search results more relevant. Plus, we’ve saved ourselves some hassle, as we no longer need to continually fine-tune Elasticsearch shard counts and replica ratios.

Here’s how we did it

We formed a Search Platform team by bringing together four Search Engineers, each with their own unique background and a shared expertise in search technologies.

The project was divided into five key areas: architecture, infrastructure, indexing, querying, and metrics/performance testing.

Along the way, we had to learn a great deal about Vespa, as it differs significantly from Elasticsearch. We needed to understand how to deploy it, feed data into it, query it, monitor it, and scale it in order to gain the tacit knowledge needed to run it in production.

Let’s dig in.

Architecture

In moving from Elasticsearch to Vespa, we faced the challenge of redesigning our search architecture to maximise performance and scalability.

Among the guiding principles we adhered to were Little’s Law and Amdahl’s Law: the former relates throughput, latency, and the concurrency a system must sustain, while the latter underscores the importance of optimising the parts of the system that most impact overall performance. Vespa’s architecture allows us to distribute content across multiple nodes and scale horizontally, which is crucial given our growing item dataset.
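As a rough back-of-the-envelope illustration of Little’s Law applied to the figures above (about 20,000 requests per second at up to 150 ms per request, taking the 99th-percentile latency as a conservative upper bound):

L = λ × W ≈ 20,000 req/s × 0.15 s ≈ 3,000 requests in flight

which gives a sense of the concurrency the deployment has to sustain at peak, and therefore how much headroom the nodes serving it need.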

The flexibility of Vespa’s architecture also enables us to effectively manage and balance loads across our nodes. Unlike Elasticsearch, which required careful tuning of the shard and replica configurations, Vespa’s content groups allow us to easily scale by adding more nodes or content groups without complex data reshuffling. This scalability has been vital for ensuring consistent performance as our query volumes and data size continue to grow.

Infrastructure

Shifting from Elasticsearch to Vespa required a major transformation of our infrastructure deployment strategy. In our previous setup, we maintained multiple Elasticsearch clusters with a large number of data nodes and client nodes. While effective for a time, this approach became increasingly unwieldy as our data and traffic scaled. Maintenance operations were a constant burden.

Vespa’s deployment model required us to rethink our approach, particularly around the Vespa Application Package (VAP). A VAP encapsulates the entire application model in a single package, including schema definitions, ranking configurations, and content node specifications.

Our infrastructure has also evolved to meet Vespa’s high-performance demands. We’ve moved to a setup where each content node now has 3TB NVMe RAID1 disks, ensuring each node can handle a large volume of data and queries. Additionally, HAProxy has been implemented to manage load balancing, with plans to further optimise this with the Istio Envoy proxy. This will enhance our ability to handle complex routing and scaling scenarios, including automatically mirroring traffic to inactive deployments.

Indexing

Indexing in Vespa has been a game-changer compared to Elasticsearch. With Elasticsearch, managing shard and replica configurations – especially during re-indexing processes when fields changed – was time-consuming and error-prone. Vespa simplifies this with a distributed architecture that automatically manages data partitioning and replication across content nodes.

We adopted Vespa’s document schema in our Search Indexing Pipeline (SIP), built on top of Apache Flink. Vespa was integrated into our existing data pipeline using Vespa Kafka Connect, which we open-sourced because no Vespa sink was available at the time. The sink connector can work in two operational modes, which behave very differently and serve two distinct use cases; the mode is selected with the vespa.operational.mode configuration parameter.

Several months ago, we started writing directly from Apache Flink, which allows data to be fed into Vespa in real time. This integration ensures our search index is always up to date, with new items becoming searchable within seconds. In our tests, Vespa’s indexing handled up to 50k RPS of updates and removals per deployment, bringing the time for a single item to be indexed down to just 4.64 seconds at the 99th percentile after it’s uploaded to Vinted.
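For illustration, here is a minimal sketch of what a single real-time partial update looks like through Vespa’s Java feed client (ai.vespa.feed.client). The endpoint, namespace, document type, and field names below are hypothetical placeholders rather than our actual schema; our production pipeline performs the equivalent operations from the Flink sink.

```java
import ai.vespa.feed.client.DocumentId;
import ai.vespa.feed.client.FeedClient;
import ai.vespa.feed.client.FeedClientBuilder;
import ai.vespa.feed.client.OperationParameters;
import ai.vespa.feed.client.Result;

import java.net.URI;

public class ItemUpdateSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical container endpoint; in production this sits behind the load balancer.
        try (FeedClient client = FeedClientBuilder.create(URI.create("http://vespa-container:8080/")).build()) {
            // Hypothetical namespace "items", document type "item", and item id.
            DocumentId id = DocumentId.of("items", "item", "123456789");

            // A partial update: only the changed fields are sent, so the item
            // becomes searchable with its new values within seconds.
            String update = """
                    {
                      "fields": {
                        "price": { "assign": 19.99 },
                        "status": { "assign": "reserved" }
                      }
                    }
                    """;

            Result result = client.update(id, update, OperationParameters.empty()).join();
            System.out.println("Update result: " + result.type());
        }
    }
}
```

The same client also exposes put and remove operations, which together cover the update and remove traffic described above.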

The ability to manage large-scale reindexing without downtime or performance degradation was another crucial factor in our migration. Vespa’s architecture allows for continuous indexing while serving queries, eliminating the need for complex shard rebalancing, alias switches, and recreating indices when fields change, which results in minimised operational burden.

In modern search systems, one of the crucial requirements is how quickly a change becomes visible. Indexing latency directly affects the lead time of feature development and the pace of search-performance experimentation.

Querying

Querying in Vespa presents a marked shift from our previous experience with Elasticsearch. Vespa supports both traditional lexical search and modern vector search, allowing us to combine these approaches in a single query for more relevant results. This flexibility is further enhanced by Vespa’s support for structured data queries, which allows us to filter and rank results based on complex criteria in real-time.
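To make this concrete, the sketch below shows the general shape of such a hybrid query expressed in YQL, wrapped in a Java string since our searchers build these programmatically. The document type, field names, target-hits value, and embedding input name are hypothetical; they only illustrate combining a lexical match, a vector nearest-neighbour match, and a structured filter in a single query.

```java
public class HybridQuerySketch {
    // A sketch of a hybrid item query in YQL, with hypothetical field names:
    // userQuery() carries the lexical part of the member's query text,
    // nearestNeighbor() adds the vector part (q_embedding is supplied as a
    // query tensor input), and brand_id is a structured filter.
    static final String YQL = """
            select * from item
            where (
                userQuery()
                or ({targetHits: 100} nearestNeighbor(title_embedding, q_embedding))
            )
            and brand_id = 12345
            """;

    public static void main(String[] args) {
        System.out.println(YQL);
    }
}
```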

Figure 1: Vinted’s triangle of search

One of the major innovations we have contributed to open source is Vespa’s ability to integrate with Lucene text analysis components – the same underlying technology used by Elasticsearch. By adopting Lucene components in upstream Vespa, we retained the language analysers and capabilities while benefiting from Vespa’s superior scalability and performance. This integration allowed us to migrate text analysis configurations from Elasticsearch while taking full advantage of Vespa’s advanced features.

We also developed custom searchers to implement the search query contract. A searcher is a component that extends the class com.yahoo.search.Searcher, gaining access to the Query and Execution. These custom searchers eventually construct YQL to call Vespa.
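As an illustration, a bare-bones searcher looks roughly like the sketch below. The request property name and the filter field are hypothetical placeholders; our real searchers translate the search contract into much richer YQL before handing the query down the chain.

```java
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

// A minimal custom searcher: it inspects the incoming Query, optionally
// narrows it, and passes it down the search chain via Execution.
public class ContractFilterSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Hypothetical request property set by the gateway (search contract).
        String catalogId = query.properties().getString("contract.catalogId");
        if (catalogId != null) {
            // Hypothetical field name; restrict results to one catalog.
            query.getModel().setFilter("+catalog_id:" + catalogId);
        }
        query.trace("ContractFilterSearcher applied", true, 3);
        return execution.search(query);
    }
}
```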

Product applications that use search communicate with Vespa via a Go middleware service. This service acts as a gateway, accepting search requests in a predefined flat contract that we call the search contract. The search contract was one of the key elements that allowed us to change the search engine seamlessly. Implementing item search in Vespa involved about 12 unique queries across various product channels.

Initially, incoming query traffic was served by Elasticsearch and shadowed to the Vespa item search. The Go service could amplify the traffic it mirrored to Vespa by fractional amounts, and we tested with up to 3x the production load. Once we were confident in Vespa’s performance for each unique query, we began A/B testing search relevance. Porting the main item search query took about four A/B test iterations before the relevance of the results was satisfactory.

Metrics and Performance Testing

Monitoring and performance testing are vital for ensuring our search platform’s reliability and efficiency. Vespa’s built-in Prometheus metrics system offers detailed insights into every aspect of the search process, from query latency to indexing throughput. This visibility allows us to continuously monitor performance and identify potential bottlenecks before they impact our users by testing infrastructural or query workload changes in inactive regions.

One of the key benefits of Vespa is its ability to dynamically re-distribute content across nodes when content groups are changed or added. This allows us to optimise resource allocation and maintain consistent performance, even as traffic patterns fluctuate. During the migration, we implemented a rigorous performance testing regimen to ensure that Vespa could meet our stringent requirements. This testing involved simulating peak traffic loads, stress testing the indexing pipeline, and validating the accuracy and relevance of search results under various conditions.

The metrics we gathered during these tests were instrumental in fine-tuning our Vespa deployment. We identified the optimal configuration, adjusted our ranking algorithms to improve result relevance, and ensured our search platform could scale efficiently as our data and query volumes continue to grow. This proactive approach to monitoring and performance testing has been key to the success of our migration, ensuring that we can deliver fast, reliable, and relevant search results to our users.

Vespa team

The Vespa team deserves high praise for their exceptional contributions and unwavering commitment to the open-source community. Their dedication to developing Vespa (with its origins in Norway) stands as a testament to their expertise and the collaborative spirit that drives innovation. The team’s pragmatic approach to problem-solving, combined with their genuine willingness to offer support – particularly through their active presence on Slack and GitHub – has made Vespa the most innovative and fully-featured search engine and vector database. The Vespa team’s contributions and their dedication to fostering a community built on collaboration and shared success are truly invaluable.

Future

Today, we have 21 unique Vespa deployments across a diverse range of use cases, from item search and image retrieval to search suggestions, all provided by the Search Platform team. As we continue to refine and expand Vespa’s capabilities, our focus remains on enhancing the performance, scalability, and flexibility of our deployments to meet the evolving needs of our users.

Only a handful of features still rely on Elasticsearch, and we aim to fully transition these to Vespa by the end of 2024. This transition is not just a technical migration – it’s a strategic step towards unifying our search infrastructure under one powerful platform that we can optimise and control end-to-end. By consolidating these features into Vespa, we expect improvements in efficiency, consistency, and innovation, allowing us to better support the growing demands of our customers. The future with Vespa looks bright, and we are dedicated to pushing the boundaries of what our search platform can achieve.