How We Approach Engineering at Vinted
This blog post is about how we approach Engineering at Vinted. It’s a high-level overview that describes how we think and what principles we apply. By design, it doesn’t go into details, so we would love to hear from readers of this blog on social media. Which parts have you found the most interesting? What would you like us to expand on in future blog posts?
Changes
Let’s start with Vinted’s long-term mission. It informs everything we do. We want to make second-hand the first choice worldwide. Our mission won’t change.
To achieve this mission, we’ve built a marketplace. In this marketplace, our users can sell and buy items, primarily fashion ones right now.
A lot of things won’t change. In our business, our members will always want to feel safe while transacting. Our sellers will want to sell fast. Our buyers will want vast selection, lower prices, and faster delivery.
But the environment will change. Our company will. Our competitors will. Various technologies, such as AI and augmented and virtual reality, will affect us in unpredictable ways. Our members will pick the products and services that use those changes to keep delivering safety, the ability to sell, bigger selection, lower prices, and faster delivery.
When changes happen, one thing will matter more than anything else: how fast we are. How quickly we can go from understanding the changing circumstances to delivering value to our members.
Sustainable Speed
Speed is a competitive advantage. Fast companies can make mistakes and quickly fix them. Fast companies can adopt the ever-changing technologies. Fast companies can match their competitors blow for blow. Slow ones lose.
But the importance of speed doesn’t mean that we value speed over everything else or that we want to “move fast and break things”. That’s not our approach. Breaking things doesn’t mean that we’re actually delivering value to our members faster. Most of the time, it means quite the opposite.
There are a lot of companies that value speed. Who wouldn’t? Those companies want speed. But in their quest for speed, they create bugs, things to rework, technical debt and unhappy people.
Those bugs and debt will need to be fixed eventually. When they are fixed later, they cost more to fix and slow those companies down.
The unhappy people become unhappier. Sometimes they leave. This slows companies down.
We want Vinted to succeed long-term. We have long-term ambitions. We don’t desire short-term speed. We don’t want bugs and debt, which we’ll have to fix a year from now. We don’t want unhappy people, who leave.
We want the opposite of short-term speed. We want sustainable speed. We’re even willing to sacrifice speed today to be faster long-term. We want to deliver value to our members fast, and even faster a year from now.
Our desire for “sustainable speed” is not unique. For example, Facebook’s motto is “move fast with stable infra”. There are various ways to achieve this goal. Our hope is that you’ll find something to learn in our approach.
Lean Software Development
We apply Lean Software Development principles to achieve sustainable speed.
You can read more about them in one of the books by Mary Poppendieck and Tom Poppendieck. A lot of what follows is based on their book Lean Software Development: An Agile Toolkit.
There are two schools of thought when it comes to transforming ideas into products. The deterministic school starts by creating a complete product definition and then creates a realization of that definition. The empirical school begins with a high-level product concept and then establishes well-defined feedback loops that adjust activities to develop an optimal interpretation of the concept.
Lean Software Development belongs to the empirical school. A development process that deals with a changing environment should be an empirical process because it provides the best-known approach for adapting to change. Software by its very nature should be designed to adapt to change both during initial development and over its lifecycle.
Lean Software Development principles are not new. They’ve been applied successfully in various industries over many years. Their story started 70 years ago. In the 50s, Toyota engineers developed an integrated socio-technical system. It was called the Toyota Production System. Later, in the 90s, it was generalized and became Lean Manufacturing (or “Lean”). In 2003, the Lean Manufacturing principles were translated into principles applying to software development.
Here are the seven Lean Software Development (LSD) principles, which are based on seven similar Lean Manufacturing principles:
Eliminate waste. Anything that does not add customer value, or that causes unevenness or overburden, is waste. Any delay that keeps customers from getting value is waste. Examples of waste: non-value-adding processes, meetings, queues, inventory, extra features, unused code. The key is to learn how to recognize waste. Examples of us eliminating waste: not keeping features after AB tests, fixing MySQL queries that are significantly slower or bigger than other queries.
Build quality in. We prefer preventing bugs, crashes, and errors to fixing them afterward. The earlier we find a bug, the faster we can fix it. A bug discovered a year after it was created is difficult to understand and fix. It’s better to build quality in right from the start. We’ve simplified our bug tracking to that end: putting defects into a tracking system is worse than not creating them in the first place. Example of us building quality in: writing unit tests.
Create knowledge. It’s critical to continue learning. Instead of starting a project assuming we already know everything, we should keep learning during the project. Those learning loops should be fast. To that end, we do retros and improve our processes. When abnormalities happen, we search for the root cause, write post-mortems and share them company-wide. We should not stay locked into a certain “standard” process, but continue improving.
Defer commitment. Predictions don’t create predictability. No amount of trying to make predictions more accurate is going to do much good. There are well-proven ways to create reliable outcomes even if we cannot start with accurate predictions. We should stop acting as if our predictions of the future are fact rather than a forecast. We need to reduce our response time so we can respond correctly to events as they unfold. We already do that to some extent by working in short sprints. We maintain our future options by working with modular and service-oriented design. We do plan to gain insight and knowledge while understanding that plans change.
Deliver fast. Companies that compete on the basis of time have a significant advantage over their competitors. They have eliminated a considerable amount of waste, and waste costs money. They have very low defect rates, because sustainable speed is impossible without superb quality. They develop a deep customer understanding. They are so fast that they can afford to take an experimental approach to product development, trying new ideas and learning what works. Examples of when we deliver fast: running AB tests and continuously deploying backend/web.
Respect people. People are valued and trusted to self-organize with only high-level goals. People that work on something are the best people to make decisions on that something. Our whole organization functions with this principle in mind, with the overarching OKRs framework giving us high-level goals and leaving the execution to the people.
Optimize the whole. We optimize the entire value stream, not just the individual parts. When too many things are measured, the real goal of the effort gets lost, and there is no guidance for making tradeoffs among surrogate measurements. If we optimize the one thing that really matters, the other numbers will take care of themselves.
Present
We practice sustainable speed in several ways. A few examples follow.
In the first half of 2018, the Engineering organization grew from 30 people to 41. We could’ve hired more people, but didn’t. We believe that there’s such a thing as growing too fast. We want to onboard new people well and help them become familiar with our culture and practices, so that they become productive members of our team. We see growing our team size by more than 2x per year as risky, with the potential to hurt our productivity and culture.
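To put the 2x-per-year limit next to the headcount numbers above, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that the first-half growth rate would continue unchanged and compound over a full year; the post does not claim that it did. The function name `annualized_growth` is our own.

```python
def annualized_growth(start: int, end: int, months: float) -> float:
    """Growth factor over 12 months if the observed rate kept compounding.

    This is an illustrative assumption, not a forecast.
    """
    if start <= 0 or months <= 0:
        raise ValueError("start and months must be positive")
    return (end / start) ** (12 / months)


# 30 -> 41 people in six months, the figures quoted in the text.
factor = annualized_growth(30, 41, 6)
print(f"annualized growth factor: {factor:.2f}x")
print("under the 2x limit" if factor <= 2.0 else "over the 2x limit")
```

Under that assumption the yearly factor comes out a little under 1.9x, so the pace described stays below the 2x line.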
We build new teams with the same mindset. New people are hired as part of an existing healthy team. They get familiar with our culture and practices. When the team reaches a size of two teams, the team is split into two. We try to split the team as equally as possible, based on seniority and time spent at Vinted. We don’t want to have “stronger” and “weaker” teams. We want every team to stay fast.
Some companies are averse to hiring junior people. We believe that we can stay fast long-term only by hiring and growing the next generation. Junior people bring energy and new ideas. While the more senior people help them grow, juniors help seniors grow too. We want the right balance of junior and senior people in our team. As long as every junior has someone to mentor them, the balance is achieved.
We put as much of each person’s growth as possible into their own hands. We trust them to know how they want to grow. We share high-level goals and (by design) general guidelines, leaving as much leeway as possible to the people themselves and their leads.
We follow Site Reliability Engineering (SRE) practices, starting with our infrastructure teams. One of the principles exemplifies sustainable speed very well. The SRE-operational load should never be over 50%. That means if the SRE team is spending more than half of their time fire-fighting, something is wrong. At least half of their time, SREs should be working on automation and improving reliability.
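As a rough illustration of the 50% rule, the sketch below computes the share of a team’s logged time that went to operational work and flags it when it exceeds the cap. The `WeekLog` record, its field names and the hour figures are hypothetical; the post does not describe how the time is actually tracked.

```python
from dataclasses import dataclass

OPS_LOAD_CAP = 0.5  # the 50% cap on operational load named in the text


@dataclass
class WeekLog:
    """Hypothetical record of how an SRE team spent one week, in hours."""
    ops_hours: float          # fire-fighting: incidents, on-call, tickets
    engineering_hours: float  # automation and reliability improvements


def ops_load(log: WeekLog) -> float:
    """Fraction of the logged time that went to operational work."""
    total = log.ops_hours + log.engineering_hours
    if total <= 0:
        raise ValueError("no hours logged")
    return log.ops_hours / total


def over_cap(log: WeekLog, cap: float = OPS_LOAD_CAP) -> bool:
    """True when operational load is strictly above the cap."""
    return ops_load(log) > cap


calm_week = WeekLog(ops_hours=90, engineering_hours=110)
rough_week = WeekLog(ops_hours=120, engineering_hours=80)
for week in (calm_week, rough_week):
    status = "something is wrong" if over_cap(week) else "ok"
    print(f"ops load {ops_load(week):.0%}: {status}")
```

A week split exactly in half is treated as acceptable here, since the rule says the load should not be over 50%.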
As we’ve learned from SRE practices, we look at our uptime (and other quality metrics) as a budget. We’re willing to spend some of that budget to create knowledge. The long-term is more important than the short-term.
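The budget idea can be made concrete with a little arithmetic. The sketch below turns an availability target into minutes of allowed downtime and subtracts what has already been spent. The 99.9% target, the 30-day window and the 12 minutes of downtime are assumptions chosen for illustration, not Vinted’s actual figures, and the function names are our own.

```python
def error_budget_minutes(target: float, period_minutes: float) -> float:
    """Downtime allowed in the period under the given availability target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    return (1.0 - target) * period_minutes


def budget_remaining(target: float, period_minutes: float,
                     downtime_minutes: float) -> float:
    """Budget left after the downtime already incurred (negative if overspent)."""
    return error_budget_minutes(target, period_minutes) - downtime_minutes


THIRTY_DAYS = 30 * 24 * 60  # minutes in the assumed 30-day window
TARGET = 0.999              # assumed availability target

budget = error_budget_minutes(TARGET, THIRTY_DAYS)
left = budget_remaining(TARGET, THIRTY_DAYS, downtime_minutes=12)
print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

With these assumed numbers the budget is roughly 43 minutes per 30 days. While some of it remains, there is room for riskier changes and experiments; once it is spent, reliability work takes priority.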
We have a small technology stack. While introducing more new technologies can help in the short term, it would be damaging in the long term. A bigger technology stack requires more maintenance and makes jumping between projects and areas slower. We’re not afraid to add technologies to our stack when truly needed, for example, using Python for our Deep Learning service. But in the end, we stay pragmatic and focus on long-term sustainable speed.
We’re setting a high bar for a lot of our quality metrics (bugs, crashes, errors, performance, security). It forces us to fix defects today. Correcting defects today is cheaper than fixing them a year from now. We don’t want to be overrun by defects and slow down. We’ll continue increasing the quality bar in the future too.
It must be evident by now: we don’t want death marches. We don’t want people to work 100 hours a week. It’s not worth the cost to finish a project a week or two earlier if it means dealing with bugs afterward and people leaving.
We respect people and believe that the people doing the work should make the decisions about the work they’re doing. No one asked our Head of Engineering, the humble author of this post, about using Python for the Deep Learning service. And that’s how it should be. Our leaders are here to help, to show direction, but not to make every decision.
Fin
It’s likely that our platform will continue growing in the number of members we have and the number of transactions they make. It’s just as likely that our Engineering organization will keep growing as well. We’ll need to continue improving our infrastructure. We’ll want to invest in new product directions because more of them will start making sense with the growth of our member-base. We’ll do all that with sustainable speed in mind.