The adoption of microservices as an architectural style has allowed organizations to realize many of the purported benefits: separation of concerns, data independence, and faster velocity, to name a few. While it’s easy to apply the microservices style to greenfield applications and systems, how do you transition from a single monolithic application to a microservices-based architecture?
How Did We Get Here?
Tapjoy is not unlike many other startups, in that our application had grown organically over time from one that served a fairly narrow business function to one that had many other loosely related business functions and capabilities bolted on along the way. When you are constantly working to ship new products and features as fast as possible, it’s not uncommon to step back one day and realize that your application has become a monolith: a single, brittle, difficult-to-manage, all-encompassing beast.
As we started to hit the pain points associated with a monolith, we set out to decompose it into a service oriented architecture. What follows is the high-level approach we took in our decomposition efforts, highlighting some best practices and lessons learned along the way.
Identify Services For Extraction
The first step in any decomposition effort is to perform an audit of all of the current business capabilities of your application in order to identify potential services that can be extracted. This is more art than science, as it largely depends on the specifics of your application and domain model, but we found that using Bounded Contexts from Domain-Driven Design as a guide served us well. The goal is to extract small sets of related model objects that form a logical part of your system into a microservice. Unfortunately, what you’ll often find is that, over time, the model objects in your monolithic application come to depend directly on more and more of the other model objects, and teasing these apart to define clear boundaries can be difficult. Inevitably there will be cases where an entity in one microservice needs to refer to another entity that now lives across a service boundary in another microservice, and in general that’s OK. You’ll want to keep these interactions to a minimum to ensure your microservices aren’t too chatty or prone to cascading failures.
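One common way to express such a cross-boundary reference is to replace the direct object association with a stored identifier that is resolved through the owning service’s client, memoized so each object makes at most one remote call. A minimal sketch of that pattern (the `Offer`/currency names and the client interface here are hypothetical, for illustration only):

```ruby
# An extracted model that references an entity owned by another microservice.
# Instead of a direct in-process association, it keeps only the remote id
# and resolves it through the owning service's client on demand.
class Offer
  attr_reader :currency_id

  def initialize(currency_id:, currency_client:)
    @currency_id     = currency_id
    @currency_client = currency_client
  end

  # Replaces a direct `belongs_to :currency`-style association; the remote
  # entity is fetched once per object and memoized to limit chattiness.
  def currency
    @currency ||= @currency_client.find(@currency_id)
  end
end

# A fake client standing in for the real currency-service consumer library.
FakeCurrencyClient = Struct.new(:calls) do
  def find(id)
    self.calls += 1
    { id: id, name: "Gold" }
  end
end

client = FakeCurrencyClient.new(0)
offer  = Offer.new(currency_id: 7, currency_client: client)
offer.currency
offer.currency
client.calls  # => 1 remote call, thanks to memoization
```

The memoization is what keeps the interaction count down: repeated reads within a request cost one network round trip, not one per access.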
Orchestration Layer And Shared Libraries
Before extracting and standing up any new microservices, we put a significant amount of time and effort into designing and developing some core libraries which allow us to not only define microservices in a predictable and repeatable way, but also to consume them easily. The foundation of this effort was defining our microservice APIs via JSON Hyper-Schema documents. By having a microservice definition that is machine readable, we were able to build libraries which allow us to produce and consume microservices without having to repeat a lot of boilerplate code each time we needed a new microservice. You can read more about those efforts here.
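To make the idea concrete, here is a minimal sketch in the spirit of JSON Hyper-Schema: a resource definition with typed properties and link relations. The resource name, fields, and routes are hypothetical, not Tapjoy’s actual schema; the point is that because the definition is plain data, producer and consumer boilerplate can be derived from it.

```ruby
require "json"

# A toy machine-readable service definition in the style of JSON Hyper-Schema.
SCHEMA = JSON.parse(<<~DOC)
  {
    "title": "Currency",
    "type": "object",
    "properties": {
      "id":   { "type": "integer" },
      "name": { "type": "string" }
    },
    "links": [
      { "rel": "self",      "href": "/currencies/{id}", "method": "GET" },
      { "rel": "instances", "href": "/currencies",      "method": "GET" }
    ]
  }
DOC

# Derive the route table a consumer library would use -- no hand-written
# client code per endpoint, just data-driven generation from the schema.
routes = SCHEMA["links"].map { |l| [l["rel"], "#{l['method']} #{l['href']}"] }.to_h
```

A consumer library can walk `links` the same way to generate methods for each relation, which is what removes the per-service boilerplate.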
In addition to building new libraries that facilitate producing and consuming services, there were several foundational pieces of the monolith that needed to be extracted into a library shared across all microservices. Think cross-cutting concerns such as database base classes, caching mixins, and constants. You obviously don’t want to copy these into each new microservice application, so extracting them into a shared library was necessary as well. We tried to keep these shared libraries to a bare minimum, pushing as much into services as humanly possible.
Extract And Test Services
With the infrastructure pieces in place, we were ready to start extracting services from our monolith. Our goal was to do a “bug-for-bug” extraction, meaning, the new service should behave exactly as the old code did, bugs/inefficiencies and all. We took this approach so that we could easily identify if we had broken anything during the extraction or not. If we did significant refactoring along with the extraction, there would be too many moving pieces to know what exactly changed the behavior.
There were many complications in performing the extractions, but one of the primary ones was the pervasive use of constants within the monolith. The constants relevant to a microservice being extracted would still be needed by consuming clients. In order to avoid duplicating these constants across microservice boundaries, we defined “metadata” resources in the microservice’s JSON Hyper-Schema, and exposed these constants through the microservice API. This allowed clients to query this metadata and cache it locally in memory.
After defining the JSON Hyper-Schema for our new microservice, extracting the code into a new independent application, and completing unit testing, we deployed the new microservice to production on an isolated server, and ran A/B tests of the old code path vs. the new code path in order to verify that the new results matched the old results exactly. The folks at GitHub have a nice library (if you’re using Ruby) called dat-science for doing these sorts of experiments. With dat-science, you can run the experiment on some percentage of the traffic and compare the results. While this is a nice approach, unfortunately it really only works for testing read operations, as you obviously can’t (easily) test simultaneous write operations. But even by just testing read operations, we were able to shake out bugs in configuration, caching, performance, and many other things, simply by having the new microservice exercised by live production traffic.
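The shape of such an experiment can be sketched in plain Ruby. This is not dat-science’s actual API (the gem layers sampling, timing, and reporting on top of this idea); it is a stripped-down illustration of the control/candidate pattern it implements:

```ruby
# Run both code paths on live read traffic, always return the control's
# (old, trusted) result, and record whether the candidate agreed.
class Experiment
  attr_reader :mismatches

  def initialize(name, sample_rate: 1.0)
    @name        = name
    @sample_rate = sample_rate  # fraction of traffic that runs the candidate
    @mismatches  = []
  end

  def run(control:, candidate:)
    control_result = control.call
    if rand < @sample_rate
      candidate_result = begin
        candidate.call
      rescue => e
        e  # a crashing candidate must never break the live request
      end
      unless control_result == candidate_result
        @mismatches << [control_result, candidate_result]
      end
    end
    control_result  # callers always see the old behavior
  end
end

exp    = Experiment.new("currency-read")
result = exp.run(
  control:   -> { { id: 1, name: "Gold" } },       # old monolith code path
  candidate: -> { { id: 1, name: "Gold Coins" } }  # new microservice call
)
# result is the control's value; the disagreement was recorded for review.
```

The key properties are that the candidate can never affect the response the user sees, and every disagreement is captured so you can chase down exactly where the extraction diverged.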
Integrate Extracted Services
With the A/B testing of our read operations completed, we then proceeded to roll out the actual integration of the old monolith application with the new microservice. If the surface area of the monolith affected by the extracted models is small enough, you can likely get away with a “big bang” release, where you replace all direct references to the models with their service-call counterparts and remove the old code. For us, the more common case was that the models being extracted were used pervasively within the monolith, and a big bang release would be too risky. In that case, we would stage the refactoring out along “facets”, or logical parts of the monolith application that served a specific business function. These releases were careful and deliberate, making sure that each facet was fully vetted before moving on to the next one. Once all of the facets had been refactored and released, we would finally remove the models and any supporting code from the monolith, and lather, rinse, repeat for other service extractions.
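One way to stage a refactor by facet is to route each logical area of the monolith through its own flag, so a facet can be flipped to the service (or rolled back) independently of the others. A hypothetical sketch of that dispatch, with all names invented for illustration:

```ruby
# Flag-gated access: each facet of the monolith chooses between the old
# in-process model and the new service client independently.
class CurrencyAccess
  # Per-facet flags; in practice these would live in a config store so a
  # facet can be flipped or rolled back without a deploy.
  FACET_FLAGS = { reporting: :service, offers: :monolith }

  def initialize(monolith_model:, service_client:)
    @backends = { monolith: monolith_model, service: service_client }
  end

  def find(id, facet:)
    backend = FACET_FLAGS.fetch(facet, :monolith)  # unknown facets stay on the old path
    @backends.fetch(backend).find(id)
  end
end

# Stubs standing in for the old in-process model and the new service client.
monolith_model = Class.new { def find(id); "monolith-record-#{id}"; end }.new
service_client = Class.new { def find(id); "service-record-#{id}"; end }.new

access = CurrencyAccess.new(monolith_model: monolith_model,
                            service_client: service_client)
access.find(1, facet: :reporting)  # => "service-record-1"
access.find(1, facet: :offers)     # => "monolith-record-1"
```

Once every facet has been flipped to `:service` and vetted, the monolith-side model and the flag layer itself can both be deleted.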
As you can see in the last two sections, we rolled out the changes introducing services very carefully, with a lot of verification at every step along the way. Inevitably some subtle bugs slipped through here and there, but overall, using this methodical approach, the rollout was very smooth.
The move to a service oriented architecture means that there are many more applications in play, and a developer working on a specific business function may need to interact with many different discrete services to get their work done. To spare developers from having to manage each of the individual services and keep them up to date, we developed a simple utility that manages all of the microservices with a few simple commands. This can be something as fancy as a virtual machine or Docker images, or as simple as a script or a master Foreman Procfile.
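For the Procfile route, a single master Procfile can boot the monolith alongside every extracted service with one `foreman start`. The process names, paths, and ports below are hypothetical, just to show the shape:

```
monolith:  bundle exec rails server -p 3000
currency:  sh -c 'cd ../currency_service && bundle exec puma -p 3001'
inventory: sh -c 'cd ../inventory_service && bundle exec puma -p 3002'
```

With this in place, a developer gets the whole constellation of services running locally with one command, and `foreman start` interleaves all of their logs in one terminal.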
The Future Looks Bright
While we are still in the process of our decomposition efforts, we have already started to realize the benefits of a service oriented architecture for the handful of services we have managed to extract thus far. In future blog posts, we’ll cover some other aspects of running a service oriented architecture at scale in a cloud environment, so stay tuned for more!
Think this is interesting? We're hiring! See our current openings here.