[tech] Evolutionary system architecture
Earlier this year, while I was on a work hiatus, I wrote about evolutionary architecture, primarily from the perspective of systems.
The basic premise of the post was that the architecture of software systems is similar to that of biological systems. With growth, evolution happens by splitting a single system into specialized sub-systems.
As complexity grows it is inevitable that eventually a system needs to be broken up into smaller cooperating sub-systems. Because evolution. Each sub-system can also independently specialize and break up even further in the future.
Decomposing a system does not guarantee success. Complexity does not disappear, but it should become manageable. Hence, setting correct expectations with all stakeholders is essential. Deciding what to decompose and how to specialize is what architecture is all about.
I’m now revisiting some architecture designs that I helped start a few years ago, and the evolutionary aspect of systems is even more evident.
For one system, we need to design the next generation to handle an order of magnitude more requests. The first major generation has lasted many years and met most of the requirements set during the initial planning phase. As we operated the system, we learned a lot from real-world integrations and operational challenges, and we made multiple minor changes to the internal architecture over time. I’m using the term system rather than service because we operate multiple loosely coupled services as one holistic system. Each of those services has changed independently based on its responsibilities, the state it stores, and operational incidents. In addition to scale, we need to handle advanced requirements around sharding and resiliency, along with other feature requirements. Some internal settings of the system are becoming configurable by clients.
In this project, the public API of the system has not changed much over the years. Internally, however, there have been many changes. I always recommend spending more time designing a good public API, because internal implementation choices change frequently. In my experience, a good API is minimalistic: it exposes some core functionality with the fewest configuration options possible. Like an iceberg, the visible layer hides a lot of complexity beneath the surface.
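A minimal sketch of the iceberg idea, assuming a hypothetical key-value store as the system. The class name `KeyValueStore`, its `put` and `get` methods, and the hashing and shard count are all illustrative assumptions, not the API of the system described in this post.

```python
import hashlib
from typing import Optional


class KeyValueStore:
    """Hypothetical public API: two methods and no tuning knobs."""

    def __init__(self) -> None:
        # Internal detail (assumption): four in-memory shards.
        # The count could change between generations without
        # affecting callers.
        self._shards = [{} for _ in range(4)]

    def put(self, key: str, value: str) -> None:
        """Store a value under a key."""
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is unknown."""
        return self._shard_for(key).get(key)

    # Everything below is "beneath the surface" and free to evolve.
    def _shard_for(self, key: str) -> dict:
        # Use a stable hash so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]
```

Callers only ever see `put` and `get`. In this sketch the sharding scheme could be replaced entirely and no client code would need to change, which is the property the iceberg comparison is pointing at.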
When designing the first generation of the system, we were aware of many of the problems that we are looking to solve now. However, those architecture goals were not a priority in the early stages, since they would have added undesired complexity. Now the requirements are validated by feedback from customers, and additional investments can be justified. Like engineering, the business side also needs time to understand the impact after the release of a new product or feature and to get more clarity on the broader vision. Engineering does not work in isolation or for its own sake. Ultimately, engineering needs to solve business requirements.
As specialized sub-systems are broken out of a single system, constraints that held in the original system are often broken. For example, relying on relational database transactions typically does not work for high-scale systems.
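To make the broken constraint concrete, here is a sketch of one common way to cope once a single transaction is no longer available: an explicit compensating action. This is an assumption for illustration only; the post does not say which approach the system uses, and the `Payments`, `Inventory`, and `place_order` names are hypothetical.

```python
class InsufficientStock(Exception):
    """Raised when an item cannot be reserved."""


class Payments:
    """Hypothetical sub-system that owns account balances."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def charge(self, account, amount):
        if self.balances.get(account, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[account] -= amount

    def refund(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount


class Inventory:
    """Hypothetical sub-system that owns stock levels."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def reserve(self, item):
        if self.stock.get(item, 0) < 1:
            raise InsufficientStock(item)
        self.stock[item] -= 1


def place_order(payments, inventory, account, item, price):
    """Run two local steps; undo the first if the second fails.

    In a single relational database both steps could share one
    transaction. With two sub-systems, the rollback is written by hand.
    """
    payments.charge(account, price)
    try:
        inventory.reserve(item)
    except InsufficientStock:
        payments.refund(account, price)  # compensating action
        return False
    return True
```

The point of the sketch is the cost: atomicity that the database used to provide for free becomes application logic that has to be designed, tested, and operated.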
When designing sub-systems, immense care should be taken to understand how the parts interact with and depend on each other. One of the most profound quotes I recently came across is:
A system is never the sum of its parts; it’s the product of their interactions.
- Russell Ackoff
This lecture explains the concept from a management perspective, but it applies to software architecture.
Each sub-system (or part) in a system must have a clearly defined role and a clearly defined model for interacting with the others. If the roles, data ownership, and dependency graphs are not clear, the architecture becomes brittle and change is hard. Recoverability of data is another area that I’ve been working on recently, and it has exposed shortcomings in some systems. As the baseline for operational maturity rises, it becomes more expensive to build new software. This should be solved by platformization of common concerns across services.
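One way to keep dependency graphs and data ownership explicit is to write them down as data and check them mechanically. The sketch below uses Python's standard `graphlib` module; the sub-system names and datasets are hypothetical examples, not the ones from the systems discussed here.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical sub-systems: each maps to the sub-systems it depends on.
DEPENDS_ON = {
    "api_gateway": {"orders", "accounts"},
    "orders": {"accounts", "inventory"},
    "accounts": set(),
    "inventory": set(),
}

# Hypothetical data ownership: which sub-system claims which dataset.
OWNS = {
    "orders": {"order_records"},
    "accounts": {"customer_profiles"},
    "inventory": {"stock_levels"},
}


def dependency_order(depends_on):
    """Return sub-systems with dependencies listed before dependents.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def shared_datasets(owns):
    """Return datasets that more than one sub-system claims to own."""
    owners = {}
    for subsystem, datasets in owns.items():
        for dataset in datasets:
            owners.setdefault(dataset, []).append(subsystem)
    return {d: sorted(o) for d, o in owners.items() if len(o) > 1}


if __name__ == "__main__":
    print(dependency_order(DEPENDS_ON))
    print(shared_datasets(OWNS))
    try:
        dependency_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected between a and b")
```

A check like this could run in continuous integration, so that a new circular dependency or a dataset with two owners is caught when it is introduced rather than during an incident.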
External change is not the only driver for evolution. Technology changes quickly, so the internal frameworks and platforms that you depend on can quickly go out of date. Continuous investment is always needed to stay on top of changes. Try to choose the minimal set of dependencies needed to solve a problem. Be aware of the risk involved in adopting shiny new technologies that might not be supported for long, unless they provide a distinct business advantage.
Another major system re-architecture I’m involved with is driven primarily by team interactions. I’ll cover the forces driving that change in another post.