Team Topologies is a book that lays out an approach to organizing teams so that they can produce software effectively. In this post, I’ll provide a summary of the book along with personal commentary based on my observations in the industry.
Book: Foreword
The book sets the context by stressing how two areas are designed and evolved: organization design and system architecture. It aims to show how the roles of engineering managers and architects are interrelated and how each affects the other. Over the years, software complexity and dependencies have increased. The book’s goal is to manage the cognitive load on teams, and thereby drive productivity, through well-designed team interactions. In essence, it revisits Conway’s law for a modern approach to software delivery. It proposes specific team designs and interaction models to streamline delivery.
Book: Part 1: Teams as the means of delivery
Book: Chapter 1
The authors recommend that we stop treating teams as collections of interchangeable individuals. Individuals and teams form a complex ecosystem. Individuals are not “resources” that can be swapped out without affecting the balance (for better or worse) of a team.
Conway’s law states that a system’s architecture mirrors the communication structure of the organization that builds it. To get work done, engineers will try to take the shortest path. If it is hard to work with another team across organizational boundaries, new layers are often added to the system just to minimize the human conversations required. In many products, the design is inconsistent across different parts of a single application, revealing which team owns each part. The book advocates an organization design that limits each team’s cognitive load, with fewer and more explicit inter-team conversations, so that teams can be independent and effective.
Book: Chapter 2
The “Reverse Conway maneuver” is a strategy in which the organization chart is explicitly designed around the desired system architecture. If you need microservices, you need independent teams whose members have all the required skills. Conversely, a central database team will tend to produce a monolithic shared database, which becomes a central blocker for changes.
Two principles of good software architecture are:
Loose coupling - components do not hold strong dependencies on others
High cohesion - components have clearly bounded responsibilities within a strongly related domain
With these principles in mind, it should be possible to create teams with clear responsibilities. Communication (i.e., dependencies) between teams, and between components in the architecture, will be reduced and made explicit. With stable domains, the cognitive load on each team is bounded, allowing it to build expertise over the long term.
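To make the link between coupling and team communication more concrete, here is a minimal sketch (not from the book) that treats ownership and dependencies as data. It assumes a hypothetical map of components to owning teams and a hypothetical list of dependency edges; every component and team name is made up for illustration. Under Conway’s law, each dependency that crosses a team boundary implies a conversation between teams, so counting those edges gives a rough view of where communication is needed.

```python
from collections import Counter

# Hypothetical ownership map: component -> owning team (illustrative names only).
OWNERS = {
    "checkout-ui": "payments",
    "payment-api": "payments",
    "ledger": "payments",
    "catalog-api": "catalog",
    "search": "catalog",
}

# Hypothetical dependency edges: (calling component, called component).
DEPENDENCIES = [
    ("checkout-ui", "payment-api"),
    ("payment-api", "ledger"),
    ("checkout-ui", "catalog-api"),
    ("search", "catalog-api"),
]


def cross_team_edges(owners, dependencies):
    """Dependencies whose two ends are owned by different teams.

    Each such edge implies a conversation across a team boundary.
    """
    return [(src, dst) for src, dst in dependencies if owners[src] != owners[dst]]


def team_interfaces(owners, dependencies):
    """Count cross-team dependencies per (consuming team, providing team) pair."""
    return Counter(
        (owners[src], owners[dst]) for src, dst in cross_team_edges(owners, dependencies)
    )


def internal_ratio(owners, dependencies):
    """Share of dependencies that stay inside one team (higher = more cohesive ownership)."""
    if not dependencies:
        return 1.0
    internal = sum(1 for src, dst in dependencies if owners[src] == owners[dst])
    return internal / len(dependencies)


if __name__ == "__main__":
    print(cross_team_edges(OWNERS, DEPENDENCIES))  # [('checkout-ui', 'catalog-api')]
    print(dict(team_interfaces(OWNERS, DEPENDENCIES)))  # {('payments', 'catalog'): 1}
    print(internal_ratio(OWNERS, DEPENDENCIES))  # 0.75
```

The single cross-team edge here is the interface the payments and catalog teams would need to make explicit; a low internal ratio would suggest that team boundaries and component boundaries are misaligned.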
Organization leaders are typically Type A personalities. Getting leaders to shift ownership to others is a hard people problem.
Book: Chapter 3
Teams matter more than individuals. Teams of 5-9 individuals working on a single shared goal can be highly effective. This size is informed by Dunbar’s number and reflects the size of a group that can sustain high-trust relationships.
Teams take a long time to become effective: after formation, members need time to understand each other and to learn the problem domain. Once established, a team can make decisions over longer time horizons. The book recommends small but long-lived teams. Each team owns a problem domain that is right-sized from a cognitive load perspective. Well-established teams can then define Team APIs, which describe how other teams interact with them. A Team API includes rituals, practices, processes around code, versioning, wikis, communication, etc. For example, we are building processes to capture architectural decisions, which helps us communicate changes across multiple teams.
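One way to keep a Team API explicit is to record it as structured data rather than scattered wiki pages. The sketch below is a hypothetical illustration, not a format from the book: the `TeamAPI` class, its field names and the example team are all assumptions, chosen to mirror the elements listed above (code ownership, versioning, documentation, practices, communication).

```python
from dataclasses import dataclass, field


@dataclass
class TeamAPI:
    """Hypothetical record of how other teams are expected to interact with one team.

    Field names are illustrative, mirroring the elements listed in the text:
    code ownership, versioning, wikis/documentation, practices and communication.
    """
    team: str
    owned_services: list[str]
    versioning_policy: str
    docs_url: str
    practices: list[str] = field(default_factory=list)
    communication_channels: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Names of fields left empty, i.e. parts of the Team API others cannot rely on yet."""
        return [name for name, value in vars(self).items() if not value]


# Example: a hypothetical team that has not yet published docs or channels.
payments = TeamAPI(
    team="payments",
    owned_services=["payment-api", "ledger"],
    versioning_policy="semantic versioning; breaking changes announced in advance",
    docs_url="",
    practices=["record architecture decisions"],
)
print(payments.missing_elements())  # ['docs_url', 'communication_channels']
```

Listing the empty fields is a cheap way to see which parts of a team’s interface other teams would still have to discover through ad hoc conversations.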
Like architecture, teams need to evolve organically as needs change. For example, a team may need to pick up experimental work when new research is required in a related area. It is not clear how this should be managed: should the work be added to the existing team’s mandate, given to a new team, or handled by splitting the team? These are non-trivial questions that require understanding the business environment.
Book: Part 2: Team Topologies that work for flow
Book: Chapter 5
This chapter covers the 4 team types along with the rationale behind their design.
Stream-aligned team - Scope is aligned to the flow of work from (usually) a segment of the business domain. Teams are empowered to build and deliver customer value quickly and independently without requiring hand-offs to other teams. This should allow the teams to achieve mastery, autonomy and purpose for the area they own. Most teams in a company should be stream-aligned.
Enabling team - It is composed of specialists in a given technical domain who help stream-aligned teams quickly build expertise. An enabling team should be a short-term dependency for a stream-aligned team.
Complicated Subsystem team - This team owns a part of the system where significant mathematical, computational, technical or other specialist expertise is needed. With a specialist team owning that part, the cognitive load on stream-aligned teams is reduced.
Platform team - This team provides internal services to reduce the load on stream-aligned teams. A shared platform is used by multiple stream-aligned teams. The focus is on providing a small number of services of high quality and maturity. Platforms abstract away lower levels of the stack, like infrastructure and networking, or cross-cutting concerns like logging and metrics.
In mid- to large-size organizations, larger logical team types can be composed of groups of these 4 team types. For example, a logical platform team can be built from multiple smaller internal teams of various types, including nested platform teams. In our case, we have platforms at different scopes, company-wide or organization-wide, to promote reuse.
Other team types are:
Communities of practice, or guilds - individuals from different teams who come together to share practices or information.
Architecture teams - the book recommends structuring these as virtual teams or as an enabling team rather than as a standalone group. In this model, architecture decisions are not imposed on other teams; instead, they are proposed collaboratively.
The book recommends various mechanisms to refactor traditional team types (like support teams and product teams) into these models.
This chapter also recommends a minimalistic platform design called the thinnest viable platform. It cautions against platform teams defining platform requirements without explicit product needs, which leads to over-engineering. Instead, the platform should always serve the needs of consuming applications and services.
One open question I have is how to achieve standardization or collaboration across stream-aligned teams so that the combined product is cohesive for a customer. A common UI toolkit alone is not enough for a consistent experience. Similarly, API designs are often fragmented if common patterns are not followed. Complete autonomy of stream-aligned teams has drawbacks like these, which the book does not address.
Book: Chapter 6
For teams to achieve effective flow, it is important to remove hand-offs and empower smaller, independent teams. The book recommends aligning software and system boundaries with the capabilities of a single team, which makes ownership and sustainable evolution feasible. In practice, I’ve seen teams own many services. The services themselves are at various levels of maturity (early development -> sustaining -> sunsetting), which makes a team’s work profile fairly hard to estimate.
The book recommends splitting monoliths into subsystems along fracture planes. A fracture plane is a natural seam in the software system; the book identifies eight:
Business domain aka bounded context - Each bounded context can be a separate subsystem.
Regulatory compliance - Credit card processing systems, PII handling systems are usually separate from other systems that require fewer controls.
Change cadence - Some functionality needs to evolve faster than the rest, based on its requirements.
Team location - Code reviews across distributed time zones are not fun!
Risk - Core critical systems are usually separate from new experimental features.
Performance isolation - Performance requirements may vary for core customer facing experiences vs other non-core experiences.
Technology - unclear
User personas - Different feature sets are usually available for different personas and hence can be built separately.
Book: Part 3: Evolving Team Interactions for Innovation and Rapid Delivery
Book: Chapter 7
The 3 interaction modes between teams are:
Collaboration - working closely together for a defined period of time to discover new things (APIs, practices, technologies, etc.). Collaboration is chatty communication, which is required for new innovation and discovery. Both teams share responsibility and have higher cognitive load during this time.
X-as-a-Service - one team provides and another team consumes something “as a Service”. Here collaboration is minimal. The service boundary should be well documented and implemented to meet consumers’ needs. Support or extensibility should be available. Ownership boundaries are clear, and cognitive load is low since the API should abstract away complex details.
Facilitation - one team helps and mentors another team to clear impediments. This is the primary model for an enabling team.
The book also maps each team topology to its preferred interaction modes.
Over time, a team should settle into the typical interaction modes with other teams. Awkwardness in team interactions can then be used to sense missing capabilities and misplaced boundaries. This feedback loop is critical for the continuous evolution of organizational boundaries.
Collaboration is likely required to discover the APIs for a new service boundary, so collaboration is required to bootstrap a platform team.
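The idea of sensing awkwardness can be sketched as a simple check over recorded team interactions. This is a hypothetical illustration, not something the book prescribes: the `EXPECTED_MODES` mapping is an assumption drawn loosely from this summary (enabling teams facilitate, platform teams are consumed as a service and may collaborate while being bootstrapped), and the 12-week `COLLABORATION_LIMIT_WEEKS` time box is a made-up threshold standing in for collaboration’s “defined period of time”.

```python
from dataclasses import dataclass
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated subsystem"
    PLATFORM = "platform"


class Mode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATION = "facilitation"


# Assumed expected modes for the team on the *providing* side of an interaction.
# Drawn loosely from this summary; adjust to your own organization.
EXPECTED_MODES = {
    TeamType.STREAM_ALIGNED: {Mode.COLLABORATION},
    TeamType.ENABLING: {Mode.FACILITATION},
    TeamType.COMPLICATED_SUBSYSTEM: {Mode.X_AS_A_SERVICE},
    TeamType.PLATFORM: {Mode.X_AS_A_SERVICE, Mode.COLLABORATION},  # collaboration to bootstrap
}

# Hypothetical time box: collaboration is meant to last a defined period.
COLLABORATION_LIMIT_WEEKS = 12


@dataclass
class Interaction:
    provider: TeamType
    mode: Mode
    weeks_active: int


def awkwardness_signals(interaction: Interaction) -> list[str]:
    """Return reasons an interaction looks awkward and may hint at a misplaced boundary."""
    signals = []
    if interaction.mode not in EXPECTED_MODES[interaction.provider]:
        signals.append("unexpected interaction mode")
    if (interaction.mode is Mode.COLLABORATION
            and interaction.weeks_active > COLLABORATION_LIMIT_WEEKS):
        signals.append("collaboration has outlived its time box")
    return signals


print(awkwardness_signals(Interaction(TeamType.ENABLING, Mode.FACILITATION, 4)))
# []
print(awkwardness_signals(Interaction(TeamType.PLATFORM, Mode.COLLABORATION, 30)))
# ['collaboration has outlived its time box']
print(awkwardness_signals(Interaction(TeamType.ENABLING, Mode.X_AS_A_SERVICE, 10)))
# ['unexpected interaction mode']
```

A platform team still collaborating closely after months, or an enabling team whose work is being consumed as a service, are the kinds of signals that suggest a boundary or a team type needs to change.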
Book: Chapter 8
The book identifies several triggers for evolving teams:
Software complexity for a single team - With success, products grow larger with new feature requirements. At some point a single team has too many tasks and its backlog becomes massive.
Delivery cadence becomes slower - Team velocity drops over time, due to tech debt or to waiting on complex dependencies.
Multiple services rely on a large set of underlying services - Reuse of existing services and systems becomes more and more challenging over time. There may be too many compliance checks or cumbersome platform requirements.
Systems thinking needs to be applied to optimize for fast flow across the entire organization, and not just in small parts. With feedback loops and a culture of continuous change it should be possible to improve over time.
Book: Conclusion
Beyond the teams and interaction models, the authors acknowledge that there is a larger set of requirements for success.
A healthy organizational structure - individuals should be empowered and safe to effect change.
Good engineering practices - test-first design and development of all aspects of the system and implementation.
Healthy funding and financial practices - A strong business model
Clarity of business vision - short and long term goals with a clear mission.
Summary
The book is a fairly easy read. Sections and chapters include takeaways that summarize the material well. I like many ideas in the book, though not all concepts are directly applicable to the context I work in. Having said that, there is no perfect model that solves every complex problem. By adopting (and tweaking) these concepts for my organization, I think we can gain a lot of value. Fundamentally, we need to bridge the gap between management and architecture and let each influence the other on an ongoing basis, which in itself will improve efficiency. Just being thoughtful about these designs means that we will have a known rationale for each change, and can learn and improve on future decisions if things do not work out.
Book references
The Five Dysfunctions of a Team: A Leadership Fable was cited as a major influence on this book. I have two copies of it, so it's about time I read it.
The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations