At the beginning of November, a client asked me to join their Cloud Panel and talk about cloud transformation. This article is based on that presentation. You can find the slides on SlideShare.
This is the second of a two-part series. The first part probes the questions around the organisation and effective collaboration. Here we explore how we can handle the rising complexity and manifest learnings as internal products using a platform team.
Remember: Efficiency
Previously, we argued that the reason to move into the public cloud is to reduce complexity. We want to focus on the things that make us stand out as a company. We want to innovate fast and build better products of higher quality. We want to experiment.
We cannot do that if we are stuck maintaining and patching our own Kubernetes cluster all day.
The obvious option is to reduce complexity by replacing such non-essentials with products. For example, we use Google Kubernetes Engine instead of maintaining our own Kubernetes.
This frees up resources and allows us to be more efficient.
The DevSecOps-Full-Stack-Rockstar-Team
Ok, so we are running on the cloud. We are building efficient engineering teams. Practices like Scrum suggest building autonomous teams: teams that can own their respective area or product.
Being self-reliant implies that we need a lot of skills in our teams. Let’s create a shopping list for a RESTful service running on the Google Cloud Platform:
- We are building an API. Thus, Kotlin, Ktor and related backend technology
- Containers? Sure! Kubernetes, Docker, Istio
- “DevOps”, of course. GitLab workflows for CI/CD, Terraform, Terratest, monitoring, tracing
- Storage: Bigtable and Redis for caching
- Testing: Gatling or k6
- and let’s not forget security on every level
We could go on and on. Instead of focusing on the essentials, i.e., building the API, we end up knee-deep in side-projects.
“Uh, AWS released a new feature for CloudWatch. Let’s check that out…”
Autonomous teams are great, but
As much as I encourage having autonomous teams, they come at a price. If we do not pay attention, we end up with the setup illustrated by the following diagram.
We see three teams: A, B and C. Each is staffed with end-to-end experts. Backend engineers, cloud engineers, security experts: all can be found in each team. And as a side note: try finding all these experts in the current market!
What happens next should not come as a surprise.
Each team faces the same challenges:
- code needs to be built
- services need to be run
- data needs to be stored
- and so on.
Since each team is able to work on its own, they each come up with their own solution to each problem. One team uses MySQL to store their data, the other prefers PostgreSQL. One team uses GitHub Actions, while the other sets up GitLab for CI/CD.
The challenge is right there. Different solutions to the same problem. Knowledge is not shared between the teams. Maintenance becomes a big concern. In the end, efficiency drops. Teams spend too much time tinkering with aspects that are not related to their product. The next diagram illustrates this.
Each team works on its product, but spends a significant amount of time on other things.
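To make the duplication concrete, here is a minimal Python sketch that inventories which tools the teams picked for the same concern. The MySQL/PostgreSQL and GitHub Actions/GitLab choices come from the example above. Which team picked which tool, the entries for team C and the function names are assumptions made up for illustration.

```python
# Hypothetical inventory of tool choices per team.
# The assignment of tools to teams is invented for this sketch.
team_tooling = {
    "A": {"database": "MySQL", "ci_cd": "GitHub Actions"},
    "B": {"database": "PostgreSQL", "ci_cd": "GitLab"},
    "C": {"database": "PostgreSQL", "ci_cd": "GitLab"},
}


def solutions_per_concern(tooling):
    """Collect the distinct tools used for each concern across all teams."""
    result = {}
    for choices in tooling.values():
        for concern, tool in choices.items():
            result.setdefault(concern, set()).add(tool)
    return result


def duplicated_concerns(tooling):
    """Return the concerns that are solved with more than one tool, sorted by name."""
    return sorted(
        concern
        for concern, tools in solutions_per_concern(tooling).items()
        if len(tools) > 1
    )


if __name__ == "__main__":
    per_concern = solutions_per_concern(team_tooling)
    for concern in duplicated_concerns(team_tooling):
        print(f"{concern}: {', '.join(sorted(per_concern[concern]))}")
```

Every concern that shows up in this list is knowledge that has to be built and maintained more than once.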
What can we do?
Enter the platform team
Most organisations arrive at this point. Initially everything is fine. We have a single team of experts, but then we try to scale out to multiple teams. Now we need some way to reduce the complexity for everybody. This is usually where a platform team is introduced.
Platform teams consist of experts in the supporting technology. For example, a platform team may include GitHub Actions experts or security engineers. The goal is to take the proven solutions from the feature teams and offer these as platform products. The next illustration visualizes the idea.
Let’s explain using an example. The platform team offers mature buildpacks for the teams. In addition, the platform team provides a GitLab instance and a set of well-maintained GitLab pipeline templates. Each team relies on the CI/CD product provided by the platform team. We do not re-invent building software.
The same approach works for persistence, too. The platform team creates a set of secure Terraform modules for PostgreSQL and MongoDB. The teams reference these modules, reusing the knowledge baked into the modules.
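The idea of baking knowledge into a module can be sketched in a few lines. The following Python snippet is neither Terraform nor a real platform API. It is a simplified stand-in, and all names (`PostgresConfig`, `platform_postgres`, the individual settings and their default values) are assumptions. It shows the principle: the platform team fixes the security-relevant settings, and the feature teams only decide what is theirs to decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresConfig:
    """Hypothetical database settings; the defaults encode the platform team's knowledge."""
    name: str
    tier: str = "small"
    storage_gb: int = 20
    backup_retention_days: int = 14
    require_tls: bool = True
    encrypted_at_rest: bool = True


# Settings a feature team may change. Security settings are deliberately missing.
ALLOWED_OVERRIDES = {"tier", "storage_gb", "backup_retention_days"}


def platform_postgres(name, **overrides):
    """Return a database config with secure defaults and team-specific sizing."""
    forbidden = set(overrides) - ALLOWED_OVERRIDES
    if forbidden:
        raise ValueError(f"not overridable by feature teams: {sorted(forbidden)}")
    return PostgresConfig(name=name, **overrides)


if __name__ == "__main__":
    # A feature team only states what is specific to its product.
    orders_db = platform_postgres("orders", storage_gb=100)
    print(orders_db)
```

A feature team that calls `platform_postgres("orders", require_tls=False)` gets an error instead of an insecure database. In a real setup, the same contract would live in the variables and defaults of the Terraform module.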
We reduce complexity for every team, as illustrated by the following image.
Teams can focus on building great products. They no longer spend a significant amount of time on non-product tasks.
But the non-product part is not reduced to zero. There are a couple of reasons for this. First, the teams still have to use, set up and integrate with the platform products. This does not go away magically. Secondly, we want the teams to work on non-product tasks. No, this is not a contradiction, as we will see a few lines below.
Treating feature teams as customers
This sounds easy, but it is actually hard to get right. We should be aware of two things:
- Platform teams are not fix-it-fast tiger-teams
- and the feature teams are the platform team’s customers.
The tiger-team trap
The members of the platform team are experts in their fields. It is tempting to reach out to them whenever a problem needs fixing. This can be a political or power issue depending on the organisation. If the platform team members are hijacked for non-platform work, then platform development will suffer, and so will the feature teams that are waiting for new platform products.
Clear ownership can help here. The responsibilities of the platform team must be clear to everybody. The platform team is in that sense like any other feature team. The only difference is that the platform team’s customers are internal to the organisation.
Building for the customers
The platform team has a clear customer-producer relation. Feature teams use the platform products. Consider the next illustration.
The feature teams may have feature requests. They may have change requests. However, it is up to the platform team to plan, prioritize and implement their very own backlog. Stability in planning is as important to the platform team as it is to any feature team. Introducing a platform team makes little sense if the organisation cannot guarantee this working mode.
But there is also the fact that the platform team builds products for its customers!
Didn’t we already discuss this? No. The platform team builds internal products for the feature teams. It must not be an ivory tower building abstractions and products that nobody wants or needs. If the feature teams dislike the platform products, avoid them or work around them, then the platform team must go back to the drawing board. They must include the feature teams in their planning and product design.
Again, the platform team treats the feature teams as we would treat any other customer.
The hidden 2-speed-IT
Now that we have established the way platform teams can work efficiently, let’s discuss a problem often encountered when “special” teams are introduced.
The goal of a platform team is to build efficient platform products.
The technology around these products is usually modern and associated with “DevOps” culture and mentality. For example, the platform team works on GitHub Actions and serverless deployment pipelines. These tools tend to be in the spotlight of developer attention.
Compare this to a feature team. They might be using Spring Boot and React. Great frameworks, but nothing that will break the Twitter timeline, at least not at the time of this writing. FOMO is a thing.
Everybody wants to work with Kubernetes, because having that in a CV is a career booster at the moment. But these exciting tools are owned by the platform team. Again, we end up with a form of 2-speed-IT.
The platform team owns the cool new technology and the feature team is trapped in “only” delivering business functionality.
Innovation on all levels
We need not end up with a toxic 2-speed-IT setup. First of all, the platform team does not arise out of nothing. Let’s consider the following illustration.
When we start our cloud journey, we only have feature teams. We do not know what our best practices will be. We have to try different approaches to the same challenges. Only after some time, a couple of months or so, do we know what our approach is, and then we introduce a platform team. So, the platform team is not an alien part of our organisation. It arises as part of our development.
The other thing that will help avoid a 2-speed-IT is to allow innovation on all levels. Again, let’s consider an illustration.
Suppose the platform team offers “CI/CD products”. These could be buildpacks and GitLab templates for Node and Kotlin.
Team A wants and needs Go for their development.
Instead of waiting for the platform team, team A goes ahead and builds what they need. They create a buildpack and a CI/CD template and continue developing. Once they are content with their solution, they can offer it to the platform team. Finally, the platform team decides if they want to offer Go tooling as part of their platform. They decide if they want to take ownership.
The same inner-source approach can be applied to all platform products. If a team needs a change or extension, then they are allowed to send a pull request to the platform team. Everybody is allowed to innovate.
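The ownership hand-over in this flow can be written down as a tiny model. This is only a sketch of the process described above, not a real tool: `PlatformCatalog`, `propose`, `adopt` and the entry name `go-buildpack` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    TEAM = "team-owned"
    PLATFORM = "platform-owned"


@dataclass
class CatalogEntry:
    name: str
    contributed_by: str
    ownership: Ownership = Ownership.TEAM


class PlatformCatalog:
    """Hypothetical registry of tooling that teams offer to the platform."""

    def __init__(self):
        self._entries = {}

    def propose(self, name, team):
        """A feature team offers a solution it already uses; it stays team-owned."""
        entry = CatalogEntry(name=name, contributed_by=team)
        self._entries[name] = entry
        return entry

    def adopt(self, name):
        """The platform team decides to take ownership of a proposed solution."""
        entry = self._entries[name]
        entry.ownership = Ownership.PLATFORM
        return entry

    def platform_products(self):
        """Names of all solutions the platform team has taken ownership of."""
        return sorted(
            name
            for name, entry in self._entries.items()
            if entry.ownership is Ownership.PLATFORM
        )


if __name__ == "__main__":
    catalog = PlatformCatalog()
    catalog.propose("go-buildpack", team="A")  # team A keeps working meanwhile
    catalog.adopt("go-buildpack")              # the platform team's own decision
    print(catalog.platform_products())
```

The point of the two separate steps is that proposing never blocks the feature team, and adopting remains a decision of the platform team.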
Conclusion
Transforming an IT organisation onto the public cloud can be a daunting task. It involves architectural and technological changes. Even more critical are the changes to culture, organisation and processes.
The good news is that we do not have to transform in a Big Bang. We can adopt the cloud step by step, as illustrated by the next diagram.
At every level, we can discuss whether we need further transformation, and only then execute it step by step.
In the end, everything we do is about efficiency. And that means we need to keep complexity in check. We need mechanisms like a platform team to reduce accidental, superfluous complexity.
As we have discussed in these two short articles: people are key.
If we want change, then we need to include everybody. We should be open to ideas and insights. Only then will we improve and succeed. Adopting an open or hidden 2-speed-IT approach will prove to be a bottleneck and should be avoided. If we are transparent and let the best ideas win, then everybody is engaged.
We end up with a better organisation on every level.