Platform or Function-as-a-Service (FaaS) teams are the perpetual go-betweens of the enterprise IT world. They provide a platform that is used by other applications or services within the IT infrastructure and therefore have many more dependent end-users than a single application team. If platform teams are successful then they can try to scale by adding people, hardware and bandwidth, which might relieve the pressure of success, but they cannot always easily change and adapt their architecture or working model while their service is ‘in the air’. When there is pressure to keep delivering the service there is less and less incentive to make changes that introduce uncertainty and potential downtime. Therefore these teams can quickly become victims of their own success. Platform teams are grateful to have customers but can become paralysed by their inability to fix any of their longer-term problems because of the risk associated with doing so.
So when we come to introducing change to platform teams, how important are architecture and culture to its success?
Also, how do we approach change in the face of Agile, Scrum and CI/CD? How do we use up-to-date frameworks and technologies that could break everything?
The Product vs the Platform
So what is a platform? Let’s start with a definition:
A platform team builds a product that is used by other teams to build or enable their products.
Examples of platform teams could be a centralised database management team (which provides databases and services), an ETL service team (which provides data migration or data quality services) or an Identity and Access Management team (which provides authentication or authorisation models for teams).
As we have seen, these teams can have a different set of requirements in comparison with application teams, and they can sometimes find it harder to see themselves working in a more Agile manner, using Scrum or even integrating CI/CD principles and other tools into their delivery mechanism. This comes about through operational overhead and the pressure to deliver to their clients but also through architectural decisions made years earlier.
Platform teams exist because of architectural as well as business decisions made in the organisation: to centralise and retain specific knowledge, to reduce licence costs through centralisation or to increase the application of enterprise-wide standards in the use of technology. They provide a good next step for any enterprise seeking to create a specific flavour of service in their organisation.
A well-established platform team may come to see itself as living outside the norms of the rest of the organisation. The team creates a service for which it is responsible and therefore its members are masters of their own destiny. This happens for many reasons: it could be because of legacy architectural decisions, or because the team members don’t think of themselves as developers or maintainers of a system but rather as “infrastructure people”. However, any team that produces a consumable service and uses a flexible configuration to do this (in the form of data, scripts or even schedules) should firstly seek to control that configuration through source control, then test it automatically for quality and functionality, and finally deploy it automatically to the target systems.
In this document we’ll discuss the main points to address in understanding what a platform team is trying to achieve and how we can view a platform as a software product that needs controlling and automating in the same way that any application does.
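As a minimal sketch of what automatically testing configuration for quality could look like, the check below validates a single job definition stored as JSON. The schema (the `name`, `owner` and `schedule` keys) and the function name are assumptions made for illustration only; a real platform would substitute its own configuration format and rules.

```python
import json

# Hypothetical schema: the keys every job definition must contain.
REQUIRED_KEYS = {"name", "owner", "schedule"}


def validate_job_config(text):
    """Return a list of problems found in one JSON job definition."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]
    # Report missing keys in a stable, alphabetical order.
    missing = sorted(REQUIRED_KEYS - set(config))
    return [f"missing key: {key}" for key in missing]


good = '{"name": "nightly-etl", "owner": "data-team", "schedule": "0 2 * * *"}'
bad = '{"name": "nightly-etl"}'

print(validate_job_config(good))
print(validate_job_config(bad))
```

Run against every changed file on each commit, a check like this turns “we think the configuration is fine” into something a Continuous Integration process can confirm before anything is deployed.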
Before you can discuss any technical solutions for delivering the product it’s vital to define what the product is. Therefore:
- Know your product. It’s important to have a clear product vision which defines your platform or FaaS design. Also define the service which your customers will consume and make sure everyone in the team knows it.
Once you have your product defined then decide how you will know if it improves:
- Clearly define how you’re going to measure improvements – specify a manner in which you can derive benefit from implementing Agile, Scrum or CI/CD techniques.
This could be by counting incidents and outages, tracking mean-time-to-recovery (MTTR) or using any other quantitative measure that will help you get a feel for any improvement. Thirdly:
- Ensure you know the difference between Automation of your platform and Automation in your platform. If your product is a self-service portal for customers then that is an automated platform (automation is in your platform) but if you patch this product manually then your platform is not automated (your platform deployment is not automated and could be unreliable).
Next step – go for it:
- Don’t delay in starting! Start somewhere, start small and keep moving in every sprint.
It’s important not to make big changes straight away but to make changes every sprint and stay dedicated to improving the platform by investing in it.
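To make the measurement step above concrete, here is a minimal sketch of computing MTTR per sprint from an incident log. The incident data and the `(detected, recovered)` tuple layout are hypothetical; in practice the timestamps would come from whatever incident tooling the team already uses.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_recovery(incidents):
    """Average time between detection and recovery, or None if no incidents."""
    if not incidents:
        return None
    durations = [
        (recovered - detected).total_seconds()
        for detected, recovered in incidents
    ]
    return timedelta(seconds=mean(durations))


# Hypothetical incident logs: (detected, recovered) per incident.
sprint_1 = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 11, 0)),
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 15, 0)),
]
sprint_2 = [
    (datetime(2024, 1, 17, 10, 0), datetime(2024, 1, 17, 10, 30)),
]

for label, log in (("sprint 1", sprint_1), ("sprint 2", sprint_2)):
    print(label, len(log), "incidents, MTTR:", mean_time_to_recovery(log))
```

Comparing the same figures sprint after sprint gives the team, and its customers, evidence of whether the investment in the platform is paying off.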
When is a platform a platform?
If you still have difficulty imagining your platform as a product then consider one of the many consumer services we use, such as Spotify, Apple Music, Netflix or Amazon Prime. These are some of the many music and television delivery services which we consume via apps and smart devices. They are of course enormously complex and scalable software systems which require continuous maintenance and upgrading. These are platforms with millions of customers and yet they can roll out changes quickly and safely across thousands of servers. How do they do this? Through organisation, automation, architecture and dedicated professionals who are rightfully proud of the infrastructure they provide. So how do you get that feeling of ownership and pride in your platform teams?
Know your customer
Platform teams deliver a service to one or more internal parties who themselves will have consumers. Successful platform teams can come under a large amount of pressure in terms of ops overheads because they need to service their customers’ needs and requests. Consequently successful platform teams find less time to invest in improving their own service.
While it’s always important to ensure that customers are happy right now it’s also important to improve the service towards the customer in the future. If there is no time and effort investment in the platform itself then the service itself degrades and customers become less happy.
Sometimes then it’s important for platform teams to say to customers “You need to be patient with us while we upgrade our service, we think you’ll be happy with the results” and then provide proof (through measurements) that things have been good and are getting better.
But we don’t need to change…
Here are some typical reactions from a platform team when challenged to improve their service by changing their way of working or through automation.
“We already have automation” – Remember not to confuse your product with your platform. Just because you have a self-service solution for your customers this doesn’t mean that your platform is automated. When you make changes to your platform do you have confidence that they will always work? Do you keep your configuration and scripts in source control? Do you run quality checks on them as part of a Continuous Integration process? Do you run unit tests against these changes? Can you automatically deploy to your infrastructure with your intended changes? If you can answer Yes to those questions then congratulations you’re already at a good level of CI/CD with your platform. If not then think again about what it means to automate your Platform.
“We are too busy with ops” – Ok but what if you didn’t have to be as busy with ops? What are the causes of your ops overheads? Have you analysed the root cause of the overheads? Do you know where you can automate these overheads away? Can automation of deployment or testing help? If you don’t analyse your issues for patterns then you won’t be able to understand, improve and grow your platform.
“We have a big migration project that we need to do now. After that perhaps we have time for automation” – Your platform has to grow and change over time, customers make new demands upon it and you want to serve them as best you can. If you decide to prioritise new features over stabilising and automating your existing platform then you are going to build existing technical debt into the new system. This is an avoidable disaster in waiting because if you are adding a new chunk of functionality without automation baked-in then you actually risk increasing your ops overheads rather than reducing them.
“Our customers put us under too much pressure” – Well what will your customers say when you’ve added more functionality but the quality of that service is no better than your existing service? Your customers will thank you in six months’ time if you have not only improved your baseline service efficiency but also added new features which are automatically tested and deployed at a high level of quality.
“We don’t have the skills to do this now” – Then find the time to grow the skills – prioritise this for every sprint and ensure that you value the skills you have in your team.
“We don’t have the tools to do this now” – The tools are there for you if you want them, all you need to do is ask for them or even just find them yourself and start doing it.
“Ok but perhaps we just don’t want to change.. we like things the way they are” – Then ask yourself what happens if you don’t continuously improve your service. Is it going to get better by itself? If you are happy, are your customers happy? Do you talk to them regularly and do they give you honest answers? Is it because you are comfortable with the way of working or happy with the way things are going?
These questions can be difficult to answer and sometimes they can be difficult to ask. If you’d like to help your platform team improve then make sure you have the necessary tools. Some people aren’t comfortable with automation because essentially it could mean less work for them. Automation, however, is mainly about improving the quality of the product and the reliability of the service. If a person is invested in the idea of the product then it’s a simpler proposition.
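For the “too busy with ops” reaction in particular, analysing overheads for patterns can start very small. The sketch below counts the root causes recorded against a handful of tickets; the ticket records and the `cause` field are hypothetical stand-ins for an export from whatever ticketing system the team uses.

```python
from collections import Counter


def top_causes(tickets, n=3):
    """Return the n most frequent root causes as (cause, count) pairs."""
    counts = Counter(ticket["cause"] for ticket in tickets)
    return counts.most_common(n)


# Hypothetical export of last sprint's ops tickets.
tickets = [
    {"id": 101, "cause": "manual deployment"},
    {"id": 102, "cause": "expired certificate"},
    {"id": 103, "cause": "manual deployment"},
    {"id": 104, "cause": "disk full"},
    {"id": 105, "cause": "expired certificate"},
    {"id": 106, "cause": "manual deployment"},
]

for cause, count in top_causes(tickets, n=2):
    print(f"{cause}: {count}")
```

Even a rough tally like this points to where automation of deployment or testing would remove the most overhead first.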
Scrum vs Continuous Delivery
CI/CD requires a conscious effort on the part of the team to commit to improvement. It takes time and energy to make it happen and that energy needs to come from the whole team and not just a part of it.
Scrum and Agile can help solve organisational challenges in the face of complex technical challenges. By combining these principles and methods with CI/CD you can improve the customer experience of your platform and strengthen the team’s feeling of ownership of the product.
Platform products might not be in-and-of-themselves interesting or ‘sexy’ products but they often exist for sound business reasons which aren’t necessarily appreciated by those who consume them or even those who work on them. We have discussed how platform teams can motivate themselves to keep improving their platform and service for the benefit of the wider organisation.
I’m interested to hear about thoughts and specific challenges which organisations have encountered during the implementation of continuous delivery in extant teams. Note that I didn’t mention the word ‘cloud’ once but we can go there too..