The Human Component of DevOps: Conway's Law


I've been part of large and small DevOps implementations, and none of them were particularly painful, regardless of team size. Everyone seemed open to change and well versed in software development best practices. Simple, efficient solutions worked like an elegantly orchestrated puzzle where every piece fit in its place. But one project changed this.

I was part of a transformation project at a large retail bank where modern software development practices were not being followed, and the technical team seemed unaware of them even though it included very senior engineers. The setup was complex: more than 200 people, combining four staff-augmentation companies with the bank's own staff. This has been, by far, the most painful DevOps implementation I've been part of. I don't usually have to convince engineers about the best approach for building software; it's well known and, to some extent, common sense. But that wasn't the only problem.

Things like a simple git workflow and quality gates based on code-quality metrics seem like good things to embed and automate in your software delivery pipeline. Not here: I faced a lot of resistance. Trunk-Based Development? Don't even mention it; it was "too dangerous".

It took over two weeks of workshops, discussions, negotiations, talks, explanations and a lot of patience to finally get somewhere. In the end, this is what was agreed:

  • Branch-per-environment strategy
  • SonarQube to assess code and block merges when quality was not up to the mark
  • Iterative test coverage increases every sprint until 80% was reached

And many other items that are not worth going through.
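The iterative coverage target in the list above can be read as a ratchet: the required coverage rises a little each sprint until it reaches 80%. The following is a minimal sketch of that idea, not the configuration we actually used. The starting baseline (40%) and the per-sprint step (5 points) are assumptions chosen for illustration; only the 80% ceiling comes from the agreement.

```python
def coverage_target(sprint: int, baseline: float,
                    step: float = 5.0, ceiling: float = 80.0) -> float:
    """Required test coverage (percent) for a given sprint.

    `baseline` and `step` are illustrative assumptions; the only figure
    taken from the agreement is the 80% ceiling. Sprint numbering starts at 0.
    """
    return min(ceiling, baseline + step * sprint)


def gate_passes(measured: float, sprint: int, baseline: float) -> bool:
    """True when the measured coverage meets the target for this sprint."""
    return measured >= coverage_target(sprint, baseline)


if __name__ == "__main__":
    # Hypothetical team starting at 40% coverage.
    for sprint in range(0, 10, 3):
        print(sprint, coverage_target(sprint, baseline=40.0))
```

A pipeline job could call something like `gate_passes` with the coverage figure reported by the quality tool and fail the merge when it returns `False`.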

We implemented a fully fledged CI/CD platform using GitLab CI, SonarQube, OpenShift and some other herbs in almost no time. We tested it with sample applications as well as a small real application, which built and deployed to OpenShift in minutes instead of hours. Rolling it out was not as hard as we thought, but as the teams started to use it on a day-to-day basis, we faced one more challenge:

The team structure. Needless to say, we got hit by Conway's Law.

💡
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure -- Melvin Conway

The teams were just too big. We thought it was common sense for each team lead or product owner to slice them further by sub-domain or solution area, but this did not happen, so the systems they produced were also big and bloated. In the end, each team produced its own monolith catering for its own domain.

To add to this, most of the product owners didn't have much experience in software delivery, and the tasks were not sliced or planned in a way that minimized overlap between developers. As a result, every developer on each team ended up stepping on someone else's toes in one way or another, leading to huge conflicts in the version control system. Let's also remember they had to maintain one branch for each environment they deployed to: development, staging, pre-production and production. That is a total of 4 branches that had to be kept consistent and in sync.

Because the code base was big and the teams were also large, coordinating integration testing was a nightmare: stopping deployments to the development or staging environments meant stopping people from merging their pull requests. On a small team producing a small microservice that's not a big deal, but on a larger team with some engineers working offshore it's much more difficult to coordinate, which adds one more layer of complexity to an already complex process. Now imagine working on a hotfix that must be back-propagated from pre-production or production to the test branches and then to all of the feature branches. Can you see how a few apparently inoffensive bad decisions make everything go wrong?
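To put a rough number on that hotfix scenario, here is a small sketch that counts the merges needed to bring every branch back in sync. It is a simplified model, not a measurement from the project: it assumes one back-merge into each earlier environment branch plus one merge into every open feature branch, and the count of 30 open feature branches is a made-up figure.

```python
# Environment branches in promotion order, as described in this post.
ENV_BRANCHES = ["development", "staging", "pre-production", "production"]


def hotfix_sync_merges(env_branches: list[str], fixed_on: str,
                       open_feature_branches: int) -> int:
    """Count merges needed to propagate a hotfix made on `fixed_on`.

    Simplified model (an assumption): one back-merge into each environment
    branch that comes before `fixed_on`, plus one merge into every open
    feature branch.
    """
    back_merges = env_branches.index(fixed_on)
    return back_merges + open_feature_branches


if __name__ == "__main__":
    # Hypothetical numbers: 30 open feature branches on a large team.
    print(hotfix_sync_merges(ENV_BRANCHES, "production", 30))
    # Single long-lived branch with a couple of short-lived feature branches.
    print(hotfix_sync_merges(["main"], "main", 2))
```

The model ignores merge conflicts entirely, which in practice were the expensive part. Even so, it shows how the cost grows with both the number of long-lived branches and the number of open feature branches.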

Proper team organization and better task breakdown not only make it easier to manage the whole program from a people perspective, but also enable more efficient software development operations.

I don't like to say it but... we told you so...

From the beginning of the workshops and discussions we advocated for a simpler branching strategy, because software operations at large scale can already get quite complex and there's no need to add more complications with an overengineered branching approach. The problems they feared when we mentioned "trunk based development" were the same problems they ran into after some time using the approach they chose.

It seems counterintuitive but a complex people setup doesn't imply we also need an equally complicated tech setup. The simpler you can keep your operations, the better your teams will run.

After this project I joined another team building a system to encourage internal collaboration inside the company. It was a very small team (3 frontend and 2 backend engineers) with a very lean DevOps setup (Staging / Production environments with Trunk Based Development), and the number of issues we've had in 4 months of development can be counted on the fingers of one hand, actually half a hand.

If I had to redo it from scratch

Better team design

I would put more emphasis on the overall architecture from the beginning and use it as an input for the organizational structure design. Each team should be responsible for a specific area or service of the product and build microservices to implement its functionality.

This alignment between organizational and architectural boundaries simplifies communication and enhances focus, leading to more maintainable and scalable systems.
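One lightweight way to keep that alignment visible is to write down which team owns which service and flag anything with more than one owner. The sketch below illustrates the idea only; the team and service names are hypothetical and were not part of the project.

```python
from collections import defaultdict


def shared_services(team_services: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the services claimed by more than one team.

    Maps each such service to the sorted list of teams that claim it.
    An empty result means every service has a single owning team.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for team, services in team_services.items():
        for service in services:
            owners[service].append(team)
    return {s: sorted(t) for s, t in owners.items() if len(t) > 1}


if __name__ == "__main__":
    # Hypothetical ownership map for illustration only.
    ownership = {
        "payments": ["transfers", "cards"],
        "onboarding": ["kyc", "cards"],
    }
    print(shared_services(ownership))
```

A service that shows up in the result is a likely place for the kind of overlap and merge conflicts described earlier.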

Encourage cross-functional teams

To further simplify operations and enhance agility, teams should be cross-functional. Each team should have its own developers, testers, DevOps engineers and architects, empowered to manage their own services from design to deployment. This setup reduces dependencies and integration challenges, as teams can develop, test, and deploy independently of others while holding periodic sync-ups to ensure the overall vision doesn't drift.

Better communication with the rest of the world

This point is very specific to this project. The team building the mobile app (backends and apps) was isolated from the rest of the company: it covered only development, product and overall delivery management. Infrastructure was handled by an external team sitting on a different floor, with different requirements. The same was true for security, acceptance testing and deployments to production. Everything other than development was governed by other teams, which made delivery super painful. I would have insisted on having a representative from every team we depended on embedded in our team, giving us a better communication interface with the rest of the technical teams within the company.

Simplified DevOps processes

We should probably have spent more time convincing the client's technical leadership to adopt a simpler approach, and run more in-depth workshops to showcase the kinds of problems and complications that such complex strategies can cause. Sometimes having less control and more flexibility leads to more efficient processes.

By redesigning the organizational structures to be more in harmony with the architecture of the software they develop, companies can create systems that are not only simpler and more efficient but also more aligned with business goals. This strategic alignment helps in minimizing complexities and maximizing the benefits of DevOps practices, leading to smoother operations and more resilient software systems. This approach not only addresses the direct implications of Conway's Law but also leverages it to enhance organizational effectiveness and software quality.