Goldilocks SaaS control planes
I find that reading the Building Multi-Tenant Architectures book leads to a lot of nodding in agreement, with the odd raised eyebrow. One moment that struck me relatively early on is that Tod Golding is not letting the reader get carried away with the features of the SaaS system they want to build – the control plane of the system is fundamental to actually having a sustainable product. We’re in violent agreement there. What led to the raised eyebrow is how easily the reader can go down a rabbit hole of overshooting the non-functionals of the control plane by building it with the same technology and mindset as the application plane itself.
Of course, there are cases where the control plane and its various components deal with massive volumes or have challenging latency requirements. Those cases will require a carefully crafted, high-performance control plane, sure… Most likely, there are pockets of demanding functionality in the control plane (e.g., the hot path through identity management, application observability, etc.), but most of the control plane deals with relatively small volumes of data with infrequent changes. Not everyone intends to scale to the heights of large-scale business-to-consumer (B2C) SaaS, so building a control plane implementation that meets those needs out of the box is gross overkill.
Now, there needs to be a control plane, and the handoff between control and application planes needs to be well-enough defined to allow implementation changes in the future… And that’s probably it (until the system grows). The initial implementation doesn’t need to be built in the highest-performance programming language or use the fanciest storage technology when the user and tenant volumes in the control plane don’t justify them. In fact, early on, I would argue that simplicity and diagnosability are the values to cultivate above all.
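To make the "well-enough defined handoff" idea concrete, here is a minimal sketch in Python. Every name in it (`TenantConfig`, `TenantDirectory`, `InMemoryTenantDirectory`, `handle_request`) is hypothetical and chosen for illustration; it is not taken from the book or from any particular product. The point is only that the application plane depends on a narrow interface, so a simple implementation can later be swapped for something heavier without touching the callers.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-tenant settings the application plane needs."""
    tenant_id: str
    tier: str
    enabled: bool


class TenantDirectory(Protocol):
    """The handoff: the only thing the application plane knows about."""

    def get(self, tenant_id: str) -> TenantConfig: ...


class InMemoryTenantDirectory:
    """Deliberately simple implementation; could be replaced by a
    database-backed or service-backed one behind the same interface."""

    def __init__(self, tenants: list[TenantConfig]) -> None:
        self._tenants = {t.tenant_id: t for t in tenants}

    def get(self, tenant_id: str) -> TenantConfig:
        return self._tenants[tenant_id]


def handle_request(directory: TenantDirectory, tenant_id: str) -> str:
    """Application-plane code that only sees the TenantDirectory interface."""
    tenant = directory.get(tenant_id)
    if not tenant.enabled:
        return "rejected"
    return f"served:{tenant.tier}"
```

Because `handle_request` only relies on the `get` method, replacing the in-memory directory later is a change on the control plane side alone.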
As an example, for the entirety of my involvement in the SaaS rapid solution delivery platform at my last job, we were able to make do with human-readable files in a Git repo and some scripting as its main control plane. Heretical as that may sound, it was enough for current and projected scale. Was it fancy? No… But the desired state of the system was right there for the entire team to see when there were problems, plus the pull requests on the repo gave us relatively sophisticated review and historical auditing functionality without major up-front software development. I wouldn’t necessarily recommend the approach for everyone, but it does illustrate that you can often meet your control plane needs without spending a colossal amount of money or completely capitulating to a third party’s system (a.k.a., “do what the cloud provider tells us to do”).
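For a flavour of what "files in a Git repo and some scripting" can look like, here is a rough sketch in Python. It is not the setup described above: the `tenants/*.json` layout, the function names, and the shape of the "actual" state are all assumptions made for illustration. The script reads the desired state from files in a checked-out repo and lists the changes needed to bring the running system in line with it.

```python
import json
from pathlib import Path


def load_desired_state(repo_dir: Path) -> dict[str, dict]:
    """Read one JSON file per tenant from <repo_dir>/tenants/ (assumed layout).

    The file name without its extension is used as the tenant ID.
    """
    desired = {}
    for path in sorted(repo_dir.glob("tenants/*.json")):
        desired[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return desired


def plan_changes(desired: dict[str, dict], actual: dict[str, dict]) -> list[str]:
    """Compare desired and actual state and list the actions required."""
    plan = []
    for tenant_id in sorted(desired.keys() | actual.keys()):
        if tenant_id not in actual:
            plan.append(f"create {tenant_id}")
        elif tenant_id not in desired:
            plan.append(f"remove {tenant_id}")
        elif desired[tenant_id] != actual[tenant_id]:
            plan.append(f"update {tenant_id}")
    return plan
```

How the actual state is gathered and how each planned action is carried out is left open here; in a setup like this, the plan itself is often the most useful artefact to look at when something goes wrong.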