When it comes to budgeting for cloud software, it's important to have some solid data about the cost of deploying a "zero-feature" update, the likelihood of encountering latent bugs, and the level of effort required for simple developer overhead and housekeeping. While there's some good data and solid advice out there from the Standish Group, as I mentioned in a recent article, I haven't seen any data that's particularly modern or really focused on the harsh realities of cloud software development.
Why can't we just extrapolate statistics from 50 years of on-premises software development? Cloud development really is different:
- It's loosely coupled. Web services act as components. That loose coupling is a huge benefit, but it means the components you rely on may evolve without your knowledge. All of a sudden there are new, often unspecified behaviors that, though they may not be bugs themselves, will certainly contribute to them.
- Cloud code is multilingual. It's not unusual for a single application to leverage four or more languages. This means no single tool can comprehensively debug your own application, let alone the other Web services you may depend on.
- Cloud code tends to be poorly documented, as I wrote recently. In fact, the Clean Code guys actually advocate no comments at all.
- Logging and troubleshooting data is skimpy, and typically has to be enabled for (brief) dedicated periods.
- Finally, the system is a moving target, not a fixed one. Rules and workflows seem to evolve endlessly. System administrators can change thresholds, constraints and allowable values in ways that make code misbehave.
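The first point, evolving dependencies, is worth making concrete. Here is a minimal sketch (in Python, with invented names; these are not real Salesforce or vendor APIs) of a defensive pattern for it: parse a web service's response strictly, so that an upstream change surfaces as an explicit error instead of silent misbehavior downstream.

```python
# Hypothetical sketch: a consumer of a web service that validates the
# response against the contract it was written for. All names are invented.
from dataclasses import dataclass


@dataclass
class AccountRecord:
    id: str
    name: str


def parse_account(payload: dict) -> AccountRecord:
    # Strict parsing: reject fields we don't recognize, so an unannounced
    # upstream change is caught at the boundary rather than deep in the app.
    known = {"id", "name"}
    extra = set(payload) - known
    if extra:
        raise ValueError(f"upstream contract changed, unexpected fields: {sorted(extra)}")
    return AccountRecord(id=payload["id"], name=payload["name"])


# At T0 the service returns exactly the documented fields.
record = parse_account({"id": "001", "name": "Acme"})

# Later the service silently adds a field; the strict parser flags it.
try:
    parse_account({"id": "001", "name": "Acme", "tier": "Gold"})
except ValueError as e:
    print(e)
```

Whether you fail loudly like this or log-and-tolerate the extra fields is a judgment call; the point is to make the dependency's evolution visible.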
What this means is that deploying a "zero-functionality" release (just adding some debug statements, for example) can trip over a lot of bugs, meaning hours of hilarity for your developers.
While the generalities of this article hold true for almost any cloud platform, the specifics here are based on Salesforce.com experience. I welcome commentary and amendments to this content based on Amazon Web Services, Microsoft Azure, Google and other cloud application development experience.
What's the Probability of a Bug Evolving?
The idea here is that, at T0, the system runs properly and all unit tests pass. At T1 through TN, a sys admin changes the configuration of the system(s), which may cause new behaviors to appear. What are the relevant configuration changes? Sys admins can do a surprising number of things that provoke system issues, even when there are no changes to code, including modifications to the following:
- Field constraints
- Picklist values
- Record types
- Page layouts
- Workflow thresholds and formulae
- Workflow field updates
- Validation rules
- Lead and case routing rules
- Formula calculations
- Field permissions
- Object permissions
- User roles
- User profiles
- User groups
- Custom Settings
- iFrame "drop ins"
- Creation of new objects
- Installation or upgrade of add-ins (installed packages)
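To see how a change from that list can break code that was passing at T0, here is a minimal sketch (plain Python with invented names, not Salesforce APIs) of the picklist-value case: the platform validates saves against live configuration, so an admin renaming a value makes previously tested code fail at runtime with no code change at all.

```python
# Hypothetical sketch: simulating how an admin-side configuration change
# (here, a picklist value) breaks code that passed all tests at T0.

# At T0, the "org" allows these picklist values for Lead.Status.
picklist_config = {"Lead.Status": {"Open", "Working", "Qualified"}}


def save_lead(status, config):
    """Mimics a platform save that validates against live configuration."""
    allowed = config["Lead.Status"]
    if status not in allowed:
        raise ValueError(f"INVALID_PICKLIST_VALUE: {status!r} not in {sorted(allowed)}")
    return {"Status": status}


# T0: the code hard-codes "Working"; unit tests pass.
assert save_lead("Working", picklist_config)["Status"] == "Working"

# T1: a sys admin renames the picklist value; no code is deployed.
picklist_config["Lead.Status"] = {"Open", "In Progress", "Qualified"}

# The same code now fails at runtime, though nothing in the codebase changed.
try:
    save_lead("Working", picklist_config)
except ValueError as e:
    print(e)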