Identify where you are spending low-value time on repetitive tasks such as code deploys, disk space cleanup, and process restarts. Automate those first, and reinvest the time freed up in further orchestration. Orchestration code needs to be thought through carefully, since it can have bugs and unforeseen consequences, but you will find that as more of your system is provisioned and adapted through orchestration, fewer issues arise.
As you get more orchestration in place, you can look at tasks that you may not even set alerts for today because they are too much work for one person to stay on top of. Removing unused capacity, for example, is often a quarterly process when done manually. With automation configured, you can check all your VMs to see which have not been logged into for a certain amount of time, then snapshot and stop them according to a predefined policy. This frees up capacity, which in turn saves time otherwise spent in the never-ending search for spare capacity.
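The idle-VM policy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference to any particular platform: the `VM` record, the 90-day threshold, and the `snapshot` and `stop` callables are all hypothetical stand-ins for whatever your virtualization or cloud API provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VM:
    # Hypothetical inventory record; a real one would come from your platform's API.
    name: str
    last_login: datetime
    running: bool = True


def select_idle_vms(vms, now, max_idle_days=90):
    """Return running VMs with no login since the cutoff (threshold is an assumed policy)."""
    cutoff = now - timedelta(days=max_idle_days)
    return [vm for vm in vms if vm.running and vm.last_login < cutoff]


def reclaim(vms, now, snapshot, stop, max_idle_days=90):
    """Snapshot, then stop, each idle VM. `snapshot` and `stop` are caller-supplied."""
    reclaimed = []
    for vm in select_idle_vms(vms, now, max_idle_days):
        snapshot(vm)  # preserve state first so the VM can be restored later
        stop(vm)
        vm.running = False
        reclaimed.append(vm.name)
    return reclaimed
```

Passing `snapshot` and `stop` in as functions keeps the policy separate from the platform calls, which also makes a dry run easy: supply functions that only log what they would do.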
Not all orchestration has to happen without human intervention; you might require sign-off for major orchestration events. But consider whether it's easier and more repeatable to answer an SMS that says "DC1 is down, shift services to DC2 [Y/N]" and have the rest happen via predefined automation, or to execute a manual process to perform the same task.
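A human-approved failover of this kind might look like the following sketch. It assumes the SMS gateway delivers the operator's reply as a plain string and that the runbook is an ordered list of functions; both the function names and the reply handling are hypothetical, chosen only to show the approval gate.

```python
def parse_reply(text):
    """Map an operator's reply to True (approve), False (decline), or None (unclear)."""
    answer = text.strip().upper()
    if answer in ("Y", "YES"):
        return True
    if answer in ("N", "NO"):
        return False
    return None


def handle_failover_reply(text, runbook):
    """Run the predefined runbook steps only after an explicit approval."""
    decision = parse_reply(text)
    if decision is None:
        # Anything ambiguous is treated as "do nothing" rather than guessed at.
        return "unrecognized reply, no action taken"
    if not decision:
        return "failover declined"
    completed = []
    for step in runbook:
        step()  # each step is a predefined, already-tested automation task
        completed.append(step.__name__)
    return "failover complete: " + ", ".join(completed)
```

The person makes one decision; the sequence of steps that follows is the same every time, which is where the repeatability comes from.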
Orchestration can also be made proactive: if your entire topology is monitored, you can use automation to shift capacity or make routing decisions well before an issue is encountered. If you get a spike in ERP system usage on Monday mornings, proactive orchestration can add more application servers ahead of time instead of waiting for performance thresholds to be crossed.
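To make the Monday-morning example concrete, here is a rough sketch of schedule-based scaling. The baseline of 4 servers, the peak of 8, the 07:00 to 11:00 window, and the 30-minute lead time are illustrative assumptions; in practice they would be derived from your monitoring history.

```python
from datetime import datetime, timedelta

BASELINE_SERVERS = 4  # assumed normal pool size

# Assumed known peaks: (weekday, start hour, end hour, servers), with Monday = 0.
PEAK_WINDOWS = [
    (0, 7, 11, 8),
]


def desired_servers(now, lead_minutes=30):
    """Return the target pool size, scaling up `lead_minutes` before a peak begins."""
    look_ahead = now + timedelta(minutes=lead_minutes)
    target = BASELINE_SERVERS
    for weekday, start, end, servers in PEAK_WINDOWS:
        # Scale up if we are in the window now or will be within the lead time.
        for moment in (now, look_ahead):
            if moment.weekday() == weekday and start <= moment.hour < end:
                target = max(target, servers)
    return target
```

An orchestration job could call `desired_servers(datetime.now())` every few minutes and reconcile the actual pool against the result, while threshold-based scaling stays in place as a fallback for spikes the schedule does not anticipate.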
Network self-healing has been a "nice to have" goal for many years, but increasing data center complexity is making it more necessary and increasing sophistication in tooling is making it much more achievable.