Foundation Topics
The Well-Architected Framework
As previously mentioned, AWS, Microsoft Azure, and Google Cloud all have their own well-architected frameworks (WAFs) as guidance, broken up into essential cat-egories. I strongly recommend AWS’s guidance for workload deployment (and for preparing for the AWS Certified Solutions Architect – Associate exam; exam questions are based on the Reliability, Security, Performance Efficiency, and Cost Optimization pillars). AWS is in constant contact with its customers to evaluate what they are currently doing in the cloud, and how they are doing it. Customers are always asking for features and changes to be added, and AWS and other public cloud providers are happy to oblige; after all, they want to retain their customers and keep them happy.
New products and services may offer improvements, but only if they are the right fit for your workload and your business needs and requirements. Decisions should be based on the six pillars of the AWS Well-Architected Framework: Security, Reliability, Performance Efficiency, Cost Optimization, Operational Excellence, and Sustainability. Evaluating these pillars is necessary for both pre-deployment architecture design and during your workload solution’s lifecycle. Each of the WAF pillars is a subdiscipline of systems engineering in itself. Security engineering, reli-ability engineering, operational engineering, performance engineering, and cost and sustainability engineering are all areas of concern that customers need to keep on top of.
The AWS Well-Architected Framework has a number of relevant questions for each pillar that each customer should consider (see Figure 2-1). The end goal is to help you understand both the pros and cons of any decisions that you make when architecting workload successfully in the AWS cloud. The Well-Architected Framework contains best practices for designing reliable, secure, efficient, and cost-effective systems; however, you must carefully consider each best practice offered to see whether it applies. Listed best practices are suggestions, not decrees; final decisions are always left to each organization.
Figure 2-1 AWS Well-Architected Framework Security Pillar Questions
Before AWS launches a new cloud service, a focus on the desired operational excel-lence of the proposed service is discussed based on the overall requirements and priorities for the new service and the required business outcomes. There are certain aspects of operational excellence that are considered at the very start of the prepare phase, and there are continual tasks performed during the operation and manage-ment of the workload during its lifetime. However, one common thread is woven throughout the operational excellence pillar: performing detailed monitoring of all aspects of each workload during the prepare, operate, and evolve phases. There are four best practice areas for achieving and maintaining operational excellence:
- Organization: Completely understand the entire workload, team responsibili-ties, and defined business goals for business success for the new workload.
- Prepare: Understand the workload to be built, and closely monitor expected behaviors of the workload’s internal state and external components.
- Operate: Each workload’s success is measured by achieving both business and customer outcomes.
- Evolve: Continuously improve operations over time through learning and sharing the knowledge learned across all technical teams.
After operation excellence has been addressed, the next goal is to make each workload as secure as possible. Once security at all layers has been consid-ered and planned for, workload reliability is next addressed. Next up is the performance efficiency of the workload; performance should be as fast as required, based on the business requirements. How do you know when there is a security, reliability, performance, or operational issue? To paraphrase: “There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know.” Ultimately, the answer to what we don’t know always comes back to proactive monitoring. Without monitoring at every level of the workload, technical teams will not be able to monitor the operational health of each workload and won’t have enough information to be able to solve existing and unexpected problems. Let’s look at the goals of each pillar of the AWS Well-Architected Framework in brief.
Operational Excellence Pillar
Operational excellence isn’t a one-time effort that you achieve and are done with. Operational excellence is the repeated attempts to achieve greater outcomes from the technologies you choose and to improve your workload’s operational models, procedures, principles, patterns, and practices.
Once security, reliability, performance, and cost have been addressed, you will have hopefully achieved a measure of operational excellence for a period of time, perhaps six months, perhaps longer. Then AWS will introduce a new feature, or a new ser-vice that looks interesting; this will send you back into the testing and development phase once again. Perhaps you will find that one of the recently introduced new features will vastly improve how a current workload could be improved. This cycle of change and improvement will continue, forever. Take any new or improved fea-tures into account. Operational excellence guidance helps workloads successfully operate in the AWS cloud for the long term.
Operational excellence has been achieved when your workload is operating with just the right amount of security and reliability; the required performance is perfect for your current needs; there is no waste or underutilized components in your application stack; and the cost of running your application is exactly right. Achieving this rarefied level of operation might seem like a fantasy, but in fact it’s the ultimate goal of governance processes that have been designed by a Cloud Center of Excellence (Cloud CoE) team within an organization that is tasked with overseeing cloud operations for the organization. The Cloud CoE, and the culture of operational excellence it implements, is the driver that propals improvements and value throughout your organization. But getting there takes work, planning, analysis, and a willingness to make changes to continuously improve and refine operations as a whole.
Changes and improvements to security, reliability, performance efficiency, and cost optimization within your cloud architectures are best communicated through a Cloud CoE. It’s also important to realize that changes that affect one pillar might have side effects in other pillars. Changes in security will affect reliability. Changes in improving reliability will affect cost. Operational excellence is where you can review the entire workload and achieve the desired balance.
NOTE Operational Excellence design principles include organize, prepare, operate, and evolve. These best practices are achieved through automated processes for operations and deployments, daily and weekly maintenance, and mitigating security incidents when they occur.