Application
This unit describes the skills and knowledge required to design and implement fault tolerant and scalable workloads to achieve high availability in a cloud environment.
The unit applies to cloud computing architects, cloud developers, cloud engineers and those engaged in designing and implementing cloud computing solutions for a business. It applies to individuals in Information Communications Technology (ICT) professions involved in systems design and systems architecture.
No licensing, legislative or certification requirements apply to this unit at the time of publication.
Elements and Performance Criteria
1. Identify high-availability requirements | 1.1 Determine reliability, recoverability and service levels required for application 1.2 Determine cloud infrastructure according to business needs 1.3 Identify level of shared security responsibility models according to business needs |
2. Evaluate architecture availability | 2.1 Review architecture of traditional multi-tier web application in non-cloud environment and identify high availability requirements 2.2 Identify any single points of failure 2.3 Estimate recovery objectives for multi-tier web components and for overall architecture 2.4 Determine components that must scale vertically and the potential impact on system availability 2.5 Document architecture review findings according to business needs |
3. Design cloud-based architecture for high availability | 3.1 Design equivalent architecture for high availability using cloud services 3.2 Identify and remove single points of failure as required 3.3 Estimate recovery objectives for each component and overall architecture 3.4 Determine components that must scale vertically and the potential impact on system availability 3.5 Document architecture design according to business needs |
4. Implement cloud-based architecture for high availability | 4.1 Implement architecture design in cloud environment 4.2 Demonstrate connectivity between resources at all tiers 4.3 Monitor and measure availability of resources 4.4 Simulate failures of components and confirm that infrastructure is fault tolerant 4.5 Simulate resizing components likely to impact performance and measure availability impact 4.6 Compare and document simulation findings according to documented design |
5. Finalise cloud infrastructure | 5.1 Adjust and improve availability of architecture according to simulations as required 5.2 Confirm, seek and respond to feedback with required personnel 5.3 Obtain final sign off from required personnel |
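Criteria 2.2, 2.3 and 3.2 turn on how single points of failure affect overall availability. The following is a minimal sketch, assuming that component failures are independent and that the per-tier availability figures (hypothetical values here) are already known. Tiers that must all work are combined in series, and redundant replicas of a tier are combined in parallel.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain of components that must ALL be working."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of `copies` independent replicas when one working replica is enough."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - availability) ** copies


# Hypothetical per-tier availabilities for a three-tier web application.
web, app, db = 0.99, 0.99, 0.995

# Every tier is a single point of failure.
single_path = series_availability([web, app, db])

# Each tier duplicated, for example across two data centres.
redundant_path = series_availability(
    [redundant_availability(tier, 2) for tier in (web, app, db)]
)

print(f"single instance per tier: {single_path:.4%}")
print(f"two replicas per tier:    {redundant_path:.4%}")
```

The independence assumption rarely holds exactly (replicas can share a network, a region or a deployment fault), so figures produced this way are better read as an upper bound than as a prediction.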
Evidence of Performance
The candidate must demonstrate the ability to complete the tasks outlined in the elements, performance criteria and foundation skills of this unit, and to:
design and implement, on a cloud platform, at least one fault tolerant cloud infrastructure that is resilient to networking, compute, storage, database and data centre failures
design and deploy automated infrastructure scaling for at least one business need
simulate failures of at least one component and demonstrate that the infrastructure is fault tolerant.
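A failure simulation of the kind listed above can be rehearsed locally before it is run against real infrastructure. The following is a minimal sketch, not a cloud implementation: `Replica` and `LoadBalancer` are hypothetical in-memory stand-ins for compute instances and a managed load balancer, and the balancer simply skips replicas marked unhealthy.

```python
class Replica:
    """Hypothetical stand-in for one compute instance."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class LoadBalancer:
    """Round-robin balancer that routes only to healthy replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = 0

    def route(self, request):
        # Try each replica at most once, starting from the round-robin cursor.
        for _ in range(len(self.replicas)):
            replica = self.replicas[self._next]
            self._next = (self._next + 1) % len(self.replicas)
            if replica.healthy:
                return replica.handle(request)
        raise RuntimeError("no healthy replicas available")


# Simulate the failure of one component and confirm requests are still served.
a, b = Replica("replica-a"), Replica("replica-b")
balancer = LoadBalancer([a, b])
a.healthy = False  # injected failure
responses = [balancer.route(f"request-{i}") for i in range(4)]
print(responses)
```

With both replicas marked unhealthy, `route` raises `RuntimeError`, which is the condition the redundancy in the design is meant to prevent.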
In the course of the above, the candidate must:
use cloud management console, software development kits or command line tools
define, monitor and record resource availability in cloud environment, including:
reliability
recoverability
service levels
scalability.
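Recording resource availability against a service level can be illustrated with a short calculation. The following is a minimal sketch, assuming that availability is sampled by evenly spaced health checks and that the service level is expressed as a fraction of time available. The `HealthCheck` record and the sample figures are hypothetical and do not correspond to any particular cloud monitoring service.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """One hypothetical probe result."""
    timestamp: int   # seconds since the start of the measurement window
    healthy: bool


def measured_availability(checks):
    """Fraction of evenly spaced health checks that succeeded."""
    if not checks:
        raise ValueError("at least one health check is required")
    return sum(1 for c in checks if c.healthy) / len(checks)


def allowed_downtime_minutes(target, period_minutes):
    """Downtime budget implied by an availability target over a period."""
    return (1 - target) * period_minutes


def meets_service_level(checks, target):
    """True when measured availability is at or above the target."""
    return measured_availability(checks) >= target


# Hypothetical window: one probe per minute for an hour, with three failed probes.
checks = [HealthCheck(timestamp=60 * i, healthy=i not in (10, 11, 12)) for i in range(60)]

print(f"measured availability: {measured_availability(checks):.2%}")
print(f"meets 99.9% target:    {meets_service_level(checks, 0.999)}")
print(f"monthly budget at 99.9%: {allowed_downtime_minutes(0.999, 30 * 24 * 60):.1f} min")
```

Sampling at a fixed interval can miss outages shorter than the interval, so the probe frequency should be chosen with the recovery objectives in mind.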
Evidence of Knowledge
The candidate must demonstrate knowledge to complete the tasks outlined in the elements, performance criteria and foundation skills of this unit. This includes knowledge of:
industry technology standards used in cloud computing solutions and services
current industry standard hardware and software products, their general features, capabilities and application, including storage technology
different cloud cost models as they relate to scalability of cloud infrastructure
definitions, functions, features and uses of different cloud infrastructure resources as they apply in cloud architecture to high availability, including:
fault tolerance and single points of failure
reliability as defined by mean time to failure (MTTF), mean time to repair (MTTR) and mean time between failures (MTBF)
recoverability as measured by recovery time objective (RTO) and recovery point objective (RPO)
service level agreements (SLAs)
vertical and horizontal scalability
testing and debugging techniques, including techniques to avoid single point failures
tools and techniques to measure availability impact
features of cloud services, including differences between built-in fault tolerance and infrastructure designed for fault tolerance
purpose and features of load balancing and autoscaling as they relate to improving availability within a cloud environment
techniques, methods and industry standard metrics used to monitor performance of cloud resources.
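The reliability and recoverability measures listed above are related by simple arithmetic. The following is a minimal sketch under two stated assumptions: MTBF is taken as MTTF plus MTTR for a repairable component (one common convention), and backups run at a fixed interval, so the worst-case data loss equals that interval. All figures are hypothetical.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures, assuming MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability_from_mttf(mttf_hours, mttr_hours):
    """Steady-state availability: the share of each failure cycle spent working."""
    return mttf_hours / (mttf_hours + mttr_hours)


def meets_rpo(backup_interval_minutes, rpo_minutes):
    """With periodic backups, worst-case data loss equals the backup interval."""
    return backup_interval_minutes <= rpo_minutes


def meets_rto(estimated_recovery_minutes, rto_minutes):
    """True when the estimated time to restore service fits within the RTO."""
    return estimated_recovery_minutes <= rto_minutes


# Hypothetical component: fails on average every 999 hours, takes 1 hour to repair.
print(f"MTBF: {mtbf(999, 1)} hours")
print(f"availability: {availability_from_mttf(999, 1):.3%}")
print(f"15-minute backups meet a 60-minute RPO: {meets_rpo(15, 60)}")
print(f"90-minute restore meets a 60-minute RTO: {meets_rto(90, 60)}")
```

Reducing MTTR (for example, through automated failover) raises availability in the same way as increasing MTTF, which is why both figures are worth recording.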
Assessment Conditions
Skills in this unit must be demonstrated in a workplace or simulated environment where the conditions are typical of those in a working environment in this industry.
This includes access to:
cloud vendor service provider
cloud managed database service
information and data sources required to design and implement cloud infrastructure
integrated development environment (IDE)
specific requirements and industry standards, organisational procedures and legislative requirements, including business and functionality requirements, as required
internet and web browser
secure shell (SSH) or remote desktop protocol (RDP) client to connect to cloud-hosted instances
data from which to gather information to determine output and user requirements, including user access and business protocols.
Assessors of this unit must satisfy the requirements for assessors in applicable vocational education and training legislation, frameworks and/or standards.
Foundation Skills
Oral communication | Uses listening and questioning techniques to articulate complex concepts and requirements using industry language for intended audience |
Reading | Interprets complex technical and operational documentation to determine and confirm job requirements |
Problem solving | Uses a mix of intuitive and formal processes to identify key information and issues, evaluates alternative strategies, anticipates consequences and considers implementation issues and contingencies. Uses knowledge of context to address common problems in cloud computing applications and cloud-based environments |
Self-management | Demonstrates a sophisticated knowledge of principles, concepts, language and practices associated with cloud computing and the digital world and uses them to troubleshoot and understand the uses and potential of new technology |
Sectors
Cloud computing