ICTCLD502
Design and implement highly-available cloud infrastructure


Application

This unit describes the skills and knowledge required to design and implement fault-tolerant and scalable workloads to achieve high availability in a cloud environment.

The unit applies to cloud computing architects, cloud developers, cloud engineers and those engaged in designing and implementing cloud computing solutions for a business. It applies to individuals in information and communications technology (ICT) professions involved in systems design and systems architecture.

No licensing, legislative or certification requirements apply to this unit at the time of publication.


Elements and Performance Criteria

ELEMENTS: Elements describe the essential outcomes.

PERFORMANCE CRITERIA: Performance criteria describe the performance needed to demonstrate achievement of the element.

1. Identify high-availability requirements

1.1 Determine reliability, recoverability and service levels required for application

1.2 Determine cloud infrastructure requirements according to business needs

1.3 Identify level of responsibility under shared security responsibility model according to business needs

2. Evaluate architecture availability

2.1 Review architecture of traditional multi-tier web application in non-cloud environment and identify high availability requirements

2.2 Identify any single points of failure

2.3 Estimate recovery objectives for multi-tier web components and for overall architecture

2.4 Determine components that must scale vertically and the potential impact on system availability

2.5 Document architecture review findings according to business needs

3. Design cloud-based architecture for high availability

3.1 Design equivalent architecture for high availability using cloud services

3.2 Identify and remove single points of failure as required

3.3 Estimate recovery objectives for each component and overall architecture

3.4 Determine components that must scale vertically and the potential impact on system availability

3.5 Document architecture design according to business needs

4. Implement cloud-based architecture for high availability

4.1 Implement architecture design in cloud environment

4.2 Demonstrate connectivity between resources at all tiers

4.3 Monitor and measure availability of resources

4.4 Simulate component failures and confirm that infrastructure is fault tolerant

4.5 Simulate resizing of components likely to affect performance and measure impact on availability

4.6 Compare simulation findings against documented design and document results

5. Finalise cloud infrastructure

5.1 Adjust and improve availability of architecture according to simulations as required

5.2 Seek, confirm and respond to feedback from required personnel

5.3 Obtain final sign-off from required personnel
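Performance criteria 2.2, 3.2 and 4.3 involve finding and removing single points of failure and measuring availability. The minimal Python sketch below shows how redundancy changes estimated availability, assuming that component failures are independent. Tiers that must all be up (in series) multiply their availabilities, while a tier of identical redundant replicas is unavailable only when every replica has failed. The tier names and availability figures are hypothetical, and real replicas often share failure modes (for example, a common data centre), so the redundant estimate is likely optimistic.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of components that must all be up (chained tiers)."""
    return prod(availabilities)


def redundant_availability(a, replicas):
    """Availability of `replicas` independent copies where any one suffices."""
    return 1 - (1 - a) ** replicas


# Hypothetical per-instance availabilities for a three-tier web application.
web, app, db = 0.99, 0.99, 0.995

# One instance per tier: every tier is a single point of failure.
single = serial_availability([web, app, db])

# Two instances per tier (e.g. across two availability zones),
# assuming the replicas fail independently of each other.
redundant = serial_availability([
    redundant_availability(web, 2),
    redundant_availability(app, 2),
    redundant_availability(db, 2),
])

print(f"single instance per tier: {single:.4%}")
print(f"two instances per tier:   {redundant:.4%}")
```

With one instance per tier, the overall estimate is lower than that of any individual tier. Duplicating every tier removes each single point of failure and raises the estimate substantially.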

Evidence of Performance

The candidate must demonstrate the ability to complete the tasks outlined in the elements, performance criteria and foundation skills of this unit, and to:

design and implement at least one fault-tolerant cloud infrastructure on a cloud platform that is resilient to networking, compute, storage, database and data centre failures

design and deploy automated infrastructure scaling for at least one business need

simulate failures of at least one component and demonstrate that the infrastructure is fault tolerant.
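Automated infrastructure scaling is commonly driven by a target-tracking rule. The Python sketch below is a hypothetical, simplified version of the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, where desired capacity = ceil(current * metric / target). It also sets a floor of two instances so that scaling in never leaves a tier with a single instance. The function name, default limits and CPU figures are assumptions. Managed autoscalers also apply cooldowns or stabilisation windows, which this sketch omits.

```python
import math


def desired_capacity(current, metric, target, minimum=2, maximum=10):
    """Proportional target-tracking rule (hypothetical helper).

    Scales capacity so that the average metric moves toward `target`,
    then clamps the result to [minimum, maximum]. A minimum of 2 keeps
    the tier redundant even under low load.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current capacity and target must be positive")
    proposed = math.ceil(current * metric / target)
    return max(minimum, min(maximum, proposed))


# Hypothetical scenario: 4 instances averaging 90% CPU against a 60% target.
print(desired_capacity(current=4, metric=90.0, target=60.0))

# Low load would suggest 1 instance, but the floor keeps 2.
print(desired_capacity(current=4, metric=10.0, target=60.0))
```

In a managed service, the same idea is normally expressed as a scaling policy rather than custom code. The sketch only makes the arithmetic visible.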

In the course of the above, the candidate must:

use cloud management console, software development kits or command line tools

define, monitor and record resource availability in cloud environment, including:

reliability

recoverability

service levels

scalability.
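Availability can be defined and recorded from periodic health checks. The sketch below assumes a hypothetical record format of one probe per minute. It computes measured availability as the fraction of successful probes and compares the result with an SLA target and its downtime budget. Probe-based measurement is an approximation, because an outage shorter than the probe interval may go unrecorded.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    minute: int     # minutes since start of the measurement window (hypothetical format)
    healthy: bool   # True if the health check succeeded


def measured_availability(probes):
    """Fraction of recorded health-check probes that succeeded."""
    if not probes:
        raise ValueError("no probe results recorded")
    return sum(p.healthy for p in probes) / len(probes)


def downtime_budget_minutes(sla_target, window_minutes):
    """Maximum downtime the SLA target allows within the window."""
    return (1 - sla_target) * window_minutes


# Hypothetical day of one-minute probes (1440 samples) with a 3-minute outage.
probes = [Probe(m, not (600 <= m < 603)) for m in range(1440)]
availability = measured_availability(probes)
budget = downtime_budget_minutes(0.999, 1440)

print(f"measured availability: {availability:.4%}")
print(f"99.9% SLA allows {budget:.2f} min/day; SLA met: {availability >= 0.999}")
```

In this hypothetical day, a three-minute outage exceeds the roughly 1.44 minutes that a 99.9% daily target allows.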


Evidence of Knowledge

The candidate must demonstrate knowledge to complete the tasks outlined in the elements, performance criteria and foundation skills of this unit. This includes knowledge of:

industry technology standards used in cloud computing solutions and services

current industry standard hardware and software products, their general features, capabilities and application, including storage technology

different cloud cost models as they relate to scalability of cloud infrastructure

definitions, functions, features and uses of different cloud infrastructure resources as they apply in cloud architecture to high availability, including:

fault tolerance and single points of failure

reliability as measured by mean time to failure (MTTF), mean time to repair (MTTR) and mean time between failures (MTBF)

recoverability as measured by recovery time objective (RTO) and recovery point objective (RPO)

service level agreements (SLAs)

vertical and horizontal scalability

testing and debugging techniques, including techniques to avoid single points of failure

tools and techniques to measure availability impact

features of cloud services, including differences between built-in fault tolerance and infrastructure designed for fault tolerance

purpose and features of load balancing and autoscaling as they relate to improving availability within a cloud environment

techniques, methods and industry standard metrics used to monitor performance of cloud resources.
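The reliability and recoverability measures listed above can be related with simple arithmetic. The minimal Python sketch below uses the common convention that MTBF = MTTF + MTTR for a repairable component, which makes steady-state availability MTTF / MTBF. Some sources define these terms slightly differently. The sketch also checks a hypothetical recovery plan against RTO and RPO targets, assuming periodic backups. Under that assumption, worst-case data loss is one full backup interval and the outage is approximated by the estimated restore time. All names and figures are illustrative assumptions.

```python
from dataclasses import dataclass


def availability_from_mttf_mttr(mttf_hours, mttr_hours):
    """Steady-state availability, using the convention MTBF = MTTF + MTTR."""
    mtbf = mttf_hours + mttr_hours
    return mttf_hours / mtbf


@dataclass
class RecoveryPlan:
    backup_interval_minutes: float  # time between backups or snapshots
    restore_time_minutes: float     # estimated time to restore service


def meets_objectives(plan, rto_minutes, rpo_minutes):
    """Return (meets_rto, meets_rpo) for a periodic-backup recovery plan.

    Worst-case data loss is one full backup interval; worst-case outage
    is approximated by the estimated restore time.
    """
    meets_rto = plan.restore_time_minutes <= rto_minutes
    meets_rpo = plan.backup_interval_minutes <= rpo_minutes
    return meets_rto, meets_rpo


# Hypothetical figures for a database tier.
print(f"availability: {availability_from_mttf_mttr(1000, 2):.4%}")
plan = RecoveryPlan(backup_interval_minutes=5, restore_time_minutes=45)
print(meets_objectives(plan, rto_minutes=30, rpo_minutes=15))
```

In this hypothetical case, five-minute backups satisfy a 15-minute RPO, but a 45-minute restore misses a 30-minute RTO. That gap is the kind of finding an architecture review would document.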


Assessment Conditions

Skills in this unit must be demonstrated in a workplace or simulated environment where the conditions are typical of those in a working environment in this industry.

This includes access to:

cloud service provider

cloud managed database service

information and data sources required to design and implement cloud infrastructure

integrated development environment (IDE)

specific requirements and industry standards, organisational procedures and legislative requirements, including business and functionality requirements, as required

internet and web browser

secure shell (SSH) or remote desktop protocol (RDP) client to connect to cloud-hosted instances

data from which to gather information to determine output and user requirements, including user access and business protocols.

Assessors of this unit must satisfy the requirements for assessors in applicable vocational education and training legislation, frameworks and/or standards.


Foundation Skills

This section describes those language, literacy, numeracy and employment skills that are essential to performance but not explicit in the performance criteria.

SKILL

DESCRIPTION

Oral communication

Uses listening and questioning techniques, and articulates complex concepts and requirements using industry language suited to the intended audience

Reading

Interprets complex technical and operational documentation to determine and confirm job requirements

Problem solving

Uses a mix of intuitive and formal processes to identify key information and issues, evaluates alternative strategies, anticipates consequences and considers implementation issues and contingencies

Uses knowledge of context to address common problems in cloud computing applications and cloud-based environments

Self-management

Demonstrates a sophisticated knowledge of principles, concepts, language and practices associated with cloud computing and the digital world and uses them to troubleshoot and understand the uses and potential of new technology


Sectors

Cloud computing