Z Tech is a technologist, senior programme director, business change lead and Agile methodology specialist. He is a former solutions architect, software engineer, infrastructure engineer and cyber security manager. He writes here in his spare time about technology, tech-driven business change, how best to adopt Agile practices and cyber security.

Build into the Architecture the Gathering of Usage Metrics

No cloud architecture, in my opinion, is complete without a design for monitoring and logging the performance of the cloud. This design should also cover the gathering of usage metrics; generally speaking, the same metrics gathered for performance can be reused. Not only will these metrics be used as input into the calculations for TCO, but they can also be used by the business to determine the success of products and marketing initiatives. For example, on a platform for posting user comments, a metric for the amount of incoming data tells the business how heavily users are using the platform.

Everything has a cost

This is worth repeating: everything in a third-party hosted cloud, such as Amazon’s or Microsoft’s, has a cost. From data transfer to the internet, to storage of snapshots, to the number of DNS queries received, everything has a cost.

The best way to account for these costs is to first design the cloud, and for each component, understand how much it will be used. That will lead to numbers that can be used as input in the calculations for the TCO.
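
A minimal sketch of this per-component approach, in Python. The component names, usage figures and unit prices below are hypothetical placeholders, not real provider rates; the point is only that each component in the design gets a usage estimate and a unit price, and the TCO is built up from those.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    monthly_units: float  # estimated usage per month, in the provider's billing unit
    unit_price: float     # hypothetical price per unit, NOT a real provider rate


def monthly_cost(components):
    """Sum usage x unit price over every component in the design."""
    return sum(c.monthly_units * c.unit_price for c in components)


def tco(components, months):
    """Project the monthly cost over the period the TCO should cover."""
    return monthly_cost(components) * months


# Illustrative design only: every figure here is an assumption.
design = [
    Component("data transfer to internet (GB)", 500, 0.09),
    Component("snapshot storage (GB-month)", 200, 0.05),
    Component("DNS queries (millions)", 3, 0.40),
]

print(f"Monthly cost: {monthly_cost(design):.2f}")
print(f"36-month TCO: {tco(design, 36):.2f}")
```

A flat projection like this ignores growth and tiered pricing, so it is a starting point that the later sections refine.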

Understand How the Provider Costs their Services

In essence, when you are moving to a third-party cloud like AWS or Azure, what you’re doing is renting their services and systems which in turn enable your products, services and systems. Meaning, you will be using systems and services, provided by someone else, for which that someone will charge a fee. Those systems and services will be used by you, in turn, to build products, services and systems that you will offer to your customers and clients.

Thus, what you have actually paid for is a partnership with the cloud provider and, much as in life, it is important to understand your partner to make that partnership a success. As the partnership is transactional and fee-based, building a successful partnership that does not lead to nasty surprises means understanding what the provider counts as a transaction and how it calculates the fee for it. Broadly speaking, this understanding should cover:

What unit is being measured
The charges for that unit
How usage charges may vary upon use. For example, the first 1TB of data is charged at a certain amount but the next 1TB of data usage is charged at a lesser amount
How different components within the service have different charges applied to them. For example, different types of storage medium have different charges based on their retrieval speed: new logs may be stored on SSDs for faster retrieval, while older logs, which are less frequently used, can be stored on magnetic tape at a lower storage cost.
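
The tiered charging described in the list above (the first 1TB at one rate, the next 1TB at a lesser rate) can be sketched as follows. The tier sizes and per-GB prices are assumptions chosen for illustration; a real estimate would take them from the provider's published price list.

```python
def tiered_cost(usage_gb, tiers):
    """Price usage against ordered tiers.

    tiers is a list of (tier_size_gb, price_per_gb) pairs, cheapest-last.
    A tier_size_gb of None means "everything above the previous tiers".
    """
    remaining = usage_gb
    total = 0.0
    for size_gb, price_per_gb in tiers:
        if remaining <= 0:
            break
        # Bill only the part of the usage that falls inside this tier.
        billable = remaining if size_gb is None else min(remaining, size_gb)
        total += billable * price_per_gb
        remaining -= billable
    if remaining > 0:
        raise ValueError("usage exceeds the tiers defined")
    return total


# Hypothetical tiers: first 1TB, next 1TB, then everything beyond that.
TIERS = [(1024, 0.10), (1024, 0.08), (None, 0.06)]

print(f"1.5TB costs {tiered_cost(1536, TIERS):.2f}")
```

Pricing 1.5TB at the first-tier rate throughout would overstate the cost, which is why the tier boundaries matter in the estimate.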

Understand the Pricing Strategy Options

Most cloud hosts provide various options on payment structure: fully up front, partially up front, payments based on server uptime, or payments based on compute cycles. This is where knowledge of how the provider charges for its services helps. Knowing that the provider charges for uptime based on server type and method of payment, it is possible to save around 30% or more by choosing the right pricing option for the virtual instance and the right way of paying for it (for example, fully up front).
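
To make the comparison concrete, here is a small sketch that compares paying by the hour against a fully upfront commitment. The hourly rate and upfront fee are made-up numbers, not any provider's actual prices, and the function names are my own.

```python
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(hourly_rate, hours=HOURS_PER_YEAR):
    """Cost of paying by the hour for the given number of uptime hours."""
    return hourly_rate * hours


def saving_percent(on_demand, committed):
    """Saving of the committed option relative to on-demand, in percent."""
    return (on_demand - committed) / on_demand * 100


def break_even_hours(upfront_fee, hourly_rate):
    """Uptime hours above which the upfront option becomes the cheaper one."""
    return upfront_fee / hourly_rate


# Hypothetical figures for one instance over one year.
HOURLY_RATE = 0.10
UPFRONT_FEE = 600.0

always_on = on_demand_cost(HOURLY_RATE)
print(f"Saving when always on: {saving_percent(always_on, UPFRONT_FEE):.1f}%")
print(f"Break-even uptime: {break_even_hours(UPFRONT_FEE, HOURLY_RATE):.0f} hours")
```

The break-even figure is the useful part: a server that only runs during office hours may never reach it, in which case paying by the hour is the cheaper choice despite the headline discount.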

Understand Your Requirements and Design

It’s important to understand how your design of the cloud will be using the cloud provider’s services. Having only a high-level understanding of the services being used and their costs tends to lead to inaccurate TCO estimates based on incomplete information.

How the services are used in the design has a large influence on what the actual TCO will be. I always try to gain an understanding of how the design works, how it will be using the cloud provider’s systems and how heavily each component will be utilised, and then apply the cloud provider’s charges.

The Cost of Computation

For example, a client of mine recently asked me to design an architecture and TCO estimate for a small pilot on AWS: an online web-based application that would be used internally within the company to improve communication flow. For this, I had to understand not only the user requirements of the solution, but also the usage requirements:

How many users will be using the system simultaneously, including any peaks/troughs, to determine what type of server to run the pilot on
The availability requirements of the system – 24 hours a day, including weekends?
The type of data that will be communicated and its size
Whether internal usage means data communication to/from the internet.
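
The usage requirements above translate fairly directly into a compute estimate. The sketch below assumes a simple model: the number of instances follows from peak concurrent users, and the cost follows from the hours the system must be available. The users-per-instance figure and the hourly rate are hypothetical and would normally come from load testing and the provider's price list.

```python
import math

WEEKS_PER_MONTH = 52 / 12


def instances_needed(peak_concurrent_users, users_per_instance):
    """Round up so the peak is always covered."""
    return math.ceil(peak_concurrent_users / users_per_instance)


def monthly_instance_hours(instances, hours_per_day, days_per_week):
    """Total instance-hours per month for the required availability window."""
    return instances * hours_per_day * days_per_week * WEEKS_PER_MONTH


def monthly_compute_cost(instances, hours_per_day, days_per_week, hourly_rate):
    hours = monthly_instance_hours(instances, hours_per_day, days_per_week)
    return hours * hourly_rate


# Hypothetical pilot: 120 users at peak, 50 users per instance, 0.10 per hour.
n = instances_needed(120, 50)
print(f"Instances: {n}")
print(f"24x7 cost: {monthly_compute_cost(n, 24, 7, 0.10):.2f}")
print(f"Office-hours cost: {monthly_compute_cost(n, 10, 5, 0.10):.2f}")
```

Running both availability scenarios side by side shows how much the answer to "24 hours a day, including weekends?" moves the estimate.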

The Cost of Storage

Storage is an important part of the costs, and also one of the larger cost components. To estimate it accurately, it’s important to understand how the system is designed to use the storage component and how the cloud provider charges for this component.

The first part of calculating the TCO for storage is to calculate how much live storage, that is, storage in use by servers, is required.

The next part is about backups. Cloud providers typically use a system of incremental backups, whereby only the data that has changed between backups is backed up and stored. Thus, we have to understand how much data users will be inputting into the system and what type of data that will be. This is usually not easy to calculate, as most organisations do not track data consumption metrics. However, there are a few approaches to estimating this:

Talking to departmental/team leads to better understand daily usage habits and extrapolate from there, or by looking at incoming data network traffic sizes if that is tracked.

Looking at requisition details, if kept, to see the quantity of additional storage purchased and dividing it by the time between purchases to estimate daily storage increases.

Databases (DB) sometimes have their read/write I/O tracked, therefore speaking to DB admins provides reasonable estimates of the amount of data used by DB systems.
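
The requisition approach can be sketched as below. It assumes each storage purchase was fully consumed by the time the next purchase was made, which is a simplification. The backup function uses an equally simple model (one full copy plus one increment per day retained); real providers may compress or deduplicate, so treat both as rough estimates. All dates and quantities are invented for illustration.

```python
from datetime import date


def daily_growth_gb(purchases):
    """Estimate GB consumed per day from (purchase_date, gb_added) records.

    Assumes every purchase except the latest has been used up by the
    time the following purchase was made.
    """
    if len(purchases) < 2:
        raise ValueError("need at least two purchases")
    ordered = sorted(purchases)
    days = (ordered[-1][0] - ordered[0][0]).days
    if days <= 0:
        raise ValueError("purchases must span at least one day")
    consumed = sum(gb for _, gb in ordered[:-1])
    return consumed / days


def backup_storage_gb(full_backup_gb, daily_change_gb, retention_days):
    """One full backup plus one incremental per retained day."""
    return full_backup_gb + daily_change_gb * retention_days


# Hypothetical requisition history.
history = [
    (date(2023, 1, 1), 500),
    (date(2023, 4, 11), 500),
    (date(2023, 7, 20), 500),
]

growth = daily_growth_gb(history)
print(f"Estimated growth: {growth:.1f} GB/day")
print(f"Backup storage for 30 days: {backup_storage_gb(200, growth, 30):.0f} GB")
```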

The Cost of Networks

We also need to know about the network-related costs of our usage. Networking is a large part of a tenancy on a third-party cloud and all cloud providers charge for it. The network usage figures we need for the TCO are:

What is the type of data that will be transferred and what is the quantity of that data
How much data will be transferred into the cloud
How much data will be transferred out of the cloud
How much data will be transferred in between cloud regions, i.e. internally within the cloud but between different regions
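
A short sketch of how the three transfer directions listed above feed into the estimate. The per-GB rates are placeholders I have made up; in particular, setting the inbound rate to zero is only an assumption here, so check how your provider charges for each direction.

```python
from dataclasses import dataclass


@dataclass
class TransferRates:
    inbound_per_gb: float       # data transferred into the cloud
    outbound_per_gb: float      # data transferred out to the internet
    inter_region_per_gb: float  # data moved between cloud regions


def monthly_transfer_cost(in_gb, out_gb, inter_region_gb, rates):
    """Price each direction separately, since each has its own rate."""
    return (
        in_gb * rates.inbound_per_gb
        + out_gb * rates.outbound_per_gb
        + inter_region_gb * rates.inter_region_per_gb
    )


# Hypothetical rates, not taken from any provider's price list.
RATES = TransferRates(inbound_per_gb=0.0, outbound_per_gb=0.09, inter_region_per_gb=0.02)

print(f"Monthly transfer cost: {monthly_transfer_cost(1000, 500, 200, RATES):.2f}")
```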

But these aren’t the only costs in our cloud network. Again, it’s important to understand all components in the design and how those components are charged for by the provider.

Most providers also charge for services that support data transmission, not just the transmission itself. We know all architectures make use of DNS and DHCP, as well as other services such as load balancing. Cloud operators (such as AWS) break these network enablers down into the following chargeable services:

Charge per DNS query

Charge per static public facing IP address used, with a possible different charge rate for static IP addresses purchased but not allocated for use.

Monitoring, logging and alerts for our cloud platform to keep an eye on its general health.

CDN for content distribution

Load balancing for spreading the load across systems and improving efficiency

Refining and Updating the TCO

My approach to maintaining the TCO is to update the calculations monthly or quarterly, based on the data being collected for TCO estimation. This means that, as part of designing the solution architecture, you also need to design from the start how you will maintain and refine the TCO and which metrics will have to be gathered for that.
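
One way to picture the periodic update is as a comparison between the usage assumed in the estimate and the usage actually measured. The sketch below flags metrics that have drifted beyond a tolerance and then folds the measured values back into the estimate. The metric names, figures and the 10% tolerance are all assumptions made for the example.

```python
def usage_drift(estimated, measured, tolerance=0.10):
    """Return {metric: relative drift} where measured usage differs
    from the estimate by more than the tolerance."""
    flagged = {}
    for name, est in estimated.items():
        if name not in measured:
            continue  # nothing collected for this metric yet
        actual = measured[name]
        if est == 0:
            if actual != 0:
                flagged[name] = float("inf")
            continue
        drift = (actual - est) / est
        if abs(drift) > tolerance:
            flagged[name] = drift
    return flagged


def refine(estimated, measured):
    """Replace estimated usage with measured usage where it exists."""
    updated = dict(estimated)
    for name, actual in measured.items():
        if name in updated:
            updated[name] = actual
    return updated


# Hypothetical monthly figures.
estimated = {"egress_gb": 500, "dns_queries_m": 3, "storage_gb": 200}
measured = {"egress_gb": 650, "dns_queries_m": 3.1, "storage_gb": 200}

for metric, drift in usage_drift(estimated, measured).items():
    print(f"{metric}: {drift:+.0%} against estimate")
estimated = refine(estimated, measured)
```

The refined figures then become the usage inputs for the next round of cost calculations.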

Ten Metrics for Maintaining a TCO Estimate

There are many metrics that can be gathered to be input into an estimate – they depend mainly on your use case. However, there are 10 metrics that I believe generally help refine TCO estimates in most cases:

CPU utilisation: Some providers give the option of charging by the amount of computation power used. Additionally, this helps in monitoring the capacity available in the system and helps ensure that the usage of servers is efficient – not too low or high
Memory Utilisation
Storage Utilisation
Network Traffic: As discussed above, network usage can contribute significantly to the cost of a tenancy on a third party cloud. Monitoring this to gain a more accurate figure for network requirements helps gain a more accurate TCO
The number of DNS queries
The number of IP addresses used, and how many IP addresses are acquired, but not assigned, to live servers. Most cloud providers have a separate rate for IP addresses assigned to server instances that are not in use (i.e. turned off)
The incremental changes, in GB, of backups
The number of custom monitoring metrics the cloud provider will be supplying
The amount of data transferred to the CDN
The amount of data transferred to the internet from the CDN
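
As a sketch, the metrics listed above could be captured in one record per reporting period so they are collected consistently. The field names are my own, the IP address metric is split into assigned and unassigned counts (hence eleven fields), and the CPU thresholds are arbitrary illustrative values rather than recommendations.

```python
from dataclasses import dataclass, asdict


@dataclass
class MonthlyMetrics:
    cpu_utilisation_pct: float
    memory_utilisation_pct: float
    storage_used_gb: float
    network_traffic_gb: float
    dns_queries: int
    ip_addresses_assigned: int
    ip_addresses_unassigned: int   # acquired but not attached to a live server
    backup_increment_gb: float
    custom_monitoring_metrics: int
    cdn_ingress_gb: float          # data transferred to the CDN
    cdn_egress_gb: float           # data transferred from the CDN to the internet


def cpu_sizing_hint(cpu_pct, low=20.0, high=80.0):
    """Flag servers whose usage looks too low or too high (thresholds assumed)."""
    if cpu_pct < low:
        return "oversized"
    if cpu_pct > high:
        return "undersized"
    return "ok"


# Hypothetical sample for one month.
sample = MonthlyMetrics(
    cpu_utilisation_pct=12.5,
    memory_utilisation_pct=40.0,
    storage_used_gb=350.0,
    network_traffic_gb=1700.0,
    dns_queries=3_000_000,
    ip_addresses_assigned=4,
    ip_addresses_unassigned=1,
    backup_increment_gb=150.0,
    custom_monitoring_metrics=6,
    cdn_ingress_gb=80.0,
    cdn_egress_gb=900.0,
)

print(cpu_sizing_hint(sample.cpu_utilisation_pct))
print(asdict(sample))
```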