What do apps & websites run on?
Standard data centre and cloud infrastructure concepts & components
This is the first article in our series looking into the inner workings of ‘the cloud’ and the data centres that host it. With the proliferation of apps and websites, and with seemingly everything being connected to ‘the cloud’, internet use is becoming more critical to our daily lives, to the extent that some are thinking of classifying the internet as a utility, just like electricity and gas. As almost everything about our day-to-day lives, and the broader scheme of human culture, can be found on the internet, I thought it would be a good idea to examine and explain what powers the cloud and what our apps and websites run on. In this blog post, I would like to help the reader understand what actually comprises the cloud: when we say “something is hosted on the cloud”, what are the devices and components that it is hosted on?
Apps, websites and all technology services ultimately run on servers, storage and networks. This infrastructure, be it in a private data centre or a third-party one (i.e. the cloud), usually has similar, industry-standard concepts, components and design. In this blog, I will cover what makes up the cloud: the concepts and components behind the apps we use and the websites we visit. In subsequent postings, I will cover a common design of the infrastructure that these apps and websites run on and how these components are put together.
Before we begin, here are two basic definitions we will be using throughout this blog:
- Server: A device or program/software dedicated to providing services to other programs (or to the users who in turn are using those programs), referred to as ‘clients’. Servers are sometimes referred to as hosts (because they ‘host’ the service we connect to).
- Data centre: A physical place (an entire building or a small room) that stores all the server, storage and network assets (i.e. devices). You have to keep the machines that power technology somewhere – that somewhere is called a data centre.
It is worth noting that apps, websites and internet services all ultimately run on servers, storage and networks. These three are the most basic building blocks of the internet and our technology.
What follows are the components and concepts that underlie our technology and private as well as public data centres.
Network zones are segregated sections of the network in which devices within the same zone can communicate with each other. Devices within a zone can also communicate with devices in other zones. However, for security purposes, especially in AWS, this communication is sometimes disabled by default and has to be enabled through the firewall, security groups and routing tables (further details on these concepts are available in our documentation and we will cover them in later tutorials).
The purpose of zones is to reduce the risk of a network being compromised by segregating services into logical groupings that have the same communication security policies and requirements. The most common implementation of network zoning is through IP subnets. Zoning in third-party clouds is not usually a physical implementation of segregation. Rather, it is a logical implementation, with the IP subnetting implemented through network virtualisation and virtual LANs. Zoning security concepts are implemented through security and network devices such as configurable switches, firewalls and security groups.
The zones listed below are some of the most common and widely used in enterprise architecture. Please note that zoning is defined as a logical grouping of services under the same policy constraints and is driven by business requirements. When a new set of business requirements or policy constraints is established, one or more new zones may be created to meet those requirements, with either new or existing services moving into them.
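As a minimal sketch of zoning through IP subnets, the snippet below maps zone names to subnets and checks which zone an address belongs to. The zone names and address ranges are assumptions made up for illustration; they are not taken from any real network.

```python
import ipaddress
from typing import Optional

# Hypothetical zone-to-subnet mapping; the ranges are illustrative only.
ZONES = {
    "prod": ipaddress.ip_network("10.0.1.0/24"),
    "dev": ipaddress.ip_network("10.0.2.0/24"),
    "dmz": ipaddress.ip_network("10.0.3.0/24"),
}


def zone_of(address: str) -> Optional[str]:
    """Return the name of the zone whose subnet contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, subnet in ZONES.items():
        if ip in subnet:
            return name
    return None


def same_zone(a: str, b: str) -> bool:
    """True when both addresses fall inside the same known zone."""
    zone_a = zone_of(a)
    return zone_a is not None and zone_a == zone_of(b)
```

Two devices for which `same_zone` returns `True` would be able to talk to each other freely; anything else would have to pass through whatever firewall or security group rules sit between the zones.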
Production (Prod) Zone
Contains all live, customer-facing servers, applications and systems. These are strictly managed and configured and, through policies and standards, are not changed without substantial testing, sign-off and approval from all technical and business stakeholders.
Development (Dev) Zone
Contains all systems which are being developed. These are non-customer facing versions of systems in the Prod zone.
A separate zone solely for testing may also be used if:
- there is a difference between the environment configuration of the Dev zone/systems and that of the Prod systems
- the testing, such as load testing, would take over system resources and disrupt other ongoing development work
Demilitarised Zone (DMZ)
A zone that is exposed to (i.e. interfaces and communicates with) an untrusted network, such as the internet. The DMZ adds an additional layer of security through segregation. It sits between the internal network and the untrusted network, acting as a gateway. Services within the internal network are safe from the external, untrusted network as they only communicate with it indirectly, via services within the DMZ. The communication between the DMZ and the internal services is limited and tightly controlled. The concepts behind the control are:
- Devices within the DMZ can only initiate communication with each other and the external network.
- They cannot initiate connections to the internal network. Instead, devices in the internal network must initiate the connections (i.e. pull the data from the DMZ). This is done in order to reduce the risk of a compromised DMZ service/device connecting to another device on the internal network for malicious intent.
- The internal network is behind a firewall and, preferably, the DMZ is also behind a firewall. The firewall (explained below) will then monitor, control and, if necessary, block communication.
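The connection rules above can be sketched as a small table of which zone is allowed to initiate a connection to which. The zone labels are hypothetical, and the entry allowing the external network to reach the DMZ is an assumption inferred from the DMZ being the exposed zone.

```python
# (source zone, destination zone) pairs that may INITIATE a connection.
# Zone labels are illustrative, not taken from a real configuration.
ALLOWED_INITIATIONS = {
    ("dmz", "dmz"),            # DMZ devices may talk to each other
    ("dmz", "external"),       # ...and to the external network
    ("external", "dmz"),       # assumption: the DMZ is the exposed zone
    ("internal", "dmz"),       # internal devices pull data from the DMZ
    ("internal", "internal"),
}


def may_initiate(source_zone: str, destination_zone: str) -> bool:
    """Return True if a device in source_zone may open a connection to destination_zone."""
    return (source_zone, destination_zone) in ALLOWED_INITIATIONS
```

Note that the table is directional: `("internal", "dmz")` is present but `("dmz", "internal")` is not, which is exactly the "internal pulls, DMZ never pushes" rule.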
The most common devices and services within a DMZ are:
- Web servers
- Email/messaging servers
- FTP servers
- VoIP servers
Prod Database (DB) Zone
A zone containing the production DBs that are used by a website or application. Because DBs have data transfer and connection protocols that differ from those of most other types of servers, they are sometimes placed in a zone exclusive to DBs for security and efficiency purposes (such as dedicated firewall and load balancing configuration).
Data Staging Area
A zone containing servers that are used exclusively for transmitting data. Because the quantity of data can be large, the devices used (e.g. network fibres, storage devices and their capacity), their configuration and the configuration of the zone in general are set up for the most efficient transmission of data.
A firewall is a network security system that monitors incoming and outgoing network traffic and permits or blocks data (packets, which are small units of data) based on a set of security rules. Its purpose is to establish a barrier between different network zones in order to block malicious traffic, such as viruses and hacking attempts.
Firewalls are entire systems (i.e. a combination of software and hardware) that traditionally examined packets and allowed or prohibited their transmission through the firewall based on a set of rules. The rules comprise source and destination IP addresses, protocol and packet type.
Next-generation firewalls (NGFW) combine the traditional firewall with additional functionality such as deep packet inspection: they examine the data itself within the packet and determine whether it should be allowed through or not. Most enterprise firewalls are NGFWs.
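To make the rule-based filtering concrete, here is a minimal sketch of a traditional packet filter that matches on source address, destination address and protocol. The first-match-wins ordering and the default of blocking unmatched packets are assumptions for this example (a common convention, but real firewalls differ), and all addresses are documentation ranges chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str       # "allow" or "block"
    source: str       # source range in CIDR notation
    destination: str  # destination range in CIDR notation
    protocol: str     # "tcp", "udp" or "any"


@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    protocol: str


def matches(rule: Rule, packet: Packet) -> bool:
    """True if the packet's addresses and protocol fall within the rule."""
    return (
        ipaddress.ip_address(packet.source) in ipaddress.ip_network(rule.source)
        and ipaddress.ip_address(packet.destination) in ipaddress.ip_network(rule.destination)
        and rule.protocol in ("any", packet.protocol)
    )


def decide(rules: list, packet: Packet) -> str:
    """Apply the first matching rule; block anything no rule matches (assumed default)."""
    for rule in rules:
        if matches(rule, packet):
            return rule.action
    return "block"


# Illustrative rule set: block one range outright, allow TCP to one web server.
RULES = [
    Rule("block", "203.0.113.0/24", "0.0.0.0/0", "any"),
    Rule("allow", "0.0.0.0/0", "10.0.3.10/32", "tcp"),
]
```

A next-generation firewall would go further than this sketch by also looking at the payload of each packet, which is not modelled here.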
An internet gateway is a horizontally scaled, redundant and highly available set of servers and software that enables communication between a zone (usually the DMZ) and the internet. An internet gateway:
- Provides a target for routing (i.e. routing tables) for internet data traffic
- Performs network address translation (NAT) for servers that have been assigned public internet IP addresses.
An internet gateway supports IPv4 and IPv6 traffic and is designed so that it does not introduce availability risks or bandwidth constraints for network traffic.
A load balancer is a sophisticated system (a set of servers, software and configurations) that connects one source of data (e.g. a user) to one server out of many similar servers, namely the one the load balancer determines to be the optimum server to connect to. This is all done transparently to the user, without any significant delay.
In modern, distributed systems, there can be multiple servers that underlie one service. For example, there can be many DB servers, each holding a copy of a DB, for one DB service. The user has one address for the DB, which actually connects to the load balancer. When the user initiates a connection to that address, the load balancer, according to its algorithms and configuration, determines which particular DB server the user should connect to and forwards the data onwards.
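The two roles of an internet gateway listed above can be sketched as a route lookup plus a one-to-one address translation. The route table, the gateway label and every address below are hypothetical; the lookup uses longest-prefix matching, the standard way of choosing between overlapping routes.

```python
import ipaddress

# Hypothetical routing table: traffic inside the network stays local,
# everything else is sent to the internet gateway.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "internet-gateway"),
]

# Hypothetical one-to-one NAT mapping: private address -> public address.
NAT_TABLE = {"10.0.3.10": "198.51.100.10"}


def route_target(destination: str) -> str:
    """Pick the target of the most specific (longest-prefix) matching route."""
    ip = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTES
        if ip in ipaddress.ip_network(cidr)
    ]
    _, target = max(matching, key=lambda item: item[0].prefixlen)
    return target


def translate_outbound(private_source: str) -> str:
    """Rewrite a server's private source address to its public address."""
    return NAT_TABLE[private_source]


def translate_inbound(public_destination: str) -> str:
    """Rewrite a public destination address back to the server's private address."""
    reverse = {public: private for private, public in NAT_TABLE.items()}
    return reverse[public_destination]
```

In this sketch, a packet from `10.0.3.10` to an internet address is routed to the gateway, leaves with source `198.51.100.10`, and replies addressed to `198.51.100.10` are translated back on the way in.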
When load balancers are used, they always sit between the user and the server and forward data back and forth. They are critical components in efficient IT systems.
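As an illustration of how a load balancer might pick the "optimum" server, the sketch below uses a least-connections strategy: each new connection goes to the server currently handling the fewest. This is only one of several common algorithms, and the class and server names are invented for the example.

```python
class LeastConnectionsBalancer:
    """Toy load balancer that tracks active connections per server."""

    def __init__(self, servers):
        # Every server starts with zero active connections.
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Choose the server with the fewest active connections and count the new one."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Record that a connection to the given server has closed."""
        self.active[server] -= 1
```

The user only ever sees the balancer's single address; calling `acquire()` stands in for the moment the balancer decides which of the underlying DB servers will actually receive the connection.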
A Content Delivery Network (CDN) is a geographically distributed group of servers which work together to provide fast delivery of internet content. A CDN holds a copy of the data requested by a user in a geographic region and serves that copy to that user and to all subsequent users in that region. This is done so that those users don’t have to connect to a server in a region further away, which would have to repeatedly transmit the same data, thereby increasing server load, user queuing duration and data transmission times. Data transmission time in particular is saved because the CDN server is physically closer to the user, being within the same geographic region, whereas the server hosting the original data may be further away in another region.
Domain Name System (DNS) is what is used to translate human-readable internet addresses to machine-readable addresses. Humans access information online through domain names, like z-tech.io or bbc.co.uk. Web browsers interact through Internet Protocol (IP) addresses (a series of numbers in a form such as 22.214.171.124). DNS translates domain names to IP addresses, and DNS servers are machines dedicated to answering DNS queries.
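The caching behaviour described above can be sketched as a regional edge server that fetches a piece of content from the origin only the first time it is requested. The class name and the `fetch_from_origin` callback are hypothetical, and real CDNs also handle expiry and invalidation, which are left out here.

```python
class EdgeCache:
    """Toy CDN edge server for one region: fetch once from the origin, then serve locally."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is any callable that takes a path and returns its content.
        self._fetch = fetch_from_origin
        self._store = {}

    def get(self, path: str) -> str:
        """Return the content for path, contacting the origin only on a cache miss."""
        if path not in self._store:
            self._store[path] = self._fetch(path)
        return self._store[path]
```

If a hundred users in the region request the same image through one `EdgeCache`, the origin server in the far-away region is contacted once, and the other ninety-nine requests are answered from the nearby copy.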
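At its simplest, a DNS server can be thought of as a lookup table from names to addresses. The sketch below is a toy version of that idea, not a real resolver: the domain names use the reserved `.test` suffix and the addresses come from a range reserved for documentation, so none of them are real records.

```python
# Hypothetical name-to-address records held by a toy DNS server.
RECORDS = {
    "www.example.test": "192.0.2.10",
    "mail.example.test": "192.0.2.25",
}


def resolve(domain: str) -> str:
    """Translate a domain name to an IP address using the toy record table."""
    # Domain names are case-insensitive and may be written with a trailing dot.
    name = domain.strip().rstrip(".").lower()
    try:
        return RECORDS[name]
    except KeyError:
        raise LookupError(f"no address record for {name!r}") from None
```

In a real program you would not keep your own table; Python's standard `socket.getaddrinfo` function asks the operating system's resolver, which in turn queries actual DNS servers.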
A domain registrar (full name: Domain Name Registrar) is a service (usually a separate, dedicated business) that handles the reservation of domain names.
I hope the above description of the most common components of the infrastructure that underlies apps and websites has shed some light on what goes on behind the scenes. In future postings, we will cover how these components fit together.