What is fault tolerance in a virtual world? Sometimes I wonder what this 'fault tolerance' or 'zero downtime' really means to people at large. How does it make a difference in our lives? Do we really need to know all the technology jargon, or how things function? Probably not. Folks! We don't need to know all of it. But we can definitely cherish the benefits this technology delivers, as if by the magic word 'Abracadabra': glitch-free access to the everyday applications we use. It could be Facebook or Twitter, our email, a mobile application running on a server hosted far off in a data center, or a banking application on the other side of the globe. It doesn't matter even if we are digging deep into 'Big Data': performing data mining, or using analytics for businesses, social media marketing, political campaigns, or philanthropic activities, or crunching astronomical data to explore the universe on a high-performance computing setup, extracting information from silos of data and databases backed by virtual machines with no downtime. Or it could be any other task we can think of that requires uninterrupted access. It might astonish some of us how much juggling happens behind the scenes that we would never have noticed or come to know about: huge numbers of VMs (virtual machines) being swapped around in fractions of a second, within the network of a data center, across interconnected data centers, in a cloud, or in a cloud of clouds, 'The Cloud', a larger cloud formed from multiple integrated clouds, all working constantly in real time to give us an environment with zero downtime.
Well, everything in a virtualized environment is automated and linked together with pre-coded instructions derived from any number of algorithms, designed to perform the millions of permutations and combinations a human could conceive to make this happen. Friends! You know the irony: we build technology to eliminate ourselves. This is the price we pay to design an automated system that takes care of itself with minimal intervention from us. Sometimes I think the downtime we have all experienced in the past can definitely be blamed on a person who did not do his or her job carefully; otherwise, we would never have thought of a foolproof system that takes care of itself, in fact, an automated system that can fix itself before we even come to know. An enormous amount of energy, resources, effort, and technology, running algorithms, documented and processed information, security checks, cutovers, machine swaps, and much more goes into providing an environment where not even a single keystroke goes to waste, and where the frustrated devil doesn't come out of us to hit the monitor or the keyboard over the smallest glitch, one that could cost us a million dollars' worth of loss at work. It could be the trigger for that old question: 'an opportunity cost is an opportunity……?'
Well, coming back to the concept of virtualization that has been floating around for a while, from a leading technology organization. True! We cannot miss it, and the very first name that pops up is 'VMware'. Am I right? To be honest, I think these folks have really pushed their brand name hard on us. It reminds me of the word 'Windows' in the '80s and how it evolved into our everyday lives. VMware may not have reached that magnitude as such, but we can somewhat relate to it that way. Alright! If not everyone, at least some of us for sure. The question is, how did they come up with this idea, and what was the objective behind it? My guess is that our industry was struggling with the traditional methods of handling hardware and software failures, and these guys turned out to be the lucky ones!! Alright!! The intelligent ones :).
Today, I believe, VMware has made our lives easier by developing this 'fault tolerance' component, which is now widely used in enterprise businesses to prevent application disruption due to hardware failures. They designed it for mission-critical enterprise applications, where disruption can be very expensive for businesses, whereas the traditional solutions to this problem, hardware redundancy or clustering, are complex and costly. If we compare it with VMware High Availability (HA), which addresses server failures by automatically restarting virtual machines on alternate servers, we will find that VMware's 'fault tolerance' takes the entire ball game of high availability to the next level. What it does is completely wipe out downtime due to hardware failures, with simplicity, across all applications, regardless of operating system. Isn't it amazing!!
Basically, it provides operational continuity and high levels of uptime to an IT infrastructure environment, with simplicity and at low cost. Let's try to understand how it works, so all of us can get the hang of it. It works with existing VMware High Availability (HA) or Distributed Resource Scheduler (DRS) clusters, and it can simply be turned on or off for individual virtual machines. When applications require operational continuity during critical periods, such as month-end or quarter-end for financial applications, the fault tolerance feature can be turned on with the click of a button to provide extra assurance. This operational simplicity of the 'fault tolerance' component embedded in vSphere is a big saver of both effort and cost at times.
High availability is commonly understood as a method of ensuring a resource is always available. But the fact of the matter is that the resource may still suffer a few minor downtimes. For instance, Hyper-V also has a high availability feature, yet in the event a host fails, the guest operating systems simply stop; there is no time to migrate the running state to another host, so the result is a minor downtime. Irrespective of the technology owner, it is the same scenario with VMware High Availability (HA). Even VMware's vMotion capability cannot be used, because the host stops right then and there, leaving no live memory from which to move the guest OS. Thus, with plain high availability, we lose the in-memory application state.
'Fault tolerance', on the other hand, means we don't lose the in-memory application state in the event of a failure, such as a host crash. Seen this way, fault tolerance is much stronger than high availability in a virtual environment. But it forces us to maintain two copies of a virtual machine, each on a separate host. Whenever the state of memory or a device's status changes on the primary host, those changes are automatically recorded and replayed simultaneously on the secondary copy of the VM created earlier.
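The record-and-replay idea above can be sketched as a toy model. To be clear, this is not VMware's actual lockstep protocol, just a minimal Python illustration with invented names (`ToyVM`, `apply_event`, `promote`): the primary VM applies each state change while mirroring it to a secondary, and when the primary 'crashes' partway through, the secondary is promoted and keeps every change replayed so far, so no in-memory state is lost.

```python
# Toy sketch of fault-tolerant record/replay failover.
# All names here are invented for illustration; this is not
# VMware's vLockstep protocol, just the general idea.

class ToyVM:
    def __init__(self, name):
        self.name = name
        self.memory = {}        # stand-in for in-memory application state
        self.role = "secondary"

    def apply_event(self, key, value):
        """Apply one recorded state change (a 'replay' step)."""
        self.memory[key] = value

    def promote(self):
        """Take over as primary after a failover."""
        self.role = "primary"


def run_with_ft(events, crash_after):
    """Feed events to a primary, mirroring each one to a secondary.

    If the primary 'crashes' partway through, the secondary is
    promoted immediately and keeps the state replayed so far,
    so no in-memory state is lost."""
    primary, secondary = ToyVM("vm-a"), ToyVM("vm-b")
    primary.role = "primary"
    for i, (key, value) in enumerate(events):
        if i == crash_after:              # simulate a host failure
            secondary.promote()           # instant failover, state intact
            return secondary
        primary.apply_event(key, value)
        secondary.apply_event(key, value)  # lockstep replay on the mirror
    return primary


events = [("balance", 100), ("balance", 250), ("balance", 300)]
survivor = run_with_ft(events, crash_after=2)
print(survivor.role, survivor.memory)  # prints: primary {'balance': 250}
```

Contrast this with plain HA from the previous paragraph: there, the failed VM would be restarted fresh on another host, and `memory` would come back empty.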
Currently, only VMware vSphere has this fault tolerance capability, and it supports only a single logical processor per VM. Fault tolerance also has very high network requirements, but in exchange it provides a solution that results in no downtime, even if a host fails.