DevOps

Nov 15, 2017

My DevOps Journey and How I Became a Recovering IT Operations Manager

DevOps Challenges

I managed an eight-person team that supported data integration tools for a Fortune 100 tech company. One of the tools we supported was adopted by every business unit to integrate a large number of applications, databases, and information repositories. Over a period of 7-8 years the number of production integration applications grew to over 800. During those years of scaling multiple environments we faced several challenges:

  • Managing the large number of servers

  • Maintaining performance

  • Ensuring high availability

  • Keeping up with user support

We hosted the integration platform on Oracle/Solaris servers, each of which could handle the load of 20-30 integration applications. The first performance challenge we faced was the integration platform repository. All integration application deployments were performed using the integration platform's administration tool, which stored each application's configuration in a database repository. As the number of applications grew, the performance of the administrator repository eventually began impacting both the time to deploy new applications and the time it took to start up any application. The solution was to break the single domain into business unit domains, each with a smaller administrator repository. But this introduced a new problem: a significant increase in the number of hosts needed to run the multiple instances of the administrator. When virtualization technology was introduced into Solaris via Solaris Zones, we were able to reduce the number of physical hosts by running each domain administrator instance in a different zone on a single physical host.
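To make the pattern concrete, here is a minimal sketch of how per-domain zones could be provisioned on a Solaris host. The domain names, zone paths, and the idea of scripting it in Python are all hypothetical; the sketch simply illustrates running one administrator instance per zone.

    import subprocess

    # Hypothetical business-unit domains, one administrator instance per zone.
    DOMAINS = ["finance", "sales", "logistics"]

    def provision_zone(name: str) -> None:
        """Define, install, and boot a Solaris zone for one domain administrator."""
        # zonecfg accepts a semicolon-separated command string.
        subprocess.run(
            ["zonecfg", "-z", name,
             f"create; set zonepath=/zones/{name}; set autoboot=true; commit"],
            check=True,
        )
        subprocess.run(["zoneadm", "-z", name, "install"], check=True)
        subprocess.run(["zoneadm", "-z", name, "boot"], check=True)

    if __name__ == "__main__":
        for domain in DOMAINS:
            provision_zone(domain)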

The next challenge we faced was upgrading the data integration platform. To perform an upgrade, the entire production environment had to be taken down, since the platform would only run if all nodes ran the same version. To complicate matters, even though the integration process engines were supposed to be forward compatible with newer versions of the integration platform, we were required to have all process engines tested by the owning business units before the upgrade. It was impossible to get all the BUs to test their applications within a timeframe narrow enough that we would have a completely tested set of production apps by the time the upgrade took place.

Finding the right tools

The method we chose to work around this was to build out a completely new production environment with the latest integration platform and migrate apps from the old environment as BUs tested and cleared their apps for the newer platform version. This spread the upgrade cycle out over several months, was extremely wasteful of hardware resources, and added a huge management burden on my team. All of this kept our upgrade cycles rather long: even though the vendor released major upgrades twice a year and monthly patches, we were only able to upgrade every three years!

Technology kept advancing, and cloud services started appearing. Our vendor fielded a private cloud solution that included features specific to the integration platform. I saw several aspects of its capabilities that I knew I could leverage to overcome difficulties we had in managing and scaling our integration environments.

The cloud product had an auto-restart capability for application failures that eliminated the need to run high-availability pairs of integration processes, which immediately reduced my CAPEX by 50%. That savings more than paid for the cloud product in the first year of operation.
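The vendor's restart logic was proprietary, but the core idea is simple enough to sketch: a supervisor relaunches an integration engine whenever it exits, instead of keeping a hot standby. The engine command and backoff value below are hypothetical stand-ins.

    import subprocess
    import time

    # Hypothetical command that starts one integration process engine.
    ENGINE_CMD = ["/opt/integration/bin/process-engine", "--domain", "finance"]
    RESTART_DELAY_SECS = 5  # brief backoff so a crash loop doesn't spin the CPU

    def supervise() -> None:
        """Relaunch the engine whenever it exits, replacing an HA standby pair."""
        while True:
            proc = subprocess.Popen(ENGINE_CMD)
            code = proc.wait()  # block until the engine dies
            print(f"engine exited with {code}; restarting in {RESTART_DELAY_SECS}s")
            time.sleep(RESTART_DELAY_SECS)

    if __name__ == "__main__":
        supervise()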

Another feature of the cloud product was the ability to deploy the integration platform and integration processes into containers. The great aspect of this was that each logical machine could run a completely independent stack of the integration platform components deployed in an environment. Gone was the requirement that every node in an environment run the same version of the component stack. Now upgrades could be done on a container-by-container basis with no need to field additional hardware, significantly simplifying and reducing the cost of upgrades.
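As a sketch of that upgrade pattern, using Docker as a stand-in for the vendor's container runtime (the container names and image tag are hypothetical), each container is replaced in turn while the rest of the environment keeps running:

    import subprocess

    # Hypothetical container names and image tag; the real platform used a
    # vendor-specific private cloud rather than Docker.
    CONTAINERS = ["bu-finance", "bu-sales", "bu-logistics"]
    NEW_IMAGE = "registry.example.com/integration-platform:9.6"

    def upgrade(container: str) -> None:
        """Replace one container with the new platform version."""
        subprocess.run(["docker", "stop", container], check=True)
        subprocess.run(["docker", "rm", container], check=True)
        subprocess.run(["docker", "run", "-d", "--name", container, NEW_IMAGE],
                       check=True)

    if __name__ == "__main__":
        for c in CONTAINERS:
            upgrade(c)  # one container at a time; no big-bang outage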

We also took advantage of script-driven automation tools to create automated deployment processes. All a developer had to do was email their integration process artifact, along with a descriptor file, to a mailbox that piped it into a process that deployed the artifact to the target non-production environment and domain. Production deployments were a little different: instead of automatically deploying artifacts to production, the artifact was staged and a ticket was generated requesting that my team do the deployment.
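A minimal sketch of that pipeline's decision logic is below; the descriptor format, the staging path, the deploy-tool CLI, and the ticketing call are all hypothetical stand-ins for the actual tooling.

    import configparser
    import shutil
    import subprocess
    from pathlib import Path

    STAGING = Path("/var/deploy/staging")  # hypothetical staging area

    def open_ticket(summary: str) -> None:
        print(f"TICKET: {summary}")  # placeholder for the real ticketing system

    def handle_submission(artifact: Path, descriptor: Path) -> None:
        """Deploy to non-prod automatically; stage and ticket for production."""
        cfg = configparser.ConfigParser()
        cfg.read(descriptor)                # hypothetical INI-style descriptor
        env = cfg["deploy"]["environment"]  # e.g. "dev", "qa", "prod"
        domain = cfg["deploy"]["domain"]

        if env == "prod":
            # Production is never deployed automatically: stage the artifact
            # and open a ticket for the ops team to perform the deployment.
            shutil.copy(artifact, STAGING / artifact.name)
            open_ticket(f"Deploy {artifact.name} to prod domain {domain}")
        else:
            subprocess.run(
                ["deploy-tool", "--env", env, "--domain", domain, str(artifact)],
                check=True,
            )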

This provided a huge boost to productivity: development teams didn't have to wait for my team to deploy their apps before they could begin testing, QA, or UAT cycles. My team also saved significant time by not having to manually configure and deploy 40+ apps per week. We noticed another benefit of automated deployments as well: an almost complete elimination of deployment failures. Before automated deployments, my team had to manually configure each deployment; by eliminating that step, we also eliminated the errors made when re-keying application configuration parameters.


DevOps solved the challenges

Not long after we fielded the cloud platform and automated processes, I started hearing the DevOps buzzword. As I learned what DevOps entailed, I saw the potential to use its technologies and tools to make further improvements in managing all the middleware my team was responsible for. The further I explored, the more I realized the full impact that incorporating DevOps could have on an IT organization. In addition to increasing productivity and saving costs by eliminating many manual processes, DevOps could also:

  • Automate deployment and configuration of infrastructure

  • Make infrastructure available through self-service

  • Provide greater infrastructure stability through consistent builds

  • Produce higher quality code through automated testing

  • Reduce service outages by eliminating the main sources of failures

  • Provide faster feedback to developers reducing the time and cost of debugging

  • Make deployments and upgrades seamless, eliminating the need to perform them on nights and weekends

  • Improve coordination and communication between dev and ops teams

  • Allow IT to rapidly meet new business objectives

It was gratifying to realize, even before DevOps was called DevOps, the huge impact this technology was having on my colleagues' productivity. I was sold on the benefits that come from adopting DevOps, and I'm not the least bit surprised at how quickly DevOps has become a major movement in the IT industry. A quick survey of DevOps tool providers will turn up a hundred companies, and the list continues to grow. If I were doing the same project today, I'd be using Linux and a cloud provider like AWS, along with tools like Docker and Kubernetes.

It was gratifying to be one of the early adopters of containerization and automated deployments, and I can honestly say they worked like a charm on the very first project we used them on, even though it was a very complex and mission-critical set of enterprise systems. Sometimes you just know you picked the right technology. The only thing I can't fathom is how I survived with my sanity intact after so many years as an IT operations manager without DevOps.
