Category Archives: Software Delivery


Continuous Lifecycle London 2017

Last week I had the honour to speak about ChatOps at Continuous Lifecycle conference in London. The conference is organised by The Register and heise Developer and is dedicated to all things DevOps and Continuous Software Delivery. There were 2 days of talks and one day of workshops. Regretfully I couldn’t attend the last day, but I heard some of the workshops were really great.

The Venue


The venue was great! Situated right in the historical centre of London, a few steps away from Big Ben, the QEII Centre has a breathtaking view and plenty of space. The talks took place in three rooms: one large auditorium and two smaller ones. It is quite hard to predict which talks will attract the largest audience, and it was hit and miss this time around too. Some talks were overcrowded while others felt a bit empty.

Between the talks everybody gathered in the recreation area to collect merchandise from the sponsors’ stands and enjoy coffee and refreshments.

The Audience

The participants were mostly engineers, architects and engineering managers. As happens too often at DevOps gatherings, the business folks were relatively few. Which is a pity, because DevOps and CI/CD are a clear business advantage based on better tech and process optimization. The sad part is that the techies understand it, but the business people still too often fail to see the connection.

The Talks

Besides the keynotes (I only attended the first one) there were 3 tracks running in parallel. I had a chance to attend a few selected talks in between networking, mentally preparing for my own talk and relaxing afterwards.

The keynote

The opening keynote was delivered by Dave Farley, the author of the canonical ‘Continuous Delivery’ book. Dave is a great speaker. He talked about the necessity of iterative and incremental approaches to software engineering, bringing some exciting examples from space exploration history. Still, to me it felt a bit like he was recycling his own ideas. The book was published 7 years ago. At that time it was a very important work: it laid out all the concepts and practices many of us were applying (or at least trying to promote) in a clear and concise way. I myself have used many of the examples from the book to explain CI/CD to my managers and employees numerous times over the years.

But time has passed and I feel we need to find new ways of bringing the message. I do realise many IT organisations are still far from real continuous delivery. Some still don’t feel the danger of not doing it, others are afraid of sacrificing quality for speed. But more or less everybody already knows the theory behind it: small batches, process automation, trunk-based development, integrated systems and so on. It’s the implementation of these ideas that organisations are struggling with. The reasons for that are manifold: politics, low trust, inertia, stress, burnout and lack of motivation. And of course the ever-growing tool sprawl. What people really want to hear today is how to navigate this new reality: practical advice on where to start, what to measure and how to communicate about it. Not the beaten story of agile software delivery and how it’s better than other methodologies.

The War Stories

Thankfully there was no lack of both success and failure stories and practical tips. There were some great talks on how to do deployments correctly, stories of successful container adoption and also Sarah Wells’ excellent presentation of the methodologies for influencing and coordinating the behaviours of distributed autonomous teams.

Focus on Security

As I already said, quite naturally not all the talks got the same level of interest. Still, I think I noticed a certain trend: the talks dedicated to security attracted the largest crowds. Which is in itself very interesting. Security wasn’t traditionally on the priority list of DevOps-oriented organisations. Agility, quality, reliability: yes. Security: maybe later.

The disconnect was so obvious that some folks even called for adding the InfoSec professionals into the loop while inventing such clumsy terms as DevOpSec or DevSecOps.

But now it looks like there’s a change in focus. New deployment and orchestration technologies are bringing new challenges, and we suddenly see the DevOps enablers looking for answers to some hard questions that InfoSec is asking. No wonder all the talks on security I attended got a lot of attention. Lianping Chen’s presentation focused on securing the CI/CD pipeline, while Dr. Phil Winder provided a great overview of container security best practices with a live demo and quite a few laughs. And there was also Jordan Taylor’s courageous live demo of using HashiCorp Vault for secret storage.

As a side note — if you’re serious about your web application and API security — you should definitely look at beame.io — they have some great tech for easy provisioning of SSL certificates in large volumes.

And for InfoSec professionals looking to get a grip on container technologies here’s a seminar we’ve recently developed : http://otomato.link/otomato/training/docker-for-information-security-professionals/

ChatOps

My talk was dedicated to a subject I’ve been passionate about for the last couple of years: ChatOps. The slides are already online, but they only illustrate the ideas I was describing, so it’s better to wait until the video gets edited and uploaded (yes, I’m impatient too). In fact, while preparing for the talk I laid out most of my thoughts in writing, and I’m now thinking of converting that into a blog post. I hope to find some time for editing in the upcoming days. And if you’d like some help or advice enabling ChatOps at your company, drop us a line at contact@otomato.link

There was another talk at the conference somewhat related to the topic. Job van der Voort, GitLab’s product marketing manager, described what he calls ‘Conversational Development’: “a natural evolution of software development that carries a conversation across functional groups throughout the development process.” GitLab is a 100% remote company and, according to Job, this mode of operation allows them to be effective and ensure good communication across all teams.

GitLab Dinner

At the end of the first day all the speakers were invited to a dinner organised by GitLab. There were no sales pitches, only good food and a great opportunity to talk to colleagues from all across Europe. Many thanks go to Richard and Job from GitLab for hosting the event. By the way, I just discovered that Job is coming to Israel and will be speaking at a meetup organised by our friends and partners, the great ALMToolBox. If you’re in Israel, it’s a great chance to learn more about GitLab and enjoy some pizza and beer on the 34th floor of the Electra Tower. I’ll be there.



CI/CD for Microservices – Challenges and Best Practices


Introduction:

Microservice architecture is a software system architecture pattern in which an application or a system is composed of a number of smaller, interconnected services. This is in contrast to the previously popular monolithic architectures, where the application, even if it has a logically modular, component-based structure, is packaged and deployed as a single monolith.

The microservice architectural pattern, while having many benefits (which we’ll briefly outline in the following paragraph), also presents new challenges all along the software delivery pipeline. This whitepaper strives to map out these challenges and define best practices for tackling them, to ensure a streamlined and quality-oriented delivery process.

Microservice Architecture Benefits:

  • Smaller Application Footprint (per Service)

The ‘Micro’ notion of the concept has received some justified criticism, as it’s not really about the size of the codebase but about a correct logical separation of concerns. Still, once we do split our existing or potential monolith into a number of services, each separate service will inevitably have a smaller footprint, which brings the following benefits:

  • Comprehensibility

Smaller applications are generally easier to understand, debug and change than complex, large systems.

  • Shorter Development Time

Smaller applications have shorter start-up times, making developers more efficient in their dev-test cycles and allowing for agile development.

  • Improved Continuous Delivery/Deployment Support

Updating a large, complex system takes a lot of time, while the implications of each change are hard to identify. When working with microservices, on the other hand, each service is (or should be) deployable independently, which makes it much easier to update just a part of the system and roll back only the breaking change if anything goes wrong.

  • Scalability Improvement

When each service is scalable independently, it becomes easier to provision the resources adapted to its specific needs. Additionally, we don’t need to scale everything: we can, for example, run several instances of the front-end logic but only one instance of the business logic.

  • Easier to adopt new frameworks/technologies.

Unlike with a monolith, we don’t have to write everything in Java or Python (for example). As each service is built and installed independently, it can be written in a different language and based on a different framework, as long as it supports the well-defined API or message protocol used to communicate with the other services.

  • Allows for a more efficient organizational structure.

A number of studies have noted that individual performance drops once the overall team grows beyond a certain size (namely 5-7 engineers); see also the two-pizza-team concept. This happens for several reasons, such as coordination and communication complexity and the ‘social loafing’ and ‘free rider’ phenomena. Microservice architecture allows a small team to own the development of each service, thus allowing for more efficient development iterations and improved communication.

And now that we’ve outlined the benefits, let’s look at the challenges of this architectural pattern as manifested in CI/CD concerns.

The Challenges of Microservice Architecture (from CI/CD perspective)

All challenges of microservices are caused by the complexity which originates from the fact that we are dealing with a distributed system. Distributed systems are inherently more difficult to plan, build and reason about. Here’s a list of specific challenges we will encounter:

  • Dependency Management

One of the biggest enemies of distributed architectures is dependencies. In a microservice-based architecture, services are modelled as isolated units that manage a reduced set of problems. However, a fully functional system relies on the cooperation and integration of its parts, and microservice architectures are no exception. The question then becomes: how do we manage dependencies between multiple fast-moving, independently evolving components?

  • Versioning and Backward Compatibility

Microservices are developed in isolation with each service having its own distinct lifecycle. This requires us to define very specific versioning rules and requirements. Each service has to make absolutely clear which versions of dependent services it relies upon.

  • Data partitioning and sharing

Best practices of microservice development propose having a separate database for each service. However, this isn’t always feasible and is certainly never easy when you have transactions spanning multiple services. In a CI/CD context this may mean we have to deal with multiple inter-related schema updates.

  • Testing

While able to operate in isolation, a microservice isn’t worth much without its counterparts. On the other hand, bringing up the full system topology to test just one service cancels out the benefits of modularity and encapsulation that microservices are supposed to bring. The challenge here is to define the minimum viable subset of the system for testing and/or provide good enough mocks and stubs for testing in the absence of the real services. Additional challenges lie in service communication patterns, which mostly occur over the network and therefore must take into account possible network hiccups and outages.

  • Resource Provisioning and Deployment

In a microservice architecture each service should be independently deployable. On the other hand, we need a way to know where and how to find each service, and a way to tell our services where the shared resources (such as databases, data collectors and message queues) reside and how to access them. This brings about the need for service discovery, for separating configuration from business functionality, and for failure resilience in case a certain service is temporarily missing or unavailable.

  • Variety/Polyglossia

Microservices allow us to develop each service in a different language and with a different framework. This lets us use the right tool for the job in each individual case, but it’s a mixed blessing from the delivery viewpoint, because our delivery pipeline is most efficient when it defines a clear, unified framework with distinct building blocks and a simple API. This can become challenging when we have to support a variety of technologies, build tools and frameworks.

Tackling the Microservice Architecture Challenges in the CI/CD Pipeline (and beyond)

Now that we’ve outlined the challenges accompanying the delivery of microservice-based systems, let’s define the best practices for dealing with them when building a modern CI/CD pipeline.

Dependency Management:

Even before looking at build-time dependency management we need to look at the wider concept of service inter-dependency. With microservices, each service is meant to be able to operate on its own. Therefore, in an ideal setting, no direct build-time dependencies should be needed at all. At most there can be a dependency on a common communication protocol and API version, with version changes taken care of by backward compatibility and consumer-driven contracts (a small sketch of such a contract check appears at the end of this section).

In order to achieve this, the architectural principles of loose coupling and problem locality should be applied when splitting up our system into separate services:

  • Loose coupling: microservices should be able to be modified without requiring changes in other microservices.
  • Problem locality: related problems should be grouped together (in other words, if a change requires an update in another part of the system, those parts should be close to each other).

 

      • If two or more services are too tightly coupled, i.e. they have too many correlated changes that require careful coordination, consider unifying them into one service.
      • If we’re not in the ideal setting of loose coupling and concern separation, and re-architecting the system is currently impossible (for lack of resources or for business reasons), then strict semantic versioning should be applied to all interdependent services, to make sure we are building and deploying against the correct versions of service counterparts.
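To make the consumer-driven contract idea mentioned above a bit more concrete, here is a minimal sketch of a contract check. The service, endpoint and field names are hypothetical, and real projects often use dedicated tooling such as Pact for this.

    # A minimal consumer-driven contract check (illustrative; the service and
    # field names are hypothetical). The consumer publishes the shape of the
    # response it relies on, and the provider's CI runs this check on every build.

    CONSUMER_CONTRACT = {
        "endpoint": "/users/{id}",
        "required_fields": {"id": int, "name": str, "email": str},
    }

    def verify_contract(provider_response: dict, contract: dict) -> list:
        """Return a list of contract violations (an empty list means compatible)."""
        violations = []
        for field, expected_type in contract["required_fields"].items():
            if field not in provider_response:
                violations.append(f"missing field: {field}")
            elif not isinstance(provider_response[field], expected_type):
                violations.append(f"wrong type for {field}: "
                                  f"expected {expected_type.__name__}")
        return violations

    # Example: run against a (stubbed or real) provider response
    response = {"id": 42, "name": "Ada", "email": "ada@example.com"}
    assert verify_contract(response, CONSUMER_CONTRACT) == []

The consumer team owns the contract and the provider’s pipeline runs the check on every build, so a breaking change is caught long before a full end-to-end environment is involved.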

Versioning:

As stated in the previous paragraph, semantic versioning is a good way of signalling when a breaking change occurs in a service’s semantics or data schema. In practice this means that any given service should be able to talk to another service as long as the contract between them is sealed, with the MAJOR field of the semantic version being the guarantee of that seal. For experimental or feature branches, the branch name can be added as build metadata to the version, as suggested here: http://semver.org/#spec-item-10
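As an illustration of the rule above, here is a small sketch of a compatibility check based on the MAJOR and MINOR fields. The exact policy shown (same MAJOR, counterpart not older than the version we built against) is just one reasonable choice, not a prescription.

    # Illustrative semver compatibility check between a service and the version
    # of a counterpart it was built against. Only a MAJOR bump signals a
    # breaking change; build metadata after '+' is ignored, per semver.org.
    import re

    SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-[\w.]+)?(?:\+[\w.-]+)?$")

    def parse(version: str):
        m = SEMVER_RE.match(version)
        if not m:
            raise ValueError(f"not a semantic version: {version}")
        return tuple(int(p) for p in m.groups())

    def is_compatible(required: str, deployed: str) -> bool:
        """The deployed counterpart satisfies the requirement if the MAJOR
        versions match and it is not older than the version we built against."""
        req, dep = parse(required), parse(deployed)
        return dep[0] == req[0] and dep[1:] >= req[1:]

    print(is_compatible("2.3.0", "2.4.1"))            # True: same contract, newer
    print(is_compatible("2.3.0", "3.0.0"))            # False: breaking change
    print(is_compatible("1.2.0", "1.2.5+feature-x"))  # True: metadata ignored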

Data Partitioning:

  • If each service is based on its own database, then database schema changes and deployment become the responsibility of that service’s installation scripts. For the CI/CD pipeline it means we need to be able to support multiple database deployments in our test and staging environment provisioning cycles (see the sketch after this list).
  • If services share databases, it means we need to identify all the data dependencies and retest all the dependent services whenever a shared data schema is changed.
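As referenced in the first bullet, here is a rough sketch of how a provisioning job might apply each service’s own schema migrations. The repository layout and the per-service migrate.py entry point are assumptions made for the example.

    # Illustrative: apply each service's own schema migrations when provisioning
    # a test or staging environment. The services/<name>/migrations layout and
    # the per-service migrate.py entry point are assumed conventions.
    import pathlib
    import subprocess
    import sys

    SERVICES_DIR = pathlib.Path("services")

    def migrate_all(env: str) -> None:
        for service in sorted(SERVICES_DIR.iterdir()):
            migrations = service / "migrations"
            if not migrations.is_dir():
                continue  # this service owns no database
            print(f"Applying migrations for {service.name} ({env})")
            subprocess.run(
                [sys.executable, str(service / "migrate.py"), "--env", env],
                check=True,  # fail the pipeline if any migration fails
            )

    if __name__ == "__main__":
        migrate_all(env="staging")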

Testing

  • For a much deeper look at microservice testing patterns look here: http://martinfowler.com/articles/microservice-testing
  • Deployment of individual services should be a part of the end-to-end test to verify successful upgrade and rollback procedures as part of the full system test.
  • End-to-End Tests should be only executed after unit and integration tests have completed successfully and test coverage thresholds have been met. This is because the setup and execution of the e2e environment tends to be difficult and error-prone and we should introduce sufficient gating to ensure its stability.
  • Due to the reliance of inter-service communication on the network, and the overall system complexity, integration tests can be expected to fail more frequently because of unrelated infrastructure or version dependency errors.
  • In such a case it may be a good idea to separate integration tests into their own stage of the CI pipeline, so that external outages don’t block development.
  • Integration tests: As stated earlier – the minimum viable subset of interdependent services should be identified wherever possible to simplify testing environment provisioning and deployment.
  • Automated deployment becomes absolutely necessary, with each service deployable by itself and a deployment orchestration solution (e.g. an Ansible playbook) describing the various topologies and event sequences.
  • Test doubles (mocks, stubs, etc.) should be encouraged as a tool for testing service functionality in isolation (a minimal stub sketch follows this list).
  • Coverage thresholds are a good strategy for ensuring we’re writing tests for all the new functionality.
  • Unit tests become especially important in a microservice environment as they allow for faster feedback without the need to instantiate the collaborating services. Test-driven development should be encouraged and unit test coverage should be measured.
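To illustrate the test-double point from the list above, here is a minimal stub of a collaborating service using only the Python standard library. The endpoint and payload are invented for the example; in practice dedicated tooling (WireMock, the responses library, etc.) may be a better fit.

    # A minimal HTTP stub of a collaborating service, standard library only.
    # The /users endpoint and its payload are made up for illustration.
    import json
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class UserServiceStub(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith("/users/"):
                body = json.dumps({"id": 42, "name": "Ada"}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.end_headers()

        def log_message(self, *args):  # keep test output quiet
            pass

    def start_stub(port: int = 8081) -> HTTPServer:
        server = HTTPServer(("127.0.0.1", port), UserServiceStub)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return server

    # In a test: point the service under test at http://127.0.0.1:8081
    # and call server.shutdown() during teardown.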

Resource Provisioning and Deployment

  • Infrastructure-as-Code approach should be used for versioned and testable provisioning and deployment processes.
  • Microservices should enable horizontal scaling across a compute resource cluster. This calls for using:
    • A central configuration mechanism in the form of a distributed key-value store (such as Consul or etcd). Our CI/CD pipeline should support deploying configuration to that store separately from the services themselves (a small sketch follows this list).
    • A cluster task scheduler (e.g. Docker Swarm, Mesos, Kubernetes or Nomad). The CD process needs to interface with whichever system we choose for implementing from-scratch rollouts, rolling updates and blue/green deployments.
    • Microservice architectures are often enabled by OS container technologies like Docker. Containers as a packaging and delivery mechanism should definitely be evaluated.
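As a small illustration of the configuration point above, here is a sketch of reading a service’s settings from a Consul-style key-value store over its HTTP API. It assumes a local agent on port 8500, and the key names are made up.

    # Illustrative: fetch a service's configuration from a Consul KV store via
    # its HTTP API (agent assumed on localhost:8500; key names are invented).
    import base64
    import json
    import urllib.request

    CONSUL = "http://127.0.0.1:8500"

    def get_kv(key: str) -> str:
        with urllib.request.urlopen(f"{CONSUL}/v1/kv/{key}") as resp:
            entry = json.load(resp)[0]        # Consul returns a list of entries
        return base64.b64decode(entry["Value"]).decode()  # values are base64-encoded

    # e.g. the pipeline deploys configuration separately from the service itself:
    #   consul kv put config/billing/db_url postgres://...
    db_url = get_kv("config/billing/db_url")

The pipeline can then roll out configuration changes to the store as a separate step, independently of deploying the service binaries.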

Variety/Polyglossia

It is very desirable to base the CI/CD flow for each service on the same template, with the same building blocks and a well-defined interface, i.e. each service should provide similar ‘build’, ‘test’, ‘deploy’ and ‘promote’ endpoints for integration into the CI system. Additionally, the interface for querying service interdependencies should be clearly defined. This allows almost instant CI/CD flow creation for each new service and reduces the load on the CI/CD team. Ultimately it should allow developers to plug new services into the CI/CD system in a fully self-service mode.
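One possible way to picture such a template is a thin driver that expects every service repository to expose the same entry points. The ci/build.sh, ci/test.sh, ci/deploy.sh and ci/promote.sh script names below are an assumed convention for the sake of the example, not a standard.

    # Sketch of a unified per-service CI/CD interface: every service repository
    # is expected to expose the same entry points (an assumed convention here:
    # ci/build.sh, ci/test.sh, ci/deploy.sh, ci/promote.sh).
    import pathlib
    import subprocess

    STAGES = ["build", "test", "deploy", "promote"]

    def run_pipeline(service_repo: str, up_to: str = "deploy") -> None:
        repo = pathlib.Path(service_repo)
        for stage in STAGES[: STAGES.index(up_to) + 1]:
            script = repo / "ci" / f"{stage}.sh"
            if not script.exists():
                raise RuntimeError(f"{repo.name} does not implement '{stage}'")
            print(f"[{repo.name}] running {stage}")
            subprocess.run(["bash", str(script)], cwd=repo, check=True)

    # A new service only needs to provide these four scripts to plug into CI/CD:
    # run_pipeline("services/payments", up_to="deploy")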

© Otomato Software Ltd. 2016. All Rights Reserved.



Jenkins and the Future of Software Delivery

Are you optimistic when you look into the future and try to see what it brings? Do you believe in robot apocalypse or the utopia of singularity? Do you think the world will change to the better or to the worse? Or are you just too busy fixing bugs in production and making sure all your systems are running smoothly?

Ant Weiss of Otomato describing the bright future of software delivery at Jenkins User Conference Israel 2016.



How OpenStack is Built – the Video now Online

Watch Ant Weiss of Otomato provide an overview of the OpenStack CI – probably one of the most advanced Jenkins-based CI infrastructures in the world.



Custom deploy process at Utab

Hi! I’m Ilya Sher.

This guest post will describe the deploy process at Utab, one of my clients.

Background – system architecture summary

Utab uses a services architecture. Some services are written in Java, others in NodeJS. Each server runs either one Java application or one NodeJS application. The production environment uses the immutable server approach (with the exception of the load balancing configuration).

Requirements for the deploy process

The client specified the following requirements:

  1. Support for staging and production environments
  2. Manually triggered deploy for each of the environments
  3. Health check before adding a server to load balancing in production environment
  4. Option to easily and quickly rollback to previous version in production environment
  5. Simple custom tools are to be used (no Chef/Puppet/…)

 

Solution background

Utab’s deploy scripts and the other scripts were made specifically for the client. Such custom scripts are usually simpler than any ready-made solution, which means they break less and are easier to modify and operate. No workarounds are required in cases when a chosen framework is not flexible enough.

Also note that a custom solution solves the whole problem, not just parts of it, as is often the case with ready-made solutions. For example, the scripts also handle the following aspects: firewall rules configuration, EC2 volume attachment, EC2 instance and image tagging, and running tests before adding a server to the load balancer. When all of this is solved by a single set of custom scripts, and not by several ready-made solutions plus your own scripts, the result looks much better and is more consistent.

We estimate that such a custom solution has a lower TCO despite the development effort. Also note that in the case of Utab the development effort for the framework scripts was not big: about ten working days upfront and another few days over three years. I attribute this at least partly to sound and simple software and system architecture.

 

Note that the opinion in the summary above has many opponents. My experience shows that using ready-made solutions for the tasks described above leads to higher TCO. This mostly happens because of the complexity of such solutions, and for other reasons that I’ll discuss in another post.

The build

  1. A developer pushes a commit to Github
  2. Github hook triggers Jenkins
  3. Jenkins does the build and runs the tests: Mocha unit tests for NodeJS or JUnit tests for Java
  4. A small script embeds git branch name, git commit and Jenkins build number in pom.xml, in the description field.
  5. Jenkins places the resulting artifact in the Apache Archiva repository (only if #3 succeeds)

 

What you see above is common practice and you can find a lot about these steps on the internet, except for number 4, which was our customization.
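The post doesn’t include that script, but a minimal sketch of what step 4 might look like is shown below, reading the values that Jenkins and its Git plugin expose as environment variables and writing them into the POM’s description element. This is not the actual Utab script.

    # Not the actual Utab script: a minimal sketch of step 4, embedding build
    # metadata into pom.xml. GIT_BRANCH, GIT_COMMIT and BUILD_NUMBER are the
    # standard Jenkins / Git plugin environment variables.
    import os
    import xml.etree.ElementTree as ET

    POM_NS = "http://maven.apache.org/POM/4.0.0"
    ET.register_namespace("", POM_NS)  # keep the POM's default namespace on write

    def embed_build_info(pom_path: str = "pom.xml") -> None:
        tree = ET.parse(pom_path)
        root = tree.getroot()
        desc = root.find(f"{{{POM_NS}}}description")
        if desc is None:
            desc = ET.SubElement(root, f"{{{POM_NS}}}description")
        desc.text = "branch={} commit={} build={}".format(
            os.environ.get("GIT_BRANCH", "unknown"),
            os.environ.get("GIT_COMMIT", "unknown"),
            os.environ.get("BUILD_NUMBER", "unknown"),
        )
        tree.write(pom_path, xml_declaration=True, encoding="utf-8")

    if __name__ == "__main__":
        embed_build_info()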

Note that while the repository is Java oriented we use it for both Java and NodeJS artifacts for uniformity. We use mvn deploy:deploy-file to upload the NodeJS artifacts.

For NodeJS artifacts, static files that go to a CDN are included in the artifact. Java services are not exposed directly to the browser so Java artifacts do not contain any static files for the CDN.

 

Custom scripts involved in the deployment process

The scripts that use the AWS API are written in Python; the rest are in bash. I would be happy to use one language, but bash is not convenient for using the AWS API, and Python is not as good as bash for system tasks such as running programs or manipulating files.

I do hope to solve this situation by developing a new language (and a shell), NGS, which should be convenient for the system tasks, which today include talking to APIs, not just working with files and running processes.

create-image.py

  1. Creates EC2 instance (optional step)
  2. Runs upload-and-run.sh (see the description below).
  3. Runs configure-server.sh to configure the server, including the application (optional step)
  4. Runs tests (optional step)
  5. Creates an AMI
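For readers who haven’t scripted this before, here is a rough sketch of what steps 1 and 5 of the list above could look like with boto3. It is not the actual create-image.py; the base AMI, instance type and naming scheme are placeholders.

    # Not the actual create-image.py: a sketch of steps 1 and 5 with boto3.
    # The base AMI, instance type and naming scheme are made-up placeholders.
    import time
    import boto3

    ec2 = boto3.client("ec2")

    def launch_instance(base_ami: str, instance_type: str = "t3.small") -> str:
        resp = ec2.run_instances(ImageId=base_ami, InstanceType=instance_type,
                                 MinCount=1, MaxCount=1)
        instance_id = resp["Instances"][0]["InstanceId"]
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        return instance_id

    def create_ami(instance_id: str, role: str, build: str) -> str:
        name = f"{role}-{build}-{int(time.time())}"
        image_id = ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]
        ec2.create_tags(Resources=[image_id],
                        Tags=[{"Key": "role", "Value": role},
                              {"Key": "build", "Value": build}])
        return image_id

    # In between these two calls the real script runs upload-and-run.sh,
    # configure-server.sh and the tests.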

 

upload-and-run.sh

Gets destination IP or hostname and the role to deploy there.

  1. Pulls the required artifacts from the repository. The default is the latest build available in the repository; a git branch or a build number can be specified to pick another build. If a branch is given, the latest build on that branch is used, based on the information embedded in pom.xml (a sketch of this lookup follows the list).
  2. Runs all relevant “PRE” hook files (which for example upload static files to S3)
  3. Packages all required scripts and artifacts into a .tar.gz file
  4. Uploads the .tar.gz file
  5. Unpacks it to a temporary directory
  6. Runs setup.sh (a per role file: website/setup.sh, player/setup.sh, etc) which controls the whole installation.
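Since the “latest build in the repository” lookup is the least standard part of this flow, here is an illustrative sketch of resolving the newest version of an artifact from a Maven-style repository such as Archiva by reading its maven-metadata.xml. The repository URL and coordinates are placeholders, not Utab’s real ones.

    # Illustrative only: resolving the latest version of an artifact from a
    # Maven-style repository (such as Archiva) by reading maven-metadata.xml.
    # The repository URL and coordinates below are placeholders.
    import urllib.request
    import xml.etree.ElementTree as ET

    REPO = "https://repo.example.com/repository/internal"

    def latest_version(group_id: str, artifact_id: str) -> str:
        url = "{}/{}/{}/maven-metadata.xml".format(
            REPO, group_id.replace(".", "/"), artifact_id)
        with urllib.request.urlopen(url) as resp:
            meta = ET.parse(resp)
        return meta.findtext("./versioning/latest")

    def artifact_url(group_id: str, artifact_id: str, version: str,
                     packaging: str = "jar") -> str:
        return "{}/{}/{}/{}/{}-{}.{}".format(
            REPO, group_id.replace(".", "/"), artifact_id, version,
            artifact_id, version, packaging)

    # version = latest_version("com.example", "player")
    # urllib.request.urlretrieve(artifact_url("com.example", "player", version),
    #                            "player.jar")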

 

deploy.py

  1. Starts EC2 instance from one of the AMIs created by the create-image.py script
  2. Runs applications tests
  3. Updates the relevant load balancers, switching traffic from the old-version servers to the new-version servers (a sketch of this step follows).
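Again, this is not the real deploy.py, but a sketch of what step 3 might look like with boto3 and a classic ELB. The load balancer name and the tag-based lookup mirror the tagging scheme described later in the post; the details are assumptions.

    # Not the real deploy.py: a sketch of step 3 using boto3 and a classic ELB.
    # Load balancer names and the tag scheme are illustrative assumptions.
    import boto3

    ec2 = boto3.client("ec2")
    elb = boto3.client("elb")

    def instances_by_tags(role: str, env: str) -> list:
        resp = ec2.describe_instances(Filters=[
            {"Name": "tag:role", "Values": [role]},
            {"Name": "tag:env", "Values": [env]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ])
        return [i["InstanceId"]
                for r in resp["Reservations"] for i in r["Instances"]]

    def switch_load_balancer(lb_name: str, role: str) -> None:
        new = [{"InstanceId": i} for i in instances_by_tags(role, "prod")]
        old = [{"InstanceId": i} for i in instances_by_tags(role, "removed-from:prod")]
        elb.register_instances_with_load_balancer(LoadBalancerName=lb_name,
                                                  Instances=new)
        if old:
            elb.deregister_instances_from_load_balancer(LoadBalancerName=lb_name,
                                                        Instances=old)

    # switch_load_balancer("website-prod", role="website")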

 

The Python scripts, when successful, notify the developers via a dedicated Twitter account. This is as simple as:

   # 'c' holds the Twitter app credentials and 's' is the status text (python-twitter library)
   api = twitter.Api(c['consumer_key'], c['consumer_secret'], c['access_token_key'], c['access_token_secret'])
   return api.PostUpdate(s)

Pulling artifacts from the repository before deploying

I would like to elaborate on pulling the required artifacts from the repository. The artifacts are pulled from the repository to the machine where the deploy script runs: the management machine (yet another server in the cloud, close to the repository and the rest of the servers). I have not seen this implemented anywhere else.

Note that pulling artifacts to the machine where the script runs works fine when a management machine is used. You should not run the deployment script from your office, as downloading and especially uploading artifacts would take some time. This limitation is the result of a trade-off: the gained conceptual simplicity outweighs the minuses. Since the artifacts are uploaded to the destination server together with the setup scripts, the destination application servers never need to talk to the repository, so there are, for example, no repository access security issues to handle for them.

 

Deploying to staging environment

  1. Artifact is ready (see “The build” section above)
  2. One of the developers runs create-image.py with flags that tell create-image.py not to create a new instance and not to create an AMI from the instance. This limits create-image.py to the deploy process only: running upload-and-run.sh.

    Since the “ENV” environment variable is also passed, the configuration step is also run (configure-server.sh).

    Since create-image.py needs switches and environment variables that none of us would like to remember, there are several wrapper scripts named deploy-ROLE-to-staging.sh

 

Deploying to production environment

It is the developers’ responsibility to make sure that the build being packaged at this step is one of the builds that were tested in the staging environment.

  1. Artifact is ready (see “The build” section above)
  2. One of the developers runs create-image.py and the script creates an AMI
    The solution to (not) remembering all the right switches and environment variables is documentation: a markdown-formatted readme file at the top level of the repository.
  3. One of the people that are authorized to deploy to production runs deploy.py

 

Rollback to previous version

This can be done in one of the following ways:

 

New deploy

Run the deploy.py script giving an older version as an argument

 

Manual quick fix

When old servers are removed from the load balancing they are not immediately terminated. Their termination is manual, after the “no rollback” decision. The servers that were rotated out of the load balancing are tagged with the “env” tag value “removed-from:prod”.

  1. Change the “env” tag of the new servers to “removed-from:prod” or anything else that is not “prod”
  2. Change “env” tag on the old servers to “prod”
  3. Run the load balancer configuration script. The arguments for this script are the environment name and the role of the server. The script updates all the relevant load balancers that point to servers with the given role.
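A rough sketch of steps 1 and 2, assuming boto3 and the tag values described above:

    # Illustrative sketch of rollback steps 1 and 2: swap the "env" tags so the
    # load balancer configuration script picks up the old servers again.
    import boto3

    ec2 = boto3.client("ec2")

    def retag(instance_ids: list, env_value: str) -> None:
        ec2.create_tags(Resources=instance_ids,
                        Tags=[{"Key": "env", "Value": env_value}])

    # new_servers / old_servers would be looked up by their current tags,
    # e.g. with describe_instances as in the deploy sketch above.
    # retag(new_servers, "removed-from:prod")   # step 1: take the new version out
    # retag(old_servers, "prod")                # step 2: bring the old version back
    # ...then re-run the load balancer configuration script (step 3).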

Rollback script

We never finished it, as rollbacks were a very rare occasion and the other two methods work fine.

Instances tagging and naming

Instances naming: role/build number

For instances with several artifacts that have build numbers: role/B1-B2-B3-…, where B1 is the primary artifact’s build number and the others follow in order of decreasing importance.

The status tag is updated during script runs so when deploying to production it can for example look like: “Configuring” or “Self test attempt 3/18” or “Self tests failed” (rare occasion!) or “Configuring load balancing”.

 

Summary

This post described one of the possible solutions. I think the solution described above is one of the best possible solutions for Utab. This does not mean it’s a good solution for any particular other company. Use your best judgement when adopting a similar solution or any part of it.

 

You should always assume that a solution you are looking at is not a good fit for your problem or situation, and then go on and prove that it is, considering possible alternatives. Such an approach helps avoid confirmation bias.



Openstack CI infrastructure Overview

Openstack is one of the largest OSS projects today with hundreds of commits flowing in daily. This high rate of change requires an advanced CI infrastructure. The purpose of the talk is to provide an overview of this infrastructure, explaining the role of each tool and the pipelines along which changes have to travel before they find their way into the approved Openstack codebase.
Talk delivered by Anton Weiss at Openstack Day Israel 2016 : http://www.openstack-israel.org/#!agenda/cjg9



Infrastructure As Code Revisited


One cannot talk about modern software delivery without mentioning Infrastructure as Code (IAC). It’s one of the cornerstones of DevOps: it turns ops into part-time coders and devs into part-time ops. IAC is undoubtedly a powerful concept. It has enabled the shift to giant-scale data centres and clouds, and has made a lot of lives easier. Numerous tools (generally referred to as DevOps tools) have appeared in the last decade to allow codified infrastructures. Even tools that originally relied on a user-friendly GUI (and probably owe much of their success to the GUI) are now putting more emphasis on codified flows (I am talking about Jenkins 2.0, of course, with its enhanced support for delivery pipeline-as-code).

IAC is easy to explain and has its clear benefits:

  • It allows automation of manual tasks (and thus cost reduction)
  • Brings speed of execution
  • Allows version control for infrastructure configuration
  • Reduces human error
  • Brings devs and ops closer together by giving them a common language

 

But why am I writing about this now? What made me revisit this already quite widely known and accepted pattern? ( Even many enterprise organizations are now ready to codify their infrastructure )

The reason is that I recently got acquainted with an interesting man who has a mind-stimulating agenda. Peter Muryshkin of SIGSPL.org has started a somewhat controversial discussion regarding the future of DevOps in the light of overall business digitalisation. One thing he rightfully notices is that software engineering has been learning a lot from industrial engineering: automating production lines, copying Toyotism and the theory of constraints, containerising goods and services, etc. The observation isn’t new. To quote Fred Brooks, as quoted by Peter:

“Techniques proven and routine in other engineering disciplines are considered radical innovations in software engineering.”

This is certainly true for labour automation, which existed long before IAC brought its benefits to software delivery. It’s also true for the monitoring and control systems which have been used in factories since the dawn of the 20th century, and for which computers started being used in the 1960s.

But the progress of software delivery disciplines wasn’t incremental and linear. The cloud and virtualization virtually exploded on us. We didn’t have the time to continue slowly adapting the known engineering patterns when the number of our servers suddenly rocketed from dozens to thousands.

In a way, that’s what brought IAC on. There were (and still are) numerous non-IAC visual infrastructure automation tools in the industry, but their vendors couldn’t quite predict the scale and speed of operation demanded by the black hole of data gravity. So the quick and smart solution of infrastructure-as-code was born.

And that brings us to something I’ve been thinking about quite a lot recently: the missing visualization. Visibility and transparency (or measurement and sharing) are written all over the DevOps banners. The classic view of IAC actually insists that “tools that utilize IaC bring visibility to the state and configuration of servers and ultimately provide the visibility to users within the enterprise”. In theory, that is correct. Everybody has access to the configuration files, and developers can use their existing SCM skills to make some sense of system changes over time… But that’s only in theory.

In practice, the amount of IAC code grows in volume and complexity over time. As with any programming project, ugly hacks get introduced and logic bugs get buried under the pile of object and component interactions. (And just think of the power of a bug that’s been replicated to a thousand servers.) Pretty soon only the people who maintain the infra code are able to really understand why and what configuration gets applied to which machine.

Over the past years I’ve talked to enough desperate engineers trying to decipher Puppet manifests or Chef cookbooks written by their ops colleagues. Which makes me ask whether IAC sometimes hinders DevOps instead of enabling it…

Even the YAML-based configurations like those provided by Ansible or SaltStack become very hard to read and analyze beyond simple models.

As always is the problem with code – it’s written by humans for machines, not for other humans.

But on the other hand  – machines are becoming ever better at visualizing code so humans can understand them. So is that happening in the IAC realm?

In my overview of Weapons of Mass Configuration I specifically looked at the GUI options for each of the 4 ninja turtles of IAC, and sadly found out that not even one of them got any serious praise from users for its visualization features. Moreover, the GUI solutions were dismissed as “just something the OSS vendors are trying to make a buck from”.

 

I certainly see this as a sign that infrastructure automation is still in its artisanal state: made by and for skilled craft workers who prefer to write infra code by hand. But just as the artisans had to make way for the factories and labour automation of the industrial revolution, a change is imminent in the current state of IAC. It’s just that usable visualization is still lacking, the tools still require too many special skills, and the artisans of IAC want to stay in control.

 

Don’t get me wrong: I’m not saying our infra shouldn’t be codified. Creating repeatable, automated infrastructure has been my bread and butter for quite some time, and tools like Puppet and Ansible have made this a much easier and cleaner task. I just feel we still have a long way to go. (Immutable infrastructure and containerisation, while being a great pattern with benefits of its own, also relies heavily on manual codification of both the image definitions and the management layers.)

Infrastructure management and automation is still too much of an issue and still requires too much specialized knowledge to be handled effectively with the existing tools. Ansible is a step in the direction of simplicity, but it’s a baby step. Composing infrastructure shouldn’t be more complicated than assembling an Ikea bookshelf, and for that new, simpler, ergonomic UIs need to be created.

Large-scale problems need industrial level solutions. Let’s just wait and see which software vendor provides such a solution first – one which will make even the artisans say ‘Wow!’ and let go of their chisel.

And with that said – I’ll go back to play with my newly installed Jenkins 2.0 instance for some artisanal pipeline-as-code goodness. :)