
How OpenStack is Built – the Video now Online

Watch Ant Weiss of Otomato provide an overview of the OpenStack CI – probably one of the most advanced Jenkins-based CI infrastructures in the world.



The Number One Symptom of Not Actually ‘Doing Devops’

Category : Tools

So I recently talked to a release engineering team leader at a very well-known American software+hardware company located in California.
They contacted me looking for top-notch build infrastructure engineers and we spent a very interesting hour discussing their technology stack and people processes. On the surface they are moving in the right direction – automating all the things, using Chef for config management, codifying the infrastructure, using Artifactory for binaries… But one thing struck me in our conversation. The team leader (clearly a very smart and talented guy) said: “I’ve been trying to hire for two years. Good professionals are so hard to find…”
I agreed at first – we all know the market is starving for technical talent.
But reflecting on our conversation afterwards I realized that he in fact showed me the number one symptom of how far their team is from ‘doing devops’.

Yes – hiring is challenging. I’ve discussed this before. It may take some time to find the person whose mindset, values and overall vibe suit your expectations. Even more so if you’re low on budget, as the market has become highly competitive. But for this specific team leader the budget was not an issue. He was failing to hire because of his organisation’s unwillingness to invest in mentoring and development.

He was looking for a ready-made full-stack ninja with a very specific skillset. There is no other way to explain his two-year-long quest.

Can you see how this totally opposes the ever-important Devops value of Sharing? So you’ve built a few things to be proud of, you’re technologically savvy, you’re on the bleeding edge. Now is the time to share your knowledge! Now is the time to go hire bright novices. They are out there, they are hungry to learn from your experience, and by sharing with them you will build real devops on your team.

So if you’re failing to hire for more than a couple of months – don’t blame the market. Don’t complain about lousy candidates. Go revise your hiring process and more importantly – the way your team works. You may have all the technology, but it looks like your culture is broken. And with broken culture – even if you eventually succeed in hiring , you’ll have hard time reaping the true benefits of the Devops way – agility, quality, motivation and trust.

And may the devops be with you!



Custom deploy process at Utab

Hi! I’m Ilya Sher.

This guest post will describe the deploy process at Utab, one of my clients.

Background – system architecture summary

Utab uses a services architecture. Some services are written in Java, others in NodeJS. Each server runs either one Java application or one NodeJS application. The production environment uses (with the exception of the load balancing configuration) the immutable server approach.

Requirements for the deploy process

The client specified the following requirements:

  1. Support for staging and production environments
  2. Manually triggered deploys for each of the environments
  3. A health check before adding a server to load balancing in the production environment
  4. An option to easily and quickly roll back to the previous version in the production environment
  5. Simple custom tools are to be used (no Chef/Puppet/…)

 

Solution background

Utab’s deploy and other scripts were written specifically for the client. Such custom scripts are usually simpler than any ready-made solution, which means they break less and are easier to modify and operate. No workarounds are required when a chosen framework is not flexible enough.

Also note that a custom solution solves the whole problem, not just parts of it, as is often the case with ready-made solutions. For example, the scripts also handle the following aspects: firewall rule configuration, EC2 volume attachment, EC2 instance and image tagging, and running tests before adding a server to the load balancer. When all of this is solved by a single set of custom scripts, and not by several ready-made solutions plus your own glue scripts, the result looks much better and more consistent.

We estimate that such a custom solution has a lower TCO despite the development effort. Also note that in the case of Utab, the development effort for the framework scripts was not big: about ten working days upfront and another few days over three years. I attribute this at least partly to sound and simple software and system architecture.

 

Note that the opinion in the summary above has many opponents. My experience shows that using ready-made solutions for the tasks described above leads to a higher TCO. This mostly happens because of the complexity of such solutions, and for other reasons that I’ll discuss in another post.

The build

  1. A developer pushes a commit to GitHub
  2. A GitHub hook triggers Jenkins
  3. Jenkins does the build and runs the tests: Mocha unit tests for NodeJS or JUnit tests for Java
  4. A small script embeds the git branch name, git commit and Jenkins build number in pom.xml, in the description field
  5. Jenkins places the resulting artifact in the Apache Archiva repository (only if #3 succeeds)

 

What you see above is common practice, and you can find a lot about these steps on the internet, except for step 4, which is our customization.
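For illustration, here is a minimal sketch of what step 4 might look like – assuming the description element sits on a single line and that the standard Jenkins variables (GIT_BRANCH, GIT_COMMIT, BUILD_NUMBER) are available; the actual script may differ:

   # embed build metadata into the pom.xml description field (illustrative)
   sed -i "s|<description>.*</description>|<description>branch=${GIT_BRANCH} commit=${GIT_COMMIT} build=${BUILD_NUMBER}</description>|" pom.xml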

Note that while the repository is Java-oriented, we use it for both Java and NodeJS artifacts for the sake of uniformity. We use mvn deploy:deploy-file to upload the NodeJS artifacts.
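Uploading a NodeJS tarball could look roughly like this (the coordinates, repository id and URL below are illustrative, not Utab’s actual values):

   mvn deploy:deploy-file \
     -Dfile=player-1.0.42.tar.gz \
     -DgroupId=com.example.utab \
     -DartifactId=player \
     -Dversion=1.0.42 \
     -Dpackaging=tar.gz \
     -DrepositoryId=archiva.internal \
     -Durl=http://archiva.example.internal/repository/internal/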

For NodeJS artifacts, static files that go to a CDN are included in the artifact. Java services are not exposed directly to the browser so Java artifacts do not contain any static files for the CDN.

 

Custom scripts involved in the deployment process

Scripts that use the AWS API are written in Python; the rest are in bash. I would be happy to use one language, but bash is not convenient for using the AWS API, and Python is not as good as bash for system tasks such as running programs or manipulating files.

I do hope to resolve this by developing a new language (and a shell), NGS, which should be convenient for system tasks – which today include talking to APIs, not just working with files and running processes.

create-image.py

  1. Creates an EC2 instance (optional step)
  2. Runs upload-and-run.sh (see the description below)
  3. Runs configure-server.sh to configure the server, including the application (optional step)
  4. Runs tests (optional step)
  5. Creates an AMI (a rough CLI equivalent of this step is sketched below)
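The AMI creation step itself is driven from Python via the AWS API; a rough command-line equivalent, with a placeholder instance id and image name, would be:

   aws ec2 create-image --instance-id i-0123456789abcdef0 --name "player/1042" --no-reboot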

 

upload-and-run.sh

Gets the destination IP or hostname and the role to deploy there.

  1. Pulls the required artifacts from the repository. The default is the latest build available in the repository. A git branch or a build number can be specified to pick another build; if a branch is given, the latest build on that branch is used. This relies on the information embedded in pom.xml (step 4 of the build).
  2. Runs all relevant “PRE” hook files (which, for example, upload static files to S3)
  3. Packages all required scripts and artifacts into a .tar.gz file
  4. Uploads the .tar.gz file
  5. Unpacks it to a temporary directory
  6. Runs setup.sh (a per-role file: website/setup.sh, player/setup.sh, etc.), which controls the whole installation (a simplified sketch of steps 3–6 follows below)
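A condensed, hypothetical sketch of steps 3–6 (the directory layout and file names are assumptions, not the actual script):

   ROLE=player
   HOST="$1"
   # package scripts and artifacts, upload, unpack remotely and run the per-role setup.sh
   tar czf "/tmp/${ROLE}-deploy.tar.gz" artifacts/ scripts/ "${ROLE}/"
   scp "/tmp/${ROLE}-deploy.tar.gz" "${HOST}:/tmp/"
   ssh "$HOST" "set -e; TMP=\$(mktemp -d); tar xzf /tmp/${ROLE}-deploy.tar.gz -C \$TMP; \$TMP/${ROLE}/setup.sh"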

 

deploy.py

  1. Starts an EC2 instance from one of the AMIs created by the create-image.py script
  2. Runs application tests
  3. Updates the relevant load balancers, switching them from the old-version servers to the new-version servers (see the example below)
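With classic ELBs, the switch boils down to something like the following (deploy.py does this through the AWS API from Python; the load balancer name and instance ids are placeholders):

   # put the new-version server into rotation and take the old one out
   aws elb register-instances-with-load-balancer --load-balancer-name player-prod --instances i-0aaa1111bbb22222c
   aws elb deregister-instances-from-load-balancer --load-balancer-name player-prod --instances i-0ddd3333eee44444f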

 

The Python scripts, when successful, notify the developers using a dedicated Twitter account. This is as simple as:

   # uses the python-twitter library; 'c' holds the credentials, 's' is the notification text
   api = twitter.Api(c['consumer_key'], c['consumer_secret'], c['access_token_key'], c['access_token_secret'])
   return api.PostUpdate(s)

Pulling artifacts from the repository before deploying

I would like to elaborate on pulling the required artifacts from the repository. The artifacts are pulled from the repository to the machine where the deploy script runs – the management machine (yet another server in the cloud, near the repository and the rest of the servers). I have not seen this implemented anywhere else.

Note that pulling artifacts to the machine where the script runs works fine when a management machine is used. You should not run the deployment script from your office, as downloading and especially uploading artifacts would take some time. This limitation is the result of a trade-off: the gained conceptual simplicity outweighs the minuses. When the setup scripts are uploaded to the destination server, the artifacts are uploaded with them, so the destination application servers never need to talk to the repository and hence, for example, there are no repository-access security issues to handle.

 

Deploying to staging environment

  1. Artifact is ready (see “The build” section above)
  2. One of the developers runs create-image.py with flags that tell create-image.py not to create a new instance and not to create an AMI from the instance. This limits create-image.py to the deploy process only: running upload-and-run.sh.

    Since the “ENV” environment variable is also passed, the configuration step is also run (configure-server.sh).

    Since there are switches and environment variables that create-image.py needs and that none of us would like to remember, there are several wrapper scripts named deploy-ROLE-to-staging.sh. A hypothetical example follows below.
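A hypothetical example of what such a wrapper might contain (the flag names are made up for illustration – the real create-image.py switches may be named differently):

   #!/usr/bin/env bash
   # deploy-player-to-staging.sh -- pins the switches and environment that create-image.py expects
   set -euo pipefail
   export ENV=staging
   exec ./create-image.py --role player --no-create-instance --no-create-ami "$@"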

 

Deploying to production environment

It’s a responsibility of developers to make sure that the build being packaged at this step is one of the builds that were tested in the staging environment.

  1. Artifact is ready (see “The build” section above)
  2. One of the developers runs create-image.py and the script creates an AMI.
    The solution to (not) remembering all the right switches and environment variables here is documentation: a Markdown-formatted README file at the top level of the repository.
  3. One of the people authorized to deploy to production runs deploy.py

 

Rollback to previous version

This can be done in one of the following ways:

 

New deploy

Run the deploy.py script, giving an older version as an argument.

 

Manual quick fix

When old servers are removed from load balancing, they are not immediately terminated. Their termination is manual, after the “no rollback” decision. The servers that were rotated out of load balancing are tagged with the “env” tag value “removed-from:prod”.

  1. Change the “env” tag of the new servers to “removed-from:prod” or anything else that is not “prod”
  2. Change the “env” tag on the old servers to “prod”
  3. Run the load balancer configuration script. The arguments for this script are the environment name and the role of the server. The script updates all the relevant load balancers that point to servers with the given role. (A rough CLI sketch of this flow follows below.)
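A rough command-line sketch of this flow (the real scripts drive the AWS API from Python; instance ids and the load balancer script name are placeholders):

   # 1. take the new servers out of the "prod" pool
   aws ec2 create-tags --resources i-0new1111111111111 --tags Key=env,Value=removed-from:prod
   # 2. mark the old servers as "prod" again
   aws ec2 create-tags --resources i-0old2222222222222 --tags Key=env,Value=prod
   # 3. re-run the load balancer configuration script for the role
   ./configure-load-balancers.py prod player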

Rollback script

We never finished it, as rollbacks were a very rare occurrence and the other two methods work fine.

Instances tagging and naming

Instance naming: role/build-number

For instances with several artifacts that have build numbers: role/B1-B2-B3-…, where B1 is the primary artifact’s build number and the others follow in order of decreasing importance.

The status tag is updated during script runs, so when deploying to production it can, for example, look like: “Configuring”, “Self test attempt 3/18”, “Self tests failed” (a rare occasion!) or “Configuring load balancing”.

 

Summary

This post described one of the possible solutions. I think the solution described above is one of the best possible solutions for Utab. This does not mean it’s a good solution for any other particular company. Use your best judgement when adopting a similar solution or any parts of it.

 

You should always assume that a solution you are looking at is not a good fit for your problem or situation, and then go on and prove that it is, considering possible alternatives. Such an approach helps avoid confirmation bias.



git submodules vs. google’s repo tool

Category : Tools


I was recently asked by a customer to outline the pros and cons of using git submodules vs. Google’s repo tool to manage multi-repository integrations in git.

There are a lot of articles on the internet bashing each of the tools, but in our opinion most of that comes from misunderstanding the tools’ design or from trying to apply them in an inappropriate context.

This post summarizes the general rules of thumb we at Otomato follow when choosing a solution for this admittedly non-trivial situation.

First of all – whenever possible – we recommend integrating your components at the binary package level rather than compiling everything from source each time, i.e. packaging components as jars, npms, eggs, rpms or Docker images, uploading them to a binary repository and pulling them in as versioned dependencies during the build.
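In practice this simply means consuming each component as a pinned, versioned package during the build, for example (package names and versions are illustrative):

   npm install @example/player-ui@1.4.2 --save-exact     # NodeJS component as an npm package
   pip install example-auth-lib==2.0.1                   # Python component as a versioned package
   docker pull registry.example.com/search-service:1.7.3 # service as a versioned Docker image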

Still – sometimes this is not an optimal solution, especially if you do a lot of feature branch development (which in itself is an anti-pattern in classical Continuous Delivery approach – see here for example).

For these cases we stick to the following guidelines.

Git Submodules :

Pros:

  1. An integrated solution, part of git since v1.5
  2. Deterministic relationship definition (parent project always points to a specific commit in submodule)
  3. Integration points are recorded in parent repo.
  4. Easy to recreate historical configurations.
  5. Total separation of lifecycles between the parent and the submodules.
  6. Supported by the Jenkins git plugin.

Cons:

  1. Management overhead (separate clones are needed to introduce changes in submodules)
  2. Developers get confused if they don’t understand the inner mechanics
  3. Additional commands are needed (‘clone --recursive’ and ‘submodule update’ – see the example below)
  4. External tool support is not perfect (Bitbucket, SourceTree, IDE plugins)
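For reference, the extra commands mentioned above look like this in a typical submodule workflow (repository URL and submodule path are placeholders):

   git clone --recursive git@example.com:org/parent.git   # clone the parent together with its submodules
   git submodule update --init --recursive                # after a plain clone or a checkout that moved submodule pointers
   # bump a submodule to a new commit and record the new integration point in the parent repo:
   cd components/libfoo && git checkout v2.3.1 && cd -
   git add components/libfoo
   git commit -m "Update libfoo submodule to v2.3.1"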

—————————————————————-

Google repo:

Pros:

  1. Tracking synchronized development effort is easier.
  2. Gerrit integration (?)
  3. A separate Jenkins plugin.

Cons:

  1. An external, obscure mechanism
  2. Requires an extra repository for management (see the example below)
  3. Non-deterministic relationship definition (each repo version can be defined as a floating head)
  4. Hard to reconstruct old versions.
  5. No support in major repository managers (Bitbucket, GitLab) or GUI tools.
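For comparison, a typical repo workflow revolves around that extra manifest repository (the URL below is a placeholder):

   repo init -u https://example.com/git/manifests.git -b master   # fetch the manifest repository
   repo sync                                                      # clone/update all repositories listed in the manifest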

====================================

 

In general: whenever we want to integrate separate, decoupled components with distinct lifecycles, we recommend submodules over repo, but their adoption must come with proper education regarding the special workflow they require. In the long run it pays off, as integration points can be managed in a deterministic manner, and with knowledge comes certainty in the tool.

If you find your components are too tightly coupled, or you’re in need of continuous, intensive development occurring concurrently in multiple repos, you should probably use git subtrees – or just spare yourself the headache and drop everything into one big monorepo. (This depends, of course, on how big your codebase is.)

To read more about git subtree – see here.

The important thing to understand is that software integration is never totally painless and there is no perfect cure for the pain. Choose the solution that makes your life easier and assume the responsibility of learning the accompanying workflow. As they say: “it’s not the tool – it’s how you use it.”

I’ll be happy to hear what you think, as this is a controversial issue with many different opinions flying around on the internet.

Keep delivering!

 

And – if you’re looking for some git consulting – drop us a note and we’ll be happy to help.



Jenkins 2.0: With Jenkins Creator – Kohsuke Kawaguchi (KK)

Category : Tools

Hi all!
We really hope you’ve already registered for the Jenkins User Conference Israel, which was SOLD OUT yesterday.
But even if you haven’t – there’s still a chance to hang out with Kohsuke Kawaguchi – the father of Jenkins himself – and other Jenkins fans at a meetup the good people at JFrog are organising the day after the conference.
Here is the link: http://meetu.ps/e/BHkfn/jvVGM/d
Right now there are still 22 slots available.
Go grab yours.

Keep delivering!



Discount tickets to Jenkins User Conference Tel Aviv

Category : Tools

We are happy to announce that @antweiss will be speaking about the future of software delivery at JUC Israel 2016

We have a couple of 100 NIS discount tickets to the conference, and ticket sales close tomorrow!

The first two folks to comment on this post will get the discount code.

Type them comments!!!



Openstack CI infrastructure Overview

Openstack is one of the largest OSS projects today with hundreds of commits flowing in daily. This high rate of change requires an advanced CI infrastructure. The purpose of the talk is to provide an overview of this infrastructure, explaining the role of each tool and the pipelines along which changes have to travel before they find their way into the approved Openstack codebase.
Talk delivered by Anton Weiss at Openstack Day Israel 2016 : http://www.openstack-israel.org/#!agenda/cjg9



Infrastructure As Code Revisited


One can not talk about modern software delivery without mentioning Infrastructure As Code (IAC). It’s one of the cornerstones of DevOps. It turns ops into part-time coders and devs into part-time ops. IAC is undoubtedly a powerful concept – it has enabled the shift to giant-scale data centers and clouds and has made a lot of lives easier. Numerous tools (generally referred to as DevOps tools) have appeared in the last decade to allow codified infrastructures. And even tools that originally relied on a user-friendly GUI (and probably owe much of their success to that GUI) are now putting more emphasis on switching to codified flows (I am talking about Jenkins 2.0, of course, with its enhanced support for delivery pipeline-as-code).

IAC is easy to explain and has its clear benefits:

  • It allows automation of manual tasks (and thus cost reduction)
  • Brings speed of execution
  • Allows version control for infrastructure configuration
  • Reduces human error
  • Brings devs and ops closer together by giving them a common language

 

But why am I writing about this now? What made me revisit this already quite widely known and accepted pattern? (Even many enterprise organizations are now ready to codify their infrastructure.)

The reason is that I recently got acquainted with an interesting man who has a mind-stimulating agenda. Peter Muryshkin of SIGSPL.org has started a somewhat controversial discussion regarding the future of devops in the light of overall business digitalisation. One thing he rightfully notices is that software engineering has been learning a lot from industrial engineering – automating production lines, copying Toyotism and the theory of constraints, containerising goods and services, etc. The observation isn’t new – to quote Fred Brooks, as quoted by Peter:

“Techniques proven and routine in other engineering disciplines are considered radical innovations in software engineering.”

This is certainly true for labour automation, which existed long before IAC brought its benefits to software delivery. It’s also true for monitoring and control systems, which have been used in factories since the dawn of the 20th century and for which computers started being used in the 1960s.

But the progress of software delivery disciplines wasn’t incremental and linear. The cloud and virtualization virtually exploded on us. We didn’t have the time to continue slowly adapting the known engineering patterns when the number of our servers suddenly rocketed from dozens to thousands.

In a way – that’s what brought IAC on. There were (and still are) numerous non-IAC visual infrastructure automation tools in the industry. But their vendors couldn’t quite predict the needed scale and speed of operation caused by the black hole of data gravity. So the quick and smart solution of infrastructure-as-code was born.

And that brings us to what I’ve been thinking about quite a lot recently – the missing visualization. Visibility and transparency (or measurement and sharing) are written all over the DevOps banners. The classic view of IAC actually insists that “Tools that utilize IaC bring visibility to the state and configuration of servers and ultimately provide the visibility to users within the enterprise”. In theory, that is correct. Everybody has access to the configuration files, developers can use their existing SCM skills to make some sense of system changes over time… But that’s only in theory.

In practice, the amount of IAC code grows in volume and complexity over time. As with any programming project, ugly hacks get introduced and logic bugs get buried under the pile of object and component interactions. (And just think of the power of a bug that’s been replicated to a thousand servers.) Pretty soon only the people who maintain the infra code are able to really understand why and what configuration gets applied to which machine.

In the past years I’ve talked to enough desperate engineers trying to decipher Puppet manifests or Chef cookbooks written by their ops colleagues. Which makes me ask whether IAC sometimes hinders devops instead of enabling it…

Even the YAML-based configurations like those provided by Ansible or SaltStack become very hard to read and analyze beyond simple models.

As is always the problem with code – it’s written by humans for machines, not for other humans.

But on the other hand, machines are becoming ever better at visualizing code so that humans can understand it. So is that happening in the IAC realm?

In my overview of Weapons of Mass Configuration I specifically looked at the GUI options for each of the 4 ninja turtles of IAC and sadly found out that not even one of them got any serious praise from users for its visualization features. Moreover, the GUI solutions were disregarded as “just something the OSS vendors are trying to make a buck from”.

 

I certainly see this as a sign of infrastructure automation still being in its artisanal state: made by and for the skilled craft workers who prefer to write infra code by hand. But exactly as the artisans had to make way for the factories and labour automation of the industrial revolution, a change is imminent in the current state of IAC. It’s just that usable visualization is still lacking, the tools still require too many special skills, and the artisans of IAC want to stay in control.

 

Don’t get me wrong – I’m not saying our infra shouldn’t be codified. Creating repeatable, automated infrastructure has been my bread and butter for quite some time, and tools like Puppet and Ansible have made this a much easier and cleaner task. I just feel we still have a long way to go. (Immutable infrastructure and containerisation, while being a great pattern with benefits of its own, also relies heavily on manual codification of both the image definitions and the management layers.)

Infrastructure management and automation is still too much of an issue and still requires too much specialized knowledge to be handled effectively with the existing tools. Ansible is a step in the direction of simplicity, but it’s a baby step. Composing infrastructure shouldn’t be more complicated than assembling an Ikea bookshelf – and for that, new, simpler, ergonomic UIs need to be created.

Large-scale problems need industrial level solutions. Let’s just wait and see which software vendor provides such a solution first – one which will make even the artisans say ‘Wow!’ and let go of their chisel.

And with that said – I’ll go back to play with my newly installed Jenkins 2.0 instance for some artisanal pipeline-as-code goodness. :)



Microservice Development Toolkits part 1 – Otto


Otomato is happy to announce we’re collaborating with Codefresh on their great continuous delivery solution for Docker containers. (If you’re using Docker in development or production, or are only thinking of doing so – go check out http://codefresh.io.) Our collaboration is mostly focused on industry analysis and Docker development ecosystem research, with the goal of identifying pain points and providing effective solutions. With quite a bit of evangelism on top. The linked post is the first fruit of that.

Microservice Development Workflow with Otto

 



Have you heard of Wajig?

We love productivity tools of all kinds and believe that software should be making our lives easier. For some reason, many software products we have to work with don’t really feel like their creators agree. They may have great architecture, reliability and functionality, but it looks like usability was tacked on as an afterthought.
One example of this is apt – the Advanced Packaging Tool used on Debian/Ubuntu Linux. While a great tool in itself, for some reason it provides a number of different user interfaces for different purposes.

To install/remove/upgrade/download a package there's the apt-get tool:

apt-get is a simple command line interface for downloading and
installing packages. The most frequently used commands are update
and install.

Commands:
   update - Retrieve new lists of packages
   upgrade - Perform an upgrade
   install - Install new packages (pkg is libc6 not libc6.deb)
   remove - Remove packages
   autoremove - Remove automatically all unused packages
   purge - Remove packages and config files
   source - Download source archives
   build-dep - Configure build-dependencies for source packages
   dist-upgrade - Distribution upgrade, see apt-get(8)
   dselect-upgrade - Follow dselect selections
   clean - Erase downloaded archive files
   autoclean - Erase old downloaded archive files
   check - Verify that there are no broken dependencies
   changelog - Download and display the changelog for the given package
   download - Download the binary package into the current directory

 

But then if you want to search or query for packages you have to remember that a different utility is used – apt-cache:

apt-cache is a low-level tool used to query information
from APT's binary cache files

Commands:
   gencaches - Build both the package and source cache
   showpkg - Show some general information for a single package
   showsrc - Show source records
   stats - Show some basic statistics
   dump - Show the entire file in a terse form
   dumpavail - Print an available file to stdout
   unmet - Show unmet dependencies
   search - Search the package list for a regex pattern
   show - Show a readable record for the package
   depends - Show raw dependency information for a package
   rdepends - Show reverse dependency information for a package
   pkgnames - List the names of all packages in the system
   dotty - Generate package graphs for GraphViz
   xvcg - Generate package graphs for xvcg
   policy - Show policy settings


And what if we want to list all the files installed by a specific package? Well, we’re supposed to remember that there’s a special utility for that – apt-file. And it’s not installed by default…

It figures we weren’t the only ones puzzled by this multitude of interfaces. Wajig targets exactly that – providing a unified interface to all package-related functionality on Debian/Ubuntu. Or, as stated on the site:
“Wajig is a simplified and more unified command-line interface for package management. It adds a more intuitive quality to the user interface. Wajig commands are entered as the first argument to wajig. For example: “wajig install gnome”. Written in Python, Wajig uses traditional Debian administration and user tools including apt-get, dpkg, apt-cache, wget, and others. It is intended to unify and simplify common administrative tasks. … You will also love the fact that it logs what you do so you have a trail of bread crumbs to back track with if you install something that breaks things.”

Another nice thing – wajig knows when root privileges are needed and takes care of that, so we don’t need to rerun a command each time we forget to prepend it with sudo.
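A few typical mappings look something like this (to the best of our knowledge – run ‘wajig commands’ on your system for the authoritative list):

   wajig update            # instead of: sudo apt-get update
   wajig install nginx     # instead of: sudo apt-get install nginx (wajig handles sudo itself)
   wajig search nginx      # instead of: apt-cache search nginx
   wajig listfiles nginx   # instead of: dpkg -L nginx / apt-file list nginx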

We still need to play with wajig a bit more to decide whether it delivers on all its promises, but it’s definitely going in the right direction.

Wishing you all user-friendly software and a good week.