Red Hat Summit 2016 Day 2

2016-07-06 · 3728 words · 18 minute read

Conferences · Technology

containers · docker · devops · selinux · capabilities · openshift · jboss · tpm · security · cloud

A bunch of attendees opened the day with a 5K run in the morning: a nice route cutting down to the waterfront, past the baseball stadium, under the bridge to Oakland, and back up Third Street. The misty morning gave nice views of the bridge as we ran, and I may have been the third person home, which arguably speaks to the general fitness level of the people at the conference rather than my running prowess.

Morning keynotes

Paul Cormier - President of Technologies and Products

Unfortunately, due to a tag fail I had to nip back to the hotel and come into this one halfway through. Focused around the changes in infrastructure (cloud and hybrid cloud) and application stacks, with a view to the lifecycle management.

This is much less of a blue-sky talk than the opening keynotes; heavily on message for OpenShift and OpenStack providing hybrid capabilities; the idea that the future is hybrid models rather than pure cloud.

Another strong message: collaborative culture is critical, both within and between organisations; Paul argues that free software communities are experts in culture and therefore have an edge in this area.

(I would suggest that the way closed source companies consistently have better numbers on, e.g., women working in development indicates that the free software world is not as flash at collaboration and good culture as it would like to think.)

“Companies that do not embrace this model will be left behind. They will not be able to keep up.”

Given that I missed the opening part of this keynote I may have missed something that made it feel like it was a wonderful thing, but the half I did see was a bit blah.

Burr Sutter - Director of Developer Experience

Burr kicks off by describing a traditional lifecycle: set up of your workstation, request a VM by ticket, wait, repeat for each migration. Then on the ops side there’s a queue of tickets from developers, patching, production problems… Neither side particularly enjoys working with the other, it’s slow and painful.

So Burr wants to show us how to get to production faster using OpenShift as container management tooling, hooked into a Jenkins-driven CI/CD pipeline. His live demo will bring together a mobile application built with a number of components that look like the sort you find in a typical modern app stack: VertX event bus, EAP 7, BPM suite, node.js, and Postgres DBs (the last of these as stateful containers).

The new JBoss Developer (and Eclipse it’s based on) will mirror the OpenShift configured production stacks with no developer config work required - you select the stack and away it goes and pulls in your container for you. This is a really valuable change in my world - there’s still a lot of effort that goes into getting a floor of developers using the same environment setup.
Change some code.
git commit.
git push.
The pipeline is in Jenkins, and kicks off an S2I build in OpenShift. This all looks very familiar to me, but I will note that they have the prettiest Jenkins pipeline animation/dashboard ever.
The QA pipeline being shown includes both the functional and non-functional testing elements (such as performance and security tests).
Eventually a mobile web game is extruded and we’re invited to play along. While we do so, assets are updated and pushed, changing on the client as they roll out of the blue/green pipelines.
“Your business analyst is a first-class client of your DevOps pipeline” - via the rules engine and writing or updating rules. Rule updates are checked into git and go through the same pipeline as code changes; this is less a function of OpenShift and more a function of the BPM stack they’re using. The demo shows this by changing the rules of the mobile game.
Ansible and Ansible Tower are used to spin up OpenShift nodes in the deployment to provide node scaling when you run out of pods.
The PostgreSQL pod has a “persistent volume claim”, which tells OpenShift that it needs to move the data with the pod when e.g. a migration across nodes happens, which is nice, but probably upsets some container purists.
Canary builds are demonstrated; asset changes are staged and then rolled out to steadily increasing proportions of the audience.
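To make the "steadily increasing proportions" idea concrete, here is a minimal sketch of how a canary rollout might decide who sees the new assets. This is not how the demo or OpenShift does it; the function names, the `player-N` ids, and the hash-into-100-buckets scheme are my own assumptions for illustration.

```python
import hashlib


def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the current canary slice."""
    # Hash the user id so the bucket is stable across requests and processes
    # (Python's built-in hash() is salted per process, so it is avoided here).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < rollout_percent


def pick_version(user_id: str, rollout_percent: int) -> str:
    """Route a user to the new ("canary") or current ("stable") assets."""
    return "canary" if in_canary(user_id, rollout_percent) else "stable"


if __name__ == "__main__":
    # Hypothetical audience; widen the slice step by step.
    users = [f"player-{n}" for n in range(1000)]
    for percent in (0, 10, 50, 100):
        served = sum(pick_version(u, percent) == "canary" for u in users)
        print(f"{percent:3d}% rollout -> {served} of {len(users)} users on canary")
```

Because each user's bucket is fixed, anyone who got the canary at 10% still gets it at 50%, so widening the rollout never flips a user back to the old assets.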

This was a great demo/keynote, and a lot braver than you’d normally see, with banks of laptops and a cross-functional team on stage kicking off the various code and rule changes while Burr spoke. A lot more compelling than a canned video or a long talk.

Container Security - As Explained by the Three Pigs

Dan Walsh

When should I use containers vs virtual machines?

First Dan invites us to think about different isolation models in terms of housing:

Standalone homes (separate physical machines) - the most secure option.
Duplex homes (virtual machines) - more secure than containers: “anyone tells you any different, they’re a liar.” Dan backs this assertion by noting the chain of things that have to be broken to get from one VM to another on the same physical node: first the guest OS, then the virtualisation, then the hypervisor host. He notes there have been only two breakouts for KVM in 8 years, both of which can be blocked by SELinux.
Apartment building (containers). In an apartment building, the front desk is the SPOF; if you can convince the building supervisor to let you in, you can go anywhere. In containers, the kernel is the SPOF. If you can break the kernel, you can roam freely.
Hostel (the traditional Unix model of services on the same machine). There’s no separation, essentially. Pivoting is trivial.
Sleeping in a Park (disabling SELinux while hosting many services on the same machine). Dan is a fan of SELinux.

The benefits of containers are the resource management and efficiency, and so on; while they increase security compared to the traditional model, Dan asserts their security is “good enough”, not great.

Which platform should host my containers?

The House of Straw: roll your own container management solution. Do you really want to be wholly responsible for everything?
The House of Sticks: Community distros like Fedora, Ubuntu, sitting close to upstream. Good, but “not perfect”.
Bricks: RHEL is “the most secure kernel”. “Also it pays my salary. If you’re running CentOS you’re taking money out of my pocket.”

“Run the latest kernel” does get you the latest security fixes, but also the latest vulnerabilities. Dan argues a trailing kernel with backports of security fixes is the sweet spot. That is… an argument.

While I appreciate some of this is on-message for a vendor, it was probably the weakest section. I agree that rolling your own is a stupid approach, but that’s as much to do with the money and effort wasted on ego-driven software development as anything else. I didn’t feel like I got a compelling, objective argument about why I should pick RHEL over CentOS or Fedora; something like the rate of closing CVEs, or the overhead involved in the upgrade treadmill for distros that EOL every year, would be a lot more compelling.

How do I ensure container separation?

“Containers do not contain” - because kernel vulnerabilities, including local ones, allow for a takeover of the machine. That’s a very big attack surface. Dan didn’t mention it, but the culture in the kernel community of basically treating local vulnerabilities as uninteresting because you can’t stop them easily really doesn’t help here. It’s not hard to trawl through security discussions on linux-kernel and find people arguing there’s little point worrying about local vulns.

Do you care?

You should be applying good security practice to the code you run, anyway. And containers are more secure than classic shared-service hosting. So you’re already significantly better off.

Treat contained services like regular services: drop privs, and treat root in the container as if it were root on the host. Apply good standard practices.

An argument Dan didn’t make: while getting root to pivot can be really useful, especially in genuinely shared-service hosting arrangements, if you’re attacking an application to extract credit card data (for example), protecting root is completely uninteresting: once the attacker owns the application, it’s game over. I dream of attackers stupid enough to be more interested in a root pivot than in hijacking bank transfers from an app vulnerability.

Why don’t containers contain?

Not everything is namespaced yet: kernel filesystems, cgroups, selinux, and so on are not namespaced, so they are all pivot points to own the container host’s kernel.

Overview of security within containers

Read only mount points for the kernel filesystem. Processes in containers shouldn’t be writing to /proc, for example; this eliminates whole classes of vulnerability.
Capabilities - use the capabilities to take away the mount command, even from root, as well as many others (modify clock, change audit, probe modules, and so on). We get a brief history of capabilities.
CAP_NET_ADMIN and CAP_SYS_ADMIN. Dropping the former removes network administration from the container, so the container runtime sets up the network and nothing inside the container can change it. The latter is a catch-all where lazy kernel programmers put “privileged operations” in the capabilities system. You can pretty much safely remove it from most containers, and that removes mounting (amongst other things).
Namespaces: provide some security: the PID namespace and network namespace limit your ability to attack the rest of the system, sniff traffic, and so on.
cgroups: device - “should have been a namespace” - locks down which devices will show up in a container. Removing access to devices dramatically reduces the attack surface.
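Capabilities are easier to reason about once you remember they are just bits in a mask (the kernel shows a process's effective set as a hex mask in the CapEff line of /proc/<pid>/status). The sketch below is my own illustration, not something from the talk: the helper names are made up, and only a handful of capabilities are listed, using the bit numbers from <linux/capability.h>.

```python
# Capability bit numbers as defined in <linux/capability.h>.
# Only the ones mentioned in the notes above are listed here.
CAP_BITS = {
    "CAP_NET_ADMIN": 12,      # network administration
    "CAP_SYS_MODULE": 16,     # load and unload kernel modules
    "CAP_SYS_ADMIN": 21,      # the catch-all, includes mount
    "CAP_SYS_TIME": 25,       # set the system clock
    "CAP_AUDIT_CONTROL": 30,  # change audit configuration
}


def mask_of(names):
    """Build a bitmask with one bit set per named capability."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_BITS[name]
    return mask


def drop(mask, names):
    """Return the mask with the named capabilities cleared."""
    return mask & ~mask_of(names)


def held(mask):
    """List the known capabilities present in a mask, sorted by name."""
    return sorted(name for name, bit in CAP_BITS.items() if mask & (1 << bit))


if __name__ == "__main__":
    everything = mask_of(CAP_BITS)
    trimmed = drop(everything, ["CAP_SYS_ADMIN", "CAP_NET_ADMIN"])
    print(f"before: {everything:016x} {held(everything)}")
    print(f"after:  {trimmed:016x} {held(trimmed)}")
```

Once a bit is cleared, even a process running as root inside the container cannot perform the operations that bit guards, which is the whole point of dropping CAP_SYS_ADMIN.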

SELinux

“I pledge allegiance to SELinux…”
LABELS LABELS LABELS.
Type enforcement: locks container processes to reading and execing /usr, and can only write to container files. If they break out they can’t reach /home, /etc, and so on.
MCS Enforcement: Multi-category security (based on multi-level security). This lets you refine it further by saying each container and each container filesystem are MCS-labelled so that containers can’t access one another’s files.
SECCOMP: shrinks the attack surface on the kernel by eliminating syscalls, dropping 32-bit syscalls, and blocking old, weird network protocols (it is very unlikely your container needs access to AppleTalk syscalls).
User namespaces: map UIDs into container space. The container “root” is actually global UID 5000. It’s a great idea, but doesn’t work on the filesystems, just the process table. That means you need one filesystem per container. Al Viro is blocking a sane implementation from the kernel, so work is focused on ugly user-space hacks.
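The UID mapping is simple arithmetic. The kernel describes it in /proc/<pid>/uid_map as lines of three numbers: first ID inside the namespace, first ID outside, and the length of the range. Here is a small sketch of the lookup; the 65536 range length and the function names are my assumptions, chosen to match Dan's "container root is really UID 5000" example.

```python
from typing import List, Optional, Tuple

# Each entry: (first id inside the namespace, first id outside, range length)
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse text in the three-column format used by /proc/<pid>/uid_map."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Translate a UID seen inside the container to the UID the host sees."""
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # this UID has no mapping in the namespace


if __name__ == "__main__":
    # Assumed mapping: container UIDs 0..65535 map to host UIDs 5000..70535.
    uid_map = parse_uid_map("0 5000 65536\n")
    for uid in (0, 1000, 70000):
        print(f"container uid {uid} -> host uid {to_host_uid(uid_map, uid)}")
```

So a process that thinks it is root (UID 0) is an unprivileged UID 5000 as far as the host is concerned, which is exactly why files on a shared filesystem, owned by real host UIDs, don't line up.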

How do I find a house?

Linux 1999 - where did you get software?
- AltaVista, rpmfind.net, download and install.
- Grabbing random crap off the Internet and installing it.
- Or, you know, using apt-get (which was introduced in 1998). You don’t enhance the point of your presentation by painting a picture that just isn’t true; quite the contrary, it makes it hard to take other points seriously.
“Red Hat to the rescue” in 2003.
- Well, Yellow Dog actually. That’s why yum starts with a “y”.
- So see my last point above. Don’t make me think you’re prone to making shit up, or I can’t trust anything you say¹.
You can pull down random crap from the Internet now in the form of docker containers. Yay!
- Large scale studies (i.e. download docker.io and scan all the containers for known vulnerabilities) show public repos with 30% of images containing vulnerabilities.
DevOps means that developers have more responsibility for fixing security in their images.
- Which is a bit of a problem - are developers going to be focused on keeping up with CVEs and other issues that don’t relate to building stuff?
- The answer: trust, but verify. Make sure you’re using a secure base for builds that can be updated, and build scanning into your test suite.
- “RHEL certified images are the answer!” Sure, whatever you say.
- More interesting: Atomic vulnerability scan, an OpenSCAP-based tool that scans and analyses filesystem images and containers to identify vulnerabilities. Can work with active containers. Can also plug in to other scanning systems (such as Black Duck).
- Hopefully will be available in RHEL 7.3.
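"Build scanning into your test suite" can be as simple as a gate that fails the build when the scanner reports anything above a severity threshold. The sketch below assumes you have already turned your scanner's report into a list of findings; the `Finding` structure, the severity names, and the placeholder CVE ids are all hypothetical, not the output format of Atomic scan or any other real tool.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Finding:
    cve_id: str    # placeholder identifiers only in this sketch
    package: str
    severity: str  # one of SEVERITY_ORDER


def blocking_findings(findings: List[Finding], threshold: str = "high") -> List[Finding]:
    """Return the findings at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f.severity) >= limit]


def gate(findings: List[Finding], threshold: str = "high") -> bool:
    """True if the image may proceed down the pipeline."""
    return not blocking_findings(findings, threshold)


if __name__ == "__main__":
    report = [
        Finding("CVE-0000-0001", "examplelib", "low"),
        Finding("CVE-0000-0002", "examplessl", "critical"),
    ]
    for finding in blocking_findings(report):
        print(f"blocked by {finding.cve_id} in {finding.package} ({finding.severity})")
    print("image accepted" if gate(report) else "image rejected")
```

Run as a pipeline step, this makes the "verify" half of "trust, but verify" something the developers get for free on every build rather than a chore they have to remember.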

Introducing Simple Signing

GPG-based signing.
Select which authorities you trust.
Multi-party signing.
Offline verification of images for disconnected networks.
Working with CoreOS on a common signing mechanism via the OCI.
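The policy side of this (pick your authorities, require more than one of them, check everything offline) can be sketched without any real signing machinery. In the example below HMAC is a loudly labelled stand-in for GPG signatures, purely so the code runs with the standard library; the signer names, keys, and the two-signature requirement are my assumptions, and none of this is the actual Simple Signing format.

```python
import hashlib
import hmac
from typing import Dict, List, Set, Tuple


def sign(key: bytes, image_digest: str) -> str:
    """Stand-in signature: HMAC-SHA256 over the image digest (NOT GPG)."""
    return hmac.new(key, image_digest.encode("utf-8"), hashlib.sha256).hexdigest()


def trusted_signers(
    image_digest: str,
    signatures: List[Tuple[str, str]],
    trusted_keys: Dict[str, bytes],
) -> Set[str]:
    """Return the trusted authorities whose signature over the digest checks out."""
    valid = set()
    for signer, signature in signatures:
        key = trusted_keys.get(signer)  # unknown signers are simply ignored
        if key is not None and hmac.compare_digest(sign(key, image_digest), signature):
            valid.add(signer)
    return valid


def accept(image_digest, signatures, trusted_keys, required: int = 2) -> bool:
    """Multi-party rule: at least `required` distinct trusted signers."""
    return len(trusted_signers(image_digest, signatures, trusted_keys)) >= required


if __name__ == "__main__":
    # Hypothetical authorities and keys, all held locally (no network needed).
    keys = {"vendor": b"vendor-key", "internal-qa": b"qa-key"}
    digest = "sha256:" + hashlib.sha256(b"image layers").hexdigest()
    sigs = [("vendor", sign(keys["vendor"], digest)),
            ("internal-qa", sign(keys["internal-qa"], digest))]
    print("accepted" if accept(digest, sigs, keys) else "rejected")
```

Everything needed for the decision (keys, signatures, digest) lives on the verifying machine, which is what makes the disconnected-network case work.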

Contrast with Docker’s signing mechanism, which is basically a lock-in play with Docker’s services.

Who Manages Your Container Environment?

Unsurprisingly the answer at a Red Hat conference is “OpenShift.”

OK, that’s a slightly unfair summary, because rolling your own container management solution is crazy unless you’ve got a lot of money and time to burn. From then it’s a question of which management layer makes most sense.

Community Standards

Red Hat are working with OCI (the Open Container Initiative) to avoid competing standards and lock-in.

Free Stuff

We got a copy of the Three Little Pigs container colouring-in book, which is pretty sweet. You can find a copy here, and you can see its predecessor, the SELinux colouring book.

Delivering Trusted Clouds

“How Intel and Red Hat integrated solutions for secure cloud computing.”

Steve Orrin - Federal Chief Technologist, Intel
Steve Forage - Senior Director, Cloud Solutions, Red Hat

The Drive to Cloud

There are a variety of reasons given: speed, cost, quality, etc, etc.
“The 2% who say ‘don’t know’ are remarkably honest.”
The US government has had mandates for going to cloud; some of those have gone well, some not so much.
- Subscription based procurement is attractive here.
FedRAMP defines a common set of accreditations so that cloud providers can certify once and accept any government workload.

Key Challenges in Cloud Deployments

Increased data challenges - double digit growth in network traffic and data volume (40% increase year-on-year).
Data protection.
Business agility - actually taking advantage of cloud performance; if IT is no longer your bottleneck, does the rest of the business take advantage?

Key Security Challenges

Attacks on the infrastructure.
The focus of attackers is around breaking the virtualisation layers.
- In that vein, Intel and Red Hat are seeing co-tenancy attacks.
Regardless of direct attacks on the virt layer, or side channel attacks, it’s about being able to pivot from poorly-secured VMs to well-secured VMs.
- Your security is now in part a function of the weakest VMs you are co-resident with on a public cloud.
- You wouldn’t have let an employee host their poorly-secured WordPress blog in your datacentre, but now you’re hosting your stack alongside an unknown number of them.
Visibility is a challenge: cloud providers often obfuscate their runtime. You may not know where your data is, or whether the vendor has implemented the standards and patterns they say they have.
Most people have one degree or another of regulatory compliance in place: PCI-DSS, APRA, ENISA, HIPAA, NIST, and so on.
Commingled regulatory requirements add an extra layer of challenge.
One response to these challenges is continuous monitoring rather than periodic audits; after all, you’re aiming at the intent (say, no data breaches), not the mechanism (auditing or compliance).
Data use controls are going to be a huge challenge. Think about HIPAA requirements vs the desires of people using big data.
So how do you verify and trust the environment? How would I prove compliance?

Building Blocks for Trusted Cloud Architecture

TPM + TXT

Trusted Execution Technology (since 2010). Uses the TPM to verify the system boot integrity: BIOS/EFI, Kernel, etc.
Post 2010 the TPM has NVRAM that can be used to tag the device (server location, for example).
The secure root of trust in the hardware is the root of these systems.

Trusted Compute Pools

Allows TPM and TXT information (attestation) to be bubbled up the stack. You can refuse to run if the BIOS, bootloader, kernel, Xen/KVM, etc have been tampered with.
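The mechanism underneath this is the TPM's "extend" operation: a register can't be set directly, only updated to the hash of its old value concatenated with a new measurement, so the final value depends on every component and on their order. Here is a toy model of that idea. It is heavily simplified: one register instead of a bank of PCRs, SHA-256 assumed, made-up component names, and no real TPM or TXT involved.

```python
import hashlib
import hmac
from typing import Iterable


def extend(pcr: bytes, component: bytes) -> bytes:
    """TPM-style extend: new = H(old || H(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot(components: Iterable[bytes]) -> bytes:
    """Fold a boot chain (BIOS, bootloader, kernel, ...) into one register value."""
    pcr = bytes(32)  # the register starts as all zeroes
    for component in components:
        pcr = extend(pcr, component)
    return pcr


def attest(components: Iterable[bytes], known_good: bytes) -> bool:
    """Refuse to trust the node unless the chain matches the known-good value."""
    return hmac.compare_digest(measure_boot(components), known_good)


if __name__ == "__main__":
    # Hypothetical stand-ins for the real firmware and kernel images.
    good_chain = [b"bios-v1", b"bootloader-v1", b"kernel-v1", b"kvm-v1"]
    golden = measure_boot(good_chain)
    tampered = [b"bios-v1", b"bootloader-v1", b"kernel-with-rootkit", b"kvm-v1"]
    print("clean node trusted:   ", attest(good_chain, golden))
    print("tampered node trusted:", attest(tampered, golden))
```

A trusted compute pool is then just the set of nodes whose reported value matches the golden one; anything else gets no workloads.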
This can be plugged into a chain of trust for other security tools and compute platforms.

Cloud Integrity Technology

Allows trust to be used for the VM layer for, say, pooling nodes.
Trusted Geo-location arrived with version 2 - this is stashable in the NVRAM.
Hardware acceleration of encryption (AES-NI) has been around for years now. “If you aren’t using encryption pervasively, shame on you. And if someone says it’s too slow, shame on them. Call them out.”
Hardware RNG (DRNG). Adheres to NIST standards for high-speed random number generation.
Latest chips have added acceleration for asymmetric encryption (the ADOX and ADCX instructions), so you can e.g. accelerate elliptic curve encryption.
As of CIT 3.0 you can have workload integrity on OpenStack. You can encrypt a workload (e.g. an image) and configure OpenStack such that only nodes of the correct spec can install and run it, with that guarantee backed in hardware by the TPM/TXT mechanism.
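From the scheduler's point of view, "only nodes of the correct spec" boils down to a filter over what each node has attested about itself. The sketch below is purely illustrative: the `Node` fields, node names, and geo tags are invented, and it is not the CIT or OpenStack API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    attested: bool  # did the node's boot measurements match known-good values?
    geo_tag: str    # location tag read from the TPM's NVRAM


def eligible_nodes(nodes: List[Node], required_geo: str) -> List[str]:
    """Names of nodes allowed to run a workload pinned to trusted hardware in one location."""
    return [n.name for n in nodes if n.attested and n.geo_tag == required_geo]


if __name__ == "__main__":
    # Hypothetical pool of compute nodes.
    pool = [
        Node("compute-1", attested=True, geo_tag="us-east"),
        Node("compute-2", attested=False, geo_tag="us-east"),
        Node("compute-3", attested=True, geo_tag="eu-west"),
    ]
    print(eligible_nodes(pool, "us-east"))
```

The interesting part in the real thing is that both fields come from hardware the tenant can verify, rather than from a label an administrator typed in.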

What Red Hat are Delivering

RHCI (Red Hat Cloud Infrastructure) integrates the individual CIT items into Satellite, CloudForms, RHEL, RHEV, Ceph, Gluster, and OpenStack.
Hence you can get a trusted boot all the way up to OpenStack, for example.

Solutions for Trusted Clouds

These are solutions where Intel & Red Hat have been working with specific hardware vendors to be able to deliver vertically integrated stacks that take advantage of these features.

NEC Black Box Solution Stack

A cloud-in-a-box solution.
Intended as something solution providers can deploy as a solution in a box for hosting providers.

CRSA

Mostly devoted to federal government contracts, e.g. hosting for Department of Homeland Security.
Red Hat and Intel worked to provide a FedRAMP High rating that can run classified data.
Makes use of the TXT/Trusted Boot capability.
For FedRAMP High and DOD Level 6 you need around 400 controls to be in place.
Ceph for the storage, OSP Director, and RHEL 7 as the foundations of the stack.
OpenStack virtualisation.
CloudForms to manage the on-prem and public cloud.
Looking to add OpenShift by August.

Summary

Trust, but verify - that’s a recurring theme.
Verification is continuous.
If time-to-market is critical, don’t do science projects. Buy not build.

Migrating to JBoss and OpenShift

Nenad Bogojević and Fabrice Pipart - Amadeus
Diógenes Rettori - Red Hat

A real-world case study of a migration of a business critical application. IT for the travel industry - they operate their own datacentres and write their own code.

Their ecommerce application backs many airline booking systems - “chances are you booked your flight here with our back end”. Before the migration it was running on Java 7 and WebLogic 11, processing a billion searches a month in a highly available, clustered environment, with 400 developers working on it. Seven million lines of code supporting dozens of webapps, hundreds of EJBs, WAP, mobile, and 300 open source libraries (some of which are so dead you can only find the source with the Wayback Machine).

EAP on OpenShift

OpenShift and JBoss will autodiscover other cluster members deployed in the other pods.
The RHEL, EAP, OpenJDK components of the EAP image are all maintained by Red Hat, although you can build your own if you want.
Drivers & limitations of the migration project:
- Don’t rewrite functional code
- No customer impacts
- Provide tools for people.
From Windows to Linux; from WebLogic to JBoss; package in containers.
The main problems with the Windows to Linux migration were slashes, case sensitivity, and monitoring. Essentially this was pretty simple.
WebLogic to JBoss. Straightforward except for EJB differences, JNDI conventions (WebLogic vs standard), Taglib behaviour, and classloader behaviour.
- They already had an external session management mechanism, which is probably the hairiest part of a migration.
In the end, only 500 lines of code were changed.
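The slash and case-sensitivity problems are the kind of thing you can hunt for mechanically before the move. This is my own sketch of the idea, not the tooling Amadeus used; the function names and the example paths are made up.

```python
from collections import defaultdict
from pathlib import PureWindowsPath
from typing import Dict, Iterable, List


def to_posix(path: str) -> str:
    """Rewrite a Windows-style relative path with forward slashes."""
    return PureWindowsPath(path).as_posix()


def case_collisions(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group spellings that Windows treats as one file but Linux treats as several."""
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical references scraped from config files and code.
    references = [
        "templates\\Booking\\form.jsp",
        "templates\\booking\\form.jsp",  # same file on Windows, a miss on Linux
        "conf\\app.properties",
    ]
    normalised = [to_posix(ref) for ref in references]
    print(normalised)
    print(case_collisions(normalised))
```

Anything `case_collisions` reports is a reference that happened to work on Windows and will fail, or quietly pick up a different file, on Linux.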

Containers

Base images relied on the upstream and using the standard images: Apache was pulled from the Collections, and JBoss was pulled from the OpenShift image, with a couple of tweaks.
There were a few iterations of the container layout until they hit one they were happy with:
- First iteration: an OpenShift router, Apache in one pod, and JBoss in another. Everything will just work - clustering, failover, and so on.
  - The problem for Amadeus is they make heavy use of rewrites in Apache, which means the application logic is partially in Apache.
  - This doesn’t really marry up with this pattern - effectively every deploy is a deploy of both container types.
- Second iteration: Put Apache and JBoss in the same pod.
  - They treat them as a single deployment unit anyway from a code and lifecycle perspective.
  - Also, they ditched the JBoss cluster mechanism - they keep sessions in their internally developed Context Server project, which is deployed in a separate pod.
- Third iteration: Conway’s Law kicks in.
  - Divvy the deployments up based on the way the organisation works.
  - A web pod with Apache & JBoss.
  - A backend pod with JBoss providing HTTP/JSON services.
  - A pod with Context Server.
- The third (and final) iteration makes deployments easier by lining them up with business units.
Farms as OpenShift Projects, for multi-region deployments.
Deployments - blue/green deployments today; tomorrow (with OpenShift 3.3), intelligent canary and A/B testing.

Developer Experience

Production environment in seconds, not hours.
Rapid iterations.
Shiny new components and tools.

Proposed Solution

Windows/Linux/OS X laptop.
The ADK (Amadeus Dev Kit).
Browsers.
IDE: Eclipse, IntelliJ, etc.
Source control: git.
Vagrant and VirtualBox; the initial experiment was to re-run Vagrant every time, but it’s so slow and painful that was abandoned.
The new approach is a long-running VM:
- There’s a docker container provided with the build tools.
- Deployment to OpenShift on the developer machine, which lives in the long running VM.
- This is as quick as you could want.
Ghosting is used as a container on the local machine to provide stubbing.

Q&A

Were the developers delighted? How long did it take? About 50 of the developers are on the new toolset, while the team iterate on improving the toolset. “It was more or less a no-brainer for them to use it.”
Is there any measure of satisfaction between the devs and ops? Not really. The ops teams do like the fact they have a box with a contract around monitoring and otherwise don’t care about the contents of the box.

¹ Seriously. This is the worst thing at single-company conferences: people saying things that just aren’t true to fit a point. It’s a problem when talking to vendor reps generally: if you’re a pre-sales engineer, for example, telling me things that just aren’t true, I can’t trust that you’ll ever tell the truth. Just don’t do it.