Red Hat Summit 2018: Day 3

Opening the keynote at 8:00 instead of 8:30 was an interesting choice for the final day. Who does that? And worse, who stops serving breakfast early? Inhumane, I tells you.
Keynote
I came to the keynote at, uh, 8:30 so I missed the boilerplate, but still in time for the game demo.
Game Demo
The main point of this is that RH will multi-cloud all the things.
- RH Container Native Storage: Hosted geo-replicated Gluster. It's cool, but if you know Gluster you know how it works; the containerisation and management are pretty slick. I remain to be convinced the actual deployment will be as slick as the demo.
- AI/TensorFlow with data in the CNS.
- RH SSO: Lets you front multiple identity providers across your cloud.
Hot failover of logic and data across clouds. Essentially RH are pitching that with the right bits of their stack (or, more accurately, their packaging of the bits that make up the stack) you can be independent of particular cloud providers, using them as a spot market and moving data and compute from one cloud to another when driven by cost or availability. It's a pretty compelling message if you want cloud-native type applications and don't want your cloud provider to have the same hold over you that, say, IBM have over their z/OS customers.
This was impressive, and is the main thrust of a lot of the Red Hat messaging this week.
Introducing AMQ Streams: Data streaming with Apache Kafka
David Ingham, Paolo Patierno
AMQ includes a bundle of use cases: broker (Apache ActiveMQ), interconnect/routing (Apache Qpid Dispatch Router), and now streaming (Apache Kafka); there's also the Online version, which is an OpenShift-based managed service.
What is Kafka?
- Pub/Sub messaging?
- Streaming data platform?
- Distributed, fault-tolerant commit log?
Concepts:
- Messages are sent to and received from a topic.
- Topics are partitioned into shards.
- Work is performed on the partitions; the topic itself is a virtual construct.
- Messages are written into only one partition.
- The partition is chosen based on the message key (see the producer sketch below).
- Ordering is fixed within a partition.
- Retention:
  - Based on size or message age.
  - Compacted based on message key.
Kafka knows nothing about consumers; topic partitions are distributed amongst the brokers, with a "leader" and one or more "followers" providing the resilience. The subscribers only interact with the lead partition. On broker failure, the follower partitions will hold an election to determine the new leader.
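To make the key-to-partition relationship concrete, here's a minimal producer sketch using the standard Kafka Java client; the bootstrap address and topic name are placeholders. Every record carrying the same key hashes to the same partition, which is where the ordering guarantee comes from.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap address -- with AMQ Streams this would be the
        // service exposed for the Kafka cluster on OpenShift.
        props.put("bootstrap.servers", "my-cluster-kafka:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // All three records share the key "order-42", so they hash to the
            // same partition and are therefore read back in this order.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("orders", "order-42", "paid"));
            producer.send(new ProducerRecord<>("orders", "order-42", "shipped"));
        }
    }
}
```

The flip side is the caveat in the next section: change the partition count and the key-to-partition mapping changes with it.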
Why Should You Use AMQ Streams?
- Horizontally scalable.
- Message ordering guarantee at partition level.
  - You need to be careful when adding partitions, since the same key can end up hashed to multiple partitions, breaking the per-key ordering.
- Message rewind/replay to the limit of expiry.
  - Application state can be reconstructed via replay.
  - Combined with compacted topics, Kafka can be used as a key-value store (see the consumer sketch after this list).
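As a sketch of the replay/key-value idea (again with a placeholder address and topic name, and a fresh consumer group so `auto.offset.reset` kicks in), rebuilding state is just a matter of reading the topic from the beginning and keeping the last value seen for each key:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StateRebuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "my-cluster-kafka:9092"); // placeholder
        props.put("group.id", "state-rebuilder");
        props.put("auto.offset.reset", "earliest"); // replay from the start
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            // Poll a few times; a real application would loop until caught up.
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Last value per key wins -- with a compacted topic this
                    // converges on exactly what the broker retains anyway.
                    state.put(record.key(), record.value());
                }
            }
        }
        System.out.println("Rebuilt state: " + state);
    }
}
```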
Kafka on OpenShift
- Based on an OSS project called Strimzi.
- Provides Docker images for running Apache Kafka and Zookeeper.
- Uses the operator model under Kubernetes.
Challenges:
- Kafka is stateful:
  - Requires a stable broker identity.
  - Discovery between brokers.
  - Durable state.
  - Post-failure recovery.
- The same is true for Zookeeper.
- StatefulSets, PersistentVolumeClaims, and Services help, but don't solve all the problems.
It's still not easy, though.
Goals:
- Native, easy deployment on OpenShift.
- Provisioning.
- Topic management.
- Remove the need for the Kafka command-line.
- Better integration with applications running on OpenShift:
  - Microservices, data streaming, event sourcing.
This uses the Operator model for Kubernetes.
- An application used to create, configure, and manage other complex applications.
- Contains specific domain/application knowledge.
- Controller operates based on input from ConfigMaps or Custom Resource Definitions:
  - User describes the desired state.
  - Controller applies this state to the application.
  - Watches the desired state and the actual state, taking appropriate actions (see the watcher sketch after this list).
- Allows per-user control over which operations are permitted, too.
- Can deploy Kafka and Kafka Connect (with S2I support for different connectors).
- A ConfigMap specifies the number of nodes, broker config, healthchecks, and Prometheus metrics.
  - Ephemeral or persistent storage.
- The ConfigMap can be used to reconfigure the cluster, including rolling upgrades.
  - Also for deprovisioning.
- Managed through the OpenShift console.
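As an aside on how the operator pattern hangs together: at its core it's a watch-and-reconcile loop over the resources that describe the desired state. The sketch below is not Strimzi's actual implementation — the label, namespace, and class name are invented — but it shows the shape of the thing using the fabric8 Kubernetes client (API as of the 3.x/4.x releases current at the time):

```java
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watcher;

public class ClusterControllerSketch {
    public static void main(String[] args) throws InterruptedException {
        KubernetesClient client = new DefaultKubernetesClient();

        // Watch ConfigMaps carrying an (invented) label that marks them as
        // the desired state for a Kafka cluster.
        client.configMaps()
              .inNamespace("kafka")
              .withLabel("demo.example.com/kind", "kafka-cluster")
              .watch(new Watcher<ConfigMap>() {
                  @Override
                  public void eventReceived(Action action, ConfigMap desired) {
                      // A real operator would compare desired vs. actual state
                      // (StatefulSets, Services, PVCs, ...) and converge them;
                      // here we just log the event.
                      System.out.println(action + ": " + desired.getMetadata().getName());
                  }

                  @Override
                  public void onClose(KubernetesClientException cause) {
                      System.out.println("Watch closed: " + cause);
                  }
              });

        // Keep the process alive so the watch keeps running.
        Thread.currentThread().join();
    }
}
```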
There is also a Topic Controller managed via the ConfigMap operator mechanism:
- A ConfigMap describes the topic, including replicas, cleanup policy, partitions, and so on (see the sketch below).
- OpenShift will create and manage the topic.
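For flavour, creating one of those topic ConfigMaps programmatically might look something like this. The data keys and label mirror the properties mentioned in the talk (name, partitions, replicas, cleanup policy) rather than the exact schema the topic controller expects, so treat them as illustrative:

```java
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class CreateTopicConfigMap {
    public static void main(String[] args) {
        ConfigMap topic = new ConfigMapBuilder()
            .withNewMetadata()
                .withName("orders-topic")
                .addToLabels("demo.example.com/kind", "topic") // invented label
            .endMetadata()
            // Illustrative data keys describing the desired topic.
            .addToData("name", "orders")
            .addToData("partitions", "12")
            .addToData("replicas", "3")
            .addToData("config", "{\"cleanup.policy\": \"compact\"}")
            .build();

        try (KubernetesClient client = new DefaultKubernetesClient()) {
            client.configMaps().inNamespace("kafka").create(topic);
        }
    }
}
```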
I've got to say that the coolest thing for me from this session wasn't Kafka (although I'm totally down with an event-based view of the world), but the story around Operators: being able to as-a-service anything you can deploy via Kube is a pretty Big Thing, and probably the next big (potential) game changer for kube-managed infrastructure. It's one thing to be able to, say, use kube to auto-deploy and auto-scale a DB cluster, but you were still falling back on having to manage all your day two ops (like backups, for example) through your traditional mechanisms, with ad-hoc integration/automation.
Being able to wrap all that logic into a kube operator gives you (yet another) shift in the container story: now you're not just taking the pain out of deployment, scaling, and reclaiming resources, you're taking the pain out of your day 2 ops: when you autoprovision a DB cluster, you can autoprovision your snapshotting and backups. More importantly, if you're a vendor or packager you can build that logic into what you ship, so you can provide (e.g.) something equivalent to the RDS experience anywhere.
OpenShift in a multi-datacentre environment at La Poste
Guilhem Vianes, André Enquin, Jafar Chraibi
A quarter million employees, 17,000 retail outlets, 44 countries, and 24 billion euros in revenue.
- National delivery of mail.
- Newspaper delivery.
- Access to banking services.
Five divisions:
- Mail & parcels.
- International delivery.
- Post Offices.
- Banking.
- Services & Technology.
- Reference partner for e-commerce.
- 328 applications.
- 130 projects.
- 4 data centres.
- 1,000 servers.
- Spending half a billion euros between now and 2020.
- IaaS: VMware; PaaS: OpenShift.
- Started automating in 2015 with vSuite; in October started building OpenShift on VMware.
- Prefer to build out-of-the-box with minimal customisation.
ITaaS task force: an IT department inside the IT department:
* Replatform legacy applications, e.g. LAMP and Java.
* Deploy new applications.
OpenShift CI/CD: GitLab, Jenkins, Selenium, SonarQube, Checkmarx, and HP Quality Center.
Separate build and run clusters. Security reporting is done via the OC client, which collects information about the images and shows the level of compliance.
Deployment will be blocked if there is no automated testing.
Two new datacentres: they must be isolated except for a low-latency network (Cisco ACI). Total of 3 DCs and a public cloud provider; the old datacentre is two rooms with common replication. OpenShift is in all sites.
Ansible and Tower are used to configure and deploy the IaaS services. Application deployment became a challenge because they could no longer rely on the shared data storage of the original datacentre. Instead, Jenkins is used to drive which cluster(s) an application is delivered to; this does not address the problem of routing to the correct datacentre or persistent storage.
Principles:
- Applications are responsible for cross-DC resilience.
- No infrastructure service/SLA to provide cross-DC replication.
- Applications can use a distributed shared cache.
- The application is deployed on 2 datacentres with Jenkins, using affinity placement labels to target the right DC (see the sketch after this list).
- Applications have affinity for services in the same DC for better performance.
- APIs are exposed via a per-DC gateway.
- An external LB balances cross-site traffic in a global load balancer pattern.
- Quotas and limits based on high-level sizing (S, M, L, XL) on the projects.
- The automation requirement means that teams can update trivially: when Drupal needed to be patched, for example, they were able to do so in a few hours for all ten production instances.
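To illustrate the affinity-placement principle: a deployment can be pinned to nodes labelled for a particular datacentre with a node selector, and the pipeline applies a differently-labelled variant to each target DC. This is a generic sketch, not La Poste's actual configuration — the node label, names, and image are invented — written against the fabric8 Kubernetes client:

```java
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.api.model.apps.DeploymentBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class DatacentrePinnedDeployment {
    public static void main(String[] args) {
        // "datacentre" is an invented node label; the idea is that the CI job
        // renders one of these per target DC and applies it to that cluster.
        Deployment app = new DeploymentBuilder()
            .withNewMetadata()
                .withName("customer-portal")
            .endMetadata()
            .withNewSpec()
                .withReplicas(2)
                .withNewSelector()
                    .addToMatchLabels("app", "customer-portal")
                .endSelector()
                .withNewTemplate()
                    .withNewMetadata()
                        .addToLabels("app", "customer-portal")
                    .endMetadata()
                    .withNewSpec()
                        // Affinity placement: only schedule onto nodes in DC1.
                        .addToNodeSelector("datacentre", "dc1")
                        .addNewContainer()
                            .withName("portal")
                            .withImage("registry.example.com/portal:1.0")
                        .endContainer()
                    .endSpec()
                .endTemplate()
            .endSpec()
            .build();

        try (KubernetesClient client = new DefaultKubernetesClient()) {
            client.apps().deployments().inNamespace("portal").create(app);
        }
    }
}
```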