LCA 2019 Day 1

2019-01-21 · 2884 words · 14 minute read

Conferences · Technology

lca · lca2019

LCA is in Christchurch this year, and the city dovetails nicely with its stereotypes: flat, open, and full of creeks, gardens, and parks.

Christchurch scores in the swag department: a hessian sack that smells nice and isn't full of leaflets I'll end up throwing away, plus a Raspberry Pi Zero complete with a pre-loaded SD card, which is both appropriate and neat. It also scores points for an acknowledgement to Ngai Tahu. Accommodation is close to the conference, and you can score a cheap week-long gym membership at the uni rec centre, which is a nice bonus.

Conference Opening

We are encouraged to obey the Pac-Man rule: if you haven't run into it before, groups standing in conversational circles should leave a gap, so there's somewhere for others to join the conversation.

(I have one of those old-person reveries where I am left gobsmacked by the fact that a conference can afford to give away what would, in my childhood, have been unimaginable compute power.)

The traditional “have you bothered showing up?” prize is announced; Chromebooks will be given to people who bother showing up to the morning sessions.

Kathy Reid

Kathy is the current Linux Australia president. She notes that it's the 20th anniversary of the conference, and thanks the successive volunteer teams who have put on what is “one of the best regarded technical conferences in the world.” She would like to encourage us to join Linux Australia - it's free!

Charity Raffle - Digital Future Aotearoa

This year's charity is Digital Future Aotearoa, and the prize is a Lego Mindstorms kit. Digital Future Aotearoa is interested in closing the digital divide in New Zealand. As an illustration: in their first year in Christchurch, there was a 70:1 difference in uptake of their coding clubs depending on whether students lived in the west or the east of the city. It turned out the difference was the number of families with Internet in the home.

Other initiatives include She Can Code and {code club}/Aotearoa; finally, there is Code Club 4 Teachers to help teachers in the classroom.

Finally, they are running workshops for children of attendees.

Chaining Some Blocks Together

Josh Deprez

A game-inspired programming language and associated web-based IDE written in Go.

Shenzhen Go: draw diagrams and write code. You can draw diagrams consisting of blocks which pass messages between one another; each node (block) is logic, and the lines show the flow of information.

Shenzhen Go programs in, pure Go out. Everything is a Go concept: each node is a goroutine, and each arrow is a channel. Producing pure Go allows for interop with other Go code.

Write code where code makes the most sense.
Run n copies by tuning a parameter.
Take advantage of pipelines without even thinking.
You can see the Go and visual representation side-by-side.
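Shenzhen Go emits Go, but the shape of the program it produces can be illustrated with a rough Python analogue. This is my own sketch, not its output: each node becomes a thread and each arrow becomes a `queue.Queue`. The node functions and the `DONE` sentinel (standing in for closing a Go channel) are hypothetical.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, playing the role of closing a Go channel


def source(out: queue.Queue, items):
    """Node that emits items onto its outgoing arrow."""
    for item in items:
        out.put(item)
    out.put(DONE)


def square(inp: queue.Queue, out: queue.Queue):
    """Node that transforms each message and passes it along."""
    while (item := inp.get()) is not DONE:
        out.put(item * item)
    out.put(DONE)


def sink(inp: queue.Queue, results: list):
    """Node that collects everything that reaches it."""
    while (item := inp.get()) is not DONE:
        results.append(item)


def run_pipeline(items):
    a, b = queue.Queue(), queue.Queue()  # one queue per arrow in the diagram
    results = []
    nodes = [  # one thread per node in the diagram
        threading.Thread(target=source, args=(a, items)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for t in nodes:
        t.start()
    for t in nodes:
        t.join()
    return results


print(run_pipeline([1, 2, 3]))  # [1, 4, 9]
```

Because every stage runs concurrently, the pipelining comes for free: the sink can be collecting early results while the source is still emitting.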

“Conference Driven Development”: features tend to be delivered in the run-up to conferences. Giving a presentation is a dopamine hit, leading to a rush of development. Another dopamine rush comes from people watching, starring, and forking the project on GitHub.

Hard ideas:

Writing a “web app” in Go with GopherJS (>2,000 lines of Go). A JS framework would have been a lot easier.
Communicating over gRPC Web. It turns out this is a challenge, too - a lot of the frameworks are little-known.
Adding generics (1,000 lines of Go). This makes it a lot easier to implement the node intercommunication, because otherwise the type incompatibility between different nodes' inputs and outputs would leak the underlying Go types into the experience.

Josh notes that while people are interested, he feels he's a long way from a tipping point where many people use and contribute to it.

Python++

Jan Groth

(Speaker note: grey text on a black background on a projector is… not ideal for readability.)

Python 3 is now more than ten years old. That's an eon in computer terms. So why is Python 2 still around? Python 2 reaches end of life this year: there will be no more bug fixes or security updates from 2020 onwards. Move on!

Tip 1: Even though most sysadmins aren't writing tremendously version-dependent code, you should use venv or pipenv to build your code anyway; it will make your life easier in the long run.

Tip 2: Follow Python's coding and naming conventions (PEP 8). They take a little up-front effort to learn, but they make life easier for yourself and others later.

Moreover, when picking filenames, choose names which are descriptive. File names are not movie spoilers.

When your code is complex, comments are important, but so is the structure of your code: if code doesn't logically belong together, extract it into separate methods.

Beautiful code is a matter of style and, in Jan's view, simplicity. If there are many ways to implement a piece of code, consider that it is probably best for your code to be Pythonic, concise, and to make use of the built-ins. Don't use a counter when you can use range; don't use a loop when you can use a list comprehension.
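To make the range and list-comprehension advice concrete, here is a small before-and-after sketch. The task (squaring the even numbers below `n`) and the function names are made up for illustration.

```python
def even_squares_manual(n):
    # The counter-and-loop version the advice steers you away from.
    result = []
    i = 0
    while i < n:
        if i % 2 == 0:
            result.append(i * i)
        i += 1
    return result


def even_squares(n):
    # range() with a step of 2 replaces both the counter and the parity test,
    # and the list comprehension replaces the explicit loop.
    return [i * i for i in range(0, n, 2)]


def total_of_even_squares(n):
    # Built-ins such as sum() avoid yet another accumulator variable.
    return sum(i * i for i in range(0, n, 2))


assert even_squares_manual(10) == even_squares(10) == [0, 4, 16, 36, 64]
print(total_of_even_squares(10))  # 120
```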

Tip 3: Comments should explain why you are doing a thing, not what you are doing. The code is the what.

Tip 4: Classes. People often don't use classes, but they can make your code easier to understand in the long run. A class is a blueprint for objects; an object encapsulates state and behaviour.

Classes make re-use easy.
Sub-classing.
Classes are the door to a whole new world.
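As a minimal illustration of state, behaviour, and sub-classing, here is a sysadmin-flavoured sketch; the `Check` and `FreeSpaceCheck` classes are hypothetical, not something from the talk.

```python
class Check:
    """Blueprint for a health check: state (name, threshold) plus behaviour (evaluate, report)."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold

    def evaluate(self, value):
        # Base behaviour: healthy while the value stays below the threshold.
        return value < self.threshold

    def report(self, value):
        status = "OK" if self.evaluate(value) else "FAIL"
        return f"{self.name}: {status} ({value})"


class FreeSpaceCheck(Check):
    """Sub-class that re-uses report() unchanged but flips the comparison."""

    def evaluate(self, value):
        # Free space is healthy while it stays above the threshold.
        return value > self.threshold


checks = [Check("load", 4.0), FreeSpaceCheck("disk_free_gb", 10)]
print(checks[0].report(2.5))  # load: OK (2.5)
print(checks[1].report(3))    # disk_free_gb: FAIL (3)
```

The sub-class only overrides the one method that differs, which is the re-use the talk was pointing at.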

Tip 5: Use the right tool for the job. An IDE really helps a lot: removing unused imports, reformatting to Python standards, and using breakpoints effectively.

This was a good talk - and perfect for an occasional Python programmer like myself.

Lunch

Being an idiot, I forgot to pack my laptop charger and had to dash off to a big box store to get a new one; since it was a little distance away, I tried out a Lime scooter. This created the entertaining-to-me vignette of Matthew Garrett explaining to a small crowd why e-scooter hire is a horrible black hole of terrible companies doing terrible software and hardware, punctuated by a black Trans-Am exercising its voice synth in the background, while in the foreground I merrily signed up for the Lime application¹ and plugged in my details and credit card.

The Lime scooter I made off with was an interesting experience. They have a much higher deck and centre of gravity than my eMicro Condor, leading them to feel quite a bit more precarious, a sensation not helped by the fairly mediocre brakes. Other than that it was a good experience blatting around Christchurch’s flat streets and roomy footpaths/cycle lanes.

Distributed Storage is Easier Now, or The Joy of Spreadsheets

Josh Simmons

“This,” Chris notes, “is the first time I have introduced a presentation by someone who is married to me.”

“This is an ode to boring and uncool technology,” notes Josh. He was going through his grandfather’s estate and found in his paperwork a note urging “Don’t discard old techniques.” This resonated with him.

So why spreadsheets? Spreadsheets are familiar to everyone. Batteries are very much included: the “standard library” covers maths, scientific functions, finance, you name it.

They’re maintainable-ish; most importantly you don’t need a programmer to maintain a spreadsheet.

They're super extensible: formulae, macros (which are Turing complete), data connectors. And they integrate with other office applications: email, databases, whatever.

My Journey with Spreadsheets

Josh is a CFO for the OSI, which means he is responsible for annual budgets, financial projections, scenario planning, and so on.

Josh could give us many examples, but the one he's going to show in the time available is one he calls The Generator: mail merge with custom attachments. Those attachments might be acceptance letters, certificates, forms, or all manner of things; it takes a spreadsheet as a data source, merges it with a set of templates, and sends the mail with attachments. This is about the equivalent of 0.6 of an FTE.
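I didn't see The Generator's code, so here is a rough Python re-imagining of the same idea rather than Josh's implementation: rows from a CSV export stand in for the spreadsheet, `string.Template` does the merge, and each row becomes an `EmailMessage` with a generated attachment. The column names, templates, and addresses are all hypothetical, and actually sending (for example via `smtplib`) is left out.

```python
import csv
import io
from email.message import EmailMessage
from string import Template

# Hypothetical data source: in the real Generator this is a spreadsheet.
ROWS = """name,email,track
Ada,ada@example.org,Security
Linus,linus@example.org,Kernel
"""

# Hypothetical templates for the message body and one attachment.
BODY = Template("Hi $name,\n\nYour talk has been accepted. Letter attached.\n")
LETTER = Template("Dear $name,\nWe are delighted to accept your talk in the $track track.\n")


def generate(rows_csv, sender="events@example.org"):
    """Merge each spreadsheet row into the templates and build one email per row."""
    messages = []
    for row in csv.DictReader(io.StringIO(rows_csv)):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = row["email"]
        msg["Subject"] = f"Acceptance: {row['name']}"
        msg.set_content(BODY.substitute(row))
        # A str attachment becomes a text/plain part with an attachment disposition.
        msg.add_attachment(
            LETTER.substitute(row),
            filename=f"acceptance-{row['name'].lower()}.txt",
        )
        messages.append(msg)
    return messages


for m in generate(ROWS):
    print(m["To"], "|", m["Subject"])
```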

The Generator also has dialogue-driven workflow, including presets and previews. Is this going too far?

Maybe, Josh allowed, when he found himself writing a function called leftJoinSheets(), that was going too far. But other people have gone further.
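For anyone wondering what a `leftJoinSheets()` would even do, here is a guess in Python rather than spreadsheet macros, not Josh's code: treat each sheet as a list of row dicts, keep every row from the left sheet, and fill blank cells where the right sheet has no match. The function name and sample data are hypothetical.

```python
def left_join_sheets(left, right, key):
    """Left-join two 'sheets' (lists of row dicts) on a shared key column."""
    # Index the right-hand sheet by key, keeping the first match the way
    # an exact-match VLOOKUP would.
    index = {}
    for row in right:
        index.setdefault(row[key], row)

    # Every column that appears on the right, minus the key, in first-seen order.
    right_columns = list(dict.fromkeys(col for row in right for col in row if col != key))

    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)  # every left-hand row survives, matched or not
        for col in right_columns:
            # Left-hand values win on name clashes; unmatched rows get blank cells.
            merged.setdefault(col, match.get(col, ""))
        joined.append(merged)
    return joined


members = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
dues = [{"id": 1, "paid": "yes"}]
for row in left_join_sheets(members, dues, "id"):
    print(row)
# {'id': 1, 'name': 'Ada', 'paid': 'yes'}
# {'id': 2, 'name': 'Linus', 'paid': ''}
```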

But wait, there’s more! 2048 in spreadsheets! Monopoly in spreadsheets! A 3D game in spreadsheets!

But back to the serious point: when thinking about choosing the right solution to a problem, bear in mind that we can choose well-known technologies. And bear in mind that a spreadsheet is great if you need people who aren't programmers to be able to maintain it.

How much do you trust that package?

Benno Rice

“Everyone knows npm is bad! Why is that!?”

It installs tonnes of packages!
Sometimes they disappear!
Sometimes they have malware!
It’s a central point of failure!

Because these things really only happen to npm, right?! This is a supply chain problem, a term pulled from traditional manufacturing. Supply chain attacks have gained prominence in the last few years, most notably as a result of the Bloomberg story claiming that Apple and Supermicro had been compromised.

(Pity that story is almost certainly bullshit.)

Sabotage isn’t the only possible problem; lack of maintenance, defects, and unavailability can cause you serious problems, too. And while the supply chain used to refer solely to hardware, the same concepts map onto software: do you understand the provenance of all your third party software? Do you think about the fact your compiler or language runtime are third-party components? Or that third parties have third parties?

And that’s before you come to the malicious stuff - hackers and other adversaries who may be trying to break into your code, or break into someone else’s code via your code. You don’t even need to break into the software or repos - you could simply rely on confusion around “color” and “colour” in the name of packages.
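One cheap defence against the “color”/“colour” trick is to flag dependency names that are suspiciously close to, but not identical to, names you already trust. This is a minimal sketch using Python's `difflib`, not a tool from the talk; the trusted list and the 0.8 similarity cutoff are assumptions you would tune.

```python
import difflib

# Hypothetical list of package names your team has already vetted.
TRUSTED = ["colors", "lodash", "request", "express"]


def suspicious_names(dependencies, trusted=TRUSTED, cutoff=0.8):
    """Return {dependency: trusted_lookalike} for names that nearly match a trusted one."""
    flagged = {}
    for dep in dependencies:
        if dep in trusted:
            continue  # exact match with a vetted name: nothing to flag
        close = difflib.get_close_matches(dep, trusted, n=1, cutoff=cutoff)
        if close:
            flagged[dep] = close[0]
    return flagged


print(suspicious_names(["colours", "lodash", "leftpad"]))  # {'colours': 'colors'}
```

A near-miss isn't proof of malice, so the output is a list for a human to review rather than something to block on automatically.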

So why is npm mentioned so often? JavaScript relies heavily on composition and libraries, so the dependency tree is larger, yielding a larger supply chain. Electron is another reason: it's incredibly popular. And, lastly and perhaps most importantly, there's contempt culture: a bunch of people like excuses to shit on JavaScript.

So how do we stop this?

Support the maintainers. Recognition, like exposure, does not pay the bills.
Process: Some people (nerds) hate process. But you should have some.
- When you select a thing, have some process around understanding what dependencies you're pulling in and why.
- Have a process to stay current.
- Have a process to review and audit third-party code.

Benno gives an example of flagging included libraries as known-good (checked and audited), and flagging the rest as unknown for further scrutiny by the security team.
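That known-good/unknown split can be approximated by comparing your resolved dependencies against an audited allowlist of exact versions. A minimal sketch, assuming a hypothetical allowlist of `(package, version)` pairs; in practice the inputs would come from your lockfile and the security team's records.

```python
# Hypothetical record of (package, version) pairs the security team has audited.
AUDITED = {("left-pad", "1.3.0"), ("express", "4.16.4")}


def triage(dependencies, audited=AUDITED):
    """Split (name, version) pairs into known-good and needs-review lists."""
    known_good, unknown = [], []
    for name, version in dependencies:
        # Match on exact versions: an audit of 1.3.0 says nothing about 1.3.1.
        if (name, version) in audited:
            known_good.append((name, version))
        else:
            unknown.append((name, version))
    return known_good, unknown


good, review = triage([("express", "4.16.4"), ("left-pad", "1.3.1"), ("some-new-lib", "0.1.0")])
print("known-good:", good)
print("send to security team:", review)
```

Pinning on exact versions is the conservative choice: every upgrade lands back in the review pile, which doubles as the "process to stay current".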

So it’s not just npm. Everyone has this problem, and if they haven’t yet, they will. When you see these problems, you shouldn’t point and laugh; you should understand what went wrong and whether it could go wrong for you.

Load Balancing Demystified

Murali Suriar and Laura Nolan

Murali and Laura created this talk to fill a gap around “doing load balancing” up and down the stack with modern applications. This is a “here are all the tools, here is how they interact, and here are the things you should think about” talk.

LB failures are often dropped requests.
It’s always in your serving path.
Huge impact on the performance and resilience of your application.

Let's start with Superb Owls. You'll probably have a DNS record that points to an IP address, with edge routers that advertise your network (containing that address) to the Internet via BGP. This doesn't give a lot of availability; you've only got one server.

This gives some availability (DNS and your network are separate, for example).

A simple improvement in performance and availability could be adding another server; this is a step function up. In this setup you then update the DNS record to round-robin the two addresses. Depending on whether the client obeys the spec (which is not a given in the real world!), you'll have about half the load on each server. Unfortunately, when one server fails, half your clients will have timeouts.
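A toy model of why round-robin DNS halves your failures rather than eliminating them: a client that rotates through the A records, as the spec intends, sends roughly half its requests to a dead server when one of two is down. The addresses are from the RFC 5737 documentation ranges, and the model deliberately ignores caching, retries, and misbehaving resolvers.

```python
from itertools import cycle

# Two A records for the same name (RFC 5737 documentation addresses).
A_RECORDS = ["192.0.2.10", "192.0.2.20"]


def simulate(requests, healthy):
    """Hand out addresses round-robin and count requests that hit a dead server."""
    rotation = cycle(A_RECORDS)  # a well-behaved client rotating through the records
    timeouts = 0
    for _ in range(requests):
        if next(rotation) not in healthy:
            timeouts += 1  # nothing answers at this address, so the client times out
    return timeouts


print(simulate(1000, healthy={"192.0.2.10", "192.0.2.20"}))  # 0
print(simulate(1000, healthy={"192.0.2.10"}))                # 500
```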

In this world, we pull the failed server's IP address out of the DNS records; it will take time to propagate (depending on the TTL). Most people have either very short (10 seconds or less) or long (1 day plus) TTLs.

Long TTLs: users will take a long time to see changes.
Short TTLs: higher DNS load, higher client load latency, more likely to notice any DNS outages or other problems, many clients don’t obey very short TTLs.

This gives some load distribution, but not load balancing.
Very minimal high availability. Requires extra automation or manual intervention, and takes time to propagate changes on failure.
Flexibility: allows operators or automation tools to make changes, but the effect is delayed and uneven.

An answer to the failover problem is to put a load balancer in front of your servers; you can go back to a single A record in the DNS. The network load balancer abstracts away the backends from the inbound traffic. It is, however, entirely ignorant of the application itself. Network load balancers hash on network addresses; a common choice is a hash of the 5-tuple (source and destination address, source and destination port, and protocol). The algorithm has to be careful to keep each connection on the same backend, since TCP tends to interpret packets re-ordered by a load balancer as packet loss, which causes the stack to slow down.
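A minimal sketch of 5-tuple hashing, assuming a fixed backend pool and SHA-256 as the hash; real network load balancers use much cheaper hashes and often consistent hashing, so treat this as an illustration of the idea only. Every packet of a connection carries the same 5-tuple, so it always lands on the same backend, which is what keeps the packets in order.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool


def pick_backend(flow: FiveTuple, backends=BACKENDS) -> str:
    """Deterministically map a flow to a backend by hashing its 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


flow = FiveTuple("198.51.100.7", 51514, "203.0.113.1", 443, "tcp")
# Every packet in the connection carries the same 5-tuple, so it hits the same backend.
assert pick_backend(flow) == pick_backend(flow)
# A new connection from the same client (new source port) may land elsewhere.
print(pick_backend(flow), pick_backend(flow._replace(src_port=51515)))
```

One design caveat: with plain modulo, adding or removing a backend remaps most existing flows, which is exactly the kind of change that breaks in-flight connections; consistent hashing limits that churn.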

This gives us a lot more power. We can change the back ends in a way which is completely invisible to the users. We get:

Good load distribution.
Good availability.
Good flexibility.

However, it doesn’t give you a load balancer with any understanding of the load distribution or any content awareness.

This is all great, but gives you no resilience if you lose a datacentre. So maybe you set up a second datacentre. There are a number of ways to do this; one is anycast, which is “a whole other rant”; another is multiple A records, one per datacentre in the cluster. The problem with this is that you go right back to TTL problems. A solution is to allow each datacentre to present all the IPs in the A records: normally each presents only its own address, but it will lift up the others if the other datacentres fail. (I feel like that's inadequately clear.)
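Here is my reading of that multi-datacentre scheme as a sketch, not the speakers' design: each datacentre normally announces only its own address from the A records, and when a site fails, a survivor picks up its address so clients holding the stale record still reach a live site. The datacentre names and addresses are hypothetical, and in a real network the "lifting up" would be done with routing (for example BGP announcements) rather than a Python function.

```python
# Hypothetical datacentres, each with the address that appears in the DNS A records.
VIPS = {"dc-a": "192.0.2.1", "dc-b": "198.51.100.1"}


def advertisements(healthy):
    """Return {datacentre: [addresses it should announce]} for the healthy set."""
    plan = {dc: [VIPS[dc]] for dc in VIPS if dc in healthy}  # normal case: own address only
    if not plan:
        return plan  # no surviving site to announce from
    orphans = [vip for dc, vip in VIPS.items() if dc not in healthy]
    survivors = sorted(plan)
    for i, vip in enumerate(orphans):
        # Spread failed sites' addresses across survivors so every A record stays reachable.
        plan[survivors[i % len(survivors)]].append(vip)
    return plan


print(advertisements({"dc-a", "dc-b"}))  # {'dc-a': ['192.0.2.1'], 'dc-b': ['198.51.100.1']}
print(advertisements({"dc-a"}))          # {'dc-a': ['192.0.2.1', '198.51.100.1']}
```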

Running short on time: there's a lot more in the slides, covering areas such as Layer 7 balancers, and you should go read them. Think about which things matter to you out of the menu of topics.

Prometheus Demystified

Simon Lyall

Intro

Prometheus is for metrics: name + timestamp + value
Single daemon which scrapes exporters over HTTP; a scrape interval of 10 to 15 seconds is typical.
Stored on local disk.
Exports an API.
The longer the metric the more likely it is to cause problems.

Getting Data

Exporters:
- Gather metrics from source.
- Expose http endpoint.
- Around 100 different ones available.
Applications can also expose metrics of their own.
- Your applications should expose them.
Prometheus can gather metrics at every level of the stack, and you should: hypervisor/cloud, VM, container, k8s, load balancers, app servers, etc.

Problems:

You can be pulling thousands of metrics per server.
There can be overlaps of metrics with slightly different labels and values for the same thing.
Some cost money - for example, pinging CloudWatch every 10 seconds will cost you a fortune (running into the thousands per month).
Many applications aren’t instrumented.

So what should you get?

Small: standard exporters, black-box the edge; textfile via node_exporter for anything you're especially interested in.
Medium: many standard exporters.
Large: instrumented code, federation and summaries.
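For the small case, node_exporter's textfile collector reads `*.prom` files in the Prometheus text exposition format from the directory given by `--collector.textfile.directory`. Below is a minimal sketch of writing one from a script; the metric name, label, and directory are hypothetical. It writes to a temporary file and renames it into place so the collector never reads a half-written file.

```python
import os
import tempfile


def write_textfile_metric(directory, name, value, labels=None, help_text="Custom metric from a script."):
    """Write one gauge in Prometheus text exposition format for node_exporter's textfile collector."""
    label_str = ""
    if labels:
        pairs = []
        for k, v in sorted(labels.items()):
            # Label values must escape backslashes, double quotes and newlines.
            escaped = str(v).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            pairs.append(f'{k}="{escaped}"')
        label_str = "{" + ",".join(pairs) + "}"
    body = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )
    # The .tmp suffix keeps the collector (which only reads *.prom) from picking it up early.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(body)
    # mkstemp creates owner-only files; widen so node_exporter can read it as another user.
    os.chmod(tmp_path, 0o644)
    final_path = os.path.join(directory, f"{name}.prom")
    os.replace(tmp_path, final_path)  # atomic rename within the same filesystem on POSIX
    return final_path
```

A cron job might call it as `write_textfile_metric("/var/lib/node_exporter/textfile", "backup_last_success_timestamp_seconds", 1548028800, labels={"job": "nightly"})`, where the path is whatever you pass to `--collector.textfile.directory`.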

Service Discovery

Small: static discovery.
Medium to Large: template it all.

Alerting

Alerting is hard.
It’s hard to find good templates.
There’s a lot of trash on the Internet, sadly.
Not going into detail, but you should think about:
- Alerts feed into an alert manager (for example, Alertmanager, which you can drive from the command line with amtool).
- Silences are important for maintenance.
- Labels are critical once you’ve got more than one team.
- Look at PagerDuty / VictorOps / Opsgenie / PagerTree once it starts getting important.
  - Opsgenie and Pagertree have a free tier.

Storage

Problem: it’s a random write heavy workload.
Read queries may run against large amounts of data.
TSDB is very good up to a point, but:
- No resilience.
- Hard to backup.
- Can corrupt.
- Replacements are new and hard to run.
If you’re:
- Small, backup regularly and rollback in the event of an outage.
- Medium: backup regularly and run two instances in parallel.
- Large: look at Thanos, M3, InfluxDB.
- Federate to scale.

Display

The built-in dashboard is… there.
The API can be used by other tools. Grafana is the only one Simon has found.
- If Grafana doesn’t do it, you’re shit out of luck.
- Very well-known and well-tested.
- Talks to Alertmanager, as well.
- Downside is there are a lot of dashboards, but most of them aren’t that great quality-wise.
  - Sometimes they are buggy, too.

¹ Pleasingly enough, it seems to use only a minimal set of permissions and does not, say, attempt to hoover up contacts or the ability to poke around other applications you may be running.