linux.conf.au 2015 Day 4
Keynotes
Cooper Lees - Facebook
- Facebook open source all their infrastructure code (btrfs, HHVM, etc, etc).
- 100s of GitHub repos, 1000s of contributors.
- 10x growth in pull requests in the last 2 years.
- Started out doing code dumps, but they've been learning.
- Facebook are a mercurial shop.
- Top of rack SDN (Facebook networking - FBOSS); part of a general push to open source the datacentre.
- Networks should be like modern servers: fully automatable, infrastructure as code.
- There's a local company using Facebook's DC designs.
- opencompute.org - the key goal is to separate switching hardware and software, commoditise both.
- Generic x86 core running CentOS.
- A specialised, open ASIC for the switch ports.
- 16-32 40GE ports based off this design.
- Fully managed/automated via Puppet.
- FBOSS is a standard daemon that implements the whole switching software stack.
- Open sourcing Real Soon Now.
Carol Smith - Google Summer of Code
- GSoC is an apprenticeship-style programme, sponsored by Google, where students work on an open source project and learn how programming happens in the real world.
- As well as the learning, students gain contacts and references.
- The projects, of course, benefit.
- Opens in Feb, with projects submitted in March. Matching between organisations and students is completed by late April.
- Last year was the 10th anniversary.
- Growth outside traditional tech countries is continuing - there were more students from India than the US last year.
Mark McLoughlin - Red Hat OpenStack
- An enumeration of infrastructure trends.
- Big companies need to not be the elephants in the room.
- Everyone wants a piece of the DevOps/IaaS/PaaS action.
- Even Telcos.
- Telcos are 'special'.
- ...but they're feeling squeezed. And they need to be more responsive.
- The Telco Datacentre:
- Expensive, proprietary kit. All of it, not just the specialised kit.
- Years to commission and decommission.
- "Network Functions Virtualisation" - a telco trend to move to API driven infrastructure.
- ...but they still want to be 'carrier grade'.
- Telcos are converging on a common data centre: OpenStack, Open Daylight, Open vSwitch, KVM, Puppet.
- This has led to the Linux Foundation's OPNFV - an effort to build a reference Telco architecture.
- Why Open Source:
- Telcos are doing co-opetition.
- Diversity drives innovation.
- Sustainability - you don't become locked in to maintaining proprietary stacks.
A call to arms: do stuff people care about! Make money! Build the foundation of free society!
OFConnect: An OpenFlow SDN Library For Everyone
- codechix - support women in technology.
- Deepa and Ramya have been working on the code for some time.
- Problems: lots of duplication, many implementations are actually tightly coupled to particular hardware, and poor support for multiple versions.
- OFconnect abstracts out the OpenFlow channel.
- This talk was a lot cleverer than I am. Very good in-depth explanation of the guts of the software.
- Integration is "an interesting problem".
- TODO: IPv6, TLS, Scale to 256K connections, profile and benchmark.
- Contributions welcome.
- That's one hell of a learning project.
8 Writers in Under 8 Months: From Zero to a Docs Team
- "This does not necessarily represent the hiring practices of Rackspace as a company".
- This is how Lana fixed the problem of only one person trying to do too much documentation.
- Noodling is a sport where you catch catfish with your bare hands.
- Four in Australia, four in the US.
Building Fast (Without a Terrible Culture)
- Lana spends a lot of time thinking about culture and asking her team if they're happy.
- "There were originally more swears in the slides".
- "Management by meme". Everyone gets a meme when they start.
- It sounds silly, but the effect is interesting.
- Birthday themes are a thing.
- Memes = in-jokes = bonding through shared culture.
- Work/life balance is critical. Flexible culture is important. People have lives outside that intrude on work time.
- Remote work cultures require a certain amount of lulz in online meetings.
- IRC is awesome.
- Staff are humans, not robots. People have issues, feelings. Be flexible.
- Me: "It's just business" is bullshit.
Hiring Great People Fast
- Work your networks, both ways. Your great jobs probably aren't going to arrive on LinkedIn. Ditto staff.
- Theoretically it shouldn't be that way, but it is.
- Always Google potential hires.
- Talk to friends. Meet people outside interviews.
- "The culture thing always comes first. I can train your skills."
- Find great people and then work out how to hire them.
Toolchains and Systems
- Culture is great and awesome, but work must happen at some point.
- Let people grow tools organically, especially when you're growing fast.
- You'll want to get rid of stuff. Don't lock yourself in before you have worked out what you do and don't need.
- Have a vision and work towards it. Evangelise it. Accept it won't happen all at once.
- You need to make people want to do what you want them to do.
- Get the right people and give them space to work on getting you there.
- Processes should be agile - the processes themselves are imperfect and need to evolve.
- You can't maintain peak momentum. People will burn out and rage-quit.
- Concentrate on hardening:
- Let the team gel and sort shit out.
- The same goes for tool chains.
- Accept the slower rate of change.
- (But don't be closed to awesome opportunities.)
Vaultaire
- RRD images "take up a lot of screens to tell you not very much."
- Once upon a time disk space was expensive and we couldn't keep all the things forever, so you'd store your data in lossy fashion - "we had 7.8 servers in a cluster."
- Challenges:
- Don't lose resolution to compaction.
- Store raw points.
- Don't worry about space.
- Don't calculate rates - don't store first derivatives. You'll lose significant anomalies.
- Reducing granularity should be a decision made at analysis time, not when you're storing the data.
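A minimal sketch of the point above - keep the raw counter samples, and only derive rates when you analyse (the function and data here are illustrative, not Vaultaire's actual API):

```python
# Store raw (timestamp, value) counter samples; derive per-second rates
# only at analysis time, so anomalies in the raw data are never lost.

def rates(points):
    """Compute per-second rates between consecutive raw samples."""
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        out.append((t1, (v1 - v0) / (t1 - t0)))
    return out

raw = [(0, 100), (10, 150), (20, 150), (30, 400)]  # illustrative samples
print(rates(raw))  # the spike between t=20 and t=30 survives intact
```

If you had stored a pre-averaged rate instead of the raw points, the flat period and the spike would have blurred together.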
Ceph is the answer!
Andrew gives a nice overview of ceph, finishing up with "hey, just feed the data in and make graphs, ha ha only joking." There is, of course, complexity.
- The datapoints are immutable. This is unusual for database problems, where updates are required.
- If you want to do resilience, it's helpful if actions are idempotent; that is, they're safe to repeat.
- Don't carry state.
"I may not agree with your methodology, but I'll store to capacity the data you apply it to." - Vaultaire.
"I had a really clever idea, which I've learned to be wary of."
- If you add unordered but unique data, ordering the data as it arrives doesn't matter because it's all unique.
- Clients send data to a broker. Clients retry until the broker comes back.
- Many problems (like the two generals problem) stop being problems when the data is idempotent.
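The retry logic above can be sketched as follows. The broker object and its write() method are hypothetical stand-ins, not Vaultaire's actual API:

```python
import time

def send_with_retry(broker, point, max_attempts=5, base_delay=1.0):
    """Retry until the broker acknowledges. This is safe to repeat because
    writing the same immutable, unique data point twice changes nothing."""
    for attempt in range(max_attempts):
        try:
            broker.write(point)  # idempotent: a duplicate write is a no-op
            return True
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)  # back off, then retry
    return False
```

Because the write is idempotent, the client never needs to know whether a failed attempt actually landed - it just sends again.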
Vaultaire v1
- ceph buckets used to uniquely and deterministically describe the location of data points, which would allow you to take advantage of the ceph CRUSH algorithm to index and locate data with minimal overhead.
- The data itself is serialised.
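A deterministic naming scheme of this kind might look like the sketch below - any client can compute where a point lives without consulting an index, and CRUSH does the rest. The fields and format are assumptions for illustration, not Vaultaire's real scheme:

```python
# Sketch: derive an object name deterministically from a data point's
# identity (source, metric address, time epoch), so its location is
# computable rather than looked up.

def object_name(origin, address, timestamp, epoch_len=86400):
    """Bucket points by origin, metric address, and day-sized epoch."""
    epoch = timestamp // epoch_len
    return "%s_%x_%d" % (origin, address, epoch)

print(object_name("syd1", 0xDEADBEEF, 1421280000))
```

Two clients given the same point identity will always compute the same object name, which is what makes index-free lookup work.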
Unfortunately the ceph cluster burned to the ground.
It fell foul of a few factors:
- You can have an unlimited number of objects.
- They can be an unlimited size.
- As long as they're all 4MB.
- (This is how librados stripes data across disks.)
- The OSDs are actually single threaded when reading or writing.
- Unlimited concurrent writes? Not so much.
The upshot: 3.5 minutes to write a quarter of a million objects. At this scale, operations per second become the limitation for ceph performance; it appears to scale linearly with the number of OSDs.
Consistency is hard. Sooner or later you have to serialise: in ceph's case, that's when the OSDs need to check the blocks are replicated properly.
Vaultaire v2
All legacy host metrics + OpenStack metering: 2,500 op/s, 1600% CPU usage, and > 32 GB of RAM in use. Wut.
- Using maps to maintain data structures. Multiple maps for MVCC.
- This was fine.
- The problem was that alloc and GC were flogging the machine to death.
- -> protobufs make garbage - it's a consequence of needing to convert the protobuf format into the higher-level language's data structures.
- So: hash the data over a fixed set of buckets.
- This was informed by testing with all the data in a single bucket, where ceph could swallow it with basically no load.
- Scattering the data over a fixed number of buckets makes the workload look more like a pure write workload.
- There are a couple of types of data: simple fixed points and variable-length extended for e.g. web logs.
- Fixed-offset serialisation format: three 64 bit words: address, timestamp, payload.
- Variable length adds a length field and a data field.
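The fixed-offset format described above can be sketched with struct packing. Field order and endianness here are assumptions for illustration, not the actual wire format:

```python
import struct

# Sketch of the fixed-offset serialisation format: a simple point is three
# 64-bit words (address, timestamp, payload); an extended point replaces
# the payload word with a length and appends the variable-length data.

def pack_simple(address, timestamp, payload):
    return struct.pack("<QQQ", address, timestamp, payload)

def pack_extended(address, timestamp, data):
    return struct.pack("<QQQ", address, timestamp, len(data)) + data

p = pack_simple(0x2A, 1421280000, 99)
print(len(p))  # always exactly 24 bytes: three 64-bit words
```

Fixed offsets mean a reader can slice straight to any field without parsing, which matters when you're scanning billions of points.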
This now has > 1 PB of data stored across < 900,000 objects and requires about 10% CPU.
Consistency vs Availability
- If ceph can't write, ceph will block. This is a feature, not a bug.
- Think of it like the compiler failing your code rather than compiling buggy programs.
- ceph punts availability for consistency.
- Andrew has seen a full-cluster lockup due to infiniband problems - but no data was lost.
Programming Considered Harmful?
- Science vs engineering vs programming. Science is about provable constructs, engineering is about provably meeting requirements, programming is producing code.
- Any monkey can program.
- Code monkeys are less than ideal.
- Dave contrasts Eric Raymond's Real Programmer ("so clever I am terrifying") with Dijkstra's Humble Programmer ("humbly aware of the limits of their own brain, eschews tricks"). No prizes for guessing who wins, with reference to the Dunning/Kruger effect.
Lesson 1: Learning is critical to being able to make a positive contribution.
Lesson 2: Learning is hard. Learn the fundamentals so you can hang on to the hard concepts.
Wise words 1: Don't try to keep everything in your head: keep pointers to where the answers are.
- Complexity needs to be limited.
- Programming, as applied mathematics, is one of the most difficult branches of mathematics.
- Testing doesn't prove that there are no bugs, merely that they couldn't be found.
- So the only answer is not to write bugs.
- That's almost impossible.
Conclusion 2: There are no programmers capable of writing bug free software at typical modern program scale.
- Documentation of the goals, the design, is the fix to bad code. Not just band-aids to the code.
Lesson 3: You need a design to understand if your code is fit for purpose.
- Thompson's "Trusting Trust" essay: you can only trust your own code. Dave doesn't even trust that.
Conclusion 3: You can't scale beyond one human brain.
Conclusion 4: Programming is not a viable method of developing large software projects.
Is there a solution?
Guru meditation: gurus, sub-gurus, gurus in training, and so on, all the way down to minions in training.
Knowledge transfer is the most critical process in any large project.
So: if we don't understand problems properly, because they aren't in our skull, we need to work together.
And we need reviewers to take their responsibility seriously: a review should be an apply, a build, a test. You must actually verify it.
Because "thought review" requires the ability to fit the problem space in your head and fully understand it. Which we know doesn't work.
- If people have questions about a patch - it isn't ready.
- Patches must be properly documented - if the documentation and commit don't make it obvious what the code is doing, then it's not ready.
- Context is king.
- None of this is programming: documenting requirements, understanding the risk of taking vs NAKing a patch, these things are neither computer science nor programming, they're engineering.
- Engineering assumes things will break. Programming needs things to be perfect. Perfection is impossible. So programming is impossible.