LCA 2016 Day 5

Jesus, people, how hard is it to get to the keynote by 9? And why would you spend all that time and money on coming to a conference only to bail on the sessions? Still, more chances for the rest of us to win the spot prizes.

Keynote - Building the Future

Genevieve Bell

A Brief Introduction

The second time I’ve seen an anthropologist keynoting an LCA, after Biella Coleman in 2010.

Not a talk about “the next big thing”, but a talk about what our responsibilities are while we build the future. Studied “Native American history” and “feminist theory”. Got a job by meeting someone in a bar, who then called every anthropologist in the Bay Area, asking if they had a “red headed Australian”. Stanford gave him Genevieve’s home number, and he offered a free lunch, which is apparently irresistible if you’ve been a grad student at some point in your history.

This led to interviewing at Intel, which resulted in a job offer. And Genevieve rejecting the job offer. For seven months. While they continued to ask. Then one day she woke up and thought, “stuff it,” in no small part because of her mother’s belief that you should change the world. Even after she explained she was an unreconstructed Marxist and a feminist.

When she turned up, her new boss explained they needed help with two things: women.

“Which women?”

“All of them.”

“All 3.2 billion of them?”

“Yes.”

“What do you want me to do with all women?”

“Tell us what they want.”

“So I made a note in my notebook: women. All.”

“Also we have an RoW problem.”

“What?”

“A rest of world problem.”

“What is the ‘world’ such that you have a rest of world?”

“Well, America is the world.”

Either the “biggest mistake of my life”, or eternal job security. So “I’ve spent the last 17 years explaining everyone else to engineers.”

A Fascination With the Future

A fascination with the future has been a longstanding aspect of humanity: oracles, reading entrails, you name it. We’ve changed how we explore the future, but the fascination remains: science fiction is one way. Genevieve shows us some ads from Life in 1957: a self-driving car. Boxes that warm and cool your house. Televisions that hang on the wall.

They don’t just tell us about their vision of the future, but about their present: they already had traffic congestion and car safety worries, for example, a desire to spend less time on housework, and so on.

Ten years later, Gordon Moore laid out his vision of the future: computers in homes connected to a network, portable phones, autonomous vehicles, and Moore’s Law.

We have a cycle: New technology is great! It’s awful! Wait, was that all?

But how this happens, how the future evolves, is deeply cultural: in Australia, things like ATMs and the Internet had a different (faster) rate of adoption than in other places. The future is uneven, and a view of it depends on your culture.

Data: it’s easy for data to be creepy. Putting a card in an ATM and getting a “happy birthday” greeting was one example. How do you know this? What else do you know? And who are you sharing it with?

“Recently I watched a woman stand on a table to photograph her food - and I was the only person who thought that was interesting.”

Making Sense of the Future of Compute

Asking what the next big thing is misses the point - it’s how our lives adapt to technology, how it affects us, our legal frameworks, our society.

Consider:

Connectivity

Our conversations focus on cost and regulation. Our devices function best with connectivity, but as humans we function best when we’ve got moments of disconnection. We used to hack around this by, for example, holidaying in connectivity black spots. But then 4G came along and we’ve begun to lose that.

We have longstanding social mechanisms - religious observances, holidays and so on, to create different, “seamful” experiences.

Technology is pushing seamlessness, but context matters: you don’t want your TV to migrate to your car or work laptop. You don’t want every photo on your phone backed up to the house hard drive.

Solving that friction is harder than the NBN.

The Internet of Things

A pervasively connected house might not actually feel that good. Fifty billion connected devices. What does it mean to connect those things - what happens when your thermostat gets it wrong? What happens when you’re monitoring your house - your spouse, your children - with your camera remotely?

Some of those are good: the husband who discovered that the laundry does, in fact, need to be done every day if he likes clean clothes.

Conversely - who knows about your toilet habits? Your doctor? Your medical insurance? People quickly pivot from privacy to gossip: you mean my house will gossip about me?

Who has control of these systems? Who makes decisions about what they monitor, what they measure, what is “good” and “bad”? How are they measured?

Big Data

Data scientists like to say “more data equals more truth”, but it just means more data. The data set may be complicated, inaccurate, incomplete. The average human tells scores, or even hundreds, of lies a day. Mostly those are white lies - but they skew the data.

100% of Americans lie in their online dating profiles, according to a Cornell study. Men lie about their height. Women lie about their weight.

But this is an interesting challenge. Researchers like the appearance of understanding the truth - what are the differences between your professed exercise habits and what your fitbit says. But your fitbit might make mistakes. And the reasons for the difference are actually the most interesting.

The early Internet was driven by the idea of transparency, but that’s just an ideology, not an absolute good.

The Misfit Genevieve bought was owned by one company. It - along with her data - has been purchased by another. She has no say in that.

Algorithm

Then there are the algorithms we build around this data, and those algorithms embed a set of assumptions. Most recommendation engines, for example, operate on recommending things that people like you already like: how do you change? How do you reinvent something?

Imagine an algorithm that says “I know you like coffee and this is the time you’d normally have coffee, but there’s some public art nearby you should have a look at” - an “algorithm of wonder”.

Some of these are hard choices: you’re in a self-driving car, what’s its accident decision tree? Is a particular pedestrian more valuable than people in the car? Who made that call? Do you prioritise the children or adults in the car?

Algorithms are theories, and how you create and test those theories is an important topic.

Security

Security used to be simple: patches on computers. Now it’s medical technologies, steel factories, aeroplanes, your connected house, 23andme data (you couldn’t get to a site unless you could prove you were white).

And security is hard: we lend our devices, passwords, credit cards, you name it. So we go for biometrics: a million fingerprints were stolen in the US last year. How to repudiate your fingerprints?

And then we introduce national security and law enforcement: backdoors, weakening citizen infrastructure.

Bruce Sterling’s “The Internet of Things”.

Privacy

LA rubbish bins now warn you not to chuck personal information in the bin. This has become a fierce debate: Tim Cook says it’s a human right. Other parts of SV say the cat’s already out of the bag. The US government says you shouldn’t have any.

  • Do you own your data?
  • Is it owned by third parties?
  • What can be combined - linking your healthcare history to your credit card history?
  • When you think you’ve given permission for X and it’s being used for Y.

And no, “just have opt-in” is not an answer.

What does it mean to talk about privacy when we talk about smart homes, LIDAR, ubiquitous drones?

Memory

“We should store everything!” But what are the usage models we imagine?

Think about being at a conference and meeting someone who knows you, but you don’t know them. Wouldn’t it be awesome to have a memory prompt so you know who they are?

But what if you remembered everything you’ve ever done? It turns out there are mental illnesses based around that. Humans can’t cope with it.

And there are agendas at work here. What are the consequences of never being forgotten? And what does it mean when Australian governments want mandatory remembering of all your data, but European ones want a right to be forgotten?

Innovation/Disruption

Every innovation has two stories: a utopian fantasy and a dystopian horror. Robots will liberate us from work and kill John Connor.

But we need to ask more questions:

  • What is being disrupted?
  • Who is the beneficiary?
  • What are the impacts? On who?

Our Solutions Need to Be Human

Our innovation needs to consider humans, and we need to be the architects of our own change. We’re already building that future - at Intel there are labs where “2025 is already there”. People who “work in the future and spend the weekends in the past”.

We have this opportunity to make decisions, to make the world better than it was before and be optimistic. We get to craft it - and “I can get to hope I live in a world where my house isn’t spying on me”.

“You have a responsibility.”

Q&A

  • Your job is to explain the world to engineers - do you explain engineers to the world? Absolutely. I treated my interview as fieldwork: I made the engineers the tribe that I was studying. Four years ago I was asked to give a presentation on “us”. The work of explaining what engineers do, along with their tools and terms of art, has made her a better anthropologist. She has taken engineers into the field, which has been interesting. Listen to the concerns people have about what we’re doing.

This was an awesome, awesome keynote. Absolutely superb.

Preventing Cat-astrophes with GNU MediaGoblin

Ben Sturmfels

GNU MediaGoblin is a publishing system for artists - painters, photographers, presenters. It’s expressly made for a nontechnical audience/userbase. It’s accessed through a web interface.

Unlike many of the publishing systems you may be used to, it supports a variety of media: images, video, audio, slides (PDF), Blender files (displayed natively or via WebGL)[^1] and so on. Descriptions can be written in Markdown.

MediaGoblin will automagically transcode uploaded content to sensible display formats, which is “the hardest part of the process”. Sure, if you’re technical you might understand it, but it’s a pain.

The video is played back via an HTML+JS player, and you can make it downloadable.

Because it’s AGPL, you can run it on your own server - at home, a VPS, or what have you. Perhaps MediaGoblin isn’t for you - perhaps you don’t care where your data lives, or don’t mind the narrowly-focused UI most hosted services have.

But consider: the massive centralised architecture of sites like Instagram makes it too easy for bad DMCA takedowns, censorship, hostile datamining, and so on. You are in a very weak position in these cases.

And what happens when the lights go out? Companies aren’t good custodians of our cultural heritage. We risk losing our history, our diversity, and our unique perspectives. Do we remember the world without YouTube?[^2]

MediaGoblin gives us an alternative which is socially scalable - from individuals to families, artists’ collectives, galleries, or other communities of interest.

Technical

It’s a collection of Python processes: a Django-ish (but non-Django) front end, and a set of worker processes which manage media transcoding and other long-running jobs.

  • Jinja2.
  • WTForms.
  • Werkzeug.
  • SQLAlchemy.
  • PostgreSQL or SQLite.
  • Celery and Kombu.
  • GStreamer 1.0.
  • Python 3 support is a work in progress.

There’s also a pump.io compatible API, so you can use pump.io compatible tools to work with MediaGoblin; you can comment on entries, upload media, and read feeds, for example.

Finally there are command-line tools to (for example) bulk-upload files.

Other Features

  • Moderation.
  • Tagging.
  • Collections.
  • Geo-tagging support.
  • Visual theming.
  • Per-user licensing preferences.
  • Persona/OpenID/LDAP support.

The Future

  • Federation.
  • Finish 1.0.
  • Privacy Features.

It’s about the social experience, not just the hosting, which makes it very ambitious. Federation is a critical part of this; federation allows machine-to-machine interaction to allow your identity and comments to flow from server to server.

  • Retrofitting federation standards (ActivityStreams 1.0) to the relational datamodel has been one of the biggest challenges.
  • Working on standards with the W3C has been challenging but productive.

There’s an Android app - it’s quite a challenge to build Kivy into the Android build system, and the app is under heavy development.

Deployment is another challenge - mostly because deploying Python isn’t as easy as it ought to be.

Funding

Turning your money into free software: MediaGoblin is funded by the FSF and “I’d encourage you to support our good friends at the Software Freedom Conservancy”. Jessica’s work has been funded by donations.

Q&A

  • Does it talk to Kodi? I don’t know what that is.
  • I know someone who’s creating photos of Aboriginal art; this could be perfect for displaying them, but they’re gigapixel images. Can it handle that? The backend is scalable and extensible, so if it can’t do it now, it would be extensible.
  • What are the disk space requirements? A function of the size of the originals.
  • The 3D extension was written for a bounty. Is this a good mechanism for getting plugins? Absolutely.
  • Why is the AGPL 3 license best for this? If you put GPL software online as a service, you don’t have to provide the four freedoms, particularly access to your modifications to the code. The AGPL, on the other hand, requires you to provide a copy of the source you’re running[^3].

Using TPMs to protect users

Matthew Garrett

TPMs

TPMs have a bad reputation because of the association with content encryption, for example. In this context we mean the “Trusted Platform Module”, a chip intended to increase the amount of trust you can have in your computer.

In the early part of the century the big concern was that TPMs would be (mis)used to allow others to dictate what you could run on your computer - “Treacherous Computing”. This didn’t happen because of a bunch of reasons which are “essentially uninteresting.”

  • 28-pin package; they have essentially the same physical and code interfaces and are largely interchangeable within the same version (any 1.2 can be used for any 1.2, any 1.0 for any 1.0, but not 1.2 for 1.0).
  • Widely varying processing power, from old-school embedded controllers to ARM cores.
  • NVRAM.
  • GPIO pins to talk to other devices.
  • Crypto (very slowly - 20s+ depending on the model and operation).

So why have a device that’s worse than my CPU in every single way? “I’ll skip that question and come back to it later.”

TPMs have Platform Configuration Registers (PCRs) that are used to take measurements - for example, measuring the boot process. The TPM is intentionally very limited in its interactions: it can’t grab information directly, so it requires the co-operation of the bootloader (for example).

Measuring the Boot Process

  • Calculate the SHA1 of data you want to examine.
  • Pass it to the TPM’s PCR, which combines it with the existing value: the new PCR value is the SHA1 of the old value followed by the new hash. This means you get a sequence of numbers, each of which depends on a consistent chain of numbers being passed in.
  • You only have a couple of choices for breaking this:
    • Break SHA1.
    • Perform exactly the same sequence of writes.
  • Anything else will produce a noticeably different value.
  • With bootloader support, PCRs hold full boot state.
  • If the number is different, something has changed and your security is broken somewhere along the line.
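
The extend operation described above can be sketched in a few lines of Python. This is a minimal illustration rather than real TPM code: the names `extend` and `measure_boot` are made up for this example, the "boot stages" are just byte strings, and the all-zero starting value is an assumption about the PCR's reset state.

```python
import hashlib

PCR_SIZE = 20  # a SHA-1 digest is 20 bytes


def extend(pcr: bytes, data: bytes) -> bytes:
    """Return the new PCR value: SHA1(old PCR || SHA1(data))."""
    measurement = hashlib.sha1(data).digest()
    return hashlib.sha1(pcr + measurement).digest()


def measure_boot(stages) -> bytes:
    """Fold a sequence of boot stages into a single PCR value."""
    pcr = bytes(PCR_SIZE)  # assumed reset state: all zeroes
    for stage in stages:
        pcr = extend(pcr, stage)
    return pcr


# Hypothetical boot chains; real measurements cover firmware, bootloader, etc.
good = measure_boot([b"firmware", b"bootloader", b"kernel"])
tampered = measure_boot([b"firmware", b"evil bootloader", b"kernel"])
reordered = measure_boot([b"bootloader", b"firmware", b"kernel"])

print(good.hex())
print(good != tampered, good != reordered)
```

Because each value is hashed together with everything that came before it, changing or reordering any stage changes the final number, which is the property the rest of the talk relies on.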

But then what?

How do we talk to the TPM? We ask the kernel! But if the kernel has been tampered with…

This is a pretty fundamental problem: to trust what the TPM reports, we have to trust the kernel, which is the very thing we’re trying to verify.

So there’s a third part to the picture: rely on a third party. This is remote attestation. Make sure you are talking to a TPM - at manufacturing time the TPM is flashed with an endorsement key; there is a chain of trust back to the manufacturer. Which is a bit problematic given poor cert management by manufacturers.

You can create an attestation identity key with the TPM - the TPM creates a public and private copy of the attestation identity key. You get both copies but you can’t do anything but store them. It also gives back a verification block, signed by the endorsement key.

TPM Quotes

  • Receive a signed copy of the PCR values, signed by the AIK.
  • If we get back a properly-signed copy of the PCR, we know that the PCR values came from the TPM we think we’re talking to.
  • (Nonces are used to prevent replay attacks.)
  • The kernel cannot interfere with this process - all it can do is to refuse to hand over the data.
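
To make the role of the nonce concrete, here is a toy sketch of the challenge/response. It is an assumption-heavy stand-in: a real TPM signs the quote with the private half of the AIK (an asymmetric key the verifier never sees), whereas this example swaps in an HMAC with a shared key purely so it runs with the standard library. `ToyTPM` and `Verifier` are hypothetical names, not any real API.

```python
import hashlib
import hmac
import os


class ToyTPM:
    """Stand-in for a TPM holding an AIK. HMAC replaces the real signature."""

    def __init__(self, aik: bytes, pcr: bytes):
        self._aik = aik
        self.pcr = pcr

    def quote(self, nonce: bytes):
        # "Sign" the PCR value together with the verifier's nonce.
        sig = hmac.new(self._aik, self.pcr + nonce, hashlib.sha256).digest()
        return self.pcr, sig


class Verifier:
    """Remote party that knows the expected PCR value."""

    def __init__(self, aik: bytes, expected_pcr: bytes):
        self._aik = aik
        self._expected = expected_pcr
        self._nonce = None

    def challenge(self) -> bytes:
        self._nonce = os.urandom(20)  # fresh random nonce per attestation
        return self._nonce

    def check(self, pcr: bytes, sig: bytes) -> bool:
        nonce, self._nonce = self._nonce, None  # each nonce is single-use
        if nonce is None:
            return False
        expected_sig = hmac.new(self._aik, pcr + nonce, hashlib.sha256).digest()
        return hmac.compare_digest(sig, expected_sig) and pcr == self._expected


aik = os.urandom(32)
pcr = hashlib.sha1(b"known-good boot").digest()
tpm, verifier = ToyTPM(aik, pcr), Verifier(aik, pcr)

old_quote = tpm.quote(verifier.challenge())
fresh_ok = verifier.check(*old_quote)    # quote made for this challenge
verifier.challenge()                     # new session, new nonce
replay_ok = verifier.check(*old_quote)   # attacker replays the old quote
print(fresh_ok, replay_ok)
```

A quote recorded while the machine was clean is useless later, because it was signed over a nonce the verifier will never issue again.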

This is perfect when remote attestation is being used for remote authorisation; for example, when a system is asking to join a cluster.

It’s hopeless for, say, your laptop. Because you need to send/receive the comms to the remote attestation server over a NIC that the attacker might control.

So we need…

Local TPM Use

Not only can the TPM encrypt small amounts of data, it can refuse to decrypt it if the PCR doesn’t match the boot values. This is “TPM sealing”.

For example, recent Windows systems have an encrypted disk with the disk secret encrypted by the TPM. If the boot process has been tampered with, the disk won’t decrypt successfully. If it boots OK, it hasn’t been tampered with.

Hurray!

Unfortunately this means that while your disks are safe from being stolen, simply switching on your laptop will decrypt the disk…

So… add a passphrase.

But what if the boot process has been tampered with? What if the encryption prompt is malware? And it looks like my computer booted and crashed. Which is common, because computers are awful. We aren’t very good at them.

Anti Evil Maid

A USB stick that has been prepped with a TPM-sealed set of code that will display known-good information if your computer hasn’t been tampered with. It’s good, but requires a lot of discipline.

An alternative

  • Generate a TOTP seed.
  • Encrypt the TOTP seed and seal it to those PCR values.
  • Enrol that seed on a second device.
  • For example, an ANSI QR code that can be scanned.
  • At boot the laptop can only unseal the seed if the PCR values match, and it then shows the current code. If that matches the code on the second device, either your laptop is clean, or someone has tampered with both the devices. Try not to leave them in the same place.
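
The code both devices compute is ordinary TOTP (RFC 6238). Here is a minimal sketch using only the Python standard library; the TPM sealing step is deliberately left out, the seed is a fixed example value rather than a random one, and `totp` is a name chosen for this example.

```python
import hashlib
import hmac
import struct
import time


def totp(seed: bytes, at=None, step=30, digits=6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1."""
    if at is None:
        at = time.time()
    counter = int(at) // step  # number of 30-second steps since the epoch
    mac = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# The same seed lives in two places: sealed by the TPM on the laptop,
# and enrolled on the phone. Both compute the code independently.
seed = b"12345678901234567890"  # example value only; use a random seed
now = time.time()
laptop_code = totp(seed, at=now)
phone_code = totp(seed, at=now)
print(laptop_code == phone_code)
```

The check works because a tampered boot chain produces different PCR values, so the TPM refuses to unseal the seed and the laptop cannot produce the code the phone expects.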

Problems

  • You can’t change your bootloader, kernel, initramfs, firmware, and so on. Or it won’t boot. Which is awkward.
  • DMA attacks will let you copy the decrypted secret out of RAM. The main protection, the IOMMU, is switched off in most distros.
  • Management Engines: these provide a backdoor - it can lie about the boot values. It’s completely un-auditable, and we have no way of knowing if it’s secure.

Integrity Measurement Architecture

  • Goes past boot time into verifying more and more binaries - measuring each executable.
  • The problem is that the PCR value is order-dependent, so any variation in execution order will cause a failure.
  • A workaround is to log the value along with the item that caused the logging event. This allows the remote server to deduce that the execution is correct. (IMA)
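
The log-based workaround can be sketched as follows. This is a simplified model, not the real IMA log format: log entries here are plain `(name, digest)` pairs, the PCR starts at zero, and `replay` and `verify` are hypothetical helper names.

```python
import hashlib


def extend(pcr: bytes, digest: bytes) -> bytes:
    """PCR extend: hash the old value together with the new measurement."""
    return hashlib.sha1(pcr + digest).digest()


def replay(log) -> bytes:
    """Recompute the PCR value that this sequence of log entries produces."""
    pcr = bytes(20)
    for _name, digest in log:
        pcr = extend(pcr, digest)
    return pcr


def verify(log, quoted_pcr: bytes, known_good: dict) -> bool:
    """Accept if the log explains the PCR and every entry is known-good."""
    if replay(log) != quoted_pcr:
        return False  # the log does not match what the TPM recorded
    return all(known_good.get(name) == digest for name, digest in log)


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


# Hypothetical binaries and contents, for illustration only.
known_good = {"/bin/sh": h(b"sh"), "/usr/bin/ssh": h(b"ssh")}
log_a = [("/bin/sh", h(b"sh")), ("/usr/bin/ssh", h(b"ssh"))]
log_b = list(reversed(log_a))  # same binaries, different execution order

print(replay(log_a) != replay(log_b))
print(verify(log_a, replay(log_a), known_good))
print(verify(log_b, replay(log_b), known_good))
```

The two orderings give different PCR values, but both verify, because the server checks the log against the PCR and then checks each entry on its own rather than expecting one fixed final number.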

Measuring Containers

  • The container disk image can be measured into the TPM. These can be logged in the same way binaries are, via IMA. This gives us trustable container images.
  • You can also discover where images were run, based on the IMA logs.
  • This functionality was released in Rkt 1.0 (today).

Q&A

  • Has this stopped anything in the wild? No, but it would stop some of the Hacking Team attacks.
  • How do we trust the Intel hardware? This is a very hard problem. If someone can compromise Intel, none of this will save you, but there are plenty of other attackers.
  • What happens to the PCRs if the user drops to shell? The current process will use everything entered into GRUB, so a manual change to the GRUB commands will cause PCR variation.
  • How robust is the TPM? Have you checked? No, because I’m scared of what will happen. There are moves for software TPMs, on the management engine, which makes the management engine more vulnerable.

secretd - another take on securely storing credentials

Tollef Fog Heen

  • A project for storing and distributing secrets.
  • Tollef works for Fastly, who are hiring in technical and non-technical roles.

What is the problem?

  • Code may be secret or public.
  • Configuration may be secret or public.
  • Credentials are secret.

But your applications need all this information to function.

Secrets are in:

  • Code (when you’re young and foolish).
  • Configuration files (until the first time you accidentally publish them to GitHub).
  • A pre-encrypted store, pushed out to all your app servers (until you get tired of pushing it everywhere, or people note all your developer laptops have a copy).
  • An online store.

But there are problems with most stores: they are complex and/or insecure; they require manual work to re-encrypt them (for example, when you update the file with new servers); and updating them is hard. They often have poor support for development environments as compared to production environments.

Requirements for a fix

  • Dynamic environment support - it needs to cope with infrastructure being added and removed.
  • Central storage.
  • Policy-based access controls, based on (for example) server types.
  • APIs for updating.
  • Hardware bootstrapping.
  • Hands-off/lights out operation.
  • PCI-compliant auditing of the authenticating/authorising machines.

Options as of a year ago

  • pwstore. Provides group-based access to resources.
  • chef-vault. Hooks into chef, uses the chef authorisation model.
  • Hashicorp vault. Wasn’t mature a year ago, and if it had been, we probably wouldn’t be talking about secretd.

Options today

  • pwstore/chef-vault: pre-encrypted, which is a nuisance.
  • etcd: x509 is suffering. It’s too easy to get wrong; e.g. libraries that don’t verify by default.
  • Hashicorp Vault: distributed, complex, TTL on secrets.

secretd

  • go
  • SQL/PostgreSQL
  • ssh
  • tree structure - you add ACLs on various points of the tree.
  • positive ACLs

secretd is Apache licensed.

The client communicates with the server via SSH and calls a shell, which calls secretd via a UNIX socket; secretd then calls PostgreSQL.

The DB structure is a fairly simple relational model. It’s flexible enough people can have write-only, read-only, or read-write access.
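
As a way of picturing positive ACLs on a tree, here is a small in-memory sketch. It is not secretd's code or schema (secretd is written in Go and stores this in PostgreSQL); the `AclTree` class, the path layout and the rule that a grant on a node also covers everything beneath it are all assumptions made for illustration.

```python
class AclTree:
    """Positive-only ACLs attached to points in a path tree."""

    def __init__(self):
        self._grants = {}  # (principal, path) -> set of permissions

    @staticmethod
    def _parts(path: str):
        return [p for p in path.split("/") if p]

    def grant(self, principal: str, path: str, perm: str) -> None:
        key = (principal, "/" + "/".join(self._parts(path)))
        self._grants.setdefault(key, set()).add(perm)

    def allowed(self, principal: str, path: str, perm: str) -> bool:
        # Walk from the root down to the requested node; any grant on the
        # way is enough (assumed inheritance rule).
        parts = self._parts(path)
        for depth in range(len(parts) + 1):
            prefix = "/" + "/".join(parts[:depth])
            if perm in self._grants.get((principal, prefix), ()):
                return True
        return False  # positive ACLs only: no grant means no access


acl = AclTree()
acl.grant("web-server", "/prod/web", "read")   # read-only
acl.grant("deployer", "/prod", "write")        # write-only

print(acl.allowed("web-server", "/prod/web/db-password", "read"))
print(acl.allowed("web-server", "/prod/db/root-password", "read"))
print(acl.allowed("deployer", "/prod/web/db-password", "write"))
print(acl.allowed("deployer", "/prod/web/db-password", "read"))
```

With only positive grants there is no "deny" rule to reason about: a principal can do exactly what has been granted somewhere on the path to the secret, and nothing else.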

There are some limits:

  • Secrets are not encrypted on disk.
  • There are no admin tools or UI.
  • Auditing is limited.
  • Tool integration (puppet, chef, etc) is limited.
  • Enrolment key support - so you can do a first sign-in as a new principal. This facilitates pre-provisioning clusters of machines, for example as a one-time key.

Tollef’s demo uses the pg_virtualenv tool, designed for quick spinup/teardowns of short-lived pg databases.

My thoughts: the core system seems well-thought-out, but the lack of a UI and audit would kill it for me; my professional life involves a situation (like many environments) where access control, including service accounts for applications, is managed by a team external to the dev and ops teams, so “usable by a non-Unix team” is critical.

Also, I doubt it would be feasible to run with non-POSIX environments, which is… fine for some use cases, but not mine; I now end up with a lot of Linux/Unix environments depending on Windows-hosted tooling, because there are lots of Windows tools which can speak to Linux clients, but not the other way around.

Q&A

  • Should I use secretd? No, it’s a prototype.
  • Why PostgreSQL? Because I like its structure and good support for representing trees in the native SQL, unlike SQLite.
  • You seem to be implementing PostgreSQL’s native authentication mechanism in PostgreSQL tables? The features (row-level security) didn’t exist a year ago, and it would also result in a lot of DB users.
  • Why not use an encrypted block device for the at-rest encryption? Because volume-level encryption leaves the secrets in the clear in the backups, the logs, and so on. You need to have the credentials encrypted.

Closing

And then we thanked the volunteers, group hugged, and all started our trips home. I really enjoyed this year - a standout for me.