Selena Deckelmann
Slides at: Slideshare.
-
“Success Engineering” - that clearly will work, then.
-
Plan for the worst. Minimise risk. Fail. Recover, gracefully.
-
“You can’t eliminate risk.”
-
alt.sysadmin.recovery shoutout.
-
Failure is an option. Admit it.
-
The open source world has failure and recovery as a core competency, but perhaps not systematically enough.
-
Dr. Jerker Denrell publishes fantastic papers on the topic from a business perspective, such as “Predicting the Next Big Thing: Success as a Signal of Poor Judgment.” That paper looked at people who had predicted Black Swan events, and found that doing so was negatively correlated with general quality of judgement.
-
Also try Duncan Watts's “Everything Is Obvious: Once You Know the Answer.”
-
Whatever, science, blah, onto the entertaining anecdotes!
-
Rats like fibre optic. And we can use stories about this to help inform our planning.
-
Document, Test, Verify is like Stop, Drop, and Roll.
Documentation
- Documentation tools are mostly pretty terrible, and there’s good work that could be done here.
- Make time to update documentation when you do stuff.
Testing
- Verify your success criteria. What does success look like? What are you trying to achieve?
- Make sure you actually write tests, however simple, and have a buddy sanity check your work.
- Have a plan: make sure you involve other people with it, too.
- There is no shortage of testing tools. Whichever you use, the tests should be repeatable.
- Do stuff in repeatable shell scripts.
- Have staging environments.
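The testing points above can be made concrete with a small sketch. This is not from the talk: `Step`, `run_plan`, and the `staging` dict are hypothetical names, and a plain dict stands in for a real staging system. The idea is only that each step carries its own success criterion, and the run stops at the first check that fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One step of a change plan, paired with its success criterion."""
    name: str
    apply: Callable[[Dict], None]    # performs the change
    verify: Callable[[Dict], bool]   # returns True if the change worked


def run_plan(steps: List[Step], state: Dict) -> List[str]:
    """Apply steps in order, stopping at the first failed check."""
    completed = []
    for step in steps:
        step.apply(state)
        if not step.verify(state):
            raise RuntimeError(f"step {step.name!r} failed verification")
        completed.append(step.name)
    return completed


# Hypothetical staging state: a plain dict standing in for a real system.
staging = {"schema_version": 1}
plan = [
    Step(
        name="bump schema",
        apply=lambda s: s.update(schema_version=2),
        verify=lambda s: s["schema_version"] == 2,
    ),
]
done = run_plan(plan, staging)
```

Writing the check next to the change forces you to say what success looks like before you run anything, and gives a buddy something specific to review.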
Verify
- What does pg_dump -d actually do? Well, it depends on the version, so check before you rely on it.
- You need a plan for what to do if things go wrong. Use a staging environment, and test your rollbacks, not just the implementation.
- People are really important. Have a buddy.
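As an illustration of testing the rollback and not just the change, here is a minimal sketch. It is an assumption-laden toy, not anything shown in the talk: the config dict, `apply_change`, and both rollback functions are hypothetical. The check simply asks whether applying and then rolling back leaves things exactly as they were.

```python
import copy
from typing import Callable, Dict


def rollback_restores(state: Dict,
                      apply: Callable[[Dict], None],
                      rollback: Callable[[Dict], None]) -> bool:
    """Return True if apply followed by rollback leaves state unchanged."""
    trial = copy.deepcopy(state)   # work on a copy, as you would on staging
    apply(trial)
    rollback(trial)
    return trial == state


# Hypothetical change: raise a connection limit and add a new setting.
def apply_change(cfg: Dict) -> None:
    cfg["max_connections"] = 200
    cfg["pool_mode"] = "transaction"


def good_rollback(cfg: Dict) -> None:
    cfg["max_connections"] = 100
    cfg.pop("pool_mode", None)


def forgetful_rollback(cfg: Dict) -> None:
    cfg["max_connections"] = 100   # forgets to remove pool_mode


staging = {"max_connections": 100}
good_ok = rollback_restores(staging, apply_change, good_rollback)
forgetful_ok = rollback_restores(staging, apply_change, forgetful_rollback)
```

The forgetful rollback looks plausible on paper, and only the round-trip comparison shows that it leaves a stray setting behind.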
Failure to Imagine
- Telling externals they need to tell you when you have a problem is not going to work. Trust no-one.
- Share your stories of failure and talk to a diverse group of people, people who are different to you.
- Sharing lets you head failure off at the pass.
- People who are different to you means outside IT - business, musicians, the construction industry.
- Go and physically look at the things you might need to work on; don’t just sit in a room.
Reflection
- The post-mortem/debrief.
- Keep a notebook of your work, learn from it.
- Plan to have a post-mortem, even if there’s success.
- Document your plan with a timeline, allocate time, and actually test the plan.
- IRC is great, speaking is better. A headset is great.
- Have a timekeeper, and alert people when you’ve hit your drop-dead point.
- Limit improvements to 1-2 things. An endless list will never be worked upon.
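The timekeeper's job can be sketched in a few lines. Everything here is hypothetical (the function name, the messages, the dates and durations); it only shows the drop-dead point being agreed in advance and checked against the clock, with a warning window before it.

```python
from datetime import datetime, timedelta


def timekeeper_status(start: datetime, drop_dead_after: timedelta,
                      warn_before: timedelta, now: datetime) -> str:
    """Report where we are relative to the agreed drop-dead point."""
    deadline = start + drop_dead_after
    if now >= deadline:
        return "DROP-DEAD: stop and roll back"
    if now >= deadline - warn_before:
        return "WARNING: drop-dead point approaching"
    return "OK"


# Hypothetical maintenance window: starts at 22:00, drop-dead after 2 hours,
# with a warning in the final 30 minutes.
start = datetime(2012, 1, 1, 22, 0)
budget = timedelta(hours=2)
warn = timedelta(minutes=30)

early = timekeeper_status(start, budget, warn, datetime(2012, 1, 1, 22, 45))
late = timekeeper_status(start, budget, warn, datetime(2012, 1, 1, 23, 40))
over = timekeeper_status(start, budget, warn, datetime(2012, 1, 2, 0, 0))
```

Deciding the deadline before the work starts means nobody has to argue about it once things are going badly.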
Read the DailyWTF.