First, a word: Josh is a neat speaker. He comes across as Good People. Nice, clear, concise, tell-it-like-it-is style, and great audience interaction, answers at his fingertips. Another on my list of “see this person if you get the chance.” Also, he’s really generous with chat time outside of presentations and very approachable.
On with Josh’s talk…
There’s been a lot of activity in the nonrelational world, with it being hard to keep track of; this talk is about how to choose relational vs nonrelational, understanding some of the things that matter, some that don’t.
Revolutionary database design
There are NO NEW DATABASE DESIGNS. There are only new implementations. “From my perspective the last really new design was in 2000 when they introduced streaming databases”.
New implementations are interesting, but not magically better. A description of CouchDB circa 2007, equally applies to Pick circa 1965.
There isn’t a NoSQL movement. Just because DB (a) doesn’t use SQL mean it works the same as another non-SQL database. Compare, e.g. memcached vs BerkleyDB-XML which are completely different in terms of goals and implementation. the fact they don’t have SQL interfaces is the least interesting thing about them.
Josh’s favourite metaphor is that it’s like having a race with bicycles, planes, and rowboats and calling it the “No steering wheel race”.
Relational DBs are for when you need ACID
Transactions are orthogonal to relational models.
e.g. CouchDB, VSAM, Amazon Dynamo, BerkleyDB are reliable, non-SQL, transactional databases.
SQL with no transactional safety: MySQL MyISAM and MS Access.
Relational DBs tend to have more robust/mature transactional handling than other DBs, simply because of age and requirements for their apps (finance, for example).
One Ring Concept
People look for one DB to rule them all. This is the wrong paradigm. One DB will not store all your data, takes the kids to school, and slices your bread.
IT’S NOT POSSIBLE.
Look for which DB system solves your application problem. This is not necessarily the same thing as your favourite database.
With scalable solutions you may need more than one solution; e.g. PostgreSQL + CouchDB, MySQL + memcached.
Pg + Couch - Josh talked about a solution for document management he’s worked with where CouchDB is used to stuff unstructured data in the short term, pulled our for analysis, sumamrized, retention requirements decided, then stored in Pg for long term sotrage with rich metadata.
MySQL NDB is a distributed object store.
PostgresQL Hstore - up to a gig of storage for arbitary stoage that can be an indexible field (e.g. relational + XML blob).
HadoopDB - Hadoop plus DB.
Relational OLTP DBs
- Transactions: more mature support
- Contraints: enforce data rules
- Consistent: Force structure
- Complex reporting
- Vertical scaling Not so much horizontal, without big admin overheads.
SQL vs No-SQL
Some OO programmers have a hard time switching from their favourite language to SQL.
- SQL is portablish; most no-SQL DBs aren’t.
- Protocols for managing changes over time.
- Multi-application access is a solved problem.
- Many mature tools.
I would add: if you’re too dumb to pick up SQL, you shouldn’t be coding anything important. If you can’t add a language and a paradigm (SQL=asking for what you want, not telling the computer how to do it), you aren’t a real programmer anyway.
- Programmers as DBAs. I would add: personally, this scares the shit out of me. I’ve seen what happens when we let Joe Average Programmer run riot on datamodels and DB admin and it ain’t pretty.
- Lower impedence, less mental context-switching.
Main reason to use SQL DB: Immortal Data.
Unfortunately I dropped a couple of types here due to being distracted by laptop overheating. I love my little IBM X60, but it starts to cook a bit after 6 - 8 hours prolonged usage.
Distributed Key-Value DBs
- Availablility Uber Alles, hugely scalable.
- Fast, simple
- No complex relationships, no reporting.
- Probably better than an ORM; if you’re relying on your ORM to get your database layer right, you’ll probably get better results out of an OOish DB.
- Access by OID
- Multi-dimensional data access
- Semi-structured data, get it in, analyze later.
- Data Mining
- Simple slow replication
- Different DBs do better at different tasks
- Pick for the task, not your favourite flavour.