Relational vs Non-Relational DBs

2010-01-19 · 713 words · 4 minute read

Technology · Conferences

lca2010 · postgresql · Josh Berkus

First, a word: Josh is a neat speaker. He comes across as Good People. Nice, clear, concise, tell-it-like-it-is style, and great audience interaction, answers at his fingertips. Another on my list of “see this person if you get the chance.” Also, he’s really generous with chat time outside of presentations and very approachable.

On with Josh’s talk…

There’s been a lot of activity in the nonrelational world, with it being hard to keep track of; this talk is about how to choose relational vs nonrelational, understanding some of the things that matter, some that don’t.

Mythbusters

Revolutionary database design

There are NO NEW DATABASE DESIGNS. There are only new implementations. “From my perspective the last really new design was in 2000 when they introduced streaming databases”.

New implementations are interesting, but not magically better. A description of CouchDB circa 2007, equally applies to Pick circa 1965.

NoSQL Movement

There isn’t a NoSQL movement. Just because DB (a) doesn’t use SQL mean it works the same as another non-SQL database. Compare, e.g. memcached vs BerkleyDB-XML which are completely different in terms of goals and implementation. the fact they don’t have SQL interfaces is the least interesting thing about them.

Josh’s favourite metaphor is that it’s like having a race with bicycles, planes, and rowboats and calling it the “No steering wheel race”.

Relational DBs are for when you need ACID

Transactions are orthogonal to relational models.

e.g. CouchDB, VSAM, Amazon Dynamo, BerkleyDB are reliable, non-SQL, transactional databases.

SQL with no transactional safety: MySQL MyISAM and MS Access.

Relational DBs tend to have more robust/mature transactional handling than other DBs, simply because of age and requirements for their apps (finance, for example).

One Ring Concept

People look for one DB to rule them all. This is the wrong paradigm. One DB will not store all your data, takes the kids to school, and slices your bread.

IT’S NOT POSSIBLE.

Look for which DB system solves your application problem. This is not necessarily the same thing as your favourite database.

With scalable solutions you may need more than one solution; e.g. PostgreSQL + CouchDB, MySQL + memcached.

Pg + Couch - Josh talked about a solution for document management he’s worked with where CouchDB is used to stuff unstructured data in the short term, pulled our for analysis, sumamrized, retention requirements decided, then stored in Pg for long term sotrage with rich metadata.

hybrids

MySQL NDB is a distributed object store.

PostgresQL Hstore - up to a gig of storage for arbitary stoage that can be an indexible field (e.g. relational + XML blob).

HadoopDB - Hadoop plus DB.

Relational OLTP DBs

Transactions: more mature support
Contraints: enforce data rules
Consistent: Force structure
Complex reporting
Vertical scaling Not so much horizontal, without big admin overheads.

SQL vs No-SQL

Some OO programmers have a hard time switching from their favourite language to SQL.

SQL is portablish; most no-SQL DBs aren’t.
Protocols for managing changes over time.
Multi-application access is a solved problem.
Many mature tools.

I would add: if you’re too dumb to pick up SQL, you shouldn’t be coding anything important. If you can’t add a language and a paradigm (SQL=asking for what you want, not telling the computer how to do it), you aren’t a real programmer anyway.

NoSQL

Programmers as DBAs. I would add: personally, this scares the shit out of me. I’ve seen what happens when we let Joe Average Programmer run riot on datamodels and DB admin and it ain’t pretty.
Lower impedence, less mental context-switching.

Main reason to use SQL DB: Immortal Data.

Database Types

Unfortunately I dropped a couple of types here due to being distracted by laptop overheating. I love my little IBM X60, but it starts to cook a bit after 6 - 8 hours prolonged usage.

Distributed Key-Value DBs

Availablility Uber Alles, hugely scalable.
Fast, simple
No complex relationships, no reporting.

Multi-Value/Object DBs

Probably better than an ORM; if you’re relying on your ORM to get your database layer right, you’ll probably get better results out of an OOish DB.
Access by OID
Multi-dimensional data access

Document DBs

Semi-structured data, get it in, analyze later.
Schema-by-discovery
Data Mining
Simple slow replication

Conclusion:

Different DBs do better at different tasks
Pick for the task, not your favourite flavour.