[Novalug] Subject: Re: NoSQL databases (and Cassandra)

Peter Larsen plarsen@famlarsen.homelinux.com
Mon Aug 30 09:41:07 EDT 2010


On Sun, 2010-08-29 at 20:32 -0700, Jim Ide wrote:
> What makes me nervous about NoSQL databases is that they lack
> a schema to declaratively force/enforce data validation.
> An RDBMS like Oracle or MySQL does this for you - in a NoSQL
> database, data is stored as a string (usually in JSON format),
> and the programmer has to write code to enforce data validity.
> Is there a JSON schema language and validator like XML has?
> 
> Am I missing something?

That's a correct observation. But to be honest, relational databases
suffered from the same problem in their early days, in the 1980s and early
1990s. I remember that Oracle version 6 "introduced integrity constraints",
but once you bought the database you realized that it was in syntax
only: the constraints weren't enforced. Worse, when version 7.0 was released
they actually were enforced, but the syntax had changed, so you hadn't gained
anything by using them in version 6. Oracle was pretty much the LAST of the
relational databases back then to get constraints, so they wanted to
look like they were on par with the others.
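To make the contrast Jim describes concrete, here is a minimal sketch in Python using the standard-library sqlite3 and json modules. SQLite stands in for Oracle or MySQL. The `employee` table and the `insert_row` and `validate_employee` helpers are hypothetical names chosen for illustration. The NOT NULL and CHECK rules are declared once and enforced by the engine. The JSON document, stored as a plain string, is only as valid as the hand-written check that every writer remembers to call.

```python
import json
import sqlite3

# Relational side: the rules live in the schema and the engine enforces them.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           id     INTEGER PRIMARY KEY,
           name   TEXT NOT NULL,
           salary INTEGER NOT NULL CHECK (salary > 0)
       )"""
)

def insert_row(emp_id, name, salary):
    """Try to insert a row; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
                (emp_id, name, salary),
            )
        return True
    except sqlite3.IntegrityError:  # NOT NULL, CHECK or PRIMARY KEY violation
        return False

# Document side: the value is just a JSON string, so every writer has to
# run the same hand-written checks before storing it (hypothetical helper).
def validate_employee(doc):
    data = json.loads(doc)
    return (
        isinstance(data.get("name"), str)
        and isinstance(data.get("salary"), int)
        and data["salary"] > 0
    )

print(insert_row(1, "Alice", 50000))                        # True
print(insert_row(2, "Bob", -10))                            # False: CHECK rejects it
print(validate_employee('{"name": "Bob", "salary": -10}'))  # False
```

The database check cannot be skipped by a careless writer. The application-level check only runs where someone calls it.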

Before then there were a lot of issues in the hardware abstraction layers
of the database. Again from Oracle's perspective, they went from
"partitions" to tablespaces. I think DB2 stayed at a very physical level
for much longer; "partitions" in Oracle were definitely a 1:1 mapping to the
physical layer. As anyone who has used an Oracle 9/10/11 database will
know, the tablespace layer has been massaged so much by now that even the
DBA has a hard time determining where data is actually stored on the
physical layer. Add a local SAN or NAS and it gets really hard to tell.
That was one of C.J. Date's original points about relational databases:
take away the physical layer so that something working on hard drives
would work on completely different storage technologies without
recoding. As we migrate to SSDs I feel that we are proving his point.
It makes sense not to code to the physical hardware for enterprise
programming.

And that's where I feel NoSQL really fails. It's basically saying to the
programmer: here's your world in a database; you can continue to treat
the data as if it were a large internal data structure. Well, that's a
big disadvantage if you ask me; most programmers I know have little
knowledge of how to optimize a search, for instance. I can't count how many
times I've seen database code where the developer loops through a
cursor looking for data instead of using a WHERE clause, because that's
how the developer would search internally if (s)he were reading from
a file. NoSQL seems to suffer from the exact same problem. There's a
huge difference between storing things internally in main memory and
on disk/permanent storage. And as my initial email pointed out, NoSQL
suffers from not being able to search on anything but the key. I know of
very, very few applications where searching by key alone would be enough.
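The cursor-loop anti-pattern and the key-only limitation can be sketched side by side. This minimal sketch uses Python's built-in sqlite3 module. The `orders` table, its index, and the three `find_*` helpers are hypothetical. The key-only store is modelled as a plain dict rather than any particular NoSQL product. `find_by_loop` pulls every row into the application and filters there, the way one would scan a file. `find_by_where` hands the predicate to the engine, which can use the index on `customer`. The dict, keyed only by order id, has no way to express the predicate at all, so it always scans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)")
conn.execute("CREATE INDEX orders_customer ON orders (customer)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 10), ("globex", 25), ("acme", 40), ("initech", 5)],
)

def find_by_loop(customer):
    """Anti-pattern: fetch every row and filter in application code."""
    result = []
    for order_id, cust, total in conn.execute(
        "SELECT id, customer, total FROM orders ORDER BY id"
    ):
        if cust == customer:
            result.append((order_id, total))
    return result

def find_by_where(customer):
    """Let the engine filter; it is free to use the index on customer."""
    return conn.execute(
        "SELECT id, total FROM orders WHERE customer = ? ORDER BY id",
        (customer,),
    ).fetchall()

# A key-only store (a plain dict keyed by order id): a search on any
# non-key field can only be a full scan over all values.
kv_store = {
    order_id: {"customer": cust, "total": total}
    for order_id, cust, total in conn.execute("SELECT id, customer, total FROM orders")
}

def find_in_kv(customer):
    return [(k, v["total"]) for k, v in sorted(kv_store.items()) if v["customer"] == customer]

print(find_by_loop("acme"))   # [(1, 10), (3, 40)]
print(find_by_where("acme"))  # [(1, 10), (3, 40)]
print(find_in_kv("acme"))     # [(1, 10), (3, 40)]
```

All three return the same answer on four rows. The difference is who does the work as the table grows: the application in the first and third cases, the database in the second.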

It's interesting to see that a lot of the "big boys" among relational
databases now have memory-based options. You can pin tables to memory or
even run the database entirely from memory (TimesTen is one of those), and
the DB engine then runs a background job to write changes to disk
asynchronously. Of course, that means you need battery-backed memory.
But that's a different discussion altogether. In other words, the developer
can keep his/her code and still take advantage of memory-based
structures vs. slow disk structures.
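A small illustration of "keep the code, change the storage" uses SQLite's in-memory mode from Python's sqlite3 module. It is a stand-in only, not TimesTen or any other product mentioned here. The `sales` table and the `record_sale`/`total_qty` helpers are hypothetical. One simplification should be stated plainly: the snapshot to disk here is an explicit, synchronous `Connection.backup()` call, whereas the engines described above write changes to disk asynchronously in the background. The point shown is that the application functions run unchanged against either copy.

```python
import os
import sqlite3
import tempfile

def record_sale(conn, item, qty):
    """Application code: plain SQL, unaware of where the data lives."""
    with conn:
        conn.execute("INSERT INTO sales (item, qty) VALUES (?, ?)", (item, qty))

def total_qty(conn):
    return conn.execute("SELECT COALESCE(SUM(qty), 0) FROM sales").fetchone()[0]

# Working copy held entirely in memory.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
record_sale(mem, "widget", 3)
record_sale(mem, "gadget", 4)

# Persist a snapshot to a file on disk (synchronous here; a memory-based
# engine would do the equivalent in a background job).
path = os.path.join(tempfile.mkdtemp(), "sales.db")
disk = sqlite3.connect(path)
mem.backup(disk)

# The same application functions work against the on-disk copy.
print(total_qty(mem))   # 7
print(total_qty(disk))  # 7
record_sale(disk, "gizmo", 1)
print(total_qty(disk))  # 8
disk.close()
```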


-- 
Best Regards
  Peter Larsen

Wise words of the day:
 Turns out that grep returns error code 1 when there are no matches.
       I KNEW that.  Why did it take me half an hour?
	-- Seen on #Debian


> 
> 
> --- Peter Larsen wrote:
> 
> I've followed the "noSQL" arguments over the last year with interest.
> Since I took part in the infancy of SQL, I still remember the arguments
> for/against SQL vs. the more traditional databases like Network and
> Hierarchical and even traditional files. 
> 
> And I cannot help but see "NoSQL" as a simple "Polyfile" implementation:
> a simple flat file with an index on top, and the programming effort
> seems to be very file oriented. So far, the NoSQL alternatives I've read
> about are all severely restricted: single set, no joins and in most
> cases a single key. In other words, the equivalent of a single table
> in SQL; there's not really any design needed to create that
> implementation with SQL. Even with SQLite.
> 
> To me, data access to diverse implementations of data storage, through a
> common and standardized language is a big plus. I'm not really "married"
> to relational databases but a common standard to access the data-layer I
> certainly am. Going backwards to API-based access isn't something
> that looks promising to me.
> 
> Your observations about hashes are correct. That's really all the NoSQL
> implementations I've seen have been. But it's still done through a single
> key. Not a biggie with SQL either: select * into :var from table where
> key = :id and I have a single structure in the variable var with all the
> columns in that one table based on the key (not pretty SQL but it'll work).
> 
> What it looks like to me is that programmers are going to repeat their
> data-layer mistakes in the DB, introducing a lot of redundancy. While
> that makes queries easier, it certainly makes updates a mess.
> 
> As some of you know, I've got quite a history with Oracle's DB. In Oracle
> we've been able to break even first normal form since 10g, meaning we can
> retrieve multi-dimensional datasets in "one row" based on one key, even
> sets that are dynamic in nature. Of course the advantage is quick access
> to a large set of data, and from what I've read about the NoSQL efforts,
> that is exactly the effect they're going for. I just wonder what happens
> when they need only a subset of their data in other parts of the
> application. They'll end up running into the same problems
> network/hierarchical databases have/had: you spend a lot of time/effort
> fighting the model instead of getting help from it.
> 
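The redundancy concern in the quoted message (queries get easier, updates get messy) can be illustrated with a minimal Python sketch. The data, the `move_customer_*` helpers and `order_cities_sql` are hypothetical. A dict of order "documents" stands in for a denormalized store, and the standard-library sqlite3 module provides the normalized version. In the document version every order carries its own copy of the customer's city, so moving a customer means finding and rewriting every copy; miss one and the data disagrees with itself. In the normalized version the city is stored once, updated with a single statement, and joined back in on read.

```python
import sqlite3

# Denormalized, document-style storage: each order embeds the customer's city,
# so reads need no join...
orders_docs = {
    101: {"customer": "acme", "city": "Reston", "total": 10},
    102: {"customer": "acme", "city": "Reston", "total": 40},
    103: {"customer": "globex", "city": "Fairfax", "total": 25},
}

def move_customer_docs(customer, new_city):
    """...but an update has to find and rewrite every embedded copy."""
    touched = 0
    for doc in orders_docs.values():
        if doc["customer"] == customer:
            doc["city"] = new_city
            touched += 1
    return touched

# Normalized: the city is stored once and joined in when needed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer TEXT NOT NULL REFERENCES customer(name),
                         total INTEGER NOT NULL);
    INSERT INTO customer VALUES ('acme', 'Reston'), ('globex', 'Fairfax');
    INSERT INTO orders VALUES (101, 'acme', 10), (102, 'acme', 40), (103, 'globex', 25);
""")

def move_customer_sql(customer, new_city):
    """One UPDATE on one row, however many orders the customer has."""
    with conn:
        return conn.execute(
            "UPDATE customer SET city = ? WHERE name = ?", (new_city, customer)
        ).rowcount

def order_cities_sql():
    return conn.execute(
        "SELECT o.id, c.city FROM orders o JOIN customer c ON c.name = o.customer "
        "ORDER BY o.id"
    ).fetchall()

print(move_customer_docs("acme", "Vienna"))  # 2
print(move_customer_sql("acme", "Vienna"))   # 1
print(order_cities_sql())  # [(101, 'Vienna'), (102, 'Vienna'), (103, 'Fairfax')]
```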

