[Novalug] Subject: Re: NoSQL databases (and Cassandra)

James Ewing Cottrell 3rd JECottrell3@Comcast.NET
Mon Aug 30 13:37:04 EDT 2010


  It's a fight I never meant to pick. And you made some clever tricks, 
storing both SSNs as integers. Another is storing money in cents or even 
mils or microdollars in order to avoid using floats.

But you picked off the easy cases, null values, proper type, key 
uniqueness. No quarrel there.

Perhaps the really interesting point that you mentioned is where any 
needed application validation should be done. The obvious answer is on 
input. If that is done correctly, then you can "just trust" that the 
database has correct values.

And that should be true no matter what database you are using, even one 
with no schemas. On the other hand, the temptation to
simply edit the ASCII files is greater.

Finally, we might also note that No SQL or ASCII text does not imply no 
schema.

JIM

On 8/30/2010 11:51 AM, Peter Larsen wrote:
> On Mon, 2010-08-30 at 10:53 -0400, James Ewing Cottrell 3rd wrote:
>> On 8/29/2010 11:32 PM, Jim Ide wrote:
>>> What makes me nervous about NoSQL databases is that they lack
>>> a schema to declaratively force/enforce data validation.
>>> An RDBMS like Oracle or MySQL does this for you - in a NoSQL
>>> database, data is stored as a string (usually in JSON format),
>>> and the programmer has to write code to enforce data validity.
>>> Is there a JSON schema language and validator like XML has?
>>>
>>> Am I missing something?
>> Well, using Oracle (or even MySQL)  just to get data validation would
>> certainly be Overkill.
>>
>> And you may need to write validators anyway. SSNs, Usernames, and
>> Telephone Numbers may all be stored as Strings,
>> but they have difference regex constraints.
> ARGH!
> Them are fighting words!!
>
> If all you want is validation there are probably smaller databases than
> Oracle/DB2, agreed. I wouldn't put MySQL in that class though. But when
> it comes to WHERE to do validation - database or the applications, I
> must beg to differ here.
>
> First off, you must remember a database is usually a central unit used
> by not only multiple functions within the same application but a lot of
> times used by multiple different applications too. So you end up with
> the eternal question: where to place the business logic - in the
> applications or in the database?  Well, my answer will ALWAYS be: the
> database. Let it do what it's good at. Let the application do what it's
> good at. And Lets NOT do our work over and over again, and risk
> inconsistent and corrupt data by implementing the same rules differently
> in different places.  I've had enough of my share of migrating away from
> legacy systems that didn't have central enforcement to know what happens
> to your database when you don't centrally manage data validity.
>
> We have different types of constraints (validation rules if you so
> wish): check, unique, primary key and foreign key. A check constraint is
> usually used to validate a single column or the relation between two
> columns. Ie. that a from-date is earlier than a to-date; that a given
> value is present or that a given format is followed. We also have
> data-types to help enforce part of this question - with Oracle you can
> even create your own data types (objects) and use those to represent
> complex data-types. A unique key is only allowing unique column(s)
> values while ignoring null values, and a primary key does the same but
> does not allow null values (when creating a primary key a not-null
> constraint is added). And finally a foreign key enforces that a value in
> a column(s) is present as a primary key instance in another table.
>
> 4 very basic rules that should NEVER EVER be broken. So why would you
> keep redefining them in the application? In particular when it comes to
> referential integrity is the database by far better at the job than any
> application would be.
>
> Domain validation such as SSN and telephone numbers comes back to a
> misunderstanding (In my opinion) that data representation inside the DB
> should look like the output. For instance a phone number is stored with
> ( ) and -, and SSN with - too. When in reality they are just numbers.
> Get rid of the dashes and 1) you save space, 2) validation is easier.
> Now there are cases where it's not that simple, and regex is definitely
> a good way to create easy to enforce validity patterns. And they can
> just as easily be used as part of a check constraint as they can inside
> an application. Ask yourself if a SSN in a DB is only validated/used in
> one place? Hardly. Definitely phone numbers follow that pattern too. So
> we know there are SOME domains that are shared. Put them in the DB. And
> why split our validation patterns around and put some in one place, and
> others in another? No good - so let's define all primary business rules
> at the DB level - it's good at number/data crunching - and let's keep
> the application good at what it is good at, presentation and dataflow
> control.
>
> I've seen developers get confused when the database is in charge of
> validation. And of course with modern applications we want to alert the
> user about validity problems BEFORE we enter the data into the DB
> (usually comes down to developers not knowing how to catch/interpret
> database events). In "modern" (last 15 years) DB development
> technologies we have better help. Because the DB uses a data-dictionary
> to define the validation rules, it's available as metadata for the
> application. This means, that the data model on the application layer
> can replicate the validation rules without any coding at all. So as you
> create your applications they inherit the rules and as a developer you
> "simply" have to define the error messages that makes sense to a user so
> they don't get the DB error code referring to the constraint being
> violated. That's a great improvement over having to recode and share
> validation rules across multiple applications and modules.
>
> The worst that can happen is that bad data is entered into the DB. So
> even if you don't follow a MVC design pattern you would want to get the
> data validated before entered into the DB by the DB. It's a safety
> catch. You can test/validate the DB independently of the
> applications/functions so you know that your rules work and cannot be
> violated. Once your DB becomes inconsistent you loose data and your
> application stop working. With accounting systems I've had to tell the
> board about loosing hundreds of thousands of dollars due to data
> failures in their application. And it's usually traced down to a single
> application that didn't implement the rule(s) in the same way as the
> rest.
>
> So to be clear: I'm not saying that data-validation only belongs in the
> DB. It also needs to be (in part) replicated/used in the applications.
> But it originates from the DB. Change the DB validation, and you should
> be changing the applications too (happens automatically today).
>
> And all of this doesn't even discuss stored procedures and their use.
> Automatic triggers or independent APIs that are all located on the DB
> instead of the application. I'm a firm believer in any business rule
> that deals with getting/moving/checking data belongs in the DB. Business
> rules that deals with presentation, formatting etc. belong at the
> application layer. The application programmer should only need to do a
> simple DB call to move money from one account to another, or to do an
> advanced query into the DB.
>
>
>
> _______________________________________________
> Novalug mailing list
> Novalug@calypso.tux.org
> http://calypso.tux.org/mailman/listinfo/novalug

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.firemountain.net/pipermail/novalug/attachments/20100830/6a246d1f/attachment.htm>


More information about the Novalug mailing list