A Few Database/DBMS Definitions

Just a quick post today to inform my readers of several technical definitions that I have written for TechTarget’s WhatIs? site. If you are not familiar with that site, click on the link and investigate it. It is a great way to hunt around and educate yourself on terms that may not be familiar to you.

Anyway, lately I’ve been working with the site to help develop some of their definitions in the data and database management realm. Here are a few of the ones I’ve participated in developing:

Of course, there are many other data- and database-related definitions up on WhatIs?, as well as tons of other IT definitions. Be sure to check it out if you haven’t already done so!

Posted in DBA | 1 Comment

The Surprising Things You Don’t Know About Big Data


I found this infographic to be informative and entertaining, so I’m sharing it here for the readers of my blog. Let me know what you think… should I post more things like this in the future?

[Infographic: The Surprising Things You Don’t Know About Big Data]

You can also find more infographics at Visualistan.

Posted in Big Data, Data Growth | 1 Comment

Programmers Make Excuses, Too!

In our last post (DBA Excuses… and advice to programmers for overcoming them!) we examined some of the bigger excuses used by DBAs to avoid problems and work. But poor excuses are not the exclusive domain of the DBA; far from it! Application developers and programmers rely on their fair share of excuses, too. And if you’re a DBA you’ve probably heard most of them. Let’s break down the top few programmer excuses and see what can be done to avoid them in the future.

The number one programmer excuse is to blame the DBMS. If you’re a programmer, chances are high that you’ve either said something like the following (or at least thought it): “There’s something wrong with DB2 (or Oracle, or insert your favorite DBMS here)!” The basic mentality is that the DBMS is guilty until proven innocent; the programmer will never run up to the DBA and say “there’s a problem with this horrible code I wrote, can you help me fix it?”

Blaming the DBMS is never a helpful strategy. Yes, in some rare instances there will be a problem or bug in the DBMS itself, but the vast majority of “database problems” can be traced back to programming problems. By keeping this in mind at all times everyone will be better off – the problem will get fixed sooner and you will not alienate your DBAs by constantly blaming the DBMS.

Another common excuse is known as the Copied Code Syndrome. As most programmers know, copying working code from one program to another is an efficient way of quickly developing programs. But with database development you have to be careful to make sure that what you copy is really what you need. Here’s how this excuse works: when the programmer is confronted with a problem in his program he simply says “That can’t be a problem because I copied it from another program and that program works.” Well, that may be true, but many things can go wrong with copied SQL. Maybe you copied something that is 95% of what you need, but you didn’t modify the code for your purposes. Or maybe something is different about the rest of the code in your program that makes the copied code ineffective. Or maybe you aren’t totally sure what each statement and parameter you copied actually does.
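To see how subtle this can be, here is a tiny sketch of copied SQL that is 95% right; the table, columns, and status codes are hypothetical, invented purely for illustration:

  -- SQL copied from a program that processes only active customers:
  SELECT CUSTNO, CUSTNAME
  FROM   CUSTOMER
  WHERE  STATUS = 'A';

  -- But suppose this program must also process suspended customers.
  -- The copied predicate silently filters out rows this program needs;
  -- here it should have been modified to something like:
  --   WHERE STATUS IN ('A', 'S')

The copied statement runs cleanly and returns plausible results, which is exactly why this syndrome is so hard to spot.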

A corollary to the Copied Code Syndrome is the It worked yesterday excuse. But today is another day, and if it ain’t working today, it ain’t working. Many things can change from day to day that can cause working code to become problematic – not the least of which is the code itself. Programmers make so many changes to code as part of their job that it is easy to forget that something did indeed change. The bottom line is to work on solutions to problems instead of excuses to deflect blame. Blame is counter-productive to resolving problems.

Yet another excuse bandied about by developers is the Better Mousetrap excuse. The best approach to developing programs using a relational database is to put as much work as possible into the SQL and let the DBMS optimize the access. But there is always that Wile E. Coyote developer who says “But I can do that better in C or Java (or insert your favorite programming language here).” Doing it in SQL puts the work on the DBMS – and there is a much better chance for the DBMS to be bug-free than whatever code you cobble together.
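As a quick illustration of putting the work into the SQL (the tables and columns here are hypothetical), compare one set-based statement with its hand-rolled alternative:

  -- Let the DBMS join, filter, and aggregate in a single statement:
  SELECT D.DEPTNAME, AVG(E.SALARY) AS AVG_SALARY
  FROM   DEPT D
         INNER JOIN EMP E ON E.DEPTNO = D.DEPTNO
  WHERE  E.HIREDATE >= '2010-01-01'
  GROUP BY D.DEPTNAME;

  -- The "better mousetrap" version fetches every EMP and DEPT row,
  -- then matches, filters, and averages them in application code.
  -- The optimizer can exploit indexes and statistics; a hand-coded
  -- loop in C or Java cannot.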

The final programmer excuse I’ll mention today is the Time Is Running Out excuse. This can best be summarized as “There is always time to do it over later, but never enough time to do it right the first time.” Usually this excuse comes to light when you hear that magic phrase “It’s too late in the project to re-write that.” But the problem caused by the code continues to exist – the programmer just wants some magic fix that does not require coding changes. Won’t happen! There is no magic button out there! Sometimes the code has to change to solve the problem.

In the end, the biggest thing you can do as an application programmer is to research and understand any issue before you go running to the DBA. If your program fails, find the SQLCODE (or SQLSTATE) and any associated reason code and try to fix it yourself first. If you don’t understand something, read the manual before going to the DBA. There is no reason why everyone shouldn’t have their own set of manuals; most of them can be downloaded for free from the DBMS vendor’s web site.
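For example, here is the kind of research I mean, sketched against a hypothetical table (the codes shown are real DB2 codes):

  -- Suppose this INSERT fails in your DB2 program:
  INSERT INTO CUSTOMER (CUSTNO, CUSTNAME)
  VALUES (1001, 'ACME CORP');

  -- DB2 returns SQLCODE -803 (SQLSTATE 23505): a duplicate value in a
  -- unique index. Looking that up in the codes manual tells you the
  -- problem is your data or program logic (inserting a key that
  -- already exists), not "something wrong with DB2." No DBA required.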

To conclude, if I had a nickel for every time someone tried to use one of these excuses on me, I’d be a wealthy man. But life does not work that way. So maybe we can all climb back into the trenches and vow to avoid using all of these excuses, both DBAs and programmers… it’ll make working with your databases a lot easier!

Posted in DBA, SQL | 1 Comment

DBA Excuses… and advice to programmers for overcoming them!

Let’s face it: sometimes it is easier to give an excuse than it is to take the time to answer a technical question. Especially if it’s the third time you’ve heard the same question in the last hour. And dealing with databases is complex and time-consuming, so questions are always coming up. And programmers are always coming to DBAs for help with technical problems. If you’re reading this column right now you’ve probably been on the receiving or giving end of one of these excuses at some point. Let’s see how many of them you recognize.

The number one all-time DBA excuse can be broken down into two words: “it depends.” DBAs are notorious for answering every question with those same two words. And they are right, it does “depend,” but how does that answer help the situation? Oh, it might cause the programmer who deigned to ask the question to slink away, and maybe that is what the DBA was shooting for anyway. But if you are that programmer, do not just go away. Whenever the DBA says “it depends,” do not take that as a final answer… instead, ask the follow-up question: “on what?”

Any DBA worth their salary will be able to run down some of the issues involved in the matter and point you to the proper manual or code to help you figure out your dilemma. Which brings me to the second biggest DBA excuse: RTFM. As everyone knows, this stands for Read The Friendly Manual.

I know, fellow DBAs, it gets old very fast when people keep asking you questions that are covered in the software manuals. But DBAs know the DBMS manuals much better than the application programmers, end users, and systems analysts who are asking the questions. It really shouldn’t be too hard to answer a simple question instead of barking out “RTFM” all the time. At least tell them in what chapter of TFM they can R it! And when you are a developer asking a question of the DBA, have you ever thought about phrasing it like this: “Where can I get more information on <Topic>?” instead of asking to be spoon-fed answers… it might make a world of difference in the DBA’s attitude!

Another classic DBA excuse is to blame the DBMS vendor. It is always easier to claim that some piece of sage advice being offered came from the vendor because it lends legitimacy to the advice while at the same time relieving the DBA of any burden of proof. You know the situation I’m talking about. You’re a DB2 programmer (say) and you encounter a problem causing you to go to the DBA for guidance. And then the DBA says something like “Well, IBM says to …” Don’t accept this type of excuse either. IBM is a company and it doesn’t “say” anything. Same goes for Oracle and Microsoft and SAP and <insert your favorite DBMS provider here>.

Instead, find out who at IBM (or Oracle or…) said it and in what context. Or, perhaps no one “said” it but the DBA read it in an article or a manual or a book. That is great, but many times written advice is mangled and twisted when it is repeated orally. When this type of “excuse” is attempted you should obtain enough details from the DBA to seek out the source materials (or person) to make sure it truly applies in your situation.

Another popular excuse is the phrase “It is working as designed.” Well, that might be the case, but if the way it is working isn’t what you need it to do, then you still have a problem. If the DBA tells you that it is working as designed, he is basically telling you that you do not understand the way the DBMS works. And that may be true, but then he should really try to help you find an alternate solution — that does what you want — while also working in the database “as designed.”

The final DBA excuse I’ll talk about here is over-reliance on company standards. Some DBA groups grunt out mounds of steaming standards and guidelines that they then attempt to enforce for every project. Standards can be useful, but they shouldn’t become a crutch. When a specific standard ceases to make development and administration easier, or performance better, then it is time to make an exception to that standard… or even time to remove or re-write that standard.

If you come up with an elegant solution that works better or performs better than the “standard,” do not just accept it when the DBA says, “that doesn’t fit our standards.” Make sure that you both understand each other’s position. Perhaps there is a valid reason why the standard should apply – but make sure the DBA explains it to you if your solution really is easier. And as a programmer, make sure that you really have to deviate from a standard before making a big deal about it. That way, when you have a valid case, the DBA will be more apt to listen and compromise.

If I had a nickel for every time a DBA used the excuses in this article I’d be retired and living on an island somewhere. But you’d still be hearing the excuses!

Maybe if we all pay a little more attention to working together for the benefit of our organization, instead of hiding behind excuses, the workplace might be a more enjoyable and productive place to be!

Posted in DBA | 2 Comments

On DBA Tools, Utilities, Suites and Solutions

“Vendor speak” can be a difficult thing to decipher. And that is particularly true within the realm of DBA tools vendors. It would be easier if every vendor followed the same terminology… but, of course, they do not. And there is no way to force them to do so. But we can adopt a rigorous lexicon to describe the software offerings and use it ourselves… and analyze all software that we review using this simple descriptive lexicon.

Here is what I propose.

First, let’s be clear on what is a utility and what is a tool.

  • A database utility is generally a single-purpose program for moving and/or verifying database pages; examples include load, unload, import, export, reorg, check, dbcc, copy, and recover. There may be others I am missing, but these are functions that are frequently bundled with a DBMS but may also be sold as independent products.
  • A database tool is a multi-function program designed to simplify database monitoring, management, and/or administrative tasks. So performance monitors, change managers, data modeling tools, recovery/performance/SQL analyzers, etc. are all examples of DBA tools. Again, the list is not intended to be an exhaustive one (for a more in-depth list check out this blog post).

OK, it would be simple enough if these were the only two types of products we had to concern ourselves with, but they are not. Vendors also talk about solutions and suites. Sometimes these two terms are used interchangeably, but I contend that they should not be. What I propose is this:

  • A solution is a synergistic group of tools and utilities designed to work together to address a customer’s business issue.
  • A suite is a group of tools that are sold together, but that are not necessarily integrated to work with each other in any way.

So, a solution is a suite, but a suite is not necessarily a solution. Solutions are designed to simplify things for the customer in terms of usage and efficiency. Suites are designed to help the vendor and its salespeople sell more.

Now don’t get me wrong… I am not saying that you should not buy suites of DBA tools and utilities. If the price point is good and you really want or need all (or most) of the bundled software, then it can make sense. But know what you are buying! Understand all of the components of the suite and the terms and conditions of the agreement.

Solutions, on the other hand, should work together seamlessly and you may not even know that there are multiple underlying products. If that is not the case, it isn’t really a “solution” is it?

Of course, these are just my definitions. But I think these are useful definitions that make it easier to review, analyze and discuss DBA products and programs.

What do you think?

Posted in backup & recovery, change management, data modeling, performance, tools | 1 Comment

A Brief Introduction to Data Normalization

Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data in a relational database. These steps reduce data redundancy and the chances of data becoming inconsistent. A table is said to be normalized if it satisfies certain constraints. Codd’s original work defined three such forms but there are now five generally accepted steps of normalization. The output of the first step is called First Normal Form (1NF), the output of the second step is Second Normal Form (2NF), etc.

  • A row is in first normal form (1NF) if and only if all underlying domains contain atomic values only. 1NF eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
  • A row is in second normal form (2NF) if and only if it is in first normal form and every non-key attribute is fully dependent on the key. 2NF eliminates functional dependencies on a partial key by putting those fields in a separate table from those that are dependent on the whole key.
  • A row is in third normal form (3NF) if and only if it is in second normal form and every non-key attribute is non-transitively dependent on the primary key. 3NF eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key, and nothing but the key.
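To make these first three steps concrete, here is a minimal sketch in SQL DDL; the tables, columns, and dependencies are hypothetical, invented purely for illustration:

  -- Unnormalized: repeating item groups and redundant customer data
  --   ORDER(ORDERNO, CUSTNO, CUSTNAME, ITEM1, ITEM2, ITEM3, ...)
  --
  -- 1NF: one row per order/item combination, so no repeating groups.
  -- 2NF: ITEMDESC depends on ITEMNO alone (a partial key), so it
  --      moves to its own table.
  -- 3NF: CUSTNAME depends on CUSTNO, not on ORDERNO (a transitive
  --      dependency), so it moves to its own table.

  CREATE TABLE CUSTOMER
    (CUSTNO   INTEGER     NOT NULL PRIMARY KEY,
     CUSTNAME VARCHAR(40) NOT NULL);

  CREATE TABLE ORDERS
    (ORDERNO  INTEGER     NOT NULL PRIMARY KEY,
     CUSTNO   INTEGER     NOT NULL REFERENCES CUSTOMER);

  CREATE TABLE ITEM
    (ITEMNO   INTEGER     NOT NULL PRIMARY KEY,
     ITEMDESC VARCHAR(40) NOT NULL);

  CREATE TABLE ORDER_ITEM
    (ORDERNO  INTEGER     NOT NULL REFERENCES ORDERS,
     ITEMNO   INTEGER     NOT NULL REFERENCES ITEM,
     QTY      INTEGER     NOT NULL,
     PRIMARY KEY (ORDERNO, ITEMNO));

Every non-key column now depends on the key, the whole key, and nothing but the key.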

But normalization does not stop with 3NF. Additional normal forms have been identified and documented. However, normalization past 3NF does not occur often in normal practice because most tables in 3NF are usually also in 5NF. The additional normal forms are:

  • Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. Indeed, in his later writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if and only if every determinant is a candidate key. Most entities in 3NF are already in BCNF.
  • An entity is in Fourth Normal Form (4NF) if and only if it is in 3NF and has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other (see the sketch after this list).
  • Fifth Normal Form (5NF) specifies that every join dependency for the entity must be a consequence of its candidate keys.
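To illustrate the 4NF rule, here is a small sketch with hypothetical tables: an employee’s skills and spoken languages are independent multi-valued facts, so storing them together forces spurious row combinations:

  -- Violates 4NF: SKILL and LANG are independent multi-valued
  -- dependencies on EMPNO, so every skill must be paired with
  -- every language:
  --   EMP_SKILL_LANG(EMPNO, SKILL, LANG)

  -- Decompose into one table per multi-valued dependency:
  CREATE TABLE EMP_SKILL
    (EMPNO INTEGER     NOT NULL,
     SKILL VARCHAR(20) NOT NULL,
     PRIMARY KEY (EMPNO, SKILL));

  CREATE TABLE EMP_LANG
    (EMPNO INTEGER     NOT NULL,
     LANG  VARCHAR(20) NOT NULL,
     PRIMARY KEY (EMPNO, LANG));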

For more information on normalization consult the following links:

A comparison of BCNF and 3NF is given here:
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node12.html

A complete definition of 4NF is given here:
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node16.html

Posted in data modeling, database design, normalization | 4 Comments

Everyone in IT Should Serve A Mainframe Tour of Duty

Today’s post is an update on a post I first wrote 10 years ago on my DB2portal blog. The general idea is that everybody in IT would be well-served by learning about mainframes and their robust management environment.

Mainframe developers are well aware of the security, scalability, and reliability of mainframe computer systems and applications. Unfortunately, though, the bulk of new programmers and IT personnel are not mainframe-literate. This should change. But maybe not for the reasons you are thinking.

Yes, I am a mainframe bigot. I readily admit that. In my humble opinion there is no finer platform for mission critical software development than the good ol’ mainframe. And that is why every new programmer should have to work a tour of duty on mainframe systems and applications as soon as they graduate from college.

You may note that I use the word mainframe, instead of the z Systems or z Server terms that IBM is using these days. Nothing wrong with the z thing, but I think there is nothing wrong with the term mainframe!

Why would I recommend a mainframe tour of duty for everybody?

Well, because of the robust system management processes and procedures that are in place and working at every mainframe shop in the world. This is simply not the case for Windows, Unix, and other platforms. Of course, I don’t want to overly disparage non-mainframe systems. Indeed, much of the credit for the mainframe’s superior management lies in its long legacy. Decades of experience helped mainframers build up the systems management capabilities of the mainframe.

But by working on mainframe systems, newbies will finally begin to learn the correct IT discipline for managing mission critical software. The freedom that is allowed on non-mainframe systems helps folks to learn – but it is not conducive to the creation of hardened, manageable systems.

No longer is it okay to just insert a CD or download something from the web and install new software willy-nilly onto a production machine. Mainframe systems have well-documented and enforced change management procedures that need to be followed before any software is installed into a production environment.

No longer is it okay to just flip the switch and reboot the server. Mainframe systems have safeguards against such practices. Months, sometimes years, can go by without having to power down and re-IPL the mainframe.

And don’t even think about trying to get around security protocols. In mainframe shops there is an entire group of people in the operations department responsible for protecting and securing mainframe systems, applications, and data.

Ever wonder why there are no mainframe viruses? A properly secured operating system and environment make such a scenario extremely unlikely.

Project planning, configuration management, capacity planning, job scheduling and automation, storage management, database administration, operations management, and so on – all are managed and required in every mainframe site I’ve ever been involved with. When no mainframe is involved many of these things are afterthoughts, if they’re even thought of at all. Sure, things are getting better in the distributed world – at least better than they were 10 years ago – but it is still far from perfect!

Growing up in a PC world is a big part of the problem. Although there may be many things to snark about with regard to personal computers, one of the biggest is that they were never designed to be used the way that mainframes are used. Yet we call a sufficiently “pumped-up” PC a server – and then try to treat it like we treat mainframes. Oh, we may turn it on its side and tape a piece of paper on it bearing a phrase like “Do Not Shut Off – This is the Production Server”… but that is a far cry from the glass house that we’ve built to nourish and feed the mainframe environment.


And it is probably unfair to criticize PCs for not being mainframes because the PC was not designed to be a mainframe… but over the years people have tried to use them for enterprise production workloads… sometimes successfully. Sometimes not.

The bottom line is that today’s applications and systems do not always deliver the stability, availability, security, or performance of mainframe systems. A forced tour of duty supporting or developing applications for a mainframe would do every IT professional a whole world of good!

Posted in enterprise computing, mainframe | 3 Comments

Avoiding Deadly DBA Habits

Every now and then I’ll write a short blog post to promote an article that I think is worth reading. This is one of those posts…

Today’s article worth reading is titled The Seven Deadly Habits of a DBA …and how to cure them by Paul Vallee. This well-thought-out list of habits to avoid should be on every DBA’s reading list. It is well worth the short time it takes to read through it.

The article is not deeply technical, nor should it be. Although it is written from an Oracle perspective, the guidance is broadly applicable to managing any DBMS environment. In the article, Vallee discusses seven tendencies that hold DBAs back and offers cures for overcoming each of them… ’nuff said.

Click on over and read it while you’re still thinking about it: The Seven Deadly Habits of a DBA …and how to cure them.

Posted in DBA | 1 Comment

Inside the Data Reading Room – Spring Break 2015 Edition

Welcome to another installment of Inside the Data Reading Room, a regular feature of this blog where I take a look at some of the recently published database- and data-related books. In today’s post we’ll examine a book with a unique spin on data governance, a book on MDM and Big Data, and an A to Z guide on Business Intelligence.

The first book we’ll examine inside the Data Reading Room is Non-Invasive Data Governance by Robert S. Seiner (2014, Technics Publications, ISBN 978-1-935504-85-6). This book offers an insightful and practical approach for embracing data governance best practices by evolving existing data governance activities. The premise of the book, as Bob so aptly states early on, is that “(a)ll organizations already govern data. They may do it informally, sometimes inefficiently, often ineffectively, but they already govern data. And they all can do it better.”

The book does a wonderful job of explaining the trademarked phrase that is the title of this book, “non-invasive data governance,” as well as guiding the reader along the path of formalizing and improving existing data governance practices. The key, according to Seiner, is not to start from square one, but to build upon the data responsibilities that are already being performed within the organization.

If your organization is planning to embark on a data governance project, is looking for help for your existing data governance practice, or simply desires a useful, non-invasive method of managing and governing your data, look no further than Seiner’s informative Non-Invasive Data Governance.

Next up is Beyond Big Data: Using Social MDM to Drive Deep Customer Insight by Martin Oberhofer, et al. (2015, IBM Press/Pearson, ISBN 978-0-13-350980-9). Written by five IBM data management professionals, this book offers up new ways to integrate social, mobile, location, and traditional data.

Even with all of the new big data and analytics books being published this book is worth seeking out for its unique perspective and focus. Covering IBM’s experience with using social MDM at enterprise customer sites, the book provides guidance on improving relationships, enhancing prospect targeting, and fully engaging with customers.

In Beyond Big Data you will be introduced not only to the basic concepts of master data and MDM, but also to their role with social data. By combining social and master data, the book shows how to derive insight from this data and incorporate it into your business.

Along the way the authors explain how social MDM extends fundamental MDM concepts and techniques, methods for architecting a social MDM platform, ways of using social MDM to identify high-value relationships, and more. The book even tackles thorny issues like the ethics and privacy concerns of gathering and using social MDM.

What the book is not, is another tome attempting to describe Big Data; what it is, is a useful approach and roadmap to exploiting a new Big Data niche – social MDM.

And last, but definitely not least, we have Rick Sherman’s impressive new book Business Intelligence Guidebook: From Data Integration to Analytics (2015, Morgan Kaufmann, ISBN 978-0-12-411461-6).

This 500+ page tome highlights the importance of data to the modern business and describes how to exploit data to gain business insight. Over the course of its 19 chapters it describes the requirements, architecture, design, and practices that should be used to build business intelligence applications. The practical advice and guidelines offered within the pages of Sherman’s book will help you to build successful BI, DW, and data integration solutions.

The overarching theme, that business people need to participate in the entire process of building business intelligence applications, is woven throughout the book. Each of the book’s seven parts provides useful and actionable knowledge covering the following areas:

  • Concepts and Context
  • Business and Technical Needs
  • Architectural Framework
  • Data Design
  • Data Integration Design
  • Business Intelligence Design
  • Organization

By applying the information contained in this guidebook, you should be able to successfully justify, launch, develop, manage, and deliver a highly functional business intelligence system. And do it on time and within budget.

A companion website includes templates and examples, further discussion of key topics, instructor materials, and references to trusted industry sources.

All in all, we have three recommended new data books that are well worth your time to seek out and read. Doing so can only help to improve your data knowledge and employability.


Posted in DBA | 1 Comment

A Couple of Database Predictions

I usually try to avoid predicting the future because humans, as a general rule, are not very good at it. But when things look repetitive or cyclical, then sometimes a “bold” prediction here and there is worth attempting.

So, with that in mind, over the course of the next 5 years or so I predict widespread consolidation in the NoSQL DBMS market. As of today (February 18, 2015), NoSQL-Database.org lists 150 different NoSQL database systems. Anybody should be able to foresee that so vast a number of NoSQL offerings is not sustainable for very long. Winners will emerge, pushing the laggards out of business (or leaving them to languish).

Why are there so many NoSQL options? Well, IT folks like options. Many did not necessarily want to be tied to the Big Three RDBMS vendors (Oracle, IBM and Microsoft); others were looking for novel ways to solve problems that were not very well served by relational offerings. But nobody wants a sea of incompatible, proprietary DBMSes for very long because it is hard to support, hard to train talent for, and hard to find new employees with experience.

So consolidation will happen.

Additionally, the Big Three RDBMS vendors (and others) will work to incorporate the best features and functionality of the NoSQL database systems into their products. This is already happening (witness the column store capability of IBM DB2 for LUW with BLU Acceleration). We will almost certainly see more NoSQL capabilities being added to the relational world.

So there you have it: two “bold” predictions.

  1. The NoSQL database market will consolidate, with a few winners, some middle-of-the-pack survivors, and many losers.
  2. The relational database market will combat NoSQL by adding capabilities to existing portfolios.

What do you think? Do these predictions make sense to you? Add your thoughts and comments below…

Posted in NoSQL | 2 Comments