Intelligently Automating Database Administration Tasks

Implementing, managing and maintaining complex database applications spread throughout the world is a difficult task. To support modern applications a vast IT infrastructure is required that encompasses all of the physical things needed to support your applications. This includes your databases, desktops, networks, and servers, as well as any networks and servers outside of your environment that you rely upon for e-business. These things, operating together, create your IT infrastructure. These disparate elements are required to function together efficiently for your applications to deliver service to their users.

But these things were not originally designed to work together. So not only is the environment increasingly complex, it is inter-related. But it is not necessarily designed to be inter-related. When you change one thing, it usually impacts others. What is the impact of this situation on DBAs?

Well, for starters, DBAs are working overtime just to support the current applications and relational features. But new DBMS releases are being made available faster than ever before. Yet many organizations cannot react fast enough to run on the most recent (and therefore most secure and most functional versions).

So, the job of database administration is getting increasingly more difficult as database technology rapidly advances adding new functionality, more options, and more complex and complicated capabilities. But DBAs are overworked, under-appreciated, and lack the time to gain the essential skills required to support and administer the latest features of the RDBMS they support. What can be done?

Intelligent Automation

One of the ways to reduce these problems is through intelligent automation. As IT professionals we have helped to deliver systems that automate multiple jobs throughout our organizations. That is what computer applications do: they automate someone’s job to make that job easier. But we have yet to intelligently automate our DBA jobs. By automating some of the tedious day-to-day tasks of database administration, we can free up some time to learn about new DBMS features and to implement them appropriately.

But simple automation is not sufficient. The software should be able to intelligently monitor, analyze, and optimize applications using past, present, and future analysis of collected data. Simply stated, the software should work the way a consultant works–fulfilling the role of a trusted advisor.

This advisory software should collect data about the IT environment from the systems (OS, DBMS, OLTP, etc.), objects, and applications. It should require very little initial configuration, so that it is easy to use for novices and skilled users alike. It should detect conditions requiring maintenance actions, and then advise the user of the problem, and finally, and most beneficial to the user, optionally perform the necessary action to correct the problems it identifies. Most management tools available today leave this analysis and execution up to the user. But intelligent automation solutions should be smart enough to optimize and streamline your IT environment with minimal, perhaps no, user or DBA interaction.

Only through intelligent automation will we be able to deliver on the promise of technology. As IT tasks get more complex and IT professionals are harder to employ and retain, more and more IT duties should be automated using intelligent management software. This is especially true for very complex jobs, such as DBA. Using intelligent automation will help to reduce the amount of time, effort, and human error associated with managing databases and complex applications.

Posted in DBA, tools | Tagged | 1 Comment

Happy Holidays!

Here’s wishing all of my readers a very happy and safe holiday season… Enjoy this time with your family and friends… and we’ll see each other again next year!

Posted in DBA | 1 Comment

Opening Eyes on Database Query Performance

Today’s blog post is from a guest blogger, Curt Triplett. Curt is one of the founding partners at DBG Software and an IBM Champion. He has over 20 years of experience specializing in performance optimization across all major databases including DB2,  Oracle, SQL Server, and Teradata.  

Take it away Curt…

After working in over 40 companies now, there is either some amazing coincidence or the discipline of query performance is not truly “owned” by anyone. Yes, every company says they “have it covered” to some extent. Usually, this means the DBAs tune queries. But the reality is that the typical DBA simply does not have the time to tune queries every day, and therefore cannot possibly build the skill set required for such a discipline.

What do they say… repetition breeds proficiency? The point is you can’t possibly be as good at something you do 20 percent of the time that someone else does 100 percent of the time.

A day in the life of a DBA is immense. First and foremost (way before performance) is the availability and integrity of the database. Oh, and then there is the constant cycle of applying patches, migrating database objects between databases, creating new databases, answering dozens of questions from various sources, etc. etc. etc.

What ends up happening is performance takes a back seat. And when “performance” is looked at, it’s usually database performance that is the focus. Not query performance. Too often, a shotgun approach is taken to turn a big knob on the database to make all the queries run faster. Or maybe hardware is added. Did that fix anything? Of course not, but you can bet the queries will run faster. For now.

Because there is a mentality to want to kill many birds with one stone by tuning the database, we never fix the true problem. The queries. So, the common cycle of making changes at the database level, partitioning, buying hardware, etc. is prescribed instead of doing what’s right… tuning the queries.

I propose that we quit thinking of ways to take broad strokes at “fixing” query performance and start making broad strokes at “diagnosing” what queries to fix! Whether it’s fixing the anomalies that bubble to the top of the averages or the individual elapse times, the idea is to determine which queries bring the greatest return.

So, let’s trade in the habit of taking the shotgun approach of tuning with tactics like partitioning and disk array tactics intended to evenly distribute I/O. Why distribute I/O? Eliminate it. That sounds far more efficient to me. Eliminate the I/O by tuning at the level that’s needed, the query level.

Having said all that, it takes a unique process and discipline to help identify what queries will bring the greatest return. This discipline will need two things. The right product and the right skill set.

So, share with me any thoughts and think about the following:

  1. How is query performance addressed in your shop?
  2. Is it effective?
  3. How do you know it’s effective?
  4. Could there be a possibility of a different, maybe even better, methodology that exists to make your query performance discipline a world class service?

I have assumptions about the answers to the first three questions, but I know the answer to the last one for sure. The answer is “Yes” and it’s called DPOP. DPOP is the methodology I’ve created for addressing query performance and you’ll quickly see how it applies to any database (DB2, Oracle, SQL Server) and for any application (developed or purchased).

Investigate how DPOP can help your organization achieve improved query performance at http://www.softwareonz.com/page31.html

 

Posted in DBA | 1 Comment

Business Eye for the DBA Guy

On why it is important for DBAs to be business-savvy…

Heads-down DBAs who know technology, but not their company’s business, soon will be on the endangered species list. Although DBAs are technologists first and foremost, we need to be ever cognizant of the business reasons that our beloved technologies support. Yes, DBAs like to immerse themselves in the bits and bytes of technical solutions and learn all there is to know about the software we use. And this is okay–up to a point. As long as we take care not to blind ourselves to the business reasons for the software and hardware they love so much.

To keep the business “top of mind,” the DBA’s tools and utilities need to be tied to business strategies and initiatives. In this way, the DBA’s work becomes integrated with the goals and operations of the organization.

The first step in achieving this needed synergy is the integration of DBA services with the other core components of the IT infrastructure. Of course, the DBA should be able to monitor and control the databases under his purview, but he should also be able to monitor them within the context of the broader spectrum of the IT infrastructure–including systems, applications, storage and networks. Only then can companies begin to tie service-level agreements to business needs, rather than technology metrics.

To fulfill the promise of business/IT integration, it will be necessary to link business services to the underlying technology. For example, a technician should be able to immediately comprehend that a service outage to transaction X7R2 in the PRD2 CICS region means that regional demand deposit customers cannot access their accounts. See the difference?

Focusing on transactions, TP monitors and databases is the core of the DBA’s job. But servicing customers is the reason the DBA builds those databases and manages those transactions. Technicians with an understanding of the business impact of technology decisions will do a better job of servicing the business strategy. This is even more true for the DBA’s manager. Technology managers who speak in business terms are more valuable to their company.

Of course, the devil is in the details. A key component to realizing effective business/IT integration for DBAs is the ability to link specific pieces of technology to specific business services. This requires a service impact management capability–that is, analyzing the technology required to power each critical business service and documenting the link. Technologies exist to automate some of this through event automation and service modeling. Such capabilities help to transform availability and performance data into detailed knowledge about the status of business services and service level agreements.

Today’s modern corporation needs technicians who are cognizant of the business impact of their management decisions. As such, DBAs need to get busy transforming themselves to become more business-savvy–that is, to keep an eye on the business impact of the technology under their span of control.

Posted in DBA | 2 Comments

Choosing a DBMS

One of the questions I get on a regular basis is some variation of “how can I choose what DBMS is the best”… so I’ll try to offer some high-level guidance here.

When mainframes dominated enterprise computing, the DBMS architecture was a simple concern. Today, the IT infrastructure is distributed and heterogeneous. Even for a modern mainframe DBMS implementation, the overall architecture will likely consist of multiple platforms and inter-operating pieces of system software. The final architecture must be based on the business needs of the organization. It should be chosen by a team consisting of business experts and IT experts. Business experts should include representatives from various lines of business as well as accounting and legal for software contract issues. IT experts should include database resource management representatives including data administrators, database administrators, and system administrators. But the selection process should also include other IT professionals, such as members of the networking group, operating system experts, operations control personnel, programming experts, and any other interested parties.

But let’s consider the nuances of the question. The problem — and the answers to it — take on many different forms. For example, should I answer it based on you having no existing DBMS and you’re just looking to buy one for the first time? That is a valid, though not too common, situation. On the other hand, you are more likely asking something like “Which of the several different DBMS platforms that we already have should we use for Project X?” This is also a valid question. So, let me try to answer both — at least in some fashion.

First of all, if you are brand new to DBMS it would be a very wise course of action to hire a database consultant (or two) to help you with your selection process. There are several very good choices out there. My preference is that new users should generally choose from the market leaders, and that means one of the big three: IBM DB2, Oracle, or Microsoft SQL Server. Of course, you also have other options such as an open source DBMS like MySQL or PostgreSQL. These can be used for some types of production work (mostly lower-end or perhaps web-based development projects). For high-end mission-critical applications, stick with the Big Three.

Another recent phenomenon is Big Data and NoSQL. There are a variety of options to choose from in this camp, with most of the leading candidates based on some for of Hadoop. For projects requiring large amounts of data and high availability “big data” solutions such as the CouchDB and Hbase NoSQL DBMS offerings may be ideal. For the most part, the NoSQL DBMS offerings are not for supporting typical, enterprise applications. Again, I would direct you to one of the Big Three for most typical enterprise usages.

OK, which of the big three? Well, if you are a large organization with a mainframe and want to run your DBMS on that mainframe, you really should go with IBM DB2. Although Oracle has a mainframe presence, IBM is far and away the market leader here. For Unix and Linux installations, your choices are Oracle and DB2. Oracle is the market leader on those platforms, though IBM has a nice presence there, too. For Windows development, all three are viable options, but Microsoft is the leader on Microsoft’s OS.

What about other options? Well, Sybase, Informix, and Teradata are the next biggest players in the market. Sybase has lost ground in the market, but their DBMS is still solid and they are firmly entrenched in the financial market. Informix, now owned by IBM is still a top notch DBMS with great performance and is still being marketed and improved by IBM. Teradata is a high-speed DBMS that is geared for data warehousing and OLAP work and you might want to choose it for those types of projects.

And what if you have several DBMSs installed and need to choose one for a new project? In that case, it is best to base your decision on internal company issues. Consider the existing support and expertise that you have in-house for each DBMS. If the project is highly visible it makes sense to go with the DBMS that is best supported by your in-house experts because they can give it the care and feeding it needs to perform optimally. Also, think about the hardware platform. For your very high availability needs go with the mainframe if you have one. After that, it is Linux and Unix… then Windows.

I haven’t talked about cost yet, but it would be an incomplete answer without at least touching on that subject. When you are examining the cost of the DBMS software do not limit the analysis to just the initial cost and on-going maintenance cost that must be paid to the DBMS vendor. Look at the total cost of ownership of the DBMS. TCO should be calculated as a combination of the license cost of the DBMS, the license cost of any required supporting software, the cost of database professionals to program, support and administer the DBMS, and the cost of the computing resources required to operate the DBMS. Also, try to factor in the reliability of the total package in terms of downtime – and factor in expected losses due to downtime if at all possible. Determining the TCO for a DBMS can be a difficult and time-consuming task without the assistance of a consultant well-versed in server and DBMS performance and operations.

You will also need to pay some attention to the various “flavors” of DBMS offerings “out there” from all of the major vendors. The DBMS you select must be appropriate for the nature and type of processing you plan to implement. Four levels of DBMS architecture can be selected: enterprise, departmental, personal, and mobile.

An enterprise DBMS is designed for scalability and high performance. It must be capable of supporting very large databases, a large number of concurrent users, and multiple types of applications. The enterprise DBMS will run on a large-scale machine, typically a mainframe or a high-end Unix, Linux, or Windows machine. Furthermore, an enterprise DBMS offers all of the “bells and whistles” available from the DBMS vendor.

A departmental DBMS, or workgroup DBMS, supports small to medium sized workgroups within an organization, and typically runs on a Unix, Linux, or Windows server. Hardware and software upgrades often can allow a departmental DBMS to tackle tasks that previously could only be performed by an enterprise DBMS. The steadily falling cost of departmental hardware and software components further contributes to lowering TCO and helping to enable a workgroup environment to scale up to serve the enterprise.

A personal DBMS is designed to be used by a single user, typically on a low- to medium-powered PC platform. Microsoft Access and Filemaker are examples of personal database software. Sometimes the low cost of a personal DBMS causes misguided attempts to choose a personal DBMS for a departmental or enterprise solution. A personal DBMS product is suited only for small scale projects and should not be used to deploy multi-user applications.

Of course, the major DBMS vendors also market free versions of their more high-powered solutions, such as DB2 Express-C and Oracle Database Express Edition. These offerings are targeted at building support for the DBMS among developers and against open source offerings. And they can be a practical alternative to a personal DBMS in that they can be deployed individually and then expand to a multi-user license as usage demands.

And the mobile DBMS is a specialized version of a departmental or enterprise DBMS. It is designed to be used by remote users who are not usually connected to the network. The mobile DBMS enables local database access and modification on a laptop or handheld device, such as a smartphone. Furthermore, the mobile DBMS provides a mechanism for synchronizing remote database changes to a centralized, enterprise or departmental database server.

A DBMS designed for one type of processing will likely be ill-suited for other uses. If your organization requires DBMS solutions at different levels, whenever possible favor the selection of a group of DBMS solutions from the same vendor. Doing so will minimize differences in access, development, and administration.

Posted in DB2, enterprise computing | 4 Comments

An Introduction to Database Design: From Logical to Physical

A proper database design cannot be thrown together quickly by novices. What is required is a practiced and formal approach to gathering data requirements and modeling data. This modeling effort requires a formal approach to the discovery and identification of entities and data elements. Data normalization is a big part of data modeling and database design. A normalized data model reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately.

From Logical…

So, database design is the process of transforming a logical data model into an actual physical database. Technicians sometimes leap to the physical implementation before producing the model of that implementation. This is unwise. A logical data model is required before you can even begin to design a physical database. And the logical data model grows out of a conceptual data model. And any type of data model begins with the discipline of data modeling.

The first objective of conceptual data modeling is to understand the requirements. A data model, in and of itself, is of limited value. Of course, a data model delivers value by enhancing communication and understanding, and it can be argued that these are quite valuable. But the primary value of a data model is its ability to be used as a blueprint to build a physical database.

When databases are built from a well-designed data model the resulting structures provide increased value to the organization. The value derived from the data model exhibits itself in the form of minimized redundancy, maximized data integrity, increased stability, better data sharing, increased consistency, more timely access to data, and better usability. These qualities are achieved because the data model clearly outlines the data resource requirements and relationships in a clear, concise manner. Building databases from a data model will result in a better database implementation because you will have a better understanding of the data to be stored in your databases.

Another benefit of data modeling is the ability to discover new uses for data. A data model can clarify data patterns and potential uses for data that would remain hidden without the data blueprint provided by the data model. Discovery of such patterns can change the way your business operates and can potentially lead to a competitive advantage and increased revenue for your organization.

Data modeling requires a different mindset than requirements gathering for application development and process-oriented tasks. It is important to think “what” is of interest instead of “how” tasks are accomplished. To transition to this alternate way of thinking, follow these three “rules”:

  • Don’t think physical; think conceptual – do not concern yourself with physical storage issues and the constraints of any DBMS you may know. Instead, concern yourself with business issues and terms.
  • Don’t think process; think structure – how something is done, although important for application development, is not important for data modeling. The things that processes are being done to are what is important to data modeling.
  • Don’t think navigation; think relationship – the way that things are related to one another is important because relationships map the data model blueprint. The way in which relationships are traversed is unimportant to conceptual and logical data modeling.

Data models are typically rendered in a graphical format using an entity-relationship diagram, or E/R diagram for short. An E/R diagram graphically depicts the entities and relationships of a data model. There are many popular data modeling tools on the market from a variety of vendors. But do not confuse the tool as being more important than the process. Of what use is a good tool if you do not know how to deploy it?

A data model is built using many different components acting as abstractions of real world things. The simplest data model will consist of entities and relationships. As work on the data model progresses, additional detail and complexity is added. Let’s examine the many different components of a data model and the terminology used for data modeling.

The first building block of the data model is the entity. An entity, at a very basic level, is something that exists and is capable of being described. It is a person, place, thing, concept, or event about which your organization maintains facts. For example: “STUDENT,” “INSTRUCTOR,” and “COURSE” are specific entities about which a college or university must be knowlegeable to perform its business.

Entities are comprised of attributes. An attribute is a characteristic of an entity. Every attribute does one of three things:

  1. Describe – An attribute is descriptive if it does not identify or relate, but is used to depict or express a characteristic of an entity occurrence.
  2. Identify – An attribute that identifies is a candidate key. If the value of an identifying attribute changes, it should identify a different entity occurrence. An attribute that identifies should be unchangeable and immutable.
  3. Relate – An attribute that relates entities is a foreign key; the attribute refers to the primary key attribute of an occurrence of another (or the same) entity.

Each attribute is assigned a domain that defines the type of data, its size, and the valid values that can be assigned to the attribute. As a general rule of thumb, nouns tend to be entities and adjectives tend to be attributes. But, of course, this is not a hard and fast rule: be sure to apply of the business to determine which nouns and attributes are entities and which are attributes. Every attribute must either identify the entity occurrence, describe the entity occurrence, or relate the entity occurrence to another entity occurrence (in the same or another entity).

Relationships define how the different entities are associated with each other. Each relationship is named such that it describes the role played by an entity in its association with another (or perhaps the same) entity. A relationship is defined by the keys of the participating entities: the primary key in the parent entity and the foreign key in the dependent entity. Relationships are not just the “lines” that connect entities, but provide meaning to the data model and must be assigned useful names.

Keep in mind that as you create your data models, you are developing the lexicon of your organization’s business. Much like a dictionary functions as the lexicon of words for a given language, the data model functions as the lexicon of business terms and their usage. Of course, this short introduction just scrapes the tip of the data modeling iceberg.

…To Physical

Assuming that the logical data model is complete, though, what must be done to implement a physical database?

The first step is to create an initial physical data model by transforming the logical data model into a physical implementation based on an understanding of the DBMS to be used for deployment. To successfully create a physical database design you will need to have a good working knowledge of the features of the DBMS including:

  • In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects.
  • Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects.
  • Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used.
  • Knowledge of the DBMS configuration parameters that are in place.
  • Data definition language (DDL) skills to translate the physical design into actual database objects.

Armed with the correct information, you can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects. Of course, this simple transformation will not result in a complete and correct physical database design – it is simply the first step. The transformation consists of the following:

  • Transforming entities into tables
  • Transforming attributes into columns
  • Transforming domains into data types and constraints

To support the mapping of attributes to table columns you will need to map each logical domain of the attribute to a physical data type and perhaps additional constraints. In a physical database, each column must be assigned a data type. Certain data types require a maximum length to be specified. For example a character data type could be specified as CHAR(25), indicating that up to 25 characters can be stored for the column. You may need to apply a length to other data types as well, such as graphic, floating point, and decimal (which require a length and scale) types.

But no commercial DBMS product fully supports relational domains. Therefore the domain assigned in the logical data model must be mapped to a data type supported by the DBMS. You may need to adjust the data type based on the DBMS you use. For example, what data type and length will be used for monetary values if no built-in currency data type exists? Many of the major DBMS products support user-defined data types, so you might want to consider creating a data type to support the logical domain, if no built-in data type is acceptable.

In addition to a data type and length, you also may need to apply a constraint to the column. Consider a domain of integers between 1 and 10 inclusive. Simply assigning the physical column to an integer data type is insufficient to match the domain. A constraint must be added to restrict the values that can be stored for the column to the specified range, 1 through 10. Without a constraint, negative numbers, zero, and values greater than ten could be stored. Using check constraints you can place limits on the data values that can be stored in a column or set of columns.

Specification of a primary key is an integral part of the physical design of entities and attributes. A primary key should be assigned for every entity in the logical data model. As a first course of action you should try to use the primary key as selected in the logical data model. However, multiple candidate keys often are uncovered during the data modeling process. You may decide to choose a primary key other than the one selected during logical design – either one of the candidate keys or another surrogate key for physical implementation. But even if the DBMS does not mandate a primary key for each table it is a good practice to identify a primary key for each physical table you create. Failure to do so will make processing the data in that table more difficult.

Of course, there are many other decisions that must be made during the transition from logical to physical. For example, each of the following must be addressed:

  • The nullability of each column in each table
  • For character columns, should fixed length or variable length be used?
  • Should the DBMS be used to assign values to sequences or identity columns?
  • Implementing logical relationships by assigning referential constraints
  • Building indexes on columns to improve query performance
  • Choosing the type of index to create: b-tree, bit map, reverse key, hash, partitioning, etc.
  • Deciding on the clustering sequence for the data
  • Other physical aspects such as column ordering, buffer pool specification, data files, denormalization, and so on.

Summary

A logical data model should be used as the blueprint for designing and creating a physical database. But the physical database cannot be created properly with a simple logical to physical mapping. Many physical design decisions need to be made by the DBA before implementing physical database structures. This may necessitate deviating from the logical data model. But such deviation should occur only based on in-depth knowledge of the DBMS and the physical environment in which the database will exist.

Posted in data integrity, data modeling, DBA | 6 Comments

Big Data and 150 Trillion Calculations Per Second at Vestas (#IODGC)

IBM highlighted Danish energy company Vestas at their Information On Demand (IOD) 2011 conference in Las Vegas today. Vestas uses IBM big data analytics software to improve its wind turbine placement. The placements of wind turbines is a significant challenge for Vestas. During the IOD general session, Lars Christensen, VP of Plant Siting and Forecasting for Vestas, likened choosing the wrong location for a wind turbine to “moving the Hoover Dam.” That did a great job of describing the challenge!

Vestas employs 23,000 folks and is the world leader in providing high-tech wind power systems. The company has supplied over 44,500 wind turbines in 67 countries. Vestas’ wind library contains more than 2.8 petabytes of information tracking wind hour by hour over the past 12 years. That’s a lot of data. But the company relies on it to properly place its turbines. Obviously, they want to put them in windy places to generate energy.

Vestas is using IBM BigInsights software and an IBM “Firestorm” supercomputer to analyze this structured and unstructured data. In addition to wind data, the company also tracks tidal phases, geospatial and sensor data, satellite images, deforestation maps, and weather modeling research. The analysis of all of this data, which used to take weeks, can be done in less than an hour using the IBM system.

“Using IBM software and systems, we can now answer these questions quickly to identify new markets for wind energy and help our clients meet aggressive renewable energy goals,” said Christensen.

The software, IBM InfoSphere BigInsights, is the result of a four year effort of more than 200 IBM Research scientists. It is powered by Hadoop. BigInsights provides the framework for large scale parallel processing and scalable storage for terabyte to petabyte-level data plus the ability to enable “what-if” scenarios.

Vestas is running BigInsights on 1,222 connected, workload optimized SYstem x DataPlex servers that collectively make up the “Firestorm” supercomputer… which can perform 150 trillion calculations per second.

This is a real-life scenario of Big Data and how directing the right resources and solutions at the data can solve big problems.

Posted in analytics | Tagged , , , | 1 Comment