Data Technology Today’s 2016 Year in Review

Well, another year has come and gone and I thought it might be interesting to share a bit about this blog’s activity in 2016. It was an active year that saw 17 new posts, down a bit from 2015, but still averaging more than a post a month.

Posts on the blog were viewed 47,264 times by 36,870 visitors, meaning each visitor averaged 1.28 views.

The most popular post in 2016 was actually first posted in 2011: An Introduction to Database Design: From Logical to Physical was viewed 10,575 times in 2016. Obviously database design is an interesting topic — at least for the readers of this blog!

The second most popular post in 2016 was On The Importance of Database Backup and Recovery, which was first posted in 2014. The most popular post actually published in 2016 came out in December, too late in the year to lead the annual rankings, but evidently people are interested in A Useful Guide to Data Fundamentals from Fabian Pascal. As well they should be!

And the blog gets read all over the world, as shown in the Top Ten Countries visiting in 2016 below:

[Chart: Top Ten Countries by blog views in 2016]

Yes, most of my readers are from the United States, but I’m proud of the following I have in India (and across the world).

So to end this brief synopsis of 2016, thank you to all of my regular readers – please keep visiting and suggesting more topics for 2017 and beyond. And if this is your first visit to the blog, welcome. Take some time to view the historical content – there are several informative posts that are popular every year… and keep checking back for new content on data, database, and related topics!

Posted in backup & recovery, DBA, review | Leave a comment

A Good Start for Your SQL Library

Every professional programmer (and DBA) should have a library of books on SQL fundamentals. There are many SQL titles to choose from, and a lot of them are very good. But you can’t buy them all unless you are independently wealthy. So this blog post will highlight the first four SQL books that should be on every database professional’s bookshelf.

The first SQL book is SQL Performance Tuning by Peter Gulutzan and Trudy Pelzer. This well-written book provides a treasure trove of tips for improving SQL performance on all of the major database systems. It does not teach SQL syntax, but instead helps the reader to understand the differences between the most popular DBMS products, including Oracle, DB2, SQL Server, Sybase ASE, MySQL, Informix, Ingres, and even InterBase.

Throughout this book the authors present and test techniques for improving SQL performance, and grade each technique for its usefulness on each of the major DBMSs. If you deal with heterogeneous database implementations this book will be of great assistance, whether you are a programmer, consultant, DBA, or technical end user. The contents of this book can help you to decide which tuning techniques will work for which DBMS.

My next SQL book recommendation is altogether different in purpose than the first. It is SQL in a Nutshell, 3rd edition by Kevin Kline, Daniel Kline, and Brand Hunt. This book offers a great cross-platform syntax reference for SQL. It probably is not the easiest reference to use for finding the exact syntax for one particular DBMS, but it is absolutely the best reference for those who work with multiple DBMSs. Be sure to get the 3rd edition, which is up-to-date and offers more depth than the previous editions.

Next up is The Art of SQL by Stephane Faroult, which is a guide to SQL written using the approach of “The Art of War” by Sun-Tzu. The author actually uses the exact same chapter titles for The Art of SQL that Sun Tzu used in The Art of War. Amazingly enough, the tactic works.

Consider, for example, the chapter titled “Laying Plans,” in which Faroult examines how to design databases for performance. As anyone who has ever built database applications knows, an improperly designed database can be the biggest impediment to flawless application performance. The chapter titled “Tactical Dispositions” covers the topic of indexing, and in “The Nine Situations” the author examines several classic SQL patterns and how best to approach them.

This book is not for a novice who wants to learn SQL from scratch. The author assumes the reader is conversant with SQL as he describes how to apply SQL in a practical manner. If you can’t code an outer join or don’t know what a nested table expression or in-line view is, then this is not the book for you.

Neither is the book a list of SQL scripts that you can pluck out and use. Instead, the book skillfully manages to explain how to properly attack the job of coding SQL to effectively and efficiently access your data.

The book offers best practices that teach experienced SQL users to focus on strategy rather than specifics. You know, if Sun Tzu coded SQL, he might have written a book like The Art of SQL. But since Sun Tzu is dead, I’m glad Stephane Faroult was around to author this one.

The final SQL book recommendation is the latest edition of Joe Celko’s SQL for Smarties, Fifth Edition: Advanced SQL Programming. Celko was a member of the ANSI SQL standards committee for ten years and is highly qualified to write such a text. The 5th edition was completely revised in 2015 and boasts over 800 pages of advanced SQL programming techniques. If you have any of the past editions of this book, you owe it to yourself to get the newly revised fifth edition.

This book offers tips, techniques, and guidance on writing effective, sometimes complex, SQL statements using ANSI standard SQL. It touches on topics ranging from database design and normalization to using proper data types to grouping and set operations, optimization, data scaling, and more. Every developer who codes SQL statements for a living will find something useful in SQL for Smarties!

These four books, properly used, can turn a fledgling SQL developer into an expert – and they can assist even the expert to become a better user of SQL.

Posted in books, DBA, performance, SQL | Leave a comment

A Useful Guide to Data Fundamentals from Fabian Pascal

Today’s blog post is a quick review of Fabian Pascal’s latest book, The DBDEBUNK Guide to Misconceptions and Data Fundamentals. Those of you who know Pascal will know what to expect — a no-holds-barred treatise on the fundamentals of data and database management, focusing on the relational model and its benefits. But you will also get a true understanding of what relational means. You see, what is commonly called a relational DBMS more accurately should be referred to as a SQL DBMS. If that doesn’t make sense to you, then you are exactly the type of reader who should buy and read this book.

The book is self-published, so it is only available at Mr. Pascal’s website, dbdebunk.com. But the material is well-written and laid out in an easy-to-consume fashion. This comes as no surprise given Pascal’s extensive history and background writing about relational theory and databases (multiple decades of books, articles, blogs, etc. to his credit).

What makes this book special? Well, the material is culled from the DATABASE DEBUNKINGS website. There are 50 chapters, each exposing a common misconception or misunderstanding about data and relational fundamentals. Pascal presents the misconception and then clearly explains the problems and implications that arise because of it. Furthermore, he goes on to explain the correct way of thinking about the issue. This approach, once you get comfortable with it, offers a sound format for exposing fallacies and correcting misunderstandings.

As the foreword notes, there is an Index of Misconceptions rather than a Table of Contents. You can use that index to seek out sections of the book that focus on specific misconceptions, such as those on keys or unstructured data.

Also nice is the list of further reading provided in each section. If you read (and understand) not only this book but also all of the referenced materials, you will probably know more about data and database systems than most people working as data analysts and DBAs.

If your job requires you to manage, access, or manipulate data, you would do well to read Pascal’s Guide. Even if you think you are an expert on relational theory and data fundamentals, I am sure this powerful tome will offer up something useful. And for most people, who think they know more than they actually do, this book will deliver a wealth of knowledge that will serve you well as you progress in your IT career.

Posted in book review, DBA, relational | 1 Comment

A Database Performance Tuning Primer

In a recent series of posts to this blog I outlined a dozen SQL rules of thumb to follow as you design and code your database applications. These articles can be found at the following links:

Rules of thumb are great, but let’s face it, we don’t always follow them. And then we get performance degradation. That means we have to track down the root cause of the problem, analyze the situation, and apply corrective measures. This can be difficult without a performance management tool or two.

Database performance management tools can be used to identify bottlenecks and points of contention. These tools can help to show locking, I/O, CPU and other problems as your applications are running; or historically at a point in time if you have appropriate traces started and/or have implemented a performance data warehouse.

Such tools can be beneficial for determining where resources are being used. For example, do you know which programs consume the most database resources? Which consumes the most CPU, takes the most I/Os, issues the most locks, deletes the most data, etc.? Performance monitors can help you to accurately identify all of these issues. You can monitor system and database workload and throughput and manage the resources being used to satisfy your workload.

Database performance management tools can be broken up into three specific functional groupings based on what is being monitored and managed. Some of these tools offer functionality in only one of the following areas, whereas others deliver combined functionality:

  1. The DBMS. Tools that manage DBMS performance are focused more on database system resources and performance. They provide information about performance at the system level, showing things like CPU and memory consumption, buffer usage, locking issues, storage and other hardware information, configuration parameters, and other details.
  2. Database structures. The design of databases, tables and indexes can also impact database performance. Issues that must be monitored and managed include the physical design of the database, number of tables, index design and DDL, disk usage, and so on. How the data is organized must also be managed. And as data is modified in the database, its efficiency will degrade. Reorganization and defragmentation are required to periodically remedy disorganized data.
  3. SQL and application code. Coding efficient SQL statements can be complicated because there are many different ways to write SQL that return the same results. But the efficiency and performance of each formulation can vary significantly. DBAs need tools that can monitor the SQL code that’s being run, show the access paths it uses and provide guidance on how to improve the code.

I recently wrote a four part series of articles for TechTarget on database performance management offerings that delves into more details on these issues. If interested, you can check them out at the following links:

Good luck as you embark upon your database performance tuning and management efforts!

Posted in performance | Leave a comment

Managing Database Application Performance

A database application requires constant interaction between various disparate computing resources to operate efficiently and according to specifications. Realistically, the tuning of a database application can be broken down into three components: system tuning, database tuning, and application tuning. All of these are related, and database performance management requires an integrated approach to tuning across each of these areas.

The DBA needs to understand how the DBMS interacts with the server hardware, the operating system, and any other required software. Tuning and configuring these components and connections properly can have a dramatic impact on system performance. Therefore, system tuning occurs at the highest level and can have the greatest impact on the overall health of database applications. The reason for this is that every database application runs on the overall system. No amount of tuning is going to help a database application when the server it is running on is short on resources or improperly installed. Memory configuration is perhaps the single most important aspect of system tuning. Without sufficient memory, the entire DBMS system can grind to a halt. And with today’s modern, in-memory database systems, memory management becomes even more crucial!

Another significant system bottleneck can be the database log. Every database modification is logged by the DBMS; an inefficient log can impact every application that modifies data. System tuning also includes managing and connecting other systems software with which the DBMS interacts, including the operating system, networking software, message queuing systems, middleware, and transaction processors.

Database application performance can be impacted by the physical design of the database, including normalization issues, how the database is stored on disk, number of tables, index design, and proper usage of DDL and its associated parameters. The actual physical location of database files on disk systems will have an impact on the performance of applications accessing the data. As more data is stored on the same disk device, the possibility of concurrent access and performance degradation increases. As data is inserted, updated, and deleted from the database, the efficiency of the database will degrade. Additionally, the files used to hold the data of the database may need to expand as data is added. Or perhaps, additional files, or file extents, will need to be allocated. Both disorganization and file growth can degrade performance. Indexes also need to be monitored, analyzed, and tuned to optimize data access, and to ensure that they are not having a negative impact on data modification.

The application code itself must be designed appropriately and monitored for efficiency. In fact, most experts agree that as much as 70 to 80 percent of performance problems are caused by improperly coded database applications. SQL is the primary culprit. Coding efficient SQL statements can be complicated. Developers need to be taught how to properly formulate SQL statements and SQL statements must be constantly monitored and tuned.

Programmers need to be schooled in the practice of examining SQL access paths. Every relational DBMS allows the programmer to request information on how the database optimizer will satisfy each query. Will an index be used? In what order will the tables be joined? Will the query be broken up into parallel tasks or not? These and many other factors influence the efficiency of SQL. Not all application problems are due to improperly coded SQL. The host language application code in which the SQL has been embedded also can be inefficient, causing database application performance to suffer.
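For example, here is a minimal sketch of how a programmer might request the access path for a query. The exact mechanism varies by DBMS (EXPLAIN PLAN in Oracle; EXPLAIN in DB2, MySQL, and PostgreSQL; SHOWPLAN options in SQL Server), so treat this Oracle-style example as illustrative only:

 -- Ask the optimizer how it plans to satisfy the query (Oracle-style)
 EXPLAIN PLAN FOR
 SELECT e.last_name, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id;

 -- Then display the captured access path from the plan table
 SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);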

Managing the performance of your database applications requires in-depth monitoring. Be sure to allocate an appropriate budget to acquire performance management tools to ensure the efficiency of your database systems.

Posted in DBA, performance, SQL, tools | Leave a comment

A Dozen SQL Rules of Thumb, Part 3

Today we pick up our three-part series of SQL rules of thumb (ROTs) with the third and final installment… You can think of these rules as general guiding principles you should follow as you write SQL statements… and we start off today’s post with rule #9…

Rule 9: Know What Works Best

The flexibility of SQL allows the same process to be coded in multiple ways. However, one way of coding usually provides better performance than the others. The DBA should understand the best way to code SQL for each DBMS in use. Furthermore, the DBA should provide information on proper query formulation for performance to the application development staff.

Keep in mind that these rules are DBMS-specific. By that I mean that one way of writing SQL might perform well on SQL Server but not on Oracle, where a different formulation returning the same data performs better.

Rule 10: Issue Frequent COMMITs

When coding programs to run as batch transactions, it is important to issue regular SQL COMMIT statements. The COMMIT statement hardens modifications to the database. When a COMMIT is issued, locks on the modified database objects and data can be released. If you write programs that make a lot of changes, but do not issue periodic COMMITs, then you will be locking data and negatively impacting concurrent access to the data. I call this Bachelor Programming Syndrome (you know, fear of committing).
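As a simple illustration, consider a batch purge program. The sketch below uses Oracle-style syntax and hypothetical table and column names; the row-limiting technique varies by DBMS (ROWNUM in Oracle, FETCH FIRST in DB2, TOP in SQL Server):

 -- Purge old rows in batches, committing after each batch so that
 -- locks are released and concurrent work can access the table
 -- (repeat this unit of work until no more rows qualify)
 DELETE FROM order_history
 WHERE  order_date < DATE '2010-01-01'
 AND    ROWNUM <= 1000;
 COMMIT;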

An additional consideration for Oracle DBAs is the impact of a COMMIT on the rollback segments. Oracle uses rollback segments to store the before images of modified data so that transactions can be rolled back before their changes are committed. When you issue a COMMIT in Oracle, not only are the changes finalized to the table, but the space used in the rollback segment for that transaction can be freed for reuse, too.

As a DBA you must ensure that application developers issue enough COMMIT statements to minimize the impact of locking on availability and (for Oracle) to keep rollback segments to a manageable size.

Rule 11: Beware of Code Generators

Beware of application code generators and similar tools that automatically create SQL. Many of these tools use gateways that require each SQL statement to be recompiled and optimized each time it is requested. However, some gateways provide a caching mechanism to store compiled and optimized SQL on the server. Such a cache can help to improve performance for frequently recurring SQL statements.

Additionally, many code generators create accurate SQL, but not necessarily efficient SQL. So you might need to tweak the SQL that is generated (if that is even possible).

Rule 12: Consider Stored Procedures

Performance degradation due to repeated network traffic can be minimized by using a stored procedure because only a single request is needed to execute it. Within the stored procedure, multiple SQL statements can be issued, and the results processed and sent to the requesting program or user. Without the stored procedure, each of the multiple SQL statements, as well as all of the results, would have to be sent across the network. Additionally, SQL in stored procedures may perform better than the same SQL outside of the stored procedure if the DBMS parses and compiles the statements before run time.
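For example, here is a sketch of a simple stored procedure using SQL Server’s T-SQL syntax (procedure languages vary widely by DBMS, and the procedure name and parameters here are hypothetical):

 -- Both statements execute on the server with a single network call
 CREATE PROCEDURE promote_employee
     @empno        INT,
     @new_position VARCHAR(20)
 AS
 BEGIN
     UPDATE employee
     SET    position = @new_position
     WHERE  empno = @empno;

     SELECT position, last_name, empno
     FROM   employee
     WHERE  empno = @empno;
 END;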

Synopsis

These twelve SQL rules of thumb across three blog posts provide a sound basis for SQL development. Of course, they offer high-level guidance and are not the only things that you need to be aware of, and follow, as you strive to build efficient database applications.

So start here, but expand your knowledge base and keep learning about how you can write effective and efficient SQL for your database applications!

Posted in DBA, performance, SQL | 1 Comment

A Dozen SQL Rules of Thumb, Part 2

Today’s blog post picks up where we left off in our three-part series of rules of thumb (ROTs) that apply generally to SQL development regardless of the underlying DBMS.

These are general principles to guide your SQL development… and we start off today’s post with rule #5…

Rule 5: Avoid Cartesian Products

Be sure to code predicates matching the columns of every table that will be joined within each SQL statement. Failure to do so will result in performance degradation and possibly incorrect results.

Whenever join predicates do not exist between two tables being joined, the RDBMS must perform a Cartesian product. This is the combination of every row of one table with every row of the other table. Non-matching rows are not eliminated because there is nothing to match on. The results of a Cartesian product are difficult to interpret and contain no information other than a simple list of all rows of each table munged together.

Of course, there are exceptions to this rule where you are specifically coding a Cartesian product for a business or technical reason. When coding a Cartesian product on purpose always make sure that you have investigated other methods of producing the same results and tested each method for performance and accuracy… and be sure to specifically test any query with a Cartesian product using production volumes of data!
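For example, consider joining the employee and department tables used in the queries below. Omitting the join predicate produces a Cartesian product:

 -- Cartesian product: every employee row pairs with every
 -- department row because there is no join predicate
 SELECT e.last_name, d.manager
 FROM   employee e,
        department d;

 -- Correct join: only rows with matching dept_id values combine
 SELECT e.last_name, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id;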

Rule 6: Judicious Use of OR

The OR logical operator can be troublesome for performance. If you can convert a SQL statement that uses OR to one that uses IN, it is likely that performance will improve. For example, consider changing this SQL statement:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    (position = 'MANAGER'
         OR position = 'DIRECTOR'
         OR position = 'VICE PRESIDENT')  -- parentheses keep the OR
 ORDER BY position;                       -- conditions together

to this:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    position IN ('MANAGER', 'DIRECTOR', 'VICE PRESIDENT')
 ORDER BY position;

Of course, your results may vary depending on the DBMS in use and the nature of the data.

Rule 7: Judicious Use of LIKE

The LIKE logical operator is another troublesome beast. It is very easy to create performance problems when using LIKE in SQL. For example, consider the following SQL statement:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE '%X'
 ORDER BY position;

This query will return employee information for all employees working in any department where dept_id ends in 'X'. However, the relational optimizer will have to scan the data in order to resolve this query—there is no way to use an index to match the predicate directly. Because the high-order portion of the column is not known, traversing a b-tree index structure is impossible.

You might be able to use your knowledge of the data to rewrite this query without a leading wild-card character (%). For example, perhaps all dept_id values start with either 'A' or 'B'. In that case, you could modify the SQL as follows:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE 'A%X'
 OR     dept_id LIKE 'B%X'
 ORDER BY position;

In this case, the DBMS may be able to use a non-matching index scan if an index exists on the dept_id column. Scanning the index may be more efficient than scanning the entire table, but even then you will be scanning data, which is not usually very efficient.

Once again, your results will vary with the DBMS in use and the nature of the data accessed.

Rule 8: Avoid Sorts When Possible

Sorting data is an inhibitor of optimal performance in SQL queries. Your DBMS will sort data as needed to satisfy your database requests. The types of operations that usually require sorting of some form are ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT, and EXCEPT. When performance is important, remember to look for sorts and find ways to eliminate them. You might be able to create an index to avoid sorting, or to use an alternate syntax if duplicate elimination is not important (e.g. UNION ALL versus UNION).
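For example, consider the following pair of queries (a sketch; the table names are hypothetical):

 -- UNION eliminates duplicates, which typically forces a sort
 SELECT empno FROM current_employee
 UNION
 SELECT empno FROM retired_employee;

 -- If the two sets cannot contain duplicates of one another,
 -- UNION ALL returns the same results without the sort
 SELECT empno FROM current_employee
 UNION ALL
 SELECT empno FROM retired_employee;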

Keep in mind that sorting is an I/O-intensive operation that can degrade query performance, sometimes significantly.

Conclusion

And so we come to the end of part 2 in our 3 part series offering up 12 SQL rules of thumb… tune in next time for the final 4 rules (numbers 9 thru 12) in our dozen guidelines for developing effective and efficient SQL!

Posted in DBA, performance, SQL | 3 Comments

A Dozen SQL Rules of Thumb, Part 1

In today’s blog post we will examine some rules of thumb that apply generally to SQL development regardless of the underlying DBMS. These are general principles to guide your SQL development…

Rule 1: “It Depends!”

The answer to every question about database performance is “It depends.” A successful DBA (and programmer) will know on what it depends. For example, if someone asks, “What is the best access path for my SQL query?” the best answer is “It depends.” Why? Well, that is more difficult to answer.

If every row must be returned, a table scan is likely to be more efficient than indexed access. However, if only one row is to be returned, direct index lookup will probably perform best. For queries that return between one and all rows, the performance of access paths will depend on how the data is clustered, which version of the DBMS is in use, whether parallelism can be invoked, and so forth.

Be skeptical of tuning tips that use the words “always” or “never.” Just about everything depends on other things.

Rule 2: Be Careful What You Ask For

The arrangement of elements within a query can change query performance. To what degree depends on the DBMS in use and whether rule-based optimization is used. A good rule of thumb, regardless of DBMS, is to place the most restrictive predicate where the optimizer can read it first. Oracle’s rule-based optimizer, for example, reads WHERE clauses from the bottom up; therefore, the most restrictive predicate should be put at the bottom of the query. It is just the opposite in DB2.

Placing the most restrictive predicate where the optimizer can read it first enables the optimizer to narrow down the first set of results before proceeding to the next predicate. The next predicate will be applied to the subset of data that was selected by the most selective condition, instead of against the entire table.

And keep in mind that these things can change from release to release of a DBMS, so keep up with each new version, release and fixpack to make sure you understand what has been changed and how it might impact your SQL.

Rule 3: KISS

A rule of thumb for all types of IT activities is to follow the KISS principle: Keep it simple, Stupid. However, in the world of SQL there is a trade-off between simplicity and performance.

Keeping SQL simple makes development and maintenance tasks easier. A simple SQL statement is easier to decipher and easier to change. With simple SQL, application developers can perform their job more easily than with complex SQL.

Nevertheless, complex SQL can outperform simple SQL. The more work that can be performed by the DBMS and the optimizer, the better performance is likely to be. Let’s look at an example: Some programmers avoid joins by coding multiple SQL SELECT statements and joining the data using program logic. The SQL is simpler because the programmer need not understand how to write SQL to join tables. However, SQL joins usually outperform program joins because less data is returned to the program.
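For example, instead of issuing one SELECT against employee and another against department and matching the rows in program logic, a single SQL join lets the DBMS do the work and returns only the combined result:

 -- One join replaces two SELECT statements matched in program code
 SELECT e.last_name, e.position, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id;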

Furthermore, the relational optimizer can change the join methodology automatically if the database or data changes. Conversely, program logic must be changed manually by a skilled programmer.

Rule 4: Retrieve Only What is Needed

As simple as this rule of thumb sounds, you might be surprised at how often it is violated. To minimize the amount of data returned by your SQL statements, be sure to specify the absolute minimum number of columns in the SELECT list. If the column is not needed to satisfy the business requirement, do not request it to be returned in the result set.

Programmers frequently copy SQL statements that work well to use as templates for new statements. Sometimes the programmer will forget to trim down the number of columns requested when they only need a subset of the columns in the original query. This can adversely impact performance. The more columns that must be returned by the DBMS, the greater the processing overhead.

Another common problem is requesting unnecessary data. Consider the following SQL statement:

SELECT position, last_name, empno
FROM   employee
WHERE  last_name = 'SMITH';

There is no reason to specify the last_name column in the SELECT list of this SQL statement. We know that last_name must be 'SMITH' for the entire result set because of the WHERE clause.
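The trimmed version simply omits the known value from the SELECT list:

 SELECT position, empno
 FROM   employee
 WHERE  last_name = 'SMITH';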

Returning only what is needed does not apply only to columns. You should also minimize the number of rows to be returned by coding the proper WHERE clauses for every SQL statement. The more data that can be filtered out of the result set by the DBMS, the more efficient the query will be because less data must be returned to the requester.

Sometimes application programmers avoid coding appropriate WHERE clauses in a misguided attempt to simplify SQL statements. The more information the optimizer has about the data to be retrieved, the better the access paths it formulates will be. A sure sign of potential abuse is finding a SQL statement embedded in an application program that is immediately followed by a series of IF-THEN-ELSE statements. Try to tune the query by moving the IF-THEN-ELSE statements into SQL WHERE clauses.
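For example, if a program retrieves all employees and then uses IF-THEN-ELSE logic to keep only the managers in one department, that filtering belongs in the query itself (a sketch; the predicate values are hypothetical):

 -- Filter in the WHERE clause instead of in application logic
 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id  = 'A01'
 AND    position = 'MANAGER';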

Conclusion

And so concludes the first four of our 12 SQL rules of thumb… tune in next time for the next 4 rules (numbers 5 thru 8) in our dozen guidelines for developing SQL that works and performs well…

Posted in DBA, Rules of Thumb, SQL | 1 Comment

Designing Applications for Relational Access

Application design is an important component of assuring relational database performance and efficiency. When performance problems persist it may become necessary to revisit application design. But redesigning and re-coding your applications can be time-consuming and costly, so it is better to properly address good design from the outset.

Design issues to examine when application performance suffers include:

  • Type of SQL. Is the correct type of SQL (planned or unplanned, dynamic or static, embedded or stand-alone) being used for this particular application?
  • Programming language. Is the programming language capable of achieving the required performance, and is the language environment optimized for database access?
  • Transaction design and processing. Are the transactions within the program properly designed to assure ACID properties, and does the program use the transaction processor of choice appropriately and efficiently?
  • Locking strategy. Does the application hold the wrong type of locks, or does it hold the correct type of locks for too long?
  • COMMIT strategy. Does each application program issue SQL COMMIT statements to minimize the impact of locking?
  • Batch processing. Are batch programs designed appropriately to take advantage of the sequential processing features of the DBMS?
  • Online processing. Are online applications designed to return useful information and to minimize the amount of information returned to the user’s screen for a single invocation of the program?

Posted in DBA, performance | Leave a comment

Transforming Mainframe Data Management

Just a quick post today to let my readers — specifically those in the US Midwest — know that I will be presenting at an executive briefing at the River Forest Country Club in Elmhurst, IL (a Chicago suburb) on August 23, 2016. The event will address the challenges of data management for the digital economy and will look at issues such as:

  • Simplified data management with intelligent automation that requires no downtime.
  • Increased application performance.
  • Data integrity of structured and unstructured data.
  • The ability to easily manage growing databases.

My presentation will take a look at the trends and challenges being faced by data professionals as they attempt to manage the burgeoning amount of data that modern organizations collect, manage, and turn into actionable intelligence. I will be joined at the event by other great speakers, including Sheryl Larsen and John Barry, who will talk about new technology and innovations from BMC Software to meet those challenges, including BMC’s Next Generation Technology for DB2 utility management.

So if you live in the Midwest area, be sure to register for this event and check up on the latest data trends and techniques for dealing with modern data management on the mainframe.

Posted in data, DB2, mainframe, tools | Leave a comment