News from IBM InterConnect 2017

This week I am in Las Vegas for the annual IBM InterConnect conference. IBM touts the event as a way to tap into the most advanced cloud technology in the market today. And that has merit, but there is much more going on here.

If I had to summarize the theme of InterConnect I would say that it is all about cloud, IBM’s Watson, and DevOps. This is evident in terms of the number of sessions being delivered on these topics, as well as the number of vendors and IBM booths in the concourse area devoted to these topics.

But the highlight of the conference for me, so far, was Ginni Rometty’s keynote address on Tuesday. She was engaging and entertaining as she interacted with IBM customers and partners to weave the story of IBM’s cloud and cognitive computing achievements. The session is available to for replay on IBMGO and it is well worth your time to watch it if you are at all interested in how some of the biggest and most innovative organizations are using IBM technology to gain competitive advantage.

And let’s not forget that Will Smith – yes, that Will SmithWill Smith – was part of the general session on Monday. Not surprisingly, he was intelligent and amusing calling himself an African-American Watson as he described how he used primitive data analytics to review the types of movies that were most successful as he planned his acting career. My favorite piece of advice he offered was something that he learned as he moved from music to acting. When he was asked if he had ever acted before (he hadn’t) he said “Of course,” and it led to him getting case in the mega-hit sitcom The Fresh Prince of Bel-Aire. His advice? “If someone asks if you have ever done something just say ‘yes’ and figure it out later.” He had a lot more to say, but let me send you here if you are interested in reading more about Will.

Of course, there is a lot more going on here than just what is happening in the keynote and general sessions. Things I’ve learned this week include:

  • DevOps is as much about business change as technology change
  • The largest area of growth for DevOps is now on the mainframe (according to Forrester Research)
  • Some companies are bringing college grads up to proficiency in mainframe COBOL in less than a month using a modern IDE
  • Networking is the hidden lurking problem in many cloud implementations
  • The mainframe is not going away (I knew this, but it was good to hear a Forrester analyst say it)
  • And a lot more

But that is enough for now. So to conclude, I’d like to end with a quote from Ginni Rometty that I think all of us in IT should embrace: “Technology is never good or bad; it is what you do with it that makes a difference.”

Let’s all get to work and do good things with technology!

Posted in cloud, DevOps, IBM, Watson | Leave a comment

Inside the Data Reading Room – Analytics Edition

If you are a regular reader of this blog you know that, from time-to-time, I review data-related books. Of course, it has been over a year since the last book review post, so this post is long overdue.

Today, I will take a quick look at a couple of recent books on analytics that might pique your interest. First up, is The Data and Analytics Playbook by Lowell Fryman, Gregory Lampshire and Dan Meers (2017, Morgan Kaufmann, ISBN 978-0-802307-5).

This book is written as a guide to proper implementation of data management methods and procedures for modern data usage and exploitation.

The first few chapters lay the groundwork and delve into the need for a new approach to data management that embraces analytics. Then, in Chapter 3, the authors guide the reader through steps to assess their current conditions, controls and capabilities with regard to their data. The material here can be quite helpful to assist you in gauging where your organization falls in terms of data maturity. Chapter 4, which chronicles the detailed activities involved in building a data and analytics framework comprises about a quarter of the book and this chapter alone can give you a good ROI on your book purchase.

Chapter 8 is also well done. It addresses data governance as an operations process, giving advice and a framework for successful data governance. If you are at all involved in your organization’s data management and analytics practice, do yourself a favor and grab a copy of this book today.

The second book I will cover today is a product-focused book on IBM’s Watson Analytics product. Most people have heard of IBM’s Watson because of the Jeopardy challenge. But if your only knowledge of Watson is how it beat Jeopardy champions at the game several years ago, then you need to update what you know!

So what book can help? How about Learning IBM Watson Analytics by James D. Miller (2016, Packt Publishing, ISBN 978-1-78588-077-3)?

This short book can help you to understand what Watson can do for your organization’s analytics. It shows how to access and configure Watson and to develop use cases to create solutions for your business problems.

If you are a nascent user of Watson, or are just looking to learn more about what Watson can do, then this is a superb place to start. Actually, if you learn best through books, then this is the only place to start because it is currently the only book available on IBM Watson Analytics.

As with any technology book that walks you through examples and screen shots, as the product matures over time, things may look different when you actually use Watson. But that is a small issue that usually won’t cause distraction. And with all of the advice and guidance this book offers in terms of designing solutions with Watson, integrating it with other IBM solutions, and more, the book is a good place to start your voyage with Watson.

Hopefully, you’ll find these two books as interesting and worthwhile as I did!

Posted in analytics, book review, books, DBA, Watson | Leave a comment

Time to Plan Your Trip to IBM InterConnect 2017

I am looking forward to attending this year’s IBM InterConnect conference in Las Vegas, NV the week of March 19-23, 2017. And after reading my blog post today I bet you will be interested in attending, too!


The first thing you will notice is that IBM InterConnect covers a plethora of technical topics, including some of the hottest and most important for your business. If you attend the conference you can learn abour Hybrid Cloud, Process Transformation, Integration, Internet of Things, DevOps, IT Service Management, Security, Data Management, and more.  There are educational presentations as well as hands-on sessions that allow you to build your conference experience how you’d like, using the learning techniques that best suit you.

And there are a lot of learning opportunities! IBM InterConnect has over 2,000 sessions, 200 exhibitors, hundreds of labs, certification opportunities, as well as the ability to network with other IT professionals from all around the world.

For me, there are several sessions that I’m very much looking forward to attending. The Continuous Delivery keynote on March 20th promises to inform and educate on DevOps best practices including a roadmap for IBM’s UrbanCode. On Tuesday I’m excited about the session on How Watson “Really Works… I mean who wouldn’t be interested in learning more about the AI, natural language and machine learning capabilities of IBM Watson? And Wednesday offers an intriguing session for mainframers like me – “Why z/OS is a Great Platform for Developing and Hosting APIs.”

Of course, there are a lot of additional sessions that I plan to attend, but I doubt anybody is interested in a rundown of my entire agenda. Especially with so much variety and choice available to attendees this year. And if you get stuck choosing from all the great sessions that are available, this year you can solicit Watson’s help recommending sessions as you build your agenda. I tried it and it was interesting and helpful to see what Watson chose for me.

So take a look at what IBM InterConnect has to offer this year. And if you plan on attending I hope we get a chance to meet and discuss our experiences at the conference. See you in Vegas!

Posted in certification, cloud, education, enterprise computing, IoT | Leave a comment

Data Technology Today’s 2016 Year in Review

Well, another year has come and gone and I thought it might be interesting to share a bit about this blog’s activity in 2016. It was an active year that saw 17 new posts, down a bit from 2015, but still averaging more than a post a month.

Posts on the blog were viewed 47,264 times by 36,870 visitors, meaning each visitor averaged 1.28 views.

The most popular post in 2016 was actually first posted in 2011: An Introduction to Database Design: From Logical to Physical was viewed 10,575 times in 2016. Obviously database design is an interesting topic — at least for the reader’s of this blog!

The second most popular post in 2016 was On The Importance of Database Backup and Recovery, which was first posted in 2014.  The most popular post actually posted in 2016 was published in December, late in the year to lead the year, but evidently people are interested in A Useful Guide to Data Fundamentals from Fabian Pascal. As well they should be!

And the blog gets read all over the world, as shown in the Top Ten Countries visiting in 2016 below:


Yes, most of my readers are from the United States, but I’m proud of the following I have in India (and across the world).

So to end this brief synopsis of 2016, thank you to all of my regular readers – please keep visiting and suggesting more topics for 2017 and beyond. And if this is your first visit to the blog, welcome. Take some time to view the historical content – there are several informative posts that are popular every year… and keep checking back for new content on data, database, and related topics!

Posted in backup & recovery, DBA, review | Leave a comment

A Good Start for Your SQL Library

Every professional programmer (and DBA) should have a library of books on SQL fundamentals. There are many SQL titles to choose from, and a lot of them are very good. But you can’t buy them all unless you are independently wealthy. So this blog post will highlight the first four SQL books that should be on every database professional’s bookshelf.

The first SQL book is SQL Performance Tuning by Peter Gulutzan and Trudy Pelzer. This well-written book provides a treasure trove of tips for improving SQL performance on all of the major database systems. It does not teach SQL syntax, but instead helps the reader to understand the differences between the most popular DBMS products, including Oracle, DB2, SQL Server, Sybase ASE, MySQL, Informix, Ingres, and even InterBase.

Throughout this book the authors present and test techniques for improving SQL performance, and grade each technique for its usefulness on each of the major DBMSs. If you deal with heterogeneous database implementations this book will be a great assistance, whether you are a programmer, consultant, DBA, or technical end user. The contents of this book can help you to decide which tuning techniques will work for which DBMS.

My next SQL book recommendation is altogether different in purpose than the first. It is SQL in a Nutshell, 3rd edition by Kevin Kline, Daniel Kline, and Brand Hunt. This book offers a great cross-platform syntax reference for SQL. It probably is not the easiest reference to use for finding the exact syntax for one particular DBMS; but it is absolutely the best reference for those who work with multiple DBMSs.  Be sure to get the 3rd edition, which is up-to-date and offers more depth than the previous editions.

Next up is The Art of SQL by Stephane Faroult, which is a guide to SQL written using the approach of “The Art of War” by Sun-Tzu. The author actually uses the exact same title chapters for The Art of SQL that Sun Tzu used in The Art of War. Amazingly enough, the tactic works.

Consider, for example, the chapter titled “Laying Plans,” in which Faroult examines how to design databases for performance. As anyone who ever built database applications knows an improperly designed database can be the biggest impediment to flawless application performance. The chapter titled “Tactical Dispositions” covers the topic of indexing and in “The Nine Situations” the author examines several classic SQL patterns and how best to approach them.

This book is not for a novice who wants to learn SQL from scratch. The authors assume the reader is conversant with SQL as they describe how to apply SQL in a practical manner. If you can’t code an outer join or don’t know what a nested table expression or in-line view is, then this is not the book for you.

Neither is the book a list of SQL scripts that you can pluck out and use. Instead, the book skillfully manages to explain how to properly attack the job of coding SQL to effectively and efficiently access your data.

The Art of SQL skillfully manages to explain how to properly attack the job of coding SQL to effectively and efficiently access your data. The book offers best practices that teach experienced SQL users to focus on strategy rather than specifics.” You know, if Sun Tzu coded SQL, he might have written a book like “The Art of SQL”. But since Sun Tzu is dead, I’m glad Stephane Faroult was around to author this.

The final SQL book recommendation is the latest edition of Joe Celko’s SQL for Smarties, Fifth Edition: Advanced SQL Programming. Celko was a member of the ANSI SQL standards committee for ten years, and is highly qualified to write such a text. The latest edition of this fine book is the 5th edition, which was completely revised (in 2015) and boasts over 800 pages of advanced SQL programming techniques. If you have any of the past editions of this book, you owe it to yourself to get the newly revised fourth edition.

This book offers tips, techniques, and guidance on writing effective, sometimes complex, SQL statements  using ANSI standard SQL. It touches on topics ranging from database design and normalization to using proper data types to grouping and set operations, optimization, data scaling, and more. Every developer who codes SQL statements for a living will find something useful in SQL for Smarties!

These four books, properly used, can turn a fledgling SQL developer into an expert – and they can assist even the expert to become a better user of SQL.

Posted in books, DBA, performance, SQL | Leave a comment

A Useful Guide to Data Fundamentals from Fabian Pascal

Today’s blog post is a quick review of Fabian Pascal’s latest book The DBDEBUNK Guide to Misconceptions and Data Fundamentals. Those of you who know Pascal will know what to expect — a no-holds barred treatise on the fundamentals of data and database management focusing on the relational model and its benefits. But also you will get a true understanding of what relational means. You see, what is commonly called a relational DBMS more accurately should be referred to as a SQL DBMS. If that doesn’t make sense to you then you are exactly the type of reader who should buy and read this book.

The book is self-published, so it is only available at Mr. Pascal’s website, But the material is well-written and laid out in an easy-to-consume fashion. This comes as no surprise given Pascal’s extensive history and background writing about relational theory and databases (multiple decades of books, articles, blogs, etc. to his credit).

What makes this book special? Well, the material is culled from the the DATABASE DEBUNKINGS website. There are 50 chapters each exposing a common misconception or misunderstanding about data and relational fundamentals. Pascal shows the misconception and then clearly explains the problems and implications that arise because of it. Furthermore, he goes on to provide an explanation of the correct way. This approach, once you get comfortable with it, offers a sound format for exposed fallacies and correcting people’s misunderstanding of the issues.

As the foreword  notes, there is an Index of Misconceptions rather than a a Table of Contents. You can use that index to seek out sections of the book that focus on specific misconceptions, such as on keys or unstructured data.

Also nice is the list of further reading provided in each section. If you read (and understand) not only this book, but also all of the referenced materials you will probably know more about data and database systems than most people working as data analysts and DBAs.

If your job requires you to manage, access or manipulate data, you would do well to read Pascal’s Guide. Even if you think you are an expert on relational theory and data fundamentals I am sure that the information in this powerful tome will offer up useful information. And for most people, who think they know more than they actually do, this book will deliver a wealth of knowledge that will serve you well as you progress with your IT career.

Posted in book review, DBA, relational | 1 Comment

A Database Performance Tuning Primer

In a recent series of posts to this blog I outlined a dozen SQL rules of thumb to follow as you design and code your database applications. These articles can be found at the following links:

Rules of thumb are great but let’s face it, we don’t always follow them. And then we get performance degradation. That means we have to track down the root cause of the problem, analyze the situation and apply corrective measures. This can be difficult without a performance management tool or two.

Database performance management tools can be used to identify bottlenecks and points of contention. These tools can help to show locking, I/O, CPU and other problems as your applications are running; or historically at a point in time if you have appropriate traces started and/or have implemented a performance data warehouse.

Such tools can be beneficial for determining where resources are being used. For example, do you know which programs consume the most database resources? Which consumes the most CPU, takes the most I/Os, issues the most locks, deletes the most data, etc.? Performance monitors can help you to accurately identify all of these issues. You can monitor system and database workload and throughput and manage the resources being used to satisfy your workload.

Database performance management tools can be broken up into three specific functional groupings based on what is being monitored and managed. Some of these tools offer functionality in only one of the following areas, whereas others deliver combined functionality:

  1. The DBMS. Tools that manage DBMS performance are focused more on database system resources and performance. They provide information about performance at the system level, showing things like CPU and memory consumption, buffer usage, locking issues, storage and other hardware information, configuration parameters, and other details.
  2. Database structures. The design of databases, tables and indexes can also impact database performance. Issues that must be monitored and managed include the physical design of the database, number of tables, index design and DDL, disk usage, and so on. How the data is organized must also be managed. And as data is modified in the database, its efficiency will degrade. Reorganization and defragmentation are required to periodically remedy disorganized data.
  3. SQL and application code. Coding efficient SQL statements can be complicated because there are many different ways to write SQL that return the same results. But the efficiency and performance of each formulation can vary significantly. DBAs need tools that can monitor the SQL code that’s being run, show the access paths it uses and provide guidance on how to improve the code.

I recently wrote a four part series of articles for TechTarget on database performance management offerings that delves into more details on these issues. If interested, you can check them out at the following links:

Good luck as you embark upon your database performance tuning and management efforts!


Posted in performance | Leave a comment

Managing Database Application Performance

A database application requires constant interaction between various disparate computing resources to operate efficiently and according to specifications. Realistically, the tuning of a database application can be broken down into three components: system tuning, database tuning, and application tuning. All of these are related, and database performance management requires an integrated approach to tuning across each of these areas.

The DBA needs to understand how the DBMS interacts with the server hardware, the operating system, and any other required software. Tuning and configuring these components and connections properly can have a dramatic impact on system performance. Therefore, system tuning occurs at the highest level and can have the greatest impact on the overall health of database applications. The reason for this is that every database application runs on the overall system. No amount of tuning is going to help a database application when the server it is running on is short on resources or improperly installed. Memory configuration is perhaps the single most important aspect of system tuning. Without sufficient memory, the entire DBMS system can grind to a halt. And with today’s modern, in-memory database systems, memory management becomes even more crucial!

Another significant system bottleneck can be the database log. Every database modification is logged by the DBMS; an inefficient log can impact every application that modifies data. System tuning also includes managing and connecting other systems software with which the DBMS interacts, including the operating system, networking software, message queuing systems, middleware, and transaction processors.

Database application performance can be impacted by the physical design of the database, including normalization issues, how the database is stored on disk, number of tables, index design, and proper usage of DDL and its associated parameters. The actual physical location of database files on disk systems will have an impact on the performance of applications accessing the data. As more data is stored on the same disk device, the possibility of concurrent access and performance degradation increases. As data is inserted, updated, and deleted from the database, the efficiency of the database will degrade. Additionally, the files used to hold the data of the database may need to expand as data is added. Or perhaps, additional files, or file extents, will need to be allocated. Both disorganization and file growth can degrade performance. Indexes also need to be monitored, analyzed, and tuned to optimize data access, and to ensure that they are not having a negative impact on data modification.

The application code itself must be designed appropriately and monitored for efficiency. In fact, most experts agree that as much as 70 to 80 percent of performance problems are caused by improperly coded database applications. SQL is the primary culprit. Coding efficient SQL statements can be complicated. Developers need to be taught how to properly formulate SQL statements and SQL statements must be constantly monitored and tuned.

Programmers need to be schooled in the practice of examining SQL access paths. Every relational DBMS allows the programmer to request information on how the database optimizer will satisfy each query. Will an index be used? In what order will the tables be joined? Will the query be broken up into parallel tasks or not? These and many other factors influence the efficiency of SQL. Not all application problems are due to improperly coded SQL. The host language application code in which the SQL has been embedded also can be inefficient, causing database application performance to suffer.

Managing the performance of your database applications requires in-depth monitoring. Be sure to allocate an appropriate budget to acquire performance management tools to ensure the efficiency of your database systems.

Posted in DBA, performance, SQL, tools | Leave a comment

A Dozen SQL Rules of Thumb, Part 3

Today we pick up our three-part series of SQL rules of thumb (ROTs) with the third and final installment… You can think of these rules as general guiding principles you should follow as your write SQL statements… and we start off today’s post with rule #9…

Rule 9: Know What Works Best

The flexibility of SQL allows the same process to be coded in multiple ways. However, one way of coding usually provides better performance than the others. The DBA should understand the best way to code SQL for each DBMS in use. Furthermore, the DBA should provide information on proper query formulation for performance to the application development staff.

Keep in mind that these rules are DBMS-specific. By that I mean, one way of writing SQL might perform well on SQL Server but not on Oracle, which performs better with a different form of SQL returning the same data.

Rule 10: Issue Frequent COMMITs

When coding programs to run as batch transactions, it is important to issue regular SQL COMMIT statements. The COMMIT statement hardens modifications to the database. When a COMMIT is issued, locks on the modified database objects and data can be released. If you write programs that make a lot of changes, but do not issue periodic COMMITs, then you will be locking data and negatively impacting concurrent access to the data. I call this Bachelor Programming Syndrome (you know, fear of committing).

An additional consideration for Oracle DBAs is the impact of a COMMIT on the rollback segments. Rollback segments are used by Oracle to store completed transactions before the changes are actually written to the table. When you issue a COMMIT in Oracle, not only is the data finalized to the table but the contents of the rollback segment are removed, too. Oracle rollback segments are used to store before images of the data in case transactions are rolled back before changes are committed.

As a DBA you must ensure that application developers issue enough COMMIT statements to minimize the impact of locking on availability and (for Oracle) to keep rollback segments to a manageable size.

Rule 11: Beware of Code Generators

Beware of application code generators and similar tools that automatically create SQL. Many of these tools use gateways that require each SQL statement to be recompiled and optimized each time it is requested. However, some gateways provide a caching mechanism to store compiled and optimized SQL on the server. Such a cache can be help to improve performance for frequently recurring SQL statements.

Additionally, many code generators create accurate SQL, but not necessarily efficient SQL. So you might need to tweak the SQL that is generated (if that is even possible).

Rule 12: Consider Stored Procedures

Performance degradation due to repeated network traffic can be minimized by using a stored procedure because only a single request is needed to execute it. Within the stored procedure, multiple SQL statements can be issued, and the results processed and sent to the requesting program or user. Without the stored procedure, each of the multiple SQL statements, as well as all of the results, would have to be sent across the network. Additionally, SQL in stored procedures may perform better than the same SQL outside of the stored procedure if the DBMS parses and compiles the statements before run time.


These twelve SQL rules of thumb across three blog posts provide a sound basis for SQL development. Of course, they offer high-level guidance and are not the only things that you need to be aware of, and follow, as you strive to build efficient database applications.

So start here, but expand your knowledge base and keep learning about how you can write effective and efficient SQL for your database applications!


Posted in DBA, performance, SQL | 1 Comment

A Dozen SQL Rules of Thumb, Part 2

Catch Phrases 47

Today’s blog post picks up where we left off in our three-part series of rules of thumb (ROTs) that apply generally to SQL development regardless of the underlying DBMS.

These are the general guiding principles by which your SQL development should be guided… and we start off today’s post with rule #5…

Rule 5: Avoid Cartesian Products

Be sure to code predicates matching the columns of every table that will be joined within each SQL statement. Failure to do so will result in performance degradation and possibly incorrect results.

Whenever predicates do not exist when you are joining two tables, the RDBMS must perform a Cartesian product. This is the combination of every row of one table with every row of the other table. Non-matching rows are not eliminated because there is nothing that can be matched. The results of a Cartesian product are difficult to interpret and contain no information other than a simple list of all rows of each table munged together.

Of course, there are exceptions to this rule where you are specifically coding a Cartesian product for a business or technical reason. When coding a Cartesian product on purpose always make sure that you have investigated other methods of producing the same results and tested each method for performance and accuracy… and be sure to specifically test any query with a Cartesian product using production volumes of data!

Rule 6: Judicious Use of OR

The OR logical operator can be troublesome for performance. If you can convert a SQL statement that uses OR to one that uses IN, it is likely that performance will improve. For example, consider changing this SQL statement:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    position = ‘MANAGER’
 OR     position = ‘DIRECTOR’
 OR     position = ‘VICE PRESIDENT’
 ORDER BY position;

to this:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 ORDER BY position;

Of course, your results may vary depending on the DBMS in use and the nature of the data.

Rule 7: Judicious Use of LIKE

The LIKE logical operator is another troublesome beast. It is very easy to create performance problems when using LIKE in SQL. For example, consider the following SQL statement:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE ‘%X’
 ORDER BY position;

This query will return employee information for all employees working in any department where dept_id ends in ‘X’. However, the relational optimizer will have to scan the data in order to resolve this query—there is no way to use an index. Because the high-order portion of the column is not known, traversing a b-tree index structure is impossible.

You might be able to use your knowledge of the data to rewrite this query without a leading wild-card character (%). For example, perhaps all dept_id values start with either ‘A’ or ‘B’. In that case, you could modify the SQL as follows:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE ‘A%X’
 OR     dept_id LIKE ‘B%X’
 ORDER BY position;


In this case, the DBMS may be able to use a non-matching index scan if an index exists on the dept_id column. Scanning the index may be more efficient than scanning the entire table, but even then you will be scanning data, which is not usually very efficient.

Once again, your results will vary with the DBMS in use and the nature of the data accessed.

Rule 8: Avoid Sorts When Possible

Sorting data is an inhibitor of optimal performance in SQL queries. Your DBMS will sort data as needed to satisfy your database requests. The types of operations that usually require sorting of some form are ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT, and EXCEPT. When performance is important, remember to look for sorts and find ways to eliminate them. You might be able to create an index to avoid sorting, or to use an alternate syntax if duplicate elimination is not important (e.g. UNION ALL versus UNION).

Keep in mind that sorting is an I/O intensive operation and can degrade query performance, sometimes significantly. When performance is important, remember to look for sorts and find ways to eliminate them.


And so we come to the end of part 2 in our 3 part series offering up 12 SQL rules of thumb… tune in next time for the final 4 rules (numbers 9 thru 12) in our dozen guidelines for developing effective and efficient SQL!

Posted in DBA, performance, SQL | 3 Comments