Happy New Year 2019

Just a quick post today to wish everybody out there a very Happy New Year!


I hope you have started 2019 off with a bang and that the year is successful and enjoyable for one and all!

Posted in Happy New Year | Leave a comment

FaunaDB: A multi-model, distributed database system with ACID consistency

Although relational, SQL database systems continue to dominate the DBMS market, modern database management has shifted to encompass additional types of database systems. This is exemplified by the rise of NoSQL database systems, which serve the needs of modern applications that are not well served by existing relational, SQL database systems.

What used to be rather simple – choosing from three or four market-leading SQL DBMS products – has become confusing and difficult as you try to make sense of the morass of different DBMS types and offerings on the market.

A Multi-Model Approach

Well, one solution to avoid the confusion is to select a multi-model DBMS offering. A multi-model database system supports multiple types of database models, such as relational, document, graph, wide column, and key/value. FaunaDB is an example of a multi-model DBMS capable of managing both relational and NoSQL data, and designed to support modern, scalable, real-time applications.

FaunaDB combines the scale and flexibility of NoSQL with the safety and data integrity of relational systems. The company refers to this as Relational NoSQL. Unlike many NoSQL database systems, FaunaDB delivers ACID compliance. You can scale transactions across multiple shards and regions while FaunaDB guarantees the accuracy and integrity of your data and transactions.

FaunaDB enables your developers to write sophisticated transactions using languages they already know. And you can pull data from document, relational, graph, and temporal data sets all from within a single query.

Since FaunaDB is NoSQL, you won’t be using SQL to access databases. The Fauna Query Language (FQL) is the primary interface for interacting with a FaunaDB cluster. FQL is not a general-purpose programming language, but it provides for complex manipulation and retrieval of data stored within FaunaDB. The language is expression-oriented: all functions, control structures, and literals return values. This makes it easy to group multiple results together by combining them into an Array or Object, or map over a collection and compute a result – possibly fetching more data – for each member.
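To give a feel for FQL, here is a minimal sketch using Fauna’s Python driver (the faunadb package). The secret is a placeholder, the “spells” collection and “all_spells” index are hypothetical, and helper names such as q.collection can vary between driver versions, so treat this as illustrative rather than definitive.

from faunadb import query as q
from faunadb.client import FaunaClient

# Connect with an access key secret (placeholder value).
client = FaunaClient(secret="YOUR_FAUNA_SECRET")

# Fetch a single document by reference (hypothetical collection and id)...
spell = client.query(
    q.get(q.ref(q.collection("spells"), "181388642046968320"))
)

# ...or page through every reference matched by a hypothetical index.
# Because FQL is expression-oriented, each call is just an expression whose
# value comes back from a single round trip to the cluster.
page = client.query(q.paginate(q.match(q.index("all_spells"))))
print(spell, page)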

A query is executed by submitting it to a FaunaDB cluster, which computes and returns the result. Query execution is transactional, meaning that no changes are committed when something goes wrong. If a query fails, an error is returned instead of a result.

FQL supports a comprehensive set of data types in four categories: simple types, special types, collection types and complex types. A simple data type is one that is native to FaunaDB and also native to JSON, such as Boolean, Null, Number and String. Special data types in FaunaDB extend the limited set of native JSON data types: Bytes, Date, Query, Ref, Set and Timestamp. A complex data type is a composite of other existing data types, such as an Object or Instance. And collection data types, such as Array and Page, handle multiple items while maintaining order.

Consistency

Perhaps the most impressive aspect of FaunaDB is how it enables strict serializability for external transactions. By supporting serializable isolation, FaunaDB can process many transactions in parallel, but the final result is the same as processing them one after another. The FaunaDB distributed transaction protocol processes transactions in three phases:

  • In the first, speculative phase, reads are performed as of a recent snapshot, and writes are buffered.
  • The second phase uses a consensus protocol to insert the transaction into a distributed log. At this point, the transaction gets a global transaction identifier that indicates its equivalent serial order relative to all other concurrent transactions. This is the only point at which global consensus is required.
  • Finally, the third phase checks each replica to verify the speculative work. If there are no potential serializability violations, the work is made permanent and the buffered writes are applied to the database. Otherwise, the transaction is aborted and restarted.
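To make the three phases a little more concrete, here is a toy, single-process sketch of the same flow – speculative reads with buffered writes, a totally ordered log standing in for consensus, and a verification step before anything becomes permanent. It is only an illustration of the idea, not Fauna’s implementation.

class ToyStore:
    """Toy illustration of the three-phase flow (not Fauna's implementation)."""

    def __init__(self):
        self.data = {}      # key -> committed value
        self.versions = {}  # key -> version of the last committed write
        self.log = []       # totally ordered log (stand-in for consensus)

    def begin(self):
        # Phase 1: speculative phase -- remember the versions we read and
        # buffer writes instead of applying them.
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        txn["reads"][key] = self.versions.get(key, 0)
        return self.data.get(key)

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered, not yet visible to anyone

    def commit(self, txn):
        # Phase 2: append to the log; the position acts as the global
        # transaction identifier that fixes the serial order.
        txn_id = len(self.log)
        self.log.append(txn)

        # Phase 3: verify the speculative work. If anything we read changed
        # since our snapshot, abort so the caller can retry.
        for key, version in txn["reads"].items():
            if self.versions.get(key, 0) != version:
                return txn_id, False

        for key, value in txn["writes"].items():
            self.data[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return txn_id, True


store = ToyStore()
t = store.begin()
balance = store.read(t, "balance") or 0
store.write(t, "balance", balance + 100)
print(store.commit(t))  # (0, True) on an uncontended first attempt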

This software approach is novel and allows for the scaling of transactions across multiple shards and regions while guaranteeing transactional correctness and data accuracy. Contrast this with other database systems, such as Google Spanner, that rely on distributed clock synchronization to ensure data consistency.

The FaunaDB approach is based on a 2012 Yale University paper titled “Calvin: Fast Distributed Transactions for Partitioned Database Systems.” You can download that paper here. And if you are interested in additional details, consult this blog post: Consistency without Clocks: The FaunaDB Distributed Transaction Protocol.

Multi-Tenancy

Many database systems provide multi-tenant capabilities. They can contain multiple databases, each with its own access controls. FaunaDB takes this further by allowing any database to have multiple child databases. This enables an operator to manage a single large FaunaDB cluster, create a few top-level databases, and give full administrative access to those databases to the associated teams. Each team is free to create as many databases as they need without requiring operator intervention. As far as the team is concerned, they have their own full FaunaDB cluster.
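Provisioning a child database and handing it off is itself just a query. Here is a hedged sketch with the Python driver; the database name, secrets, and role are placeholders chosen for illustration.

from faunadb import query as q
from faunadb.client import FaunaClient

# An admin-level client for the parent database (placeholder secret).
admin = FaunaClient(secret="PARENT_ADMIN_SECRET")

# Create a child database and an admin key scoped to it. The team holding
# that key can create its own databases and keys beneath it without any
# further operator involvement.
admin.query(q.create_database({"name": "team_payments"}))
key = admin.query(
    q.create_key({"database": q.database("team_payments"), "role": "admin"})
)
team_client = FaunaClient(secret=key["secret"])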

Temporality

Strong temporal support is an additional capability of FaunaDB. Traditionally, a database system stores only data that is valid at the current point-in-time; it does not track the past state of the data. Most data changes over time, and different users and applications can have requirements to access that data at different points in time. Temporal support makes it possible to query data “as of” different past states.

All records in FaunaDB are temporal. When instances are changed, instead of overwriting the prior contents, a new instance version at the current transaction timestamp is inserted into the instance history, and marked as a create, update, or delete event. This means that with FaunaDB, all reads can be executed consistently at any point in the past or transformed into a change feed of events between any two points in time. This is useful for many different types of applications, such as auditing, rollback, cache coherency, and others.
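In the Python driver, temporal access looks roughly like the sketch below – reading a document as of a past timestamp and paging through its event history. The collection, document id, and timestamp are made up for illustration.

from faunadb import query as q
from faunadb.client import FaunaClient

client = FaunaClient(secret="YOUR_FAUNA_SECRET")
doc_ref = q.ref(q.collection("accounts"), "1")  # hypothetical document

# Read the document as it existed at the end of 2018...
as_of = client.query(q.at(q.time("2018-12-31T23:59:59Z"), q.get(doc_ref)))

# ...or turn its history into a page of create/update/delete events.
history = client.query(q.paginate(q.events(doc_ref)))
print(as_of, history)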

Strong Security

Data protection and security have become more important as data breaches continue to dominate the news. Regulation and data governance practices dictate that organizations implement strong protective measures for sensitive data.

FaunaDB implements security at the API level. Access to the FaunaDB API uses access keys, which authenticate connections as having particular permissions. This access key system applies to administrator- and server-level connections, as well as to object- and user-level connections.

In other words, reading or writing instances of user-defined classes in FaunaDB requires a server key, or an instance token with appropriate permissions.

Delivery Models

FaunaDB can run anywhere you need it to run: on-premises, in your cloud, the public cloud, even multiple clouds. Basically, FaunaDB can run anywhere you can run a JVM.

The FaunaDB Serverless Cloud enables developers to implement and elastically scale cloud applications with no capacity planning or provisioning. FaunaDB Cloud provides essential features that enable developers to safely build and run serverless applications without configuring or operating infrastructure.

The serverless approach uses an event-driven architecture where developers code functions and deploy them to the infrastructure. The functions only consume resources when they are invoked, at which point they run within the architecture. A serverless architecture is conducive to modern development practices because it can eliminate many of the difficulties developers face reconciling their database infrastructure with today’s development methods.

Summing Things Up

Prior to founding Fauna in 2012, the Fauna team was part of the group that developed the infrastructure at Twitter. And FaunaDB is already being used at many leading enterprises. Check out the write-ups about FaunaDB usage at NVIDIA, ShiftX, and VoiceConnect. Others are available at Fauna’s web site.

So, if you are looking for a multi-model, secure NoSQL database platform with strong consistency, horizontal scalability, multi-tenancy and temporal capabilities, that can run on-premises and in the cloud, consider taking a look at FaunaDB.

Posted in cloud, data availability, DBMS, Isolation Level, NoSQL, relational, temporal | Leave a comment

SQL Performance and Optimization

Just a quick post today to refer my readers to a series of blog posts that I recently made to the IDERA database community blog.

This four-part blog series took a look into SQL performance and optimization from a generic perspective.  By that I mean that I did not focus on any particular DBMS, but on the general things that are required of and performed during the optimization of SQL.

Part one – Relational Optimization – introduces and explains the general concept of relational optimization and what it entails.

Part two – Query Analysis and Access Path Formulation – examines the process of analyzing SQL queries and introduces the types of access that can be performed on a single table.

Part three – Multiple Table Access Methods – takes a look at optimization methods for combining data from more than one table.

And finally, part four – Additional Considerations – concludes the series with an overview of several additional aspects of SQL optimization that we have yet to discuss.

If you are looking for a nice overview of SQL and relational optimization without DBMS-specific details, give these posts a read!

 

Posted in optimization, performance, SQL | 4 Comments

My Data Quotes – 2018

I am frequently approached by journalists and bloggers for my thoughts on the data-related news of the day… and I am usually happy to discuss data with anybody! Some of these discussions wind up getting quoted in news articles and posts. I like to try to keep track of these quotes.

With that in mind, I thought I’d share the articles where I have been quoted (so far) in 2018:

I may be missing some, so if you remember chatting with me last year and you don’t see your piece listed above please ping me to let me know…

And if you are interested in some of the older pieces where I’ve been quoted I keep a log of them on my web site at mullinsconsulting.com/quoted.html.  (Note, some of the older articles/posts are no longer available, so some of the links are inoperable.)

Posted in DBA | Leave a comment

Teradata Analytics Universe 2018 and Pervasive Data Intelligence

I spent last week in Las Vegas at the Teradata Analytics Universe conference, Teradata’s annual user conference. And there was a lot to do and learn there.

 


Attendees heading to the Expo Hall at the Teradata Analytics Universe conference in Las Vegas, NV — October 2018

 

The major message from Teradata is that the company is a “new Teradata.” And its marketing message is “Stop buying analytics,” which may sound strange at a conference with analytics in its name!

But it makes sense if you listen to the entire strategy. Teradata is responding to the reality of the analytics marketplace. And that reality centers around three findings from a survey the company conducted of senior leaders from around the world:

  1. Analytics technology is too complex. 74 percent of senior leaders said their organization’s analytics technology is complex; 42 percent said that analytics is not easy for their employees to use and understand.
  2. Users don’t have access to all the data they need. 79 percent said they need access to more company data to do their job effectively.
  3. Data scientists are a bottleneck. Only 25 percent said that, within their enterprise, business decision makers have the skills to access and use intelligence from analytics without the need for data scientists.

 


 

To respond to these challenges, Teradata says you should buy “answers” not “analytics.” And they are correct. Organizations are not looking for more complex, time-consuming, difficult-to-use tools, but answers to their most pressing questions.

Teradata calls its new approach “pervasive data intelligence,” which delivers access to all data, all the time, to find answers to the toughest challenges. This can be done on-premises, in the cloud, and anywhere in between.

A big part of this new approach is founded on Teradata Vantage, which provides businesses the speed, scale and flexibility they need to analyze anything, deploy anywhere and deliver analytics that matter. At the center of Vantage is Teradata’s respected analytics database management system, but it also brings together analytic functions and engines within a single environment. And it integrates with popular workbenches, platforms, and languages, including SQL, R, Python, Jupyter, RStudio, SAS, and more.

“Uncovering valuable intelligence at scale has always been what we do, but now we’re taking our unique offering to new heights, unifying our positioning while making our software and consulting expertise available as-a-service, in the cloud, or on-premises,” said Victor Lund, Teradata CEO.

Moving from analytical silos to an analytics platform that can deliver pervasive data intelligence sounds to me like a reasonable way to tackle the complexity, confusion, and bottlenecks common today.

Check out what Teradata has to offer at teradata.com.

Posted in analytics, data, Teradata, tools | Leave a comment

Data Modeling with Navicat Data Modeler

Data modeling is the process of analyzing the things of interest to your organization and how these things are related to each other. The data modeling process results in the discovery and documentation of the data resources of your business. Data modeling asks the question “What?” instead of the more common data-processing question “How?”

As data professionals, it is important that we understand what the data is and what it means before we attempt to build databases and applications using the data. Even with today’s modern infrastructure that includes databases with flexible schemas that are applied when read (instead of the more traditional method of applying the schema on write), you still need a schema and an understanding of the data in order to do anything useful with it. And that means a model of the data.

The modeling process comprises three phases, each with its own type of model: conceptual, logical and physical. A conceptual data model is generally more abstract and less detailed than a complete logical data model. It depicts a high-level, business-oriented view of information. The logical data model consists of fully normalized entities with all attributes defined. Furthermore, the domain or data type of each attribute must be defined. A logical data model provides an in-depth description of the data independent of any physical database manifestations. The physical data model transforms the logical model into a physical implementation using a specific DBMS product such as Oracle, MySQL or SQL Server.

Navicat Data Modeler

Which brings us to the primary focus of today’s blog post: Navicat Data Modeler. We have looked at other Navicat products in this blog before (1, 2, 3), but those were performance and DBA tools. Navicat Data Modeler is designed to be used by data architects and modelers (but it can, of course, be used by DBAs, too).

A good data modeling tool provides the user with an easy-to-use palette for creating data models, and Navicat Data Modeler succeeds in this area. Navicat Data Modeler provides a rich interface for visually designing and building conceptual, logical and physical data models. Figure 1 shows a portion of a larger logical data model for a university application.


Figure 1. A Logical Data Model in Navicat Data Modeler

 

The interface enables the user to clearly see the relationships, entities, and attributes at a high level, as well as to zoom in to see details (see Figure 2).


Figure 2. Attribute details for the Student entity

 

The tool offers a lot of flexibility, so you can create, modify, and design models in a user-friendly manner, the way you like. Navicat Data Modeler supports three standard notations: Crow’s Foot, IDEF1x and UML.

Although easy to use, Navicat Data Modeler is a powerful data modeling and database design tool. As already mentioned, it supports conceptual, logical, and physical modeling. Importantly, the tool also manages migration of models using reverse and forward engineering processes. Using the Model Conversion feature, you can convert a conceptual model into a logical model, modify and further design at the logical level, and then convert it into a physical database implementation. Navicat Data Modeler supports MySQL, MariaDB, Oracle, Microsoft SQL Server, PostgreSQL, and SQLite. See Figure 3.


Figure 3. Forward engineering to a target database

 

OK, so that covers forward engineering, but what about reverse engineering? You can use Navicat Data Modeler to reverse engineer a physical database structure into a physical model, thereby enabling you to visualize the database to see the physical attributes (tables, columns, indexes, RI, and other objects) and how they relate to each other without showing any actual data.

Furthermore, you can import models from ODBC data sources, print models to files, and compare and synchronize databases and models. The Synchronize to Database function can be used to discover all database differences. You can view the differences and generate a synchronization script to update the destination database to make it identical to your model. And there are settings that can be used to customize how comparison and synchronization works between environments.

It is also worth noting that Navicat Data Modeler is fully integrated with Navicat Cloud. This makes sharing models much easier. You can sync your model files and virtual groups to the cloud for real-time access anytime, anywhere.

Summary

A proper database design cannot be thrown together quickly by novices. Data professionals require domain and design knowledge and powerful tools to implement their vision. Navicat Data Modeler offers one such tool that is worthy of your consideration.

Posted in data modeling, database design, DBA | Leave a comment

10 Rules for Succeeding as a DBA

Being a successful database administrator requires more than just technical acumen and deep knowledge of database systems. You also must possess a proper attitude, sufficient fortitude, and a diligent personality to achieve success in database administration. Gaining the technical know-how is important, yes, but there are many sources that offer technical guidance for DBAs. The non-technical aspects of the DBA job are just as challenging, though. So with that in mind, let’s take a look at ten “rules of thumb” for DBAs to follow as they improve their soft skills.

Rule #1: Write Down Everything – DBAs encounter many challenging tasks and time-consuming problems. The wise DBA always documents the processes used to resolve problems and overcome challenges. Such documentation can be very valuable (both to you and others) should you encounter a similar problem in the future. It is better to read your notes than to try to re-create a scenario from memory.

Rule #2: Keep Everything – Database administration is the perfect job for you if you are a pack rat. It is a good practice to keep everything you come across during the course of performing your job. If you don’t, it always seems like you’ll need that stuff the day after you threw it out! I still own some manuals for DB2 Version 2.

Rule #3: Automate – Why should you do it by hand if you can automate your DBA processes? Anything you can do probably can be done better by the computer – if it is programmed to do it properly. And once it is automated, you save yourself valuable time that is better spent tackling other problems.

Rule #4: Share Your Knowledge – The more you learn the more you should try to share what you know with others. There are many vehicles for sharing your knowledge: local user groups, online forums, web portals, magazines, blogs, Twitter, and so on. Sharing your experiences helps to encourage others to share theirs, so we can all benefit from each other’s best practices.

Rule #5: Focus Your Efforts – The DBA job is complex and spans many diverse technological and functional areas. It is easy for a DBA to get overwhelmed with certain tasks – especially those tasks that are not performed regularly. Understand the purpose for each task you are going to perform and focus on performing the steps that will help you to achieve that purpose. Do not be persuaded to broaden the scope of work for individual tasks unless it cannot be avoided. Analyze, simplify, and focus. Only then will tasks become measurable and easier to achieve.

Rule #6: Don’t Panic! – Problems will occur. There is nothing you can do to eliminate every possible problem or error. Part of your job is to be able to react to problems calmly and analytically. When a database is down and applications are unavailable, your environment will become hectic and frazzled. The best thing you can do when problems occur is to remain calm and go about your job using your knowledge and training.

Rule #7: Measure Twice, Cut Once – Being prepared means analyzing, documenting, and testing your DBA policies and procedures. Creating simple procedures in a vacuum without testing will do little to help you run an efficient database environment. And it will not prepare you to react rapidly and effectively to problem situations.

Rule #8: Understand the Business – Remember that being technologically adept is just a part of being a good DBA. Technology is important but understanding your business needs is more important. If you do not understand the business reasons and impact of the databases you manage then you will simply be throwing technology around with no clear purpose.

Rule #9: Don’t Be a Hermit – Be accessible; don’t be one of those “curmudgeon in the corner” DBAs that developers are afraid to approach. The more you are valued for your expertise and availability, the more valuable you are to your company. By learning what the applications must do you can better adjust and tune the databases to support the business.

Rule #10: Use All of the Resources at Your Disposal – Remember that you do not have to do everything yourself. Use the resources at your disposal. Many times others have already encountered and solved the problem that vexes you. Use your DBMS vendor’s technical support to help with particularly thorny problems. Use internal resources for areas where you have limited experience, such as network specialists for connectivity problems and system administrators for OS and system software problems. Build a network of colleagues that you can contact for assistance. Your network can be an invaluable resource and no one at your company even needs to know that you didn’t solve the problem yourself.

Achieve DBA Success!

The job of the DBA is a challenging one – from a technological, political and interpersonal perspective. Follow the rules presented in this blog post to improve your success as a DBA.

Posted in DBA | 1 Comment

How Much Data Availability is Enough?

I bet that some of you reading the title of this blog post scoffed at it. I mean, in this day and age, isn’t round-the-clock availability for every application and user just a basic requirement?

No, it shouldn’t be. Let’s discuss.

Availability is traditionally discussed in terms of the percentage of total time that a service needs to be up. For example, a system with 99% availability will be up and running 99% of the time and down, or unavailable, 1% of the time.

Another term used to define availability is MTBF, or mean time between failure. More accurately, MTBF is a better descriptor of reliability than availability. However, reliability has a definite impact on availability. In general, the more reliable the system the more available it will be.

In this Internet age, the push is on to provide never-ending uptime, 365 days a year, 24 hours a day. At 60 minutes an hour, that means 525,600 minutes of uptime a year. Clearly, achieving 100% availability is a laudable goal, but just as clearly an unreasonable one. Why? Because things break, human error is inevitable, and until everybody and everything is perfect, there will be downtime.

The term five nines is often used to describe highly available systems. Meaning 99.999% uptime, five nines describes what is essentially 100% availability, but with the understanding that some downtime is unavoidable (see the accompanying table).

Table 1. Availability versus Downtime

Availability     Approximate downtime per year
                 In minutes          In hours
99.999%          5 minutes           .08 hours
99.99%           53 minutes          .88 hours
99.95%           262 minutes         4.37 hours
99.9%            526 minutes         8.77 hours
99.8%            1,052 minutes       17.5 hours
99.5%            2,628 minutes       43.8 hours
99%              5,256 minutes       87.6 hours
98%              10,512 minutes      175.2 hours (or 7.3 days)
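The downtime figures in the table follow directly from the 525,600 minutes in a year; here is a quick sketch that reproduces them (differences of a minute or so are just rounding):

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_per_year(availability_pct):
    """Approximate yearly downtime implied by an availability percentage."""
    minutes = MINUTES_PER_YEAR * (1 - availability_pct / 100.0)
    return minutes, minutes / 60.0

for pct in (99.999, 99.99, 99.95, 99.9, 99.8, 99.5, 99.0, 98.0):
    minutes, hours = downtime_per_year(pct)
    print(f"{pct}%  ~{minutes:,.0f} minutes  (~{hours:.2f} hours) of downtime per year")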

Even though 100% availability is not reasonable, some systems are achieving availability approaching five nines. DBAs can take measures to design databases and build systems that achieve high availability. However, just because high availability can be built into a system does not mean that every system should be built with a high-availability design. That is because a highly available system can cost many times more than a traditional system that tolerates some downtime. The DBA needs to negotiate with the end users and clearly explain the costs associated with a highly available system.

Whenever high availability is a goal for a new system, database, or application, careful analysis is required to determine how much downtime users can really tolerate, and what the impact of an outage would be. High availability is an alluring requirement, and end users will typically request as much as they think they can get. As a DBA, your job is to investigate the reality of the requirement.

The amount of availability that should be built into the database environment must be based on service level agreements and cost. How much availability does the business require? And just as importantly, how much availability can the business afford to implement?

That is the ultimate question. Although it may be possible to achieve high availability, it may not be cost-effective, given the nature of the application and the budget available to support it. The DBA needs to be proactive in working with the application owner to make sure the cost aspect of availability is fully understood.

Posted in availability, DBA, SLA | Leave a comment

Database Performance Management Solutions

Performance management, from a database perspective, consists of three basic components:

  1. Monitoring a database system and applications accessing it to find problems as they arise. This is typically referred to as performance monitoring.
  2. Analyzing performance data (logs, trace records, etc.) from the system to determine the root cause of the problem.
  3. Assembling a corrective action to implement a fix to the problem.

Database performance software can aid in all three areas. Some tools simply monitor systems or fix problems, while others deliver combined functionality.
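As a simple illustration of the monitoring component, here is a minimal sketch that polls PostgreSQL’s pg_stat_activity view for long-running statements. The connection string and the 30-second threshold are assumptions, and other DBMSs expose similar views (Oracle’s V$SESSION, SQL Server’s sys.dm_exec_requests), each with its own column names.

import time
import psycopg2  # assumes a PostgreSQL target and the psycopg2 driver

DSN = "dbname=prod user=monitor"  # placeholder connection string

def long_running_statements(conn):
    """Return active statements that have been running longer than 30 seconds."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT pid, now() - query_start AS runtime, query
            FROM pg_stat_activity
            WHERE state = 'active'
              AND now() - query_start > interval '30 seconds'
            ORDER BY runtime DESC
            """
        )
        return cur.fetchall()

if __name__ == "__main__":
    conn = psycopg2.connect(DSN)
    while True:
        for pid, runtime, query in long_running_statements(conn):
            print(f"pid {pid} running for {runtime}: {query[:80]}")
        time.sleep(60)  # poll once a minute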

Database performance management software can also be broken down by the type of database performance issues it addresses. Database performance problems can arise in any of the following three areas:

  • The DBMS itself, which must interact with other system software and hardware, requiring proper configuration to ensure it functions accurately and performs satisfactorily. Additionally, there are many database system parameters used to configure the resources to be used by the DBMS, as well as its behavior. This includes important performance criteria such as memory capacity, I/O throughput and locking of data pages.
  • Database schema/structures. The design of databases, tables and indexes can also impact database performance. Issues include the physical design of the database, disk usage, number of tables, index design and data definition language parameters. How the data is organized must also be managed. And as data is modified in the database, its efficiency will degrade. Reorganization and defragmentation are required to periodically remedy disorganized data.
  • SQL and application code. Coding efficient SQL statements can be complicated because there are many different ways to write SQL that return the same results. But the efficiency and performance of each formulation can vary significantly. DBAs need tools that can monitor the SQL code that’s being run, show the access paths it uses and provide guidance on how to improve the code.

Database performance tools can identify bottlenecks and points of contention, monitor workload and throughput, review SQL performance and optimization, monitor storage space and fragmentation, and view and manage your system and DBMS resource usage. Of course, a single tool is unlikely to perform all of these tasks, so you may need multiple tools (perhaps integrated into a functional suite) to perform all of your required database performance management tasks.

Without proactive tools that can identify problems as they occur, database performance problems are most commonly brought to the attention of the DBA by end users. The phone rings and the DBA hears a complaint that is usually vague and a bit difficult to interpret… things like “my system is a bit sluggish” or “my screen isn’t working as fast as it used to.” In such cases, the DBA needs tools that can help uncover the exact problem and identify a solution. Database performance management tools can help to find the problem as well as to put together and deploy a solution to the problem.

A lot of organizations use more than one production DBMS. Frequently, the same DBA team (and sometimes even the same exact DBA) will have to assure the performance of more than one DBMS (such as Oracle and SQL Server… or Db2 and MySQL). But each DBMS has different interfaces, parameters and settings that affect how it performs. Database performance tools can mitigate this complexity with intelligent interfaces that mask it, making disparate components and settings look and feel similar from DBMS to DBMS.

There are many providers of database performance management tools, including the DBMS vendors (IBM, Microsoft and Oracle), large ISVs like BMC, CA and Quest, and a wide array of niche vendors that focus on DBA and database performance software.

What database performance tools do you use and recommend? Share your experiences with us in a comment here on the blog.

Posted in DBA, performance, tools | Leave a comment

SQL Basics

It is hard to imagine a time when SQL was unknown and not the lingua franca it is today for accessing databases. That said, there are still folks out there who don’t know what SQL is… so for them, here is an introductory place to start…

SQL is an acronym for Structured Query Language. It is often pronounced “sequel,” but also spelled out as letters, like ess-cue-ell. SQL is a powerful tool for accessing and manipulating data. It is the de facto standard query language for relational database management systems, used by all of the leading RDBMS products including Oracle, SQL Server (natch), Db2, MySQL, Postgres, SAP Adaptive Server, and more.

Interestingly enough, NoSQL database systems are increasingly being adapted to allow SQL access, too! So SQL is ubiquitous, and it makes sense for anybody with an interest in data management to learn how to code SQL.

SQL is a high-level language that provides a greater degree of abstraction than do procedural languages. Most programming languages require that the programmer navigate data structures. This means that program logic needs to be coded to proceed record-by-record through data elements in an order determined by the application programmer or systems analyst. This information is encoded in programs and is difficult to change after it has been programmed.

SQL, on the other hand, is fashioned so that the programmer can specify what data is needed, instead of how to retrieve it. SQL is coded without embedded data-navigational instructions. The DBMS analyzes the SQL and formulates data-navigational instructions “behind the scenes.” These data-navigational instructions are called access paths.

By having the DBMS determine the optimal access path to the data, a heavy burden is removed from the programmer. In addition, the database can have a better understanding of the state of the data it stores, and thereby can produce a more efficient and dynamic access path to the data. The result is that SQL, used properly, can provide for quicker application development.

Another feature of SQL is that it is not merely a query language. The same language used to query data is used also to define data structures, control access to the data, and insert, modify, and delete occurrences of the data. This consolidation of functions into a single language eases communication between different types of users. DBAs, systems programmers, application programmers, systems analysts, and end users all speak a common language: SQL. When all the participants in a project are speaking the same language, a synergy is created that can reduce overall system-development time.

Arguably, though, the single most important feature of SQL that has solidified its success is its capability to retrieve data easily using English-like syntax. It is much easier to understand the following than it is to understand pages and pages of program source code.

SELECT   LASTNAME
FROM      EMP
WHERE   EMPNO = '000010';

Think about it; when accessing data from a file the programmer would have to code instructions to open the file, start a loop, read a record, check to see if the EMPNO field equals the proper value, check for end of file, go back to the beginning of the loop, and so on.

SQL is, by nature, quite flexible. It uses a free-form structure that gives the user the ability to develop SQL statements in a way best suited to the given user. Each SQL request is parsed by the DBMS before execution to check for proper syntax and to optimize the request. Therefore, SQL statements do not need to start in any given column and can be strung together on one line or broken apart on several lines. For example, the following SQL statement is equivalent to the previously listed SQL statement:

SELECT LASTNAME FROM EMP WHERE EMPNO = '000010';

Another flexible feature of SQL is that a single request can be formulated in a number of different and functionally equivalent ways. One example of this SQL capability is that it can join tables or nest queries. A nested query always can be converted to an equivalent join. Other examples of this flexibility can be seen in the vast array of functions and predicates. Examples of features with equivalent functionality are:

  • BETWEEN versus <= / >=
  • IN versus a series of predicates tied together with AND
  • INNER JOIN versus tables strung together in the FROM clause separated by commas
  • OUTER JOIN versus a simple SELECT, with a UNION, and a correlated subselect
  • CASE expressions versus complex UNION ALL statements

This flexibility exhibited by SQL is not always desirable, as different but equivalent SQL formulations can result in extremely different performance. The ramifications of this flexibility, along with guidelines for developing efficient SQL, are a topic for another day.

As mentioned, SQL specifies what data to retrieve or manipulate, but does not specify how you accomplish these tasks. This keeps SQL intrinsically simple. If you can remember the set-at-a-time orientation of a relational database, you will begin to grasp the essence and nature of SQL. A single SQL statement can act upon multiple rows. The capability to act on a set of data coupled with the lack of need for establishing how to retrieve and manipulate data defines SQL as a non-procedural language.

Because SQL is a non-procedural language, a single statement can take the place of a series of procedures. Again, this is possible because SQL uses set-level processing and the DBMS optimizes the query to determine the data-navigation logic. Sometimes one or two SQL statements can accomplish tasks that otherwise would require entire procedural programs.

Summary

Of course, this brief introduction does not constitute an education in SQL and it will not make you a SQL programmer. For that, you will need education and practice. A good place to start is with a SQL book or two. I can recommend these:

After reading through some good books, practice writing some SQL and keep learning… move on to more advanced texts and, if you can, attend a class on SQL. Because learning SQL makes sense in this day and age of analytics!

Posted in DBA, SQL | Leave a comment