A Dozen SQL Rules of Thumb, Part 3

Today we pick up our three-part series of SQL rules of thumb (ROTs) with the third and final installment… You can think of these rules as general guiding principles you should follow as you write SQL statements… and we start off today’s post with rule #9…

Rule 9: Know What Works Best

The flexibility of SQL allows the same process to be coded in multiple ways. However, one way of coding usually provides better performance than the others. The DBA should understand the best way to code SQL for each DBMS in use. Furthermore, the DBA should provide information on proper query formulation for performance to the application development staff.

Keep in mind that these rules are DBMS-specific. By that I mean that one way of writing SQL might perform well on SQL Server but poorly on Oracle, which might favor a different formulation that returns the same data.

Rule 10: Issue Frequent COMMITs

When coding programs to run as batch transactions, it is important to issue regular SQL COMMIT statements. The COMMIT statement hardens modifications to the database. When a COMMIT is issued, locks on the modified database objects and data can be released. If you write programs that make a lot of changes, but do not issue periodic COMMITs, then you will be locking data and negatively impacting concurrent access to the data. I call this Bachelor Programming Syndrome (you know, fear of committing).
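To make the idea concrete, here is a minimal sketch of a batch script broken into units of work, with a COMMIT after each one (the account table, regions, and commit points are hypothetical):

 -- Sketch only: table, columns, and commit interval are hypothetical
 UPDATE account SET balance = balance * 1.02 WHERE region = 'EAST';
 COMMIT;   -- locks acquired by the first unit of work are released here

 UPDATE account SET balance = balance * 1.02 WHERE region = 'WEST';
 COMMIT;   -- and again after the next unit of work

In a real batch program the COMMIT would typically be issued inside the processing loop after every n modifications, with n tuned to balance lock duration against the overhead of committing too often.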

An additional consideration for Oracle DBAs is the impact of a COMMIT on the rollback segments. Rollback segments are used by Oracle to store the before images of data changed by in-flight transactions, so that those changes can be rolled back if the transaction does not complete. When you issue a COMMIT in Oracle, not only are the changes made permanent in the table, but the space the transaction used in the rollback segment becomes available for reuse.

As a DBA you must ensure that application developers issue enough COMMIT statements to minimize the impact of locking on availability and (for Oracle) to keep rollback segments to a manageable size.

Rule 11: Beware of Code Generators

Beware of application code generators and similar tools that automatically create SQL. Many of these tools use gateways that require each SQL statement to be recompiled and optimized each time it is requested. However, some gateways provide a caching mechanism to store compiled and optimized SQL on the server. Such a cache can help to improve performance for frequently recurring SQL statements.

Additionally, many code generators create accurate SQL, but not necessarily efficient SQL. So you might need to tweak the SQL that is generated (if that is even possible).

Rule 12: Consider Stored Procedures

Performance degradation due to repeated network traffic can be minimized by using a stored procedure because only a single request is needed to execute it. Within the stored procedure, multiple SQL statements can be issued, and the results processed and sent to the requesting program or user. Without the stored procedure, each of the multiple SQL statements, as well as all of the results, would have to be sent across the network. Additionally, SQL in stored procedures may perform better than the same SQL outside of the stored procedure if the DBMS parses and compiles the statements before run time.
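As a rough sketch of the idea (using SQL Server-style syntax; the procedure name, parameter, and department value are made up, and CREATE PROCEDURE syntax varies by DBMS), both statements below run inside the database server, so the caller issues a single request over the network:

 -- Sketch only: syntax and names vary by DBMS
 CREATE PROCEDURE get_department_roster @dept CHAR(3)
 AS
 BEGIN
     SELECT empno, last_name, position
     FROM   employee
     WHERE  dept_id = @dept;

     SELECT manager
     FROM   department
     WHERE  dept_id = @dept;
 END;

 -- One network request executes both SELECTs and returns both result sets
 EXEC get_department_roster @dept = 'A00';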

Synopsis

These twelve SQL rules of thumb across three blog posts provide a sound basis for SQL development. Of course, they offer high-level guidance and are not the only things that you need to be aware of, and follow, as you strive to build efficient database applications.

So start here, but expand your knowledge base and keep learning about how you can write effective and efficient SQL for your database applications!

 

Posted in DBA, performance, SQL

A Dozen SQL Rules of Thumb, Part 2


Today’s blog post picks up where we left off in our three-part series of rules of thumb (ROTs) that apply generally to SQL development regardless of the underlying DBMS.

These are general guiding principles for your SQL development… and we start off today’s post with rule #5…

Rule 5: Avoid Cartesian Products

Be sure to code predicates matching the columns of every table that will be joined within each SQL statement. Failure to do so will result in performance degradation and possibly incorrect results.

Whenever matching predicates are missing when you join two tables, the RDBMS must perform a Cartesian product. This is the combination of every row of one table with every row of the other table. Non-matching rows are not eliminated because there is nothing to match on. The results of a Cartesian product are difficult to interpret and contain no information other than a simple list of all rows of each table munged together.
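For example, using the employee and department tables from the queries later in this post, omitting the join predicate forces a Cartesian product, while coding it returns only the matched rows:

 -- Missing join predicate: every employee row is paired with every department row
 SELECT e.last_name, d.manager
 FROM   employee e,
        department d;

 -- Join predicate coded: each employee is matched only to its own department
 SELECT e.last_name, d.manager
 FROM   employee e,
        department d
 WHERE  e.dept_id = d.dept_id;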

Of course, there are exceptions to this rule where you are specifically coding a Cartesian product for a business or technical reason. When coding a Cartesian product on purpose, always make sure that you have investigated other methods of producing the same results and tested each method for performance and accuracy… and be sure to specifically test any query with a Cartesian product using production volumes of data!

Rule 6: Judicious Use of OR

The OR logical operator can be troublesome for performance. If you can convert a SQL statement that uses OR to one that uses IN, it is likely that performance will improve. For example, consider changing this SQL statement:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND   (position = 'MANAGER'
 OR     position = 'DIRECTOR'
 OR     position = 'VICE PRESIDENT')
 ORDER BY position;

to this:

 SELECT e.position, e.last_name, e.empno, d.manager
 FROM   employee e,
        department d
 WHERE  d.dept_id = e.dept_id
 AND    position IN ('MANAGER', 'DIRECTOR', 'VICE PRESIDENT')
 ORDER BY position;

Of course, your results may vary depending on the DBMS in use and the nature of the data.

Rule 7: Judicious Use of LIKE

The LIKE logical operator is another troublesome beast. It is very easy to create performance problems when using LIKE in SQL. For example, consider the following SQL statement:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE '%X'
 ORDER BY position;

This query will return employee information for all employees working in any department where dept_id ends in ‘X’. However, the relational optimizer will have to scan the data in order to resolve this query—there is no way to use an index. Because the high-order portion of the column is not known, traversing a b-tree index structure is impossible.

You might be able to use your knowledge of the data to rewrite this query without a leading wild-card character (%). For example, perhaps all dept_id values start with either ‘A’ or ‘B’. In that case, you could modify the SQL as follows:

 SELECT position, last_name, empno
 FROM   employee
 WHERE  dept_id LIKE 'A%X'
 OR     dept_id LIKE 'B%X'
 ORDER BY position;

 

In this case, the DBMS may be able to use a non-matching index scan if an index exists on the dept_id column. Scanning the index may be more efficient than scanning the entire table, but even then you will be scanning data, which is not usually very efficient.

Once again, your results will vary with the DBMS in use and the nature of the data accessed.

Rule 8: Avoid Sorts When Possible

Sorting data is an inhibitor of optimal performance in SQL queries. Your DBMS will sort data as needed to satisfy your database requests. The types of operations that usually require sorting of some form are ORDER BY, GROUP BY, DISTINCT, UNION, INTERSECT, and EXCEPT. When performance is important, remember to look for sorts and find ways to eliminate them. You might be able to create an index to avoid sorting, or to use an alternate syntax if duplicate elimination is not important (e.g. UNION ALL versus UNION).
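For example, if the two sets being combined cannot contain rows in common (the table names here are hypothetical), UNION ALL avoids the duplicate-elimination work that UNION performs:

 -- UNION must sort (or hash) the combined result to remove duplicates
 SELECT empno FROM current_employee
 UNION
 SELECT empno FROM retired_employee;

 -- UNION ALL skips duplicate elimination, and with it the sort
 SELECT empno FROM current_employee
 UNION ALL
 SELECT empno FROM retired_employee;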

Keep in mind that sorting is an I/O-intensive operation that can degrade query performance, sometimes significantly.

Conclusion

And so we come to the end of part 2 in our 3 part series offering up 12 SQL rules of thumb… tune in next time for the final 4 rules (numbers 9 thru 12) in our dozen guidelines for developing effective and efficient SQL!

Posted in DBA, performance, SQL

A Dozen SQL Rules of Thumb, Part 1

In today’s blog post we will examine some rules of thumb that apply generally to SQL development regardless of the underlying DBMS. These are general guiding principles for your SQL development…

Rule 1: “It Depends!”

The answer to every question about database performance is “It depends.” A successful DBA (and programmer) will know on what it depends. For example, if someone asks, “What is the best access path for my SQL query?” the best answer is “It depends.” Why? Well, that is more difficult to answer.

If every row must be returned, a table scan is likely to be more efficient than indexed access. However, if only one row is to be returned, direct index lookup will probably perform best. For queries that return between one and all rows, the performance of access paths will depend on how the data is clustered, which version of the DBMS is in use, whether parallelism can be invoked, and so forth.

Be skeptical of tuning tips that use the words “always” or “never.” Just about everything depends on other things.

Rule 2: Be Careful What You Ask For

The arrangement of elements within a query can change query performance. To what degree depends on the DBMS in use and whether rule-based optimization is used.
A good rule of thumb, regardless of DBMS, is to place the most restrictive predicate where the optimizer can read it first. Under rule-based optimization in Oracle, the optimizer reads WHERE clauses from the bottom up; therefore, the most restrictive predicate should be placed at the bottom of the query. It is just the opposite in DB2.

Placing the most restrictive predicate where the optimizer can read it first enables the optimizer to narrow down the first set of results before proceeding to the next predicate. The next predicate will be applied to the subset of data that was selected by the most selective condition, instead of against the entire table.
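As a simple illustration, assume that last_name = 'SMITH' qualifies far fewer rows than position = 'MANAGER'. For a rule-based Oracle optimizer reading the WHERE clause from the bottom up, the query would be written like this:

 -- Most restrictive predicate placed last, where a bottom-up reader evaluates it first
 SELECT position, last_name, empno
 FROM   employee
 WHERE  position = 'MANAGER'
 AND    last_name = 'SMITH';

For a DBMS that reads the WHERE clause from the top down, the two predicates would simply be reversed; the result set is identical either way, only the order of evaluation differs.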

And keep in mind that these things can change from release to release of a DBMS, so keep up with each new version, release and fixpack to make sure you understand what has been changed and how it might impact your SQL.

Rule 3: KISS

A rule of thumb for all types of IT activities is to follow the KISS principle: Keep it simple, Stupid. However, in the world of SQL there is a trade-off between simplicity and performance.

Keeping SQL simple makes development and maintenance tasks easier. A simple SQL statement is easier to decipher and easier to change. With simple SQL, application developers can perform their job more easily than with complex SQL.

Nevertheless, complex SQL can outperform simple SQL. The more work that can be performed by the DBMS and the optimizer, the better performance is likely to be. Let’s look at an example: Some programmers avoid joins by coding multiple SQL SELECT statements and joining the data using program logic. The SQL is simpler because the programmer need not understand how to write SQL to join tables. However, SQL joins usually outperform program joins because less data is returned to the program.
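To make the trade-off concrete, here is the "simple" approach of two single-table SELECTs whose results the program must stitch together, followed by the equivalent SQL join:

 -- Program join: both full result sets cross the network and the matching happens in code
 SELECT empno, last_name, dept_id FROM employee;
 SELECT dept_id, manager FROM department;

 -- SQL join: the DBMS does the matching and returns only the combined rows
 SELECT e.empno, e.last_name, d.manager
 FROM   employee e,
        department d
 WHERE  e.dept_id = d.dept_id;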

Furthermore, the relational optimizer can change the join methodology automatically if the database or data changes. Conversely, program logic must be changed manually by a skilled programmer.

Rule 4: Retrieve Only What is Needed

As simple as this rule of thumb sounds, you might be surprised at how often it is violated. To minimize the amount of data returned by your SQL statements, be sure to specify the absolute minimum number of columns in the SELECT list. If the column is not needed to satisfy the business requirement, do not request it to be returned in the result set.

Programmers frequently copy SQL statements that work well to use as templates for new statements. Sometimes the programmer will forget to trim down the number of columns requested when they only need a subset of the columns in the original query. This can adversely impact performance. The more columns that must be returned by the DBMS, the greater the processing overhead.

Another common problem is requesting unnecessary data. Consider the following SQL statement:

SELECT position, last_name, empno
FROM   employee
WHERE  last_name = 'SMITH';

There is no reason to specify the last_name column in the SELECT list of this SQL statement. We know that last_name must be ‘SMITH’ for the entire result set because of the WHERE clause.
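Trimming the redundant column out of the SELECT list gives the DBMS that much less data to return for every qualifying row:

 SELECT position, empno
 FROM   employee
 WHERE  last_name = 'SMITH';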

Returning only what is needed does not apply only to columns. You should also minimize the number of rows to be returned by coding the proper WHERE clauses for every SQL statement. The more data that can be filtered out of the result set by the DBMS, the more efficient the query will be because less data must be returned to the requester.

Sometimes application programmers avoid coding appropriate WHERE clauses in a misguided attempt to simplify SQL statements. The more information the optimizer has about the data to be retrieved, the better the access paths it formulates will be. A sure sign of potential abuse is finding a SQL statement embedded in an application program that is immediately followed by a series of IF-THEN-ELSE statements. Try to tune the query by moving the IF-THEN-ELSE statements into SQL WHERE clauses.
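For instance (the additional position test is hypothetical), instead of retrieving every SMITH and discarding the non-managers with IF logic in the program, push the test into the WHERE clause:

 -- Before (sketch): the program filters out non-managers with IF-THEN-ELSE logic
 SELECT position, last_name, empno
 FROM   employee
 WHERE  last_name = 'SMITH';

 -- After: the filter is part of the query, so only the rows that are needed are returned
 SELECT position, last_name, empno
 FROM   employee
 WHERE  last_name = 'SMITH'
 AND    position = 'MANAGER';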

Conclusion

And so concludes the first four of our 12 SQL rules of thumb… tune in next time for the next 4 rules (numbers 5 thru 8) in our dozen guidelines for developing SQL that works and performs well…

Posted in DBA, Rules of Thumb, SQL

Designing Applications for Relational Access

Application design is an important component of assuring relational database performance and efficiency. When performance problems persist it may become necessary to revisit application design. But redesigning and re-coding your applications can be time-consuming and costly, so it is better to properly address good design from the outset.

Design issues to examine when application performance suffers include:

  • Type of SQL. Is the correct type of SQL (planned or unplanned, dynamic or static, embedded or stand-alone) being used for this particular application?
  • Programming language. Is the programming language capable of achieving the required performance, and is the language environment optimized for database access?
  • Transaction design and processing. Are the transactions within the program properly designed to assure ACID properties, and does the program use the transaction processor of choice appropriately and efficiently?
  • Locking strategy. Does the application hold the wrong type of locks, or does it hold the correct type of locks for too long?
  • COMMIT strategy. Does each application program issue SQL COMMIT statements to minimize the impact of locking?
  • Batch processing. Are batch programs designed appropriately to take advantage of the sequential processing features of the DBMS?
  • Online processing. Are online applications designed to return useful information and to minimize the amount of information returned to the user’s screen for a single invocation of the program?
Posted in DBA, performance

Transforming Mainframe Data Management

Just a quick post today to let my readers — specifically those in the US Midwest — know that I will be presenting at an executive briefing at the River Forest Country Club in Elmhurst, IL (a Chicago suburb) on August 23, 2016. The event will address the challenges of data management for the digital economy and will look at issues such as:

  • Simplified data management with intelligent automation that requires no downtime.
  • Increased application performance.
  • Data integrity of structured and unstructured data.
  • The ability to easily manage growing databases.

My presentation will take a look at the trends and challenges being faced by data professionals as they attempt to manage the burgeoning amount of data that modern organizations collect, manage and turn into actionable intelligence. I will be joined at the event by other great speakers, including Sherly Larsen and John Barry, who will talk about new technology and innovations from BMC Software to meet those challenges, including BMC’s Next Generation Technology for DB2 utility management.

So if you live in the Midwest area, be sure to register for this event and catch up on the latest data trends and techniques for dealing with modern data management on the mainframe.

Posted in data, DB2, mainframe, tools

Evaluating Database Performance Management Tools

I just completed a four-part series of articles for TechTarget on database performance management and the different categories of tools that are used for managing database performance. The general goal of the series is to bolster organizations’ understanding of the issues involved with assuring database performance and to inform potential buyers of the considerations and decision points when choosing from the wide variety of products that are available in this field.

With that in mind, I am going to share links to these articles with my blog readers today. The first article, published in May 2016, defines database performance and clarifies What You Need to Know About Database Performance Software at a high level. This article offers a good introduction for those new to the topic and creates a template for moving forward with database performance solutions. It should be read first to understand the overall theme of the series of articles.

Part 2 was published in early June 2016, and it focuses on how to determine whether you need database performance management tools. The article, titled Three indicators that could signal database performance issues, focuses on the types of performance issues that you should look for and provides a brief overview of tools that can help manage those issues.

Next I took a look at the most important features of database performance monitoring and tuning technologies. Choosing between different technologies and vendors can always be a challenge unless you know your requirements and focus on the software that can best achieve the results you desire. Part 3 of this series, Examining the functions and features of database performance tools, focuses on helping organizations do that.

And finally, in Part 4, I focused on the actual vendors and products, highlighting important considerations to evaluate when choosing database performance management tools. Check this article out here –> Tips on selecting the right database performance tools.

The series of articles is accompanied by multiple, high-level vendor and product overviews of some of the leading database performance management providers that you might consider. Only a few of these have been published, so far, but there will be a total of eight overviews published in this series focusing on BMC, CA, IBM, Oracle, IDERA, SolarWinds, Dell, and Bradmark. (Links to all overviews added August 16, 2016).

Hopefully these articles and overviews offer a nice roadmap to success for your database performance management efforts.

Note: None of the vendors covered in this series of articles paid to be included in this series. The vendors covered were chosen as a representative sampling of the leading ISVs providing database performance tools.

Posted in DBA, performance, tools

Data Modeling Concepts

Oftentimes organizations assign the DBA group the responsibility for data modeling, in addition to the more conventional physical database management tasks that are well-known DBA responsibilities. Of course, that does not mean that DBAs are well-trained in data modeling, nor does it mean that DBAs are best-suited to take on the task of data modeling. An organization that is serious about data will create a data administration or data architecture group whose responsibility it is to understand the organization’s data. Data administration (DA) separates the business aspects of data resource management from the technology used to manage data. When the DA function exists in an organization it is more closely aligned with the actual business users of data. The DA group is responsible for understanding the business lexicon and translating it into a logical data model.

That said, many organizations lump DA and DBA together into a DBA group. As such, the DA tasks usually suffer. One of these tasks is data modeling. There is a popular folk story about four blind men and an elephant that helps to illustrate the purpose of data modeling:

A group of four blind men happened upon an elephant during the course of their journey. The group of blind men had never encountered an elephant before, but they were a curious group. So each blind man attempted to learn about the elephant by touching it. The first blind man grabbed the elephant by the trunk and exclaimed, “Oh! An elephant is like a snake – it is a long, slithery tube.” The second blind man reached out and touched the side of the elephant and remarked “No, no, the elephant is more like a wall – very flat and solid.” The third blind man was confused so he reached out to touch the elephant but poked his hand on a tusk, and he said “No, you’re both wrong, the elephant is more like a spear than anything else!” The fourth blind man grabbed the elephant by the leg and shouted “You’re all wrong, an elephant is very much like a tree – round and solid.”

Well, each blind man was right, but he was also wrong. The problem was not with the experience of each blind man, but with the scope of their experience. To be a successful data modeler you must learn to discover the entire truth of the data needs of your business. You cannot simply ask one user or rely upon a single expert because his or her scope of experience will not be comprehensive. The goal of a data model is to record the data requirements of a business process. The scope of the data model for each line of business must be comprehensive. If an enterprise data model exists for the organization then each individual line of business data model must be verified against the overall enterprise data model for correctness.

An enterprise data model is a single data model that comprehensively describes the data needs of the entire organization. Managing and maintaining an enterprise data model is fraught with many non-database-related distractions such as corporate politics and ROI that is hard to quantify.

But back to the basic topic at hand – data modeling. Data modeling begins as a conceptual venture. The first objective of conceptual data modeling is to understand the requirements. A data model, in and of itself, is of limited value. Of course, a data model delivers value by enhancing communication and understanding, and it can be argued that these are quite valuable. But the primary value of a data model is its ability to be used as a blueprint to build a physical database.

When databases are built from a well-designed data model the resulting structures provide increased value to the organization. The value derived from the data model exhibits itself in the form of minimized redundancy, maximized data integrity, increased stability, better data sharing, increased consistency, more timely access to data, and better usability. These qualities are achieved because the data model clearly outlines the data resource requirements and relationships in a clear, concise manner. Building databases from a data model will result in a better database implementation because you will have a better understanding of the data to be stored in your databases.

Another benefit of data modeling is the ability to discover new uses for data. A data model can clarify data patterns and potential uses for data that would remain hidden without the data blueprint provided by the data model. Discovery of such patterns can change the way your business operates and can potentially lead to a competitive advantage and increased revenue for your organization.

Data modeling requires a different mindset than requirements gathering for application development and process-oriented tasks. It is important to think “what” is of interest instead of “how” tasks are accomplished. To transition to this alternate way of thinking, follow these three “rules”:

  • Don’t think physical; think conceptual – do not concern yourself with physical storage issues and the constraints of any DBMS you may know. Instead, concern yourself with business issues and terms.
  • Don’t think process; think structure – how something is done, although important for application development, is not important for data modeling. The things that processes are being done to are what is important to data modeling.
  • Don’t think navigation; think relationship – the way that things are related to one another is important because relationships map the data model blueprint. The way in which relationships are traversed is unimportant to conceptual and logical data modeling.

With all of the current hype surrounding Big Data and NoSQL and DevOps it is not uncommon for data modeling to become an afterthought, if it is even a “thought” at all. And that is too bad. Sure, it speeds up the development process… but what happens when somebody other than the developer wants to use the data? And use it perhaps in a different way or with a different access pattern?

The answer to being able to reuse data is proper data modeling and design.

Keep in mind that as you create your data models, you are developing the lexicon of your organization’s business. Much like a dictionary functions as the lexicon of words for a given language, the data model functions as the lexicon of business terms and their usage. Of course, this article just scratches the surface of data modeling. If you are a DBA with data modeling responsibilities I recommend that you find your way to a class, or if you cannot afford that, at least pick up a few good books on the topic.

Posted in data, data modeling, DBA

Data Summit: A Trip Report

Last week I had the pleasure of attending, and speaking at, the annual Data Summit event in New York City. The event, sponsored by Database Trends & Applications, boasted knowledge-packed days of presentations on all things data.

Day one started off with a good keynote (once it got started… the speaker was late) on using statistics on publicly available data to gain insight. There is a lot of data out there for the taking but few people actually take advantage of it. After listening to this keynote it would be hard not to want to grab some of that data and see what it can tell you!

The conference sported four tracks of sessions: the first focused on Moving to a Modern Data Architecture and offered sessions on becoming data-driven, data discovery and governance, and the future of data warehousing. This is the track where I spoke on Going Beyond Relational. The focus of my session was on describing the various NoSQL technologies at a high level and trying to bring some reality to the whole “Big Data will change everything” mantra.

The second track looked at Analytics and Applications; the third track was the IOUG track and it focused on Oracle and big data; and the fourth track focused on Hadoop one day and virtualization the next.

There was also a vendor expo hall where various and sundry data-focused tools providers were available to talk about and explain their wares.

All in all, it was a useful conference that is well worth attending if you plan out the sessions you want to attend and use your time well.

Posted in Big Data, data, DBA, NoSQL

Database Access Auditing: Who Did What to Which Data When?

As just about anyone in business these days knows, there is a growing list of government and industry regulations that organizations must understand and comply with. This increasing compliance pressure is particularly intense on data stored in corporate databases. Companies need to be ever more vigilant in the techniques used to protect their data, as well as to monitor and ensure that sufficient protection is in place. Such requirements are driving new and improved software methods and techniques.

One of these techniques is database auditing, sometimes called data access monitoring (or DAM). At a high level, database auditing is basically a facility to track the use of database resources and authority. When auditing is enabled, each audited database operation produces an audit trail record including information such as what database object was impacted, who performed the operation, and when. The comprehensive audit trail of database operations produced can be maintained over time to allow DBAs, auditors, and other authorized personnel to perform in-depth analysis of access and modification patterns against data in the DBMS.

Database auditing helps to answer questions like “Who accessed or changed data?” and “When was it actually changed?” and “What was the old content prior to the change?” Your ability to answer such questions is very important for regulatory compliance. Sometimes it may be necessary to review certain audit data in greater detail to determine how, when, and by whom the data was changed.

Why would you need to ask such questions? Consider HIPAA, the Health Insurance Portability and Accountability Act. This legislation contains language specifying that health care providers must protect individuals’ health care information, even going so far as to state that the provider must be able to give an individual a list of everyone who so much as looked at their information. Think about that. Could you produce a list of everyone who looked at a specific row or set of rows in any database under your control?

Industry regulations, such as PCI DSS (Payment Card Industry Data Security Standard), control the protective measures that must be undertaken to secure personally identifiable information (PII). Organizations that fail to comply run the risk of losing their ability to accept payments using credit cards and debit cards… and that can quickly ruin a company.

Tracking who does what to which piece of data is important because there are many threats to the security of your data. External agents trying to compromise your security and access your company data are rightly viewed as a threat. But industry studies have shown that the majority of security threats are internal – within your organization. Indeed, some studies have shown that internal threats comprise 60% to 80% of all security threats. The most typical security threat comes from a disgruntled or malevolent current or former employee who has valid access to the DBMS. Auditing is crucial because you may need to find unauthorized access emanating from an authorized user.

But keep in mind that auditing tracks what a particular user has done once access has been allowed. Auditing occurs post-activity; it does not do anything to prohibit access. Audit trails help promote data integrity by enabling the detection of security breaches, also referred to as intrusion detection. An audited system can serve as a deterrent against users tampering with data because it helps to identify infiltrators.

There are many situations where an audit trail is useful. Your company’s business practices and security policies may dictate a comprehensive ability to trace every data change back to the initiating user. Perhaps government regulations (such as the Sarbanes-Oxley Act) require your organization to analyze data access and produce regular reports. You may be required to produce detailed reports on an ongoing basis, or perhaps you just need the ability to identify the root cause of data integrity problems on a case-by-case basis. Auditing is beneficial for all of these purposes.

A typical auditing facility permits auditing at different levels within the DBMS, for example, at the database, database object, and user levels. One of the biggest problems with existing internal DBMS audit facilities is performance degradation. The audit trails that are produced must be detailed enough to capture before- and after-images of database changes. But capturing so much information, particularly in a busy system, can cause performance to suffer. Furthermore, this audit trail must be stored somewhere, which is problematic when a massive number of changes occur. Therefore, a useful auditing facility must allow for the selective creation of audit records to minimize performance and storage problems.

There are several different names used for database auditing. You may have heard database auditing capabilities referred to as any of the following:

  • Data Access Auditing
  • Data Monitoring
  • Data Activity Monitoring

Each of these is essentially the same thing: monitoring who did what to which piece of data when. In addition to database auditing, you may wish to include database authorization auditing, which is the process of reviewing who has been granted what level of database access authority. This typically is not an active process, but is useful for regularly reviewing all outstanding authorization to determine if it is still required. For example, database authorization auditing can help to identify ex-employees whose authorization has not yet been removed.

Database Access Auditing Techniques

There are several popular techniques that can be deployed to audit your database structures. Let’s briefly discuss three of them and highlight their pros and cons.

The first technique is trace-based auditing. This technique is usually built directly into the native capabilities of the DBMS. Commands or parameters are set to turn on auditing and the DBMS begins to cut trace records when activity occurs against audited objects. Although each DBMS offers different auditing capabilities, some common items that can be audited by DBMS audit facilities include:

  • login and logoff attempts (both successful and unsuccessful attempts)
  • database server restarts
  • commands issued by users with system administrator privileges
  • attempted integrity violations (where changed or inserted data does not match a referential, unique, or check constraint)
  • select, insert, update, and delete operations
  • stored procedure executions
  • unsuccessful attempts to access a database or a table (authorization failures)
  • changes to system catalog tables
  • row level operations
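For example, Oracle’s traditional audit facility is controlled with SQL statements such as the following sketch (the schema and table are hypothetical, and the AUDIT_TRAIL initialization parameter must already be enabled for audit records to be written); other DBMSs expose comparable trace- or policy-based controls under different names:

 -- Audit reads and changes of one table, writing one record per statement execution
 AUDIT SELECT, INSERT, UPDATE, DELETE ON hr.employee BY ACCESS;

 -- Audit failed attempts to connect to the database
 AUDIT SESSION WHENEVER NOT SUCCESSFUL;

 -- Turn the object auditing back off when it is no longer required
 NOAUDIT SELECT, INSERT, UPDATE, DELETE ON hr.employee;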

The problems with this technique include a high potential for performance degradation when audit tracing is enabled, a high probability that the database schema will need to be modified, and insufficient granularity of audit control, especially for reads.

Another technique is to scan and parse the database transaction logs. Every DBMS uses transaction logs to capture every database modification for recovery purposes. Software exists that interprets these logs and identifies what data was changed and by which users. The drawbacks to this technique include the fact that reads are not captured in the logs, that there are ways to disable logging that will cause modifications to be lost, that scanning volumes and volumes of log files for only the specific information to be audited creates performance issues, and that logs designed for short-term retention for database recovery are difficult to retain over the long periods that auditing requires.

Additionally, third party vendors offer products that scan the database logs to produce audit reports. The DBMS must create log files to assure recoverability. By scanning the log, which has to be produced anyway, the performance impact of capturing audit information can become a non-issue.

The third database access auditing technique is proactive monitoring of database operations at the server. This technique captures all SQL requests as they are made. It is important that all SQL access is audited, not just network calls, because not every SQL request goes over the network. Proactive audit monitoring does not require transaction logs, does not require database schema modification, and should be highly granular in terms of specifying what to audit.

The Questions That Must be Answerable

As you investigate the database access auditing requirements for your organization, you should compile a list of the types of questions that you want your solution to be able to answer. A good database access auditing solution should be able to provide answers to at least the following questions:

  1. Who accessed the data?
  2. At what date and time was the access?
  3. What program or client software was used to access the data?
  4. From what location was the request issued?
  5. What SQL was issued to access the data?
  6. Was the request successful; and if so, how many rows of data were retrieved?
  7. If the request was a modification, what data was changed? (A before and after image of the change should be accessible)

Of course, there are numerous details behind each of these questions. A robust database access auditing solution should provide an independent mechanism for the long-term storage and access of audit details. The solution should offer canned queries for the most common types of audit questions, but the audit information should also be accessible using industry-standard query tools to make it easier for auditors to customize queries as necessary.

Summary

Database auditing can be a crucial component of database security and compliance with government regulations. Be sure to study the auditing capabilities of your DBMS and to augment these capabilities with third party tools to bolster the auditability of your databases.

 

 

Posted in auditing, compliance

What Does the Latest Salary Survey Say About Data Professionals?

In the latest Computerworld IT Salary Survey (2016), 71% of the IT workers who took the survey reported that they received a raise in the past year. That is a nice healthy number.

For those of us who specialize in data and database systems though, it may be bad news that we are not as “in demand” as our application development brethren: 45% expect their organizations to hire new application developers this upcoming year whereas only 17% expect to hire new database analysis and development folks. Of course, this may not be all bad news because organizations need many more developers than they do DBAs, right?

In terms of compensation, the national average for DBAs was $98,213 in the 2016 survey, up 1.9% over 2015. This figure includes both a base salary and bonus.

Let’s compare that to the application developer compensation. AppDev folks averaged $91,902 in 2016, up 4.4% over 2015. So DBAs are still out-earning application developers, but not by much. And the application folks are getting bigger raises!

That said, these types of surveys are always skewed somewhat because of the multitude of titles and jobs out there that fall into multiple categories. For example, the national average compensation for database analysts was $90,370 in 2016, up 3.5% over 2015. And the national average for database architects was $128,242 in 2016, up 3.3% over 2015. I think this sends a clear message to DBAs: it is time to ask for your title to be changed to database architect!

You can go to the web site and search on the various categories to uncover the compensation figures for your favorite profession. I was curious, for example, about data scientists, but there were only 13 respondents, rendering the results statistically insignificant.

 

 

Posted in DBA, salary