Season's Greetings 2023

We have entered the holiday season 2023. Did you know there are nearly two dozen different holidays and celebrations observed from the end of October through the beginning of January every year? With that in mind, I like to make an annual post wishing well to everyone who celebrates any (or all) of these festivities!

So have a great holiday season… and we'll see you back here again next year!

Posted in Uncategorized

My Latest Book: Mainframe Specialty Processors

Today’s post will be brief… just to announce my latest book. It does have relevance to data, but only for IBM System Z mainframe shops.

The book is titled: Mainframe Specialty Processors: Understanding zIIPs, Licensing, and Cost Savings on the IBM System z. And the title pretty much sums up the content of the book. But here are some more details…

If you are an IT professional who works on IBM Z mainframes, then you’ve probably heard about zIIPs and other “specialty processors.” But you may not really know what they are, what they do, and why they exist. This book will clarify the purpose of specialty processors and how you can best utilize them for cost optimization.

The book provides a high-level overview of pertinent mainframe internals such as control blocks, SRBs, and TCBs, and why they are important for understanding how zIIPs work.

Additionally, because reducing mainframe software cost is essential to the purpose of specialty processors, the book provides a high-level introduction to mainframe licensing and pricing. Mainframe pricing and billing is a complex topic that is understood by few. Nevertheless, this book will help you become conversant on the topic.

Furthermore, the book describes the types of workloads that can take advantage of specialty processors, including advice on how to promote zIIP usage in your applications and systems. One of the biggest beneficiaries of zIIPs is Db2 workload, but only specific types of Db2 processing qualify. And the book will guide you as to what types of Db2 processes qualify!

The book is primarily focused on zIIP usage, but it also takes a look at the other types of specialty processors, including what they are used for and why it can be beneficial to use them. And finally, the book speculates on the future of specialty processors in the realm of the IBM Z.

So, if you work on mainframe systems, or know a loved one who does, why not pick up a copy of the book on Amazon right now?

Posted in architecture, books, capacity planning, DB2, IBM, mainframe, optimization, performance, software cost, zIIP

Q+A: Upgrading the Company DBMS

I regularly receive questions from readers and every now and then I choose one to highlight here on the blog. Today, I want to share a Q+A from a reader/student asking for guidance on things a company should consider when upgrading their DBMS…

Question:

I am writing a company proposal on a technical issue of my choice. Since I took a MySQL class and it may be related to my career field, I thought it would be a good idea to pick “upgrading the company’s DBMS.” So my question would be: can you give me a little insight on the quality of DBMSes? For example, why choose Oracle over MySQL, and are newer brands better than older ones? For the sake of my report I chose MySQL as the brand that I wanted to propose to my company, mainly because it is the one that I am familiar with. Any insight would be very much appreciated.

My Answer:

Those are interesting questions indeed, and not always easy to answer. Here is how I would go about determining the database architecture for an organization.

The first thing to do is to determine the primary relational DBMS that will serve as the backbone of the organization’s transactional systems. This could be MySQL, of course, or you may need to go with a DBMS that has a stronger track record of enterprise support (such as IBM Db2, Microsoft SQL Server, or Oracle). Factors to examine include the O/S supported by the DBMS and your organization’s required O/S support, performance and availability requirements, scalability needs, available technicians, support for the cloud, total cost of ownership, and analyst recommendations.

Keep in mind that an organization may have multiple relational/SQL database systems due to packaged application support and historical needs. Most relational/SQL DBMSes will support transaction processing for most applications.

The next step is to decide on an analytical DBMS for the backbone of analytical, data warehousing, and data science processing. Again, we should choose a relational/SQL DBMS, but it should have the capability to perform OLAP and column-based queries. If the DBMS supports HTAP (Hybrid Transactional/Analytical Processing), then perhaps the same DBMS can be used for both transactions and analytics, but this often requires additional configuration and perhaps an appliance (e.g., the IBM Db2 Analytics Accelerator).

Then you need to review the use cases for all upcoming projects. If a project can be accomplished using the DBMSes already selected, that is ideal. However, life is not always ideal, and sometimes this requires adding a new DBMS, or even a new type of DBMS. For example, consider the NoSQL DBMS variants (graph, wide column, document, and key/value databases). Each of these supports specific use cases and might be required for web, social media, IoT, or other types of projects that are not easily supported by relational/SQL database systems.

And do not forget the legacy DBMSes required to run applications that have been around for a while. This may mean that you will also have to support database systems such as IBM IMS (hierarchical), Broadcom IDMS (network/CODASYL), Software AG Adabas (inverted list), or object-oriented database systems.

Once you have your infrastructure set, which will likely consist of multiple DBMS products, you need to plan for the update cycle of each DBMS product. This means coordinating with the published release cycle for each vendor. You must always ensure that you are running a supported version of the DBMS. Vendors typically release new versions every 18 to 36 months, but support a product for 5 or more years, running from GA (general availability), through EOM (end of marketing, meaning no longer sold), to EOS (end of support, meaning no longer supported). You must plan to move to a new release of each DBMS before the EOS deadline.
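
If you want to automate that planning, here is a minimal Python sketch of the idea. The products, versions, EOS dates, and one-year planning window are hypothetical placeholders; the real dates must come from each vendor's published support lifecycle.

    from datetime import date, timedelta

    # Hypothetical inventory: (product, version) -> vendor-published EOS date.
    # Replace these with the actual dates from each vendor's lifecycle page.
    EOS_DATES = {
        ("Db2 for z/OS", "12.1"): date(2025, 12, 31),
        ("SQL Server", "2014"): date(2024, 7, 9),
        ("MySQL", "5.7"): date(2023, 10, 31),
    }

    WARNING_WINDOW = timedelta(days=365)  # start planning a year before EOS

    def review_support_status(today: date = None) -> None:
        """Flag installed DBMS versions that are past EOS or inside the planning window."""
        today = today or date.today()
        for (product, version), eos in sorted(EOS_DATES.items(), key=lambda kv: kv[1]):
            if today > eos:
                status = "PAST EOS - unsupported, upgrade immediately"
            elif today > eos - WARNING_WINDOW:
                status = "approaching EOS - schedule the upgrade project"
            else:
                status = "supported"
            print(f"{product} {version}: EOS {eos} -> {status}")

    if __name__ == "__main__":
        review_support_status()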

Further complicating matters is that many vendors have moved to an agile, continuous delivery model. This means that smaller, more frequent updates are being made available. You must decide where your organization fits on the Gartner technology adoption framework (https://www.gartner.com/it-glossary/type-a-b-and-c-enterprises) and create a strategy for how frequently you accept new modifications. At any rate, you will need to understand the mechanisms used by your vendor(s) and strategically determine how you will introduce changes into your database environment.

The cloud throws another monkey wrench into this because if you use a cloud database such as Azure SQL Database, the cloud provider (in this case Microsoft) will always be running the most recent version of the DBMS. So you will have to adopt a more aggressive strategy for accepting change into your DBMS environment.

Best of luck with your project and I hope this information has helped.

Posted in architecture, availability, change management, maintenance, relational, scalability

Optimizing Database Performance

Last week I was proud to publish my latest database book, this time on the important topic of achieving and maintaining efficient performance for your databases and the applications that access them. Appropriately enough, the book is titled Optimizing Database Performance, and it is available now in both print and ebook formats on Amazon. So I hope you will click on the links and check it out!

I wrote the book to explain the many aspects of performance management that need to be addressed when dealing with database management systems. You see, even though it is generally accepted that you should tune your systems and that high performance is a “good thing,” rarely does anybody really take the time to define what is meant by the term “performance.” Especially when things are falling apart or on fire!

One of the first things this book does is to define the elements of database performance. After grasping the basic concepts, the book expands to detail the components and issues that contribute to how your database systems and applications perform. It defines the four levels of database performance and offers in-depth guidance on the components of each. It also offers up steps that can be taken not only to review database performance, but to optimize the systems and applications accessing your databases.

Armed with the information in this book you can confidently implement, maintain, and tune your databases and applications to ensure their effectiveness and efficiency. In other words, ensure optimal database performance.

Get your copy today!

Posted in analytics, availability, data availability, DBA, indexing, operational, optimization, performance

Database Performance versus Scalability

Database performance and scalability are both important aspects of managing and optimizing databases, but they refer to different characteristics and considerations.

Database performance refers to how efficiently a database system can respond to and process queries, transactions, and other operations. It is focused on the speed, responsiveness, and overall efficiency of database operations. Factors that influence database performance include:

  • Query Optimization: Ensuring that database queries are structured and written in a way that minimizes resource usage and execution time.
  • Indexing: Creating and maintaining appropriate indexes on tables to speed up data retrieval and query execution (a small sketch of this appears after the list).
  • Data Modeling: Designing the database schema and relationships in a way that minimizes data redundancy and maximizes query performance.
  • Caching: Implementing caching mechanisms to store frequently accessed data in memory for faster retrieval.
  • Hardware Resources: Utilizing sufficient hardware resources, such as CPU, memory, and disk speed, to support the database workload.
  • Tuning: Regularly monitoring and tuning the database system to identify and address performance bottlenecks.
  • Concurrency Control: Managing concurrent access to the database to ensure data consistency and prevent conflicts among multiple users or processes.
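
To make the indexing bullet concrete, here is a small sketch using Python's built-in sqlite3 module (the orders table and its columns are invented for illustration). The access path reported by EXPLAIN QUERY PLAN changes from a full table scan to an index search once the index exists; other DBMSes expose the same information through their own EXPLAIN facilities.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 500, i * 1.25) for i in range(10_000)],
    )

    query = "SELECT * FROM orders WHERE customer_id = ?"

    # Without an index on customer_id the optimizer must scan the whole table.
    print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

    # After creating the index, the same query can use an index search instead.
    conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
    print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())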

Database scalability refers to the ability of a database system to handle increasing amounts of data, workload, and users without significantly sacrificing performance. It involves the capacity of the database to grow and handle increased demand over time. There are two main types of scalability:

  • Vertical Scalability (Scaling Up): Increasing the capacity of a single server by adding more resources, such as upgrading the CPU, adding more memory, or increasing storage capacity. This approach has limits and can become costly.
  • Horizontal Scalability (Scaling Out): Distributing the database workload across multiple servers or nodes in a cluster. New servers can be added as needed, allowing for more linear scalability. This approach is more flexible and can potentially handle very large workloads.

Scalability considerations include:

  • Partitioning/Sharding: Dividing the data into smaller partitions or shards to distribute the load across multiple nodes in a distributed system (a simple routing sketch follows this list).
  • Replication: Creating copies of data on multiple nodes to improve fault tolerance and allow for read scaling.
  • Load Balancing: Distributing incoming queries and requests evenly across multiple nodes to avoid overloading any single node.
  • Consistency and Coordination: Ensuring data consistency and coordination across distributed nodes while maintaining acceptable performance.
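
As a simple illustration of the partitioning/sharding bullet above, this hedged Python sketch routes each row to a shard by hashing its partitioning key; the shard names and the choice of key are hypothetical.

    import hashlib

    # Hypothetical shard topology: each shard would be a separate database node.
    SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

    def shard_for(key: str) -> str:
        """Route a row to a shard by hashing its partitioning key.

        A stable hash (not Python's randomized built-in hash()) keeps the
        mapping consistent across processes and restarts.
        """
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # Example: customer IDs spread across the shards.
    for customer_id in ("C1001", "C1002", "C1003", "C1004"):
        print(customer_id, "->", shard_for(customer_id))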

In summary, database performance focuses on optimizing the efficiency and responsiveness of individual database operations, while database scalability focuses on the system’s ability to handle increased data and workload demands while maintaining acceptable performance levels. Both aspects are crucial for designing and managing a successful database system.

Posted in performance, scalability

Perhaps The Most Important Data Management Issue Today

Although there are many important issues that organizations are struggling with regarding data and data management these days, few would dispute that securing and protecting data is among the most vital. With the increasing reliance on digital systems and the proliferation of sensitive data, organizations face significant challenges in protecting their databases from unauthorized access, data breaches, and cyber threats.

Data breaches can lead to severe consequences, including financial losses, reputational damage, and legal repercussions. As a result, ensuring the security and privacy of data has become a critical concern for organizations across various industries. It involves implementing robust access control mechanisms, encryption techniques, and adopting best practices for data governance and compliance. Of course, these things are easier said than done.

It is also worth noting that there are many industry and governmental regulations driving the need to improve data protection, management, and administration. One of the more visible governmental regulations is the Sarbanes-Oxley Act (SOX), officially known as the U.S. Public Company Accounting Reform and Investor Protection Act of 2002. The goal of SOX is to regulate corporations in order to reduce fraud and to improve disclosure and financial reporting. The impact and cost of the law on IT and database management are significant. Section 404 of SOX specifies that the CFO must guarantee the processes used to produce financial reports. Those processes are typically computer programs that access data in a database.

Consider also the Health Insurance Portability and Accountability Act, commonly referred to as HIPAA. This legislation mandates that health care providers protect individuals’ health care information, stating that providers must be able to document everyone who even so much as looked at that information. HIPAA audits frequently require the examination of the processes used to create, document, and review exception reports and logs. When confronted with a HIPAA audit, organizations can be required to produce a list of exceptions to policy, such as, “When were patient records accessed during off hours and by whom?” Without database auditing software, it is impossible to produce a list of users who looked at a specific row or set of rows in any database.
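
To illustrate the kind of evidence an auditor might request, here is a hypothetical example: the audit_log table, its columns, and the definition of “off hours” are all assumptions, since real audit repositories vary by product and tool.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE audit_log (            -- hypothetical audit repository
            user_id     TEXT,
            table_name  TEXT,
            action      TEXT,
            accessed_at TEXT                -- ISO-8601 timestamp
        )
    """)
    conn.executemany(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
        [
            ("nurse01",   "patient_records", "SELECT", "2023-08-14 14:05:00"),
            ("contract7", "patient_records", "SELECT", "2023-08-14 23:42:00"),
        ],
    )

    # "Off hours" is assumed here to mean before 08:00 or after 18:00.
    off_hours_access = conn.execute("""
        SELECT user_id, accessed_at
          FROM audit_log
         WHERE table_name = 'patient_records'
           AND (CAST(strftime('%H', accessed_at) AS INTEGER) < 8
                OR CAST(strftime('%H', accessed_at) AS INTEGER) >= 18)
    """).fetchall()

    print(off_hours_access)   # -> [('contract7', '2023-08-14 23:42:00')]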

Other compliance-related legislation includes the Gramm-Leach-Bliley (GLB) Act (also known as the Financial Modernization Act of 1999) and the E-Government Act, passed in 2002 as a response to terrorist threats. Title III of that act is the Federal Information Security Management Act (FISMA), which states that federal agencies, contractors, and any entity that supports them must maintain security commensurate with potential risk.

And we would be remiss without mentioning PCI-DSS, the Payment Card Industry Data Security Standard. PCI-DSS is an industry standard, as opposed to the governmental regulations mentioned previously. Established by the major credit card companies, it dictates the requirements for organizations that accept payment cards, with the goal of preventing credit card fraud, hacking, and other security issues. Failing to comply risks losing the right to accept credit cards as payment. PCI-DSS emphasizes the importance of real-time monitoring and tracking of access to cardholder data, as well as continuous assessment of the security health of the databases storing that data.

Regulatory compliance holds an important sway over upper-level management at most medium- to large-size organizations because of its potent impact. Business executives are keenly aware of the need to comply, although they are not always aware of all the details that compliance involves. This is because failure to comply can result in prosecution, which may involve huge fines and even jail time. Regulatory compliance can impose upon C-level executives the need to be able to prove that corporate data (and therefore, database systems) are protected and that the processes and procedures enacted upon the data are accurate and required.

Of course, there are other important issues in terms of data security and protection including data archiving, database auditing, data access controls, encryption, data masking, and more.

Database archiving protects data by removing it from the database system and storing it for posterity in an archive data store. There are over 150 international, federal, and local laws that establish mandated retention periods for different types of data. Many of these laws greatly expand the duration over which data must be retained. Many organizations deal with retention periods based on business needs only, but this can be short-sighted. Depending on the industry, what was once a short retention period may now expand into decades or even longer. To comply with these laws, corporations must re-evaluate their established methods and policies for managing and retaining data. What worked in the past to retain data for a few years is no longer sufficient over a much longer period.

Database auditing refers to the process of monitoring and recording activities that occur within a database system. It involves capturing and analyzing various actions and events, such as user logins, data modifications, schema changes, and system configuration updates. The primary goal of database auditing is to ensure the integrity, security, and compliance of the database environment. You can think of it as “monitoring who did what to which data when” in the database. Database auditing should be conducted in alignment with applicable legal and privacy requirements. Organizations should define clear auditing policies, specify the scope of auditing, and ensure the appropriate level of data protection and access controls are in place to safeguard the audit logs themselves.
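
As a minimal sketch of the “what happened to which data when” portion of auditing, the following uses a SQLite trigger to capture balance changes into an audit table. The table and column names are invented, and capturing the “who” normally requires session context from the application or the DBMS’s native audit facility, which production-grade auditing tools provide.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);

        CREATE TABLE account_audit (
            account_id  INTEGER,
            old_balance REAL,
            new_balance REAL,
            changed_at  TEXT DEFAULT (datetime('now'))
        );

        -- Record every balance change: which row, what changed, and when.
        CREATE TRIGGER trg_account_update AFTER UPDATE ON account
        BEGIN
            INSERT INTO account_audit (account_id, old_balance, new_balance)
            VALUES (OLD.id, OLD.balance, NEW.balance);
        END;

        INSERT INTO account (balance) VALUES (100.0);
        UPDATE account SET balance = 75.0 WHERE id = 1;
    """)

    print(conn.execute("SELECT * FROM account_audit").fetchall())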

Data access controls refer to mechanisms and procedures enacted to manage the access and usage of data within a database. These controls are designed to ensure that only authorized individuals or entities can access, modify, or retrieve specific data based on predefined permissions and security policies. Data access controls are an essential component of data security and play a crucial role in protecting sensitive information from unauthorized access, misuse, or data breaches. Examples include authentication, authorization, role-based access control (RBAC), data encryption, and database views.
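
By way of illustration, here is a small Python sketch of the role-based access control idea: roles bundle permissions, and a user’s request is checked against the roles granted to that user. The roles and permission names are hypothetical.

    # Hypothetical role -> permission mapping for an RBAC check.
    ROLE_PERMISSIONS = {
        "analyst": {"read:sales", "read:customers"},
        "dba":     {"read:sales", "read:customers", "write:customers", "alter:schema"},
    }

    USER_ROLES = {
        "alice": {"analyst"},
        "bob":   {"dba"},
    }

    def is_authorized(user: str, permission: str) -> bool:
        """Return True if any of the user's roles grants the requested permission."""
        return any(
            permission in ROLE_PERMISSIONS.get(role, set())
            for role in USER_ROLES.get(user, set())
        )

    print(is_authorized("alice", "write:customers"))  # False - analysts cannot write
    print(is_authorized("bob", "alter:schema"))       # True  - DBAs can alter the schema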

It’s important to implement a comprehensive and layered approach to data access controls, combining multiple control mechanisms to ensure the security and integrity of sensitive data. The specific access control measures implemented may vary based on the nature of the data, the system architecture, and the regulatory requirements applicable to the organization.

Another technique to protect database data is data masking, which is the process of protecting sensitive data from inappropriate visibility by replacing it with realistic, but not accurate, data. The goal is for sensitive, personally identifiable information (PII) not to be exposed outside of authorized environments. Data masking can prevent fraud, identity theft, and other criminal activities. It can be done while provisioning test environments so that copies created to support development and testing do not expose sensitive information. Valid production data is replaced with usable, referentially intact, but obfuscated data. After masking, the test data is usable just like production data, but the information content is secure.
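
Here is a simple, hedged sketch of format-preserving masking for test data. Real masking tools also keep masked values consistent across tables to preserve referential integrity, which this toy example ignores.

    import random
    import string

    def mask_ssn(ssn: str) -> str:
        """Replace a Social Security number with a realistic but fake one, keeping the format."""
        return "".join(random.choice(string.digits) if ch.isdigit() else ch for ch in ssn)

    def mask_email(email: str) -> str:
        """Obscure the local part of an email address but keep it usable for testing."""
        local, _, domain = email.partition("@")
        fake_local = "user" + "".join(random.choices(string.digits, k=4))
        return f"{fake_local}@{domain}"

    row = {"name": "Jane Doe", "ssn": "123-45-6789", "email": "jane.doe@example.com"}
    masked = {"name": row["name"], "ssn": mask_ssn(row["ssn"]), "email": mask_email(row["email"])}
    print(masked)   # e.g. {'name': 'Jane Doe', 'ssn': '407-91-2385', 'email': 'user8312@example.com'}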

And let’s not forget about another long-standing security issue, SQL injection, which is a form of web hacking whereby SQL statements are specified in the fields of a web form to cause a poorly designed application to dump database content to the attacker. In order for SQL injection to succeed, the application code used by the website must be vulnerable to an injection attack. SQL injection relies on programs that do not adequately filter for string literal escape characters embedded in SQL statements or where user input is not strongly typed. So instead of inputting data into a form, SQL statements are supplied. The SQL is “injected” from the web form into the database causing it to be executed and access (or even modify) unintended data. This form of attack has been known for decades now, but it still happens to poorly designed applications.
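
To show the difference, here is a minimal Python/sqlite3 sketch contrasting a vulnerable, string-built query with a parameterized one; the table and the injected input are contrived for illustration.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

    user_input = "' OR '1'='1"   # a classic injection payload typed into a web form

    # VULNERABLE: the input is concatenated directly into the SQL statement,
    # so the injected OR clause matches every row in the table.
    vulnerable = conn.execute(
        "SELECT * FROM users WHERE username = '" + user_input + "'"
    ).fetchall()
    print(vulnerable)   # -> [('admin', 's3cret')] even though no such user was requested

    # SAFER: a parameterized query treats the whole input as a literal value.
    safe = conn.execute(
        "SELECT * FROM users WHERE username = ?", (user_input,)
    ).fetchall()
    print(safe)         # -> [] because no username equals the injected string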

Furthermore, as the volume of data continues to grow exponentially, another important issue is managing the scalability and performance of databases. Ensuring that data is protected, while at the same time being efficiently accessible, is a major data management challenge being faced today.

Posted in auditing, availability, compliance, data, data availability, data breach, data governance, DBA

The Impact of Row Size on Query Performance

As part of my job as a database consultant, I regularly keep my eyes open for interesting topics in relevant blogs and forums. Recently, one interaction caught my eye, where somebody asked what kind of a performance impact could be expected if a query was issued against two similar tables. The first table had (say) 20 columns, and the second table had the same 20 columns, as well as 35 additional columns.

There were a significant number of folks providing input on the topic, but the overwhelming consensus was that as long as the query was going against the same columns then performance should be about the same.

I disagree. Here is why.

You also need to factor in the I/O requests that are required to return the data. Most DBMSes perform I/O operations at the block (or page) level. And this is the case regardless of whether you want to return one row or millions of rows.

For multi-row results, accessing data from the table with the wider row (more columns) will usually take longer than accessing from a narrower row (fewer columns). This is the case because the table with narrower rows will have more rows per page/block than the table with wider rows.

Just a quick calculation should help you to understand what I mean. Assume that the narrower row (say with 100 columns) has an overall row length of 600 bytes. If the block size of the table/tablespace is 4K, then about 6 rows can be stored in each page.

Now assume that the wider row (say with 120 columns) has an overall row length of 900 bytes. Again, with a block size of 4K, this time we can store only 4 rows per page.

So what, you may ask? In each case I only want to return the same 4 columns. Who cares what else those columns are stored with?

Well, remember that the DBMS stores data in blocks/pages and also reads at the block/page level. This is the case no matter how many columns you want to access. So, let’s consider the worst-case scenario: a table scan with no indexes. There will be more pages to read when the rows are wider! Consider the case where there are just 1000 total rows:

  • The wider table – at 4 rows per page – will require at least 1000/4 or 250 pages
  • The narrower table – at 6 rows per page – will require at least 1000/6 or 167 pages

So, with a table scan the wider table requires 250 I/O operations, whereas the narrower table requires only 167 I/O operations – or 83 fewer I/Os! And that is the case regardless of the number of columns being returned.
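
The arithmetic is easy to reproduce. This small sketch uses the example numbers above and ignores page overhead, free space, and compression, so treat it as an approximation rather than a sizing tool.

    import math

    PAGE_SIZE = 4 * 1024   # 4K pages/blocks

    def pages_for_scan(row_length: int, total_rows: int) -> int:
        """Minimum pages a table scan must read, ignoring page overhead and free space."""
        rows_per_page = PAGE_SIZE // row_length
        return math.ceil(total_rows / rows_per_page)

    narrow = pages_for_scan(row_length=600, total_rows=1000)   # 6 rows/page -> 167 pages
    wide   = pages_for_scan(row_length=900, total_rows=1000)   # 4 rows/page -> 250 pages

    print(narrow, wide, wide - narrow)   # 167 250 83 fewer I/Os for the narrower table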

Now which one of these do you think will be more efficient?

Of course, this is just looking at things at a macro level, and we all know that there are many nuances and variations that can impact performance. Nevertheless, speaking in generalities, the fewer rows per page, the less efficient queries against that table will be.

Think about it this way. Is it faster to pull smaller peaches out of a basket than bigger peaches? That is about the same type of question and anybody can envision the process. Say you want 100 peaches. Small peaches fit 25 per basket; big peaches fit ten per basket. To get 100 small peaches you’d need to pull 4 baskets from the shelf. To get 100 big peaches you’d need to pull 10 baskets from the shelf. The second task will clearly take more time.

Of course, the exact performance difference is difficult to calculate – especially without knowledge of the specific DBMS being used, information about in-memory tactics and buffer pools, and the exact type of data and queries being used. But there will, more than likely, be a performance effect on queries when you add columns to a table.

Posted in database design, performance, SQL