Implementing a Database Infrastructure

I recently received a question about how to put together a database environment from scratch. It is a quick question that requires an in-depth answer — usually more so than can adequately be explained over email or text. Furthermore, to do a thorough job of it would require a lot of planning and analysis of the organization in question.

Nevertheless, I always try to provide some good advice whenever somebody seeks me out to ask a question. So here is the answer I offered up:

That is indeed an interesting question that is not always easy to answer. Here is how I would go about determining the database architecture for an organization.

The first thing to do is to determine the primary relational DBMS that will serve as the backbone of the organization’s transactional systems. This could be MySQL, of course (note: the questioner mentioned experience with MySQL), or you may need to go with a DBMS that has a stronger track record of enterprise support (such as IBM Db2, Microsoft SQL Server, or Oracle Database). Factors to examine include the operating systems supported by the DBMS and your organization’s required O/S support, performance and availability requirements, scalability needs, available technicians, support for the cloud, total cost of ownership, and analyst recommendations.
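One way to keep that evaluation objective is a simple weighted scoring exercise. The sketch below is purely illustrative: the factor names, weights, and candidate ratings are hypothetical placeholders, not an assessment of any real DBMS, so substitute your organization's own criteria.

```python
# A minimal sketch of weighted scoring for DBMS selection. The factors,
# weights, and candidate ratings below are hypothetical placeholders --
# substitute your organization's own criteria and assessments.

WEIGHTS = {
    "os_support": 0.15,
    "performance_and_availability": 0.25,
    "scalability": 0.15,
    "available_technicians": 0.15,
    "cloud_support": 0.10,
    "total_cost_of_ownership": 0.20,
}

# Each candidate is rated from 1 (poor) to 5 (excellent) on every factor.
candidates = {
    "Candidate A": {"os_support": 5, "performance_and_availability": 3,
                    "scalability": 3, "available_technicians": 5,
                    "cloud_support": 5, "total_cost_of_ownership": 5},
    "Candidate B": {"os_support": 4, "performance_and_availability": 5,
                    "scalability": 5, "available_technicians": 3,
                    "cloud_support": 4, "total_cost_of_ownership": 3},
}

def weighted_score(ratings: dict) -> float:
    """Combine per-factor ratings into a single weighted score."""
    return sum(weight * ratings[factor] for factor, weight in WEIGHTS.items())

for name, ratings in sorted(candidates.items(),
                            key=lambda kv: weighted_score(kv[1]), reverse=True):
    print(f"{name}: {weighted_score(ratings):.2f}")
```

The numbers matter less than the discipline: writing the weights down forces the team to agree on what actually matters before the vendor discussions begin.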

Keep in mind that an organization may have multiple relational/SQL database systems due to packaged application support and historical needs. The leading relational/SQL DBMSes will support transaction processing for most applications.

The next step is to decide on an analytical DBMS to serve as the backbone for analytical, data warehousing, and data science processing. Again, we should choose a relational/SQL DBMS, but it should have the capability to perform OLAP and column-based queries. If the DBMS supports HTAP (Hybrid Transaction/Analytical Processing), then perhaps the same DBMS can be used for both transactions and analytics, but this often requires additional configuration and perhaps an appliance (e.g., the IBM Db2 Analytics Accelerator).

Then you need to review the use cases for all upcoming projects. If a project can be accomplished using the DBMSes already selected, this is ideal. However, life is not always ideal, and sometimes this requires adding a new DBMS, or even a new type of DBMS. For example, consider the NoSQL DBMS variants (graph, wide column, document, and key/value databases). Each of these supports specific use cases and might be required for web, social media, IoT, or other types of projects that are not easily supported by relational/SQL database systems.
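To make the differences concrete, here is a minimal, purely illustrative sketch of the shape of data each model favors, expressed as plain Python structures rather than any particular product's API; the keys and values are hypothetical.

```python
# Illustrative only: the same kinds of facts expressed in the shape each
# NoSQL model favors. These are plain Python structures, not the APIs of
# any particular product; all keys and values are hypothetical.

# Key/value: an opaque value looked up by a single key -- good for caching
# and session data.
kv_store = {"customer:42:cart": ["sku-1001", "sku-2040"]}

# Document: a self-contained, nested record -- handy for web and content data.
document = {
    "_id": "customer:42",
    "name": "Pat",
    "orders": [{"sku": "sku-1001", "qty": 2}],
}

# Wide column: values addressed by (row key, column name) -- suited to
# time-series and IoT feeds with sparse, ever-growing columns.
wide_column = {
    ("sensor:17", "2019-07-26T10:00"): 72.4,
    ("sensor:17", "2019-07-26T10:01"): 72.6,
}

# Graph: explicit nodes and edges -- ideal for social and recommendation queries.
edges = [
    ("customer:42", "FOLLOWS", "customer:77"),
    ("customer:42", "VIEWED", "sku-1001"),
]
```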

And do not forget the legacy database systems required to run applications that have been around for a while. This may mean that you will have to also support database systems like IMS (hierarchical), IDMS (network/CODASYL), Adabas (inverted list), or OO database systems.

Once you have your infrastructure set (which will likely consist of multiple DBMS products), you need to plan the update cycle for each DBMS product. This means coordinating with the published release cycle for each vendor. You must always ensure that you are running a supported version of the DBMS. Vendors typically release new versions every 18 to 36 months, but support a product for 5 or more years, running from GA (general availability), through EOM (end of marketing, meaning no longer sold), to EOS (end of support, meaning no longer supported). You must plan to move to a new release of each DBMS before the EOS deadline.
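A simple way to stay ahead of those deadlines is to track the GA, EOM, and EOS dates for every installed DBMS version and flag anything approaching end of support. The sketch below is a minimal illustration; the product names and dates are hypothetical.

```python
# A minimal sketch of tracking DBMS support milestones so that nothing slips
# past its end-of-support (EOS) date. Product names and dates are hypothetical.
from datetime import date

# (product/version, GA date, EOM date, EOS date)
inventory = [
    ("DBMS X v12",  date(2016, 10, 1), date(2020, 6, 30), date(2022, 9, 30)),
    ("DBMS Y 2017", date(2017, 9, 1),  date(2021, 7, 1),  date(2027, 10, 12)),
]

WARN_DAYS = 365  # start planning the migration at least a year before EOS

today = date.today()
for product, ga, eom, eos in inventory:
    days_left = (eos - today).days
    if days_left < 0:
        print(f"{product}: UNSUPPORTED -- EOS passed on {eos}")
    elif days_left <= WARN_DAYS:
        print(f"{product}: plan the upgrade now, {days_left} days until EOS ({eos})")
    else:
        print(f"{product}: OK for now, EOS is {eos}")
```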

Further complicating matters, many vendors have moved to an agile, continuous delivery model. This means that smaller, more frequent updates are being made available. You must decide where your organization fits on the Gartner technology adoption framework (https://www.gartner.com/it-glossary/type-a-b-and-c-enterprises) and create a strategy for how frequently you accept new modifications.

The cloud throws another monkey wrench into this because if you use a cloud database like SQL Azure, for example, then the cloud provider (in this case Microsoft) will always be running the most recent version of the DBMS. So you will have to adopt a more aggressive strategy for accepting change into your DBMS environment.

What do you think? Is this a good high-level approach to assembling a database infrastructure for an organization?

Posted in architecture, DBMS

Ancelus: an Intriguing New Type of Database

There are many different DBMS types and products on the market these days. We have relational, key/value, document, wide column, graph, in-memory, multi-model, and even pre-relational DBMS types. It is rare to see anything truly new and interesting in the realm of DBMS offerings, but I think you will agree that Ancelus is indeed something new and interesting.

Billed as the nanosecond database, Ancelus has been able to achieve incredible performance benchmark results that are orders of magnitude faster than comparable relational database management system results. Better and faster performance is always at the top of the list of what organizations want from their database systems. And Ancelus can deliver. Benchmark results show constant read/write latency under 100 nanoseconds regardless of the amount of data, up to 16 exabytes. Ancelus ran a billion-row search in 2.4 microseconds, and it adds only about 500 nanoseconds for each 10x increase in database size. Basically, queries that run for hours or days on typical database implementations can run in microseconds on Ancelus.

A database system capable of such speed is intriguing for many applications such as financial analysis, options trading, AI/ML, and many others.

Sometimes databases and applications sacrifice data integrity for speed, but not so with Ancelus. It is ACID-compliant while delivering extreme speed, complexity handling, and massive scale all within a single system.

Furthermore, Ancelus works efficiently for both transactions and analytics, supporting HTAP (Hybrid Transaction Analytical Processing) and translytical environments. Ancelus is not a SQL database system; data is typically accessed by a native API. However, Ancelus does offer TQL (Threaded Query Language), which converts Ancelus schema structures into SQL-readable formats for ease of integration.

What Makes Ancelus Work

The Ancelus database uses a patented algorithmic approach to data storage that replaces predefined storage structures with an algorithm that determines the physical storage location of data. This algorithmic approach decides where to put data based on efficiency of access to it. This means that physical data storage and retrieval are independent of any predefined data layout.

Ancelus couples this storage approach with in-memory data access techniques which are also based on patented memory management methods. By combining algorithmic data storage and in-memory data management, Ancelus has achieved constant performance even for very large databases and complex applications.

There is no duplication of data in an Ancelus database. Each data element is stored only once and data is linked using pointers to its data dictionary. A consequence of this design is that referential integrity is not optional—it is designed right into the database … and it does not impede performance.
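As a purely generic illustration of the single-copy idea (and emphatically not Ancelus's patented implementation), here is a minimal sketch of interning each distinct value once in a dictionary and referencing it by a pointer-like identifier; the class and data are hypothetical.

```python
# Generic illustration of "store each value once, reference it by pointer."
# This is NOT Ancelus's actual (patented, proprietary) design -- just a toy
# sketch of the general concept of a value dictionary with references.

class ValueDictionary:
    """Interns each distinct value once and hands back a stable id ('pointer')."""
    def __init__(self):
        self._values = []   # id -> value (each value stored exactly once)
        self._ids = {}      # value -> id

    def intern(self, value):
        if value not in self._ids:
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def resolve(self, vid):
        return self._values[vid]

d = ValueDictionary()
# Two "rows" that share a city value reference the same stored element.
row1 = {"name": d.intern("Alice"), "city": d.intern("Austin")}
row2 = {"name": d.intern("Bob"),   "city": d.intern("Austin")}
assert row1["city"] == row2["city"]   # one copy of "Austin", two references
print(d.resolve(row1["city"]))        # -> Austin
```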

With this architecture, users can implement multiple logical schemas over the data at the same time, accessing the data as if it were stored in relational tables (even though it is not). The actual physical schema is not exposed to users.

Ancelus can be run on Linux, UNIX, and Windows platforms.

Consider taking a look at Ancelus if you are in the market for a high-performance database system to power your complex applications. You may be able to reduce cost, speed up development, and improve operations using Ancelus. Indeed, some applications that may have been perceived as impractical or too costly may become achievable with Ancelus.

Posted in data, DBMS, performance

Happy SysAdmin Day… for DBAs, too!

Today, July 26, 2019, is SysAdmin Day, otherwise known as System Administrator Appreciation Day. Actually, it is the 20th annual SysAdmin Day!

What is SysAdmin Day, you may ask? Well, it is a day to celebrate, appreciate, and thank your IT system administration professionals. These are the folks who keep your IT infrastructure up and running. You know, the people who, when they are doing their jobs well, you don’t even notice are there. The only time they usually get recognized is when there is a problem and they are contacted and dispatched to fix it… but they work every day to minimize the number of problems you experience.

So this SysAdmin Day, take a minute or two to recognize the people at your shop who keep the servers, network, and databases (yes, DBAs are a form of system administrator) running. Buy them a donut.

Or take them a nice cup of coffee (you know, something better than that swill that is brewed in your office coffee maker).

Here’s to you, Mr. and Mrs. SysAdmin! Have a great day… and I hope you are not working late this weekend.

Posted in DBA, SysAdmin

A (Very) Quick Intro to Cloud Computing

Cloud computing refers to storing and accessing data and programs over the Internet instead of on your own computers. The “cloud” is a metaphor for the Internet. When you hear the term “cloud,” you can translate that to “someone else’s computer.”

However, things are not quite as clear (or uncloudy, if you will) as that when it comes to grasping what cloud means. Sometimes, when folks refer to the cloud they are talking about a style of development using APIs and microservices. And for private cloud (or local cloud) implementations, there may be no Internet access involved at all. It can get confusing.

The Trends

The clear direction these days is that many components of the IT infrastructure are moving from on premises to the cloud. Enterprises are choosing which applications, and how much of the infrastructure supporting those applications, should be moved into the cloud.

There are several options that can be chosen. The Public Cloud refers to an infrastructure that is provisioned by a cloud provider for open use by the general public. The Private Cloud is where the infrastructure is typically provisioned solely for a single organization, whether managed internally or by a third-party and hosted internally or externally. Finally, a Hybrid Cloud solution is a combination of two or more clouds bound together, delivering the benefits of multiple deployment models.

Another hybrid approach is one in which an organization combines on-premises components and cloud components to deliver services and applications. Many organizations are taking this approach as they try out cloud computing options.

Facts and Figures

There is a pervasive belief, especially among industry analysts, that the cloud is going to take over everything and on premises computing will wither away. Gartner estimates that the cloud computing market will reach $411 billion by 2020 (and that is just next year)!

Organizations’ confidence in the cloud, including in its ability to protect and secure data and applications, is rising. And this increased confidence should correlate with growing cloud adoption. According to a 2017 study, cloud usage increased from 24 percent of workloads in 2015 to 44 percent at the time of the study. Furthermore, the study predicted that by 2019, 65 percent of workloads would be in the cloud.

My Take

Clearly, there are benefits to cloud computing, including economies of scale and the ability to scale up and down as needed. But there are detriments, too, including less control over your data and latency issues. What happens to all of those applications built on the cloud if your Internet service is disrupted?

Contrary to the current popular belief, on-premises computing is not going to disappear any time soon… think of all those mainframes still humming away out there. And according to various sources (compiled by Syncsort):

  • Mainframes handle 68 percent of the world’s production IT workloads
  • 71 percent of Fortune 500 companies use mainframes
  • Mainframes handle 87 percent of all credit card transactions

So mainframes alone, which are not going away, will still handle a large amount of enterprise computing workload.

That said, I believe that public cloud adoption will most likely be much lower than most predictions through 2022, and probably even beyond that. Even if demand is high, Cloud Service Providers (CSPs) cannot possibly build out their infrastructure fast enough to absorb all the existing data center capacity “out there.” Mark Thiele shared a concise, insightful article on LinkedIn that summarizes these thoughts quite well and is worth reading.

The Bottom Line

At any rate, cloud computing is a viable method for building enterprise applications. It can be used to reduce the cost of managing and maintaining your IT systems. At the same time, the cloud can enhance flexibility, deliver quick scalability, and ensure that you are running with current, up-to-date system software.

Posted in cloud

A New and Improved Navicat Monitor

DBAs are always looking for ways to better manage the performance of their database systems and the applications that access those databases. Indeed, monitoring and tuning are perhaps the most frequent tasks that DBAs perform. So, it makes sense that DBAs want to use tools that ease this burden.

One tool that should be on the radar for DBAs looking to improve their performance management capabilities is the latest release of Navicat Monitor, version 2.0, which now includes support for Microsoft SQL Server. That means that Navicat Monitor can now support the monitoring of Microsoft SQL Server, MySQL, MariaDB, and Percona Server databases.

Navicat Monitor is an agentless remote server monitoring tool that runs on Windows, Mac or Linux and can be accessed from anywhere via a web browser. Navicat Monitor can be installed on any local computer or virtual machine. It also supports cloud services including AWS, Oracle Cloud, Alibaba Cloud, Google Cloud, Microsoft Azure and others. It does not require any software installation on the servers being monitored.

It is common these days for DBAs to manage multiple types of DBMSes, and Navicat Monitor can help with its intuitive dashboard interface. Navicat can display summary information for all your database instances on its main dashboard screen. And with the compact view, you can see over a hundred instances on a single screen!

In Figure 1 we can see a screen shot of the dashboard showing how multiple servers can be monitored from a single pane. You can filter by DBMS type and search for specific instances, simplifying the way in which DBAs manage the performance of all the different databases under their purview.


Figure 1. Navicat Monitor Dashboard

Using the dashboard DBAs can view a one-page summary of the real-time health and performance of all the database instances they are responsible for. And the dashboard can be customized to enable DBAs to view the information they want in the manner they want to see it.

Microsoft SQL Server Support

But the big news for Navicat Monitor Version 2.0 is the addition of support for Microsoft SQL Server. All the performance management capabilities you have been using for MySQL and MariaDB are now available for SQL Server. This means you can measure database performance metrics such as CPU load, RAM usage, I/O bottlenecks, locking issues and more for each database type and instance you support.

A major component of performance monitoring and tuning for Microsoft SQL Server involves determining the cause of observed or reported SQL Server issues and implementing the requisite changes to improve performance. Changes may be required to any component of the technology stack supporting the applications, including the SQL Server database, application code, operating system configuration, and hardware components. From a database perspective, tuning can require modifications to many different facets of your SQL Server environment: Transact-SQL (whether in queries, stored procedures, or programs), execution plans, indexing, database structures, SQL Server configuration, operating system configuration, and the physical hardware you use, including memory, disk, and other data storage mechanisms, along with virtually any relevant parameters or configuration options.

The challenge is to identify what is causing the performance issue and that is where Navicat Monitor shines. DBAs can use Navicat Monitor to gain visibility into instance resource utilization, performance, and operational health. Using Navicat Monitor you can get a complete overview of all your instances and how they are performing. You can interact with Navicat Monitor using its graphical visualization of performance metrics for a high-level view, and then drill down into a more detailed analysis.

Navicat Monitor uses a performance repository to capture historical performance metrics, which you can use to evaluate performance trends and diagnose problems. DBAs can set up customized rules to alert when specific performance thresholds are reached, delivering notifications via email, SMS, SNMP, or Slack. And if you do not have time to customize your own rules, Navicat Monitor comes preconfigured with more than 40 alert policies right out of the box. Of course, these rules can be customized later to conform to the metrics and thresholds most important to your environment.

But probably the single most vexing issue for DBAs and developers is SQL performance. It is estimated that as much as 80% of relational database performance problems can be attributed to poorly performing SQL. And without the capabilities of Navicat Monitor it can be extremely challenging to identify, analyze and improve the performance of poorly performing SQL queries.
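For context on what that challenge looks like without tooling, here is a minimal, hand-rolled sketch of the kind of "top consumers" query a DBA might run directly against SQL Server's sys.dm_exec_query_stats DMV, wrapped in Python via pyodbc. The connection string is a placeholder, and Navicat Monitor's Query Analyzer, described next, automates and extends this kind of analysis.

```python
# A rough, do-it-by-hand sketch of a "Top 5 queries" report on SQL Server,
# using the sys.dm_exec_query_stats DMV. The connection string is a placeholder.
import pyodbc

TOP_QUERIES_SQL = """
SELECT TOP 5
       SUBSTRING(st.text, 1, 200)       AS query_text,
       qs.execution_count,
       qs.total_elapsed_time / 1000     AS total_elapsed_ms,
       qs.total_logical_reads
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=master;Trusted_Connection=yes;"   # placeholder connection details
)
for row in conn.cursor().execute(TOP_QUERIES_SQL):
    print(f"{row.total_elapsed_ms:>12} ms  {row.execution_count:>8} execs  {row.query_text}")
```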

Navicat Monitor’s Query Analyzer feature delivers the ability to identify, analyze, and optimize the performance of SQL queries. Using Query Analyzer to regularly track the performance of your top resource-consuming SQL statements can help you to continually improve the overall performance of your applications by finding, and then tuning, the worst performers first.

Figure 2. Navicat Monitor Query Analyzer

Refer to Figure 2. You can use Query Analyzer to implement a common best practice, identifying and tuning a top five list of problem queries. Query Analyzer gathers information about SQL queries and identifies the Top 5 Queries, which are the 5 most time-consuming query statements, along with additional details.

Take a look at the Query Table section of the Query Analyzer, shown toward the bottom of the screen shot in Figure 2. Here we see that Navicat has identified the Top 5 queries based on total execution time, along with additional details including the SQL text, a count of how many times the SQL was run, and cumulative execution time.

You can use Query Analyzer to drill down and acquire more information on the longest-running queries in your environment. Figure 3 shows the Long Running Queries screen. Here we can perform an in-depth analysis of the long-running queries, examining when they run along with additional details including execution plan information, lock waits, I/O, and all the relevant database and system details to help you optimize your SQL.


Figure 3. Long Running Queries

 

Summary

If you already use Navicat Monitor, you’ll love the new version. If you have yet to use it, now is the time to take a look. And with the new Microsoft SQL Server support, Navicat Monitor 2.0 will be beneficial to even more organizations and DBA teams than ever before.

Posted in DBA, performance, SQL

Digital Transformation and Database Scalability

Today’s business systems are undergoing a revolutionary transformation to work within the digital economy. This phenomenon is known as digital transformation. There are four over-arching trends driving digital transformation today, summarized by the acronym SMAC: social, mobile, analytics, and cloud.

Mobile computing is transforming the way most people interact with applications. Just about everybody has a smartphone, a tablet, or both. And these devices are being used to keep people engaged throughout the day, no matter where they are located. This change means that customers are engaging and interacting with organizations more frequently, from more diverse locations than ever before, and at any time around the clock. End users are constantly checking their balances, searching for deals, monitoring their health, and more from mobile devices. And their expectation is that they can access their information at their convenience.

Cloud computing, the practice of using a network of remote servers hosted on the internet to store, manage, and process data and applications rather than a local host, enables more types and sizes of organizations than ever before to deploy and make use of computing resources without having to own those resources. Applications and databases hosted in the cloud need to be resilient in order to adapt to changing workloads.

And the Big Data phenomenon has boosted the amount of data being created and stored. The amount and types of data that can be accessed and analyzed continue to grow by leaps and bounds. And when analytics is performed on data from mobile, social, and cloud computing, it becomes more accessible and useful to anyone, anywhere, at any time.

All of these trends have caused organizations to scale their database implementations and environments to accommodate data and workload growth. But how can databases be scaled?

Well, at a high level, there are two types of database scalability: vertical scaling and horizontal scaling.

Vertical scaling, also known as scaling up, is the process of adding resources, such as memory or more powerful CPUs, to an existing server. Removing memory or changing to a less powerful CPU is known as scaling down.

Adding or replacing resources to a system typically results in performance gains, but to realize such gains can require reconfiguration and downtime. Furthermore, there are limitations to the amount of additional resources that can be applied to a single system, as well as to the software that uses the system.

Vertical scaling has been the standard method of scaling for traditional RDBMSes that are architected around a single-server model. Nevertheless, every piece of hardware has limitations that, when reached, make further vertical scaling impossible. For example, if your system supports only 256 GB of memory, then when you need more memory you must migrate to a bigger box, which is a costly and risky procedure requiring database and application downtime.

Horizontal scaling, sometimes referred to as scaling out, is the process of adding more hardware to a system. This typically means adding nodes (new servers) to an existing system. Doing the opposite, that is, removing hardware, is known as scaling in.

With the cost of hardware declining, it can make sense to adopt horizontal scaling using low-cost commodity systems for tasks that previously required larger computers, such as mainframes. Of course, horizontal scaling can be limited by the capability of the software to exploit networked computer resources, as well as by other technology constraints. And keep in mind that traditional database servers typically cannot run across more than a few machines, so scaling is limited to several machines, not to 100x or more.
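To make scaling out a bit more concrete, here is a minimal sketch of hash-based sharding, one common way data is spread across nodes; the node names are placeholders, and a real system would also need to handle replication, resharding, and cross-node queries.

```python
# Minimal sketch of hash-based sharding, one typical way rows are spread
# across nodes when scaling out. Node names are placeholders; real systems
# also have to handle replication, resharding, and cross-node queries.
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # scale out by adding entries here

def shard_for(key: str) -> str:
    """Route a row to a node based on a stable hash of its shard key."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

for customer_id in ("cust-1001", "cust-1002", "cust-1003", "cust-1004"):
    print(customer_id, "->", shard_for(customer_id))

# Note: with naive modulo placement, adding a node remaps most keys -- one
# reason scaling out adds management complexity (consistent hashing helps).
```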

Horizontal and vertical scaling can be combined, with resources added to existing servers to scale vertically and additional servers added to scale horizontally when required. It is wise to consider the tradeoffs between horizontal and vertical scaling as you consider each approach.

Horizontal scaling results in more computers networked together, and that will increase management complexity. It can also introduce latency between nodes and complicate programming efforts if not properly managed by either the database system or the application. Vertical scaling can be less costly; it typically costs less to reconfigure existing hardware than to procure and configure new hardware. Of course, vertical scaling can lead to overprovisioning, which can be quite costly. At any rate, virtualization can perhaps help to alleviate the costs of scaling.

 

Posted in Big Data, DBMS, digital transformation, scalability

Craig Mullins Presenting at IDUG Db2 Tech Conference in Charlotte, NC

Those of you who are Db2 users know that the 2019 IDUG North American Db2 Tech Conference is being held in Charlotte, NC, the week of June 2-6, 2019. IDUG stands for the International Db2 User Group, and it has been meeting every year since 1988.

If you will be there, be sure to attend my presentations on Tuesday, June 4th. My conference session is bright and early Tuesday at 8:00 AM (Session E05), titled Coding Db2 for Performance: By the Book. This session is based on my latest book and is aimed at application developers. The general idea is to give an overview of the things that you can do as you design and code your Db2 programs (for z/OS or LUW) with performance in mind. All too often performance is an afterthought – and that can be quite expensive. Nail down the basics by attending this session!

Also on Tuesday, but later in the morning, I will deliver a vendor-sponsored presentation for Infotel, at 10:40 AM. This presentation, titled Improving Db2 Application Quality for Optimizing Performance and Controlling Costs, will be delivered in two parts. I will be presenting the first half of the VSP discussing the impact of DevOps on Db2 database management and development. The second half will be delivered by Carlos Almeida of Infotel, who will talk about how their SQL quality assurance solutions can aid the DevOps process for Db2 development.

I hope to see you there!

Posted in DB2, DBA, IDUG, speaking engagements

Craig Mullins Presenting at Data Summit 2019

The world of data management and database systems is very active right now. Data is at the center of everything that modern organizations do and the technology to manage and analyze it is changing rapidly. It can be difficult to keep up with it all.

If you need to get up to speed on everything going on in the world of data, you should plan on attending Data Summit 2019, May 21-22 in Boston, MA.


Craig Mullins will be talking about the prevailing database trends during his talk The New World of Database Technologies (Tuesday at Noon). Keep abreast of the latest trends and issues in the world of database systems, including how the role of the DBA is evolving to participate in those trends.

This presentation offers an overview of the rapidly changing world of data management and administration as organizations digitally transform. It examines how database management systems are changing and adapting to modern IT needs.

Issues covered during this presentation include cloud/DBaaS, analytics, NoSQL, IoT, DevOps and the database, and more. We’ll also examine what is happening with DBAs and their role within modern organizations. And all of the trends are backed up with references and links for your further learning and review.

I hope to see you there!

Posted in DBA

Craig Mullins to Deliver Database Auditing Webinar – May 15, 2019

Increasing governmental and industry regulation coupled with the need for improving the security of sensitive corporate data has driven up the need to track who is accessing data in corporate databases. Organizations must be ever-vigilant to monitor data usage and protect it from unauthorized access.

Each regulation places different demands on what types of data access must be monitored and audited. Ensuring compliance can be difficult, especially when you need to comply with multiple regulations. And you need to be able to capture all relevant data access attempts while still maintaining the service levels for the performance and availability of your applications.

Register for the next Idera Geek Sync webinar, Database Auditing Essentials: Tracking Who Did What to Which Data When, on Wednesday, May 15 at 11 am CT, delivered by yours truly.

As my regular readers know, database access auditing is a topic I have written and spoken about extensively over the years, so be sure to tune in to hear my latest thoughts on the topic.

You can learn more about the issues and requirements for auditing data access in relational databases. The goal of this presentation is to review the regulations impacting the need to audit at a high level, and then to discuss in detail the things that need to be audited, along with pros and cons of the various ways of accomplishing this.

Register here →

Posted in auditing, compliance, Database security, DBA, speaking engagements

Inside the Data Reading Room – 1Q2019

It has been a while since I have published a blog post in the Inside the Data Reading Room series, but that isn’t because I am not reading anymore! It is just that I have not been as active in writing reviews as I’d like to be. So here we go with some short reviews of data and analytics books I’ve been reading.

Let’s start with Paul Armstrong’s Disruptive Technologies: Understand, Evaluate, Respond. Armstrong is a technology strategist who has worked for and with many global companies and brands (including Coca-Cola, Experian, and Sony, among others). In this book he discusses strategies for businesses to work with new and emerging technologies.

Perhaps the strongest acclaim that I can give the book is that, after reading it, you will feel that it does its title justice. Armstrong defines what a disruptive technology is and how to embrace the change required when something is “disruptive.”

The book offers up a roadmap that can be used to assess, handle, and resolve issues as you identify upcoming technology changes and respond to them appropriately. It identifies a decision-making framework based on the dimensions of Technology, Behaviour and Data (TBD).

The book is clear and concise, as well as being easy to read. It is not encumbered with a lot of difficult jargon. Since technology is a major aspect of all businesses today (digital transformation) I think both technical and non-technical folks can benefit from the sound approach as outlined in this book.

Another interesting book you should take a look at if you are working with analytics and AI is Machine Learning: A Constraint-Based Approach by Marco Gori. This is a much weightier tome that requires attention and diligence to digest. But if you are working with analytics, AI, and/or machine learning in any way, the book is worth reading.

The book offers an introductory approach for all readers with an in-depth explanation of the fundamental concepts of machine learning. Concepts such as neural networks and kernel machines are explained in a unified manner.

The information is presented in a unified manner based on regarding symbolic knowledge bases as a collection of constraints. Special attention is reserved for deep learning, which nicely fits the constraint-based approach followed in this book.

The book is not for non-mathematicians or those only peripherally interested in the subject; over more than 500 pages, the author treats the material with considerable mathematical depth.

There is also a companion web site that provides additional material and assistance.

The last book I want to discuss today is Prashanth H. Southekal’s Data for Business Performance. There is more data at our disposal than ever before, and we continue to increase the rate at which we create and gather it. Shouldn’t we be using this data to improve our businesses? Well, this book provides guidance and techniques for deriving value from data in today’s business environment.

Southekal looks at deriving value for three key purposes of data: decision making, compliance, and customer service. The book is structured into three main sections:

  • Part 1 (Define) builds fundamental concepts by defining the key aspects of data as they pertain to digital transformation. This section delves into the different processes that transform data into a useful asset.
  • Part 2 (Analyze) covers the challenges that can cause organizations to fail as they attempt to deliver value from their data… and it offers practical, implementable solutions to those challenges.
  • Part 3 (Realize) provides practical strategies for transforming data into a corporate asset. This section also discusses frameworks, procedures, and guidelines that you can implement to achieve results.

The book is well-organized and suitable for any student, business person, or techie looking to make sense of how to use data to optimize their business.

If you’ve read any of these books, let me know what you think… and if you have other books that you’d like to see me review here, let me know. I’m always looking for more good books!

Posted in AI, book review, books, business planning, data, data governance, Machine Learning