Database Fundamentals

I would guess that most of the readers of this blog understand the basic concepts and fundamentals of database technology. However, many folks who think they understand the basics do not have the depth of knowledge they believe they have. Therefore, today’s post serves as a very brief introduction to the fundamentals of database management systems.

What Is a Database?

The answer to this question may surprise some readers. Oracle is not a database; neither are Db2, PostgreSQL, MongoDB, MySQL, or SQL Server. Each of these is a DBMS, or database management system. You can use Oracle or Db2 or SQL Server to create a database, but none of these themselves are databases. Many people, even skilled professionals, confuse the overall system – the DBMS – with what that system creates and manages – databases.

So, what is a database? A database is a structured set of persistent data. A phonebook is a database. However, within the world of IT, a database usually is associated with software. A simple database might be a single file containing many records, each of which contains the same set of fields where each field is a certain data type and length. In short, a database is an organized store of data where the data is accessible by named data elements.

A DBMS is a software package designed to create, store, and manage databases. The DBMS software enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving, and storing information in a database. DBMS products are usually responsible for data integrity, data access control, automated rollback, restart and recovery.

Thinking abstractly, you might think of a database as a file folder, and a DBMS as the file cabinet holding the labeled folders. You implement and access database instances using the capabilities of the DBMS.  Your payroll application uses the payroll database, which may be implemented using a DBMS such as Oracle Database 21c, Db2, MongoDB, or SQL Server.

Why is this distinction important? Using precise terms in the workplace avoids confusion. And the less confused we are the more we can avoid problems and issues that lead to over-budget projects, improperly developed systems, and lost productivity. Therefore, precision should be important to all of us.

Why Use a DBMS?

The main advantage of using a DBMS is to impose a logical, structured organization on the data. A DBMS delivers economy of scale for processing large amounts of data because it is optimized for such operations.

Historically, there are four DBMS data models: hierarchical, network, relational, and object-oriented.

A DBMS can be distinguished by the model of data upon which it is based. A data model is a collection of concepts used to describe data. A data model has two fundamental components: its structure, which is the way data is stored, and its operations, which are the ways that data can be manipulated. The major DBMS products utilize four different data models:

  1. Network (or CODASYL)
  2. Hierarchical
  3. Relational
  4. Object-oriented

The network data model is structured as a collection of record types and the relationships between these record types. All relationships are explicitly specified and stored as part of the structure of the DBMS. Another common name for the network model is CODASYL, after the Conference on Data Systems Languages, the committee that formulated the model in the early 1970s. Data is manipulated by locating a given record and following links to related records. IDMS is an example of a DBMS based on the network model.

The hierarchical data model arranges data into structural trees that store data at lower levels subordinate to data stored at higher levels. A hierarchical data model is based on the network model with the additional restriction that access into a record can only be accomplished in one way. IMS is an example of a DBMS based on the hierarchical model.

The relational data model consists of a collection of tables (more properly, relations) wherein the columns define the relationships between tables. The relational model is based on the mathematics of set theory. Contrary to popular belief, the relational model is not named after “relationships,” but after the relations of set theory. A relation is a set of tuples with no duplicates. Data can be manipulated in many ways, but the most common way is through SQL. Db2, Oracle, and SQL Server are examples of DBMS products based on the relational model.
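
To make this concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names are invented for illustration; the point is that rows are tuples, tables are relations, and related data is matched by shared column values rather than by physical pointers.

    import sqlite3

    # In-memory database; any relational DBMS would behave similarly.
    conn = sqlite3.connect(":memory:")

    # A relation (table) is defined by its columns; each row is a tuple.
    conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
    conn.execute("""CREATE TABLE emp (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id  INTEGER REFERENCES dept(dept_id))""")

    conn.execute("INSERT INTO dept VALUES (10, 'Accounting')")
    conn.execute("INSERT INTO emp VALUES (1, 'Smith', 10)")

    # Tables are related by shared column values, not physical pointers.
    for row in conn.execute("""SELECT e.emp_name, d.dept_name
                               FROM emp e JOIN dept d ON e.dept_id = d.dept_id"""):
        print(row)   # ('Smith', 'Accounting')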

The object-oriented (OO) data model consists of a collection of entities, or objects, where each object includes the actions that can take place on that object. In other words, an object encapsulates data and process. With OO systems, data is typically manipulated using an OO programming language. Progress Software’s ObjectStore and InterSystems’ Caché are examples of DBMS products based on the OO model.

Each of these four approaches is referred to as a data model for the sake of simplicity. In reality, only the relational and network models have a true, formal data model specification. Different models of data lead to different logical and structural data organizations. The relational model is the most popular data model because it is the most abstract and easiest to apply to data, while providing powerful data manipulation and access capabilities.

Other Types of DBMS

Although the four data models discussed heretofore are the predominant types of DBMS, there are other types of DBMS with varying degrees of commercial acceptance.

A column-oriented DBMS, sometimes called a column store, is a  DBMS that stores its content by column rather than by row. This has advantages for data warehouses where aggregates are computed over large numbers of data items. Of course, a column-oriented DBMS is not based on any formal data model and can be thought of as a special physical implementation of a relational DBMS. Sybase IQ and Greenplum are examples of column stores.
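
As a rough conceptual sketch (not how any particular column store implements its storage), the difference between row and column orientation can be shown in a few lines of Python with made-up order data:

    # Row-oriented storage: each record is kept together.
    rows = [
        {"order_id": 1, "customer": "A", "amount": 100},
        {"order_id": 2, "customer": "B", "amount": 250},
        {"order_id": 3, "customer": "A", "amount": 75},
    ]

    # Column-oriented storage: each column's values are kept together.
    columns = {
        "order_id": [1, 2, 3],
        "customer": ["A", "B", "A"],
        "amount":   [100, 250, 75],
    }

    # An aggregate over one column touches only that column in a column store...
    total = sum(columns["amount"])

    # ...whereas a row store must read every full record to compute the same answer.
    total_from_rows = sum(r["amount"] for r in rows)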

The NoSQL database system is another type of DBMS that has gained traction in the market, usually in Big Data applications. NoSQL DBMSes are characterized by their flexible schema and non-reliance on SQL, although many NoSQL offerings have added support for SQL due to its ubiquity. With key-value data stores, for example, a piece of data is associated with a key; the data is not rigidly structured and does not have to conform to a schema such as in a typical database design.

There are four types of NoSQL DBMS products:

  1. Document
  2. Key/Value
  3. Wide-column
  4. Graph

A document store manages and stores data at the document level. A document is essentially an object and is commonly stored as XML, JSON, BSON, etc. A document database is ideally suited for high performance, high availability, and easy scalability. You might consider using a document store for web storefront applications, real-time analytical processing, or to front a blog or content management system. They are not very well-suited for complex transaction processing as typified by traditional relational applications, though. MongoDB is the most popular document database, but others include Couchbase, RavenDB and MarkLogic.
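
To make the idea concrete, here is a hypothetical storefront order expressed as the kind of self-describing document a document store manages; the field names are invented for illustration:

    import json

    # One document holds the entire order, including nested line items,
    # rather than spreading it across several normalized tables.
    order = {
        "_id": "order-1001",
        "customer": {"name": "Pat Jones", "email": "pat@example.com"},
        "items": [
            {"sku": "WIDGET-1", "qty": 2, "price": 9.99},
            {"sku": "GADGET-7", "qty": 1, "price": 24.50},
        ],
        "status": "shipped",
    }

    # Documents in the same collection need not share identical fields.
    print(json.dumps(order, indent=2))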

The key/value database system is useful when all access to the database is done using a primary key. There typically is no fixed data model or schema. The key identifies an arbitrary “lump” of data. A key/value pair database is useful for shopping cart data or storing user profiles. It is not useful when there are complex relationships between data elements or when data needs to be queried by anything other than the primary key. Examples of key/value stores include Riak, Berkeley DB, and Aerospike.
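
Conceptually, a key/value store behaves much like a persistent dictionary. The sketch below uses a plain Python dict simply to show the access pattern; the keys and values are invented for illustration:

    # The store only understands "get by key" and "put by key";
    # each value is an opaque lump of data with no enforced schema.
    store = {}

    store["cart:session-42"] = {"items": ["WIDGET-1", "GADGET-7"], "total": 34.49}
    store["profile:user-7"] = {"name": "Pat", "theme": "dark"}

    # Fast, simple lookup by primary key...
    print(store["cart:session-42"])

    # ...but there is no way to ask "which carts contain GADGET-7?"
    # without scanning every value yourself.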

A wide-column store uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. Examples of wide-column stores include Apache Cassandra, Amazon DynamoDB, and DataStax Enterprise.

Finally, we have the graph database, which uses graph structures with nodes, edges, and properties to represent and store data. In a graph database every element contains a direct pointer to its adjacent element and no index lookups are necessary. Social networks, routing and dispatch systems, and location aware systems are the prime use cases for graph databases. Some examples include Neo4j, GraphBase, and Meronymy.
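
A tiny adjacency-based sketch in Python illustrates why graph traversal needs no index lookups; the nodes and edges are invented for illustration:

    # Each node carries direct references to its neighbors, so following
    # a relationship is a pointer hop rather than an index lookup.
    follows = {
        "alice": ["bob", "carol"],
        "bob":   ["carol"],
        "carol": ["dave"],
        "dave":  [],
    }

    def friends_of_friends(person):
        """People reachable in exactly two hops."""
        return {fof for friend in follows[person] for fof in follows[friend]}

    print(friends_of_friends("alice"))   # {'carol', 'dave'} (set order may vary)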

NoSQL database systems are popular with organizations that face data challenges that cannot be solved using traditional RDBMS solutions. Cassandra (wide-column) and MongoDB (document) are examples of popular NoSQL database systems.

There are other DBMS implementations as well, such as the inverted list structure of the original Adabas, and even the dBase format popularized by the PC DBMS products dBase II and dBase III.

Most commercial DBMS implementations today are relational.

Advantages of Using a DBMS

A DBMS provides a central store of data that can be accessed by multiple users, from multiple locations. Data can be shared among multiple applications, rather than having to be propagated and stored in new files for every new application. Central storage and management of data within the DBMS provides:

  • Data abstraction and independence
  • Data security
  • A locking mechanism for concurrent access
  • An efficient handler to balance the needs of multiple applications using the same data
  • The ability to swiftly recover from crashes and errors
  • Robust data integrity capabilities
  • Simple access using a standard API
  • Uniform administration procedures for data

Levels of Data Abstraction

A DBMS can provide many views of a single database schema. A view defines what data the user sees and how that user sees the data. The DBMS provides a level of abstraction between the conceptual schema that defines the logical structure of the database and the physical schema that describes the files, indexes, and other physical mechanisms used by the database. Users function at the conceptual level—by querying columns within rows of tables, for example—instead of having to navigate through the many different types of physical structures that store the data.
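
Here is a minimal sqlite3 sketch of that abstraction, with an invented table and view: users of the view query at the conceptual level and never touch (or even see) the underlying physical structures.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)""")
    conn.execute("INSERT INTO employee VALUES (1, 'Smith', 'IT', 95000)")
    conn.execute("INSERT INTO employee VALUES (2, 'Jones', 'HR', 80000)")

    # The view exposes only what this class of user should see; how and where
    # the rows are physically stored is invisible to them.
    conn.execute("CREATE VIEW emp_directory AS SELECT name, dept FROM employee")

    print(conn.execute("SELECT * FROM emp_directory").fetchall())
    # [('Smith', 'IT'), ('Jones', 'HR')]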

A DBMS makes it much easier to modify applications when business requirements change. New categories of data can be added to the database without disruption to the existing system.

Data Independence

A DBMS provides a layer of independence between the data and the applications that use the data. In other words, applications are insulated from how data is structured and stored. The DBMS provides two types of data independence:

  • Logical data independence—protection from changes to the logical structure of data
  • Physical data independence—protection from changes to the physical structure of data

As long as the program uses the API (application programming interface) to the database as provided by the DBMS, developers can avoid changing programs because of database changes.

Note: The primary API to relational databases is SQL. In general, most application SQL statements need not change when database structures change (e.g., a new column is added to a table).
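
A quick sqlite3 illustration of that note, with an invented table: a query that names its columns explicitly keeps working after the table gains a new column.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")

    query = "SELECT cust_id, name FROM customer"   # names its columns explicitly
    print(conn.execute(query).fetchall())          # [(1, 'Acme Corp')]

    # The database structure changes...
    conn.execute("ALTER TABLE customer ADD COLUMN region TEXT")

    # ...but the existing statement runs unchanged.
    print(conn.execute(query).fetchall())          # [(1, 'Acme Corp')]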

Data Security

Data security prevents unauthorized users from viewing or updating the database. The DBMS uses IDs and passwords to control which users are allowed access to which portions of the database. For example, consider an employee database containing all data about individual employees. Using the DBMS security mechanisms, payroll personnel can be authorized to view payroll data, whereas managers could be permitted to view only data related to project history.
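
In SQL-based systems this kind of authorization is typically expressed with GRANT statements. The sketch below assumes a DB-API connection to a DBMS that supports standard SQL authorization; the table and user names are invented for illustration.

    def grant_payroll_access(conn):
        """conn is assumed to be a DB-API connection to a DBMS with SQL GRANT."""
        cur = conn.cursor()
        # Payroll personnel may read and update payroll data...
        cur.execute("GRANT SELECT, UPDATE ON payroll TO payroll_clerk")
        # ...while managers may only read project-history data.
        cur.execute("GRANT SELECT ON project_history TO project_manager")
        conn.commit()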

Concurrency Control

A DBMS can serve data to multiple, concurrently executing user programs. This requires a locking mechanism to deliver concurrency control because the actions of different programs running at the same time could conceivably cause data inconsistency. For example, without such controls, multiple bank ATM users might each be able to withdraw $100 from a checking account containing only $150. A DBMS ensures that such problems are avoided because the locking mechanism isolates transactions competing for the same data.
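
As a sketch of how a locking mechanism prevents the ATM anomaly, the following Python function uses sqlite3 and BEGIN IMMEDIATE to serialize competing withdrawals. The account table and amounts are invented, and a real DBMS would manage the locking granularity for you.

    import sqlite3

    def withdraw(db_path, account_id, amount):
        # isolation_level=None gives explicit transaction control.
        conn = sqlite3.connect(db_path, isolation_level=None)
        try:
            # BEGIN IMMEDIATE acquires a write lock up front, so two concurrent
            # withdrawals cannot both read the same "before" balance.
            conn.execute("BEGIN IMMEDIATE")
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE account_id = ?",
                (account_id,)).fetchone()
            if balance < amount:
                conn.execute("ROLLBACK")
                return False                     # insufficient funds
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, account_id))
            conn.execute("COMMIT")
            return True
        finally:
            conn.close()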

Database Logging

The DBMS uses database logging to record “before” and “after” images of database objects as they are modified. It is important to note that the database log captures information about every data modification (except in circumstances as determined by the DBA). The information in the database logs can be used to undo and redo transactions. Database logging is handled transparently by the DBMS—that is, it is done automatically.

Ensuring Atomicity and Durability

A DBMS can be used to assure the all-or-nothing quality of transactions. This is referred to as atomicity, and it means that data integrity is maintained even if the system crashes in the middle of a transaction. Furthermore, a DBMS provides recoverability. After a system failure, data can be recovered to a state that existed either immediately before the crash or at some other requisite point in time.
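
A minimal sketch of atomicity, again using sqlite3 with an invented account table: either both updates commit, or neither does.

    import sqlite3

    def transfer(conn, from_acct, to_acct, amount):
        # conn is an ordinary sqlite3 connection with default settings.
        # The context manager commits if the block succeeds and rolls back
        # if anything inside it raises, so the transfer is all or nothing.
        with conn:
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE account_id = ?", (amount, from_acct))
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE account_id = ?", (amount, to_acct))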

Data Integrity

The DBMS provides mechanisms for defining rules that govern the type of data that can be stored in specific fields or columns. Only data that conforms to the business rules will ever be stored in the database. Furthermore, the DBMS can be set up to manage relationships between different types of data and to ensure that changes to related data elements are accurately implemented.
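
In a relational DBMS these rules are typically declared as constraints. A short sqlite3 sketch, with an invented schema, shows the DBMS rejecting data that violates a business rule or a relationship rule:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces FKs when asked

    conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")

    # CHECK enforces a business rule on a column; REFERENCES enforces the
    # relationship between emp and dept.
    conn.execute("""CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        salary  REAL CHECK (salary > 0),
        dept_id INTEGER NOT NULL REFERENCES dept(dept_id))""")

    conn.execute("INSERT INTO dept VALUES (10, 'Accounting')")

    try:
        conn.execute("INSERT INTO emp VALUES (1, -500, 10)")    # violates the CHECK rule
    except sqlite3.IntegrityError as err:
        print("Rejected:", err)

    try:
        conn.execute("INSERT INTO emp VALUES (2, 50000, 99)")   # no such department
    except sqlite3.IntegrityError as err:
        print("Rejected:", err)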

Data Access

A DBMS provides a standard query language to enable users to interactively interrogate the database and analyze its data. For relational databases, this standard API is SQL, or Structured Query Language. However, SQL is not a requirement for a DBMS to be relational. Furthermore, many DBMS products ship with analytical tools and report writers to further simplify data access.

Summary

This section on DBMS fundamentals is necessarily brief because the focus of this book is on database administration and most readers will find this material to be familiar. If you require additional details on the basic operations and qualities of DBMSs and databases, please refer to the Bibliography for an extensive list of DBMS-related books.

A primary benefit of a DBMS is its ability to maintain and query large amounts of data while assuring data integrity and consistency. It offers transparent recovery from failures, concurrent access, and data independence. In fact, most modern computer applications rely on DBMS and database technology to manage data. Understanding the topic is of benefit to all IT professionals.

Posted in DBMS, education

A Cold Spell in Texas Makes Me Think About Contingency Planning and the DBMS

Whenever there is an event in the news like the winter storm that wreaked havoc on Texas last week, it makes me think about contingency planning and things like database disaster recovery. The paltry amount of snow we got in Texas probably makes folks “Up North” chuckle, but it was a real problem because Texas almost never gets snow nor does it get as cold for as many days in a row as it did last week. So my first thought is that what qualifies as a “disaster” will differ based on your location and circumstances.

Anyway, when a “disaster” like this hits it is a good time to review your disaster contingency plans… well, before the disaster would have been better, but being human, the wake of a disaster always causes awareness to rise, so let’s discuss disaster recovery in terms of databases.

A disaster recovery plan is like insurance — you’re glad you have it, but you hope you don’t need it. With insurance, you pay a regular fee so that you are covered if you have an accident. A disaster recovery plan is similar because you pay to implement your disaster recovery plan by designating a disaster recovery site, shipping backup copies of the data off-site, preparing recovery jobs, and practicing the recovery procedures.

Database disaster recovery must be an integral component of your overall business recovery plan. A disaster recovery plan must be global in scope. It must handle business issues such as alternate locations for conducting business, communication methods to inform employees of new locations and procedures, and publicity measures to inform customers how to transact business with the company post-disaster. A component of that plan must be the overall plan for resuming data processing and IT operations. And finally, a component of that plan is the resumption of DBMS operations.

In order for your database disaster recovery plan to be effective, you will need to develop and adhere to a written plan. This plan must document all of the routine precautionary measures required to assure the recoverability of your critical data in the event a disaster occurs. Image copy backups or disk backups need to be made as directed and sent to the remote site as quickly as possible. Reports need to be printed and sent off-site. Missing any little detail can render a disaster recovery plan ineffective.

When practicing the disaster recovery plan, make sure that each team member follows the written instructions precisely. Of course, it is quite likely that things will come up during the practice sessions that were missed or undocumented in the plan. Be sure to capture all of these events and update the written plan after the disaster recovery test. Keep in mind that during an actual disaster you may need to rely on less experienced people, or perhaps consultants and others who are not regular employees. The more foolproof the written plan, the better the chance of a successful disaster recovery.

Your disaster recovery procedures will be determined in large part by the method you use to back up your data. If you rely on full-pack (volume) backups, then your recovery will proceed one disk volume at a time. If you create database image copies, you will probably use the DBMS’s recover utility or a third-party recovery tool. Of course, you might combine several different techniques for off-site backups depending on the sensitivity and criticality of the data.

The following tips can be helpful as you develop or review your database contingency plans:

Order of Recovery

Make sure the operating system and DBMS are installed at the correct version and maintenance level before proceeding with any database object recovery at the disaster site. Be sure to follow the recovery steps rigorously as documented in the written plan.

Data Latency

How old is the data? If you take nightly backup tapes to another location, your data could be up to 24 hours old. Sometimes having data that old is unacceptable, but sending backup media to off-site storage more than once a day is too expensive. One solution is to get the data to another location digitally—via log shipping or replication, for example. Database logs at the time of the disaster may not be available to apply at the off-site recovery location. Some data may not be fully recoverable and there is really no way around this. The quicker backup copies of database objects and database logs are sent off-site, the better the disaster recovery will be in terms of data currency.

Remember Other Vital Data

Creating offsite backups for database objects may not be sufficient to ensure a complete disaster recovery plan for each application. Be sure to back up related data and send it offsite as well. Additional data and files to consider backing up for the remote site include DDL libraries for database objects, recovery and test scripts, application program source and executable files, stored procedure program source and executable files, user-defined function source and executable files, libraries and passwords for critical third party DBA tools, and other related data files used by the application.

Beware of Compression

If your site uses tape-compression software, be sure that the remote recovery site uses the same tape-compression software. If it does not, the image copy backups will not be readable at the remote site. Turn off compression at the primary site for the disaster recovery image copy backups if the remote site cannot read compressed tape files.

Post-Recovery Image Copies

Part of the disaster recovery process should be to create an image copy backup for each database object after it has been recovered at the remote site. Doing so enables easier recoverability of the data should an error occur after processing begins at the remote site. Without the new image copy backups, the disaster recovery procedure would have to be performed again if an error occurs after remote site processing begins.

Disaster Prevention

DBAs and IT professionals in general create procedures and enforce policies. Many of these procedures and policies, such as a disaster recovery plan, are geared toward dealing with errors once they occur. Having such procedures and policies is wise. But it is just as wise to establish procedures and policies to prevent problems in the first place. Although you cannot implement procedures to stop an earthquake or flood, you can implement policies to help avoid man-made disasters. For example, enforce frequent password changes to mitigate data loss due to malicious hackers.

Another good idea is to document and disseminate procedures that teach end users how to deal with error messages. For example, you cannot expect every user to understand the impact of responding to every error message. Guidelines can help avoid errors – and man-made disasters.

Summary

Only with comprehensive up-front planning, regular testing, and diligent maintenance will a disaster recovery plan be useful. Be sure you have one for your site… or your disaster recovery plan might become a two-step process in the event of a disaster:

1) Update resume

2) Go job-hunting!

Posted in backup & recovery, contingency planning

DBA Automation

Just a quick blog post today to point you at a couple of other resources that I have created on the topic of automating database administration.

First I’d like to point you to a webinar that I delivered earlier this month (January 2021) titled Automating Database Administration Is No Longer Optional.

You can view the presentation at the link above, in which I discuss data management trends (data growth, DevOps, heterogeneity, complexity, staffing), and how they are driving the need for more automation of DBA tasks and processes. These clear industry trends equate to a demanding, complex environment for managing and administering database systems. Yet DBAs are still required to monitor, optimize, tune, backup, recover, change structures, and move data both on premises and to the cloud. To be successful requires intelligent automation of DBA tasks and procedures.

I also recently published a blog post for IDERA under the same title discussing these issues for those who are not interested in watching the entire presentation.

At any rate, I hope you’ll take a look at one (or both) of these and then weigh in with your comments on automating DBA tasks. I want to hear how you are automating, whether you need more (or less?) DBA automation, how it impacts your job requirements, and anything else you’d like to add. I’d really like to get a conversation going about the need for automation, and perhaps more importantly, the progress you and your organization have made toward DBA automation.

Posted in automation, DBA

Seasons Greetings 2020

Hello everybody out there in data-land! As always, a big “Thank you” for being a reader of my blog. I hope that you will continue reading in 2021 and that the upcoming year is better for all of us than this year…

And as per my custom, here’s a short post to end the year wishing everybody “out there” a very happy holiday season!

No matter which holidays you celebrate, I hope they bring you joy, contentment, and let you recharge for an even better year next year…

And after the debacle that was 2020, how could 2021 be anything but an improvement!

Happy, Merry, Joyous holidays… see you next year!

Posted in DBA

What is the Autonomous Digital Enterprise?

Modern organizations are transforming their businesses by adopting and integrating digital technologies and services into the fabric of their operations. This is generally referred to as “digital transformation.” And it is an imperative for success as more business is conducted online, over the web, and using personal devices such as phones and tablets. Businesses have to engage with customers using the same technology and interfaces that their customers use all the time or they risk losing revenue.

But digital transformation is not, in and of itself, sufficient to ensure success. BMC Software has identified what they call the autonomous digital enterprise as the next step on the journey. This means embracing and instilling intelligent automation capabilities throughout the organization.

In an autonomous digital enterprise, automation is a complementary business function that works with – not in place of – humans. By exploiting automation organizations can:

  • Execute with fewer errors
  • Free up employees from mundane tasks
  • Lower costs
  • Improve customer interaction

Note here the term “intelligent” being used with the term “automation.” IT professionals have been automating for a long time. Indeed, everything that IT folks do can be considered some form of automation when compared to manual tasks. But intelligent automation takes things further. It relies on data to drive decision-making, reducing the amount of time it takes to react, thereby reducing latency and saving time.

With artificial intelligence and machine learning capabilities being coupled with automation, the accuracy and ability of automation to intuit what needs to be done improves, as does the agility to effectively implement improvements and corrective measures.

Intelligence enables organizations to transcend impediments to success. For example, in an autonomous digital enterprise, DevOps practices are integrated throughout the enterprise enabling rapid and continuous delivery of applications and services. This requires technology and intelligent automation, but also a shift in organizational mindset to embrace change and instill it everywhere in the company.

An autonomous digital enterprise will have automation everywhere… intelligent automation that improves the customer experiences, speeds up decision-making and implementation, and interacts with customers the way they expect.

This vision of the autonomous digital enterprise is both audacious and compelling – and it is well worth examining for your organization.

Posted in AI, automation, DevOps, digital transformation, enterprise computing

Data Summit Fall 2020 Presentation Now Available for Replay: Modern Database Administration

I was honored to deliver a presentation at this year’s Data Summit conference on the changing world of the DBA. I spoke for about a half hour on Thursday, October 22nd on DBA and database systems trends and issues.

The conference sessions were conducted live, but were recorded as they were delivered. And now my session can be viewed here!

I hope you take a moment to watch the presentation and consider the issues I bring up.

And, if you are interested in useful tools that help with the trends I discuss, stay around after my presentation (which does not talk about any particular vendor tools) to hear a strategist from Quest give their perspective on the issues and their DBA tools that can help.

Finally, I hope you will comment here on the blog if you have any questions, comments, or additional issues you’d like to discuss!

Posted in AI, analytics, Big Data, data, data breach, Data Growth, DBA, DBMS, DevOps, IoT, Machine Learning, review, speaking engagements, tools, trends

Craig Mullins Presenting at Data Summit Fall 2020

Keeping abreast of what is happening in the world of data management and database systems is important for all IT professionals these days. Data is at the center of everything that modern organizations do and the amount of data we have to store, manage, and access is growing at an ever-increasing pace. It can be difficult to keep up with it all.

If you find yourself in need of being up to speed on everything going on in the world of data, you should plan on attending the Data Summit Fall 2020 virtual event. Held in person in years past, this year the event is offered as a free online webinar series running from October 20 thru 22, 2020.


And this year I will be speaking again at the event, but hopefully more of you will be able to attend than in years past, since there is no travel involved! My presentation will be on the changing world of the DBA (Thursday, October 22nd, at Noon Eastern time). I’ll discuss how the DBA’s job is impacted by big data and data growth, NoSQL, DevOps, the cloud, and more.

I hope to see you there!

Posted in DBA

COBOL V4.2 to COBOL V6 Migration – The Cost of Procrastination

Today’s post is a guest article written by Dale Vecchio, IT modernization expert and former Gartner analyst.

While no one can argue that the COBOL language has had tremendous staying power over the last 50-60 years, its biggest attribute these days is best summed up in the expression “leave well enough alone”! Yeah, COBOL is here and the applications still work. But the costs of staying wedded to this 3GL procedural language are increasing. As COBOL 4.2 reaches end-of-support, the conversion to COBOL V6 is a non-trivial exercise. Even IBM admitted as much in a 2018 presentation, “Migrating to Enterprise COBOL v6”. Organizations have been upgrading their COBOL versions for a number of decades, but this jump seems particularly onerous. For example, IBM reports that a customer perceived migrating from COBOL V3 to V4 had a difficulty level of “3”, while upgrading from V4 to V6 had a difficulty level of “20”!! Of course, there are improvements in COBOL V6, but they come at a price. Any mainframe organization that is not at current hardware/software levels may find they need to upgrade just to be able to support this version of COBOL. COBOL v6, by IBM’s own admission, will require 20x more memory at compile time and will take 5x to 12x longer to compile!

But probably most problematic is that 25% of customers migrating to COBOL v6 ran into migration difficulties due to “invalid data”. One of the many challenges of mainframe modernization is that organizations either “cheated” or simply got away with “unsupported features” in COBOL. Earlier versions of COBOL programs may have accepted data formats that are no longer “valid” in v6. These problems are the most difficult to find, since the program MAY appear to work but generate wrong results. The best that could happen is that the program will fail, and then your limited development staff can “go fishing” in a 30-40 year old COBOL program trying to figure out what the heck the problem is!! IBM’s view on this seems to be, “well, you created the problem so you fix it!” The amount of effort necessary to migrate to v6 is greatly exacerbated by this data problem, since it is likely to dramatically increase the testing needed.

Consequently, the entire argument that it’s “safer” to stay on COBOL and just upgrade is a specious one. Perhaps the most common modernization strategy of the last 20 years, procrastination, is no longer a viable choice! Prolonging the usage of a procedural 3GL language against the backdrop of a declining skills pool is increasingly risky. I can assure you that many organizations I have spoken to around the world over the last 20 years have the DESIRE to modernize, on or off the mainframe, but the risks and costs have been seen as simply too high. These migration risks are quickly becoming balanced by the risks of NOT modernizing. The modern IT world is increasingly one of Linux, cloud, open source, and Java. The innovation is in these areas. The skills are in these areas. No one is saying anything bad about the mainframe here – only that there are acceptable options for running enterprise workloads that do NOT require the legacy OS or transactional environments of the past.

While Java is not the only path to a modern IT application environment, it is certainly one of the most common. So the trick is to figure out how to move in that direction, while mitigating the risks. If you are going to have to invest in some of your COBOL applications, why not evolve to a modern Linux world? There are plenty of issues to deal with when modernizing applications, so reducing the risks in some areas is a good idea. Easing your applications into a modern devops environment that is plentiful with skilled developers is a worthwhile investment. You don’t have to modernize every COBOL application any more than you need to upgrade everyone to v6! Modernization is a journey, but you’ll never reach your destination if you don’t take the first step. Code transformation solutions that give you decent, performant Java programs that can be managed by a devops tool chain and enhanced by Java developers are a worthwhile consideration. Code transformation solutions that are syntactic, line-by-line transformations are NOT the answer – ones that refactor COBOL into Java classes and methods are! Let’s be realistic – some of your COBOL applications have very few enhancements made annually. If you can get them transformed into Java, and they can then take advantage of the cost benefits of these runtime environments, whether on the mainframe (specialty engines) or off, your modernization journey is off to a good start.

To listen to a webinar discussing this topic, go to https://youtu.be/2b8XrOovHn4

Posted in DBA

IBM POWER and SAP HANA: A Powerful and Effective Combination

As organizations look for differentiators to improve efficiency and cost effectiveness, the combination of IBM Power Systems and SAP HANA can provide a potent platform with a significant return on investment.

Why IBM Power Systems?

IBM Power Systems is a family of server computers that are based on IBM’s POWER processors. The POWER processor is actually a series of high-performance microprocessors from IBM, all called POWER followed by a number designating its generation. For example, POWER1, POWER2, POWER3 and so forth up to the latest POWER10, which was announced in mid-August 2020 and is scheduled for availability in 2021.

What makes IBM Power Systems different than typical x86 architecture servers is the RISC, or Reduced Instruction Set Computer, architecture based on IBM research that began in the 1970s. POWER microprocessors were designed specifically for servers and their intrinsic processing requirements.

In contrast, x86 CPUs were initially built for and catered to the personal computer market. They are designed as general-purpose processors that can be used for a variety of workloads, even for home PCs. As the processing power of the x86 microprocessors advanced over time, they were adapted for usage in servers.

So, looking at the two alternatives today, both x86 and IBM Power Systems seem to be competitive architectures for servers running enterprise workloads. However, POWER microprocessors were designed to service high-performance enterprise workloads, such as database management systems, transaction processing, and ERP systems. Although x86 microprocessors can be used for those types of workloads too, they are typically not as efficient because of their general-purpose design, as opposed to the POWER processor’s specific design for enterprise computing.

IBM Power Systems deliver simultaneous multithreading (SMT), a technique for improving the overall efficiency of CPUs by permitting multiple independent threads to execute and utilize the resources of the processor architecture. With IBM Power Systems SMT8, each processor core can run eight threads in parallel, which is about 4 times higher than its competitors. Simultaneous multithreading helps to mask memory latency and increase the efficiency and throughput of computations.

Virtualization is another differentiator for POWER, because these systems were built to support virtualization from the get-go. POWER features a built-in hypervisor that operates very efficiently. On the other hand, x86 was not originally designed for virtualization, which means you need to use a third-party hypervisor (e.g., VMware).

Scalability is another issue where IBM POWER excels versus x86. Although you can scale both, x86 scaling typically requires adding more servers. With POWER, the chips themselves are designed to scale seamlessly without having to add hardware (although you can if you so desire).

The bottom line is that the POWER architecture provides benefits for modern workloads, such as for big data analytics and Artificial intelligence (AI). Which brings us to SAP HANA.

Why SAP HANA?

SAP HANA is an in-memory database management system that delivers high-speed data access. It can offer efficient, high performance data access due to its usage of memory and its storage of data in column-based tables as opposed to the row-based tables of a traditional SQL DBMS. Such a columnar structure can often deliver faster performance when queries only need to access certain sets of columns.

SAP HANA provides native capabilities for machine learning, spatial processing, graph, streaming analytics, time series, text analytics/search, and cognitive services all within the same platform. As such, it is ideal for implementing modern next-generation Big Data, IoT, translytical, and advanced analytics applications. 

SAP S/4HANA is the latest iteration of the SAP ERP system that uses SAP HANA as the DBMS, instead of requiring a third-party DBMS (such as Oracle or Db2). It is a completely revamped version of their ERP system.

Organizations implement SAP HANA as both a standalone, highly-efficient database system, and also as part of the SAP S/4HANA ERP environment. And for both of these HANA applications, IBM Power Systems is the ideal hardware for ensuring optimal performance, flexibility, and versatility.

Why IBM POWER + SAP HANA?

IBM Power Systems are particularly good at powering large computing workloads. Their ability to take advantage of large amounts of system memory and to be logically partitioned make them ideal for implementing SAP HANA.

If you need something that can take advantage of 64 TB of memory on board and can host up to 16 production SAP HANA LPARs, the high-end POWER E980 is a good choice. Earlier this year (2020), SAP announced support of Virtual Persistent Memory on IBM Power Systems for SAP HANA workloads. What this means is that using the PowerVM hypervisor located on the firmware it is possible to support up to 24TB for each LPAR. Virtual Persistent Memory is available only on IBM Power Systems for SAP HANA.

There are many benefits that can accrue after adopting Virtual Persistent Memory on IBM Power Systems and SAP HANA. For example, it provides faster restart and faster shutdown processing, which expands the outage window for change control, thereby potentially enabling more work to be done during the outage. Alternatively, the duration of the change control window may be able to be shrunk, thereby reducing the outage to make changes.

And let’s not forget to mention the SMT8 capability of IBM Power Systems, which will improve cache per core, thereby improving SAP HANA performance on a Power machine as compared with other machines.

Of course, there are also midrange IBM Power Systems such as the E950 that can be used if your requirements are not at the high-end.

Cost

Of course, a server can be powerful and efficient, but if it is not also cost-effective it will be difficult for organizations to adopt it. Forrester Research conducted a three-year financial impact study and concluded that IBM Power Systems for SAP HANA delivers a cost-effective solution.

The study involved multiple customer interviews and data aggregation, from which Forrester determined the following benefits of running SAP HANA on IBM Power Systems as opposed to other platforms:

  • Avoided cost of system downtime (36%) – the composite organization avoided 4 hours of planned and unplanned downtime per month
  • Reduced cost of power and cooling (4%) – the composite organization saved nearly 438,000 KwH of power per year
  • Avoided cost of alternate server architecture (49%) – other architectures required as many as 20 systems, as compared to an architecture with only 3 IBM Power Systems servers
  • Reduced cost of managing and maintaining infrastructure (11%) – System administrators saved 60% of their productivity due to a reduced management and maintenance burden

The net/net shows a 137% return on investment (ROI) with a 7 month payback.

It is also important to note that IBM offers subscription-based licensing for Power Systems where you pay only for what you use. With this flexible capacity on demand option your organization can stop overpaying for resources you do not use. A metering system is used to determine your usage and you will be billed accordingly, instead of paying for the entire capacity up-front.

Use Cases

There are many examples of customers deploying IBM Power Systems to achieve substantial benefits and returns on their investment.

One example of using IBM Power Systems to reduce footprint and simplify infrastructure is Würth Group. Located in Germany, Würth Group is a worldwide wholesaler of fasteners and tools with approximately 120,000 different products. The company deployed IBM Power Systems and was able to slim down the number of physical servers and SAP HANA instances from seven to one, an 86% reduction that cut power consumption and operating costs.

Danish Defence has implemented SAP HANA on IBM Power Systems to support military administration and operations with rapid, reliable reporting. As a result, they achieved up to 50% faster system response times, enabling employees to work more productively. Additionally, processes completed 4 hours ahead of schedule, meaning that reports are always available at the start of each day. And at the same time, they achieved a 60% reduction in storage footprint, thereby reducing power requirements and cooling costs.

And perhaps most-telling, SAP themselves have replaced their existing SAP HANA HEC platform with the IBM POWER9. According to Christoph Herman, SVP and Head of SAP HANA Enterprise Cloud, “SAP HANA Enterprise Cloud on IBM Power Systems will help clients unlock the full value of SAP HANA in the cloud, with the possibility of enhancing the scalability and availability of mission-critical SAP applications while moving workloads to SAP HANA and lowering TCO.”

Summary

Whether implementing on-premises, in the cloud, or as part of a hybrid multicloud environment, the combination of IBM Power Systems and SAP HANA can deliver a high-performance, cost-effective environment for your ERP and other workloads.

Posted in analytics, Big Data, business planning, data, DBMS, ERP, IBM, In-Memory, optimization, performance, SAP HANA

Inside the Data Reading Room – Fall 2020 Edition

Welcome to yet another edition of Inside the Data Reading Room, a regular feature of my blog where I take a look at recent data management books. In today’s post we’ll examine three new books on various data-related topics, beginning with data agglomeration.

You may not have heard of data agglomeration but you’ll get the idea immediately – at least at a high level – when I describe it as gathering data in wireless sensor networks. For more details, you’ll want to read A Beginner’s Guide to Data Agglomeration and Intelligent Sensing by Amartya Mukherjee, Ayan Kumar Panja and Nilanjan Day (Academic Press, 2020, ISBN 978-0-12-620341-5). The authors are all professors who specialize in networking, IoT, and data issues.

The book provides a concise treatment of the topic, starting out with an overview of the various types of sensors and transducers and how they are used. I always find it easier to learn by example, and this book is nice because the authors provide a variety of good examples.

Reading this book will provide you with descriptions and explanations of pertinent concepts like wireless sensor networks, cloud platforms, device-to-cloud and sensor cloud architectures, but more importantly, it also describes how to gather and aggregate data from wireless sensor networks.

If you or your organization are involved in gathering data from sensors, such as in IoT systems, this book will be a great help to you as you design and implement your applications.

Next up from the shelves of the Data Reading Room we have Rohit Bhargava’s Non Obvious Mega Trends (IdeaPress, 2020, ISBN 978-1-64687-002-8).  

For those who do not know about this book series, every year since 2011 Rohit Bhargava has been publishing what he calls The Non Obvious Trend Report. He began writing these reports in response to the parade of annual articles talking about “the next big trends in the upcoming year,” which he found either to be too obvious (e.g. mobile phones still useful) or too self-serving (e.g. drone company CEO predicts this is the year of the drone) to be useful. In response, he created the Non Obvious Trend Report with the goal of being unbiased and digging deeper for nuances and trends missed elsewhere.

To a large extent, he succeeded. So much so that this book represents the 10th in the series. But what makes this particular book a must-have is that not only does it introduce 10 new trends, but it also documents and reviews all of the trends over the past decade.

For readers of this blog, Chapter 11, Data Abundance, will likely be the most useful chapter (although the entire book is great for research). In Chapter 11 he describes what data abundance is, how understanding it can be used to your advantage, as well as the various trends that have led to the evolution of data abundance.

I look forward to each new, annual edition of Non Obvious, but I think this year’s edition stands out as one that you will want to have on your bookshelf long-term.

The final book for today is Systems Simulation and Modeling for Cloud Computing and Big Data Applications edited by J. Dinesh Peter and Steven L. Fernandes (Academic Press, 2020, ISBN 978-0-12-620341-5).

Models and simulations are an important foundation for many aspects of IT, including AI and machine learning. As such, knowledge of them will be beneficial for data professionals and this book provides an education in using System Simulation and Modeling (SSM) for tasks such as performance testing and benchmarking.

The book analyzes the performance of several big data and cloud frameworks, including benchmarks such as BigDataBench, BigBench, HiBench, PigMix, CloudSuite and GridMix.

If you are dealing with big data and looking for ways to improve your testing and benchmarking through simulation and modeling, this book can be of help.

Posted in Big Data, book review, books, data, simulation, trends