Inside the Data Reading Room – Summer 2021 Edition

As regular readers of my blog know, I periodically post about some of the computer- and data-related books that I’ve been reading in a series I like to call “Inside the Data Reading Room.”

In today’s post we’ll examine three new books on the following topics: Python, data stewardship and deep learning. Let’s dig right in and begin…

The first book is Black Hat Python, 2nd Edition: Python Programming for Hackers and Pentesters by Justin Seitz and Tim Arnold (No Starch Press, 2021, ISBN 978-1-7185-0112-6). The authors are both long-time Python developers.

Although I did not read the first edition of this book, I am impressed with the wide range of guidance and sample code for using Python to create useful tools for security professionals. And make no mistake, this is not a how-to book for beginners looking to learn Python; it is designed for security analysts who use Python. The book is less than 200 pages long, but it contains a wealth of useful security techniques on topics including exfiltration, web hacking, forensics, Windows privilege escalation, and more.

The code and examples in this edition of the book have been updated to Python 3.x, so they can be relied upon to be current. If you work as a computer security professional and want to code in Python, this is definitely a book that belongs on your bookshelf.

Moving along, the second book I’ll talk about today is also a second edition, Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance by David Plotkin (Academic Press, 2021, ISBN: 978-0-12-822132-7). If you work in the field of data management, chances are you know of David Plotkin. He has worked in the field for decades and has written extensively on data, metadata, and related issues.

With the second edition of his book, Plotkin has written the quintessential book on data stewardship. He defines data stewardship and explains how it fits together with data governance, delivering best practices, policies, and procedures that make sense. The book offers up practical advice on implementing data stewardship and defines the roles and responsibilities of the data steward. He also devotes a chapter to guidance on training for data stewards, and another to the metrics that can be used to measure the performance of data stewards.

And if your organization is new to, or just starting to implement, data governance and stewardship, then you absolutely must read Chapter 4, “Implementing Data Stewardship.” It will step you through the process of championing, communicating, and gaining support for data stewardship.

If your organization is serious about data governance and data stewardship, this book belongs in your company library.

Finally, let’s turn our attention to the third and final book I’ll be talking about today, Deep Learning for Data Analytics: Foundations, Biomedical Applications, and Challenges edited by Himansu Das, Chittaranjan Pradhan, and Nilanjan Dey (Academic Press, 2020, ISBN: 978-0-12-819764-6). The editors are all computer science professors at different colleges and universities with a background in analytics and machine learning.

This book is not for the novice. It focuses on advanced models, architectures, and algorithms in deep learning, which is a branch of AI and machine learning. It is probably most useful for university students studying AI, machine learning, and deep learning, as opposed to a book for the masses.

However, if you are interested in learning about applications of deep learning across a variety of subject areas, this book offers a focused study on the design and implementation of deep learning concepts using data analytics techniques in large-scale environments.

That’s all for now… and I wish you countless joyful hours reading about data and related technology until the next time we go inside the data reading room!

Posted in AI, analytics, book review, books, data governance, Machine Learning

World Backup Day – March 31, 2021

Today, March 31, 2021, is World Backup Day. With that in mind, let me use today’s blog post to remind you of the importance of backing up your databases… and testing your backups!

Of all the DBA’s roles and responsibilities, ensuring the recoverability of your organization’s data is perhaps the most important. This means that you should understand the recovery requirements of all the databases you manage. These requirements can only be established by communicating with the end users of the data stored in each database.

As you work with the consumers of the data to map out your backup plans, you will need to balance two competing demands: the need to take image copy backups frequently enough to assure reasonable recovery time, and the need to not interrupt daily business. The DBA must be capable of balancing these two objectives based on usage criteria and the capabilities of the DBMS.

Of course, some data may not need to be backed up at all… but this is rare. Data of this type is recreated or loaded into the database periodically. For example, if the data gets refreshed from another source, you may not really need to worry about backing it up. But you should always verify such things before moving ahead without a backup plan!

To plan and establish a viable backup strategy and schedule, you must analyze your databases and data to determine their nature and value to the business. To do so, answer the following questions for each database object.

  • How much daily activity occurs against the data?
  • How often does the data change?
  • How critical is the data to the business?
  • Can the data be recreated easily?
  • What kind of access do the users need? Is 24/7 access required?
  • What is the cost of not having the data available during a recovery? What is the dollar value associated with each minute of downtime?

You can use the answers to these questions to work out a reasonable backup strategy and implement it.
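To make this concrete, here is a deliberately simplistic Python sketch of how the answers to those questions might be turned into a backup frequency per database object. The objects, thresholds, and resulting schedules are all hypothetical; your own criteria and DBMS capabilities will dictate different rules:

```python
# Hypothetical sketch: score database objects to pick a backup frequency.
# Object names, weights, and thresholds below are illustrative, not prescriptive.

OBJECTS = [
    # name, daily_changes, criticality (1=low..5=high), easily_recreated
    ("payroll.employee",   5_000,     5, False),
    ("staging.daily_feed", 1_000_000, 2, True),   # refreshed from another source
    ("archive.history",    0,         3, False),
]

def backup_frequency(daily_changes, criticality, easily_recreated):
    if easily_recreated:
        return "none (but verify the refresh source first!)"
    if criticality >= 4 or daily_changes > 100_000:
        return "daily full copy + frequent log backups"
    if daily_changes > 0:
        return "daily incremental, weekly full copy"
    return "weekly full copy"

for name, changes, crit, recreate in OBJECTS:
    print(f"{name}: {backup_frequency(changes, crit, recreate)}")
```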

But before ending this post, I want to circle back to a comment I made near the beginning: “test your backups before you need them for a production recovery!” The last thing you want is to discover that your backups are wrong (or invalid) just when you need them to recover from a problem. So put in place a strategy for testing your backups periodically… or risk having them not work properly when you need them!
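As a tiny illustration of what “testing a backup” can mean, here is a sketch using Python’s built-in sqlite3 online backup API. The file and table names are invented, and a real production test would restore to separate hardware and run application-level checks, not just an integrity check:

```python
import sqlite3

# Take an online backup of a (toy) production database...
src = sqlite3.connect("production.db")
src.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, val TEXT)")
src.execute("INSERT INTO t (val) VALUES ('payroll row')")
src.commit()

with sqlite3.connect("backup.db") as dst:
    src.backup(dst)                     # stdlib online backup (Python 3.7+)

# ...then actually TEST the backup: open the copy and verify it is usable.
check = sqlite3.connect("backup.db")
print(check.execute("PRAGMA integrity_check").fetchone()[0])   # 'ok'
print(check.execute("SELECT COUNT(*) FROM t").fetchone()[0])   # rows recovered
```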

I can think of no better high-level advice on World Backup Day than what I’ve outlined here… take some time, evaluate your DBA priorities, and be sure to review your database backup and recovery plans… you’ll be glad you did should the unthinkable occur!

Posted in backup & recovery

Database Fundamentals

I would guess that most of the readers of this blog understand the basic concepts and fundamentals of database technology. However, many folks who think they understand the basics often do not have the knowledge and understanding they believe they have. Therefore, today’s post serves as a very brief introduction to the fundamentals of database management systems.

What Is a Database?

The answer to this question may surprise some readers. Oracle is not a database; neither are Db2, PostgreSQL, MongoDB, MySQL, or SQL Server. Each of these is a DBMS, or database management system. You can use Oracle or Db2 or SQL Server to create a database, but none of these themselves are databases. Many people, even skilled professionals, confuse the overall system – the DBMS – with the databases created using that system.

So, what is a database? A database is a structured set of persistent data. A phonebook is a database. However, within the world of IT, a database usually is associated with software. A simple database might be a single file containing many records, each of which contains the same set of fields where each field is a certain data type and length. In short, a database is an organized store of data where the data is accessible by named data elements.
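For instance, here is a tiny Python sketch of such a single-file database, where every record carries the same named fields (the fields are invented for illustration):

```python
import csv

# Each record has the same set of named fields: a minimal single-file "database".
with open("employees.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["emp_id", "name", "dept"])
    writer.writeheader()
    writer.writerow({"emp_id": "1", "name": "Ada", "dept": "IT"})
    writer.writerow({"emp_id": "2", "name": "Grace", "dept": "HR"})

# The data is accessible by named data elements, not byte offsets.
with open("employees.csv", newline="") as f:
    for record in csv.DictReader(f):
        print(record["name"], record["dept"])
```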

A DBMS is a software package designed to create, store, and manage databases. The DBMS software enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving, and storing information in a database. DBMS products are usually responsible for data integrity, data access control, automated rollback, restart and recovery.

Thinking abstractly, you might think of a database as a file folder, and a DBMS as the file cabinet holding the labeled folders. You implement and access database instances using the capabilities of the DBMS.  Your payroll application uses the payroll database, which may be implemented using a DBMS such as Oracle Database 21c, Db2, MongoDB, or SQL Server.

Why is this distinction important? Using precise terms in the workplace avoids confusion. And the less confused we are the more we can avoid problems and issues that lead to over-budget projects, improperly developed systems, and lost productivity. Therefore, precision should be important to all of us.

Why Use a DBMS?

The main advantage of using a DBMS is to impose a logical, structured organization on the data. A DBMS delivers economy of scale for processing large amounts of data because it is optimized for such operations.

Historically, there are four DBMS data models: hierarchical, network, relational, and object-oriented.

A DBMS can be distinguished by the model of data upon which it is based. A data model is a collection of concepts used to describe data. A data model has two fundamental components: its structure, which is the way data is stored, and its operations, which is the way that data can be manipulated. The major DBMS products utilize four different data models:

  1. Network (or CODASYL)
  2. Hierarchical
  3. Relational
  4. Object-oriented

The network data model is structured as a collection of record types and the relationships between these record types. All relationships are explicitly specified and stored as part of the structure of the DBMS. Another common name for the network model is CODASYL. CODASYL is named after the Conference on Data Systems Languages, the committee that formulated the model in the early 1970s. Data is manipulated using the location of a given record and following links to related records. IDMS is an example of a DBMS based on the network model.

The hierarchical data model arranges data into structural trees that store data at lower levels subordinate to data stored at higher levels. A hierarchical data model is based on the network model with the additional restriction that access into a record can only be accomplished in one way. IMS is an example of a DBMS based on the hierarchical model.

The relational data model consists of a collection of tables (more properly, relations) wherein the columns define the relationships between tables. The relational model is based on the mathematics of set theory. Contrary to popular belief, the relational model is not named after “relationships,” but after the relations of set theory. A relation is a set with no duplicate values. Data can be manipulated in many ways, but the most common is through SQL. Db2, Oracle, and SQL Server are examples of DBMS products based on the relational model.
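As a quick illustration, here is a sketch using sqlite3, the small relational DBMS bundled with Python. The tables are invented, but the same SQL ideas apply to Db2, Oracle, or SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT,
                                  dept_id INTEGER REFERENCES dept(dept_id))""")
conn.execute("INSERT INTO dept VALUES (1, 'Research')")
conn.execute("INSERT INTO emp VALUES (10, 'Codd', 1)")

# The relationship between tables is expressed through column values.
for row in conn.execute("""SELECT e.name, d.name
                           FROM emp e JOIN dept d ON e.dept_id = d.dept_id"""):
    print(row)   # ('Codd', 'Research')
```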

The object-oriented (OO) data model consists of a collection of entities, or objects, where each object includes the actions that can take place on that object. In other words, an object encapsulates data and process. With OO systems, data is typically manipulated using an OO programming language. Progress Software’s ObjectStore and InterSystems’ Caché are examples of DBMS products based on the OO model.
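The phrase “an object encapsulates data and process” is easy to demonstrate in any OO language. A toy Python sketch (the class is invented for illustration):

```python
class Account:
    """An object bundles its data (state) with its process (behavior)."""
    def __init__(self, owner, balance=0.0):
        self._owner = owner
        self._balance = balance     # the data lives inside the object

    def deposit(self, amount):      # the process lives alongside the data
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount
        return self._balance

acct = Account("Ada")
print(acct.deposit(100.0))   # 100.0
```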

Each of these four approaches is referred to as a data model for the sake of simplicity; in reality, only the relational and network models have a true, formal data model specification. Different models of data lead to different logical and structural data organizations. The relational model is the most popular because it is the most abstract and the easiest to apply to data, while providing powerful data manipulation and access capabilities.

Other Types of DBMS

Although the four data models discussed heretofore are the predominant types of DBMS, there are other types of DBMS with varying degrees of commercial acceptance.

A column-oriented DBMS, sometimes called a column store, is a DBMS that stores its content by column rather than by row. This has advantages for data warehouses where aggregates are computed over large numbers of data items. A column-oriented DBMS is not based on any formal data model and can be thought of as a special physical implementation of a relational DBMS. Sybase IQ and Greenplum are examples of column stores.
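A simple way to see why column orientation helps aggregates is to sketch the same (invented) data both ways in Python; an aggregate over a column store scans one contiguous column rather than touching every field of every row:

```python
# Row store: each row kept together; an aggregate touches every field.
rows = [(1, "Ada", 95.0), (2, "Grace", 88.5), (3, "Edsger", 91.2)]

# Column store: each column kept together; an aggregate scans one array.
columns = {
    "id":    [1, 2, 3],
    "name":  ["Ada", "Grace", "Edsger"],
    "score": [95.0, 88.5, 91.2],
}

# Computing an aggregate reads only the one column it needs.
print(sum(columns["score"]) / len(columns["score"]))
```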

The NoSQL database system is another type of DBMS that has gained traction in the market, usually in Big Data applications. NoSQL DBMSes are characterized by their flexible schemas and non-reliance on SQL; however, many NoSQL offerings have since added SQL support due to its ubiquity. In general, NoSQL data is not rigidly structured and does not have to conform to a predefined schema as in a typical database design.

There are four types of NoSQL DBMS products:

  1. Document
  2. Key/Value
  3. Wide-column
  4. Graph

A document store manages and stores data at the document level. A document is essentially an object and is commonly stored as XML, JSON, BSON, etc. A document database is ideally suited for high performance, high availability, and easy scalability. You might consider using a document store for web storefront applications, real-time analytical processing, or to front a blog or content management system. They are not very well-suited for complex transaction processing as typified by traditional relational applications, though. MongoDB is the most popular document database, but others include Couchbase, RavenDB and MarkLogic.
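Here is a minimal sketch of the document idea using plain Python dicts serialized as JSON. The documents are invented, and note that, unlike rows in a relational table, they need not share the same fields:

```python
import json

documents = [
    {"_id": 1, "title": "World Backup Day", "tags": ["backup", "recovery"]},
    {"_id": 2, "title": "DBA Automation", "author": "Craig"},  # different fields
]

# Store and retrieve whole documents; no fixed schema is imposed.
stored = [json.dumps(doc) for doc in documents]
for raw in stored:
    doc = json.loads(raw)
    if "backup" in doc.get("tags", []):
        print(doc["title"])
```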

The key/value database system is useful when all access to the database is done using a primary key. There typically is no fixed data model or schema. Each key is associated with an arbitrary “lump” of data. A key/value database is useful for shopping cart data or storing user profiles. It is not useful when there are complex relationships between data elements or when data needs to be queried by anything other than the primary key. Examples of key/value stores include Riak, Berkeley DB, and Aerospike.
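Python’s standard library even ships a bare-bones key/value store, dbm, which illustrates the model nicely; all access is by key, and there is no schema (the key and value here are invented):

```python
import dbm

# Keys map to arbitrary "lumps" of data; there is no schema.
with dbm.open("profiles.db", "c") as kv:
    kv["user:42"] = b'{"name": "Ada", "theme": "dark"}'
    print(kv["user:42"])        # fast lookup by primary key...

# ...but there is no way to ask "which users prefer dark mode?"
# without scanning every key.
```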

A wide-column store uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. Examples of wide-column stores include Apache Cassandra, Amazon DynamoDB, and DataStax Enterprise.

Finally, we have the graph database, which uses graph structures with nodes, edges, and properties to represent and store data. In a graph database, every element contains a direct pointer to its adjacent elements, so no index lookups are necessary. Social networks, routing and dispatch systems, and location-aware systems are the prime use cases for graph databases. Some examples include Neo4j, GraphBase, and Meronymy.
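A toy Python sketch of the “direct pointer” idea, with each node holding references to its adjacent nodes (the social-network data is invented):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []           # direct references, no index lookup

    def follow(self, other):
        self.friends.append(other)

ada, grace, edsger = Node("Ada"), Node("Grace"), Node("Edsger")
ada.follow(grace)
grace.follow(edsger)

# Traversal simply follows pointers from node to node.
for friend in ada.friends:
    print(ada.name, "->", friend.name)
```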

NoSQL database systems are popular with organizations that face data challenges that cannot be solved using traditional RDBMS solutions. Cassandra (a wide-column store) and MongoDB (a document store) are among the most widely adopted NoSQL database systems.

There are other DBMS implementations as well, such as the inverted list structure of the original Adabas, and even the dBase format popularized by the PC DBMSs dBase II and dBase III.

Most commercial DBMS implementations today are relational.

Advantages of Using a DBMS

A DBMS provides a central store of data that can be accessed by multiple users, from multiple locations. Data can be shared among multiple applications, rather than having to be propagated and stored in new files for every new application. Central storage and management of data within the DBMS provides:

  • Data abstraction and independence
  • Data security
  • A locking mechanism for concurrent access
  • An efficient handler to balance the needs of multiple applications using the same data
  • The ability to swiftly recover from crashes and errors
  • Robust data integrity capabilities
  • Simple access using a standard API
  • Uniform administration procedures for data

Levels of Data Abstraction

A DBMS can provide many views of a single database schema. A view defines what data the user sees and how that user sees the data. The DBMS provides a level of abstraction between the conceptual schema that defines the logical structure of the database and the physical schema that describes the files, indexes, and other physical mechanisms used by the database. Users function at the conceptual level—by querying columns within rows of tables, for example—instead of having to navigate through the many different types of physical structures that store the data.
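A small sqlite3 sketch of this abstraction: the view defines what the user sees, while the underlying table (and its sensitive columns) stays hidden. The schema is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (emp_id INTEGER, name TEXT,
                                  salary REAL, ssn TEXT)""")
conn.execute("INSERT INTO emp VALUES (1, 'Ada', 90000, '123-45-6789')")

# A view defines WHAT the user sees; the physical table stays hidden.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name FROM emp")
print(conn.execute("SELECT * FROM emp_public").fetchall())   # [(1, 'Ada')]
```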

A DBMS makes it much easier to modify applications when business requirements change. New categories of data can be added to the database without disruption to the existing system.

Data Independence

A DBMS provides a layer of independence between the data and the applications that use the data. In other words, applications are insulated from how data is structured and stored. The DBMS provides two types of data independence:

  • Logical data independence—protection from changes to the logical structure of data
  • Physical data independence—protection from changes to the physical structure of data

As long as the program uses the API (application programming interface) to the database as provided by the DBMS, developers can avoid changing programs because of database changes.

Note: The primary API to relational databases is SQL. In general, most application SQL statements need not change when database structures change (e.g., a new column is added to a table).
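A quick sqlite3 illustration of that note, with an invented table: the application’s query names its columns explicitly, so it keeps working unchanged after a new column is added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, name TEXT)")
conn.execute("INSERT INTO emp VALUES (1, 'Ada')")

query = "SELECT emp_id, name FROM emp"      # names its columns explicitly
print(conn.execute(query).fetchall())

# The database structure changes...
conn.execute("ALTER TABLE emp ADD COLUMN hire_date TEXT")

# ...and the same application SQL runs unchanged.
print(conn.execute(query).fetchall())
```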

Data Security

Data security prevents unauthorized users from viewing or updating the database. The DBMS uses IDs and passwords to control which users are allowed access to which portions of the database. For example, consider an employee database containing all data about individual employees. Using the DBMS security mechanisms, payroll personnel can be authorized to view payroll data, whereas managers could be permitted to view only data related to project history.

Concurrency Control

A DBMS can serve data to multiple, concurrently executing user programs. This requires a locking mechanism to deliver concurrency control because the actions of different programs running at the same time could conceivably cause data inconsistency. For example, multiple bank ATM users might be able to withdraw $100 each from a checking account containing only $150. A DBMS ensures that such problems are avoided because the locking mechanism isolates transactions competing for the same exact data.
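One classic way a program avoids the double-withdrawal problem is to make the balance check and the update a single atomic step inside a transaction. A sketch with sqlite3 (the account table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 150.0)")
conn.commit()

def withdraw(amount):
    with conn:  # transaction: the check and the update are one atomic step
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE id = 1 AND balance >= ?", (amount, amount))
        return cur.rowcount == 1    # False if funds were insufficient

print(withdraw(100.0))   # True
print(withdraw(100.0))   # False: only $50 remains
```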

Database Logging

The DBMS uses database logging to record “before” and “after” images of database objects as they are modified. It is important to note that the database log captures information about every data modification (except in circumstances as determined by the DBA). The information on the database logs can be used to undo and redo transactions. Database logging is handled transparently by the DBMS—that is, it is done automatically.

Ensuring Atomicity and Durability

A DBMS can be used to assure the all-or-nothing quality of transactions. This is referred to as atomicity, and it means that data integrity is maintained even if the system crashes in the middle of a transaction. Furthermore, a DBMS provides recoverability. After a system failure, data can be recovered to a state that existed either immediately before the crash or at some other requisite point in time.
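A minimal sketch of this all-or-nothing behavior with sqlite3: when any statement in the transaction fails, the DBMS rolls back the whole unit of work (the ledger table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (entry TEXT NOT NULL)")

try:
    with conn:   # opens a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO ledger VALUES ('debit checking $100')")
        conn.execute("INSERT INTO ledger VALUES (NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

# Neither insert survived: all-or-nothing.
print(conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0])  # 0
```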

Data Integrity

The DBMS provides mechanisms for defining rules that govern the type of data that can be stored in specific fields or columns. Only data that conforms to the business rules will ever be stored in the database. Furthermore, the DBMS can be set up to manage relationships between different types of data and to ensure that changes to related data elements are accurately implemented.
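For example, here is a sketch of declarative integrity rules in sqlite3, using an invented schema with a CHECK constraint (a business rule) and a foreign key (a relationship rule):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")        # enforce relationships
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE emp (
                    emp_id  INTEGER PRIMARY KEY,
                    salary  REAL CHECK (salary > 0),
                    dept_id INTEGER REFERENCES dept(dept_id))""")

try:
    # Violates the business rule: salary must be positive.
    conn.execute("INSERT INTO emp VALUES (1, -500.0, NULL)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # nonconforming data never reaches the database
```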

Data Access

A DBMS provides a standard query language to enable users to interactively interrogate the database and analyze its data. For relational databases, this standard API is SQL, or Structured Query Language. However, SQL is not a requirement for a DBMS to be relational. Furthermore, many DBMS products ship with analytical tools and report writers to further simplify data access.

Summary

This post on DBMS fundamentals is necessarily brief, and most readers will find the material familiar. Entire books have been written on the basic operations and qualities of DBMSs and databases; if you require additional details, I encourage you to seek them out.

A primary benefit of a DBMS is its ability to maintain and query large amounts of data while assuring data integrity and consistency. It offers transparent recovery from failures, concurrent access, and data independence. In fact, most modern computer applications rely on DBMS and database technology to manage data. Understanding the topic is of benefit to all IT professionals.

Posted in DBMS, education

A Cold Spell in Texas Makes Me Think About Contingency Planning and the DBMS

Whenever there is an event in the news like the winter storm that wreaked havoc on Texas last week, it makes me think about contingency planning and things like database disaster recovery. The paltry amount of snow we got in Texas probably makes folks “Up North” chuckle, but it was a real problem because Texas almost never gets snow, nor does it stay as cold for as many days in a row as it did last week. So my first thought is that what qualifies as a “disaster” will differ based on your location and circumstances.

Anyway, when a “disaster” like this hits, it is a good time to review your disaster contingency plans… well, reviewing them before the disaster would have been better, but we’re human, and awareness always rises in the wake of a disaster. So let’s discuss disaster recovery in terms of databases.

A disaster recovery plan is like insurance — you’re glad you have it, but you hope you don’t need it. With insurance, you pay a regular fee so that you are covered if you have an accident. A disaster recovery plan is similar because you pay to implement your disaster recovery plan by designating a disaster recovery site, shipping backup copies of the data off-site, preparing recovery jobs, and practicing the recovery procedures.

Database disaster recovery must be an integral component of your overall business recovery plan. A disaster recovery plan must be global in scope. It must handle business issues such as alternate locations for conducting business, communication methods to inform employees of new locations and procedures, and publicity measures to inform customers how to transact business with the company post-disaster. A component of that plan must be the overall plan for resuming data processing and IT operations. And finally, a component of that plan is the resumption of DBMS operations.

In order for your database disaster recovery plan to be effective, you will need to develop and adhere to a written plan. This plan must document all of the routine precautionary measures required to assure the recoverability of your critical data in the event a disaster occurs. Image copy backups or disk backups need to be made as directed and sent to the remote site as quickly as possible. Reports need to be printed and sent off-site. Missing any little detail can render a disaster recovery plan ineffective.

When practicing the disaster recovery plan, make sure that each team member follows the written instructions precisely. Of course, it is quite likely that things will come up during the practice sessions that were missed or undocumented in the plan. Be sure to capture all of these events and update the written plan after the disaster recovery test. Keep in mind that during an actual disaster you may need to rely on less experienced people, or perhaps consultants and others who are not regular employees. The more foolproof the written plan, the better the chance of a successful disaster recovery.

Your disaster recovery procedures will be determined in large part by the method you use to back up your data. If you rely on pack backups, then your recovery will be one disk volume at a time. If you create database image copies, you will probably use the DBMS’s recover utility or a third party recover tool. Of course, you might combine several different techniques for off-site backups depending on the sensitivity and criticality of the data.

The following tips can be helpful as you develop or review your database contingency plans:

Order of Recovery

Make sure the operating system and DBMS are installed at the correct version and maintenance level before proceeding with any database object recovery at the disaster site. Be sure to follow the recovery steps rigorously as documented in the written plan.

Data Latency

How old is the data? If you take nightly backup tapes to another location, your data could be up to 24 hours old. Sometimes having data that old is unacceptable, but sending backup media to off-site storage more than once a day is too expensive. One solution is to get the data to another location digitally—via log shipping or replication, for example. Database logs at the time of the disaster may not be available to apply at the off-site recovery location. Some data may not be fully recoverable and there is really no way around this. The quicker backup copies of database objects and database logs are sent off-site, the better the disaster recovery will be in terms of data currency.

Remember Other Vital Data

Creating offsite backups for database objects may not be sufficient to ensure a complete disaster recovery plan for each application. Be sure to back up related data and send it offsite as well. Additional data and files to consider backing up for the remote site include DDL libraries for database objects, recovery and test scripts, application program source and executable files, stored procedure program source and executable files, user-defined function source and executable files, libraries and passwords for critical third party DBA tools, and other related data files used by the application.

Beware of Compression

If your site uses tape-compression software, be sure that the remote recovery site uses the same tape-compression software. If it does not, the image copy backups will not be readable at the remote site. Turn off compression at the primary site for the disaster recovery image copy backups if the remote site cannot read compressed tape files.

Post-Recovery Image Copies

Part of the disaster recovery process should be to create an image copy backup for each database object after it has been recovered at the remote site. Doing so enables easier recoverability of the data should an error occur after processing begins at the remote site. Without the new image copy backups, the entire disaster recovery procedure would have to be performed again if an error occurs after remote site processing begins.

Disaster Prevention

DBAs and IT professionals in general create procedures and enforce policies. Many of these procedures and policies, such as a disaster recovery plan, are geared toward dealing with errors once they occur. Having such procedures and policies is wise. But it is just as wise to establish procedures and policies to prevent problems in the first place. Although you cannot implement procedures to stop an earthquake or flood, you can implement policies to help avoid man-made disasters. For example, enforce frequent password changes to mitigate data loss due to malicious hackers.

Another good idea is to document and disseminate procedures that teach end users how to deal with error messages. You cannot expect every user to understand the impact of responding to every error message, and guidelines can help avoid errors – and man-made disasters.

Summary

Only with comprehensive up-front planning, regular testing, and diligent maintenance will a disaster recovery plan be useful. Be sure you have one for your site… or your disaster recovery plan might become a two-step process in the event of a disaster:

1) Update resume

2) Go job-hunting!

Posted in backup & recovery, contingency planning

DBA Automation

Just a quick blog post today to point you at a couple of other resources that I have created on the topic of automating database administration.

First I’d like to point you to a webinar that I delivered earlier this month (January 2021) titled Automating Database Administration Is No Longer Optional.

You can view the presentation at the link above, in which I discuss data management trends (data growth, DevOps, heterogeneity, complexity, staffing), and how they are driving the need for more automation of DBA tasks and processes. These clear industry trends equate to a demanding, complex environment for managing and administering database systems. Yet DBAs are still required to monitor, optimize, tune, backup, recover, change structures, and move data both on premises and to the cloud. To be successful requires intelligent automation of DBA tasks and procedures.

I also recently published a blog post for IDERA under the same title discussing these issues for those who are not interested in watching the entire presentation.

At any rate, I hope you’ll take a look at one (or both) of these and then weigh in with your comments on automating DBA tasks. I want to hear how you are automating, whether you need more (or less?) DBA automation, how it impacts your job requirements, and anything else you’d like to add. I’d really like to get a conversation going about the need for automation, and perhaps more importantly, the progress you and your organization have made toward DBA automation.

Posted in automation, DBA

Seasons Greetings 2020

Hello everybody out there in data-land! As always, a big “Thank you” for being a reader of my blog. I hope that you will continue reading in 2021 and that the upcoming year is better for all of us than this year…

And as per my custom, here’s a short post to end the year wishing everybody “out there” a very happy holiday season!

No matter which holidays you celebrate, I hope they bring you joy, contentment, and let you recharge for an even better year next year…

And after the debacle that was 2020, how could 2021 be anything but an improvement!

Happy, Merry, Joyous holidays… see you next year!

Posted in DBA

What is the Autonomous Digital Enterprise?

Modern organizations are transforming their businesses by adopting and integrating digital technologies and services into the fabric of their operations. This is generally referred to as “digital transformation.” And it is an imperative for success as more business is conducted online, over the web, and using personal devices such as phones and tablets. Businesses have to engage with customers using the same technology and interfaces that their customers use all the time or they risk losing revenue.

But digital transformation is not, in and of itself, sufficient to ensure success. BMC Software has identified what they call the autonomous digital enterprise as the next step on the journey. This means embracing and instilling intelligent automation capabilities throughout the organization.

In an autonomous digital enterprise, automation is a complementary business function that works with – not in place of – humans. By exploiting automation, organizations can:

  • Execute with fewer errors
  • Free up employees from mundane tasks
  • Lower costs
  • Improve customer interaction

Note here the term “intelligent” being used with the term “automation.” IT professionals have been automating for a long time. Indeed, everything that IT folks do can be considered some form of automation when compared to manual tasks. But intelligent automation takes things further: it relies on data to drive decision-making, reducing the time it takes to react and thereby cutting latency.

With artificial intelligence and machine learning capabilities being coupled with automation, the accuracy and ability of automation to intuit what needs to be done improve, as does the agility to implement improvements and corrective measures effectively.

Intelligence enables organizations to transcend impediments to success. For example, in an autonomous digital enterprise, DevOps practices are integrated throughout the enterprise enabling rapid and continuous delivery of applications and services. This requires technology and intelligent automation, but also a shift in organizational mindset to embrace change and instill it everywhere in the company.

An autonomous digital enterprise will have automation everywhere… intelligent automation that improves the customer experiences, speeds up decision-making and implementation, and interacts with customers the way they expect.

This vision of the autonomous digital enterprise is both audacious and compelling – and it is well worth examining for your organization.

Posted in AI, automation, DevOps, digital transformation, enterprise computing

Data Summit Fall 2020 Presentation Now Available for Replay: Modern Database Administration

I was honored to deliver a presentation at this year’s Data Summit conference on the changing world of the DBA. I spoke for about a half hour on Thursday, October 22nd on DBA and database systems trends and issues.

The conference sessions were conducted live, but were recorded as they were delivered. And now my session can be viewed here!

I hope you take a moment to watch the presentation and consider the issues I bring up.

And, if you are interested in useful tools that help with the trends I discuss, stay around after my presentation (which does not talk about any particular vendor tools) to hear a strategist from Quest give their perspective on the issues and their DBA tools that can help.

Finally, I hope you will comment here on the blog if you have any questions, comments, or additional issues you’d like to discuss!

Posted in AI, analytics, Big Data, data, data breach, Data Growth, DBA, DBMS, DevOps, IoT, Machine Learning, review, speaking engagements, tools, trends

Craig Mullins Presenting at Data Summit Fall 2020

Keeping abreast of what is happening in the world of data management and database systems is important for all IT professionals these days. Data is at the center of everything that modern organizations do and the amount of data we have to store, manage, and access is growing at an ever-increasing pace. It can be difficult to keep up with it all.

If you find yourself in need of being up to speed on everything going on in the world of data, you should plan on attending the Data Summit Fall 2020 virtual event. Held in person in years past, the event is offered this year as a free online webinar series running from October 20 through 22, 2020.


And this year I will be speaking again at the event, and hopefully more of you will be able to attend than in years past, since there is no travel involved! My presentation will be on the changing world of the DBA (Thursday, October 22nd, at Noon Eastern time). I’ll discuss how the DBA’s job is impacted by big data and data growth, NoSQL, DevOps, the cloud, and more.

I hope to see you there!

Posted in DBA

COBOL V4.2 to COBOL V6 Migration – The Cost of Procrastination

Today’s post is a guest article written by Dale Vecchio, IT modernization expert and former Gartner analyst.

While no one can argue that the COBOL language has had tremendous staying power over the last 50-60 years, its biggest attribute these days is best summed up in the expression “leave well enough alone”! Yeah, COBOL is here and the applications still work. But the costs of staying wedded to this procedural 3GL are increasing. As COBOL V4.2 reaches end-of-support, the conversion to COBOL V6 is a non-trivial exercise. Even IBM admitted as much in a 2018 presentation, “Migrating to Enterprise COBOL v6”. Organizations have been upgrading their COBOL versions for decades, but this jump seems particularly onerous. For example, IBM reports that customers rated migrating from COBOL V3 to V4 at a difficulty level of “3”, while upgrading from V4 to V6 rated a difficulty level of “20”!!

Of course, there are improvements in COBOL V6, but they come at a price. Any mainframe organization that is not at current hardware/software levels may find it needs to upgrade just to be able to support this version of COBOL. COBOL V6, by IBM’s own admission, will require 20x more memory at compile time and will take 5x to 12x longer to compile! But probably most problematic is that 25% of customers migrating to COBOL V6 ran into migration difficulties due to “invalid data”. One of the many challenges of mainframe modernization is that organizations either “cheated” or simply got away with “unsupported features” in COBOL, and earlier versions of COBOL may have accepted data formats that are no longer “valid” in V6.

These problems are the most difficult to find, since the program MAY appear to work but generate wrong results. The best that could happen is that the program will fail, and then your limited development staff can “go fishing” in a 30-40 year old COBOL program trying to figure out what the heck the problem is!! IBM’s view on this seems to be, “well, you created the problem so you fix it!” The amount of effort necessary to migrate to V6 is greatly exacerbated by this data problem, since it is likely to dramatically increase the testing needed.

Consequently, the entire argument that it’s “safer” to stay on COBOL and just upgrade is a specious one. Perhaps the most common modernization strategy of the last 20 years, procrastination, is no longer a viable choice! Prolonging the usage of a procedural 3GL, against the backdrop of a declining skills pool, is increasingly risky. I can assure you that many organizations I have spoken to around the world over the last 20 years have the DESIRE to modernize, on or off the mainframe, but the risks and costs have been seen as simply too high. These migration risks are quickly becoming balanced by the risks of NOT modernizing. The modern IT world is increasingly one of Linux, cloud, open source, and Java. The innovation is in these areas. The skills are in these areas. No one is saying anything bad about the mainframe here – only that there are acceptable options for running enterprise workloads that do NOT require the legacy OS or transactional environments of the past.

While Java is not the only path to a modern IT application environment, it is certainly one of the most common. So the trick is to figure out how to move in that direction while mitigating the risks. If you are going to have to invest in some of your COBOL applications, why not evolve to a modern Linux world? There are plenty of issues to deal with when modernizing applications, so reducing the risks in some areas is a good idea. Easing your applications into a modern DevOps environment that is plentiful with skilled developers is a worthwhile investment.

You don’t have to modernize every COBOL application any more than you need to upgrade every one to V6! Modernization is a journey, but you’ll never reach your destination if you don’t take the first step. Code transformation solutions that give you decent, performant Java programs that can be managed by a DevOps tool chain and enhanced by Java developers are a worthwhile consideration. Code transformation solutions that perform syntactic, line-by-line transformations are NOT the answer – ones that refactor COBOL into Java classes and methods are! Let’s be realistic – some of your COBOL applications have very few enhancements made annually. If you can get them transformed into Java, and they can then take advantage of the cost benefits of these runtime environments, whether on the mainframe (specialty engines) or off, your modernization journey is off to a good start.

To listen to a webinar discussing this topic, go to https://youtu.be/2b8XrOovHn4

Posted in DBA