Inside the Data Reading Room – New Year 2016 Edition

Regular readers of my blog know that I periodically take the time to review recent data-related books that I have been reading. This is one of those posts!

Today, I will take a quick look at several books that I think you will enjoy, starting with Repurposing Legacy Data: Innovative Case Studies by Jules J. Berman (Elsevier, ISBN 978-0-12-802882-7). This short book offers up a quick read and delivers on the promise of its title. It leads the reader through example case studies showing how organizations can take advantage of their “old” data. In this day and age of Big Data and Data Science, the techniques and tactics explored in this fine book are well worth investigating further.

Next up is a book that tackles MDM titled Multi-Domain Master Data Management: Advanced MDM and Data Governance in Practice by Mark Allen and Dalton Cervo (Morgan Kaufmann, ISBN 978-0-12-800835-5). Allen and Cervo offer up practical implementation guidance using hands-on examples and guidelines for ensuring productive and successful multi-domain MDM. Along the way you’ll learn how to improve your data quality, lower your maintenance costs, reduce risks, and improve data governance. There is a complimentary companion site for the book that offers additional MDM reference and training materials.

I’ve also enjoyed reading Cognitive Computing and Big Data Analytics by Judith Hurwitz, Marcia Kaufman, and Adrian Bowles (Wiley, ISBN 978-1-118-89662-4). The book does a good job of instructing readers on cognitive computing, from the basics of what it is and its various components (e.g., machine learning, natural language processing), through its growth due to the rise of big data analytics, to examples of projects showing how it works and its promise. As an IBM supporter I particularly enjoyed the chapter on IBM Watson. But really, the entire book is worthwhile, and if you have any interest at all in how computers can gain cognitive capabilities, you should pick up a copy of this book.

Finally, for today, we have a DevOps book titled DevOps: A Software Architect’s Perspective by Len Bass, Ingo Weber, and Liming Zhu (Addison Wesley, ISBN 978-0-13-404984-7). DevOps is a somewhat new movement espousing collaboration and communication between software developers and those providing operational and administrative IT support. The word is a combination of DEVelopment and OPerations, and there is a lot of hype out there about DevOps. This book does a reasonable job of explaining the concept of DevOps (frankly, I am not one of the people who thinks it is really a monumental change) and how it can benefit your organization. If you’ve been in IT for some time, do not expect to be wowed with new information. Instead, the authors do a credible job of explaining DevOps and a lot of development/administration best practices.

That’s it for today. If you’ve read any of these books please leave a comment with your thoughts… and let me know if there are any books you’d like to see reviewed in future editions of Inside the Data Reading Room here on the Data & Technology Today blog!

Posted in analytics, Big Data, book review, books, Data Quality, legacy data, MDM | 1 Comment

Keeping Up With the DBMS

 

One of the more troubling trends for DBAs is keeping up with the latest versions of their DBMSs. Change is a fact of life, and each of the major DBMS products changes quite rapidly. A typical release cycle for DBMS software is 18 to 24 months for major releases, with constant bug fixes and maintenance delivered in between major releases. Indeed, keeping DBMS software up-to-date can become a full-time job.

The troubling aspect of DBMS release migration these days is that increasingly, the majority of organizations are not on the most recent version or release of the software. Think about it. The most recent version of Oracle is Database 12c, but many organizations have yet to migrate to it even though it was released in July 2013. Things are much the same for Microsoft SQL Server and IBM DB2 users, too. For example, many mainframe organizations are running DB2 10 for z/OS (and even older, unsupported versions) instead of being up-to-date on DB2 11 for z/OS (which was released in October 2013).

This happens for many reasons, including the desire to let others work out the inevitable early bugs, the lack of compelling new features that would drive the need to upgrade immediately, and the lack of time to upgrade adequately as often as new releases are unleashed on us.
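If you are not sure where your own shop stands, most DBMSs make it easy to check the version and release level you are running with a simple query. Here is a minimal sketch; the exact syntax varies by product (and the DB2 session variable shown is worth confirming against your documentation):

-- Oracle: display the version banner
SELECT banner FROM v$version;

-- Microsoft SQL Server: version, level, and edition information
SELECT @@VERSION;

-- DB2 for z/OS: the version/release string (e.g., 'DSNxxxxx')
SELECT GETVARIABLE('SYSIBM.VERSION') FROM SYSIBM.SYSDUMMY1;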

The DBA team must develop an approach to upgrading DBMS software that conforms to the needs of their organizations and minimizes the potential for disrupting business due to outages and database unavailability.

You may have noticed that I use the terms version and release somewhat interchangeably. That is fine for a broad discussion of DBMS upgrading, but a more precise definition is warranted. Versions typically are very broad in scope, with many changes and new features. A release is typically minor, with fewer changes and not as many new features. But DBAs must meticulously build implementation plans for both.

In many cases, upgrading to a new DBMS version can be treated as a special case of a new installation. All of the procedures required of a new installation apply to an upgrade: you must plan for appropriate resources, you need to reconsider all system parameters, and you need to ensure that all supporting software is appropriately connected. But there is another serious issue that must be planned for, and that is existing users and applications. An upgrade needs to be planned so as to cause as little disruption to the existing users as possible. Therefore, upgrading can be a tricky and difficult task.

Keeping the DBMS running and up-to-date without incurring significant application outages requires an on-going effort that will consume many DBA cycles. The approach undertaken must conform to the needs of the organization, while at the same time minimizing business impact and avoiding the need to change applications.

Upgrading to a new DBMS release offers both rewards and risks. By moving to a newer DBMS release developers will be able to use the new features and functionality delivered in the new release. For purchased applications, you need to be cognizant of the requirements of application releases on specific DBMS versions. Additionally, new DBMS releases tend to deliver enhanced performance and availability features that can optimize existing applications. Often the DBMS vendor will provide better support and respond to problems faster for a new release of their software. DBMS vendors are loath to allow bad publicity to creep into the press about bugs in a new and heavily promoted version of their products. Furthermore, over time, DBMS vendors will eliminate support for older versions and DBAs must be aware of the support timeline for all DBMSs they manage.

An effective DBMS upgrade strategy will balance the benefits against the risks of upgrading to arrive at the best timeline for migrating to a new DBMS version or release. An upgrade to the DBMS almost always involves some level of disruption to business operations. At a minimum, as the DBMS is being upgraded databases will not be available. This can result in downtime and lost business opportunities if the DBMS upgrade has to occur during normal business hours (or if there is no planned downtime). Other disruptions can occur including the possibility of having to convert database structures, the possibility that previously supported features were removed from the new release (thereby causing application errors), and delays to application implementation timelines.

The cost of an upgrade can be a significant barrier to DBMS release migration. First of all, the cost of the new version must be planned for (price increases for a new DBMS version can amount to as much as 10 to 25 percent). You also must factor in the costs of planning, installing, testing, and deploying not just the DBMS but also any applications using databases. Finally, be sure to include the cost of any new resources (memory, storage, additional CPUs, etc.) required by the DBMS to use the new features delivered by the new DBMS version.

Also, in many cases the performance benefits and improvements implemented in a new DBMS release require the DBA or programmers to apply invasive changes. For example, if the new version increases the maximum size for a database object, the DBA may have to drop and re-create that object to take advantage of the new maximum.
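To make that concrete, here is a rough, DB2-style sketch of what such an invasive change might look like; the table and column names are purely hypothetical, and in practice an UNLOAD/LOAD utility would usually be used for large objects:

-- 1. Preserve the existing data first (illustrative only)
CREATE TABLE orders_save LIKE orders;
INSERT INTO orders_save SELECT * FROM orders;

-- 2. Drop and re-create the object to pick up the larger maximum supported by the new release
DROP TABLE orders;
CREATE TABLE orders
   (order_id    INTEGER NOT NULL,
    order_notes VARCHAR(32000));

-- 3. Reload the preserved rows and clean up
INSERT INTO orders SELECT order_id, order_notes FROM orders_save;
DROP TABLE orders_save;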

Another potential risk is the possibility that supporting software products may lack immediate support for a new DBMS release. Supporting software includes the operating system, transaction processors, message queues, purchased applications, DBA tools, development tools, and query and reporting software.

And we haven’t even touched on applying maintenance to the DBMS. Maintenance and fixpacks occur frequently and can consume a LOT of DBA time and effort. Some companies have even begun to contract with DBA services companies to handle their maintenance and fixpack planning and implementation.

The bottom line is that keeping up with new DBMS releases and functionality has become a very significant component of the DBA’s job.

Posted in change management, DBMS, fixpacks, maintenance | 1 Comment

Data Technology Today – 2015 in review

The WordPress.com stats helper monkeys prepared a 2015 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 54,000 times in 2015. If it were a concert at Sydney Opera House, it would take about 20 sold-out performances for that many people to see it.

Click here to see the complete report.

Posted in DBA | 1 Comment

Happy Holidays 2015

Just a short post to end the year wishing all of my readers everywhere a very happy holiday season – no matter which holidays you celebrate, I hope they bring you joy, contentment, and let you recharge for an even better year next year!


So enjoy the holidays and come back in January when we continue to explore the world of data and database technology…

Posted in DBA | 1 Comment

Using SQL to Count Characters


If you write SQL on a regular basis, it is very important to know the functions that are supported by your DBMS. In general, there are three types of built-in functions that can be used to transform data in your tables:

  • Aggregate functions, sometimes referred to as column functions, compute, from a group of rows, a single value for a designated column or expression.
  • Scalar functions are applied to a column or expression and operate on a single value.
  • Table functions can be specified only in the FROM clause of a query and return results resembling a table.

Understanding the built-in functions available to you can make many coding tasks much simpler. Many times, a built-in function can be used instead of coding your own application program to perform the same task. You gain a significant advantage using built-in functions because you can be sure they will perform the correct tasks with no bugs… as opposed to your code, which requires time to write, stringent debugging, and in-depth testing. This is time you can better spend on developing application-specific functionality.
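For example, here are the first two categories in action; the table and column names are purely hypothetical:

-- Aggregate (column) function: a single value computed from a group of rows
SELECT AVG(salary) FROM employee;

-- Scalar functions: applied to each individual value
SELECT UPPER(last_name), LENGTH(last_name) FROM employee;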

At any rate, I was recently asked how to return a count of specific characters in a text string column. For example, given a text string, return a count of the number of commas in the string.

This can be done using a combination of two scalar functions, LENGTH and REPLACE, as shown here:

SELECT LENGTH(TEXT_COLUMN) - LENGTH(REPLACE(TEXT_COLUMN, ',', ''))

The first LENGTH function simply returns the length of the text string. The second iteration of the LENGTH function in the expression returns the length of the text string after replacing the target character (in this case a comma) with an empty string, which effectively removes every occurrence of it. The difference between the two lengths is the number of target characters in the string.

So, let’s use a string literal to show a concrete example:

SELECT LENGTH('A,B,C,D') - LENGTH(REPLACE('A,B,C,D', ',', ''))

This translates into 7 – 4… or 3. And, indeed, there are three commas in the string.
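The same expression works just as well against a table column; a quick sketch (the table name here is hypothetical):

-- Count the commas in each row of TEXT_COLUMN
SELECT TEXT_COLUMN,
       LENGTH(TEXT_COLUMN) - LENGTH(REPLACE(TEXT_COLUMN, ',', '')) AS comma_count
  FROM my_table;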

When confronted with a problem like this it is usually a good idea to review the list of built-in SQL functions to see if you can accomplish your quest using SQL alone.

Posted in DBA, functions, SQL | 1 Comment

New IT Salary Details from TechTarget

TechTarget conducts an annual IT Salary and Careers Survey regarding salaries for IT technicians and executives, and their most recent salary survey for 2015 shows some heartening results for those of us who toil in the IT ranks. The survey was conducted from June to September 2015 and there were 1,783 U.S. respondents.

The average base salary for all respondents, regardless of position, came in at $100,333, and the average total compensation (salary plus bonuses) was about 10 percent higher at $110,724. So the average base salary of IT professionals is a six-figure number, which is a whole lot better than many other industries these days.

What about the details? Well, I leave it to you to click over to the detailed article on the TechTarget site… but since many of the readers of this blog are DBAs, here are the TechTarget results for database administrators:

  • DBA average base salary 2015: $102,437
  • DBA average total compensation (salary+bonus) 2015: $108,661

Of course, as with all salary details, the exact salary numbers will vary by geography and experience level.

Posted in DBA, salary | 1 Comment

My New Series of Articles on Data Warehouse at TechTarget

Just a short post today to alert you to a new series of articles that I am writing on data warehouse platforms for the SearchDataManagement portal at TechTarget. I will be writing four articles discussing data warehousing architecture, platforms, products, and trends. I will also be authoring ten accompanying product overviews for leading data warehouse DBMSes, DWaaS providers, and data warehouse appliances.

Check out the first article in the series here:

The benefits of deploying a data warehouse platform

Posted in Big Data, data warehouse | 2 Comments

Teradata: Thinking Big with a New Wave of Data Offerings

This week I am at the Teradata Partners 2015 conference in Anaheim, California and not only is there a lot of useful and interesting information being presented here, but Teradata has announced a slew of new products for managing and analyzing data.

The first new offering is Teradata Listener, which is designed to help organizations respond to the challenges presented by the Internet of Things (IoT) and streaming big data sources. Teradata Listener is intelligent, self-service software with real-time “listening” capabilities to follow multiple streams of sensor and IoT data wherever it exists globally, and then propagate the data into multiple platforms in an analytical ecosystem.

The world is full of connected devices generating a massive and constant stream of data. But less than 1 percent of this data is analyzed in any way. And all projections are that this trend will continue with more connected devices generating more data.

With Teradata Listener, “customers can now take full advantage of IoT data generated from nearly an unlimited number and type of devices. In addition, Teradata enables customers to combine IoT data with business operations and human behavioral data to maximize analytic value,” said Hermann Wimmer, co-president, Teradata.

Teradata Listener is now available in beta, and will be generally available globally in the first quarter of 2016.

But this was not the only announcement made by Teradata. The company also announced Teradata Aster Analytics on Hadoop, an integrated analytics solution featuring a set of more than 100 business-ready, distinctly different analytics techniques and seven vertical industry applications to run directly on Hadoop. This allows organizations to seamlessly address business problems with an integrated analytics solution.

The flexibility and simplicity of these capabilities enables everyday business analysts to perform as data scientists by tackling the organization’s most challenging problems. Teradata Aster Analytics on Hadoop allows users to combine machine learning, text, path, pattern, graph, and statistics within a single workflow. Teradata offers flexible Aster Analytics deployments that include the Teradata Big Analytics Appliance, Hadoop, a software-only version, and the Teradata Cloud.

Teradata Aster Analytics on Hadoop will be shipped globally in the second quarter of 2016.

Also announced was the release (on Monday, October 19, 2015) of the Teradata Integrated Big Data Platform 1800, which is designed to support IoT capabilities. It enables customers to perform complex analytics at scale and is available at a cost-effective price of approximately $1,000 per terabyte of compressed data. The Teradata Database, running on the Teradata Integrated Big Data Platform, provides access to data in many formats, including XML, name-value pair, BSON (Binary JSON) and JSON from web applications, sensors, and Internet of Things-connected machines.

Next up from Teradata is the Teradata Active Enterprise Data Warehouse 6800, a data warehouse platform built to run the Teradata Database to its fullest capabilities including Teradata Virtual Storage and Teradata Active System Management. The massively parallel processing (MPP) architecture of the platform matches the parallel, shared nothing architecture of the Teradata Database.

The overall computational power of the Teradata Active Enterprise Data Warehouse is boosted by 25 percent with the latest technology from the Intel® Xeon® Processor E5-2600v3 Family. It offers up to 15 percent faster in-memory processing with the latest DDR4 SDRAM cards with up to 1.5 terabytes of memory per cabinet. To support tactical queries that require sub-second access to frequently used data, the solid state drive (SSD) size is now four times larger and offers up to 128 terabytes of hot data space per cabinet.

The Teradata Active Enterprise Data Warehouse 6800 and the Teradata Integrated Big Data Platform 1800 are now available worldwide.

The company not only provides the database technology and analytics solutions, but it also provides applications for marketing that bring together the data management, analytics and big data capabilities of Teradata’s solutions. Teradata announced global availability of the newest version of Teradata Integrated Marketing Cloud, a powerful data hub comprising integrated solutions that are already helping more than one-third of the S&P Global 100 drive revenue and improve customer engagement through data-driven integrated marketing.

Teradata’s latest release helps marketers to individualize their marketing campaigns and connect one-to-one with customers by unifying customer-interaction data across paid, earned and owned channels, at scale.  Now, from campaign inception through every customer interaction and response, Teradata provides marketers the most agile and comprehensive integrated data-driven marketing platform available on the market.

I warned you up front that there were a lot of announcements!  The company also released new versions of its Teradata Unified Data Architecture™ (UDA), and the Teradata QueryGrid™: Teradata Database-to-Presto, which enables cross-platform SQL queries initiated from Apache® Hadoop™ or the Teradata Database. In addition, Teradata streamlines data exploration and analytics by integrating a Teradata Database, Teradata Aster Analytics, and Hadoop in a single Teradata® UDA Appliance.

The Teradata UDA Appliance is an enterprise-class appliance to enable a flexible combination of the Teradata Database, Teradata Aster Analytics, and Hadoop to meet customer workload requirements. All software is installed into one cabinet, providing the advantages of an analytic ecosystem in a smaller data center footprint. It is a fully configurable analytic ecosystem, which can be deployed as a Teradata Unified Data Architecture. The appliance can easily adapt to the customer’s requirements to promote data-driven decisions on large, fast-moving structured, semi-structured, and unstructured data. It supports an open model for data acquisition and discovery, and integrates data from all parts of the organization, while matching a variety of workload requirements to the right data sets.

“Since Teradata introduced the UDA over three years ago, we have seen rapid adoption of this approach,” said Oliver Ratzesberger, president, Teradata Labs. “Our new products will further our customers’ ability to develop the most flexible, integrated, and powerful analytics platform in the world.”

Both the Teradata QueryGrid: Teradata Database-to-Presto software and the Teradata UDA Appliance will be available globally in the first quarter 2016.

Teradata also announced a new managed services business for Hadoop Data Lakes, as well as highlighting announcements from partners, including real-time processing added to the Teradata UDA by SQLstream, a partnership with Stonebranch to automate the data supply chain beyond the warehouse, and a data analytics handbook developed by Celebrus and Teradata (which can be downloaded here http://www.celebrus.com/analytics-handbook).

So, all in all, there has been a lot of activity from Teradata that is well worth looking into if you have big data, analytics, data warehousing or marketing needs.

Posted in DBA | 1 Comment

How Might Big Data Impact the Role of the DBA?


I was recently asked the question that is the title of this blog post and I thought, hmmm, now there’s an interesting topic for my blog. So after answering I wrote down some of my thoughts and I have assembled them here to share with you today. If you have any further thoughts on this topic, please share them in the comments area below!

So, how might life change for DBAs as organizations embrace Big Data? That’s a loaded question. Life is always changing for DBAs! The DBA is at the center of new application development and therefore is always learning new technologies – not always database-related technologies. Big Data will have a similar impact. There is a lot of new technology to learn. Of course, not every DBA will have to learn each and every type of technology.

DBAs should be learning NoSQL DBMS technologies, but not with an eye toward replacing relational. Instead, at least for the time being, NoSQL technologies (key/value, column, document store, and graph) are very common in big data and advanced analytics projects. My view is that these products will remain niche solutions, but the technology will be widely adopted. How will that happen? Well, relational DBMSs will add functionality to combat the NoSQL offerings, just like they did to combat the Object-Oriented DBMS offerings in the 1990s. So instead of just offering a relational engine, a DBMS (such as Oracle or DB2) will offer additional engines, such as key/value or document stores.

That means that DBAs who spend the time to learn what the NoSQL database technologies do today will be well-prepared for the multi-engine DBMS of the future. Not only will the NoSQL-knowledgeable DBA be able to help implement projects where organizations are using NoSQL databases today, but they will also be ahead of their peers when NoSQL functionality is added to their RDBMS product(s).

DBAs should also learn about Hadoop, MapReduce and Spark. Now Hadoop is not a DBMS, but it is likely to be a long-term mainstay for data management, particularly for managing big data. An education in Hadoop and MapReduce will bolster a DBA’s career and make them more employable long-term. And Spark looks like it is here for the long run, too. So learning how Spark can speed up big data requests with in-memory capabilities is also a good career bet.

It would also be a good idea for DBAs to read up on analytics and data science. Although most DBAs will not become data scientists, some of their significant users will be. And learning what your users do – and want to do with the data – will make for a better DBA.

And, of course, a DBA should be able to reasonably discuss what is meant by the term “Big Data.” Big Data is undoubtedly here to stay. Of course, the industry analyst firms have come up with their definitions of what it means to be processing “Big Data”, the most famous of which talks about “V”s. As interesting as these definitions may be, and as much discussion as they create, the definitions don’t really help to define what benefit organizations can glean from Big Data.

So, with that in mind, and if we wanted to be more precise, it would probably make sense to talk about advanced analytics instead of Big Data. Really, the analytics is the motivating factor for Big Data. We don’t just store or access a bunch of data because we can… we do it to learn something that will give us a business advantage… that is what analytics is: discovering nuggets of reality in mounds and mounds of data. But I am not in favor of abandoning the term Big Data. Why?

Well, more than half the battle is getting the attention of decision makers, and the term Big Data has that attention in most organizations. As a data proponent, I think that the data-focused professionals within companies today should be trying to tie all of the data management and exploitation technologies to the Big Data meme in order to get the attention of management and be able to procure funding. C’mon, as a DBA doesn’t it make sense to take advantage of an industry meme with the word “data” right in it? By doing so we can better manage the data (small, medium and big) that we are called upon to manage!

Finally, I would urge DBAs to automate as many data management tasks as possible. The more automated existing management tasks become, the more available DBAs become to learn about, and work on the newer, more sexy projects.

Posted in DBA | 1 Comment

Automating the DBA Out of Existence?


Automation and autonomics are at the forefront of database and system administration and management these days. And that is a good thing because automation and autonomics can minimize the amount of time, error and human effort involved in assuring and maintaining efficient database systems and applications.

And, yes, there is also a lot of vendor hype about self-managing database systems, from the DBMS vendors themselves, and from ISVs selling performance and maintenance solutions and tools. So I suppose it stands to reason that folks will start to ask questions like this: If the DBMS and databases are going to manage themselves, will anyone need a DBA?

But don’t get too excited about the extinction of the DBA! There are many reasons why DBAs are not going anywhere anytime soon. Self-managing database systems are indeed a laudable goal, but we are very far away from a “lights-out” DBMS environment. Many of the self-managing features require using the built-in tools from the DBMS vendor, such as Oracle Enterprise Manager or IBM Data Studio. But many organizations prefer to use heterogeneous solutions that can administer databases from multiple vendors all from a single console. And many of these tools have had self-managing features for years. And we still need DBAs…

Most performance management solutions allow you to set performance thresholds. But these thresholds are only as good as the variables you set and the actions you define to be taken when the threshold is tripped. Some software is intelligent; that is, it “knows” what to do and when to do it. Furthermore, it may be able to learn from past actions and results. The more intelligence that can be built into a self-managing system, the better the results typically will be. This brings us to autonomics… but what is autonomic computing?

Autonomics is more than mere automation… Autonomic computing refers to the self-managing characteristics of distributed computing resources, adapting to unpredictable changes while hiding intrinsic complexity from operators and users. And yes, the more our systems can manage themselves, the better things should become.

But who among us currently trusts software to work like a grizzled veteran DBA? The management software should be configurable so that it alerts the DBA as to what action it wants to take. The DBA can review the action and give a “thumbs up” or “thumbs down” before the corrective measure is applied. In this way, the software can earn the DBA’s respect and trust. When DBAs trust the software, they can turn it on so that it self-manages “on the fly” without DBA intervention. But today, in most cases, a DBA is required to set up the thresholds, as well as to ensure their on-going viability.

Furthermore, database backup and recovery will need to be guided by the trained eye of a DBA. Perhaps the DBMS can become savvy enough to schedule a backup when a system process occurs that requires it. Maybe the DBMS of the future will automatically schedule a backup when enough data changes. But sometimes backups are made for other reasons: to propagate changes from one system to another, to build test beds, as part of program testing, and so on. A skilled professional is needed to build the proper backup scripts, run them appropriately, and test the backup files for accuracy.
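To give a flavor of what such a script encapsulates, here is a minimal SQL Server-style sketch; the database name and file path are hypothetical, and the syntax differs for other DBMSs:

-- Take a full backup and then verify that the backup file is readable
BACKUP DATABASE SalesDB
   TO DISK = 'E:\backups\SalesDB_full.bak'
   WITH CHECKSUM, INIT;

RESTORE VERIFYONLY
   FROM DISK = 'E:\backups\SalesDB_full.bak'
   WITH CHECKSUM;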

And what about recovery? How can a damaged database know it needs to be recovered? Because the database is damaged any self-managed recovery it might attempt is automatically rendered suspect. I mean, if the database was all that smart to begin with why is it damaged and in need of recovery, right? Here again, we need the wisdom and knowledge of the DBA.

And there are many other DBA duties that cannot be completely automated. Of course, the pure, heads-down systems DBA may eventually become a thing of the past. Instead, the modern DBA will need to understand multiple DBMS products, not just one. And that includes non-relational big data solutions like Hadoop and NoSQL database systems.

DBAs furthermore must have knowledge of the business impact of each database under their care (for more details see Business Eye for the DBA Guy). And DBAs will need better knowledge of logical database design and data modeling — because it will advance their understanding of the meaning of the data in their databases.

So, no, we won’t automate the DBA out of existence. It can’t be done… but we can make the job of DBA more interesting and useful, as we remove much of the mundane and repetitive components through automation and autonomics.

Posted in DBA | 1 Comment