Growing data volumes and the higher transaction velocities associated with modern applications are driving change in organizations across all industries. This is happening for a number of reasons. Customer and end-user expectations for interacting with computerized systems have changed, and technology is changing to accommodate these requirements. Furthermore, larger and larger amounts of data are being generated and made available, both internally and externally to our businesses. Therefore, the desire and capability to store large amounts of data continue to expand.
One clear goal of most organizations is to be able to harness all of this data – regardless of its source or size – and to glean actionable insight from it. This is known as analytics. Advanced analytical capabilities can be used to drive a wide range of applications, from operational applications such as fraud detection to strategic analysis such as predicting patient outcomes. Regardless of the applications, advanced analytics provides intelligence in the form of predictions, descriptions, scores, and profiles that help organizations better understand behaviors and trends.
Furthermore, the desire to shorten the time-to-value for analytics projects will result in a move to more real-time event processing. Many use cases can benefit from early detection and response, meaning that identification needs to be as close to real time as possible. By analyzing reams of data and uncovering patterns, intelligent algorithms can make reasonably solid predictions about what will occur in the future. This requires being adept enough to uncover the patterns before changes occur, although it does not always have to happen in real time.
Issues in Deploying Advanced Analytics
When implementing an analytics project, it is not uncommon to encounter problems along the way. One of the first issues that needs to be addressed when adopting analytics in the cognitive era is having organization leaders who will embrace the ability to make decisions based on data instead of gut feelings based on the illusion of having data. Things change so fast these days that it is impossible for humans to keep up with all of the changes. Cognitive computing applications that rely on analytics can ingest and understand vast amounts of data and keep up with the myriad changes occurring daily, if not hourly. Armed with advice that is based on a thorough analysis of up-to-date data, executives can make informed decisions instead of what amounts to the guesses they are making today.
However, most managers are used to making decisions based on their experience and intuition without necessarily having all of the facts. When analytics-based decision making is deployed, management can feel less involved and might balk. Without buy-in at the executive level, analytics projects can be very costly without delivering an ROI, because the output (which would deliver the ROI) is ignored.
Another potential difficulty involves managing and utilizing large volumes of data. Businesses today are gathering and storing more data than ever before. New data is created during customer transactions and to support product development, marketing, and inventory. And many times additional data is purchased to augment existing business data. This explosion in the amount of data being stored is one of the driving forces behind analytics. The more data that can be processed and analyzed, the better the advanced analysis can be at finding useful patterns and predicting future behavior.
However, as data complexity and volumes grow, so does the cost of building analytic models. Before real modeling can happen, organizations with large data volumes face the major challenge of getting their data into a form from which they can extract real business information. One of the most time-consuming steps of analytic development is preparing the data. In many cases, data is extracted and subsets of it are joined together, merged, aggregated, and transformed to create the analytic data set. In general, more data is better for advanced analytics.
There are two aspects to “more data”: (1) data can increase in depth (more customers, transactions, etc.), and (2) data can grow in width (where subject areas are added to enhance the analytic model). Either way, as the amount of data expands, the analytical modeling process can take longer. Clearly, performance can be an issue.
Real-time analytics is another interesting issue to consider. The adjective real-time refers to a level of responsiveness that is immediate or nearly immediate. Market forces, customer requirements, governmental regulations, and technology changes collectively conspire to ensure that data that is not up-to-date is not acceptable. As a result, today’s leading organizations are constantly working to improve operations through access to, and analysis of, real-time data.
For example, consider the challenge of detecting and preventing fraud. Each transaction must be analyzed to determine its validity, and the customer waits for approval while this is done in real-time. But if you err on the side of safety, valid transactions may be declined, which will cut into profit and, perhaps more importantly, upset your customer. The advanced analytics approach leverages predictive analysis to scrutinize current transactions along with historical data, so that a transaction that appears suspicious is not declined when it is actually the norm for this customer. The challenge is doing this in real-time.
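To make the idea of checking a transaction against a customer's own history concrete, here is a minimal sketch in Python. It is not how any particular fraud product works: the function name `is_suspicious`, the three-standard-deviation cutoff, and the sample amounts are all assumptions chosen for illustration, and a real system would weigh many more signals than the amount alone.

```python
from statistics import mean, stdev

def is_suspicious(amount, history, threshold=3.0):
    """Flag a transaction whose amount is far from this customer's norm.

    `history` is a list of the customer's past transaction amounts.
    `threshold` is a hypothetical cutoff, in standard deviations.
    """
    if len(history) < 2:
        # Too little history to estimate what is normal for this customer.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount was identical; anything different stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Made-up sample data: one customer routinely spends about 1,000,
# the other usually spends about 20.
big_spender = [900.0, 1100.0, 1000.0, 950.0, 1050.0]
small_spender = [20.0, 25.0, 18.0, 22.0, 30.0]

# The same 1,200 purchase is scored against each customer's own history.
print(is_suspicious(1200.0, big_spender))
print(is_suspicious(1200.0, small_spender))
```

The same purchase amount is ordinary for one customer and far outside the norm for the other, which is why the historical data matters as much as the current transaction.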
Nimble organizations need to assess and respond to events in real-time based on up-to-date and accurate information, rules, and analyses. Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. If information is sensed and acted upon by an analytical process at the moment it is created in operational systems (or soon thereafter), real-time analytics has taken place.
As good as real-time analytics sounds, it is not without its challenges to implement. One such challenge is reducing the latency between data creation and when it is recognized by analytics processes.
Time-to-market issues can be another potential pitfall of an advanced analytics project. A large part of any analytical process is the work involved with gathering, cleansing, and manipulating data required as input to the final model or analysis. As much as 60% to 80% of the effort during a project goes toward these steps. This up-front work is essential, though, to the overall success of any advanced analytics project.
From a technology perspective, managing the boatload of data and the performance of operations against that data can be an issue. Larger organizations typically rely on a mainframe computing environment to process their workload. But even in these cases the mainframe is not the only computing platform in use, and the desire to offload analytics to other platforms is often strong. However, for most mainframe users, most of the data resides on the mainframe. If analytics is performed on another platform, moving large amounts of data to and from the mainframe can become a bottleneck. Good practices and good software will be needed to ensure that efficient and effective data movement is in place.
But before investing in a lot of data movement off of the mainframe, consider evaluating the cost of keeping the data where it is and moving the processes to it (the analytics) versus the cost of moving the data to the process. Usually, the former will be more cost effective.
Taking advantage of more in-memory processes can also be an effective approach for managing analytical tasks. Technologies like Spark, which make greater use of memory to store and process data, are gaining in popularity. Of course, there are other in-memory technologies worth pursuing as well.
Another technology that is becoming more popular for analytics is streaming data software. Streaming involves the ingestion of data – structured or unstructured – from arbitrary sources and the processing of it without necessarily persisting it. This is contrary to our common methodology of storing all data on disk.
Although any digitized data is fair game for stream computing, it is most commonly used for analyzing measurements from devices. As the data streams in, it is analyzed and processed in a problem-specific manner. The “sweet spot” for streaming is situations in which devices produce large amounts of instrumentation data on a regular basis. The data is difficult for humans to interpret easily and is likely to be too voluminous to be stored in a database somewhere. Examples of data that is well-suited for stream computing include healthcare, weather, telephony, stock trades, and so on.
By analyzing large streams of data and looking for trends, patterns, and “interesting” data, stream computing can solve problems that were not practical to address using traditional computing methods. To put it in practical terms, think about your home fire detectors. These devices are constantly up and running, waiting for a condition. When fire or smoke is detected, an alarm is sounded. Now if this were to be monitored remotely, you wouldn’t want to store all of the moments in time when there was no fire… but you care a lot about that one piece of data when the fire is detected, right?
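The fire detector scenario can be sketched in a few lines of Python. This is only an illustration of processing data as it arrives without persisting it, not the API of any streaming product: the `alarms` and `detector_feed` names, the 0.5 smoke limit, and the sample readings are assumptions made up for the example.

```python
def alarms(readings, smoke_limit=0.5):
    """Yield only the readings that signal a fire; discard the rest.

    `readings` is any iterable of (timestamp, smoke_level) pairs.
    Each reading is examined once as it arrives and is never stored.
    `smoke_limit` is a hypothetical threshold, not a real device setting.
    """
    for timestamp, smoke_level in readings:
        if smoke_level > smoke_limit:
            yield timestamp, smoke_level

def detector_feed():
    """Stand-in for a live device feed, using made-up sample values."""
    yield (1, 0.01)
    yield (2, 0.02)
    yield (3, 0.91)  # the one moment we actually care about
    yield (4, 0.03)

for event in alarms(detector_feed()):
    print("ALARM:", event)
```

Because both functions are generators, only one reading is held in memory at a time. The uneventful readings simply pass through and disappear, which is the contrast with storing everything on disk first and querying it later.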
Consider a healthcare example. One healthcare organization is using an IBM stream computing product, InfoSphere Streams, to help doctors detect subtle changes in the condition of critically ill premature babies. The software ingests a constant stream of biomedical data, such as heart rate and respiration, along with clinical information about the babies. Monitoring premature babies as a patient group is especially important because certain life-threatening conditions, such as infection, may be detected up to 24 hours in advance by observing changes in physiological data streams. The biomedical data produced by numerous medical instruments cannot be monitored manually, nor can a never-ending stream of values for multiple patients be stored long term.
But the stream of healthcare data can be constantly monitored with a stream computing solution. As such, many types of early diagnoses can be made that would take medical professionals much longer to make. For example, a rhythmic heartbeat can indicate problems (like infections); a normal heartbeat is more variable. Analyzing an ECG stream can highlight this pattern and alert medical professionals to a problem that might otherwise go undetected for a long period. Detecting the problem early can allow doctors to treat an infection before it causes great harm.
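As a rough illustration of spotting an overly regular heartbeat in a stream, the sketch below keeps a small sliding window of beat-to-beat intervals and raises an alert when their variability drops. It is not the InfoSphere Streams API and not a clinical algorithm: the function name `low_variability_alerts`, the window size, the 3 ms threshold, and the sample intervals are all assumptions for the example.

```python
from collections import deque
from statistics import pstdev

def low_variability_alerts(intervals_ms, window=5, min_stdev_ms=3.0):
    """Yield the index of each beat at which recent variability is too low.

    `intervals_ms` is an iterable of beat-to-beat intervals in milliseconds.
    Only the last `window` intervals are kept; older values are dropped.
    `window` and `min_stdev_ms` are illustrative values, not clinical ones.
    """
    recent = deque(maxlen=window)
    for index, interval in enumerate(intervals_ms):
        recent.append(interval)
        if len(recent) == window and pstdev(recent) < min_stdev_ms:
            yield index

# Made-up data: a variable rhythm followed by a very regular one.
variable_beats = [420, 455, 410, 470, 430, 445]
regular_beats = [440, 441, 440, 439, 440]

for index in low_variability_alerts(variable_beats + regular_beats):
    print("Low variability detected at beat", index)
```

The bounded `deque` is the point of the design: the application never needs the full history, only enough recent values to judge the current pattern.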
A stream computing application can get quite complex. Continuous applications, composed of individual operators, can be interconnected and operate on multiple data streams. Again, think about the healthcare example. There can be multiple streams (blood pressure, heart, temperature, etc.), from multiple patients (because infections travel from patient to patient), having multiple diagnoses.
The Bottom Line
There are many new and intriguing possibilities for analytics that require an investment in learning and new technology. But the return on the investment is potentially quite large in terms of gaining heretofore unknown insight into your business, and also in better serving your customers. After all, that is the raison d’être for coming to work each day!