Craig Mullins Presenting at Data Summit 2019

The world of data management and database systems is very active right now. Data is at the center of everything that modern organizations do and the technology to manage and analyze it is changing rapidly. It can be difficult to keep up with it all.

If you need to get up to speed on everything going on in the world of data, you should plan on attending Data Summit 2019, May 21-22 in Boston, MA.


Craig Mullins will be talking about the prevailing database trends during his session, The New World of Database Technologies (Tuesday at noon). Attend to keep abreast of the latest trends and issues in the world of database systems, including how the role of the DBA is evolving along with them.

This presentation offers an overview of the rapidly changing world of data management and administration as organizations digitally transform. It examines how database management systems are changing and adapting to modern IT needs.

Issues covered during this presentation include cloud/DBaaS, analytics, NoSQL, IoT, DevOps and the database, and more. We’ll also examine what is happening with DBAs and their role within modern organizations. And all of the trends are backed up with references and links for your further learning and review.

I hope to see you there!

Posted in DBA | Leave a comment

Craig Mullins to Deliver Database Auditing Webinar – May 15, 2019

Increasing governmental and industry regulation coupled with the need for improving the security of sensitive corporate data has driven up the need to track who is accessing data in corporate databases. Organizations must be ever-vigilant to monitor data usage and protect it from unauthorized access.

Each regulation places different demands on what types of data access must be monitored and audited. Ensuring compliance can be difficult, especially when you need to comply with multiple regulations. And you need to be able to capture all relevant data access attempts while still maintaining the service levels for the performance and availability of your applications.

Register for the next Idera Geek Sync webinar, Database Auditing Essentials: Tracking Who Did What to Which Data When, on Wednesday, May 15 at 11 am CT. It will be delivered by yours truly.

As my regular readers know, database access auditing is a topic I have written and spoken about extensively over the years, so be sure to tune in to hear my latest thoughts on the topic.

You can learn more about the issues and requirements for auditing data access in relational databases. The goal of this presentation is to review, at a high level, the regulations driving the need to audit, and then to discuss in detail the things that need to be audited, along with the pros and cons of the various ways of accomplishing this.

Register here →

Posted in auditing, compliance, Database security, DBA, speaking engagements | Leave a comment

Inside the Data Reading Room – 1Q2019

It has been a while since I have published a blog post in the Inside the Data Reading Room series, but that isn't because I am not reading anymore! It is just that I have not been as active reviewing as I'd like to be. So here we go with some short reviews of data and analytics books I've been reading.

Let's start with Paul Armstrong's Disruptive Technologies: Understand, Evaluate, Respond. Armstrong is a technology strategist who has worked for and with many global companies and brands (including Coca-Cola, Experian, and Sony, among others). In this book he discusses strategies for businesses to work with new and emerging technologies.

Perhaps the strongest praise that I can give the book is that after reading it, you will feel that its title has been done justice. Armstrong defines what a disruptive technology is and how to embrace the change required when something is "disruptive."

The book offers up a roadmap that can be used to assess, handle, and resolve issues as you identify upcoming technology changes and respond to them appropriately. It identifies a decision-making framework based on the dimensions of Technology, Behaviour and Data (TBD).

The book is clear and concise, as well as being easy to read. It is not encumbered with a lot of difficult jargon. Since technology is a major aspect of all businesses today (digital transformation) I think both technical and non-technical folks can benefit from the sound approach as outlined in this book.

Another interesting book you should take a look at if you are working with analytics and AI is Machine Learning: A Constraint-Based Approach by Marco Gori. This is a much weightier tome that requires attention and diligence to digest. But if you are working with analytics, AI, and/or machine learning in any way, the book is worth reading.

The book offers an introductory approach for all readers with an in-depth explanation of the fundamental concepts of machine learning. Concepts such as neural networks and kernel machines are explained in a unified manner.

The unified presentation is based on regarding symbolic knowledge bases as collections of constraints. Special attention is reserved for deep learning, which fits nicely with the constraint-based approach followed in the book.

The book is not for non-mathematicians or those only peripherally interested in the subject; over more than 500 pages, the author treats the material with mathematical rigor.

There is also a companion web site that provides additional material and assistance.

The last book I want to discuss today is Prashanth H. Southekal's Data for Business Performance. There is more data at our disposal than ever before, and we continue to increase the rate at which we manufacture and gather it. Shouldn't we be using this data to improve our businesses? Well, this book provides guidance and techniques to derive value from data in today's business environment.

Southekal looks at deriving value for three key purposes of data: decision making, compliance, and customer service. The book is structured into three main sections:

  • Part 1 (Define) builds fundamental concepts by defining the key aspects of data as they pertain to digital transformation. This section delves into the different processes that transform data into a useful asset.
  • Part 2 (Analyze) covers the challenges that can cause organizations to fail as they attempt to deliver value from their data… and it offers practical solutions to those challenges that can be implemented.
  • Part 3 (Realize) provides practical strategies for transforming data into a corporate asset. This section also discusses frameworks, procedures, and guidelines that you can implement to achieve results.

The book is well-organized and suitable for any student, business person, or techie looking to make sense of how to use data to optimize their business.

If you’ve read any of these books, let me know what you think… and if you have other books that you’d like to see me review here, let me know. I’m always looking for more good books!

Posted in AI, book review, books, business planning, data, data governance, Machine Learning | Leave a comment

Navicat Enables DBAs to Adopt Modern Platforms and Practices

Database administration is a tried and true IT discipline with well-defined best practices and procedures for ensuring effective, efficient database systems and applications. Of course, as with every discipline, best practices must constantly be honed and improved. This can take on many forms. Sometimes it means automating a heretofore manual process. Sometimes it means adapting to new and changing database system capabilities. And it can also mean changing to support new platforms and methods of implementation.

To be efficient, effective, and up-to-date on industry best practices, your DBA team should be incorporating all of these types of changes. Fortunately, there are tools that can help, such as Navicat Premium, which can be used to integrate all of these forms of change into your database environment.

What is Navicat Premium? Well, it is a data management tool that supports and automates a myriad of DBA tasks from database design through development and implementation. Additionally, it supports a wide range of different database management systems, including MariaDB, Microsoft SQL Server, MongoDB, MySQL, Oracle Database, PostgreSQL, SQLite and multiple cloud offerings (including Amazon, Oracle, Microsoft, Google, Alibaba, Tencent, MongoDB Atlas and Huawei).

The automation of DBA tasks using Navicat reduces the amount of time, effort, and human error involved in implementing and maintaining efficient database systems. And for organizations that rely on multiple database platforms – which is most of them these days – Navicat helps not only with automation, but also with a consistent interface and methodology across the different database technologies you use.

Navicat can also assist DBAs as their organizations adapt to new capabilities and new platforms, such as cloud computing.

Although Navicat Premium is typically installed on your desktop, it connects not only to on-premises databases, but also to cloud databases such as Amazon RDS, Amazon Aurora, and Amazon Redshift. Amazon handles the work of setting up, operating, and scaling the underlying relational database, allowing you to focus on database design and management. Together with an Amazon instance, Navicat Premium can help your DBAs deliver a high-quality, end-to-end database environment for your business applications.

Let's face it, you probably have a complex data architecture with multiple databases on premises, as well as multiple different databases in the cloud. And almost certainly you are using more than one flavor of DBMS. Without a means to simplify your administrative tasks, things are going to fall through the cracks or, even worse, be performed improperly. Using Navicat Premium, your DBA team will have an intuitive GUI to manipulate and manage all of your database instances – on premises and in the cloud – with a comprehensive set of features for database development and maintenance.

You can navigate the tree of database structures just as you do for on-premises data, and then connect to the database in the cloud to access and manage it, as we see here for an "Amazon Aurora for MySQL connection":

[Screenshot: Amazon Aurora for MySQL connection in Navicat Premium]

Perhaps one of the more vexing issues with cloud database administration is data movement. Navicat Premium provides a Data Transfer feature that automates the movement of data across database platforms – local to local, local to cloud, or to an SQL file.

Another important consideration is the ability to collaborate with other team members, especially for organizations with remote work teams. The Navicat Cloud option provides a central space for your team to collaborate on connection settings, queries, and models. Multiple co-workers can contribute to any project, creating and modifying work as needed. All changes are synced automatically, giving all team members the latest information.

For example, here we see the Navicat Cloud Navigation pane:

[Screenshot: the Navicat Cloud Navigation pane]

Another reality of modern computing is that a lot of work is done on mobile devices, such as phones and tablets. DBA work is no longer always conducted on a laptop or directly on the database server. Being able to perform database administration tasks from mobile devices enables DBAs to react quickly, wherever they are, whenever their help is needed. You can run Navicat on iOS to enable your mobile workforce to use the devices they always have with them.

When moving from the large screen common on PCs and laptops to the smaller screens common on mobile phones and tablets, you do not want the same layout, because it can be difficult to navigate on the smaller devices. Users want the interface to conform to the device, and that is what you get with Navicat iOS.

Let’s look at some examples. Here we see a data grid view for a MySQL table as it would look on an iPhone and an iPad:

[Screenshot: data grid view for a MySQL table in Navicat iOS, on iPhone and iPad]

But you may want to design databases from your mobile device. That is possible with Navicat iOS… here we see the Object Designer interface on the iPhone and iPad:

[Screenshot: the Object Designer interface in Navicat iOS, on iPhone and iPad]

Another common task is building SQL queries, which is also configured appropriately for the mobile experience, as shown here:

[Screenshot: the SQL Builder interface in Navicat iOS]

Adapting to mobile technologies is important because mobile workers are here to stay. And we need to be ready to support them with robust software designed to operate properly for a mobile, modern workforce.

The Bottom Line

We must always be adapting to new and changing requirements by adopting tools and methodologies that not only automate tasks, but also incorporate new and modern capabilities. Take a look at what Navicat can do to help you accomplish these goals.

Posted in cloud, database design, DBA, mobile, SQL | Leave a comment

Common Database Design Errors

Before we begin today's blog post, wherein I explain some of the more common mistakes that rookies and non-database folks make (heck, even some database folks make mistakes), I first want to unequivocally state that your organization should have a data architecture team that is responsible for logical and conceptual modeling… and your DBA team should work in tandem with the data architects to ensure well-designed databases.

OK, so what if that isn’t your experience? Frankly, it is common for novices to be designing databases these days, so you aren’t alone. But that doesn’t really make things all that much better, does it?

The best advice I can give you is to be aware of design failures that can result in a hostile database. A hostile database is difficult to understand, hard to query, and takes an enormous amount of effort to change.

So with all of that in mind, let’s just dig in and look at some advice on things not to do when you are designing your databases.

Assigning inappropriate table and column names is a common design error made by novices. The names of database objects used to store data should be as descriptive as possible, allowing tables and columns to be self-documenting, at least to some extent. Application programmers are notorious for creating database naming problems, such as using screen variable names for columns or coded jumbles of letters and numbers for table names. Use descriptive names!
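
To illustrate, here is a minimal sketch (the table, column, and data type choices are hypothetical, not taken from any real application) contrasting a cryptic design with a self-documenting one:

    -- Hard to decipher: coded table name and screen-variable column names
    CREATE TABLE T417A
      (FLD1   CHAR(9)        NOT NULL,
       FLD2   DATE           NOT NULL,
       SCRN3  DECIMAL(9,2));

    -- Self-documenting: descriptive table and column names
    CREATE TABLE employee
      (empno      CHAR(9)       NOT NULL,
       hire_date  DATE          NOT NULL,
       salary     DECIMAL(9,2));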

When pressed for time, some DBAs resort to designing the database with output in mind. This can lead to flaws such as storing numbers in character columns because leading zeroes need to be displayed on reports. This is usually a bad idea with a relational database. It is better to let the database system perform the edit-checking to ensure that only numbers are stored in the column.

If the column is created as a character column, then the developer will need to program edit-checks to validate that only numeric data is stored in the column. It is better in terms of integrity and efficiency to store the data based on its domain. Users and programmers can format the data for display instead of forcing the data into display mode for storage in the database.
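
As a simple, hypothetical example, consider an invoice number that must print with leading zeroes. Store it based on its numeric domain and format it only at display time (the padding function varies by DBMS; LPAD is one common option):

    CREATE TABLE invoice
      (invoice_no   INTEGER        NOT NULL PRIMARY KEY,
       invoice_amt  DECIMAL(11,2)  NOT NULL);

    -- Format the leading zeroes in the query (or in the application),
    -- rather than storing the number as character data
    SELECT LPAD(CAST(invoice_no AS VARCHAR(10)), 10, '0') AS invoice_no_display,
           invoice_amt
    FROM   invoice;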

Another common database design problem is overstuffing columns. This actually is a normalization issue. Sometimes a single column is used for convenience to store what should be two or three columns. Such design flaws are introduced when the DBA does not analyze the data for patterns and relationships. An example of overstuffing would be storing a person’s name in a single column instead of capturing first name, middle initial, and last name as individual columns.
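
Here is a quick sketch of the difference, again with hypothetical names:

    -- Overstuffed: several facts crammed into a single column
    CREATE TABLE customer_overstuffed
      (cust_id    INTEGER       NOT NULL PRIMARY KEY,
       cust_name  VARCHAR(100)  NOT NULL);

    -- Better: one fact per column
    CREATE TABLE customer
      (cust_id         INTEGER      NOT NULL PRIMARY KEY,
       last_name       VARCHAR(40)  NOT NULL,
       first_name      VARCHAR(40)  NOT NULL,
       middle_initial  CHAR(1));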

Poorly designed keys can wreck the usability of a database. A primary key should be nonvolatile because changing the value of the primary key can be very expensive. When you change a primary key value, you have to ripple through the foreign keys to cascade the changes into the child tables.

A common design flaw is using the Social Security number as the primary key of a personnel or customer table. This is a flaw for several reasons, two of which are: 1) a Social Security number is not necessarily unique, and 2) if your business expands outside the USA, no one will have a Social Security number to use, so then what do you store as the primary key?
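
One common alternative, sketched here with hypothetical names, is to use a nonvolatile surrogate key and keep the Social Security number (if you must store it at all) as an ordinary, optional attribute:

    CREATE TABLE customer
      (cust_id     INTEGER      NOT NULL PRIMARY KEY,
       ssn         CHAR(9),
       last_name   VARCHAR(40)  NOT NULL,
       first_name  VARCHAR(40)  NOT NULL);

    -- Foreign keys now reference a value that never needs to change
    CREATE TABLE customer_order
      (order_no    INTEGER  NOT NULL PRIMARY KEY,
       cust_id     INTEGER  NOT NULL REFERENCES customer (cust_id),
       order_date  DATE     NOT NULL);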

Actually, failing to account for international issues can have even greater repercussions. For example, when storing addresses, how do you define zip code? The ZIP code is a USA concept, but many countries have similar postal codes, though they are not necessarily numeric. And state is a USA concept, too.

Of course, some other countries have states or similar concepts (Canadian provinces). So just how do you create all of the address columns to assure that you capture all of the information for every person to be stored in the table regardless of country? The answer, of course, is to conduct proper data modeling and database design.
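
One reasonable approach, shown here only as a hypothetical sketch that continues the customer table from the previous example, is to generalize the USA-specific columns: state becomes a free-form region, zip code becomes a postal code stored as character data, and a country code is always captured:

    CREATE TABLE customer_address
      (cust_id        INTEGER       NOT NULL REFERENCES customer (cust_id),
       address_line1  VARCHAR(100)  NOT NULL,
       address_line2  VARCHAR(100),
       city           VARCHAR(60)   NOT NULL,
       state_region   VARCHAR(60),
       postal_code    VARCHAR(20),
       country_code   CHAR(2)       NOT NULL);  -- e.g., an ISO 3166-1 country code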

Denormalization of the physical database is a design option but it can only be done if the design was first normalized. How do you denormalize something that was not first normalized? Actually, a more fundamental problem with database design is improper normalization. By focusing on normalization, data modeling and database design, you can avoid creating a hostile database.

Without proper upfront analysis and design, the database is unlikely to be flexible enough to easily support the changing requirements of the user. With sufficient preparation, flexibility can be designed into the database to support the user’s anticipated changes. Of course, if you don’t take the time during the design phase to ask the users about their anticipated future needs, you cannot create the database with those needs in mind.

Summary

Of course, these are just a few of the more common database design mistakes. Can you name more? If so, please discuss your thoughts and experiences in the comments section.

Posted in data, data modeling, database design, DBA | Tagged | Leave a comment

Happy New Year 2019

Just a quick post today to wish everybody out there a very Happy New Year!


I hope you have started 2019 off with a bang and that the year is successful and enjoyable for one and all!

Posted in Happy New Year | Leave a comment

FaunaDB: A multi-model, distributed database system with ACID consistency

Although relational, SQL database systems continue to dominate the DBMS market, modern database management has shifted to encompass additional types of database systems. This is exemplified in the rise of the NoSQL database system to serve the needs of modern applications that are not as well-suited for existing relational, SQL database systems.

What used to be rather simple – choosing from three or four market-leading SQL DBMS products – has become confusing and difficult, as organizations try to make sense of the morass of different DBMS types and offerings on the market.

A Multi-Model Approach

Well, one solution to avoid the confusion is to select a multi-model DBMS offering. A multi-model database system supports multiple types of database models, such as relational, document, graph, wide column, and key/value. FaunaDB is an example of a multi-model DBMS capable of managing both relational and NoSQL data, and designed to support modern, scalable, real-time applications.

FaunaDB combines the scale and flexibility of NoSQL with the safety and data integrity of relational systems. The company refers to this as Relational NoSQL. Unlike many NoSQL database systems, FaunaDB delivers ACID compliance. You can scale transactions across multiple shards and regions while FaunaDB guarantees the accuracy and integrity of your data and transactions.

FaunaDB enables your developers to write sophisticated transactions using languages they already know. And you can pull data from document, relational, graph, and temporal data sets all from within a single query.

Since FaunaDB is NoSQL, you won't be using SQL to access databases. The Fauna Query Language (FQL) is the primary interface for interacting with a FaunaDB cluster. FQL is not a general-purpose programming language, but it provides for complex manipulation and retrieval of data stored within FaunaDB. The language is expression-oriented: all functions, control structures, and literals return values. This makes it easy to group multiple results together by combining them into an Array or Object, or to map over a collection and compute a result – possibly fetching more data – for each member.

A query is executed by submitting it to a FaunaDB cluster, which computes and returns the result. Query execution is transactional, meaning that no changes are committed when something goes wrong. If a query fails, an error is returned instead of a result.

FQL supports a comprehensive set of data types in four categories: simple types, special types, collection types, and complex types. A simple data type is one that is native to FaunaDB and also native to JSON, such as Boolean, Null, Number, and String. Special data types in FaunaDB extend the limited number of native JSON data types: Bytes, Date, Query, Ref, Set, and Timestamp. A complex data type is a composite of other existing data types, such as an Object or Instance. And the collection data types are able to handle multiple items while maintaining order, such as Array and Page.

Consistency

Perhaps the most impressive aspect of FaunaDB is how it enables strict serializability for external transactions. By supporting serializable isolation, FaunaDB can process many transactions in parallel, but the final result is the same as processing them one after another. The FaunaDB distributed transaction protocol processes transactions in three phases:

  • In the first, speculative phase, reads are performed as of a recent snapshot, and writes are buffered.
  • The second phase uses a consensus protocol to insert the transaction into a distributed log. At this point, the transaction gets a global transaction identifier that indicates its equivalent serial order relative to all other concurrent transactions. This is the only point at which global consensus is required.
  • Finally, the third phase checks each replica verifying the speculative work. If there are no potential serializability violations, the work is made permanent and buffered writes are written to the database. Otherwise, the transaction is aborted and restarted.

This software approach is novel and allows for the scaling of transactions across multiple shards and regions while guaranteeing transactional correctness and data accuracy. Contrast this with other database systems, such as Google Spanner, that rely on distributed clock synchronization to ensure data consistency.

The FaunaDB approach is based on a 2012 Yale University paper titled “Calvin: Fast Distributed Transactions for Partitioned Database Systems.” You can download that paper here. And if you are interested in additional details, consult this blog post: Consistency without Clocks: The FaunaDB Distributed Transaction Protocol.

Multi-Tenancy

Many database systems provide multi-tenant capabilities. They can contain multiple databases, each with their own access controls. FaunaDB takes this further by allowing any database to have multiple child databases. This enables an operator to manage a single large FaunaDB cluster, create a few top-level databases, and give full administrative access of those databases to associated teams. Each team is free to create as many databases as they need without requiring operator intervention. As far as the team is concerned, they have their own full FaunaDB cluster.

Temporality

Strong temporal support is an additional capability of FaunaDB. Traditionally, a database system stores only data that is valid at the current point-in-time; it does not track the past state of the data. Most data changes over time, and different users and applications can have requirements to access that data at different points in time. Temporal support makes it possible to query data “as of” different past states.

All records in FaunaDB are temporal. When instances are changed, instead of overwriting the prior contents, a new instance version at the current transaction timestamp is inserted into the instance history, and marked as a create, update, or delete event. This means that with FaunaDB, all reads can be executed consistently at any point in the past or transformed into a change feed of events between any two points in time. This is useful for many different types of applications, such as auditing, rollback, cache coherency, and others.
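
FaunaDB expresses its temporal queries in FQL, but if you are coming from the relational world, the SQL:2011 system-versioned (temporal) table syntax, supported by several relational DBMS products, illustrates the same "as of" idea. This is only an analogy, not FaunaDB syntax:

    -- Read the data as it existed at a prior point in time
    SELECT cust_no, credit_limit
    FROM   customer FOR SYSTEM_TIME AS OF TIMESTAMP '2019-01-01 00:00:00'
    WHERE  cust_no = 100;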

Strong Security

Data protection and security have become more important as data breaches continue to dominate the news. Regulations and data governance practices dictate that organizations implement strong protective measures on sensitive data.

FaunaDB implements security at the API level. Access to the FaunaDB API uses access keys, which authenticate connections as having particular permissions. This access key system applies to administrator- and server-level connections, as well as to object- and user-level connections.

In other words, reading or writing instances of user-defined classes in FaunaDB requires a server key, or an instance token with appropriate permissions.

Delivery Models

FaunaDB can run anywhere you need it to run: on-premises, in your cloud, the public cloud, even multiple clouds. Basically, FaunaDB can run anywhere you can run a JVM.

The FaunaDB Serverless Cloud enables developers to implement and elastically scale cloud applications with no capacity planning or provisioning. FaunaDB Cloud provides essential features that enable developers to safely build and run serverless applications without configuring or operating infrastructure.

The serverless approach uses an event-driven architecture where developers code functions and deploy them to the infrastructure. The functions only consume resources when they are invoked, at which point they run within the architecture. A serverless architecture is conducive to modern development practices because it can eliminate many of the difficulties developers face reconciling their database infrastructure with today’s development methods.

Summing Things Up

Prior to founding Fauna in 2012, the FaunaDB team helped develop the infrastructure at Twitter. And FaunaDB is already being used at many leading enterprises. Check out these write-ups about FaunaDB usage at NVIDIA, ShiftX, and VoiceConnect. Others are available at Fauna's web site.

So, if you are looking for a multi-model, secure NoSQL database platform with strong consistency, horizontal scalability, multi-tenancy, and temporal capabilities that can run on-premises and in the cloud, consider taking a look at FaunaDB.

Posted in cloud, data availability, DBMS, Isolation Level, NoSQL, relational, temporal | Leave a comment

SQL Performance and Optimization

Just a quick post today to refer my readers to a series of blog posts that I recently made to the IDERA database community blog.

This four-part blog series took a look into SQL performance and optimization from a generic perspective.  By that I mean that I did not focus on any particular DBMS, but on the general things that are required of and performed during the optimization of SQL.

Part one – Relational Optimization – introduces and explains the general concept of relational optimization and what it entails.

Part two – Query Analysis and Access Path Formulation – examines the process of analyzing SQL queries and introduces the types of access that can be performed on a single table.

Part three – Multiple Table Access Methods – takes a look at optimization methods for combining data from more than one table.

And finally, part four – Additional Considerations – concludes the series with an overview of several additional aspects of SQL optimization that we have yet to discuss.

If you are looking for a nice overview of SQL and relational optimization without DBMS-specific details, give these posts a read!

 

Posted in optimization, performance, SQL | 4 Comments

My Data Quotes – 2018

I am frequently approached by journalists and bloggers for my thoughts on the data-related news of the day… and I am usually happy to discuss data with anybody! Some of these discussions wind up getting quoted in news articles and posts. I like to try to keep track of these quotes.

With that in mind, I thought I’d share the articles where I have been quoted (so far) in 2018:

I may be missing some, so if you remember chatting with me last year and you don’t see your piece listed above please ping me to let me know…

And if you are interested in some of the older pieces where I’ve been quoted I keep a log of them on my web site at mullinsconsulting.com/quoted.html.  (Note, some of the older articles/posts are no longer available, so some of the links are inoperable.)

Posted in DBA | Leave a comment

Teradata Analytics Universe 2018 and Pervasive Data Intelligence

I spent last week in Las Vegas at the Teradata Analytics Universe conference, Teradata’s annual user conference. And there was a lot to do and learn there.

 

[Photo: Attendees heading to the Expo Hall at the Teradata Analytics Universe conference in Las Vegas, NV — October 2018]

 

The major message from Teradata is that the company is a "new Teradata." And the tagline is "Stop buying analytics," which may sound strange at a conference with analytics in its name!

But it makes sense if you listen to the entire strategy. Teradata is responding to the reality of the analytics marketplace. And that reality centers around three findings from a survey the company conducted of senior leaders from around the world:

  1. Analytics technology is too complex. 74 percent of senior leaders said their organization’s analytics technology is complex; 42 percent said that analytics is not easy for their employees to use and understand.
  2. Users don't have access to all the data they need. 79 percent said they need access to more company data to do their job effectively.
  3. Data scientists are a bottleneck. Only 25 percent said that, within their enterprise, business decision makers have the skills to access and use intelligence from analytics without the need for data scientists.

 

[Chart: Where are the data scientists?]

 

To respond to these challenges, Teradata says you should buy “answers” not “analytics.” And they are correct. Organizations are not looking for more complex, time-consuming, difficult-to-use tools, but answers to their most pressing questions.

Teradata calls its new approach "pervasive data intelligence," which delivers access to all data, all the time, to find answers to the toughest challenges. This can be done on-premises, in the cloud, or anywhere in between.

A big part of this new approach is founded on Teradata Vantage, which provides businesses the speed, scale and flexibility they need to analyze anything, deploy anywhere and deliver analytics that matter. At the center of Vantage is Teradata’s respected analytics database management system, but it also brings together analytic functions and engines within a single environment. And it integrates with all the popular open source workbenches, platforms, and languages, including SQL, R, Python, Jupyter, RStudio, SAS, and more.

“Uncovering valuable intelligence at scale has always been what we do, but now we’re taking our unique offering to new heights, unifying our positioning while making our software and consulting expertise available as-a-service, in the cloud, or on-premises,” said Victor Lund, Teradata CEO.

Moving from analytical silos to an analytics platform that can deliver pervasive data intelligence sounds to me like a reasonable way to tackle the complexity, confusion, and bottlenecks common today.

Check out what Teradata has to offer at teradata.com.

Posted in analytics, data, Teradata, tools | Leave a comment