Today’s blog post is a reprint from a previous blog of mine that has since been deactivated. I am republishing the material because it was evidently useful enough to be cited in Wikipedia (on the page defining synthetic data).
At any rate, today we will tackle the subject of production data. I received an e-mail on that topic that made me stop and think a bit… so I thought I’d blog about it. Basically, the e-mail posed the question in the title of this blog entry – “What is production data?”
The e-mail read as follows:
I'm looking for a one paragraph definition of "production data". What do you think of this: "Production data is data recorded for the purpose of controlling/managing/reporting/researching events, processes or states."
I'm trying to get around the belief that data recorded by a development team to manage its projects and resources is somehow less than production data. To me it should be regarded as the development team's "production data" and so I'm looking for a definition that satisfactorily encompasses that belief, as well as encompassing regular business production data.
You know, I do not recall ever seeing an actual definition of the term “production data.” The above definition is a good starting point, but I do not think it is sufficient.
The author of the e-mail makes a good point about different types of production data. The data used by an application development team to conduct their business (writing computer programs to support business processes) is definitely production data… to the application development team, not to a business user. So it all depends on the perspective of the viewer, I suppose.
Here is my definition of production data:
- Production data is information that is persistently stored and used by professionals to conduct business processes. It must be accurate, documented, and managed on an ongoing basis to ensure its value to the organization.
I say information instead of data because any data must be defined and in context in order to be useful for production work. And I say persistent because even though there may be many forms of transitory data used by production processes, it is the data that is stored over periods of time that needs to be managed. The definition does not include the requirement for policies regarding retention, audit, and other governance-related issues, but perhaps that should be included, too. Even if there are no government- or industry-mandated regulations regarding production data, there will be local business policies (even if they are as simple as “the data must be available during customary business hours”).
The definition should serve the needs of the e-mailer, though.
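For readers who think in code, the definition can be read as a rough checklist. The Python sketch below is only an illustration: the class, field, and function names are hypothetical, and treating the first sentence of the definition as the test for production data and the second sentence as a list of obligations is one interpretation, not part of the definition itself.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """Hypothetical description of a body of data (names are assumptions)."""
    name: str
    persistent: bool         # stored over time rather than transitory
    supports_business: bool  # used to conduct a business process
    accurate: bool
    documented: bool
    managed: bool            # managed on an ongoing basis


def is_production_data(asset: DataAsset) -> bool:
    # First sentence of the definition: persistently stored and
    # used to conduct business processes.
    return asset.persistent and asset.supports_business


def management_gaps(asset: DataAsset) -> list[str]:
    # Second sentence of the definition: obligations the data must meet.
    # Returns the names of the obligations that are not yet satisfied.
    obligations = {
        "accurate": asset.accurate,
        "documented": asset.documented,
        "managed": asset.managed,
    }
    return [name for name, met in obligations.items() if not met]


# The development team's project data from the e-mail qualifies,
# because it is persistent and supports that team's business.
tracker = DataAsset("dev team project tracker", True, True, True, False, True)
print(is_production_data(tracker), management_gaps(tracker))
```

Under this reading, the development team's project data counts as production data for that team, and the checklist simply shows where it still needs attention.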
What do you think? Did I miss anything?
How about something like
“Production data is any data whose loss or unavailability would impair the ability of part of the organisation to perform its duties.” That ties ‘production’ to some sense of ownership or responsibility. It may even be useful to distinguish ‘local production data’, which is used solely within one part of the organisation, from ‘shared/global production data’, where some co-ordination is necessary.