Today’s blog post is a reprint from a previous blog of mine that has since been deactivated. I am republishing the material because it was evidently useful enough to be cited in Wikipedia (on the page defining synthetic data).
At any rate, today we will tackle the subject of production data. I received an e-mail on that topic that made me stop and think a bit… so I thought I’d blog about it. Basically, the e-mail posed the question in the title of this blog entry – “What is production data?”
The e-mail read as follows:
I'm looking for a one paragraph definition of "production data". What do you think of this: "Production data is data recorded for the purpose of controlling/managing/reporting/researching events, processes or states."
I'm trying to get around the belief that data recorded by a development team to manage its projects and resources is somehow less than production data. To me it should be regarded as the development team's "production data" and so I'm looking for a definition that satisfactorily encompasses that belief, as well as encompassing regular business production data.
You know, I do not recall ever seeing an actual definition of the term “production data.” The above definition is a good starting point, but I do not think it is sufficient.
The author of the e-mail makes a good point about different types of production data. The data used by an application development team to conduct its business (writing computer programs to support business processes) is definitely production data… to the application development team, not to a business user. So it all depends on the perspective of the viewer, I suppose.
Here is my definition of production data:
- Production data is information that is persistently stored and used by professionals to conduct business processes. It must be accurate, documented, and managed on an ongoing basis to ensure its value to the organization.
I say information instead of data because any data must be defined and placed in context in order to be useful for production work. And I say persistent because, even though many forms of transitory data may be used by production processes, it is the data stored over periods of time that needs to be managed. The definition does not include a requirement for policies regarding retention, audit, and other governance-related issues, but perhaps it should. Even if there are no government- or industry-mandated regulations regarding production data, there will be local business policies (even if they are as simple as “the data must be available during customary business hours”).
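For readers who think better in code, here is a rough sketch of the definition as a checklist. It is only an illustration of the criteria above, not a formal test. The names `DataAsset` and `is_production_data` are hypothetical, and treating each criterion as a simple yes/no flag is my own simplification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DataAsset:
    """Hypothetical description of a data set, one flag per criterion."""
    name: str
    persistent: bool         # stored over time, not transitory
    used_for_business: bool  # used by professionals to conduct business processes
    accurate: bool
    documented: bool         # defined and in context, i.e. information
    managed: bool            # looked after on an ongoing basis
    policies: Tuple[str, ...] = ()  # optional governance policies (retention, audit, ...)


def is_production_data(asset: DataAsset) -> bool:
    """Return True only if every criterion in the definition is met.

    Policies are deliberately not required, because the definition
    as written leaves governance out.
    """
    return all((
        asset.persistent,
        asset.used_for_business,
        asset.accurate,
        asset.documented,
        asset.managed,
    ))


# The development team's project-tracking data qualifies...
tracker = DataAsset(
    name="dev team project tracker",
    persistent=True, used_for_business=True,
    accurate=True, documented=True, managed=True,
    policies=("the data must be available during customary business hours",),
)

# ...while a transitory work file used by a production job does not.
scratch = DataAsset(
    name="sort work file",
    persistent=False, used_for_business=True,
    accurate=True, documented=False, managed=False,
)

print(is_production_data(tracker))
print(is_production_data(scratch))
```

Note that the check says nothing about who the "business" is, so the development team's data passes on the same terms as any other department's data.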
The definition should serve the needs of the e-mailer, though.
What do you think? Did I miss anything?