The Debt of Dirty Data
--Hey, why can't our Insurance and Billing system just talk to our Admissions system? We just get too many errors in patient names and numbers, and fixing all of this is becoming a nightmare.
--Oh, alright let's integrate them then.
Duplicate data, multiple databases, and the need to communicate new records and updated records, has some organizations in a frenzy. The mantra "just integrate them" makes it seem like there's a technology that will make this happen like magic.
It's not just the technology, it's the data.
Even if you follow database normalization rules (arcane, mathematically-based rules for designing a database for the non-database types), that doesn't mean you have clean, tight data. Even still, in all of the databases I've seen, over the years, I've still yet to see a perfectly designed database anyway. Add to this mess, a customer record looks like different things to different people and systems, making two systems "talk" to each other requires yet a whole 'nother system.
Let's start with clean data. As a simple example walkthrough consider a customer address.
--Address is easy, the Post Office has been handling addresses for hundreds of years now.
Is it? What should be the maximum length of the first address line? If you say infinite, then how does that show up when you print an envelope? Do you want to keep track of physical versus box addresses? Why not have a field for PO_BOX_NO? This way we don't have to type P.O. Box every time in the first address line which reduces errors.
Through internet shopping exposure, people now have no trouble justifying tracking a Billing Address and a Mailing Address. But wouldn't it make sense to keep track of previous addresses? If your system doesn't keep track of history, it's worth considering tracking Previous Billing Address and Previous Mailing address. Customers move and some day you may have to prove to them or show when you updated their address.
--Okay, but this is trivial.
Right. What if someone gives us the wrong zip code, shouldn't we be able to verify address against zip code to make sure the data is clean?
--Yes, all systems do that.
No. Unfortunately, addresses change all of the time which requires an update from the Post Office MIPS database, so you need a subscription and a way to update your verification database.
--Let's skip that for now, we have a deadline.
And now, once you've skipped this validation, when you have a non-trivial customer database, you will have unclean data, eventually.
Lesson 1: All data should be validated to be clean.
Lesson 2: Validation reveals hidden costs in implementing a database.
Note: We are not even talking about a database management system (DBMS), but just a database.
And I will add the following.
Lesson 3: Not ensuring clean data now, pushes the costs into the future like using a credit card that will have to be paid back with interest.
So, system integration requires you to pay off the dirty-data debt with interest on top of all of the other visible costs--just one reason system integration is not just technology.
No comments:
Post a Comment