Data Integrity Testing

  • Pradyumn Sharma
  • July 18, 2017


Periodically, examine the stored data in your system to find out and report if any errors have crept in; and fix them automatically where possible.

The Story of My First Financial Accounting System

Back in the mid-1980s, I implemented my first Financial Accounting System for a client. It was not a very large application; I built it single-handedly, in about three months, using COBOL as the programming language and ISAM (Indexed Sequential Access Method) files for storing data.

A few weeks after going live, the client complained that the Trial Balance was not tallying.

In Financial Accounting, the sum of debits and the sum of credits have to be the same, for every transaction. Logically, therefore, the Trial Balance, which prints the net debit and credit balances for each account head on a given date, should always have the same total for the debit and credit columns.

IDEs, debuggers, and data browsing tools were almost non-existent in those days. It took me a few hours of work to discover that for some transactions, data was not stored correctly in the files; the debits and credits did not match for those transactions. It took more time to find and fix the cause of the bug, and still more work to correct the errors in the data in the files.

Unfortunately, the problem surfaced again a couple of months later. And again. And again. Sometimes it did not even seem like a programming error on my part. For some inexplicable reason, occasionally, a record would just not get written to a file. Maybe I did not catch an error in the code properly. Or maybe an insert just “fell off” without an error being reported by the ISAM file system. I don’t remember clearly now. But I remember the frustration of seeing the beast raise its ugly head time and again.

Periodically Checking for Data Integrity

Finally, I decided to write a batch program that would read all the transaction records from the ISAM files and examine their contents for data integrity. More specifically, if for any transaction, the sum of debits and the sum of credits did not match, it would print an error report. I requested the client to run that batch program once every day and call me up if it reported any errors.
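In modern terms, that batch program boils down to grouping transaction lines by transaction and comparing the debit and credit sums. Here is a minimal Python sketch of the idea; the `(txn_id, debit, credit)` record shape is illustrative, not the original COBOL record layout:

```python
from collections import defaultdict

def find_unbalanced_transactions(lines):
    """Sum debits and credits per transaction and return the IDs of
    transactions where the two sums do not match."""
    totals = defaultdict(lambda: [0, 0])  # txn_id -> [debit_sum, credit_sum]
    for txn_id, debit, credit in lines:
        totals[txn_id][0] += debit
        totals[txn_id][1] += credit
    return [txn for txn, (d, c) in totals.items() if d != c]

# Example: transaction T2 credits 100 but debits only 90.
lines = [("T1", 50, 0), ("T1", 0, 50), ("T2", 90, 0), ("T2", 0, 100)]
print(find_unbalanced_transactions(lines))  # ['T2']
```

A real report would also print the amounts involved, so that the mismatch can be traced back to the offending entry.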

Based on these reports, I gradually found and fixed the bugs in my code. As the reports stopped showing even occasional mismatches, and the Trial Balance (and other reports) continued to produce correct output, the client ran the batch program less and less frequently.

Making a Habit of Data Integrity Testing

When I worked on my next project (an Inventory Management System), I used dBase III Plus. It was a popular, entry-level relational database system back in the day. Programming was much easier with dBase III Plus than with COBOL, and database operations were much easier to handle than ISAM files. However, dBase III Plus had no referential integrity validation, no other constraints, no triggers, no stored procedures, no atomic transactions, and no error handling.

In the application that I was building, there were derived values in tables. A POHeader table would, for example, store the sum of values from PODetails table. And things would go wrong sometimes, due to programming errors. A record would be added or modified in the PODetails table, but the sum would not be correctly updated in the POHeader table.

I remembered the lessons from the previous project and decided to implement a Data Integrity Testing program in this system too. This time, my program not only checked for and printed a list of such integrity problems in the data, but also sought the user's permission to correct them automatically wherever possible.

I saw immense benefits from this approach, and it soon became my habit: in almost every application I built, I implemented a Data Integrity Testing program as early as possible. Looking back, I now realize this was test automation of a very rudimentary kind.

Much later, when I built my first application using a robust relational database like Oracle, I kept up with this habit. After all, not all validations and calculations of derived values can be (or should be) handled by table constraints or triggers.

Lessons Learned and Recommendations

Data integrity checks remind me of similar routines in everyday life: regular housekeeping, a periodic vehicle service or check-up, or, in IT, scheduled virus scans. Even when the house is in order, the vehicle runs fine, and incoming files and connections are being scanned, we still perform these maintenance tasks to keep things running smoothly. Similarly, a discrepancy in data, if not caught in time, can cause numerous problems down the line, so it is best to examine the data regularly and address any inconsistencies that may have crept in.

Errors do creep into stored data: mostly due to programming errors, and sometimes (though rarely) due to bugs in the database platform itself. Many of these errors go unnoticed for a long time, and then they hit hard.

Guard yourself against such risks by implementing a Data Integrity Testing story as early as possible. Consider the following possibilities:

  • Data within a record that may be internally inconsistent. For example, value = quantity * rate. If you store the value field in a record as well, is it really equal to quantity * rate?
  • Inconsistencies between parent and child (or header and details) tables. Say you have PurchaseOrders and PurchaseOrderLineItems tables in your application, with a 1-to-many relationship between them, and you store the sum of values of all PurchaseOrderLineItems records for a PurchaseOrder in the corresponding PurchaseOrders record. Check that the derived value in the PurchaseOrders record equals the sum of values from the PurchaseOrderLineItems records.
  • Inconsistencies across peer records in a table, or across tables. For example, if you store the fact that “A paid 5 bitcoins to B” in one record, and that “B received 5 bitcoins from A” in another record, are these two records mutually consistent?
  • Inconsistencies in any other derived values. For example, if a financial accounting system has multiple tables for storing different types of transactions, and a table that maintains the current balance for each account head, is the current balance stored correctly?
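The header-versus-details check above can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical `po_id`, `total_value`, and `value` field names rather than any particular schema:

```python
def check_po_totals(headers, line_items):
    """Compare each PurchaseOrders stored total against the sum of its
    PurchaseOrderLineItems values; return (po_id, stored, actual) mismatches."""
    sums = {}
    for item in line_items:
        sums[item["po_id"]] = sums.get(item["po_id"], 0) + item["value"]
    errors = []
    for h in headers:
        actual = sums.get(h["po_id"], 0)
        if h["total_value"] != actual:
            errors.append((h["po_id"], h["total_value"], actual))
    return errors

# PO 2 stores a total of 80, but its only line item is worth 50.
headers = [{"po_id": 1, "total_value": 300}, {"po_id": 2, "total_value": 80}]
line_items = [{"po_id": 1, "value": 100}, {"po_id": 1, "value": 200},
              {"po_id": 2, "value": 50}]
print(check_po_totals(headers, line_items))  # [(2, 80, 50)]
```

In a relational database, the same check can usually be expressed as a single join-and-group-by query, which is preferable for large tables.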

Also consider:

  • An appropriate frequency for running the data integrity testing batch program. Run it too frequently and system performance may be adversely affected; run it too infrequently and errors remain hidden for a long time. One approach is to begin with a reasonably high frequency, such as once a day. If this has no noticeable performance impact, continue on a daily basis; otherwise, gradually reduce the frequency as confidence increases (because no errors have been found for a long period of time).
  • User experience concerns, or search and reporting needs, sometimes make it necessary to duplicate the same data across two databases or information stores. In such cases too, it is worthwhile to maintain a batch (or cron) job that validates data integrity. This is particularly important in microservices scenarios, and where the “API is the product”.
  • If it is possible to fix the errors automatically, ask the user whether such errors should be fixed automatically or only after manual confirmation. Often you will find it valuable to inform the user what the fix will be and to provide a way to “reverse” it.
  • Generating an automatic alert (email / dashboard) whenever such errors are found. Of course, data security and privacy concerns must be addressed appropriately: the alert may describe the integrity issue, but you may still want to exclude sensitive information from it.
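One way to make an automatic fix reversible, as suggested above, is to record the old value in an audit log before applying the correction. The sketch below illustrates this for the purchase-order example; the field names and log format are hypothetical:

```python
import datetime

def fix_header_total(header, correct_total, audit_log):
    """Correct a derived value, logging the old value first so the
    fix can be reversed later."""
    entry = {
        "when": datetime.datetime.now().isoformat(),
        "po_id": header["po_id"],
        "field": "total_value",
        "old": header["total_value"],
        "new": correct_total,
    }
    audit_log.append(entry)
    header["total_value"] = correct_total
    return entry

def reverse_fix(header, entry):
    """Undo a previously applied fix using its audit entry."""
    header["total_value"] = entry["old"]

log = []
h = {"po_id": 2, "total_value": 80}
fix_header_total(h, 50, log)
print(h["total_value"])  # 50
reverse_fix(h, log[-1])
print(h["total_value"])  # 80
```

In a real system the audit log would live in its own table (or an append-only file), so that fixes remain traceable and reversible long after the batch run.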

Closing Thoughts

As products and applications mature, and as business needs, technology, and the project itself evolve over time, bad data can become the Achilles’ heel of the operation, hampering scale, reporting, or even day-to-day work. So, in the spirit of test-driven development (test early, test often), it is really valuable to test data integrity against known business rules on a regular basis.



© 2017 Pragati Software Pvt. Ltd. All Rights Reserved.