I started writing and speaking about the significance of the EU General Data Protection Regulation (GDPR) for testing about five years ago. My alarm at the implications of the tightening legislation was frequently met with one of two responses:
1. The skeptic: “Big organizations will simply group together and resist this in the courts. Nothing will change in practice and there’s no way that national data protection agencies will be able to demand so much change so quickly, let alone levy fines this big.”
2. The gambler: “Fines will still only be levied following high-profile data breaches. There’s no way agencies are going to start performing regular audits, let alone audit my company. Besides, the chances of us suffering a data breach are slim to none – it’s never happened before!”
2019: An Issue That Can’t Be Ignored
Fast forward five years, and last week’s announcement of two eye-watering fines casts doubt on both responses. The penalties are a reminder of the real threat of data breaches, but they are also a serious statement of intent regarding enforcement of the GDPR.
First, the UK’s Information Commissioner’s Office (ICO) announced its intention to fine British Airways a record £183 million, relating to the harvesting of around 500,000 customers’ details by attackers. That figure represents roughly 1.5 percent of BA’s annual worldwide turnover for the previous year and dwarfs the ICO’s previous record fine of £500,000. National enforcement agencies appear willing to use the full force of the GDPR’s deterrents.
A day later came the announcement of an intended £99.2 million fine for Marriott International, relating to the exposure of 339 million guest records. Around 30 million of those records belong to residents of the European Economic Area, yet Marriott is a US company. This further undermines skepticism about national agencies’ ability to enforce the GDPR’s global scope.
In both cases, the authorities pointed to insufficient security measures, and to the responsibility that organizations of every size have for the data they process. So, how does this relate to testing practices?
We Need to Talk About TDM Practices…
From a QA perspective, one practice stands out as a glaring security risk: the use of production data in test and development environments. It has long been discouraged on data privacy grounds, yet 65 percent of organizations still use potentially sensitive production data in testing.[i]
Production data might seem the obvious source of production-like test data. The problem is that test and development environments are typically less secure than production, so any sensitive data stored in them increases the risk of a data breach.
Then there are the rights of EU data subjects, which the GDPR has strengthened. These rights apply whether or not a data breach has occurred, and they present further challenges for current QA practices.
The rights to erasure and to data portability are good examples. An EU data subject can request that their personal data be erased “without undue delay,” and can also ask to receive the personal data they have provided to an organization in a structured, machine-readable format.
This presents a logistical nightmare for current Test Data Management (TDM) practices. Many organizations copy data across multiple test environments, often in unmanaged formats such as spreadsheets on testers’ local machines. These organizations struggle to know where a given individual’s data is kept, and will therefore struggle to identify, copy and delete it on demand.
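Much of the difficulty is simply knowing where the data lives. As a minimal sketch, assuming every test data location has been registered in an inventory and that a data subject can be matched on an email field (both assumptions; `DataStore`, `find_subject_records`, `export_subject_data` and `erase_subject_data` are hypothetical names, not part of any TDM tool), an erasure or portability request reduces to a search across that inventory:

```python
import json
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """One registered location holding test data (hypothetical: a test DB, an extract, a spreadsheet)."""
    name: str
    rows: list = field(default_factory=list)


def find_subject_records(stores, email):
    """Return {store name: matching rows} for every store that holds the subject's data."""
    hits = {}
    for store in stores:
        matches = [row for row in store.rows if row.get("email") == email]
        if matches:
            hits[store.name] = matches
    return hits


def export_subject_data(stores, email):
    """Portability-style export: everything found, as machine-readable JSON."""
    return json.dumps(find_subject_records(stores, email), indent=2, sort_keys=True)


def erase_subject_data(stores, email):
    """Erasure: drop matching rows from every store and report rows removed per store."""
    removed = {}
    for store in stores:
        before = len(store.rows)
        store.rows = [row for row in store.rows if row.get("email") != email]
        if len(store.rows) < before:
            removed[store.name] = before - len(store.rows)
    return removed


# Usage: two registered locations, one of them a tester's spreadsheet.
stores = [
    DataStore("uat_db", [{"email": "ana@example.com", "name": "Ana"},
                         {"email": "bo@example.com", "name": "Bo"}]),
    DataStore("tester_laptop.xlsx", [{"email": "ana@example.com", "order": 17}]),
]
print(export_subject_data(stores, "ana@example.com"))
print(erase_subject_data(stores, "ana@example.com"))  # {'uat_db': 1, 'tester_laptop.xlsx': 1}
```

The sketch only succeeds because the spreadsheet was registered as a store; any copy missing from the inventory is silently skipped, which is precisely the gap unmanaged test data creates.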
Improving Data Security and Test Data Quality
The good news is that using production data in test environments is frequently avoidable. Synthetic test data generation can today produce realistic test data for even complex systems, rapidly mirroring the data dependencies found in production.
High-quality synthetic test data is built from a model of the metadata found in production. It can reflect complex patterns such as temporal trends while remaining wholly fictitious, so it supports accurate and stable test execution without the risk of exposing sensitive information.
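To make “a model of the metadata” more concrete, here is a minimal sketch. It assumes a hypothetical `MODEL` dictionary holding profiled metadata only (category frequencies, a value range, a date window and a simple growth trend) and uses Python’s standard `random` module rather than any particular synthetic data product. Every generated value is fictitious, yet the output keeps the distributions, a temporal trend, and one cross-field dependency (a first order never precedes sign-up).

```python
import random
from datetime import date, timedelta

# Hypothetical metadata model: profiled distributions only, no production values.
MODEL = {
    "country_freq": {"UK": 0.5, "DE": 0.3, "FR": 0.2},  # category frequencies
    "basket_range": (5.0, 500.0),                      # min/max basket value
    "window_start": date(2018, 1, 1),                  # sign-up date window
    "window_days": 365,
    "growth_per_30_days": 0.05,                        # temporal trend in sign-ups
}
FIRST_NAMES = ["Alex", "Sam", "Jo", "Kim", "Lee"]      # fictitious values


def generate_customers(n, model=MODEL, seed=42):
    """Generate n fictitious customer rows that follow the metadata model."""
    rng = random.Random(seed)  # seeded, so the same data is produced on every run
    countries = list(model["country_freq"])
    freqs = list(model["country_freq"].values())
    # Temporal trend: each day's weight grows by ~5% per 30 days across the window.
    days = range(model["window_days"])
    day_weights = [(1 + model["growth_per_30_days"]) ** (d / 30) for d in days]

    rows = []
    for i in range(1, n + 1):
        signup = model["window_start"] + timedelta(
            days=rng.choices(days, weights=day_weights)[0])
        rows.append({
            "customer_id": i,
            "name": rng.choice(FIRST_NAMES),
            "email": f"user{i}@example.test",
            "country": rng.choices(countries, weights=freqs)[0],
            "signup_date": signup,
            # Cross-field dependency: first order is never before sign-up.
            "first_order_date": signup + timedelta(days=rng.randint(0, 30)),
            "basket_total": round(rng.uniform(*model["basket_range"]), 2),
        })
    return rows


for row in generate_customers(3):
    print(row)
```

Seeding the generator makes each run reproducible, which is one simple way to keep test execution stable when the data itself is generated.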
Improved security also comes with a significant quality gain for QA. Synthetic data can be generated for the many data combinations not found in existing production data, including the negative scenarios and outliers needed for thorough test coverage.
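One simple way to cover combinations that production rarely contains is to list equivalence classes for each field and take their Cartesian product. The sketch below uses hypothetical fields and classes for an order form; it is exhaustive rather than pairwise, so it suits small models, and `VALID_CLASSES` encodes an assumed rule that a row is valid only if every field falls in a valid class.

```python
from itertools import product

# Hypothetical equivalence classes for three fields of an order form.
CLASSES = {
    "quantity": {"typical": 3, "boundary_max": 999, "zero": 0, "negative": -1},
    "country": {"supported": "UK", "unsupported": "ZZ"},
    "email": {"valid": "user1@example.test", "malformed": "user1-at-example"},
}
# Assumed rule: a row should be accepted only if every field is in one of these classes.
VALID_CLASSES = {"typical", "boundary_max", "supported", "valid"}


def all_combinations(classes):
    """Yield (class labels, concrete row, expected-valid flag) for every combination."""
    fields = list(classes)
    for labels in product(*(classes[f] for f in fields)):
        row = {f: classes[f][label] for f, label in zip(fields, labels)}
        yield labels, row, all(label in VALID_CLASSES for label in labels)


combos = list(all_combinations(CLASSES))
negatives = [row for _, row, ok in combos if not ok]
print(len(combos), len(negatives))  # 16 14
```

Of the 16 rows, only 2 are expected to pass validation; the other 14 are negative scenarios, such as a zero quantity combined with a malformed email, that production data seldom supplies.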
Testers can also exercise new and unreleased functionality for which no user-generated data yet exists, identifying defects before release. Improving data security in testing is therefore not just a logistical issue: it can increase test coverage, improving software quality and reducing defect remediation effort.
Organizations will not be able to switch to wholly synthetic test data overnight. Nonetheless, an effective TDM strategy should aim to replace production data gradually with fictitious test data. This “hybrid approach” continues to use production data where needed, in time replacing all test data sources with fictitious, coverage-enhanced equivalents. Testers and data protection officers (DPOs) can then enjoy peace of mind while improving application quality.
Thanks for reading!
[Image: KRiemer, Pixabay]