Synthetic test data. From development to testing.

March 04, 2024, 6 min read

In the IT industry, everyone is currently discussing synthetic test data, test data management, and related topics. One key concept gaining traction is the synthetic test data platform. This article aims to clarify these concepts and explore how generating synthetic test data can revolutionize the testing process.

Before we delve deeper into the topic of synthetic test data, let me provide a brief explanation of what test data represents in this specific context.

Feel free to skip this section if you are already familiar with the software development cycle and QA work.

Software development cycle

You might also come across the term CI/CD (a part of DevOps), which stands for Continuous Integration and Continuous Delivery. All of these fancy terms essentially refer to the widely adopted practice of delivering software to end customers.

For our article, we will simplify that process into three steps:

  • Coding: Where programmers put the software logic together by writing code. Easy (or sometimes not).
  • Testing: The code has to be tested somewhere so that it doesn't end up full of bugs. Effective testing often relies on a synthetic test data platform to provide accurate and relevant data.
  • Deployment: After successful testing, we can deploy the software, meaning we move it to where end users can reach it.

Then you repeat this cycle until the software is perfect (which is never).

Environments

Now, the last crucial point to understand is that Testing and Deployment are carried out in different environments.

You can think of an environment as a duplicate of the software, where one copy is exclusively for testing and the development team, while the second one is stable and intended for end users.

The two primary types of environments are known as test environments (test) and production environments (prod).

How do we get test data on the test environments?

Now that you have a grasp of how software development works and the concept of environments, the next logical question would be: How do we obtain data for the test environment?

That's an excellent question, and I'll present you with a few possible answers.

Data from using test environments

This type of data is created simply by using the software in the test environment. It is an example of non-synthetic data, as it is generated organically, typically by QA engineers. However, for more scalable and flexible solutions, a synthetic test data platform is often preferred.

Although there are usually no issues with the quality of test data in this case, this approach is not scalable or robust enough for larger, enterprise-ready solutions, and it can also be quite time-consuming.

Data from prod environments

One of the most common ways to obtain data for your test environment is by copying and anonymizing the production data. However, this approach may not always align with the requirements of modern software development.

Although this might appear to be a sensible approach, it comes with several drawbacks, which is why teams look for alternatives to test data anonymization. Let me highlight one of the most significant issues: this approach is not compatible with the software development cycle described earlier.

Why?

Because in order to properly test and deploy a new feature, you need a new data model structure with corresponding data in place. That is not possible when your test data comes from production, because production data still follows the old data model.
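To make the mismatch concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the rows, the field names, and the new `loyalty_tier` field are hypothetical, and the hash-based masking is only a stand-in for a real anonymization process (strictly speaking it is pseudonymization).

```python
import hashlib

# Hypothetical production rows; the field names are illustrative only.
PROD_ROWS = [
    {"id": 1, "email": "alice@example.com", "balance": 120},
    {"id": 2, "email": "bob@example.com", "balance": 75},
]

# Fields the new, not yet released feature expects.
# "loyalty_tier" is the hypothetical new field.
NEW_MODEL_FIELDS = {"id", "email", "balance", "loyalty_tier"}


def anonymize(row):
    """Return a copy of the row with the email replaced by a hash-based placeholder."""
    digest = hashlib.sha256(row["email"].encode("utf-8")).hexdigest()[:12]
    return {**row, "email": f"user_{digest}@test.invalid"}


def missing_fields(row, required):
    """List the required fields that the row does not provide, sorted by name."""
    return sorted(required - set(row))


test_rows = [anonymize(row) for row in PROD_ROWS]
gaps = missing_fields(test_rows[0], NEW_MODEL_FIELDS)
print(gaps)  # ['loyalty_tier']
```

No matter how carefully the copy is anonymized, the field that the new feature needs is simply not there, because production has never seen it.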

Insert synthetic test data

Finally, we get to truly 100% synthetic test data.

It is an artificial version of data that shares the same properties as your real data. This data can then be inserted into the systems under test. There are several ways to generate it. It can simply follow a desired pattern, or it can use AI/ML techniques to produce statistically accurate data sets for the target systems. The insertion happens automatically in the test environment before the actual testing, which avoids the pitfalls of the previously mentioned types of test data. A well-designed synthetic test data platform makes this process seamless and efficient.
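As a rough illustration of the pattern-based variant, the sketch below generates rows from simple rules and inserts them into a test database before any test runs. It is not how any particular platform works: the `customers` table, its columns (including `loyalty_tier`, which stands in for a new data model), and the helper names are all hypothetical, and an in-memory SQLite database plays the role of the test environment.

```python
import random
import sqlite3

# Hypothetical value pools that define the "desired pattern".
FIRST_NAMES = ["Ada", "Linus", "Grace", "Alan"]
TIERS = ["basic", "silver", "gold"]


def generate_customers(count, seed=42):
    """Generate rule-based synthetic rows; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    rows = []
    for customer_id in range(1, count + 1):
        name = rng.choice(FIRST_NAMES)
        rows.append((
            customer_id,
            f"{name.lower()}{customer_id}@test.invalid",  # unique per id
            rng.randint(0, 10_000),
            rng.choice(TIERS),
        ))
    return rows


def provision(connection, rows):
    """Create the table for the new data model and insert the synthetic rows."""
    connection.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, email TEXT UNIQUE, "
        "balance INTEGER, loyalty_tier TEXT)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)
    connection.commit()


# An in-memory database stands in for the test environment.
conn = sqlite3.connect(":memory:")
provision(conn, generate_customers(100))
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(row_count)  # 100
```

Because the generator is written against the new data model, the data exists as soon as the model does, and the fixed seed means every test run starts from the same rows.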

Conclusion

Thank you for sticking with me. Hopefully you now have a grasp of what is and what isn't synthetic test data. For a more advanced and automated approach, consider leveraging a synthetic test data platform that offers self-service test data solutions, ensuring you always have the right data at the right time.

Don't forget to also check out the Sixpack product, which provides an elegant solution for generating synthetic test data within your test systems and solves all of the problems mentioned above, and more.

Cheers.

Read more: How to choose the right test data for your project