Automation Framework

Data Strategies for a Test Automation Framework

Dennis Whalen

Feb 11, 2020 • 5 min read

When building a QA test automation framework, it's critical to define the strategy for dealing with test data. For example, what’s wrong with this scenario?

Given the user successfully logs into registration system
When the user searches for course number “ALG-4316”
Then the course is displayed
And the course has title of “Intermediate Algebra”
And the course has credit hours of “4”

Clearly the scenario contains hard-coded test data: course number, course name, and credit hours.

When first building test automation it's easy to start like this. We're testing in the dev environment, we know this course exists in the database, so let's start building some automated tests! Although it's easy to get started like this, it can quickly become impossible to sustain.

Let's take a look at some alternatives for dealing with test data.

Option 1 - Hard-coded test data

Overview

The architecture behind the hard-coded test data option is not too difficult to understand. With this option, the test data is defined within the test process and stored in data files or hard-coded directly in the test.

Alt Text

Pros:

Start building and completing tests quickly

Cons:

Long term test maintainability and brittleness
Issues running tests in different environments/databases
Could require a database restore strategy prior to running the test

When appropriate?

Although all tests may contain some hard-coded test data, it’s really only appropriate for [1] static data that will not change across environments, for [2] data that is not relevant to the test, or for [3] test setups where you can always restore the database to a known state before testing.

OPTION 2 - Real time test data extract

Let’s try the gherkin again and remove the hard-coded test data:

Given the user successfully logs into registration system
When the user searches for an active course
Then the course is displayed
And the course title is correctly displayed
And the course credit hours are correctly displayed

With this option we are no longer defining the test data in the test. The automation code that’s executed behind this test will be responsible for identifying the necessary test data and expected results. But how?

Overview

With real time test data extract, the test process will extract the necessary test data from the application as we execute the test. So instead of hard-coding the course number, we can first extract a valid course number and associated course detail from the application, and then use that data in our test.

As shown in the diagram below, the data can be extracted via an existing application API or by reading it directly from the application database.

Real time test data extract

Pros:

Tests are not reliant on specific hard-coded test data
Tests will work across multiple test environments

Cons:

Need a reliable mechanism to read from the application (API or DB)
Assumes the database already contains data you need
Could have issues with concurrent tests grabbing same test data

When appropriate?

This option is appropriate when [1] the necessary test data exists in the database, [2] there are no concerns with concurrent tests potentially using the same data, and [3] we have read access to the database, either through API or direct database access.

OPTION 3 - Real time seeding

Overview

With real time test data seeding, the test process will inject the necessary test data into the application as we execute the test. So instead of testing with an existing course number, we can first inject a valid course number and associated detail into the application, and then use that data in our test.

As shown in the diagram below, the data can be injected via an existing application API or by inserting it directly from the application database.

Alt Text

Pros:

Tests are not reliant on specific hard-coded test data
Tests will work across multiple test environments
No data sharing issues with concurrent tests

Cons:

Need a mechanism to write to database (API or direct)
Seeding data may be more complex that just extracting existing data

When appropriate?

This option is appropriate when [1] the necessary test data may not already exist in the database, [2] tests will run concurrently and can’t share the same data, and [3] we have write access to the database, either through API or direct database access.

OPTION 4 - Decoupled extract

The options describe above may work well for functional testing, but what about performance testing? With performance testing we’ll simulate multiple concurrent users to verify the application can handle the required workload and respond within the expected response times.

Interacting with the application to acquire test data during a performance test will just apply more load to the application. This load is not representative of a real-world scenario, and would likely skew the test results and server performance statistics.

To get around this issue we can decouple the test data creation process from the testing process.

Alt Text

As represented above, the necessary test data is injected into the application prior to the start of the performance testing. In addition to injecting the test data, we’ll also write that test data to a data repository on a server that is separate from the application.

When the performance test starts, it will read test data from that test data repository. This strategy will allow the performance tests to be driven by test data without seeding or extracting it during the test.

Although this process is probably the most complex method for dealing with test data, it does provide an additional bonus; the test data generated for performance testing can also be consumed by the functional UI and API tests, as depicted above.

Conclusion

As you define the test data strategy for your project, there is clearly not one correct way to do it. In fact, you will likely use various options on the same project, depending on the needs of specific tests.

Regardless of your strategy, be sure to think about it up front, and refine it as you learn more about what works and what doesn’t.

Option 1 - Hard-coded test data

Overview

When appropriate?

OPTION 2 - Real time test data extract

Overview

When appropriate?

OPTION 3 - Real time seeding

Overview

When appropriate?

OPTION 4 - Decoupled extract

Conclusion

Sign up for more like this.