Improving Test Data Generation with Symbolic Execution and Mutation Testing
- Author: Steven Jung
Generating effective test cases is crucial for thorough software testing, but creating test data that covers all potential scenarios is a complex task. In this blog, I explore the use of Klee's symbolic execution engine to generate test data and examine how mutation testing can improve the quality of that data.
How Klee's Symbolic Execution Works
Symbolic execution is a method for systematically exploring the possible execution paths of a program. Klee, a symbolic execution engine built on LLVM, uses this technique to generate test data, meaning the concrete inputs for a specific test case. Because Klee operates on LLVM bitcode, it can analyze any language that compiles to LLVM, which makes it a versatile tool.
Yet a question remains: how do we determine whether this test data is effective? This is where mutation testing comes in. It checks the code's resilience against minor modifications and, in doing so, verifies that the test data is truly effective.
#include <stdio.h>
#include <klee/klee.h>

int multiplyByTen(int b)
{
    int a;
    a = b * 10;
    if (a == 20)
    {
        printf("a is 20");
        return a;
    }
    else
    {
        return a;
    }
}

int main()
{
    int b;
    // Mark b as symbolic so Klee explores every path that depends on it.
    klee_make_symbolic(&b, sizeof(b), "b");
    multiplyByTen(b);
    return 0;
}
When Klee's symbolic execution engine runs on `multiplyByTen` with the symbolic variable `b`, it interprets the function symbolically. At the `if` statement it forks execution and collects a path constraint for each branch: `b * 10 == 20` for the `if` branch and `b * 10 != 20` for the `else` branch. Klee then passes these constraints to a constraint solver such as the Z3 theorem prover, a tool known for its efficiency and robustness, which produces concrete values of `b` satisfying each path. Those values are the actual test data.
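To make this step concrete, here is a toy sketch of constraint solving. It simply brute-forces a small input range against each path constraint, whereas Klee delegates this work to an SMT solver such as Z3:

#include <stdbool.h>
#include <stdio.h>

// Path constraints collected from multiplyByTen's two branches.
static bool ifPath(int b)   { return b * 10 == 20; }
static bool elsePath(int b) { return b * 10 != 20; }

int main(void)
{
    // Toy "solver": search a small range for a witness of each path.
    for (int b = 0; b < 100; b++)
        if (ifPath(b)) { printf("if path:   b = %d\n", b); break; }
    for (int b = 0; b < 100; b++)
        if (elsePath(b)) { printf("else path: b = %d\n", b); break; }
    return 0;
}

Searching upward from zero yields `b = 2` for the `if` path and `b = 0` for the `else` path, matching the values Klee reports below.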
Klee also outputs the derived constraints as a query file in its KQuery format. This file, understandable by constraint solvers, represents the logic to be tested:
array b[4] : w32 -> w8 = symbolic
(query [(Eq 20
(Mul w32 10
(ReadLSB w32 0 b)))]
false)
The query encodes the constraint `b * 10 == 20`. The solver is asked for a counterexample, an assignment that satisfies the constraint, and in this case it returns `b = 2` as test data for the `if` path. Klee repeats this process for every path, generating test data such as `b = 2` and `b = 0` for the two branches of the function.
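A natural way to use this output is to replay the generated inputs as ordinary unit tests. Here is a minimal sketch, assuming the expected results follow from `a = b * 10`:

#include <assert.h>
#include <stdio.h>

// multiplyByTen as defined earlier.
int multiplyByTen(int b)
{
    int a = b * 10;
    if (a == 20)
    {
        printf("a is 20");
        return a;
    }
    return a;
}

int main(void)
{
    // Replay Klee's generated inputs as concrete unit tests.
    assert(multiplyByTen(2) == 20); // if branch:   b * 10 == 20
    assert(multiplyByTen(0) == 0);  // else branch: b * 10 != 20
    return 0;
}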
When examining boundary values, as in Boundary Value Analysis, `b = 2` emerges as a critical point for the `multiplyByTen` function, revealing the significance of testing values around this boundary (for example, `b = 1` and `b = 3`) to ensure comprehensive coverage.
The Role of Mutation Testing
Applying mutation testing to the test data generated by Klee's engine adds a further layer of scrutiny: it creates small variations of the original code, known as mutants, and evaluates whether the test data can detect these variations.
// mutant 1: change == to >=
int multiplyByTenMutant1(int b)
{
    int a;
    a = b * 10;
    if (a >= 20)
    {
        printf("a >= 20");
        return a;
    }
    else
    {
        return a;
    }
}
Interestingly, if you run the mutant 1 code through Klee's symbolic execution engine, it generates the same test data as the original code: `b = 2` satisfies `b * 10 >= 20` and `b = 0` satisfies `b * 10 < 20`, so the solver can return the same pair of values.
// mutant 2: change == to <=
int multiplyByTenMutant2(int b)
{
    int a;
    a = b * 10;
    if (a <= 20)
    {
        printf("a <= 20");
        return a;
    }
    else
    {
        return a;
    }
}
For the mutant 2 code, however, Klee generates a new set of test data: `b = 3` and `b = 0`. These mutants simulate common programming errors in conditionals, providing a meaningful test scenario.
Evaluating the Mutation Score
The mutation score is the ratio of killed mutants to the total number of mutants (killed / total), offering a quantitative assessment of the test data's effectiveness in uncovering faults.
Consider comparing the mutation score before and after integrating new test data such as `b = 3`. A matrix helps visualize the relationship between test data and mutant detection:
| Test Data | Mutant 1 (`>=`) | Mutant 2 (`<=`) |
| --- | --- | --- |
| b = 0 | Survived | Killed |
| b = 2 | Survived | Survived |
| b = 3 | Killed | Survived |
Without `b = 3`, the mutation score is 0.5 (1 out of 2 mutants killed). With `b = 3`, it rises to 1.0 (2 out of 2 mutants killed). Hence the newly generated test data, `b = 3`, is indeed good test data.
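As a sanity check, the sketch below recomputes the kill matrix and both scores. It models each variant by its branch condition and, as a simplification, counts a mutant as killed when its branch outcome differs from the original's on some input:

#include <stdbool.h>
#include <stdio.h>

// Branch conditions of the original function and the two mutants.
static bool branchOriginal(int b) { return b * 10 == 20; }
static bool branchMutant1(int b)  { return b * 10 >= 20; }
static bool branchMutant2(int b)  { return b * 10 <= 20; }

// A mutant is killed by input b when its branch outcome
// differs from the original's on that input.
static bool kills(bool (*mutant)(int), int b)
{
    return branchOriginal(b) != mutant(b);
}

// Mutation score for a set of inputs: killed mutants / total mutants.
static double score(const int *inputs, int n)
{
    bool (*mutants[])(int) = { branchMutant1, branchMutant2 };
    int killed = 0;
    for (int m = 0; m < 2; m++) {
        for (int i = 0; i < n; i++) {
            if (kills(mutants[m], inputs[i])) { killed++; break; }
        }
    }
    return (double)killed / 2.0;
}

int main(void)
{
    int before[] = { 0, 2 };    // Klee's data from the original code
    int after[]  = { 0, 2, 3 }; // plus b = 3 from mutant 2
    printf("score without b=3: %.1f\n", score(before, 2)); // prints 0.5
    printf("score with    b=3: %.1f\n", score(after, 3));  // prints 1.0
    return 0;
}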
Conclusion
- Mutation testing and symbolic execution are complementary: symbolic execution generates the initial test cases, while mutation testing fine-tunes their ability to detect code variations, so together they contribute to thorough test suites.
- Mutation testing uncovers new test scenarios: it identifies scenarios not covered by the initial strategy, such as the `b = 3` case, highlighting its utility in revealing crucial boundary conditions.
Can mutation testing refine the quality of test data generated by symbolic execution? Yes. Is the method scalable to real-world applications? The computational intensity of both techniques makes widespread application challenging, yet the potential of the combined approach deserves further exploration.