Techniques | TSR
Applicability |
Industry Motivation |
Experiment subject(s) | SIR and Defects4J (research dataset, medium to large scale); generated test suite of 500K+ tests (synthetic data, very large scale)
Industrial Partner |
Programming Language | C, Java
Effectiveness Metrics | Fault Detection Loss
Efficiency Metrics | Execution time, Scalability
Other Metrics |
Information Approach | Test code
Algorithm Approach | Similarity / distance-based
Open Challenges | Filter out test cases that do not impact modified files; more efficient heuristics.
Abstract
Test suite reduction approaches aim at decreasing software regression testing costs by selecting a representative subset of large test suites. Most existing techniques are too expensive for handling modern massive systems and, moreover, depend on artifacts, such as code coverage metrics or specification models, that are not commonly available at large scale. We present a family of novel, highly efficient approaches for similarity-based test suite reduction that apply algorithms borrowed from the big data domain together with smart heuristics for finding an evenly spread subset of test cases. The approaches are very general, since they only take as input the test cases themselves (test source code or command line input). We evaluate four approaches both in a version that selects a fixed budget B of test cases and in an adequate version that performs the reduction while guaranteeing some fixed coverage. The results show that the approaches yield a fault detection loss comparable to state-of-the-art techniques, while providing huge gains in terms of efficiency. When applied to a suite of more than 500K real-world test cases, the most efficient of the four approaches could select B test cases (for varying B values) in less than 10 seconds.
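As an illustration only, the sketch below shows one way a similarity-based, budget-constrained reduction over raw test source code could look: each test is turned into a set of character shingles, and a greedy farthest-point heuristic picks B tests that are maximally spread out under Jaccard distance. This is not the paper's actual algorithm (which relies on big-data techniques such as hashing-based similarity); all names and parameters here (shingle, jaccard_distance, reduce_suite, k=5) are hypothetical.

```python
# Hedged sketch: similarity-based, budget-constrained test suite reduction.
# A minimal illustration of picking an evenly spread subset of B test cases
# using only the test source code as input; not the paper's implementation.

def shingle(text, k=5):
    """Represent a test case's source code as a set of character k-grams."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def jaccard_distance(a, b):
    """Distance = 1 - Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def reduce_suite(test_sources, budget):
    """Greedily select `budget` mutually dissimilar tests (farthest-point heuristic).

    test_sources: list of strings, each the source code (or CLI input) of one test.
    Returns the indices of the selected tests.
    """
    shingles = [shingle(src) for src in test_sources]
    selected = [0]  # seed with an arbitrary first test
    remaining = set(range(1, len(shingles)))
    # Distance from every remaining test to its nearest already-selected test.
    dist_to_sel = [jaccard_distance(s, shingles[0]) for s in shingles]
    while remaining and len(selected) < budget:
        nxt = max(remaining, key=lambda i: dist_to_sel[i])  # farthest from the selection
        remaining.discard(nxt)
        selected.append(nxt)
        for i in remaining:  # update nearest-selected distances
            d = jaccard_distance(shingles[i], shingles[nxt])
            if d < dist_to_sel[i]:
                dist_to_sel[i] = d
    return selected

if __name__ == "__main__":
    suite = [
        "assert add(1, 2) == 3",
        "assert add(2, 1) == 3",
        "assert mul(3, 4) == 12",
        "assert div(8, 2) == 4",
    ]
    print(reduce_suite(suite, budget=2))  # indices of the 2 most spread-out tests
```

Each selection step only compares the newly picked test against the remaining ones, so the loop runs in roughly O(B * n) distance computations; scalable variants would replace the exact set comparison with hashed signatures to keep memory and time low on suites of hundreds of thousands of tests.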