Techniques | TSR
Applicability |
Industry Motivation |
Experiment subject(s) | SIR and Defects4J (research dataset, medium to large scale); generated test suite of 500K+ tests (synthetic data, very large scale)
Industrial Partner |
Programming Language | C, Java
Effectiveness Metrics | Fault Detection Loss
Efficiency Metrics | Execution time, Scalability
Other Metrics |
Information Approach | Test code
Algorithm Approach | Similarity / distance-based
Open Challenges | Filter out test cases that do not impact modified files; more efficient heuristics.
Abstract
Test suite reduction approaches aim at decreasing software regression testing costs by selecting a representative subset of large test suites. Most existing techniques are too expensive for handling modern massive systems and, moreover, depend on artifacts, such as code coverage metrics or specification models, that are not commonly available at large scale. We present a family of novel, highly efficient approaches for similarity-based test suite reduction that apply algorithms borrowed from the big data domain together with smart heuristics for finding an evenly spread subset of test cases. The approaches are very general, since they only take as input the test cases themselves (test source code or command line input). We evaluate four approaches both in a version that selects a fixed budget B of test cases and in an adequate version that performs the reduction while guaranteeing some fixed coverage. The results show that the approaches yield a fault detection loss comparable to state-of-the-art techniques, while providing huge gains in terms of efficiency. When applied to a suite of more than 500K real-world test cases, the most efficient of the four approaches could select B test cases (for varying B values) in less than 10 seconds.
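As an illustration only, the sketch below shows one way a similarity-based, budget-constrained reduction over raw test source code could look: each test is turned into a set of character shingles, and a greedy farthest-point heuristic picks B tests that are maximally spread out under Jaccard distance. This is not the paper's actual algorithm (which relies on big-data techniques such as hashing-based similarity); all names and parameters here (shingle, jaccard_distance, reduce_suite, k=5) are hypothetical.

```python
# Hedged sketch: similarity-based, budget-constrained test suite reduction.
# A minimal illustration of picking an evenly spread subset of B test cases
# using only the test source code as input; not the paper's implementation.

def shingle(text, k=5):
    """Represent a test case's source code as a set of character k-grams."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def jaccard_distance(a, b):
    """Distance = 1 - Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def reduce_suite(test_sources, budget):
    """Greedily select `budget` mutually dissimilar tests (farthest-point heuristic).

    test_sources: list of strings, each the source code (or CLI input) of one test.
    Returns the indices of the selected tests.
    """
    shingles = [shingle(src) for src in test_sources]
    selected = [0]  # seed with an arbitrary first test
    remaining = set(range(1, len(shingles)))
    # Distance from every remaining test to its nearest already-selected test.
    dist_to_sel = [jaccard_distance(s, shingles[0]) for s in shingles]
    while remaining and len(selected) < budget:
        nxt = max(remaining, key=lambda i: dist_to_sel[i])  # farthest from the selection
        remaining.discard(nxt)
        selected.append(nxt)
        for i in remaining:  # update nearest-selected distances
            d = jaccard_distance(shingles[i], shingles[nxt])
            if d < dist_to_sel[i]:
                dist_to_sel[i] = d
    return selected

if __name__ == "__main__":
    suite = [
        "assert add(1, 2) == 3",
        "assert add(2, 1) == 3",
        "assert mul(3, 4) == 12",
        "assert div(8, 2) == 4",
    ]
    print(reduce_suite(suite, budget=2))  # indices of the 2 most spread-out tests
```

Each selection step only compares the newly picked test against the remaining ones, so the loop runs in roughly O(B * n) distance computations; scalable variants would replace the exact set comparison with hashed signatures to keep memory and time low on suites of hundreds of thousands of tests.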