Comparing and Combining Analysis-Based and Learning-Based Regression Test Selection

Year	Authors	Publisher	DOI
2022	Zhang, Jiyang; Liu, Yu; Gligoric, Milos; Legunsen, Owolabi; Shi, August	IEEE/ACM

Techniques	Applicability
TCS	Industry Motivation

Experiment subject(s)	Industrial Partner	Programming Language
Open-source, unclear scale (10 open-source Java programs)		Java

Effectiveness Metrics	Efficiency Metrics	Other Metrics
Selection/reduction count/percentage, Testing time	Execution time, Total/End-to-end time

Information Approach	Algorithm Approach	Open Challenges
Change-based	Machine learning-based	Difficulty in creating the training set (they use mutation analysis); combining other criteria for selection

Supplementary Material
https://github.com/EngineeringSoftware/predictiverts

Abstract

Regression testing, rerunning tests on each code version to detect newly broken functionality, is important and widely practiced. But regression testing is costly due to the large number of tests and the high frequency of code changes. Regression test selection (RTS) optimizes regression testing by only rerunning a subset of tests that can be affected by changes. Researchers showed that RTS based on program analysis can save substantial testing time for (medium-sized) open-source projects. Practitioners also showed that RTS based on machine learning (ML) works well on very large code repositories, e.g., in Facebook's monorepository. We combine analysis-based RTS and ML-based RTS by using the latter to choose a subset of tests selected by the former. We first train several novel ML models to learn the impact of code changes on test outcomes using a training dataset that we obtain via mutation analysis. Then, we evaluate the benefits of combining ML models with analysis-based RTS on 10 projects, compared with using each technique alone. Combining ML-based RTS with two analysis-based RTS techniques, Ekstazi and STARTS, selects 25.34% and 21.44% fewer tests, respectively.

CCS Concepts: Software and its engineering → Software testing and debugging.
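
The combination described in the abstract (an ML model choosing a subset of the tests that an analysis-based tool already selected) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the test names, and the scores are all hypothetical, and the analysis-selected set stands in for the output of a tool such as Ekstazi or STARTS.

```python
from typing import Callable, Iterable, List, Set


def combined_selection(
    all_tests: Iterable[str],
    analysis_selected: Set[str],
    failure_score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Keep only analysis-selected tests whose ML score reaches the threshold."""
    return [
        test
        for test in all_tests
        if test in analysis_selected and failure_score(test) >= threshold
    ]


def selection_rate(selected: Iterable[str], all_tests: Iterable[str]) -> float:
    """Percentage of the full test suite that was selected."""
    total = len(list(all_tests))
    return 100.0 * len(list(selected)) / total if total else 0.0


# Hypothetical data: four tests, three of which the analysis step flags.
all_tests = ["ATest", "BTest", "CTest", "DTest"]
analysis_selected = {"ATest", "BTest", "CTest"}
# Hypothetical model output: a score per test for the current change.
scores = {"ATest": 0.9, "BTest": 0.2, "CTest": 0.7, "DTest": 0.95}

selected = combined_selection(all_tests, analysis_selected, lambda t: scores[t])
print(selected)
print(selection_rate(selected, all_tests))
```

In this sketch DTest is never run even though its score is high, because the ML step only filters the set that the analysis step produced; it does not add tests to it.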