| Techniques | Applicability |
| --- | --- |
| TCS (test case selection) | Industry Motivation, Industry Evaluation, Industry Author, Practitioner Feedback |

| Experiment subject(s) | Industrial Partner | Programming Language |
| --- | --- | --- |
| Industrial proprietary, very large scale: 22 large-scale repositories at Microsoft (up to 60 million test suites) | Microsoft (USA) | Language-agnostic |

| Effectiveness Metrics | Efficiency Metrics | Other Metrics |
| --- | --- | --- |
| Testing time, Cost-benefit model |  |  |

| Information Approach | Algorithm Approach | Open Challenges |
| --- | --- | --- |
|  | Machine learning-based | Developer experience, complexity of code changes, weighted test selection |

Abstract
Large-scale services depend on Continuous Integration/Continuous Deployment (CI/CD) processes to maintain their agility and code quality. Change-based testing plays an important role in finding bugs, but testing after every change is prohibitively expensive at a scale where thousands of changes are committed every hour. Test selection models address this issue by running only a subset of tests for each change.
In this paper, we present a generic, language-agnostic, and lightweight statistical model for test selection. Unlike existing techniques, the proposed model does not require complex feature extraction. Consequently, it scales to hundreds of repositories of varying characteristics while capturing more than 99% of buggy pull requests. Additionally, to better evaluate test selection models, we propose application-specific metrics that capture both the reduction in resource cost and the reduction in pull-request turnaround time. By evaluating our model on 22 large repositories at Microsoft, we find that we can save 15%–30% of compute time while reporting back more than 99% of buggy pull requests.
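
The abstract does not spell out the statistical model itself, so the sketch below is only an illustration of the general idea it describes: estimate, from historical test executions, how likely each test is to fail for the files a pull request touches, run only the tests above a threshold, and score the outcome by compute-time savings and the share of buggy pull requests still caught. All names (`HistoryBasedSelector`, `failure_threshold`, `evaluate`) and the scoring rule are hypothetical assumptions, not the paper's actual method.

```python
# Hypothetical sketch only: the paper's real model, features, and thresholds are
# not described in this summary, so everything below is illustrative.
from collections import defaultdict


class HistoryBasedSelector:
    """Scores each test by how often it failed historically when a given file changed."""

    def __init__(self, failure_threshold=0.01):
        self.failure_threshold = failure_threshold  # minimum estimated failure probability to run a test
        self.runs = defaultdict(int)      # (file, test) -> executions of `test` for changes touching `file`
        self.failures = defaultdict(int)  # (file, test) -> how many of those executions failed

    def record(self, changed_files, test, failed):
        """Update co-occurrence counts from one historical test execution."""
        for f in changed_files:
            self.runs[(f, test)] += 1
            if failed:
                self.failures[(f, test)] += 1

    def failure_probability(self, changed_files, test):
        """Estimate P(test fails | changed files), assuming per-file independence."""
        p_pass_all = 1.0
        for f in changed_files:
            runs = self.runs.get((f, test), 0)
            if runs == 0:
                continue  # no history for this (file, test) pair
            p_pass_all *= 1.0 - self.failures.get((f, test), 0) / runs
        return 1.0 - p_pass_all

    def select(self, changed_files, all_tests):
        """Return the subset of tests to run for one pull request."""
        selected = []
        for t in all_tests:
            has_history = any(self.runs.get((f, t), 0) > 0 for f in changed_files)
            # Be conservative: always run tests with no relevant history.
            if not has_history or self.failure_probability(changed_files, t) >= self.failure_threshold:
                selected.append(t)
        return selected


def evaluate(selected_per_pr, failing_per_pr, cost_per_test):
    """Toy versions of the two metric families the abstract mentions: compute-time
    savings and the share of buggy pull requests still caught.
    `failing_per_pr` is a list of sets of tests that actually fail for each PR."""
    selected_cost = sum(cost_per_test[t] for sel in selected_per_pr for t in sel)
    full_cost = len(selected_per_pr) * sum(cost_per_test.values())
    buggy = [(sel, fails) for sel, fails in zip(selected_per_pr, failing_per_pr) if fails]
    caught = sum(1 for sel, fails in buggy if set(sel) & set(fails))
    return {
        "compute_time_saved": 1.0 - selected_cost / full_cost,
        "buggy_pr_recall": caught / len(buggy) if buggy else 1.0,
    }
```

Lowering `failure_threshold` in such a heuristic trades compute savings for recall of buggy pull requests; the reported 15%–30% compute savings at more than 99% of buggy pull requests caught quantifies exactly that trade-off on Microsoft's repositories.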