Record Details

You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems

ScholarsArchive at Oregon State University

Field Value
Title You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems
Names Groce, Alex (creator)
Kulesza, Todd (creator)
Zhang, Chaoqiang (creator)
Shamasunder, Shalini (creator)
Burnett, Margaret (creator)
Wong, Weng-Keen (creator)
Stumpf, Simone (creator)
Das, Shubhomoy (creator)
Shinsel, Amber (creator)
Bice, Forrest (creator)
McIntosh, Kevin (creator)
Date Issued 2014-03 (iso8601)
Note ©2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by the Institute of Electrical and Electronics Engineers and can be found at: http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=32.
Abstract How do you test a program when only a single user, with no expertise in software testing, is able to determine if the program is performing correctly? Such programs are common today in the form of machine-learned classifiers. We consider the problem of testing this common kind of machine-generated program when the only oracle is an end user: e.g., only you can determine if your email is properly filed. We present test selection methods that provide very good failure rates even for small test suites, and show that these methods work in both large-scale random experiments using a “gold standard” and in studies with real users. Our methods are inexpensive and largely algorithm-independent. Key to our methods is an exploitation of properties of classifiers that is not possible in traditional software testing. Our results suggest that it is plausible for time-pressured end users to interactively detect failures, even very hard-to-find failures, without wading through a large number of successful (and thus less useful) tests. We additionally show that some methods are able to find the arguably most difficult-to-detect faults of classifiers: cases where machine learning algorithms have high confidence in an incorrect result.
Genre Article
Topic machine learning
Identifier Groce, A., Kulesza, T., Zhang, C., Shamasunder, S., Burnett, M., Wong, W., ... & McIntosh, K. (2014). You are the only possible oracle: Effective test selection for end users of interactive machine learning systems. IEEE Transactions on Software Engineering, 40(3), 307-323. doi:10.1109/TSE.2013.59
