Wednesday, October 19, 2011

Paper Reading #21: Human model evaluation in interactive supervised learning

References
Human model evaluation in interactive supervised learning by Rebecca Fiebrink, Perry R. Cook, and Daniel Trueman. Published in CHI '11: Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems.

Author Bios

  • Rebecca Fiebrink is currently an assistant professor in Computer Science at Princeton University. She holds a PhD from Princeton.
  • Perry R. Cook is a professor emeritus at Princeton University in Computer Science and the Department of Music.
  • Daniel Trueman is a musician who works primarily with the fiddle and the laptop. He currently teaches composition at Princeton University.
Summary

Hypothesis      
The researchers propose a system that allows users to train machine learning models interactively, giving them insight into why certain output is generated and suggesting ways to fix problems. The hypothesis is that such a system can give users more of the information they want and help them build better machine learning applications.

Methods
There were three studies of people using supervised learning. The first study focused on the design process with seven composers, with the goal of refining the Wekinator; the participants met regularly to discuss the software in terms of its usefulness to their specific work and possible improvements. The second study required students to use the Wekinator for an assignment in which supervised learning was applied to music performance systems; the students were asked to use an input device to build two gesture-controlled music performance systems. The last study was done with a professional cellist to produce a gesture recognition system for a cello bow equipped with sensors; the goal was to classify the captured sensor data into musically appropriate gesture classes.
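The interactive workflow that all three studies revolve around (train a model, try it out, edit the training examples, retrain) can be sketched in a few lines of Python. This is only an illustration under assumed tools: the Wekinator itself is a standalone application, and the scikit-learn classifier and made-up sensor features below are stand-ins, not the authors' actual code.

    # Minimal sketch of the train / try / edit-data / retrain loop the
    # participants followed. Hypothetical example: scikit-learn stands in
    # for the Wekinator, and the feature vectors are invented sensor frames.
    from sklearn.neighbors import KNeighborsClassifier

    # Each row is one sensor frame; each label is the class the performer
    # wants that frame to trigger.
    X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
    y = ["bow_up", "bow_up", "bow_down", "bow_down"]

    model = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(model.predict([[0.15, 0.85]]))  # run on new input and judge the result

    # If a gesture is mapped badly, the dominant fix in the studies was to
    # edit the training set (add or remove examples) and simply retrain.
    X.append([0.5, 0.5])
    y.append("transition")
    model = KNeighborsClassifier(n_neighbors=1).fit(X, y)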
Results
Students in Study B retrained the algorithm an average of 4.1 times per task (σ = 5.1), and the cellist in Study C retrained an average of 3.7 times per task (σ = 6.8). For Study A, participants' questionnaires indicated that they also iteratively retrained the models, and they almost always chose to modify the models only by editing the training data set. In all studies, retraining was nearly always fast enough to allow uninterrupted interaction with the system. In Study A, composers never used cross-validation. In Studies B and C, cross-validation was used occasionally: students in B used it an average of 1.0 times per task (σ = 1.5), and the cellist in C used it 1.8 times per task (σ = 3.8). Participants in A used only direct evaluation; participants in B performed direct evaluation an average of 4.8 times per task (σ = 4.8), and the cellist in C performed it 5.4 times per task (σ = 7.6).

There was no objectively right or wrong model against which to judge correctness. Participants found the software useful and strongly agreed that the Wekinator allowed them to create more expressive models than other techniques. They also held an implicit error cost function that penalized model mistakes variably, based on both the type of misclassification and its location in the gesture space.
Contents
The researchers present work studying how users evaluate and interact with supervised learning systems. They examine what sorts of criteria users apply in evaluation and present observations of different techniques, such as cross-validation and direct evaluation. Evaluation serves several purposes: judging algorithm performance, improving the trained models, and learning to provide more effective training data.
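To make the two evaluation techniques concrete, here is a rough sketch of what each looks like in code; it assumes scikit-learn and a randomly generated toy gesture dataset rather than anything from the paper.

    # Cross-validation vs. direct evaluation, illustrated with assumed tools
    # (scikit-learn, fake data) rather than the Wekinator's own interface.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))        # 40 fake sensor frames, 3 features
    y = (X[:, 0] > 0).astype(int)       # two gesture classes

    # Cross-validation: one summary accuracy number over held-out folds.
    print("CV accuracy:", cross_val_score(SVC(), X, y, cv=5).mean())

    # Direct evaluation: train on everything, then run fresh input through
    # the model and judge the output subjectively (in a performance, by ear).
    clf = SVC().fit(X, y)
    print("Live predictions:", clf.predict(rng.normal(size=(3, 3))))

As the Results section notes, participants leaned heavily on the second, subjective style, since there was no objective ground truth for what the model "should" do.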
Discussion
I think the researchers achieved their goal of showing that interactive supervised learning helps users build better models. This is interesting because machine learning is still a young field, and studies like this make adoption easier and more practical than ever before.
