"Human Model Evaluation in Interactive Supervised Learning" by Rebecca Fiebrink, Perry R. Cook, and Daniel Trueman. Published in CHI '11: Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems.
Author Bios
- Rebecca Fiebrink is currently an assistant professor in Computer Science at Princeton University. She holds a PhD from Princeton.
- Perry R. Cook is a professor emeritus at Princeton University in Computer Science and the Department of Music.
- Daniel Trueman is a musician, primarily with the fiddle and the laptop. He currently teaches composition at Princeton University.
Summary
Hypothesis
The researchers propose a system that lets trainers interactively supervise machine learning, giving them insight into why certain output is generated and suggesting ways to fix problems. The hypothesis is that such a system can give users more of the information they want and help them build better machine learning applications.
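Since the paper centers on this interactive workflow rather than on any one algorithm, a small sketch may make it concrete. The Wekinator itself wraps Weka learners in Java; the code below is only a hypothetical Python/scikit-learn stand-in for the demonstrate/train/run/edit-data/retrain cycle the authors describe, with all names and data invented for illustration.

```python
# A minimal sketch of the interactive supervised learning cycle the paper
# describes: demonstrate examples, train, run the model live ("direct
# evaluation"), then fix problems by editing the training data and retraining.
# scikit-learn stands in for the Weka learners the Wekinator actually uses;
# all names and data here are illustrative, not the authors' code.

from sklearn.neighbors import KNeighborsClassifier

class InteractiveTrainer:
    def __init__(self):
        self.examples = []   # feature vectors, e.g. frames of sensor readings
        self.labels = []     # user-assigned classes, e.g. gesture names
        self.model = KNeighborsClassifier(n_neighbors=1)

    def add_example(self, features, label):
        # The user demonstrates an input and names the output it should produce.
        self.examples.append(features)
        self.labels.append(label)

    def train(self):
        # Retraining is meant to be fast enough not to interrupt the interaction.
        self.model.fit(self.examples, self.labels)

    def run(self, features):
        # Direct evaluation: run the model on live input and judge the result.
        return self.model.predict([features])[0]

# Typical cycle: demonstrate a few examples, train, try it out, notice a
# mistake in some region of the gesture space, add corrective examples there,
# and retrain.
trainer = InteractiveTrainer()
trainer.add_example([0.1, 0.9], "bounce")
trainer.add_example([0.8, 0.2], "sustain")
trainer.add_example([0.2, 0.8], "bounce")
trainer.train()
print(trainer.run([0.15, 0.85]))   # expected: "bounce"
```

The point of the sketch is that the user's only lever is the training data itself, which is exactly how participants in the studies chose to modify their models.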
Methods
There were three studies of people using supervised learning. The first study focused on the design process, working with seven composers to refine the Wekinator; participants met regularly to discuss the software's usefulness to their own work and possible improvements. The second study had students use the Wekinator for an assignment in which supervised learning was required for music performance systems; the students were asked to use an input device to build two gesture-controlled music performance systems. The last study was done with a professional cellist to produce a gesture recognition system for a cello bow equipped with sensors; the goal was to capture data and build a gesture classifier with musically appropriate classes.
Results
Students in Study B retrained the algorithm an average of 4.1 times per task (σ = 5.1), and the cellist in Study C retrained an average of 3.7 times per task (σ = 6.8). For Study A, participants' questionnaires indicated that they also iteratively retrained the models, and they almost always chose to modify the models only by editing the training data set. In all studies, retraining of the models was nearly always fast enough to enable uninterrupted interaction with the system. In Study A, composers never used cross-validation. In Studies B and C, cross-validation was used occasionally; on average, students in B used it 1.0 times per task (σ = 1.5), and the cellist in C used it 1.8 times per task (σ = 3.8). Participants in A used only direct evaluation; participants in B performed direct evaluation an average of 4.8 times per task (σ = 4.8), and the cellist in C performed it 5.4 times per task (σ = 7.6). There was no objectively right or wrong model against which to evaluate correctness. Users found the software useful and strongly agreed that the Wekinator allowed them to create more expressive models than other techniques. They held an implicit error cost function that variably penalized model mistakes based on both the type of misclassification and its location in the gesture space.
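To make the distinction between the two evaluation styles concrete: cross-validation produces an accuracy number computed over held-out portions of the user's own training examples, while direct evaluation means running the trained model on fresh input and judging the output by ear and eye. The sketch below is a hypothetical illustration using scikit-learn with made-up data, not anything from the studies.

```python
# A hypothetical contrast of the two evaluation styles reported above, using
# scikit-learn and made-up data. Cross-validation scores the model against
# held-out training examples; direct evaluation runs the trained model on new
# input so the user can judge the output subjectively.

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85],
     [0.8, 0.2], [0.9, 0.1], [0.85, 0.15]]
y = ["bounce", "bounce", "bounce", "sustain", "sustain", "sustain"]

model = KNeighborsClassifier(n_neighbors=1)

# Cross-validation: an objective accuracy number, but computed only over the
# user's own training examples, so it can miss problems elsewhere in the
# gesture space.
scores = cross_val_score(model, X, y, cv=3)
print("cross-validation accuracy:", scores.mean())

# Direct evaluation: train on everything, then feed in fresh input and judge
# the result. How much a mistake matters depends on what it is and where it
# falls -- the "implicit error cost function" the authors describe.
model.fit(X, y)
print("prediction for a new input:", model.predict([[0.5, 0.5]])[0])
```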
Contents
The researchers present work studying how users evaluate and interact with supervised learning systems. They examine what criteria are used in evaluation and present observations of different techniques, such as cross-validation and direct evaluation. Users evaluated models both to judge algorithm performance and to improve the trained models, for example by providing more effective training data.
Discussion
I think the researchers achieve their goal of showing that interactive supervised learning helps users build better models. This is interesting because machine learning is still young, and studies like this make adoption easier and more practical than ever before.