Record Details

Improved text classification through label clustering

ScholarsArchive at Oregon State University

Field Value
Title Improved text classification through label clustering
Names Vanderschuere, Christopher (creator); Fern, Xiaoli (advisor)
Date Issued 2015-06-02 (iso8601)
Note Honors Bachelor of Science (HBS)
Abstract This paper introduces an approach to text classification for semi-structured label systems on which standard methods perform poorly. Taking the view that perfect classification for such a system is unattainable, we demonstrate an automated procedure to isolate the learnable elements of the problem. Through analysis of an example dataset, we identify attributes of the label system that hinder performance and demonstrate through manual methods that minimizing these attributes leads to improved performance. We further show that label clustering effectively minimizes these attributes. We then show that, with a combination of frequency, co-occurrence, and document similarity, we can construct label clusters in an automated fashion. Finally, we demonstrate that using label clusters improves classification performance without excessively limiting the label space.
Genre Thesis
Access Condition http://creativecommons.org/licenses/by-nc/3.0/us/
Topic machine learning
Identifier http://hdl.handle.net/1957/56076
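
Illustrative sketch (not taken from the thesis): the abstract says label clusters are built from a combination of frequency, co-occurrence, and document similarity, but this record does not give the procedure. The Python below is one possible reading under stated assumptions: rare labels are merged with another label when an equally weighted mix of Jaccard co-occurrence and cosine similarity of term counts exceeds a threshold. The function names, the `min_freq` and `threshold` parameters, the 0.5/0.5 weighting, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def label_stats(docs):
    """docs is a list of (tokens, labels) pairs.

    Returns per-label document frequency, pairwise label co-occurrence
    counts, and per-label aggregated term counts.
    """
    freq, cooc, terms = Counter(), Counter(), {}
    for tokens, labels in docs:
        unique = sorted(set(labels))
        for lab in unique:
            freq[lab] += 1
            terms.setdefault(lab, Counter()).update(tokens)
        for a, b in combinations(unique, 2):  # keys are ordered so that a < b
            cooc[(a, b)] += 1
    return freq, cooc, terms


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def cluster_labels(docs, min_freq=2, threshold=0.5):
    """Merge rare labels into clusters (assumed scoring, see lead-in)."""
    freq, cooc, terms = label_stats(docs)
    parent = {lab: lab for lab in freq}  # union-find forest over labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(freq), 2):
        # Assumption: only pairs involving at least one rare label are merged.
        if min(freq[a], freq[b]) >= min_freq:
            continue
        jaccard = cooc[(a, b)] / (freq[a] + freq[b] - cooc[(a, b)])
        score = 0.5 * jaccard + 0.5 * cosine(terms[a], terms[b])
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for lab in freq:
        clusters.setdefault(find(lab), set()).add(lab)
    return sorted(sorted(c) for c in clusters.values())


# Toy data, invented for illustration only.
docs = [
    (["river", "water", "flow"], ["hydrology", "rivers"]),
    (["river", "water", "dam"], ["hydrology"]),
    (["tax", "budget"], ["finance"]),
    (["tax", "law"], ["finance"]),
]
clusters = cluster_labels(docs)
print(clusters)
```

On this toy data the rare label "rivers" is merged with "hydrology", while "finance" stays in its own cluster. A classifier could then be trained on cluster identifiers instead of raw labels. Whether that matches the thesis's actual construction cannot be determined from this record.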
