Indiana University

Study on Parallel SVM Based on MapReduce

Title: Study on Parallel SVM Based on MapReduce
Publication Type: Conference Proceedings
Year of Publication: Submitted
Date Published: 07/2012
Authors: Zhanquan, S., and G. Fox
Refereed Designation: Unknown
Conference Name: The 2012 International Conference on Parallel and Distributed Processing Techniques and Applications
Series Title: Proceedings of the 2012 International Conference on Parallel and Distributed Processing Techniques and Applications
Conference Location: Las Vegas NV USA
Publication Language: eng
Keywords: Large scale data, MapReduce, Parallel SVM, Twister
Abstract: Support Vector Machines (SVMs) are powerful classification and regression tools. They have been widely studied and applied in many practical fields. However, their compute and storage requirements increase rapidly with the number of training vectors, putting many problems of practical interest out of their reach. To apply SVMs to large-scale data mining, parallel SVMs have been studied and several parallel SVM methods have been proposed. Most current parallel SVM methods are based on the classical MPI model, which is not easy to use in practice, especially for large-scale, data-intensive data mining problems. MapReduce is an efficient distributed computing model for processing large-scale data mining problems, and several MapReduce implementations have been developed, such as Hadoop and Twister. In this paper, a parallel SVM based on the iterative MapReduce model Twister is studied, and its program flow is developed. The efficiency of the method is illustrated through the analysis of practical problems.