Distributed LIBLINEAR: Libraries for Large-scale Linear Classification on Distributed Environments
We now support
-
MPI LIBLINEAR (released in July, 2023 and based on LIBLINEAR 2.47) and
-
Spark LIBLINEAR (released in August, 2015 and based on LIBLINEAR 1.96).
The development of distributed LIBLINEAR is still at an early
stage. Your comments are very welcome.
Introduction
MPI LIBLINEAR is an extension of LIBLINEAR for distributed environments.
The usage and the data format are the same as those of LIBLINEAR. Currently nine solvers are supported:
-
L2-regularized logistic regression (primal truncated Newton)
-
L2-regularized L2-loss linear SVM (dual)
-
L2-regularized L2-loss linear SVM (primal truncated Newton)
-
L2-regularized L1-loss linear SVM (dual)
-
Crammer & Singer multi-class classification
-
L1-regularized logistic regression (primal limited common directions)
-
L2-regularized logistic regression (dual)
-
L2-regularized logistic regression (primal limited common directions)
-
L2-regularized L2-loss linear SVM (primal limited common directions)
NOTICE: This extension can only run on Unix-like systems. Python and Matlab interfaces are not supported.
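Because the data format is the same as LIBLINEAR's, each line of a data file holds one instance written as a label followed by sparse index:value pairs. The following is a minimal sketch of reading that format in Python. It is for illustration only and is not part of the package; the function name parse_libsvm_line and the sample data are made up.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label index:value ...' line into (label, sparse features)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        # Each feature token has the form index:value.
        index_text, value_text = token.split(":", 1)
        features[int(index_text)] = float(value_text)
    return label, features


# Two made-up instances, for illustration only.
sample = "+1 1:0.5 3:1.2\n-1 2:0.7 3:-0.4 10:2\n"
parsed = [parse_libsvm_line(row) for row in sample.splitlines() if row.strip()]
print(parsed)
```

Features that do not appear on a line are treated as zero, which is why a dictionary keyed by feature index is used here.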
Spark LIBLINEAR is a Spark implementation based on LIBLINEAR
and integrated with the Hadoop Distributed File System (HDFS).
This package is developed in Scala.
Currently it supports only two solvers:
-
L2-regularized logistic regression (primal)
-
L2-regularized L2-loss linear SVM (primal)
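For readers unfamiliar with these two formulations, the sketch below evaluates the primal objectives as they are stated in the LIBLINEAR documentation: 0.5 * w'w plus C times the sum of either the logistic loss or the squared hinge loss. It is plain Python for illustration, not the Scala code of Spark LIBLINEAR. The function names are hypothetical, labels are assumed to be +1 or -1, and no bias term is used.

```python
import math
from typing import Dict, List, Tuple

# (label in {+1, -1}, sparse features keyed by index)
Instance = Tuple[float, Dict[int, float]]


def dot(w: Dict[int, float], x: Dict[int, float]) -> float:
    """Sparse inner product w'x; missing weights count as zero."""
    return sum(value * w.get(index, 0.0) for index, value in x.items())


def regularizer(w: Dict[int, float]) -> float:
    """L2 regularization term 0.5 * w'w."""
    return 0.5 * sum(v * v for v in w.values())


def lr_objective(w: Dict[int, float], data: List[Instance], C: float = 1.0) -> float:
    """0.5 * w'w + C * sum_i log(1 + exp(-y_i * w'x_i))."""
    loss = 0.0
    for y, x in data:
        margin = y * dot(w, x)
        # Evaluate log(1 + exp(-margin)) without overflow for large |margin|.
        if margin >= 0:
            loss += math.log1p(math.exp(-margin))
        else:
            loss += -margin + math.log1p(math.exp(margin))
    return regularizer(w) + C * loss


def l2svm_objective(w: Dict[int, float], data: List[Instance], C: float = 1.0) -> float:
    """0.5 * w'w + C * sum_i max(0, 1 - y_i * w'x_i)^2."""
    loss = sum(max(0.0, 1.0 - y * dot(w, x)) ** 2 for y, x in data)
    return regularizer(w) + C * loss


# Made-up toy data, for illustration only.
toy: List[Instance] = [(1.0, {1: 2.0}), (-1.0, {1: 1.0, 2: 0.5})]
weights = {1: 1.0}
print(lr_objective(weights, toy), l2svm_objective(weights, toy))
```

Both objectives are differentiable in w, which is what allows a primal Newton-type method to be applied to them.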
Download
MPI LIBLINEAR can be obtained by downloading the
zip file.
Spark LIBLINEAR can be obtained by downloading the
zip file or
tar.gz file.
Please read the COPYRIGHT notice before using MPI LIBLINEAR and Spark LIBLINEAR.
MPI LIBLINEAR Documentation
For users who are interested in running MPI LIBLINEAR, we provide a practical guide to setting up its distributed environment.
You may also check our FAQ for MPI LIBLINEAR if you encounter any problems.
Technical details are in the following papers.
-
Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin. Distributed Newton Method for Regularized Logistic Regression, PAKDD 2015.
-
C.-p. Lee and K.-W. Chang. Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization, MLJ 2020. (Supersedes the ICML 2015 version.)
-
C.-p. Lee, P.-W. Wang, W. Chen, and C.-J. Lin. Limited-memory common-directions method for large-scale optimization: convergence, parallelization, and distributed optimization, technical report, 2020. (Supersedes the SDM 2017 version.)
-
W.-L. Chiang, Y.-S. Li, C.-p. Lee, and C.-J. Lin. Limited-memory Common-directions Method for Distributed L1-regularized Linear Classification, SIAM International Conference on Data Mining, 2018.
Supplementary materials.
Spark LIBLINEAR Documentation
Technical details are in the following paper.
C.-Y. Lin, C.-H. Tsai, C.-P. Lee, and C.-J. Lin. Large-scale Logistic Regression and Linear Support Vector Machines Using Spark, IEEE International Conference on Big Data 2014 (supplementary materials).
For Spark LIBLINEAR users, we provide a guide for building distributed environments on VirtualBox.
If you want to run Spark on Amazon EC2, please check the useful guide on Running Spark on EC2
to build the environment.
It automatically sets up Spark, Shark and HDFS on the cluster for you.
If you already have a Spark cluster, please check the running guide.
For the implementation API, you can check the following document
Please send comments and suggestions to Chih-Jen Lin.