Distributed LIBLINEAR: Libraries for Large-scale Linear Classification on Distributed Environments

Machine Learning Group at National Taiwan University
Contributors

We now support

MPI LIBLINEAR (released in July 2023 and based on LIBLINEAR 2.47) and
Spark LIBLINEAR (released in August 2015 and based on LIBLINEAR 1.96).

The development of distributed LIBLINEAR is still in its early stage. Your comments are very welcome.

Introduction

MPI LIBLINEAR is an extension of LIBLINEAR for distributed environments. The usage and the data format are the same as in LIBLINEAR. Currently the following nine solvers are supported:

L2-regularized logistic regression (primal truncated Newton)
L2-regularized L2-loss linear SVM (dual)
L2-regularized L2-loss linear SVM (primal truncated Newton)
L2-regularized L1-loss linear SVM (dual)
Crammer & Singer multi-class classification
L1-regularized logistic regression (primal limited common directions)
L2-regularized logistic regression (dual)
L2-regularized logistic regression (primal limited common directions)
L2-regularized L2-loss linear SVM (primal limited common directions)
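Because the data format is the same as in LIBLINEAR, each training instance is one line of sparse text of the form "<label> <index>:<value> ...", with feature indices starting at 1. The following is a minimal Python sketch of reading one such line. It is an illustration of the format only, not part of MPI LIBLINEAR or any interface to it; the function name parse_line and the sample line are hypothetical.

```python
from typing import Dict, Tuple


def parse_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one instance written as '<label> <index>:<value> ...'.

    Returns the label and a dict mapping feature index to feature value.
    Features that are absent from the line are implicitly zero.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        # Each feature token is 'index:value'.
        index_text, value_text = token.split(":", 1)
        features[int(index_text)] = float(value_text)
    return label, features


# Hypothetical sample instance with three non-zero features.
label, features = parse_line("+1 1:0.5 3:-1.25 7:2")
print(label, features)
```

A dictionary is used here only to make the sparsity visible; real solvers store instances in more compact arrays.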

NOTICE: This extension runs only on Unix-like systems. The Python and MATLAB interfaces are not supported.

Spark LIBLINEAR is a Spark implementation based on LIBLINEAR and integrated with the Hadoop Distributed File System (HDFS). The package is written in Scala. Currently it supports only two solvers:

L2-regularized logistic regression (primal)
L2-regularized L2-loss linear SVM (primal)
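For reference, the two solver names above correspond to the standard primal problems used in LIBLINEAR. The notation is an assumption introduced here for illustration: w is the weight vector, C > 0 is the penalty parameter, and (x_i, y_i), i = 1, ..., l, are the training instances with labels y_i in {-1, +1}.

```latex
% L2-regularized logistic regression (primal)
\min_{w} \quad \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^{T} x_i}\right)

% L2-regularized L2-loss linear SVM (primal)
\min_{w} \quad \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \max\left(0,\; 1 - y_i w^{T} x_i\right)^{2}
```

Both objectives share the L2 regularization term and differ only in the loss applied to each instance.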

Download

MPI LIBLINEAR can be obtained by downloading the zip file.

Spark LIBLINEAR can be obtained by downloading the zip file or tar.gz file.

Please read the COPYRIGHT notice before using MPI LIBLINEAR and Spark LIBLINEAR.

MPI LIBLINEAR Documentation

For users who are interested in running MPI LIBLINEAR, we provide a practical guide to setting up its distributed environment.

MPI LIBLINEAR Guide

You may also check our FAQ for MPI LIBLINEAR if you encounter any problems.

Technical details are in the following papers.

Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin. Distributed Newton Method for Regularized Logistic Regression, PAKDD 2015.
C.-p. Lee and K.-W. Chang. Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization, MLJ 2020. (Supersedes the ICML 2015 version.)
C.-p. Lee, P.-W. Wang, W. Chen, and C.-J. Lin. Limited-memory common-directions method for large-scale optimization: convergence, parallelization, and distributed optimization, technical report, 2020. (Supersedes the SDM 2017 version.)
W.-L. Chiang, Y.-S. Li, C.-p. Lee, and C.-J. Lin. Limited-memory Common-directions Method for Distributed L1-regularized Linear Classification, SIAM International Conference on Data Mining, 2018. Supplementary materials.

Spark LIBLINEAR Documentation

Technical details are in the following paper.

C.-Y. Lin, C.-H. Tsai, C.-P. Lee, and C.-J. Lin. Large-scale Logistic Regression and Linear Support Vector Machines Using Spark, IEEE International Conference on Big Data 2014 (supplementary materials).

For Spark LIBLINEAR users, we provide a guide for building distributed environments on VirtualBox.

VirtualBox Guide

Users who want to run Spark on Amazon EC2 can follow the guide Running Spark on EC2 to build the environment. It automatically sets up Spark, Shark, and HDFS on the cluster for you.

If you already have a Spark cluster, please check the running guide.

Running Spark LIBLINEAR

For details of the API, see the following document:

Spark LIBLINEAR API

Please send comments and suggestions to Chih-Jen Lin.