In this guide, we assume that the Spark home directory is ~/spark-1.0.0-bin-hadoop1/ and the Hadoop home directory is ~/hadoop-1.2.1/. Note that in some environments (for example, clusters launched by Spark's EC2 scripts), Spark will be placed at ~/spark/ instead of ~/spark-1.0.0-bin-hadoop1/, while Hadoop will be placed at ~/ephemeral-hdfs/ instead of ~/hadoop-1.2.1/. Please change the directory names, paths, master name, and user name in this guide accordingly.
To simplify this guide, we assume that the names of the master machine and the user are "pineapple0" and "spongebob," respectively.
$ cd ~
$ tar zxvf spark-liblinear-1.95.tar.gz
$ mv spark-liblinear-1.95/ ~/spark-1.0.0-bin-hadoop1/
The package includes a pre-built jar file, spark-liblinear-1.95.jar. You can find this file at ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/spark-liblinear-1.95.jar. If you want to pack the jar by yourself, please check the README file at ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/README.spark.
Next, start Hadoop:
$ ~/hadoop-1.2.1/bin/start-all.sh
We take the dataset heart_scale as an example. You can find heart_scale in the Spark LIBLINEAR directory. Put it into HDFS by:
$ hadoop fs -put ~/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/heart_scale heart_scale
You can check that the file is in HDFS by:
$ hadoop fs -ls
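Each line of heart_scale is in LIBSVM format: a label followed by sparse index:value feature pairs. As a rough illustration of the format only (a minimal sketch; Spark LIBLINEAR ships its own loader, Utils.loadLibSVMData, which should be used in practice), one line can be parsed like this:

```scala
// Minimal sketch of parsing one LIBSVM-format line.
// Illustrative only; use Utils.loadLibSVMData for real data loading.
object LibsvmLineSketch {
  // Returns (label, array of (featureIndex, featureValue)).
  def parse(line: String): (Double, Array[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.map { t =>
      val Array(idx, value) = t.split(":")
      (idx.toInt, value.toDouble)
    }
    (label, features)
  }
}
```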
Then start the Spark standalone cluster:
$ ~/spark-1.0.0-bin-hadoop1/sbin/start-all.sh
$ cd ~/spark-1.0.0-bin-hadoop1/
$ ./bin/spark-shell --jars "/home/spongebob/spark-1.0.0-bin-hadoop1/spark-liblinear-1.95/spark-liblinear-1.95.jar"
scala> import tw.edu.ntu.csie.liblinear._
Load the training data from HDFS with loadLibSVMData:
scala> val data = Utils.loadLibSVMData(sc, "hdfs://pineapple0:9000/user/spongebob/heart_scale")
Train a model with train():
scala> val model = SparkLiblinear.train(data, "-s 0 -c 1.0 -e 1e-2")
The options follow LIBLINEAR: -s 0 selects L2-regularized logistic regression, -c 1.0 sets the cost parameter C, and -e 1e-2 sets the stopping tolerance. Labels can then be predicted with predict():
scala> val LabelAndPreds = data.map { point =>
val prediction = model.predict(point)
(point.y, prediction)
}
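For a linear classifier, each call to predict essentially reduces to the sign of the inner product between the weight vector w and the feature vector x. The sketch below illustrates this decision rule in plain Scala with hypothetical names; it is not part of the Spark LIBLINEAR API, which wraps this logic inside model.predict(point):

```scala
// Sketch of linear-model prediction: the label is the sign of w . x.
// Hypothetical standalone code, not the Spark LIBLINEAR implementation.
object LinearPredictSketch {
  // Inner product of a dense weight vector and a dense feature vector.
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // Binary decision rule: positive inner product -> +1, else -1.
  def predict(w: Array[Double], x: Array[Double]): Double =
    if (dot(w, x) > 0) 1.0 else -1.0
}
```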
Finally, compute the training accuracy:
scala> val accuracy = LabelAndPreds.filter(r => r._1 == r._2).count.toDouble / data.count
scala> println("Training Accuracy = " + accuracy)
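The accuracy computation above is just the fraction of (true label, predicted label) pairs that match. The same calculation on an ordinary Scala collection, with made-up toy labels and independent of Spark, looks like this:

```scala
// Accuracy = fraction of (trueLabel, predictedLabel) pairs that agree.
// Plain-Scala analogue of the RDD computation above.
object AccuracySketch {
  def accuracy(labelAndPreds: Seq[(Double, Double)]): Double =
    labelAndPreds.count { case (y, p) => y == p }.toDouble / labelAndPreds.size
}
```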