[cloudera@cdh6 ~]$ nano lr_train.csv
[cloudera@cdh6 ~]$ nano lr_train.csv
[cloudera@cdh6 ~]$
[cloudera@cdh6 ~]$ cat lr_train.csv
21.6 1:208
15.5 1:152
10.4 1:113
31.0 1:227
13.0 1:137
32.4 1:238
19.0 1:178
10.4 1:104
19.0 1:191
11.8 1:130
26.5 1:220
16.0 1:140
9.5 1:100
28.3 1:200
20.1 1:150
22.6 1:170
24.5 1:200
25 1:185
14.3 1:120
[cloudera@cdh6 ~]$ hdfs dfs -put lr_train.csv
[cloudera@cdh6 ~]$ nano lr_test.csv
[cloudera@cdh6 ~]$
[cloudera@cdh6 ~]$ cat lr_test.csv
16 1:150
9 1:100
28 1:200
20 1:130
[cloudera@cdh6 ~]$ hdfs dfs -put lr_test.csv
>>> lr_train = spark.read.format("libsvm").load("lr_train.csv")
>>> lr_test = spark.read.format("libsvm").load("lr_test.csv")
>>> lr_train.show()
+-----+---------------+
|label|       features|
+-----+---------------+
| 21.6|(1,[0],[208.0])|
| 15.5|(1,[0],[152.0])|
| 10.4|(1,[0],[113.0])|
| 31.0|(1,[0],[227.0])|
| 13.0|(1,[0],[137.0])|
| 32.4|(1,[0],[238.0])|
| 19.0|(1,...