Lets imagine we have data about how user rated our products they have bought.
userID – productID – rate.
So with mahout recommenditembased class we can recommend new products to our users. Here is simple command line example how can we do this.
lets create a file where we are going to put our present data about users, products and rates.
vim intro.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
Put it into hadoop dfs:
hdfs dfs -moveFromLocal intro.csv input/
We need output directory in hadoop dfs:
[speech@h14 ~]$ hdfs dfs -mkdir output
Now we can run recommend command:
[speech@h14 ~]$ mahout/bin/mahout recommenditembased –input input/intro.csv –output output/recommendation -s SIMILARITY_PEARSON_CORRELATION
Our result will be in hadoop dfs output/recommendation
[speech@h14 ~]$ hdfs dfs -cat output/recommendation/part-r-00000
1 [104:3.9258494]
3 [102:3.2698717]
4 [102:4.7433763]
But if we do not have rates. We have only users and items they have bought. We can still use mahout recommenditembased class.
speech@h14 ~]$ vim boolean.csv
1,101
1,102
1,103
2,101
2,102
2,103
2,104
3,101
3,104
3,105
3,107
4,101
4,103
4,104
4,106
5,101
5,102
5,103
5,104
5,105
[speech@h14 ~]$ hdfs dfs -moveFromLocal boolean.cvs input/
[speech@h14 ~]$ mahout/bin/mahout recommenditembased –input /user/speech/input/boolean.csv –output output/boolean -b -s SIMILARITY_LOGLIKELIHOOD
[speech@h14 ~]$ hdfs dfs -cat /user/speech/output/boolean/part-r-00000
1 [104:1.0,105:1.0]
2 [106:1.0,105:1.0]
3 [103:1.0,102:1.0]
4 [105:1.0,102:1.0]
5 [106:1.0,107:1.0]
[speech@h14 ~]$