Mahout item-based methods

Which method you choose matters, because different problems call for different methods.

A simple example: we have a dataset of roughly 1,000,000 rows (UserID, ItemID, preference).

PearsonCorrelationSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 20:59:38 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 20:59:38 INFO file.FileDataModel: Reading file info...
12/11/16 20:59:40 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 20:59:40 INFO file.FileDataModel: Read lines: 1000209
12/11/16 20:59:40 INFO model.GenericDataModel: Processed 6040 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 20:59:40 INFO model.GenericDataModel: Processed 333 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 312 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 312 tasks in 2 threads
12/11/16 20:59:40 INFO eval.StatsCallable: Average time per recommendation: 2ms
12/11/16 20:59:40 INFO eval.StatsCallable: Approximate memory used: 174MB / 275MB
12/11/16 20:59:40 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 974
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 526
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1837
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2544
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2157
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3781
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 325
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1439
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2211
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 313
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 867
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 864
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 255
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1932
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.8425337616195027
0.8425337616195027
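[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.510s
[INFO] Finished at: Fri Nov 16 20:59:41 EET 2012
[INFO] Final Memory: 9M/272M
[INFO] ------------------------------------------------------------------------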

real 0m8.268s
user 0m11.295s
sys 0m0.731s
[hduser@vm37 my-app]$
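
To make it clearer what this similarity measures, here is a minimal plain-Python sketch of a Pearson correlation between two items, computed over the users who rated both. It is an illustration only, not Mahout's implementation; the function name and the sample ratings are made up for the example.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation of two items over the users who rated both.

    ratings_a and ratings_b map user id -> preference for one item each.
    Returns None when the correlation is undefined (fewer than two
    common users, or no variance in one of the rating lists).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings: item_2 is always rated one point below item_1.
item_1 = {"u1": 5.0, "u2": 3.0, "u3": 4.0}
item_2 = {"u1": 4.0, "u2": 2.0, "u3": 3.0, "u4": 1.0}
print(pearson_similarity(item_1, item_2))
```

Because the correlation ignores a constant offset between the two rating lists, the two hypothetical items above count as perfectly similar even though their absolute ratings differ.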

 

EuclideanDistanceSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 21:01:57 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:01:57 INFO file.FileDataModel: Reading file info...
12/11/16 21:01:59 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:01:59 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:01:59 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:01:59 INFO model.GenericDataModel: Processed 299 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 282 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 282 tasks in 2 threads
12/11/16 21:01:59 INFO eval.StatsCallable: Average time per recommendation: 6ms
12/11/16 21:01:59 INFO eval.StatsCallable: Approximate memory used: 172MB / 274MB
12/11/16 21:01:59 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 304
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1351
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 119
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 437
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 961
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2400
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1679
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1044
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1114
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2620
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1472
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2233
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 815
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2215
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2246
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1896
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3346
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3320
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2855
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1442
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3043
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7924501920488942
0.7924501920488942
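[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.476s
[INFO] Finished at: Fri Nov 16 21:02:00 EET 2012
[INFO] Final Memory: 9M/271M
[INFO] ------------------------------------------------------------------------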
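
For comparison, a distance-based similarity can be sketched as follows. This is a plain-Python illustration under an assumption: the distance is mapped to a similarity with 1 / (1 + d), which is a common choice, and Mahout's exact normalization may differ. The names and ratings are invented for the example.

```python
from math import sqrt


def euclidean_similarity(ratings_a, ratings_b):
    """Similarity derived from the Euclidean distance between two items.

    Only users who rated both items contribute. The mapping
    1 / (1 + distance) is an assumption of this sketch; it yields 1.0
    for identical ratings and approaches 0 as the distance grows.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None
    distance = sqrt(sum((ratings_a[u] - ratings_b[u]) ** 2 for u in common))
    return 1.0 / (1.0 + distance)


# Hypothetical ratings: differences of 3 and 4 give a distance of 5.
item_1 = {"u1": 5.0, "u2": 3.0}
item_2 = {"u1": 2.0, "u2": 7.0}
print(euclidean_similarity(item_1, item_2))
```

Unlike a correlation, this measure is sensitive to the absolute rating values, so two items rated consistently one point apart are not treated as identical.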

TanimotoCoefficientSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 21:03:04 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:03:04 INFO file.FileDataModel: Reading file info...
12/11/16 21:03:06 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:03:06 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:03:06 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:03:07 INFO model.GenericDataModel: Processed 287 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 265 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 265 tasks in 2 threads
12/11/16 21:03:07 INFO eval.StatsCallable: Average time per recommendation: 2ms
12/11/16 21:03:07 INFO eval.StatsCallable: Approximate memory used: 179MB / 274MB
12/11/16 21:03:07 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.79503719853545
0.79503719853545
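[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.632s
[INFO] Finished at: Fri Nov 16 21:03:07 EET 2012
[INFO] Final Memory: 9M/271M
[INFO] ------------------------------------------------------------------------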
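
The Tanimoto coefficient ignores the preference values and only looks at which users rated an item at all. A minimal plain-Python sketch, with an invented function name and sample data, might look like this:

```python
def tanimoto_similarity(users_a, users_b):
    """Tanimoto (Jaccard) coefficient of two sets of user ids.

    users_a and users_b are the users who expressed any preference for
    item A and item B. The result is |A and B| / |A or B|, or None if
    neither item has been rated by anyone.
    """
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    if not union:
        return None
    return len(users_a & users_b) / len(union)


# Hypothetical data: two users in common out of four distinct users.
print(tanimoto_similarity({"u1", "u2", "u3"}, {"u2", "u3", "u4"}))
```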

LogLikelihoodSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 21:05:05 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:05:05 INFO file.FileDataModel: Reading file info...
12/11/16 21:05:07 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:05:07 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:05:08 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:05:08 INFO model.GenericDataModel: Processed 297 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 282 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 282 tasks in 2 threads
12/11/16 21:05:08 INFO eval.StatsCallable: Average time per recommendation: 3ms
12/11/16 21:05:08 INFO eval.StatsCallable: Approximate memory used: 178MB / 275MB
12/11/16 21:05:08 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.813376035770476
0.813376035770476
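[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.558s
[INFO] Finished at: Fri Nov 16 21:05:08 EET 2012
[INFO] Final Memory: 9M/272M
[INFO] ------------------------------------------------------------------------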
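
A log-likelihood similarity also works on co-occurrence counts rather than rating values: it asks how unlikely it is that two items share as many users as they do by chance. The sketch below computes the log-likelihood ratio (G-squared) statistic of a 2x2 contingency table in plain Python. Turning the ratio into a similarity with 1 - 1 / (1 + llr) is an assumption of this sketch, and all names are invented for the example.

```python
from math import log


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G-squared statistic of a 2x2 table.

    k11: users who rated both items, k12: only item A,
    k21: only item B, k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def log_likelihood_similarity(users_a, users_b, num_users):
    """Map the ratio into [0, 1); this mapping is an assumption here."""
    users_a, users_b = set(users_a), set(users_b)
    both = len(users_a & users_b)
    only_a = len(users_a) - both
    only_b = len(users_b) - both
    neither = num_users - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    return 1.0 - 1.0 / (1.0 + llr)


# Hypothetical counts: 10 users rated both items, 10 rated neither.
print(log_likelihood_ratio(10, 0, 0, 10))
```

When the two items are statistically independent (for example a table of 1, 1, 1, 1), the ratio is zero and so is the similarity.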

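These results show that, for this dataset, TanimotoCoefficientSimilarity and EuclideanDistanceSimilarity give the best results (evaluation scores of about 0.795 and 0.792, against 0.813 for LogLikelihoodSimilarity and 0.843 for PearsonCorrelationSimilarity; lower is better). That certainly does not mean the other methods are not worth considering in a different situation.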
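
The "Evaluation result" in the logs comes from a difference-based evaluator, which compares predicted preferences with held-out actual ones. The logs do not say which variant was used, so the sketch below assumes the average absolute difference; a root-mean-square variant would be computed analogously. The function name and the sample values are invented for the example.

```python
def average_absolute_difference(predicted, actual):
    """Mean of |predicted - actual| over held-out preferences.

    Both arguments map (user id, item id) -> preference. Pairs without
    a prediction are skipped. Returns None if nothing can be compared.
    """
    pairs = [(predicted[key], actual[key]) for key in actual if key in predicted]
    if not pairs:
        return None
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Hypothetical held-out ratings and the recommender's estimates for them.
actual = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0, ("u2", "i1"): 5.0}
predicted = {("u1", "i1"): 3.5, ("u1", "i2"): 3.0, ("u2", "i1"): 5.0}
print(average_absolute_difference(predicted, actual))
```

Under that assumption, a score of 0.79 would mean the estimated preferences are off by about 0.79 rating points on average, which is why a lower score indicates a better recommender.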

 

BUT it is always possible to do better

SlopeOneRecommender 

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 22:49:49 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 22:49:49 INFO file.FileDataModel: Reading file info...
12/11/16 22:49:51 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 22:49:51 INFO file.FileDataModel: Read lines: 1000209
12/11/16 22:49:51 INFO model.GenericDataModel: Processed 6040 users
12/11/16 22:49:51 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 22:49:51 INFO model.GenericDataModel: Processed 302 users
12/11/16 22:49:51 INFO slopeone.MemoryDiffStorage: Building average diffs...
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 283 users
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 283 tasks in 2 threads
12/11/16 22:49:55 INFO eval.StatsCallable: Average time per recommendation: 22ms
12/11/16 22:49:55 INFO eval.StatsCallable: Approximate memory used: 277MB / 481MB
12/11/16 22:49:55 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7236027312886469
0.7236027312886469
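[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.150s
[INFO] Finished at: Fri Nov 16 22:49:55 EET 2012
[INFO] Final Memory: 8M/459M
[INFO] ------------------------------------------------------------------------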

real 0m11.922s
user 0m15.635s
sys 0m0.975s
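
The "Building average diffs" step in the log above reflects how Slope One works: it precomputes the average rating difference for every pair of items and then predicts a preference from the user's own ratings plus those differences. Here is a minimal plain-Python sketch of the weighted variant, not Mahout's implementation; the function names and the tiny dataset are invented for the example.

```python
from collections import defaultdict


def build_average_diffs(ratings):
    """Precompute average rating differences for every item pair.

    ratings maps user id -> {item id: preference}. Returns two dicts
    keyed by (item_i, item_j): the average of r_i - r_j over users who
    rated both items, and the number of such users.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for prefs in ratings.values():
        for i, r_i in prefs.items():
            for j, r_j in prefs.items():
                if i != j:
                    sums[(i, j)] += r_i - r_j
                    counts[(i, j)] += 1
    diffs = {pair: sums[pair] / counts[pair] for pair in sums}
    return diffs, dict(counts)


def predict(ratings, diffs, counts, user, target):
    """Weighted Slope One estimate of the user's preference for target."""
    numerator = 0.0
    denominator = 0
    for item, rating in ratings[user].items():
        pair = (target, item)
        if pair in diffs:
            weight = counts[pair]
            numerator += (rating + diffs[pair]) * weight
            denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical ratings; lucy has not rated item A yet.
ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
diffs, counts = build_average_diffs(ratings)
print(predict(ratings, diffs, counts, "lucy", "A"))
```

Storing a difference for every item pair is what makes the precomputation comparatively expensive, which is consistent with the longer run time and higher memory use reported in the SlopeOneRecommender log compared with the similarity-based runs.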