Mahout and UserSimilarity

The topic here is Mahout and the similarity between two user IDs (X and Y).

https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/similarity/PearsonCorrelationSimilarity.html

Mathematical formula:
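For reference, the standard Pearson correlation coefficient between the preferences of users X and Y can be written as below. This is the textbook definition, stated under the assumption that the sums run over the set I_XY of items both users have rated, with x_i and y_i the two users' preferences for item i and the bars denoting their means over that set. Mahout's implementation may differ in details such as weighting.

```latex
r_{XY} = \frac{\sum_{i \in I_{XY}} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i \in I_{XY}} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i \in I_{XY}} (y_i - \bar{y})^2}}
```

The value lies between -1 and 1, and is undefined when either user gave the same preference to every shared item.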

For this we need the org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity class (and the UserSimilarity interface it implements), which we bring into our Java code with imports:

import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

We also create the similarity object:

UserSimilarity similarity = new PearsonCorrelationSimilarity(model); // model is a Mahout DataModel object, which is not described in detail here

This gives us access to the userSimilarity method:

double sim = similarity.userSimilarity(X, Y); // X and Y, passed as the method arguments, are userIDs from the DataModel
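To get a feel for the kind of value userSimilarity returns, here is a minimal plain-Java sketch of the Pearson correlation computed over the items two users have both rated. It is not Mahout's code: the class name PearsonSketch, the method pearson and the sample item IDs are hypothetical, and Mahout's own result may differ in edge cases.

```java
import java.util.HashMap;
import java.util.Map;

public class PearsonSketch {

    /**
     * Pearson correlation over the items present in both maps
     * (itemID -> preference). Returns NaN when it is undefined.
     */
    public static double pearson(Map<Long, Double> prefsX, Map<Long, Double> prefsY) {
        // First pass: means over the co-rated items only.
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (Map.Entry<Long, Double> e : prefsX.entrySet()) {
            Double y = prefsY.get(e.getKey());
            if (y != null) {
                n++;
                sumX += e.getValue();
                sumY += y;
            }
        }
        if (n == 0) {
            return Double.NaN; // no items in common
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Second pass: centered sums.
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (Map.Entry<Long, Double> e : prefsX.entrySet()) {
            Double y = prefsY.get(e.getKey());
            if (y == null) {
                continue;
            }
            double dx = e.getValue() - meanX;
            double dy = y - meanY;
            sumXY += dx * dy;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
        }
        double denominator = Math.sqrt(sumX2 * sumY2);
        return denominator == 0.0 ? Double.NaN : sumXY / denominator;
    }

    public static void main(String[] args) {
        Map<Long, Double> x = new HashMap<>();
        x.put(101L, 5.0);
        x.put(102L, 3.0);
        x.put(103L, 4.0);

        Map<Long, Double> y = new HashMap<>();
        y.put(101L, 4.0);
        y.put(102L, 2.0);
        y.put(103L, 3.0);
        y.put(104L, 5.0); // rated only by Y, so it is ignored

        System.out.println(pearson(x, y)); // prints 1.0
    }
}
```

User Y rates every shared item exactly one point lower than user X, so the two users agree perfectly on relative preferences and the correlation is 1.0.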

java.lang.NoClassDefFoundError: org/apache/mahout/cf/taste/model/DataModel

Problem

23:32:36 margusja@IRack> java -classpath target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/cf/taste/model/DataModel
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2442)
at java.lang.Class.getMethod0(Class.java:2685)
at java.lang.Class.getMethod(Class.java:1620)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:492)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:484)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.model.DataModel
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
… 6 more

Solution

The class org/apache/mahout/cf/taste/model/DataModel was missing from the classpath. If need be, the jar that contains it can be added explicitly:

00:06:39 margusja@IRack> java -classpath target/my-app-1.0-SNAPSHOT.jar:target/mahout-core-0.8-20121116.235610-121.jar com.mycompany.app.App
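One way to confirm a classpath problem like this before running the real application is to ask the class loader directly. The following is a minimal sketch, not part of the original project: the class name ClasspathCheck and its method are hypothetical, and it only uses the standard Class.forName call.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isAvailable(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.mahout.cf.taste.model.DataModel";
        System.out.println(name + " on classpath: " + isAvailable(name));
        // Shows what the JVM was actually started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run with the same -classpath value as the application: if the Mahout jar is missing from it, the check should report false for DataModel.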

Failed to execute goal on project my-app: Could not resolve dependencies for project com.mycompany.app:my-app:jar:1.0-SNAPSHOT: Could not find artifact org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT

Problem

22:52:46 margusja@IRack> mvn compile
[INFO] Scanning for projects…
[INFO]
[INFO] ————————————————————————
[INFO] Building my-app 1.0-SNAPSHOT
[INFO] ————————————————————————
[WARNING] The POM for org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT is missing, no dependency information available
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 0.409s
[INFO] Finished at: Sat Nov 17 22:52:51 EET 2012
[INFO] Final Memory: 2M/81M
[INFO] ————————————————————————
[ERROR] Failed to execute goal on project my-app: Could not resolve dependencies for project com.mycompany.app:my-app:jar:1.0-SNAPSHOT: Could not find artifact org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Solution

Download the jar file we need, in this case mahout-core…

Then install it into our local Maven repository:

23:10:12 margusja@IRack> mvn install:install-file -DgroupId=org.apache.mahout -DartifactId=mahout-core -Dversion=0.8-SNAPSHOT -Dpackaging=jar -Dfile=target/mahout-core-0.8-20121116.235610-121.jar
[INFO] Scanning for projects…
[INFO]
[INFO] ————————————————————————
[INFO] Building my-app 1.0-SNAPSHOT
[INFO] ————————————————————————
[INFO]
[INFO] — maven-install-plugin:2.3.1:install-file (default-cli) @ my-app —
[INFO] Installing /Users/margusja/java/my-app/target/mahout-core-0.8-20121116.235610-121.jar to /Users/margusja/.m2/repository/org/apache/mahout/mahout-core/0.8-SNAPSHOT/mahout-core-0.8-SNAPSHOT.jar
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 1.101s
[INFO] Finished at: Sat Nov 17 23:10:53 EET 2012
[INFO] Final Memory: 3M/81M
[INFO] ————————————————————————
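The "Installing ... to ..." line in the log shows where the jar ends up: the groupId, artifactId and version given on the command line determine the path inside the local repository. The sketch below reproduces that mapping in plain Java to make it explicit. The class and method names are hypothetical, and it assumes the default local repository location under ~/.m2/repository, as seen in the log.

```java
public class LocalRepoPath {

    /** Path of an installed artifact relative to the local repository root. */
    public static String relativePath(String groupId, String artifactId,
                                      String version, String packaging) {
        // Dots in the groupId become directory separators.
        return groupId.replace('.', '/')
                + "/" + artifactId
                + "/" + version
                + "/" + artifactId + "-" + version + "." + packaging;
    }

    public static void main(String[] args) {
        String repoRoot = System.getProperty("user.home") + "/.m2/repository/";
        System.out.println(repoRoot
                + relativePath("org.apache.mahout", "mahout-core", "0.8-SNAPSHOT", "jar"));
    }
}
```

For the coordinates used above this gives org/apache/mahout/mahout-core/0.8-SNAPSHOT/mahout-core-0.8-SNAPSHOT.jar under the repository root, which matches the target path in the install log. Once the file is there, Maven can resolve the org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT dependency that was reported missing.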

Mahout Item based methods

Which method you use matters, and the right choice depends on the problem being solved.

A simple example: we have a dataset of about 1,000,000 rows (UserID, ItemID, preference); the logs below report 1000209 lines read and 6040 users.

PearsonCorrelationSimilarity

[INFO] — exec-maven-plugin:1.2.1:java (default-cli) @ my-app —
12/11/16 20:59:38 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 20:59:38 INFO file.FileDataModel: Reading file info…
12/11/16 20:59:40 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 20:59:40 INFO file.FileDataModel: Read lines: 1000209
12/11/16 20:59:40 INFO model.GenericDataModel: Processed 6040 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 20:59:40 INFO model.GenericDataModel: Processed 333 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 312 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 312 tasks in 2 threads
12/11/16 20:59:40 INFO eval.StatsCallable: Average time per recommendation: 2ms
12/11/16 20:59:40 INFO eval.StatsCallable: Approximate memory used: 174MB / 275MB
12/11/16 20:59:40 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 974
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 526
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1837
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2544
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2157
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3781
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 325
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1439
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2211
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 313
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 867
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 864
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 255
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1932
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.8425337616195027
0.8425337616195027
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 6.510s
[INFO] Finished at: Fri Nov 16 20:59:41 EET 2012
[INFO] Final Memory: 9M/272M
[INFO] ————————————————————————

real 0m8.268s
user 0m11.295s
sys 0m0.731s

EuclideanDistanceSimilarity

[INFO] — exec-maven-plugin:1.2.1:java (default-cli) @ my-app —
12/11/16 21:01:57 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:01:57 INFO file.FileDataModel: Reading file info…
12/11/16 21:01:59 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:01:59 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:01:59 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:01:59 INFO model.GenericDataModel: Processed 299 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 282 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 282 tasks in 2 threads
12/11/16 21:01:59 INFO eval.StatsCallable: Average time per recommendation: 6ms
12/11/16 21:01:59 INFO eval.StatsCallable: Approximate memory used: 172MB / 274MB
12/11/16 21:01:59 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 304
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1351
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 119
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 437
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 961
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2400
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1679
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1044
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1114
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2620
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1472
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2233
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 815
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2215
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2246
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1896
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3346
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3320
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2855
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1442
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3043
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7924501920488942
0.7924501920488942
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 6.476s
[INFO] Finished at: Fri Nov 16 21:02:00 EET 2012
[INFO] Final Memory: 9M/271M
[INFO] ————————————————————————

TanimotoCoefficientSimilarity

[INFO] — exec-maven-plugin:1.2.1:java (default-cli) @ my-app —
12/11/16 21:03:04 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:03:04 INFO file.FileDataModel: Reading file info…
12/11/16 21:03:06 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:03:06 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:03:06 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:03:07 INFO model.GenericDataModel: Processed 287 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 265 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 265 tasks in 2 threads
12/11/16 21:03:07 INFO eval.StatsCallable: Average time per recommendation: 2ms
12/11/16 21:03:07 INFO eval.StatsCallable: Approximate memory used: 179MB / 274MB
12/11/16 21:03:07 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.79503719853545
0.79503719853545
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 6.632s
[INFO] Finished at: Fri Nov 16 21:03:07 EET 2012
[INFO] Final Memory: 9M/271M
[INFO] ————————————————————————
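For context, the Tanimoto coefficient ignores the preference values and compares only which IDs two entities have in common: the size of the intersection divided by the size of the union. A minimal plain-Java sketch follows, with hypothetical class, method and sample IDs; it is not Mahout's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    /** |A intersect B| / |A union B|; NaN when both sets are empty. */
    public static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Item IDs for which each of two users has expressed any preference.
        Set<Long> itemsX = new HashSet<>(Arrays.asList(101L, 102L, 103L));
        Set<Long> itemsY = new HashSet<>(Arrays.asList(102L, 103L, 104L, 105L));
        System.out.println(tanimoto(itemsX, itemsY)); // prints 0.4
    }
}
```

Two of the five distinct items are shared, so the coefficient is 2 / 5 = 0.4.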

LogLikelihoodSimilarity

[INFO] — exec-maven-plugin:1.2.1:java (default-cli) @ my-app —
12/11/16 21:05:05 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:05:05 INFO file.FileDataModel: Reading file info…
12/11/16 21:05:07 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:05:07 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:05:08 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:05:08 INFO model.GenericDataModel: Processed 297 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 282 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 282 tasks in 2 threads
12/11/16 21:05:08 INFO eval.StatsCallable: Average time per recommendation: 3ms
12/11/16 21:05:08 INFO eval.StatsCallable: Approximate memory used: 178MB / 275MB
12/11/16 21:05:08 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.813376035770476
0.813376035770476
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 6.558s
[INFO] Finished at: Fri Nov 16 21:05:08 EET 2012
[INFO] Final Memory: 9M/272M
[INFO] ————————————————————————

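These results show that, for this dataset, TanimotoCoefficientSimilarity (evaluation result about 0.795) and EuclideanDistanceSimilarity (about 0.792) give the better results, compared with LogLikelihoodSimilarity (about 0.813) and PearsonCorrelationSimilarity (about 0.843). The evaluator measures the difference between predicted and actual preferences, so a lower value is better. This certainly does not mean that the other methods should not be considered in a different situation.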
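The "Evaluation result" numbers in the logs come from a subclass of AbstractDifferenceRecommenderEvaluator, which scores a recommender by how far its predicted preferences are from held-out real ones. The logs do not show which concrete evaluator was used, so the sketch below illustrates two common difference measures, average absolute difference and root mean squared difference, in plain Java. The class name, method names and sample values are hypothetical and are not Mahout API.

```java
public class DifferenceMetrics {

    /** Mean of |predicted - actual| over all pairs. */
    public static double averageAbsoluteDifference(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    /** Square root of the mean of (predicted - actual)^2 over all pairs. */
    public static double rootMeanSquaredDifference(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    private static void checkInput(double[] predicted, double[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        double[] predicted = {4.0, 3.0, 5.0};
        double[] actual = {5.0, 3.0, 3.0};
        System.out.println(averageAbsoluteDifference(predicted, actual));
        System.out.println(rootMeanSquaredDifference(predicted, actual));
    }
}
```

With either measure, 0 would mean every prediction matched the real preference exactly, which is why the smaller evaluation results above indicate the better-fitting methods for this dataset.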

 

BUT it is always possible to do better

SlopeOneRecommender 

[INFO] — exec-maven-plugin:1.2.1:java (default-cli) @ my-app —
12/11/16 22:49:49 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 22:49:49 INFO file.FileDataModel: Reading file info…
12/11/16 22:49:51 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 22:49:51 INFO file.FileDataModel: Read lines: 1000209
12/11/16 22:49:51 INFO model.GenericDataModel: Processed 6040 users
12/11/16 22:49:51 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 22:49:51 INFO model.GenericDataModel: Processed 302 users
12/11/16 22:49:51 INFO slopeone.MemoryDiffStorage: Building average diffs…
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 283 users
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 283 tasks in 2 threads
12/11/16 22:49:55 INFO eval.StatsCallable: Average time per recommendation: 22ms
12/11/16 22:49:55 INFO eval.StatsCallable: Approximate memory used: 277MB / 481MB
12/11/16 22:49:55 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7236027312886469
0.7236027312886469
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
[INFO] ————————————————————————
[INFO] Total time: 10.150s
[INFO] Finished at: Fri Nov 16 22:49:55 EET 2012
[INFO] Final Memory: 8M/459M
[INFO] ————————————————————————

real 0m11.922s
user 0m15.635s
sys 0m0.975s