
Margus Roo

If you're inventing and pioneering, you have to be willing to be misunderstood for long periods of time


Category: Linux

Mahout and UserSimilarity

Posted on November 20, 2012 - November 20, 2012 by margusja

The topic: Mahout and the similarity between two users with userIDs X and Y.

https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/similarity/PearsonCorrelationSimilarity.html

Mathematical formula:
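For reference, the textbook Pearson correlation between users X and Y is shown here. It is computed over the n items i that both users have rated, where x_i and y_i are their preferences and \bar{x}, \bar{y} are their mean preferences over those items. PearsonCorrelationSimilarity is based on this coefficient, but details of Mahout's own implementation (for example centering or weighting options) may differ.

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value lies between -1 (opposite tastes) and 1 (identical tastes).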

For this we need the class org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity, which we import in Java like this:

import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

Then we create a similarity object:

UserSimilarity similarity = new PearsonCorrelationSimilarity(model); // model is a Mahout DataModel object, which I won't go into here

This gives us access to the userSimilarity method:

double sim = similarity.userSimilarity(X, Y); // X and Y, passed as the method arguments, are userIDs (long) from the DataModel
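To show what userSimilarity(X, Y) computes conceptually, here is a minimal, dependency-free Java sketch of Pearson correlation over co-rated items. It does not use Mahout: the map-based rating representation (itemID to preference) and the names PearsonSketch and pearson are hypothetical, and Map.of needs Java 9 or newer.

```java
import java.util.Map;

// Minimal sketch: Pearson correlation between two users, computed only over the
// items both users have rated. Ratings are a hypothetical map itemID -> preference,
// not Mahout's DataModel.
final class PearsonSketch {

    static double pearson(Map<Long, Double> x, Map<Long, Double> y) {
        // First pass: count co-rated items and sum their preferences
        int n = 0;
        double sumX = 0.0, sumY = 0.0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double yv = y.get(e.getKey());
            if (yv != null) {
                n++;
                sumX += e.getValue();
                sumY += yv;
            }
        }
        if (n < 2) {
            return Double.NaN; // too little overlap to define a correlation
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Second pass: covariance and variance terms around the means
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double yv = y.get(e.getKey());
            if (yv != null) {
                double dx = e.getValue() - meanX;
                double dy = yv - meanY;
                cov += dx * dy;
                varX += dx * dx;
                varY += dy * dy;
            }
        }
        if (varX == 0.0 || varY == 0.0) {
            return Double.NaN; // constant ratings: correlation undefined
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        Map<Long, Double> userX = Map.of(1L, 5.0, 2L, 3.0, 3L, 4.0);
        Map<Long, Double> userY = Map.of(1L, 4.0, 2L, 2.0, 3L, 5.0, 4L, 1.0);
        System.out.println("similarity(X, Y) = " + pearson(userX, userY));
    }
}
```

Returning NaN when there is no usable overlap is a design choice of this sketch, so a caller can tell "undefined" apart from "uncorrelated" (0.0).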

Posted in Linux Tagged mahout, similarity

java.lang.NoClassDefFoundError: org/apache/mahout/cf/taste/model/DataModel

Posted on November 18, 2012 by margusja

Problem

23:32:36 margusja@IRack> java -classpath target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/cf/taste/model/DataModel
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2442)
at java.lang.Class.getMethod0(Class.java:2685)
at java.lang.Class.getMethod(Class.java:1620)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:492)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:484)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.model.DataModel
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
… 6 more

Solution

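The class org/apache/mahout/cf/taste/model/DataModel could not be found because the mahout-core jar was not on the runtime classpath. As a quick fix, the jar can be added to -classpath explicitly: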

00:06:39 margusja@IRack> java -classpath target/my-app-1.0-SNAPSHOT.jar:target/mahout-core-0.8-20121116.235610-121.jar com.mycompany.app.App
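A NoClassDefFoundError at startup means the JVM could not find a class at run time, even if the code compiled: Maven resolves dependencies at build time, but a plain `java -classpath` only sees the jars listed. A minimal sketch for checking whether a class is visible on the runtime classpath; ClasspathCheck and isOnClasspath are hypothetical names, not part of the project above.

```java
// Minimal sketch: check whether a class can be loaded from the runtime classpath.
// ClasspathCheck and isOnClasspath are hypothetical names.
final class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate and load the class without running static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is missing;
            // LinkageError: the class was found but could not be linked (e.g. a dependency is missing)
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.mahout.cf.taste.model.DataModel";
        System.out.println(name + (isOnClasspath(name) ? " is on the classpath" : " is NOT on the classpath"));
    }
}
```

Run it with the same -classpath value as the application; if DataModel is reported as missing, the mahout-core jar is not on that classpath.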

Posted in Linux Tagged java, mahout, maven

Failed to execute goal on project my-app: Could not resolve dependencies for project com.mycompany.app:my-app:jar:1.0-SNAPSHOT: Could not find artifact org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT

Posted on November 17, 2012 - November 17, 2012 by margusja

Problem

22:52:46 margusja@IRack> mvn compile
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building my-app 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.409s
[INFO] Finished at: Sat Nov 17 22:52:51 EET 2012
[INFO] Final Memory: 2M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project my-app: Could not resolve dependencies for project com.mycompany.app:my-app:jar:1.0-SNAPSHOT: Could not find artifact org.apache.mahout:mahout-core:jar:0.8-SNAPSHOT -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Solution

We download the jar file we need. In this case mahout-core…

and install it into our local repository:

23:10:12 margusja@IRack> mvn install:install-file -DgroupId=org.apache.mahout -DartifactId=mahout-core -Dversion=0.8-SNAPSHOT -Dpackaging=jar -Dfile=target/mahout-core-0.8-20121116.235610-121.jar
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building my-app 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install-file (default-cli) @ my-app ---
[INFO] Installing /Users/margusja/java/my-app/target/mahout-core-0.8-20121116.235610-121.jar to /Users/margusja/.m2/repository/org/apache/mahout/mahout-core/0.8-SNAPSHOT/mahout-core-0.8-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.101s
[INFO] Finished at: Sat Nov 17 23:10:53 EET 2012
[INFO] Final Memory: 3M/81M
[INFO] ------------------------------------------------------------------------
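The install log shows where install-file placed the jar: in the default Maven local-repository layout, the groupId with its dots turned into directories, then the artifactId, the version, and finally artifactId-version.jar. That is the location Maven checks when resolving the dependency from the local repository. A small sketch of that path rule, derived from the path in the log; RepoLayout and artifactPath are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the default Maven local-repository layout for a jar artifact:
// <repo>/<groupId as directories>/<artifactId>/<version>/<artifactId>-<version>.jar
final class RepoLayout {

    static Path artifactPath(Path repo, String groupId, String artifactId, String version) {
        Path dir = repo;
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part); // org.apache.mahout -> org/apache/mahout
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(artifactPath(repo, "org.apache.mahout", "mahout-core", "0.8-SNAPSHOT"));
    }
}
```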

Posted in Linux Tagged java, mahout, maven

Mahout Item based methods

Posted on November 16, 2012 - November 16, 2012 by margusja

Which method you choose matters when solving different problems.

A simple example: we have a dataset of about 1,000,000 rows (UserID, ItemID, preference), evaluated with several different similarity measures:

PearsonCorrelationSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 20:59:38 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 20:59:38 INFO file.FileDataModel: Reading file info…
12/11/16 20:59:40 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 20:59:40 INFO file.FileDataModel: Read lines: 1000209
12/11/16 20:59:40 INFO model.GenericDataModel: Processed 6040 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 20:59:40 INFO model.GenericDataModel: Processed 333 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 312 users
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 312 tasks in 2 threads
12/11/16 20:59:40 INFO eval.StatsCallable: Average time per recommendation: 2ms
12/11/16 20:59:40 INFO eval.StatsCallable: Approximate memory used: 174MB / 275MB
12/11/16 20:59:40 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 974
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 526
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1837
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2544
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2157
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3781
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 325
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1439
12/11/16 20:59:40 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2211
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 313
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 867
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 864
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 255
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1932
12/11/16 20:59:41 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.8425337616195027
0.8425337616195027
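[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.510s
[INFO] Finished at: Fri Nov 16 20:59:41 EET 2012
[INFO] Final Memory: 9M/272M
[INFO] ------------------------------------------------------------------------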

real 0m8.268s
user 0m11.295s
sys 0m0.731s
[hduser@vm37 my-app]$

 

EuclideanDistanceSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 21:01:57 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:01:57 INFO file.FileDataModel: Reading file info…
12/11/16 21:01:59 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:01:59 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:01:59 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:01:59 INFO model.GenericDataModel: Processed 299 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 282 users
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 282 tasks in 2 threads
12/11/16 21:01:59 INFO eval.StatsCallable: Average time per recommendation: 6ms
12/11/16 21:01:59 INFO eval.StatsCallable: Approximate memory used: 172MB / 274MB
12/11/16 21:01:59 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:01:59 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 304
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1351
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 119
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 437
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 961
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2400
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1679
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1044
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1114
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2620
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1472
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2233
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 815
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2215
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2246
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1896
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3346
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3320
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 2855
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 1442
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Item exists in test data but not training data: 3043
12/11/16 21:02:00 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7924501920488942
0.7924501920488942
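[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.476s
[INFO] Finished at: Fri Nov 16 21:02:00 EET 2012
[INFO] Final Memory: 9M/271M
[INFO] ------------------------------------------------------------------------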
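EuclideanDistanceSimilarity turns the Euclidean distance between two users' preference vectors into a similarity, so that a smaller distance means a higher similarity. A minimal sketch using the common mapping 1 / (1 + d) over co-rated items; this mapping is an assumption, and Mahout's exact formula may scale it differently. EuclideanSketch is a hypothetical name.

```java
import java.util.Map;

// Minimal sketch: Euclidean-distance-based similarity over co-rated items, mapped to
// (0, 1] with 1 / (1 + d). The mapping is an assumption; Mahout may scale it differently.
final class EuclideanSketch {

    static double similarity(Map<Long, Double> x, Map<Long, Double> y) {
        double sumSquares = 0.0;
        int overlap = 0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double yv = y.get(e.getKey());
            if (yv != null) { // only items rated by both users contribute
                double diff = e.getValue() - yv;
                sumSquares += diff * diff;
                overlap++;
            }
        }
        if (overlap == 0) {
            return Double.NaN; // no common items: similarity undefined
        }
        double distance = Math.sqrt(sumSquares);
        return 1.0 / (1.0 + distance); // distance 0 gives 1.0; large distances approach 0
    }

    public static void main(String[] args) {
        Map<Long, Double> userX = Map.of(1L, 5.0, 2L, 3.0, 3L, 4.0);
        Map<Long, Double> userY = Map.of(1L, 4.0, 2L, 2.0, 3L, 5.0);
        System.out.println("similarity(X, Y) = " + similarity(userX, userY));
    }
}
```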

TanimotoCoefficientSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 21:03:04 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:03:04 INFO file.FileDataModel: Reading file info…
12/11/16 21:03:06 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:03:06 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:03:06 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:03:07 INFO model.GenericDataModel: Processed 287 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 265 users
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 265 tasks in 2 threads
12/11/16 21:03:07 INFO eval.StatsCallable: Average time per recommendation: 2ms
12/11/16 21:03:07 INFO eval.StatsCallable: Approximate memory used: 179MB / 274MB
12/11/16 21:03:07 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:03:07 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.79503719853545
0.79503719853545
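[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.632s
[INFO] Finished at: Fri Nov 16 21:03:07 EET 2012
[INFO] Final Memory: 9M/271M
[INFO] ------------------------------------------------------------------------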
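TanimotoCoefficientSimilarity looks only at which items each user has rated, not at the preference values: it is the size of the intersection of the two item sets divided by the size of their union. A minimal sketch of that coefficient (also known as the Jaccard index); TanimotoSketch is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch: Tanimoto (Jaccard) coefficient of the sets of items two users rated.
// Preference values are ignored; only whether an item was rated matters.
final class TanimotoSketch {

    static double tanimoto(Set<Long> itemsX, Set<Long> itemsY) {
        if (itemsX.isEmpty() && itemsY.isEmpty()) {
            return Double.NaN; // both sets empty: coefficient undefined
        }
        Set<Long> intersection = new HashSet<>(itemsX);
        intersection.retainAll(itemsY);
        int unionSize = itemsX.size() + itemsY.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Long> ratedByX = Set.of(1L, 2L, 3L);
        Set<Long> ratedByY = Set.of(2L, 3L, 4L);
        System.out.println("tanimoto(X, Y) = " + tanimoto(ratedByX, ratedByY));
    }
}
```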

LogLikelihoodSimilarity

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 21:05:05 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 21:05:05 INFO file.FileDataModel: Reading file info…
12/11/16 21:05:07 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 21:05:07 INFO file.FileDataModel: Read lines: 1000209
12/11/16 21:05:08 INFO model.GenericDataModel: Processed 6040 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 21:05:08 INFO model.GenericDataModel: Processed 297 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 282 users
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 282 tasks in 2 threads
12/11/16 21:05:08 INFO eval.StatsCallable: Average time per recommendation: 3ms
12/11/16 21:05:08 INFO eval.StatsCallable: Approximate memory used: 178MB / 275MB
12/11/16 21:05:08 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 21:05:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.813376035770476
0.813376035770476
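[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.558s
[INFO] Finished at: Fri Nov 16 21:05:08 EET 2012
[INFO] Final Memory: 9M/272M
[INFO] ------------------------------------------------------------------------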
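LogLikelihoodSimilarity is based on Dunning's log-likelihood ratio test on a 2x2 table of counts: items rated by both users, by only one of them, and by neither. A larger ratio means the overlap is less likely to be a coincidence. A minimal sketch of the ratio and of one way to map it to a similarity in [0, 1), namely 1 - 1/(1 + LLR); that mapping and the names below are assumptions, not a copy of Mahout's code.

```java
// Minimal sketch: Dunning's log-likelihood ratio for a 2x2 table of counts, plus a
// similarity 1 - 1/(1 + LLR). Names and the similarity mapping are assumptions.
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of counts: N ln N - sum(k ln k), where N is the total
    private static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            sumXLogX += xLogX(k);
            total += k;
        }
        return xLogX(total) - sumXLogX;
    }

    // k11: items rated by both users, k12: only by X, k21: only by Y, k22: by neither
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);    // rated by X or not
        double columnEntropy = entropy(k11 + k21, k12 + k22); // rated by Y or not
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative results caused by floating-point rounding
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // Hypothetical counts: 30 shared items, 10 and 5 unique ones, 955 rated by neither
        System.out.println("LLR = " + logLikelihoodRatio(30, 10, 5, 955));
        System.out.println("similarity = " + similarity(30, 10, 5, 955));
    }
}
```

An independent table (all four counts equal) gives a ratio of 0, so the similarity is 0.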

These results show that on this dataset TanimotoCoefficientSimilarity and EuclideanDistanceSimilarity give the better results. The evaluation result measures the difference between estimated and actual preferences, so lower is better. This certainly does not mean that other methods should not be considered in other situations.
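The log only names the evaluator's base class, AbstractDifferenceRecommenderEvaluator. Mahout provides both an average-absolute-difference and an RMS evaluator, and which one these runs used is not stated. A minimal sketch of the average-absolute-difference score, which illustrates why lower is better; the array input format and the name AverageAbsoluteDifference are hypothetical.

```java
// Minimal sketch: mean of |estimated - actual| over held-out preferences. Lower is better;
// 0.0 would mean every held-out preference was predicted exactly.
final class AverageAbsoluteDifference {

    static double score(double[] estimated, double[] actual) {
        if (estimated.length == 0 || estimated.length != actual.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    public static void main(String[] args) {
        double[] estimated = {4.0, 3.0, 5.0};
        double[] actual = {5.0, 3.0, 4.0};
        System.out.println("average absolute difference = " + score(estimated, actual));
    }
}
```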

 

BUT it can always be done better:

SlopeOneRecommender 

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ my-app ---
12/11/16 22:49:49 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
12/11/16 22:49:49 INFO file.FileDataModel: Reading file info…
12/11/16 22:49:51 INFO file.FileDataModel: Processed 1000000 lines
12/11/16 22:49:51 INFO file.FileDataModel: Read lines: 1000209
12/11/16 22:49:51 INFO model.GenericDataModel: Processed 6040 users
12/11/16 22:49:51 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.95 of GroupLensDataModel
12/11/16 22:49:51 INFO model.GenericDataModel: Processed 302 users
12/11/16 22:49:51 INFO slopeone.MemoryDiffStorage: Building average diffs…
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 283 users
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 283 tasks in 2 threads
12/11/16 22:49:55 INFO eval.StatsCallable: Average time per recommendation: 22ms
12/11/16 22:49:55 INFO eval.StatsCallable: Approximate memory used: 277MB / 481MB
12/11/16 22:49:55 INFO eval.StatsCallable: Unable to recommend in 0 cases
12/11/16 22:49:55 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7236027312886469
0.7236027312886469
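[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.150s
[INFO] Finished at: Fri Nov 16 22:49:55 EET 2012
[INFO] Final Memory: 8M/459M
[INFO] ------------------------------------------------------------------------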

real 0m11.922s
user 0m15.635s
sys 0m0.975s
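Slope One predicts a rating from average differences between items: if users who rated both items rated item j on average 2 points lower than item i, then a user's rating of i minus 2 is an estimate for j. A minimal sketch of the weighted Slope One scheme, where each item's estimate is weighted by how many users rated both items. The nested-map input, the names, and the exact weighting are assumptions and not necessarily what SlopeOneRecommender does internally.

```java
import java.util.Map;

// Minimal sketch of weighted Slope One. Ratings are a hypothetical nested map
// userID -> (itemID -> rating); this is not Mahout's SlopeOneRecommender.
final class SlopeOneSketch {

    // Predicts the user's rating for targetItem, or NaN if no item gives evidence
    static double predict(Map<Long, Map<Long, Double>> ratings, long user, long targetItem) {
        Map<Long, Double> userRatings = ratings.get(user);
        if (userRatings == null) {
            return Double.NaN;
        }
        double weightedSum = 0.0;
        long totalWeight = 0;
        for (Map.Entry<Long, Double> rated : userRatings.entrySet()) {
            long item = rated.getKey();
            if (item == targetItem) {
                continue;
            }
            // Average deviation (target - item) over users who rated both items
            double deviationSum = 0.0;
            int count = 0;
            for (Map<Long, Double> other : ratings.values()) {
                Double targetRating = other.get(targetItem);
                Double itemRating = other.get(item);
                if (targetRating != null && itemRating != null) {
                    deviationSum += targetRating - itemRating;
                    count++;
                }
            }
            if (count > 0) {
                double estimate = rated.getValue() + deviationSum / count;
                weightedSum += estimate * count; // weight by the number of supporting users
                totalWeight += count;
            }
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = Map.of(
                1L, Map.of(1L, 5.0, 2L, 3.0, 3L, 4.0),
                2L, Map.of(1L, 4.0, 2L, 2.0),
                3L, Map.of(1L, 3.0, 3L, 5.0));
        System.out.println("predicted rating of user 3 for item 2: " + predict(ratings, 3L, 2L));
    }
}
```

Precomputing the item-pair deviations once, as the "Building average diffs" line in the log suggests Mahout does, avoids rescanning all users for every prediction; this sketch recomputes them for clarity.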


Posted in Linux Tagged machine learning, mahout

Radio communication between Kimi Räikkönen and his team

Posted on November 7, 2012 by margusja

Posted in Linux Tagged Raikonen

OpenSSL symmetric and asymmetric encrypt-decrypt and signing howto

Posted on November 3, 2012 - October 19, 2013 by margusja

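Asymmetric:

1. Create a key pair (2048 bits; 1024-bit RSA keys are no longer considered secure)

openssl genrsa -out private.pem 2048

openssl rsa -in private.pem -out public.pem -outform PEM -pubout

2. Share your public key with your partner

3. The partner encrypts with the public key (rsautl can only encrypt data shorter than the key size, so this suits small files)

openssl rsautl -encrypt -inkey public.pem -pubin -in file.txt -out file.ssl

4. Decrypt with the private key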

openssl rsautl -decrypt -inkey private.pem -in file.ssl -out decrypted.txt
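The same asymmetric round trip can be sketched in Java with the standard JCA classes. This minimal sketch generates a fresh 2048-bit key pair in memory instead of reading private.pem and public.pem, and uses RSA/ECB/PKCS1Padding, which corresponds to rsautl's default padding; for new designs OAEP padding is generally preferred. RsaRoundTrip is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Minimal sketch: RSA encrypt with the public key, decrypt with the private key.
// Uses PKCS#1 v1.5 padding like rsautl's default; keys are generated in memory.
final class RsaRoundTrip {

    private static final String TRANSFORMATION = "RSA/ECB/PKCS1Padding";

    static KeyPair generateKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // With a 2048-bit key and this padding, plaintext is limited to 245 bytes
    static byte[] encrypt(PublicKey publicKey, byte[] plain) {
        return run(Cipher.ENCRYPT_MODE, publicKey, plain);
    }

    static byte[] decrypt(PrivateKey privateKey, byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, privateKey, encrypted);
    }

    private static byte[] run(int mode, Key key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = generateKeyPair();
        byte[] encrypted = encrypt(pair.getPublic(), "file.txt contents".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(pair.getPrivate(), encrypted), StandardCharsets.UTF_8));
    }
}
```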

Symmetric:

Encryption

openssl aes-256-cbc -salt -a -e -in saladus.txt -out encrypted.txt
enter aes-256-cbc encryption password:
Verifying - enter aes-256-cbc encryption password:

Decryption:

openssl aes-256-cbc -salt -a -d -in encrypted.txt -out plaintext.txt
enter aes-256-cbc decryption password:
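For comparison, a Java sketch of password-based AES-256-CBC encryption. It derives the key with PBKDF2WithHmacSHA256 and stores salt, IV and ciphertext together; these choices are assumptions, and the output is not compatible with files produced by `openssl aes-256-cbc`, which uses its own key derivation and file format. CBC also provides no integrity protection; an authenticated mode such as AES/GCM is the more modern choice. AesPasswordSketch is a hypothetical name, and very old Java 8 builds needed the unlimited-strength policy files for 256-bit AES.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Password-based AES-256-CBC sketch. Output layout (salt | iv | ciphertext) and the
// PBKDF2 parameters are assumptions; this is NOT compatible with `openssl aes-256-cbc` files.
final class AesPasswordSketch {

    private static final int SALT_LEN = 16, IV_LEN = 16, ITERATIONS = 100_000;

    private static SecretKeySpec deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] key = factory.generateSecret(new PBEKeySpec(password, salt, ITERATIONS, 256)).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    static byte[] encrypt(char[] password, byte[] plain) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(salt); // fresh random salt per file, like openssl's -salt
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt), new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plain);
            byte[] out = new byte[SALT_LEN + IV_LEN + ciphertext.length];
            System.arraycopy(salt, 0, out, 0, SALT_LEN);
            System.arraycopy(iv, 0, out, SALT_LEN, IV_LEN);
            System.arraycopy(ciphertext, 0, out, SALT_LEN + IV_LEN, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(char[] password, byte[] data) {
        try {
            byte[] salt = Arrays.copyOfRange(data, 0, SALT_LEN);
            byte[] iv = Arrays.copyOfRange(data, SALT_LEN, SALT_LEN + IV_LEN);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt), new IvParameterSpec(iv));
            return cipher.doFinal(data, SALT_LEN + IV_LEN, data.length - SALT_LEN - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e); // e.g. wrong password causing a padding error
        }
    }

    public static void main(String[] args) {
        char[] password = "change-me".toCharArray();
        byte[] encrypted = encrypt(password, "saladus".getBytes(StandardCharsets.UTF_8));
        // Base64 output, similar in spirit to openssl's -a flag
        System.out.println(Base64.getEncoder().encodeToString(encrypted));
        System.out.println(new String(decrypt(password, encrypted), StandardCharsets.UTF_8));
    }
}
```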

Signing:

Sign the file

openssl rsautl -sign -in saladus.txt -out saladus.signed -inkey private.pem

The partner can verify it

openssl rsautl -verify -in saladus.signed -out saladus.verified -pubin -inkey public.pem
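Note that rsautl -sign signs the raw input, so the file must be shorter than the key, and -verify recovers the original data. The more common hash-then-sign approach is sketched here in Java with SHA256withRSA; it corresponds roughly to `openssl dgst -sha256 -sign private.pem` rather than to rsautl. SignSketch is a hypothetical name, and the key pair is generated in memory instead of being loaded from the PEM files.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Minimal sketch: hash-then-sign with SHA-256 and RSA, then verify with the public key.
final class SignSketch {

    static KeyPair generateKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sign(PrivateKey privateKey, byte[] data) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(PublicKey publicKey, byte[] data, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(data);
            return verifier.verify(signature); // false if data or signature was altered
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = generateKeyPair();
        byte[] document = "saladus.txt contents".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), document);
        System.out.println("valid: " + verify(pair.getPublic(), document, signature));
    }
}
```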

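For generating hashes, MD5 is a no-go by now: its 128 bits are obviously far too few, and it is broken against collisions. SHA-1 is no longer considered collision-resistant either, so SHA-256 is the safer choice.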

> openssl sha1 saladus.txt

või

> shasum -a 256 saladus.txt
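Computing the same SHA-256 digest in Java needs only MessageDigest from the standard library. A minimal sketch that prints the digest as lowercase hex, as shasum does; Sha256Sketch is a hypothetical name, and the default file name saladus.txt is taken from the examples above.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch: SHA-256 digest of some bytes, formatted as lowercase hex.
final class Sha256Sketch {

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // negative bytes are formatted as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Similar in spirit to: shasum -a 256 saladus.txt
        byte[] content = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "saladus.txt"));
        System.out.println(sha256Hex(content));
    }
}
```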

Posted in IT, Linux

wget: recursively download all files from a directory listed by Apache

Posted on October 19, 2012 - October 19, 2012 by margusja

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
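Here -r downloads recursively, -np (no parent) never ascends above the starting directory, -nH (no host directories) omits the hostname directory, --cut-dirs=3 removes the first three directory components from the saved path, and -R index.html rejects files named index.html, so the directory listing pages are not kept. The effect of -nH and --cut-dirs on local file names can be sketched as follows; WgetPathSketch and localPath are hypothetical names, and the helper expects a URL that ends in a file name.

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical helper illustrating how `-nH --cut-dirs=N` shortens the local path:
// -nH drops the host directory, --cut-dirs drops the first N directories of the URL path.
final class WgetPathSketch {

    static String localPath(String url, int cutDirs) {
        String path = URI.create(url).getPath();            // e.g. /aaa/bbb/ccc/ddd/file.txt
        String[] parts = path.replaceFirst("^/", "").split("/");
        int directoryCount = parts.length - 1;              // the last element is the file name
        int skip = Math.min(cutDirs, directoryCount);       // never cut the file name itself
        return String.join("/", Arrays.copyOfRange(parts, skip, parts.length));
    }

    public static void main(String[] args) {
        System.out.println(localPath("http://hostname/aaa/bbb/ccc/ddd/file.txt", 3));
    }
}
```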

Posted in Linux

R and the hypergeometric distribution

Posted on September 23, 2012 by margusja

Problem: we have 10 balls, 4 of them white and 6 black. Two balls are drawn at random without replacement. What is the probability that the drawn balls include no white ball, exactly one white ball, or two white balls?

In R this can be solved as follows:

R> dhyper(0:2, 4, 6, 2)

Answer:

[1] 0.3333333 0.5333333 0.1333333

The probability that neither ball is white is 33%

The probability that exactly one ball is white is 53%

The probability that both balls are white is 13%
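The same numbers can be checked without R. A minimal Java sketch of the hypergeometric probability P(X = x) = C(m, x) C(n, k - x) / C(m + n, k), which is what dhyper(x, m, n, k) computes, here with m = 4 white balls, n = 6 black balls and k = 2 draws. For x = 0 this gives C(4,0) C(6,2) / C(10,2) = 15/45. Hypergeometric is a hypothetical name.

```java
// Hypergeometric probability, the same quantity as R's dhyper(x, m, n, k):
// P(X = x) = C(m, x) * C(n, k - x) / C(m + n, k)
final class Hypergeometric {

    // Binomial coefficient C(n, r) as a double (fine for small inputs like these)
    static double choose(int n, int r) {
        if (r < 0 || r > n) {
            return 0.0;
        }
        double result = 1.0;
        for (int i = 1; i <= r; i++) {
            result = result * (n - r + i) / i;
        }
        return result;
    }

    // m white balls, n black balls, k balls drawn without replacement, x white among them
    static double dhyper(int x, int m, int n, int k) {
        return choose(m, x) * choose(n, k - x) / choose(m + n, k);
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 2; x++) {
            System.out.println(x + " white: " + dhyper(x, 4, 6, 2));
        }
    }
}
```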

Posted in Linux Tagged R

R and the binomial distribution

Posted on September 23, 2012 by margusja

Suppose we need to calculate free-throw probabilities for a basketball player whose free-throw success rate is 90%.

The player takes 2 free throws, which we treat as independent.

R> dbinom(0:2, 2, 0.9)

# 0:2 are the outcomes to compute: 0 hits, 1 hit and 2 hits

# 2 is the number of attempts, i.e. how many times the player shoots

# 0.9 is the player's success rate so far

The answer:

[1] 0.01 0.18 0.81

The probability that the player misses both times is 0.01, i.e. 1%

The probability that the player misses once (hits exactly once) is 0.18, i.e. 18%

The probability that the player hits both times is 0.81, i.e. 81%
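Likewise, dbinom(x, size, prob) is the binomial probability C(size, x) p^x (1 - p)^(size - x). For one hit out of two throws: C(2,1) * 0.9 * 0.1 = 0.18. A minimal Java sketch; Binomial is a hypothetical name.

```java
// Binomial probability, the same quantity as R's dbinom(x, size, prob):
// P(X = x) = C(size, x) * p^x * (1 - p)^(size - x)
final class Binomial {

    // Binomial coefficient C(n, r) as a double (fine for small inputs like these)
    static double choose(int n, int r) {
        if (r < 0 || r > n) {
            return 0.0;
        }
        double result = 1.0;
        for (int i = 1; i <= r; i++) {
            result = result * (n - r + i) / i;
        }
        return result;
    }

    // x successes in size independent attempts, each succeeding with probability p
    static double dbinom(int x, int size, double p) {
        if (x < 0 || x > size) {
            return 0.0;
        }
        return choose(size, x) * Math.pow(p, x) * Math.pow(1.0 - p, size - x);
    }

    public static void main(String[] args) {
        for (int hits = 0; hits <= 2; hits++) {
            System.out.println(hits + " hits: " + dbinom(hits, 2, 0.9));
        }
    }
}
```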

Posted in Linux Tagged R

two simple R examples

Posted on September 23, 2012 by margusja

Simple knn example

Simple svm example

Posted in Linux Tagged knn, R, svm
