
Margus Roo –

If you're inventing and pioneering, you have to be willing to be misunderstood for long periods of time


Atmega 328p

Posted on January 21, 2014 - January 21, 2014 by margusja

I soldered a little something together; believe me, with my clumsy fingers it was not easy.

2014-01-21 21.01.02

margusja@IRack:~$ avrdude -c avrispmkII -v -p ATMEGA328P -P usb

 

avrdude: Version 5.11.1, compiled on Sep  8 2012 at 11:06:53

Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/

Copyright (c) 2007-2009 Joerg Wunsch

 

System wide configuration file is "/opt/local/etc/avrdude.conf"

User configuration file is "/Users/margusja/.avrduderc"

User configuration file does not exist or is not a regular file, skipping

 

Using Port                    : usb

Using Programmer              : avrispmkII

avrdude: usbdev_open(): Found AVRISP mkII, serno: 000200133546

AVR Part                      : ATMEGA328P

Chip Erase delay              : 9000 us

PAGEL                         : PD7

BS2                           : PC2

RESET disposition             : dedicated

RETRY pulse                   : SCK

serial program mode           : yes

parallel program mode         : yes

Timeout                       : 200

StabDelay                     : 100

CmdexeDelay                   : 25

SyncLoops                     : 32

ByteDelay                     : 0

PollIndex                     : 3

PollValue                     : 0x53

Memory Detail                 :

 

Block Poll               Page                       Polled

Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack

----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------

eeprom        65    20     4    0 no       1024    4      0  3600  3600 0xff 0xff

flash         65     6   128    0 yes     32768  128    256  4500  4500 0xff 0xff

lfuse          0     0     0    0 no          1    0      0  4500  4500 0x00 0x00

hfuse          0     0     0    0 no          1    0      0  4500  4500 0x00 0x00

efuse          0     0     0    0 no          1    0      0  4500  4500 0x00 0x00

lock           0     0     0    0 no          1    0      0  4500  4500 0x00 0x00

calibration    0     0     0    0 no          1    0      0     0     0 0x00 0x00

signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00

 

Programmer Type : STK500V2

Description     : Atmel AVR ISP mkII

Programmer Model: AVRISP mkII

Hardware Version: 1

Firmware Version Master : 1.10

Vtarget         : 3.3 V

SCK period      : 8.00 us

 

avrdude: AVR device initialized and ready to accept instructions

 

Reading | ################################################## | 100% 0.01s

 

avrdude: Device signature = 0x1e950f

avrdude: safemode: lfuse reads as FF

avrdude: safemode: hfuse reads as DE

avrdude: safemode: efuse reads as 5

 

avrdude: safemode: lfuse reads as FF

avrdude: safemode: hfuse reads as DE

avrdude: safemode: efuse reads as 5

avrdude: safemode: Fuses OK

 

avrdude done.  Thank you.

 

margusja@IRack:~$

 

This was actually the follow-up to tidying the clutter on the prototype board into a nicer form. The top one is the prototype, the bottom one is the more compact version.

2014-01-21 22.49.16

 

Somewhere in the corner there is also a tiny LCD that shows the temperature from the board shown above.

2014-01-21 22.49.55

Posted in Elektroonika

Good boy, five, sit down

Posted on January 16, 2014 - January 16, 2014 by margusja

Screen Shot 2014-01-16 at 17.46.32

Posted in IT

How to get into orbit

Posted on January 6, 2014 by margusja

1. Call NASA. Their phone number is (713) 483-3111. Make it clear to them that you need to get away from here as soon as possible.
2. If they are not willing to help, call any friend you may have working in the White House, (202) 456-1414, so that he can put in a word for you with the guys at NASA.
3. If you have no friends in the White House, call the Kremlin (ask the overseas operator for 0107-095-295-9051). They never have any friends there either (or at least none you could talk round), but they do seem to have some influence. You may as well try.
4. If that fails, call the Pope for guidance. His telephone number is 011-39-6-6982, assuming the switchboard is in working order.
5. If all these attempts fail, flag down a passing flying saucer and make it clear to them that it is vitally important for you to get away from here before your phone bills arrive.

Douglas Adams

Posted in Fun

In Kodila, 26-12-2013

Posted on December 27, 2013 by margusja
Posted in Fun

I love MS keyboard

Posted on December 17, 2013 by margusja

I am not an MS fan, but keyboards from MS are excellent!
2013-12-17 10.00.29

 

 

Posted in Fun

In the snow with Pambu (Marek), 2013

Posted on December 9, 2013 by margusja
Posted in Lapsed

VHDL fun

Posted on December 5, 2013 by margusja

full adder

Posted in IT

Mahout text classification

Posted on November 28, 2013 - June 8, 2015 by margusja

2014-03-18

[speech@h14 ~]$ hdfs dfs -ls demo
Found 10 items
-rw-r--r-- 3 speech supergroup 628 2014-03-17 10:59 demo/text1.txt
-rw-r--r-- 3 speech supergroup 1327 2014-03-17 10:59 demo/text10.txt
-rw-r--r-- 3 speech supergroup 5165 2014-03-17 10:59 demo/text2.txt
-rw-r--r-- 3 speech supergroup 3736 2014-03-17 10:59 demo/text3.txt
-rw-r--r-- 3 speech supergroup 4338 2014-03-17 10:59 demo/text4.txt
-rw-r--r-- 3 speech supergroup 3338 2014-03-17 10:59 demo/text5.txt
-rw-r--r-- 3 speech supergroup 5836 2014-03-17 10:59 demo/text6.txt
-rw-r--r-- 3 speech supergroup 2936 2014-03-17 10:59 demo/text7.txt
-rw-r--r-- 3 speech supergroup 905 2014-03-17 10:59 demo/text8.txt
-rw-r--r-- 3 speech supergroup 1566 2014-03-17 10:59 demo/text9.txt
[speech@h14 ~]$

 

Mahout has utilities to generate vectors from a directory of text documents. Before creating the vectors, you need to convert the documents to SequenceFile format. SequenceFile is a Hadoop class that lets us write arbitrary (key, value) pairs into it. The DocumentVectorizer requires the key to be a Text with a unique document id, and the value to be the Text content in UTF-8 format.

The output of seqDirectory will be a Sequence file < Text, Text > of all documents (/sub-directory-path/documentFileName, documentText).
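To make the (key, value) layout concrete, here is a minimal Python sketch that builds the same kind of pairs from a local directory. It is only an illustration of the data shape, not how Mahout does it: the function name read_documents is made up for this example, it works on the local file system rather than HDFS, and it assumes the documents end in .txt.

```python
from pathlib import Path
import tempfile


def read_documents(root):
    """Yield (key, text) pairs shaped like the seqdirectory output.

    key   -> "/<path relative to root>", used as a unique document id
    value -> the document content decoded as UTF-8
    (hypothetical helper, local files only)
    """
    root = Path(root)
    for path in sorted(root.rglob("*.txt")):
        key = "/" + path.relative_to(root).as_posix()
        yield key, path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Small self-contained demo with two throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "text1.txt").write_text("esimene uudis", encoding="utf-8")
        (Path(tmp) / "text2.txt").write_text("teine uudis", encoding="utf-8")
        for key, text in read_documents(tmp):
            print(key, "=>", text)
```

Each pair corresponds to one record in the sequence file, which is why seqdumper later prints one key and one value per document.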

[speech@h14 ~]$ mahout seqdirectory -c UTF-8 -i demo -o demo-seqfiles

Check the output:

mahout seqdumper -i /user/margusja/demo-seqfiles/part-m-00000

[speech@h14 ~]$ hdfs dfs -ls demo-seqfiles
Found 2 items
-rw-r--r-- 3 speech supergroup 0 2014-03-18 14:54 demo-seqfiles/_SUCCESS
-rw-r--r-- 3 speech supergroup 15186 2014-03-18 14:54 demo-seqfiles/part-m-00000

[speech@h14 ~]$ mahout seq2sparse -nv -i demo-seqfiles -o demo-vectors -ow -x 10
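-x 10 sets the maximum document frequency to 10 percent (--maxDFPercent): terms that occur in more than 10% of the documents are dropped, which works as a crude stopword filter.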
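As far as I know, -x on seq2sparse is the --maxDFPercent option, so it is a percentage of documents and not a file count. A small Python sketch of that kind of pruning follows. The function name prune_by_max_df is hypothetical, and the exact boundary handling (greater than versus greater than or equal) is an assumption that may differ from Mahout.

```python
from collections import Counter


def prune_by_max_df(docs, max_df_percent):
    """Drop terms that occur in more than max_df_percent of the documents.

    docs is a list of token lists. The boundary rule (keep when df <= limit)
    is an assumption for this sketch.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    limit = len(docs) * max_df_percent / 100.0
    keep = {term for term, n in df.items() if n <= limit}
    return [[t for t in tokens if t in keep] for tokens in docs]


if __name__ == "__main__":
    docs = [["ja", "eesti"], ["ja", "tartu"], ["ja", "narva"], ["ja"]]
    # "ja" is in 4 of 4 documents (100%), so a 50% limit removes it.
    print(prune_by_max_df(docs, 50))
```

With only 10 documents, a limit of 10 percent keeps just the terms that occur in a single document, so it is quite an aggressive setting for a corpus this small.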

[speech@h14 ~]$ hdfs dfs -ls demo-vectors
Found 7 items
drwxr-xr-x - speech supergroup 0 2014-03-18 14:57 demo-vectors/df-count
-rw-r--r-- 3 speech supergroup 10472 2014-03-18 14:57 demo-vectors/dictionary.file-0
-rw-r--r-- 3 speech supergroup 10933 2014-03-18 14:57 demo-vectors/frequency.file-0
drwxr-xr-x - speech supergroup 0 2014-03-18 14:58 demo-vectors/tf-vectors
drwxr-xr-x - speech supergroup 0 2014-03-18 14:58 demo-vectors/tfidf-vectors
drwxr-xr-x - speech supergroup 0 2014-03-18 14:57 demo-vectors/tokenized-documents
drwxr-xr-x - speech supergroup 0 2014-03-18 14:57 demo-vectors/wordcount

[speech@h14 ~]$ mahout kmeans -i demo-vectors/tfidf-vectors -c demo-canopy-centroids -o demo-kmeans-clusters -k 3 -x 10 -cl -ow

[speech@h14 ~]$ hdfs dfs -ls demo-kmeans-clusters
Found 6 items
-rw-r--r-- 3 speech supergroup 194 2014-03-18 15:02 demo-kmeans-clusters/_policy
drwxr-xr-x - speech supergroup 0 2014-03-18 15:02 demo-kmeans-clusters/clusteredPoints
drwxr-xr-x - speech supergroup 0 2014-03-18 15:01 demo-kmeans-clusters/clusters-0
drwxr-xr-x - speech supergroup 0 2014-03-18 15:01 demo-kmeans-clusters/clusters-1
drwxr-xr-x - speech supergroup 0 2014-03-18 15:01 demo-kmeans-clusters/clusters-2
drwxr-xr-x - speech supergroup 0 2014-03-18 15:02 demo-kmeans-clusters/clusters-3-final

[speech@h14 ~]$ mahout clusterdump -dt sequencefile -d demo-vectors/dictionary.file-0 -i demo-kmeans-clusters/clusters-3-final -n 10 -o demo_clusters -p demo-kmeans-clusters/clusteredPoints
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /home/speech/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/03/18 15:02:46 INFO common.AbstractJob: Command line arguments: {--dictionary=[demo-vectors/dictionary.file-0], --dictionaryType=[sequencefile], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[demo-kmeans-clusters/clusters-3-final], --numWords=[10], --output=[demo_clusters], --outputFormat=[TEXT], --pointsDir=[demo-kmeans-clusters/clusteredPoints], --startPhase=[0], --tempDir=[temp]}
14/03/18 15:02:48 INFO clustering.ClusterDumper: Wrote 3 clusters
14/03/18 15:02:48 INFO driver.MahoutDriver: Program took 1880 ms (Minutes: 0.03133333333333333)
[speech@h14 ~]$

This is a simple example of how to create clusters from text articles using Mahout and Hadoop.

  • Created 10 text files and copied one postimees.ee article into each file, then moved the local directory to the Hadoop file system


[hduser@vm38 mahout-0.9]$ hadoop fs -ls demo
Warning: $HADOOP_HOME is deprecated.

Found 10 items
-rw-r--r-- 2 hduser supergroup 1933 2013-11-25 15:15 /user/hduser/demo/uudis1.txt
-rw-r--r-- 2 hduser supergroup 1870 2013-11-25 15:15 /user/hduser/demo/uudis10.txt
-rw-r--r-- 2 hduser supergroup 706 2013-11-25 15:15 /user/hduser/demo/uudis2.txt
-rw-r--r-- 2 hduser supergroup 1812 2013-11-25 15:15 /user/hduser/demo/uudis3.txt
-rw-r--r-- 2 hduser supergroup 1174 2013-11-25 15:15 /user/hduser/demo/uudis4.txt
-rw-r--r-- 2 hduser supergroup 2363 2013-11-25 15:15 /user/hduser/demo/uudis5.txt
-rw-r--r-- 2 hduser supergroup 1708 2013-11-25 15:15 /user/hduser/demo/uudis6.txt
-rw-r--r-- 2 hduser supergroup 2198 2013-11-25 15:15 /user/hduser/demo/uudis7.txt
-rw-r--r-- 2 hduser supergroup 806 2013-11-25 15:15 /user/hduser/demo/uudis8.txt
-rw-r--r-- 2 hduser supergroup 737 2013-11-25 15:15 /user/hduser/demo/uudis9.txt
[hduser@vm38 mahout-0.9]$

  • Convert the files to SequenceFile format on the Hadoop file system


[hduser@vm38 mahout-0.9]$ bin/mahout seqdirectory -c UTF-8 -i demo -o demo-seqfiles
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/hduser/mahout-0.9/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [file:/usr/local/hadoop-1.0.4/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/11/28 11:04:54 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[demo], --keyPrefix=[], --output=[demo-seqfiles], --startPhase=[0], --tempDir=[temp]}
13/11/28 11:04:55 INFO driver.MahoutDriver: Program took 1263 ms (Minutes: 0.02105)
[hduser@vm38 mahout-0.9]$

  • Convert the data to vectors (the -nv option produces named vectors)


[hduser@vm38 mahout-0.9]$ bin/mahout seq2sparse -nv -i demo-seqfiles/ -o demo-vectors -ow
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/hduser/mahout-0.9/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [file:/usr/local/hadoop-1.0.4/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/11/28 11:09:26 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
13/11/28 11:09:26 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
13/11/28 11:09:26 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
13/11/28 11:09:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/28 11:09:26 INFO input.FileInputFormat: Total input paths to process : 1
13/11/28 11:09:27 INFO mapred.JobClient: Running job: job_201310021514_0216
13/11/28 11:09:28 INFO mapred.JobClient: map 0% reduce 0%
13/11/28 11:09:42 INFO mapred.JobClient: map 100% reduce 0%
13/11/28 11:09:47 INFO mapred.JobClient: Job complete: job_201310021514_0216
13/11/28 11:09:47 INFO mapred.JobClient: Counters: 19
13/11/28 11:09:47 INFO mapred.JobClient: Job Counters
13/11/28 11:09:47 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=13764
13/11/28 11:09:47 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/11/28 11:09:47 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/11/28 11:09:47 INFO mapred.JobClient: Rack-local map tasks=1
13/11/28 11:09:47 INFO mapred.JobClient: Launched map tasks=1
13/11/28 11:09:47 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/11/28 11:09:47 INFO mapred.JobClient: File Output Format Counters
13/11/28 11:09:47 INFO mapred.JobClient: Bytes Written=15158
13/11/28 11:09:47 INFO mapred.JobClient: FileSystemCounters
13/11/28 11:09:47 INFO mapred.JobClient: HDFS_BYTES_READ=15834
13/11/28 11:09:47 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21686
13/11/28 11:09:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=15158
13/11/28 11:09:47 INFO mapred.JobClient: File Input Format Counters
13/11/28 11:09:47 INFO mapred.JobClient: Bytes Read=15716
13/11/28 11:09:47 INFO mapred.JobClient: Map-Reduce Framework
13/11/28 11:09:47 INFO mapred.JobClient: Map input records=10
13/11/28 11:09:47 INFO mapred.JobClient: Physical memory (bytes) snapshot=81567744
13/11/28 11:09:47 INFO mapred.JobClient: Spilled Records=0
13/11/28 11:09:47 INFO mapred.JobClient: CPU time spent (ms)=460
13/11/28 11:09:47 INFO mapred.JobClient: Total committed heap usage (bytes)=76939264
13/11/28 11:09:47 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2875113472
13/11/28 11:09:47 INFO mapred.JobClient: Map output records=10
13/11/28 11:09:47 INFO mapred.JobClient: SPLIT_RAW_BYTES=118
13/11/28 11:09:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
…
…
…
13/11/28 11:13:17 INFO mapred.JobClient: Job complete: job_201310021514_0222
13/11/28 11:13:17 INFO mapred.JobClient: Counters: 29
13/11/28 11:13:17 INFO mapred.JobClient: Job Counters
13/11/28 11:13:17 INFO mapred.JobClient: Launched reduce tasks=1
13/11/28 11:13:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=13780
13/11/28 11:13:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/11/28 11:13:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/11/28 11:13:17 INFO mapred.JobClient: Rack-local map tasks=1
13/11/28 11:13:17 INFO mapred.JobClient: Launched map tasks=1
13/11/28 11:13:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10639
13/11/28 11:13:17 INFO mapred.JobClient: File Output Format Counters
13/11/28 11:13:17 INFO mapred.JobClient: Bytes Written=5122
13/11/28 11:13:17 INFO mapred.JobClient: FileSystemCounters
13/11/28 11:13:17 INFO mapred.JobClient: FILE_BYTES_READ=4957
13/11/28 11:13:17 INFO mapred.JobClient: HDFS_BYTES_READ=5262
13/11/28 11:13:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54153
13/11/28 11:13:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=5122
13/11/28 11:13:17 INFO mapred.JobClient: File Input Format Counters
13/11/28 11:13:17 INFO mapred.JobClient: Bytes Read=5122
13/11/28 11:13:17 INFO mapred.JobClient: Map-Reduce Framework
13/11/28 11:13:17 INFO mapred.JobClient: Map output materialized bytes=4957
13/11/28 11:13:17 INFO mapred.JobClient: Map input records=10
13/11/28 11:13:17 INFO mapred.JobClient: Reduce shuffle bytes=0
13/11/28 11:13:17 INFO mapred.JobClient: Spilled Records=20
13/11/28 11:13:17 INFO mapred.JobClient: Map output bytes=4912
13/11/28 11:13:17 INFO mapred.JobClient: Total committed heap usage (bytes)=217645056
13/11/28 11:13:17 INFO mapred.JobClient: CPU time spent (ms)=1020
13/11/28 11:13:17 INFO mapred.JobClient: Combine input records=0
13/11/28 11:13:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=140
13/11/28 11:13:17 INFO mapred.JobClient: Reduce input records=10
13/11/28 11:13:17 INFO mapred.JobClient: Reduce input groups=10
13/11/28 11:13:17 INFO mapred.JobClient: Combine output records=0
13/11/28 11:13:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=232296448
13/11/28 11:13:17 INFO mapred.JobClient: Reduce output records=10
13/11/28 11:13:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5751988224
13/11/28 11:13:17 INFO mapred.JobClient: Map output records=10
13/11/28 11:13:17 INFO common.HadoopUtil: Deleting demo-vectors/partial-vectors-0
13/11/28 11:13:17 INFO driver.MahoutDriver: Program took 231406 ms (Minutes: 3.8567666666666667)

The result:

[hduser@vm38 mahout-0.9]$ hadoop fs -ls demo-vectors
Warning: $HADOOP_HOME is deprecated.

Found 7 items
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:21 /user/hduser/demo-vectors/df-count
-rw-r--r-- 2 hduser supergroup 757 2013-11-28 11:19 /user/hduser/demo-vectors/dictionary.file-0
-rw-r--r-- 2 hduser supergroup 873 2013-11-28 11:21 /user/hduser/demo-vectors/frequency.file-0
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:21 /user/hduser/demo-vectors/tf-vectors
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:22 /user/hduser/demo-vectors/tfidf-vectors
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:18 /user/hduser/demo-vectors/tokenized-documents
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:19 /user/hduser/demo-vectors/wordcount
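The tfidf-vectors directory holds one weighted term vector per document. Below is a rough Python sketch of how such a weight can be computed, assuming Lucene-style weighting: sqrt(tf) * (ln(N / (df + 1)) + 1). That formula is my assumption and is not stated anywhere in the output. It does give 3.690 for tf=2, df=1 and 2.204 for tf=1, df=2 with N=10 documents, and both values show up in the clusterdump output further down, which suggests (but does not prove) the same weighting. The function names are made up for this example.

```python
import math
from collections import Counter


def tfidf_weight(tf, df, num_docs):
    """Assumed Lucene-style weight: sqrt(tf) * (ln(N / (df + 1)) + 1)."""
    return math.sqrt(tf) * (math.log(num_docs / (df + 1)) + 1.0)


def tfidf_vectors(docs):
    """Turn a list of token lists into a list of {term: weight} dicts."""
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in docs for term in set(tokens))
    n = len(docs)
    return [
        {term: tfidf_weight(count, df[term], n)
         for term, count in Counter(tokens).items()}
        for tokens in docs
    ]


if __name__ == "__main__":
    print(round(tfidf_weight(2, 1, 10), 3))
    print(round(tfidf_weight(1, 2, 10), 3))
```

A term that occurs often in one document but in few documents overall gets a high weight, while a term that occurs in most documents gets a weight close to sqrt(tf).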

 

  • Now a simple clustering with k-means


[hduser@vm38 mahout-0.9]$ bin/mahout kmeans -i demo-vectors/tfidf-vectors -c demo-canopy-centroids -o demo-kmeans-clusters4 -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -cd 0.1 -k 4 -x 4 -cl
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/hduser/mahout-0.9/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [file:/usr/local/hadoop-1.0.4/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/11/28 11:26:10 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[demo-canopy-centroids], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[demo-vectors/tfidf-vectors], --maxIter=[4], --method=[mapreduce], --numClusters=[4], --output=[demo-kmeans-clusters4], --startPhase=[0], --tempDir=[temp]}
13/11/28 11:26:11 INFO common.HadoopUtil: Deleting demo-canopy-centroids
13/11/28 11:26:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/28 11:26:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/11/28 11:26:11 INFO compress.CodecPool: Got brand-new compressor
13/11/28 11:26:11 INFO kmeans.RandomSeedGenerator: Wrote 4 Klusters to demo-canopy-centroids/part-randomSeed
13/11/28 11:26:11 INFO kmeans.KMeansDriver: Input: demo-vectors/tfidf-vectors Clusters In: demo-canopy-centroids/part-randomSeed Out: demo-kmeans-clusters4 Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
13/11/28 11:26:11 INFO kmeans.KMeansDriver: convergence: 0.1 max Iterations: 4 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
13/11/28 11:26:11 INFO compress.CodecPool: Got brand-new decompressor
Cluster Iterator running iteration 1 over priorPath: demo-kmeans-clusters4/clusters-0
13/11/28 11:26:12 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/28 11:26:12 INFO input.FileInputFormat: Total input paths to process : 1
13/11/28 11:26:12 INFO mapred.JobClient: Running job: job_201310021514_0231
13/11/28 11:26:13 INFO mapred.JobClient: map 0% reduce 0%
13/11/28 11:26:26 INFO mapred.JobClient: map 100% reduce 0%
13/11/28 11:26:38 INFO mapred.JobClient: map 100% reduce 100%
13/11/28 11:26:43 INFO mapred.JobClient: Job complete: job_201310021514_0231
13/11/28 11:26:43 INFO mapred.JobClient: Counters: 29
13/11/28 11:26:43 INFO mapred.JobClient: Job Counters
13/11/28 11:26:43 INFO mapred.JobClient: Launched reduce tasks=1
13/11/28 11:26:43 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=14946
13/11/28 11:26:43 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/11/28 11:26:43 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/11/28 11:26:43 INFO mapred.JobClient: Launched map tasks=1
13/11/28 11:26:43 INFO mapred.JobClient: Data-local map tasks=1
13/11/28 11:26:43 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11140
13/11/28 11:26:43 INFO mapred.JobClient: File Output Format Counters
13/11/28 11:26:43 INFO mapred.JobClient: Bytes Written=1992
13/11/28 11:26:43 INFO mapred.JobClient: FileSystemCounters
13/11/28 11:26:43 INFO mapred.JobClient: FILE_BYTES_READ=2787
13/11/28 11:26:43 INFO mapred.JobClient: HDFS_BYTES_READ=7996
13/11/28 11:26:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=49849
13/11/28 11:26:43 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1992
13/11/28 11:26:43 INFO mapred.JobClient: File Input Format Counters
13/11/28 11:26:43 INFO mapred.JobClient: Bytes Read=1694
13/11/28 11:26:43 INFO mapred.JobClient: Map-Reduce Framework
13/11/28 11:26:43 INFO mapred.JobClient: Map output materialized bytes=2787
13/11/28 11:26:43 INFO mapred.JobClient: Map input records=10
13/11/28 11:26:43 INFO mapred.JobClient: Reduce shuffle bytes=2787
13/11/28 11:26:43 INFO mapred.JobClient: Spilled Records=8
13/11/28 11:26:43 INFO mapred.JobClient: Map output bytes=2765
13/11/28 11:26:43 INFO mapred.JobClient: Total committed heap usage (bytes)=219152384
13/11/28 11:26:43 INFO mapred.JobClient: CPU time spent (ms)=1970
13/11/28 11:26:43 INFO mapred.JobClient: Combine input records=0
13/11/28 11:26:43 INFO mapred.JobClient: SPLIT_RAW_BYTES=136
13/11/28 11:26:43 INFO mapred.JobClient: Reduce input records=4
13/11/28 11:26:43 INFO mapred.JobClient: Reduce input groups=4
13/11/28 11:26:43 INFO mapred.JobClient: Combine output records=0
13/11/28 11:26:43 INFO mapred.JobClient: Physical memory (bytes) snapshot=249462784
13/11/28 11:26:43 INFO mapred.JobClient: Reduce output records=4
13/11/28 11:26:43 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5729689600
13/11/28 11:26:43 INFO mapred.JobClient: Map output records=4
Cluster Iterator running iteration 2 over priorPath: demo-kmeans-clusters4/clusters-1
13/11/28 11:26:43 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/28 11:26:43 INFO input.FileInputFormat: Total input paths to process : 1
13/11/28 11:26:43 INFO mapred.JobClient: Running job: job_201310021514_0232
13/11/28 11:26:44 INFO mapred.JobClient: map 0% reduce 0%
…
…
…
13/11/28 11:27:15 INFO kmeans.KMeansDriver: Clustering data
13/11/28 11:27:15 INFO kmeans.KMeansDriver: Running Clustering
13/11/28 11:27:15 INFO kmeans.KMeansDriver: Input: demo-vectors/tfidf-vectors Clusters In: demo-kmeans-clusters4 Out: demo-kmeans-clusters4 Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure@5bd31f85
13/11/28 11:27:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/11/28 11:27:16 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/28 11:27:16 INFO input.FileInputFormat: Total input paths to process : 1
13/11/28 11:27:16 INFO mapred.JobClient: Running job: job_201310021514_0233
13/11/28 11:27:17 INFO mapred.JobClient: map 0% reduce 0%
13/11/28 11:27:30 INFO mapred.JobClient: map 100% reduce 0%
13/11/28 11:27:35 INFO mapred.JobClient: Job complete: job_201310021514_0233
13/11/28 11:27:35 INFO mapred.JobClient: Counters: 19
13/11/28 11:27:35 INFO mapred.JobClient: Job Counters
13/11/28 11:27:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=14929
13/11/28 11:27:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/11/28 11:27:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/11/28 11:27:35 INFO mapred.JobClient: Launched map tasks=1
13/11/28 11:27:35 INFO mapred.JobClient: Data-local map tasks=1
13/11/28 11:27:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/11/28 11:27:35 INFO mapred.JobClient: File Output Format Counters
13/11/28 11:27:35 INFO mapred.JobClient: Bytes Written=1723
13/11/28 11:27:35 INFO mapred.JobClient: FileSystemCounters
13/11/28 11:27:35 INFO mapred.JobClient: HDFS_BYTES_READ=4016
13/11/28 11:27:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21720
13/11/28 11:27:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1723
13/11/28 11:27:35 INFO mapred.JobClient: File Input Format Counters
13/11/28 11:27:35 INFO mapred.JobClient: Bytes Read=1694
13/11/28 11:27:35 INFO mapred.JobClient: Map-Reduce Framework
13/11/28 11:27:35 INFO mapred.JobClient: Map input records=10
13/11/28 11:27:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=89280512
13/11/28 11:27:35 INFO mapred.JobClient: Spilled Records=0
13/11/28 11:27:35 INFO mapred.JobClient: CPU time spent (ms)=990
13/11/28 11:27:35 INFO mapred.JobClient: Total committed heap usage (bytes)=61341696
13/11/28 11:27:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2866900992
13/11/28 11:27:35 INFO mapred.JobClient: Map output records=10
13/11/28 11:27:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=136
13/11/28 11:27:35 INFO driver.MahoutDriver: Program took 84664 ms (Minutes: 1.4110666666666667)

And the result:

[hduser@vm38 mahout-0.9]$ hadoop fs -ls demo-kmeans-clusters4
Warning: $HADOOP_HOME is deprecated.

Found 5 items
-rw-r--r-- 2 hduser supergroup 194 2013-11-28 11:27 /user/hduser/demo-kmeans-clusters4/_policy
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:27 /user/hduser/demo-kmeans-clusters4/clusteredPoints
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:26 /user/hduser/demo-kmeans-clusters4/clusters-0
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:26 /user/hduser/demo-kmeans-clusters4/clusters-1
drwxr-xr-x - hduser supergroup 0 2013-11-28 11:27 /user/hduser/demo-kmeans-clusters4/clusters-2-final
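The kmeans options map to the arguments shown in the job log: -k is numClusters, -x is maxIter, -cd is convergenceDelta and -dm is the distance measure. Here is a minimal single-machine Python sketch of the same loop, to show what those knobs do. It is not Mahout's implementation. Two things are my simplifications: it seeds the centroids with the first k points instead of random ones, and it measures convergence as the squared distance a centroid moved.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two sparse {term: weight} vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean_vector(vectors):
    """Component-wise mean of sparse vectors."""
    total = {}
    for v in vectors:
        for k, x in v.items():
            total[k] = total.get(k, 0.0) + x
    return {k: x / len(vectors) for k, x in total.items()}


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)),
            key=lambda c: squared_euclidean(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iter, delta):
    # Simplification: first k points as seeds (Mahout picks random seeds).
    centroids = [dict(p) for p in points[:k]]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        moved = 0.0
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:  # an empty cluster keeps its old centroid
                new = mean_vector(members)
                moved = max(moved, squared_euclidean(new, centroids[c]))
                centroids[c] = new
        if moved <= delta:  # assumed convergence test
            break
    # Final pass, comparable in spirit to the -cl option.
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [{"x": 0.0}, {"x": 10.0}, {"x": 0.5}, {"x": 9.5}]
    print(kmeans(pts, k=2, max_iter=4, delta=0.1))
```

The numbered clusters-N directories in the listing above correspond to the iterations of this loop, and the one marked final is the last of them.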

 

  • View clusters

Clustering jobs in Mahout write their output as a SequenceFile of (Text, Cluster) pairs, where the Text key is a cluster identifier string. To analyze this output we need to convert the sequence files to a human-readable format, which is what the clusterdump utility does.

[hduser@vm38 mahout-0.9]$ bin/mahout clusterdump -dt sequencefile -d demo-vectors/dictionary.file-* -i /user/hduser/demo-kmeans-clusters6/clusters-3-final/ -n 10 -o clusters_out

[hduser@vm38 mahout-0.9]$ less clusters_out


VL-5{n=1 c=[anesteesia:5.835, arvestades:2.204, eaiü:6.904, eesmärk:2.204, eest:1.693, eesti:2.710, eestis:1.693, ei:1.511, eriala:3.690, euroopa:2.204, freimann:3.690, freimanni:3.690, ida:1.916, intensiivravi:4.520, ja:4.421, juhatuse:2.710, ka:1.563, kes:1.511, kokku:1.916, linnas:2.204, lisas:2.204, merle:4.520, märkis:2.204, neile:2.204, ning:1.915, nõuded:3.690, oma:1.511, oskusi:5.219, pärast:2.204, pärnu:2.204, põhja:1.916, riiklikul:2.204, see:2.137, selgitas:1.693, sõnul:2.137, ta:2.394, tallinn:4.520, tartu:3.690, tasandil:2.204, teadmisi:4.520, tuleb:1.916, vajab:3.690, võib:2.204, õdede:3.690, õdedele:3.690, õed:3.690, üldõe:3.690] r=[]}
Top Terms:
eaiü => 6.903923511505127
anesteesia => 5.834880828857422
oskusi => 5.218875885009766
intensiivravi => 4.519679069519043
merle => 4.519679069519043
teadmisi => 4.519679069519043
tallinn => 4.519679069519043
ja => 4.421442031860352
freimann => 3.6903023719787598
üldõe => 3.6903023719787598
VL-6{n=1 c=[aasta:2.204, annab:2.204, eesti:4.694, eestis:2.394, elab:3.117, elanikkonnast:3.690, ette:1.916, inimesi:2.204, ja:2.472, juba:2.204, juhatuse:1.916, ka:1.563, korteriühistute:5.219, korteriühistutes:5.219, kui:1.357, ligikaudu:3.690, liidu:3.690, liikme:2.204, liit:3.690, mille:2.204, mis:3.386, nii:1.916, ning:1.563, protsenti:3.690, riiklikul:2.204, saab:1.511, samuti:1.693, sõnul:1.511, tasandil:2.204, tähtis:2.204, täna:1.693, tänavu:1.693, urmas:2.204, välja:1.693, ära:2.204, üle:1.916] r=[]}
Top Terms:
korteriühistutes => 5.218875885009766
korteriühistute => 5.218875885009766
eesti => 4.693934917449951
elanikkonnast => 3.6903023719787598
liit => 3.6903023719787598
liidu => 3.6903023719787598
ligikaudu => 3.6903023719787598
protsenti => 3.6903023719787598
mis => 3.386294364929199
elab => 3.1168882846832275
VL-8{n=6 c=[16:0.639, 1918:0.615, 28:0.870, 400:0.735, 95:0.615, aasta:0.519, aastal:0.958, aga:0.873, aktuaalne:0.735, algust:0.615, all:0.735, andis:0.367, annab:0.519, apteegi:0.615, apteek:0.753, apteeki:0.615, aru:0.615, arvan:0.735, arvestades:0.367, audru:1.065, auks:0.615, avatud:0.735, detsembril:0.367, ees:0.753, eesmärk:0.367, eest:0.771, eesti:0.782, eestis:0.888, ei:1.044, elab:0.367, elavad:0.615, enam:0.958, ette:0.319, euroopa:0.636, euroopaga:0.615, euroopasse:0.615, eurot:0.367, hambaarst:0.753, head:0.958, ida:0.639, ilmaga:0.615, inimesi:0.367, inimest:0.735, insenerid:0.615, ja:1.958, juba:0.519, juhatuse:0.319, juures:1.156, ka:1.150, kaamera:0.735, kaljas:0.753, kalmudel:0.615, kangelaste:0.615, kas:0.615, kaubanduskeskuse:0.615, kaugemale:0.615, kell:0.615, kes:0.860, kohal:0.615, kohta:0.735, kokku:0.319, kui:1.483, kuni:0.367, kõik:0.887, kõrgemal:0.753, küsimus:0.887, laenu:0.636, langenud:0.753, lennujaama:0.615, lennukid:0.753, lennuraja:0.753, libedust:0.615, ligi:0.367, liiduga:0.753, liikme:0.367, linna:0.615, linnas:0.367, lisas:0.367, läve:0.753, lörtsi:0.615, ma:0.735, maailmas:0.615, maanteeinfo:0.753, maja:0.615, majas:0.753, maksis:0.367, me:1.335, meelt:0.735, meetri:0.615, meie:0.972, meil:0.735, meile:1.090, mille:0.367, mis:1.053, mälestusmärkide:0.615, märkis:0.519, narva:0.887, neile:0.367, nende:0.519, nihutada:0.615, nii:0.873, ning:0.813, novembril:0.870, nüüd:0.887, oleks:0.615, olemas:0.735, oleme:0.735, oli:1.256, olid:1.246, olnud:0.735, oma:0.860, palju:0.735, peaks:0.615, perearsti:0.615, politsei:0.615, projekteeritud:0.735, pärast:0.367, pärnu:0.367, põhja:0.319, põhjus:0.735, püstitatud:0.615, rahvale:0.735, rahvas:0.958, raja:0.870, raske:0.735, reaalset:0.367, reinsalu:0.870, riia:0.615, riigi:0.753, riigist:0.735, rääkis:1.090, saab:1.007, saada:0.367, saama:0.615, saame:0.887, saanud:0.367, samas:0.367, samuti:0.282, seda:1.129, see:1.630, selgitas:0.847, selle:1.053, selleks:0.735, selles:0.519, 
selline:0.735, sest:0.873, siin:0.615, siis:1.363, sõdureid:0.615, sõja:0.615, sõnul:0.755, ta:0.771, tagasi:0.615, tallinna:1.278, tegelikult:0.735, teisel:0.615, tema:0.319, temperatuurid:0.615, tsybulenko:0.615, tuleb:0.639, tuli:0.753, tulnud:0.735, tähendab:0.615, tähtis:0.367, täna:1.053, tänavu:0.282, ukraina:1.305, ukrainas:0.615, ukrainlased:0.615, urmas:0.367, vabadussõda:0.615, vabadussõja:0.972, vabadussõjas:0.615, vaevalt:0.735, vahendas:0.735, vaja:0.615, vald:0.615, vastu:1.363, vihma:0.615, viisat:0.615, väga:1.353, välja:0.564, või:1.170, võib:0.367, võidelnute:0.615, võimaluse:0.615, võrra:0.615, võtta:0.887, www.mnt.ee:0.615, ühe:0.519, ühes:0.636, ühinemisraha:0.615, üle:0.319, ülejõe:0.615] r=[16:0.903, 1918:1.375, 28:1.945, 400:1.039, 95:1.375, aasta:1.162, aastal:0.958, aga:1.299, aktuaalne:1.039, algust:1.375, all:1.039, andis:0.821, annab:1.162, apteegi:1.375, apteek:1.684, apteeki:1.375, aru:1.375, arvan:1.039, arvestades:0.821, audru:2.382, auks:1.375, avatud:1.039, detsembril:0.821, ees:1.684, eesmärk:0.821, eest:1.148, eesti:1.749, eestis:1.265, ei:1.092, elab:0.821, elavad:1.375, enam:0.958, ette:0.714, euroopa:1.423, euroopaga:1.375, euroopasse:1.375, eurot:0.821, hambaarst:1.684, head:0.958, ida:0.903, ilmaga:1.375, inimesi:0.821, inimest:1.039, insenerid:1.375, ja:1.509, juba:1.162, juhatuse:0.714, juures:1.647, ka:0.554, kaamera:1.039, kaljas:1.684, kalmudel:1.375, kangelaste:1.375, kas:1.375, kaubanduskeskuse:1.375, kaugemale:1.375, kell:1.375, kes:0.885, kohal:1.375, kohta:1.039, kokku:0.714, kui:0.749, kuni:0.821, kõik:1.282, kõrgemal:1.684, küsimus:1.282, laenu:1.423, langenud:1.684, lennujaama:1.375, lennukid:1.684, lennuraja:1.684, libedust:1.375, ligi:0.821, liiduga:1.684, liikme:0.821, linna:1.375, linnas:0.821, lisas:0.821, läve:1.684, lörtsi:1.375, ma:1.039, maailmas:1.375, maanteeinfo:1.684, maja:1.375, majas:1.684, maksis:0.821, me:1.041, meelt:1.039, meetri:1.375, meie:2.175, meil:1.039, meile:1.122, mille:0.821, 
mis:1.131, mälestusmärkide:1.375, märkis:1.162, narva:1.282, neile:0.821, nende:1.162, nihutada:1.375, nii:1.299, ning:0.597, novembril:1.945, nüüd:1.282, oleks:1.375, olemas:1.039, oleme:1.039, oli:1.499, olid:0.915, olnud:1.039, oma:0.885, palju:1.039, peaks:1.375, perearsti:1.375, politsei:1.375, projekteeritud:1.039, pärast:0.821, pärnu:0.821, põhja:0.714, põhjus:1.039, püstitatud:1.375, rahvale:1.039, rahvas:0.958, raja:1.945, raske:1.039, reaalset:0.821, reinsalu:1.945, riia:1.375, riigi:1.684, riigist:1.039, rääkis:1.122, saab:0.712, saada:0.821, saama:1.375, saame:1.282, saanud:0.821, samas:0.821, samuti:0.631, seda:0.798, see:1.382, selgitas:0.847, selle:1.131, selleks:1.039, selles:1.162, selline:1.039, sest:1.299, siin:1.375, siis:1.005, sõdureid:1.375, sõja:1.375, sõnul:0.755, ta:1.148, tagasi:1.375, tallinna:1.428, tegelikult:1.039, teisel:1.375, tema:0.714, temperatuurid:1.375, tsybulenko:1.375, tuleb:0.903, tuli:1.684, tulnud:1.039, tähendab:1.375, tähtis:0.821, täna:1.131, tänavu:0.631, ukraina:2.917, ukrainas:1.375, ukrainlased:1.375, urmas:0.821, vabadussõda:1.375, vabadussõja:2.175, vabadussõjas:1.375, vaevalt:1.039, vahendas:1.039, vaja:1.375, vald:1.375, vastu:1.005, vihma:1.375, viisat:1.375, väga:1.566, välja:0.798, või:1.224, võib:0.821, võidelnute:1.375, võimaluse:1.375, võrra:1.375, võtta:1.282, www.mnt.ee:1.375, ühe:1.162, ühes:1.423, ühinemisraha:1.375, üle:0.714, ülejõe:1.375]}
Top Terms:
ja => 1.9576633373896282
see => 1.6297114690144856
kui => 1.4834059476852417
vastu => 1.3625396092732747
siis => 1.3625396092732747
väga => 1.3529229958852131
me => 1.3353430827458699
ukraina => 1.3047189712524414
tallinna => 1.2775271733601887
oli => 1.2556068897247314

Those are the top 10 words for 3 of the clusters. Yes, there is room for tuning, but this is a simple how-to.
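The "Top Terms" list is just the cluster centroid sorted by weight. Here is a minimal sketch of that step, assuming the centroid has been copied out of the dump as a `term:weight, term:weight` string. The function name `top_terms` and the shortened sample string are made up for illustration and are not part of Mahout.

```python
def top_terms(centroid: str, n: int = 10) -> list[tuple[str, float]]:
    """Parse 'term:weight, term:weight, ...' and return the n heaviest terms."""
    pairs = []
    for item in centroid.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last ':' so a term that itself contains ':' still parses.
        term, weight = item.rsplit(":", 1)
        pairs.append((term, float(weight)))
    # Highest weight first; keep only the first n entries.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


# A few entries taken from the VL-8 centroid above (shortened for the example).
sample = "aga:0.873, ja:1.958, kui:1.483, oli:1.256, see:1.630, ukraina:1.305"

for term, weight in top_terms(sample, 3):
    print(f"{term} => {weight}")
```

Ties between equal weights may come out in a different order than in Mahout's own listing.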

  • Map the clusters back to the text documents

[hduser@vm38 mahout-0.9]$ bin/mahout seqdumper -i /user/hduser/demo-kmeans-clusters6/clusteredPoints/part-m-00000
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/hduser/mahout-0.9/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar
Warning: $HADOOP_HOME is deprecated.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [file:/usr/local/hadoop-1.0.4/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/11/28 11:48:04 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/hduser/demo-kmeans-clusters6/clusteredPoints/part-m-00000], --startPhase=[0], --tempDir=[temp]}
Input Path: /user/hduser/demo-kmeans-clusters6/clusteredPoints/part-m-00000
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedVectorWritable
Key: 8: Value: 1.0: /uudis1.txt = [6:2.204, 8:3.117, 11:1.916, 13:2.204, 16:2.204, 21:3.690, 22:4.520, 23:3.690, 25:2.204, 28:6.392, 30:2.204, 36:1.693, 43:1.916, 52:2.204, 55:4.520, 57:1.916, 65:3.829, 67:3.117, 70:3.117, 72:1.563, 73:2.204, 92:1.919, 95:3.117, 98:3.817, 105:2.204, 116:2.204, 120:3.690, 121:4.520, 122:2.204, 123:2.933, 127:2.204, 128:1.916, 130:2.204, 133:1.693, 141:3.319, 145:2.204, 148:2.204, 149:4.147, 150:1.693, 158:3.690, 159:3.690, 160:3.690, 162:2.204, 165:2.204, 171:2.204, 172:1.916, 174:2.204, 182:2.710, 183:1.511, 185:3.690, 186:2.204, 189:2.204, 191:1.693, 192:3.378, 194:1.693, 195:2.933, 196:2.204, 198:2.204, 199:3.319, 200:3.690, 201:2.394, 208:3.690, 215:3.690, 221:4.520, 227:1.693, 235:2.204, 236:2.204, 239:3.690, 241:1.693, 244:4.285, 246:1.693, 251:2.204, 257:3.117, 258:3.817, 259:3.690, 262:3.690]
Key: 8: Value: 1.0: /uudis10.txt = [3:3.690, 5:5.219, 7:3.690, 9:1.916, 15:3.690, 16:2.204, 17:2.204, 27:2.204, 29:3.690, 36:2.933, 37:4.694, 39:2.137, 65:3.495, 70:3.817, 72:1.563, 75:3.690, 76:3.690, 83:3.690, 85:1.511, 92:1.919, 95:2.204, 100:4.520, 117:3.690, 123:1.693, 124:2.204, 126:5.835, 128:1.916, 133:2.933, 135:3.690, 136:3.117, 137:3.117, 138:2.204, 141:1.916, 142:1.563, 143:5.219, 147:2.204, 148:2.204, 150:1.693, 151:2.204, 152:2.137, 155:2.204, 167:2.204, 169:3.690, 171:2.204, 172:1.916, 175:2.204, 177:5.219, 179:4.520, 180:2.204, 191:1.693, 192:1.511, 198:2.204, 199:1.916, 202:3.690, 203:3.690, 204:1.511, 214:2.204, 222:2.204, 231:2.204, 232:3.690, 233:5.835, 234:3.690, 241:2.394, 248:3.690, 261:1.916]
Key: 8: Value: 1.0: /uudis2.txt = [24:3.690, 39:1.511, 43:1.916, 57:1.916, 61:2.204, 62:2.204, 63:3.690, 74:4.520, 81:3.690, 85:2.137, 87:2.204, 89:1.916, 92:1.357, 97:2.204, 123:1.693, 142:1.105, 149:1.693, 152:1.511, 162:2.204, 178:3.690, 182:1.916, 183:1.511, 192:1.511, 194:1.693, 195:1.693, 201:1.693, 204:1.511, 214:2.204, 216:1.916, 220:1.916, 237:3.690, 241:1.693, 245:1.693]
Key: 8: Value: 1.0: /uudis3.txt = [9:1.916, 11:3.319, 13:2.204, 25:2.204, 30:2.204, 34:4.520, 38:2.394, 39:2.617, 40:2.204, 42:3.690, 43:1.916, 47:1.916, 49:3.817, 50:3.690, 51:3.690, 58:1.916, 62:2.204, 65:2.211, 72:1.105, 73:2.204, 79:3.690, 85:1.511, 92:1.357, 97:3.117, 108:4.520, 116:2.204, 124:2.204, 139:3.117, 142:1.105, 145:3.117, 146:3.690, 147:2.204, 149:1.693, 150:1.693, 151:2.204, 152:1.511, 155:2.204, 164:2.204, 172:1.916, 174:2.204, 180:2.204, 182:1.916, 183:1.511, 184:2.204, 188:2.204, 190:1.693, 191:1.693, 197:3.117, 201:1.693, 207:2.933, 210:1.916, 219:3.690, 220:1.916, 222:2.204, 226:1.693, 228:7.828, 229:3.690, 230:3.690, 235:2.204, 236:2.204, 243:3.690, 244:1.916, 246:2.933]
Key: 8: Value: 1.0: /uudis4.txt = [2:1.916, 6:2.204, 9:1.916, 19:3.117, 31:2.204, 35:2.204, 60:3.690, 68:1.916, 72:1.563, 82:3.690, 86:3.690, 92:2.350, 94:2.204, 96:4.520, 101:3.690, 102:4.520, 103:4.520, 109:2.204, 111:3.690, 112:2.204, 113:2.204, 114:4.520, 123:1.693, 125:3.690, 127:2.204, 128:2.710, 133:1.693, 140:3.690, 142:1.105, 167:2.204, 173:5.219, 186:3.117, 191:1.693, 192:3.378, 194:1.693, 195:1.693, 196:2.204, 201:2.394, 204:1.511, 207:1.693, 210:3.833, 224:3.690, 225:2.204, 226:1.693, 241:2.394, 244:1.916, 245:1.693, 249:3.690, 250:3.690, 251:3.117]
Key: 5: Value: 1.0: /uudis5.txt = [18:5.835, 27:2.204, 32:6.904, 35:2.204, 36:1.693, 37:2.710, 38:1.693, 39:1.511, 45:3.690, 49:2.204, 53:3.690, 54:3.690, 58:1.916, 64:4.520, 65:4.421, 68:2.710, 72:1.563, 85:1.511, 89:1.916, 112:2.204, 113:2.204, 129:4.520, 136:2.204, 138:2.204, 142:1.915, 144:3.690, 152:1.511, 153:5.219, 164:2.204, 165:2.204, 166:1.916, 181:2.204, 192:2.137, 194:1.693, 204:2.137, 207:2.394, 209:4.520, 211:3.690, 212:2.204, 213:4.520, 220:1.916, 238:3.690, 247:2.204, 254:3.690, 255:3.690, 256:3.690, 260:3.690]
Key: 6: Value: 1.0: /uudis6.txt = [8:2.204, 19:2.204, 37:4.694, 38:2.394, 40:3.117, 41:3.690, 47:1.916, 61:2.204, 65:2.472, 67:2.204, 68:1.916, 72:1.563, 90:5.219, 91:5.219, 92:1.357, 106:3.690, 107:3.690, 109:2.204, 110:3.690, 130:2.204, 133:3.386, 141:1.916, 142:1.563, 163:3.690, 181:2.204, 183:1.511, 190:1.693, 204:1.511, 212:2.204, 225:2.204, 226:1.693, 227:1.693, 231:2.204, 245:1.693, 253:2.204, 261:1.916]
Key: 9: Value: 1.0: /uudis7.txt = [0:5.219, 1:3.690, 4:4.520, 10:6.904, 20:2.204, 26:3.690, 36:4.789, 39:3.022, 44:3.690, 48:3.690, 52:7.310, 56:2.204, 59:4.520, 65:1.563, 69:2.204, 71:3.817, 72:1.915, 77:3.690, 78:3.690, 84:3.690, 85:3.701, 88:3.690, 89:1.916, 93:3.690, 94:2.204, 98:5.399, 99:3.690, 105:2.204, 118:2.204, 122:3.817, 131:8.655, 132:3.690, 134:3.817, 139:2.204, 142:3.126, 149:1.693, 152:3.701, 154:3.690, 156:3.690, 157:4.520, 161:4.520, 168:3.690, 170:4.520, 175:2.204, 176:4.520, 184:3.117, 187:4.520, 188:3.817, 189:3.117, 190:1.693, 195:2.394, 205:3.690, 206:3.117, 216:3.319, 218:3.690, 223:3.690, 227:3.386, 240:3.690, 245:2.394, 246:1.693, 253:2.204, 257:2.204, 261:1.916, 263:2.204]
Key: 8: Value: 1.0: /uudis8.txt = [2:1.916, 38:2.933, 57:1.916, 58:1.916, 65:2.211, 72:1.105, 87:2.204, 104:3.690, 115:3.690, 119:4.520, 137:2.204, 150:2.394, 166:1.916, 183:1.511, 210:1.916, 217:3.690, 226:2.933, 242:3.690, 246:2.394, 247:2.204, 252:3.690]
Key: 9: Value: 1.0: /uudis9.txt = [2:1.916, 11:1.916, 12:4.520, 14:3.690, 17:2.204, 20:2.204, 31:2.204, 33:3.690, 47:1.916, 56:2.204, 65:2.211, 66:4.520, 69:2.204, 71:2.204, 80:3.690, 118:2.204, 134:2.204, 142:1.563, 166:1.916, 190:1.693, 193:4.520, 197:2.204, 199:1.916, 206:3.117, 207:1.693, 216:2.710, 227:1.693, 258:2.204, 263:2.204]
Count: 10
13/11/28 11:48:05 INFO driver.MahoutDriver: Program took 667 ms (Minutes: 0.011116666666666667)
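
In the dump above, each `Key:` is the ID of the cluster the document was assigned to. For example, /uudis5.txt is the only document under key 5. Here is a small sketch for grouping documents by cluster, assuming the seqdumper output has been saved as text lines in exactly the format shown. The helper name `docs_by_cluster` is hypothetical and the sample vectors are shortened.

```python
import re
from collections import defaultdict

# Matches the start of a seqdumper line such as
# "Key: 8: Value: 1.0: /uudis1.txt = [6:2.204, ...]"
LINE_RE = re.compile(r"^Key: (\d+): Value: [\d.]+: (\S+) = \[")


def docs_by_cluster(lines):
    """Return {cluster_id: [document names]} for all matching lines."""
    clusters = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # log lines and "Count: 10" are skipped
            clusters[int(match.group(1))].append(match.group(2))
    return dict(clusters)


# Shortened sample lines in the format of the output above.
sample_lines = [
    "Key: 8: Value: 1.0: /uudis1.txt = [6:2.204, 8:3.117]",
    "Key: 5: Value: 1.0: /uudis5.txt = [18:5.835, 27:2.204]",
    "Key: 8: Value: 1.0: /uudis2.txt = [24:3.690, 39:1.511]",
    "Count: 10",
]

for cluster_id, docs in sorted(docs_by_cluster(sample_lines).items()):
    print(cluster_id, docs)
```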

 

Links

https://mahout.apache.org/users/basics/creating-vectors-from-text.html

http://mahout.apache.org/users/clustering/k-means-clustering.html

https://mahout.apache.org/users/clustering/cluster-dumper.html

Posted in Machine Learning

What is your image?

Posted on November 21, 2013 by margusja

margusja5

Posted in Machine Learning

DHT11

Posted on November 18, 2013 by margusja

2013-11-18 22.00.17

 

We have a DHT-11 digital temperature and humidity sensor. To read data from it, we have to drive the sensor's data pin with a signal of 1s and 0s of the right length at the right time, after which the sensor returns a 40-bit response.

For most sensors, ready-made libraries already exist for many development platforms, which makes the programmer's life easier.

But if we still want to know what is really going on, that is possible too. As mentioned, this is a DHT-11 sensor, and its datasheet says:

Screen Shot 2013-11-18 at 22.16.49

 

This shows clearly which 5 bytes make up the returned response.
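
Here is a rough sketch of splitting the 40-bit response into its 5 bytes. It assumes the layout usually given in the DHT11 datasheet: humidity integer part, humidity decimal part, temperature integer part, temperature decimal part, then a checksum equal to the low 8 bits of the sum of the first four bytes. The function name `decode_frame` and the sample frame are made up for illustration, and the bits are assumed to be already captured as a string of 0s and 1s.

```python
def decode_frame(bits: str) -> dict:
    """Decode a 40-character string of '0'/'1' into humidity and temperature."""
    if len(bits) != 40 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 40 characters of 0/1")
    # Split into five bytes, most significant bit first.
    data = [int(bits[i:i + 8], 2) for i in range(0, 40, 8)]
    hum_int, hum_dec, temp_int, temp_dec, checksum = data
    # Assumed checksum rule: low 8 bits of the sum of the four data bytes.
    if (hum_int + hum_dec + temp_int + temp_dec) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    return {"humidity": hum_int, "temperature": temp_int}


# Hypothetical frame: 40 % humidity, 23 degrees, checksum 40 + 23 = 63.
frame = "00101000" "00000000" "00010111" "00000000" "00111111"
print(decode_frame(frame))
```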

NewFile1

 

This is the humidity part. Under the crossed-out area on the left is the sensor's response saying it is ready to send data. The humidity information follows, high bit first, so we have to turn it around: 00010100 reversed is 00101000, which is 40 in decimal and is our humidity percentage. Under the cross on the right is the next byte of the response.

NewFile2

The temperature byte. The bytes crossed out on the left and right do not interest us right now. Here too the high bit comes first, that is:

128 = 0; 64 = 0; 32 = 0; 16 = 1; 8 = 0; 4 = 1; 2 = 1; 1 = 1, which adds up to 23 degrees.
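
The same arithmetic can be written out as a minimal sketch, assuming the eight bits have been read off the scope as a string with the high bit first. The name `byte_value` is only illustrative.

```python
# Place values of an 8-bit byte, high bit first.
WEIGHTS = [128, 64, 32, 16, 8, 4, 2, 1]


def byte_value(bits: str) -> int:
    """Sum the weights of every position that holds a '1'."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 characters of 0/1")
    return sum(weight for weight, bit in zip(WEIGHTS, bits) if bit == "1")


temperature = byte_value("00010111")  # 16 + 4 + 2 + 1
print(temperature)
```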

If you now look at a library someone else has written, what it contains will be much clearer.

Posted in Elektroonika
