apache hive how to join log files and use sql queries over joined data

Posted on September 22, 2014 - September 22, 2014 by margusja

Let's create two very simple log files. Into log1.txt we'll put example user problem log data, and into log2.txt the corresponding solution log data.

log1.txt:

user1 | 2014-09-23 | error message 1
user2 | 2014-09-23 | error message 2
user3 | 2014-09-23 | error message 3
user4 | 2014-09-23 | error message 1
user5 | 2014-09-23 | error message 2
user6 | 2014-09-23 | error message 12
user7 | 2014-09-23 | error message 11
user1 | 2014-09-24 | error message 1
user2 | 2014-09-24 | error message 2
user3 | 2014-09-24 | error message 3
user4 | 2014-09-24 | error message 10
user1 | 2014-09-24 | error message 17
user2 | 2014-09-24 | error message 13
user1 | 2014-09-24 | error message 1

log2.txt:
user1 | support2 | solution message 1
user2 | support1 | solution message 2
user3 | support2 | solution message 3
user1 | support1 | solution message 4
user2 | support2 | solution message 5
user4 | support1 | solution message 6
user2 | support2 | solution message 7
user5 | support1 | solution message 8
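
Both files need to be in HDFS before loading, since LOAD DATA INPATH moves them from an HDFS path into the Hive warehouse. A minimal sketch for getting them there (assuming the files sit in the current local directory):

hdfs dfs -mkdir -p /user/margusja/hiveinput
hdfs dfs -put log1.txt log2.txt /user/margusja/hiveinput/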

Create two tables for the datasets above:

hive> create table log1 (user STRING, date STRING, error STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

OK

Time taken: 5.968 seconds

hive> LOAD DATA INPATH '/user/margusja/hiveinput/log1.txt' OVERWRITE INTO TABLE log1;

Loading data to table default.log1

rmr: DEPRECATED: Please use 'rm -r' instead.

Moved: 'hdfs://bigdata1.host.int:8020/apps/hive/warehouse/log1' to trash at: hdfs://bigdata1.host.int:8020/user/margusja/.Trash/Current

Table default.log1 stats: [numFiles=1, numRows=0, totalSize=523, rawDataSize=0]

OK

Time taken: 4.687 seconds

hive> create table log2 (user STRING, support STRING, solution STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;

OK

Time taken: 0.997 seconds

hive> LOAD DATA INPATH '/user/margusja/hiveinput/log2.txt' OVERWRITE INTO TABLE log2;

Loading data to table default.log2

rmr: DEPRECATED: Please use 'rm -r' instead.

Moved: 'hdfs://bigdata1.host.int:8020/apps/hive/warehouse/log2' to trash at: hdfs://bigdata1.host.int:8020/user/margusja/.Trash/Current

Table default.log2 stats: [numFiles=1, numRows=0, totalSize=304, rawDataSize=0]

OK

Time taken: 0.72 seconds

hive>

Now let's run SQL over the two data files stored in HDFS, using Hive:

hive> select log1.user, log1.date, log1.error, log2.support, log2.solution from log2 join log1 on (log1.user = log2.user);

And the result. We now see how the two separate log files are joined together; for example, user2 had error message 2 on 2014-09-23 and support2 offered solution message 7.

user1  2014-09-23  error message 1 support2  solution message 1

user1  2014-09-23  error message 1 support1  solution message 4

user2  2014-09-23  error message 2 support1  solution message 2

user2  2014-09-23  error message 2 support2  solution message 5

user2  2014-09-23  error message 2 support2  solution message 7

user3  2014-09-23  error message 3 support2  solution message 3

user4  2014-09-23  error message 1 support1  solution message 6

user5  2014-09-23  error message 2 support1  solution message 8

user1  2014-09-24  error message 1 support2  solution message 1

user1  2014-09-24  error message 1 support1  solution message 4

user2  2014-09-24  error message 2 support1  solution message 2

user2  2014-09-24  error message 2 support2  solution message 5

user2  2014-09-24  error message 2 support2  solution message 7

user3  2014-09-24  error message 3 support2  solution message 3

user4  2014-09-24  error message 10 support1  solution message 6

user1  2014-09-24  error message 17 support2  solution message 1

user1  2014-09-24  error message 17 support1  solution message 4

user2  2014-09-24  error message 13 support1  solution message 2

user2  2014-09-24  error message 13 support2  solution message 5

user2  2014-09-24  error message 13 support2  solution message 7

user1  2014-09-24  error message 1 support2  solution message 1

user1  2014-09-24  error message 1 support1  solution message 4

Time taken: 34.561 seconds, Fetched: 22 row(s)

More cool things:

We can select only a specific user:

hive> select log1.user, log1.date, log1.error, log2.support, log2.solution from log2 join log1 on (log1.user = log2.user) where log1.user like '%user1%';

user1  2014-09-23  error message 1 support2  solution message 1

user1  2014-09-23  error message 1 support1  solution message 4

user1  2014-09-24  error message 1 support2  solution message 1

user1  2014-09-24  error message 1 support1  solution message 4

user1  2014-09-24  error message 17 support2  solution message 1

user1  2014-09-24  error message 17 support1  solution message 4

user1  2014-09-24  error message 1 support2  solution message 1

user1  2014-09-24  error message 1 support1  solution message 4

Time taken: 31.932 seconds, Fetched: 8 row(s)

We can query by date:

hive> select log1.user, log1.date, log1.error, log2.support, log2.solution from log2 join log1 on (log1.user = log2.user) where log1.date like '%2014-09-23%';

user1  2014-09-23  error message 1 support2  solution message 1

user1  2014-09-23  error message 1 support1  solution message 4

user2  2014-09-23  error message 2 support1  solution message 2

user2  2014-09-23  error message 2 support2  solution message 5

user2  2014-09-23  error message 2 support2  solution message 7

user3  2014-09-23  error message 3 support2  solution message 3

user4  2014-09-23  error message 1 support1  solution message 6

user5  2014-09-23  error message 2 support1  solution message 8
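
We could also aggregate over this data. For example (standard HiveQL, not part of the original session), counting how many error lines each user logged per day:

hive> select log1.user, log1.date, count(*) from log1 group by log1.user, log1.date;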

Now let's forward our awesome join query into the next table, log3, where we are going to keep the joined data.

hive> create table log3 (user STRING, date STRING, error STRING, support STRING, solution STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

hive> insert into table log3 select log1.user, log1.date, log1.error, log2.support, log2.solution from log2 join log1 on (log1.user = log2.user);

And now we can use very simple SQL to get the data:

hive> select * from log3;

OK

user1  2014-09-23  error message 1 support2  solution message 1

user1  2014-09-23  error message 1 support1  solution message 4

user2  2014-09-23  error message 2 support1  solution message 2

user2  2014-09-23  error message 2 support2  solution message 5

user2  2014-09-23  error message 2 support2  solution message 7

user3  2014-09-23  error message 3 support2  solution message 3

user4  2014-09-23  error message 1 support1  solution message 6

user5  2014-09-23  error message 2 support1  solution message 8

user1  2014-09-24  error message 1 support2  solution message 1

user1  2014-09-24  error message 1 support1  solution message 4

user2  2014-09-24  error message 2 support1  solution message 2

user2  2014-09-24  error message 2 support2  solution message 5

user2  2014-09-24  error message 2 support2  solution message 7

user3  2014-09-24  error message 3 support2  solution message 3

user4  2014-09-24  error message 10 support1  solution message 6

user1  2014-09-24  error message 17 support2  solution message 1

user1  2014-09-24  error message 17 support1  solution message 4

user2  2014-09-24  error message 13 support1  solution message 2

user2  2014-09-24  error message 13 support2  solution message 5

user2  2014-09-24  error message 13 support2  solution message 7

user1  2014-09-24  error message 1 support2  solution message 1

user1  2014-09-24  error message 1 support1  solution message 4

Time taken: 0.075 seconds, Fetched: 22 row(s)

hive>

Posted in BigData, Hadoop

Driving 220V 10A relay with Atmega328p

Posted on September 13, 2014 by margusja

Posted in Elektroonika

New toy – AR.Drone 2.0

Posted on September 10, 2014 by margusja

[Photo: the AR.Drone 2.0]

Posted in Fun

A weekend of skydiving

Posted on September 7, 2014 by margusja

[Photos: swoop landings, tandem jumps, and an exit]

Posted in Langevarjundus

WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_xxx is : 1

Posted on September 3, 2014 by margusja

2014-09-03 14:34:46,574 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1409734969311_0027_000001 (auth:SIMPLE)

2014-09-03 14:34:46,773 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1409734969311_0027_01_000001 by user margusja

2014-09-03 14:34:46,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1409734969311_0027

2014-09-03 14:34:46,816 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=margusja IP=90.190.106.48 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1409734969311_0027 CONTAINERID=container_1409734969311_0027_01_000001

2014-09-03 14:34:46,817 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1409734969311_0027 transitioned from NEW to INITING

2014-09-03 14:34:46,818 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1409734969311_0027_01_000001 to application application_1409734969311_0027

2014-09-03 14:34:46,828 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1409734969311_0027 transitioned from INITING to RUNNING

2014-09-03 14:34:46,838 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1409734969311_0027_01_000001 transitioned from NEW to LOCALIZING

2014-09-03 14:34:46,838 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1409734969311_0027

2014-09-03 14:34:46,895 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.jar transitioned from INIT to DOWNLOADING

2014-09-03 14:34:46,895 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.splitmetainfo transitioned from INIT to DOWNLOADING


2014-09-03 14:34:46,895 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.split transitioned from INIT to DOWNLOADING

2014-09-03 14:34:46,895 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.xml transitioned from INIT to DOWNLOADING

2014-09-03 14:34:46,895 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1409734969311_0027_01_000001

2014-09-03 14:34:47,105 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-yarn/nm-local-dir/nmPrivate/container_1409734969311_0027_01_000001.tokens. Credentials list: 

2014-09-03 14:34:47,109 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user margusja

2014-09-03 14:34:47,172 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-yarn/nm-local-dir/nmPrivate/container_1409734969311_0027_01_000001.tokens to /tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027/container_1409734969311_0027_01_000001.tokens

2014-09-03 14:34:47,172 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027 = file:/tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027

2014-09-03 14:34:48,150 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.jar(->file:/tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027/filecache/10/job.jar) transitioned from DOWNLOADING to LOCALIZED

2014-09-03 14:34:48,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.splitmetainfo(->file:/tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027/filecache/11/job.splitmetainfo) transitioned from DOWNLOADING to LOCALIZED

2014-09-03 14:34:48,211 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.split(->file:/tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027/filecache/12/job.split) transitioned from DOWNLOADING to LOCALIZED

2014-09-03 14:34:48,249 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://h14.dbweb.ee:8020/user/margusja/.staging/job_1409734969311_0027/job.xml(->file:/tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027/filecache/13/job.xml) transitioned from DOWNLOADING to LOCALIZED

2014-09-03 14:34:48,251 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1409734969311_0027_01_000001 transitioned from LOCALIZING to LOCALIZED

2014-09-03 14:34:48,300 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1409734969311_0027_01_000001 transitioned from LOCALIZED to RUNNING

2014-09-03 14:34:48,310 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /tmp/hadoop-yarn/nm-local-dir/usercache/margusja/appcache/application_1409734969311_0027/container_1409734969311_0027_01_000001/default_container_executor.sh]

2014-09-03 14:34:48,557 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1409734969311_0027_01_000001 is : 1

2014-09-03 14:34:48,559 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1409734969311_0027_01_000001 and exit code: 1

org.apache.hadoop.util.Shell$ExitCodeException: 

at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)

at org.apache.hadoop.util.Shell.run(Shell.java:418)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

The problem:

  • org.apache.hadoop.mapreduce.v2.app.MRAppMaster was missing!

Solution: you need to install:

(1/2): hadoop-client                                                                                                                                                                                                    |  16 kB     00:00     

(2/2): hadoop-mapreduce
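
On this setup that boils down to a single yum command (package names as shown in the transcript above; the exact repository configuration is assumed):

yum install hadoop-client hadoop-mapreduce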

Posted in Hadoop | Tagged Hadoop

Multivariate data detect outliers with mahout and Mahalanobis distance algorithm

Posted on August 14, 2014 by margusja

I am not a mathematician, but one of our projects required finding outliers in a multivariate population. As I understand it, the Mahalanobis distance is one widely used measure for this. Here is an example with a very simple dataset, to show the distance between normal points and an outlier.

My simple dataset is two-dimensional, so we can display it on an x;y graph:

{1;2, 2;4, 3;6, 3;2, 4;8}

Let's put it down on paper:

[Photo: the points sketched on paper]

 

The outlier is clearly visible: 3;2.
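
For reference (the post skips the math), the Mahalanobis distance between points $\mathbf{x}$ and $\mathbf{y}$, given a covariance matrix $S$, is

$$d(\mathbf{x},\mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^{T}\, S^{-1}\, (\mathbf{x}-\mathbf{y})}$$

The setInverseCovarianceMatrix() call in the code below supplies the $S^{-1}$ term directly.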

Now I use the Mahout MahalanobisDistanceMeasure class (org.apache.mahout.common.distance.MahalanobisDistanceMeasure):

 

package com.deciderlab.MahalanobisDistanceMeasure;

import org.apache.mahout.common.distance.MahalanobisDistanceMeasure;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.SparseMatrix;
import org.apache.mahout.math.Vector;

public class DistanceMahalanobisSample {

  public static void main(String[] args) {
    // the five sample points plotted above
    double[][] d = { { 1.0, 2.0 }, { 2.0, 4.0 },
        { 3.0, 6.0 }, { 3.0, 2.0 }, { 4.0, 8.0 } };

    Vector v1 = new RandomAccessSparseVector(2);
    v1.assign(d[0]);
    Vector v2 = new RandomAccessSparseVector(2);
    v2.assign(d[1]);
    Vector v3 = new RandomAccessSparseVector(2);
    v3.assign(d[2]);
    Vector v4 = new RandomAccessSparseVector(2);
    v4.assign(d[3]);
    Vector v5 = new RandomAccessSparseVector(2);
    v5.assign(d[4]);

    // 2x2 matrix handed to the measure as a pre-computed inverse covariance matrix
    Matrix matrix = new SparseMatrix(2, 2);
    matrix.assignRow(0, v1);
    matrix.assignRow(1, v2);

    double distance0;
    double distance1;
    double distance2;

    MahalanobisDistanceMeasure dmM = new MahalanobisDistanceMeasure();
    dmM.setInverseCovarianceMatrix(matrix);
    distance0 = dmM.distance(v2, v1);
    distance1 = dmM.distance(v2, v3);
    distance2 = dmM.distance(v2, v4);
    System.out.println("d0=" + distance0 + " ,d1=" + distance1 + ", d2=" + distance2);
  }
}

Compile it. I use maven to deal with dependencies.
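
The only special dependency is Mahout itself; in pom.xml that is roughly the following (the version is my assumption, the post does not name one):

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.9</version>
    </dependency>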

Run it:

[margusja@vm37 MahalanobisDistanceMeasure]$ hadoop jar /var/www/html/margusja/MahalanobisDistanceMeasure/target/MahalanobisDistanceMeasure-1.0-SNAPSHOT.jar com.deciderlab.MahalanobisDistanceMeasure.DistanceMahalanobisSample

d0=5.0 ,d1=5.0, d2=3.0

So, the distance between v1 (1;2) and v2 (2;4) is 5.0, and the distance between v2 (2;4) and v3 (3;6) is 5.0, but the distance between v2 (2;4) and v4 (3;2) is 3.0. That lets me mark the record (3;2) as an outlier.

 

Posted in Machine Learning

how to start hadoop MRv2

Posted on August 13, 2014 - August 13, 2014 by margusja

[Diagram: YARN architecture]

Since hadoop MRv1 and MRv2 are different, I found a good starting point at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_using-apache-hadoop/content/running_mapreduce_examples_on_yarn.html.

[root@vm37 hadoop-yarn]# yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar 

An example program must be given as the first argument.

Valid program names are:

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

  dbcount: An example job that count the pageview counts from a database.

  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

  grep: A map/reduce program that counts the matches of a regex in the input.

  join: A job that effects a join over sorted, equally partitioned datasets

  multifilewc: A job that counts words from several files.

  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

  randomwriter: A map/reduce program that writes 10GB of random data per node.

  secondarysort: An example defining a secondary sort to the reduce.

  sort: A map/reduce program that sorts the data written by the random writer.

  sudoku: A sudoku solver.

  teragen: Generate data for the terasort

  terasort: Run the terasort

  teravalidate: Checking results of terasort

  wordcount: A map/reduce program that counts the words in the input files.

  wordmean: A map/reduce program that counts the average length of the words in the input files.

  wordmedian: A map/reduce program that counts the median length of the words in the input files.

  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

Now let's take the wordcount example. I have downloaded an example dataset and put it into hadoop fs:

[root@vm37 hadoop-mapreduce]# hdfs dfs -put wc.txt /user/margusja/wc/input/

Now execute the mapreduce job:

[root@vm37 hadoop-mapreduce]# yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar wordcount /user/margusja/wc/input /user/margusja/wc/output

14/08/13 12:34:59 INFO client.RMProxy: Connecting to ResourceManager at vm38.dbweb.ee/192.168.1.72:8032

14/08/13 12:35:00 INFO input.FileInputFormat: Total input paths to process : 1

14/08/13 12:35:01 INFO mapreduce.JobSubmitter: number of splits:1

14/08/13 12:35:01 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class

14/08/13 12:35:01 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class

14/08/13 12:35:01 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name

14/08/13 12:35:01 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class

14/08/13 12:35:01 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir

14/08/13 12:35:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1407837551751_0037

14/08/13 12:35:01 INFO impl.YarnClientImpl: Submitted application application_1407837551751_0037 to ResourceManager at vm38.dbweb.ee/192.168.1.72:8032

14/08/13 12:35:02 INFO mapreduce.Job: The url to track the job: http://vm38.dbweb.ee:8088/proxy/application_1407837551751_0037/

14/08/13 12:35:02 INFO mapreduce.Job: Running job: job_1407837551751_0037

14/08/13 12:35:11 INFO mapreduce.Job: Job job_1407837551751_0037 running in uber mode : false

14/08/13 12:35:11 INFO mapreduce.Job:  map 0% reduce 0%

14/08/13 12:35:21 INFO mapreduce.Job:  map 100% reduce 0%

14/08/13 12:35:31 INFO mapreduce.Job:  map 100% reduce 100%

14/08/13 12:35:31 INFO mapreduce.Job: Job job_1407837551751_0037 completed successfully

14/08/13 12:35:31 INFO mapreduce.Job: Counters: 43

        File System Counters

                FILE: Number of bytes read=167524

                FILE: Number of bytes written=493257

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=384341

                HDFS: Number of bytes written=120766

                HDFS: Number of read operations=6

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=2

        Job Counters 

                Launched map tasks=1

                Launched reduce tasks=1

                Data-local map tasks=1

                Total time spent by all maps in occupied slots (ms)=8033

                Total time spent by all reduces in occupied slots (ms)=7119

        Map-Reduce Framework

                Map input records=9488

                Map output records=67825

                Map output bytes=643386

                Map output materialized bytes=167524

                Input split bytes=134

                Combine input records=67825

                Combine output records=11900

                Reduce input groups=11900

                Reduce shuffle bytes=167524

                Reduce input records=11900

                Reduce output records=11900

                Spilled Records=23800

                Shuffled Maps =1

                Failed Shuffles=0

                Merged Map outputs=1

                GC time elapsed (ms)=172

                CPU time spent (ms)=5880

                Physical memory (bytes) snapshot=443211776

                Virtual memory (bytes) snapshot=1953267712

                Total committed heap usage (bytes)=317194240

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters 

                Bytes Read=384207

        File Output Format Counters 

                Bytes Written=120766

…

And in my cluster UI I can see my submitted job

[Screenshot: the submitted job in the cluster UI]

After the job is finished you can explore the result via the HDFS UI

[Screenshots: the job output files in the HDFS UI]

or you can fetch it to your local dir using the hdfs command line:

[root@vm37 hadoop-mapreduce]# hdfs dfs -get /user/margusja/wc/output/part-r-00000

[root@vm37 hadoop-mapreduce]# head part-r-00000 

”       6

“‘Among 2

“‘And   1

“‘Appen 1

“‘Ce    1

“‘Doigts’       1

“‘E’s   1

“‘Ello, 1

“‘Er    1

“‘Er’s  1

The next step might be digging into the source code:

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/

I am a lazy person, so I use maven to manage my java class dependencies:

[margusja@vm37 ~]$ mvn archetype:generate -DgroupId=com.deciderlab.wordcount -DartifactId=wordcount -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

[INFO] Scanning for projects...

[INFO]

[INFO] ------------------------------------------------------------------------

[INFO] Building Maven Stub Project (No POM) 1

[INFO] ------------------------------------------------------------------------

[INFO]

[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom >>>

[INFO]

[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom <<<

[INFO]

[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---

[INFO] Generating project in Batch mode

[INFO] ----------------------------------------------------------------------------

[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0

[INFO] ----------------------------------------------------------------------------

[INFO] Parameter: groupId, Value: com.deciderlab.wordcount

[INFO] Parameter: packageName, Value: com.deciderlab.wordcount

[INFO] Parameter: package, Value: com.deciderlab.wordcount

[INFO] Parameter: artifactId, Value: wordcount

[INFO] Parameter: basedir, Value: /var/www/html/margusja

[INFO] Parameter: version, Value: 1.0-SNAPSHOT

[INFO] project created from Old (1.x) Archetype in dir: /var/www/html/margusja/wordcount

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 6.382s

[INFO] Finished at: Wed Aug 13 13:08:59 EEST 2014

[INFO] Final Memory: 14M/105M

[INFO] ------------------------------------------------------------------------

Now you can move the WordCount.java source to your src/main/…[whatever your dir is]

I made some changes in WordCount.java:

…

package com.deciderlab.wordcount;

…

Job job = new Job(conf, "Margusja's word count demo");

…
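
For context, the rest of the file is essentially the stock WordCount from the Hadoop examples. A sketch of the whole class with my changes applied (not the exact upstream source):

package com.deciderlab.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // emits (word, 1) for every token in the input line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // sums the counts per word; also reused as the combiner
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Margusja's word count demo");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}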

in pom.xml I added some dependencies

    <dependency>

      <groupId>org.apache.hadoop</groupId>

      <artifactId>hadoop-common</artifactId>

      <version>2.4.1</version>

    </dependency>

    <dependency>

      <groupId>org.apache.hadoop</groupId>

      <artifactId>hadoop-core</artifactId>

      <version>1.2.1</version>

    </dependency>

Now build your jar:

[margusja@vm37 wordcount]$ mvn package

[INFO] Scanning for projects...

[INFO]

[INFO] ------------------------------------------------------------------------

[INFO] Building wordcount 1.0-SNAPSHOT

[INFO] ------------------------------------------------------------------------

[INFO]

[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ WordCount ---

[debug] execute contextualize

[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!

[INFO] skip non existing resourceDirectory /var/www/html/margusja/wordcount/src/main/resources

[INFO]

[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ WordCount ---

[INFO] Nothing to compile - all classes are up to date

[INFO]

[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ WordCount ---

[debug] execute contextualize

[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!

[INFO] skip non existing resourceDirectory /var/www/html/margusja/wordcount/src/test/resources

[INFO]

[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ WordCount ---

[INFO] Nothing to compile - all classes are up to date

[INFO]

[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ WordCount ---

[INFO] Surefire report directory: /var/www/html/margusja/wordcount/target/surefire-reports

-------------------------------------------------------

 T E S T S

-------------------------------------------------------

Running com.deciderlab.wordcount.AppTest

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.052 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO]

[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ WordCount ---

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 4.552s

[INFO] Finished at: Wed Aug 13 13:58:08 EEST 2014

[INFO] Final Memory: 12M/171M

[INFO] ------------------------------------------------------------------------

[margusja@vm37 wordcount]$

Now you are ready to launch your first hadoop MRv2 job:

[margusja@vm37 wordcount]$ hadoop jar /var/www/html/margusja/wordcount/target/WordCount-1.0-SNAPSHOT.jar com.deciderlab.wordcount.WordCount /user/margusja/wc/input /user/margusja/wc/output4

…

14/08/13 14:01:38 INFO mapreduce.Job: Running job: job_1407837551751_0040

14/08/13 14:01:47 INFO mapreduce.Job: Job job_1407837551751_0040 running in uber mode : false

14/08/13 14:01:47 INFO mapreduce.Job:  map 0% reduce 0%

14/08/13 14:01:58 INFO mapreduce.Job:  map 100% reduce 0%

14/08/13 14:02:07 INFO mapreduce.Job:  map 100% reduce 100%

14/08/13 14:02:08 INFO mapreduce.Job: Job job_1407837551751_0040 completed successfully

14/08/13 14:02:08 INFO mapreduce.Job: Counters: 43

        File System Counters

                FILE: Number of bytes read=167524

                FILE: Number of bytes written=493091

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=384341

                HDFS: Number of bytes written=120766

                HDFS: Number of read operations=6

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=2

        Job Counters 

                Launched map tasks=1

                Launched reduce tasks=1

                Data-local map tasks=1

                Total time spent by all maps in occupied slots (ms)=7749

                Total time spent by all reduces in occupied slots (ms)=6591

        Map-Reduce Framework

                Map input records=9488

                Map output records=67825

                Map output bytes=643386

                Map output materialized bytes=167524

                Input split bytes=134

                Combine input records=67825

                Combine output records=11900

                Reduce input groups=11900

                Reduce shuffle bytes=167524

                Reduce input records=11900

                Reduce output records=11900

                Spilled Records=23800

                Shuffled Maps =1

                Failed Shuffles=0

                Merged Map outputs=1

                GC time elapsed (ms)=114

                CPU time spent (ms)=6020

                Physical memory (bytes) snapshot=430088192

                Virtual memory (bytes) snapshot=1945890816

                Total committed heap usage (bytes)=317194240

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters 

                Bytes Read=384207

        File Output Format Counters 

                Bytes Written=120766

You can examine your running jobs

[Screenshot: the running job in the cluster UI]

And the result in the HDFS UI:

[Screenshot: the result files in the HDFS UI]

Posted in BigData

Wireless temperature measure system

Posted on August 8, 2014 - August 8, 2014 by margusja

The main components are an atmega328p microcontroller, RFM12B radio modules, and a DS18S20 for temperature measurement.

Temperature sender unit

[Photo: the temperature sender unit]

Sender code – github

Senders can be powered using USB power converters or a battery pack.
The display unit gets its power from any USB port.

 

The display unit uses an LCD plug (MCP23008) between the atmega328p and the LCD.

[Photos: the display unit]

Then I thought it might be cool to collect all the sensors' data.

So I built a network module that collects the same data the display module gets, but sends it to my zabbix server.

The network module is built around an ENC28J60 Ethernet controller.
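
A handy way to test the Zabbix side is to push a value from a shell with zabbix_sender (server name, host name, and item key here are hypothetical):

zabbix_sender -z zabbix.example.com -s temp-sensor-1 -k temperature.outdoor -o 21.5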

[Photos: the network module]

So now I can see cool graphs

[Screenshot: temperature graphs in Zabbix]

Posted in Elektroonika

In case the RStudio repository fails

Posted on August 5, 2014 by margusja

> install.packages(c("maps", "mapproj"))
Warning in install.packages :
cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
unable to access index for repository http://mirrors.webhostinggeeks.com/cran/bin/macosx/contrib/3.1
Warning in install.packages :
cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
unable to access index for repository http://mirrors.webhostinggeeks.com/cran/bin/macosx/contrib/3.1
Warning in install.packages :
packages 'maps', 'mapproj' are not available (for R version 3.1.1)
> options(repos = c(CRAN = "http://cran.rstudio.com"))
> install.packages(c("maps", "mapproj"))
trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.1/maps_2.3-7.tgz'
Content type 'application/x-gzip' length 2061757 bytes (2.0 Mb)
opened URL
==================================================
downloaded 2.0 Mb

trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.1/mapproj_1.2-2.tgz'
Content type 'application/x-gzip' length 68904 bytes (67 Kb)
opened URL
==================================================
downloaded 67 Kb

The downloaded binary packages are in
/var/folders/vm/5pggdh2x3_s_l6z55brtql3h0000gn/T//Rtmpty6Yxo/downloaded_packages
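
To make that repository setting stick across sessions, the same option can go into ~/.Rprofile (my addition, not from the original session):

# in ~/.Rprofile: always use the RStudio CRAN mirror
options(repos = c(CRAN = "http://cran.rstudio.com"))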

Posted in R

Summer

Posted on August 4, 2014 by margusja

[Photo: summer]

Posted in Fun
