Problem:
2017-03-24 09:15:55,235 ERROR [dispatcher-event-loop-2] cluster.YarnScheduler: Lost executor 2 on bigdata38.webmedia.int: Container marked as failed: container_e50_1490337980512_0004_01_000003 on host: bigdata38.webmedia.int. Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_e50_1490337980512_0004_01_000003
Exit code: 52
Container exited with a non-zero exit code 52
Exit code 52 comes from org.apache.spark.util.SparkExitCode, where val OOM = 52, i.e. the executor died with an OutOfMemoryError.
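For reference, the constant is defined in Spark's source roughly as follows (a paraphrase of org.apache.spark.util.SparkExitCode, not a verbatim copy):

// Paraphrased from org.apache.spark.util.SparkExitCode (Spark 2.x).
object SparkExitCode {
  /** The default uncaught exception handler was reached. */
  val UNCAUGHT_EXCEPTION = 50
  /** The uncaught exception was an OutOfMemoryError. */
  val OOM = 52
}

So an exit status of 52 on the container simply means the executor JVM itself ran out of memory.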
Problem:
2017-03-24 09:33:49,251 WARN [dispatcher-event-loop-4] cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e50_1490337980512_0006_01_000002 on host: bigdata33.webmedia.int. Exit status: -100. Diagnostics: Container released on a *lost* node
2017-03-24 09:33:46,427 WARN nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(311)) - Directory /hadoop/yarn/local error, used space above threshold of 90.0%, removing from list of valid directories
2017-03-24 09:33:46,427 WARN nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(311)) - Directory /hadoop/yarn/log error, used space above threshold of 90.0%, removing from list of valid directories
2017-03-24 09:33:46,427 INFO nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:logDiskStatus(373)) - Disk(s) failed: 1/1 local-dirs are bad: /hadoop/yarn/local; 1/1 log-dirs are bad: /hadoop/yarn/log
2017-03-24 09:33:46,428 ERROR nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:updateDirsAfterTest(366)) - Most of the disks failed. 1/1 local-dirs are bad: /hadoop/yarn/local; 1/1 log-dirs are bad: /hadoop/yarn/log
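In other words: the NodeManager's disk health checker saw that the local and log directories were above the 90% utilization threshold, marked them bad, and with 1/1 directories bad the whole node went unhealthy, so YARN released its containers (exit status -100, container released on a lost node). Conceptually the check is just used-space percentage versus a threshold; here is a minimal Scala sketch of that idea (the paths and the 90% threshold come from the log above, the code is only illustrative and not YARN's actual DirectoryCollection logic):

import java.io.File

// Illustrative only: mimic the NodeManager's per-directory utilization check.
val threshold = 90.0
val dirs = Seq("/hadoop/yarn/local", "/hadoop/yarn/log")

dirs.foreach { path =>
  val d = new File(path)
  val total = d.getTotalSpace.toDouble
  val usedPct = if (total > 0) (total - d.getUsableSpace) / total * 100 else 0.0
  val status = if (usedPct > threshold) "BAD (above threshold)" else "ok"
  println(f"$path%s: $usedPct%.1f%% used, $status")
}

Freeing space under /hadoop/yarn (or raising yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, if the 90% default is too aggressive for the cluster) brings the directories, and the node, back into service.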
Problem:
2017-03-24 09:40:45,618 WARN [dispatcher-event-loop-9] scheduler.TaskSetManager: Lost task 53.0 in stage 2.2 (TID 440, bigdata38.webmedia.int): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container marked as failed: container_e50_1490337980512_0006_01_000010 on host: bigdata38.webmedia.int. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
The GC overhead limit means the GC has been running almost non-stop, in quick succession, without being able to recover much memory. The only reasons for that are either poorly written code holding a lot of back references (doubtful here, since this is a simple join) or the available memory capacity has genuinely been reached.
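If memory pressure really is the cause, the usual knobs are more executor heap, more YARN memory overhead, and a smaller per-task working set (more shuffle partitions for the join). A minimal sketch with illustrative values only; the app name is made up, and spark.yarn.executor.memoryOverhead is the Spark 2.x-on-YARN property name (value in MB):

import org.apache.spark.sql.SparkSession

// Illustrative values only -- size these against the actual YARN container limits.
val spark = SparkSession.builder()
  .appName("join-job")                                    // hypothetical app name
  .config("spark.executor.memory", "6g")                  // executor heap
  .config("spark.yarn.executor.memoryOverhead", "1024")   // off-heap headroom on YARN, MB
  .config("spark.sql.shuffle.partitions", "400")          // smaller per-task working set
  .getOrCreate()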
Possible problem (if the compaction takes a long time; it should usually be less than 50 ms):
2017-03-24 11:46:41,488 INFO recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(1032)) - Manual compaction at level-0 from (begin) .. (end); will stop at (end)
2017-03-24 11:46:41,489 INFO recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(1032)) - Manual compaction at level-1 from (begin) .. (end); will stop at 'NMTokens/appattempt_1490337980512_0011_000001' @ 10303 : 1
2017-03-24 11:46:41,499 INFO recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(1032)) - Manual compaction at level-1 from 'NMTokens/appattempt_1490337980512_0011_000001' @ 10303 : 1 .. (end); will stop at (end)
2017-03-24 11:46:41,500 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:run(1023)) - Full compaction cycle completed in 20 msec
Related settings:
yarn.resourcemanager.leveldb-state-store.compaction-interval-secs (interval, in seconds, between full compactions of the ResourceManager's leveldb state store)
yarn.timeline-service.leveldb-timeline-store.path (filesystem path of the timeline service's leveldb store)
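A quick way to check whether these compaction cycles ever become a real problem is to scan the NodeManager log for cycles that take much longer than the ~50 ms mentioned above. A minimal Scala sketch; the log path and the 50 ms cut-off are assumptions:

import scala.io.Source

// Flag leveldb full-compaction cycles slower than ~50 ms in a NodeManager log.
// Log path and threshold are illustrative.
val logFile = "/var/log/hadoop-yarn/yarn-nodemanager.log"
val pattern = """Full compaction cycle completed in (\d+) msec""".r

Source.fromFile(logFile).getLines().foreach { line =>
  pattern.findFirstMatchIn(line).foreach { m =>
    val msec = m.group(1).toLong
    if (msec > 50) println(s"slow compaction ($msec ms): $line")
  }
}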
Problem:
ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
Cause: an out-of-memory error (GC overhead limit exceeded):
17/03/31 15:31:12 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-26,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
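To confirm from the executor side that this really is GC thrashing rather than an external kill, GC logging can be turned on for the executors; the container's stdout/stderr will then show back-to-back full GCs right before the OutOfMemoryError. A minimal sketch using spark.executor.extraJavaOptions with Java 8-era GC flags (app name and flag values are illustrative):

import org.apache.spark.SparkConf

// Illustrative: verbose GC logging on executors to confirm GC thrashing
// before the "GC overhead limit exceeded" OutOfMemoryError appears.
val conf = new SparkConf()
  .setAppName("join-job") // hypothetical app name
  .set("spark.executor.extraJavaOptions",
       "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")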