Apache Spark – some hints

  • Stages – pipelined transformations RDD -> RDD -> RDD (narrow dependencies)
  • Shuffle – the transfer of data between stages (wide dependencies)
  • Debug – to visualise how an RDD is built, use input.toDebugString (where input is an RDD)
  • Cache expensive RDDs after a shuffle
  • Use accumulators (counters updated inside executors) to debug RDDs – values are visible in the UI
  • Pipeline as much as possible – rdd -> map -> filter runs as one stage
  • Split into stages where you need to reorganise (repartition) the data
  • Avoid shuffling large amounts of data
  • Partitions: roughly 2 x the number of cores in the cluster
  • Task duration – as a rule of thumb, a task should not take longer than ~100 ms
  • Memory problems – check dmesg for the oom-killer
  • Use the built-in aggregateByKey instead of writing your own aggregation; avoid groupByKey
  • Filter as early as you can
  • Use KryoSerializer
  • SSD disks for the YARN local dirs (shuffle is faster)
  • Use the high-level APIs (DataFrame for core processing)
  • rdd.reduceByKey(func) is better than rdd.groupByKey() followed by a reduce (see the sketch after this list)
  • Use data.join(...).explain()

    RDD.distinct – Shuffles!

  • Learning Spark (e-book)
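
A minimal Scala sketch tying a few of these hints together – pipelining map/filter into a single stage, reduceByKey instead of groupByKey, caching the post-shuffle RDD, toDebugString and a named accumulator. The input path and record format are made up for illustration:

import org.apache.spark.sql.SparkSession

object SparkHintsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hints-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: text lines of the form "word count"
    val input = sc.textFile("hdfs:///tmp/words.txt")

    val badLines = sc.longAccumulator("badLines") // named accumulators show up in the Spark UI

    // map -> filter -> map pipelines into one narrow stage
    val pairs = input
      .map(_.split("\\s+"))
      .filter { fields =>
        val ok = fields.length == 2
        if (!ok) badLines.add(1) // count malformed records inside the executors
        ok
      }
      .map(fields => (fields(0), fields(1).toLong))

    // reduceByKey combines map-side before the shuffle (better than groupByKey + reduce)
    val counts = pairs.reduceByKey(_ + _).cache() // cache the expensive post-shuffle RDD

    println(counts.toDebugString) // shows the lineage and where the stages split
    println(s"distinct words: ${counts.count()}, bad lines: ${badLines.value}")

    spark.stop()
  }
}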

Apache Spark 2.x + YARN – some errors and solutions

Problem:
2017-03-24 09:15:55,235 ERROR [dispatcher-event-loop-2] cluster.YarnScheduler: Lost executor 2 on bigdata38.webmedia.int: Container marked as failed: container_e50_1490337980512_0004_01_000003 on host: bigdata38.webmedia.int. Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_e50_1490337980512_0004_01_000003
Exit code: 52
Container exited with a non-zero exit code 52
Exit code 52 comes from org.apache.spark.util.SparkExitCode, where val OOM=52 – i.e. the executor died with an OutOfMemoryError.
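
If that happens, one mitigation is simply to give the executors more memory. A minimal Scala sketch, assuming a Spark 2.x job built around a SparkSession; the 4g and 512 values are made-up examples, not recommendations:

import org.apache.spark.sql.SparkSession

// Example values only – size these to the actual workload and cluster
val spark = SparkSession.builder()
  .appName("oom-tuning-sketch")
  .config("spark.executor.memory", "4g")                // executor heap
  .config("spark.yarn.executor.memoryOverhead", "512")  // extra off-heap MB for the YARN container (Spark 2.0/2.1 property name)
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

The same settings can also be passed on the spark-submit command line with --conf.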

Problem:

2017-03-24 09:33:49,251 WARN  [dispatcher-event-loop-4] cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e50_1490337980512_0006_01_000002 on host: bigdata33.webmedia.int. Exit status: -100. Diagnostics: Container released on a *lost* node

2017-03-24 09:33:46,427 WARN  nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(311)) – Directory /hadoop/yarn/local error, used space above threshold of 90.0%, removing from list of valid directories

2017-03-24 09:33:46,427 WARN  nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(311)) – Directory /hadoop/yarn/log error, used space above threshold of 90.0%, removing from list of valid directories

2017-03-24 09:33:46,427 INFO  nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:logDiskStatus(373)) – Disk(s) failed: 1/1 local-dirs are bad: /hadoop/yarn/local; 1/1 log-dirs are bad: /hadoop/yarn/log

2017-03-24 09:33:46,428 ERROR nodemanager.LocalDirsHandlerService (LocalDirsHandlerService.java:updateDirsAfterTest(366)) – Most of the disks failed. 1/1 local-dirs are bad: /hadoop/yarn/local; 1/1 log-dirs are bad: /hadoop/yarn/log

 

Problem:

2017-03-24 09:40:45,618 WARN  [dispatcher-event-loop-9] scheduler.TaskSetManager: Lost task 53.0 in stage 2.2 (TID 440, bigdata38.webmedia.int): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Container marked as failed: container_e50_1490337980512_0006_01_000010 on host: bigdata38.webmedia.int. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143

Container exited with a non-zero exit code 143

The GC overhead limit means the GC has been running almost non-stop, but has not been able to recover much memory. The usual causes are either poorly written code that keeps a lot of back references alive (doubtful here, since this is a simple join), or simply that the memory capacity has been reached.
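
One way to reduce per-task memory pressure here is to use more (and therefore smaller) partitions and to aggregate with reduceByKey or aggregateByKey, so values are combined map-side before the shuffle. A sketch in Scala, assuming an existing events: RDD[(String, Long)]; the partition count of 400 is an arbitrary example:

// `events: RDD[(String, Long)]` is assumed to exist; 400 is just an example value
val repartitioned = events.repartition(400)   // more, smaller tasks -> less memory per task

// sum per key with map-side combining (preferable to groupByKey followed by a reduce)
val sums = repartitioned.reduceByKey(_ + _)

// (sum, count) per key with aggregateByKey, for when the result type differs from the value type
val stats = repartitioned.aggregateByKey((0L, 0L))(
  (acc, v) => (acc._1 + v, acc._2 + 1),   // fold a value into the per-partition accumulator
  (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge accumulators across partitions
)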

Possible problem (if the compaction takes a long time – usually it should take less than 50 ms):

2017-03-24 11:46:41,488 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(1032)) – Manual compaction at level-0 from (begin) .. (end); will stop at (end)

2017-03-24 11:46:41,489 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(1032)) – Manual compaction at level-1 from (begin) .. (end); will stop at ‘NMTokens/appattempt_1490337980512_0011_000001’ @ 10303 : 1

2017-03-24 11:46:41,499 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(1032)) – Manual compaction at level-1 from ‘NMTokens/appattempt_1490337980512_0011_000001’ @ 10303 : 1 .. (end); will stop at (end)

2017-03-24 11:46:41,500 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:run(1023)) – Full compaction cycle completed in 20 msec

The relevant configuration properties are:

yarn.resourcemanager.leveldb-state-store.compaction-interval-secs

yarn.timeline-service.leveldb-timeline-store.path

Problem:

ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

This was again an out-of-memory error:

17/03/31 15:31:12 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-26,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded

A nice picture with Krissu

It is rare to have such a nice picture taken of yourself. It would be a sin to keep it only to myself.

Hadoop Object Storage – Ozone

https://wiki.apache.org/hadoop/Ozone

Downloaded the latest Hadoop development source (hadoop-3.0.0-alpha2), switched to the HDFS-7240 branch where Ozone development is taking place, and built it – success.

 

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ ./bin/hdfs
Usage: hdfs [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]

OPTIONS is none or any of:

--buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--daemon (start|status|stop) operate on a daemon
--debug turn on shell script debug mode
--help usage information
--hostnames list[,of,host,names] hosts to use in worker mode
--hosts filename list of hosts to use in worker mode
--loglevel level set the log4j level for this command
--workers turn on worker mode

SUBCOMMAND is one of:

balancer run a cluster balancing utility
cacheadmin configure the HDFS cache
classpath prints the class path needed to get the hadoop jar and the required libraries
crypto configure HDFS encryption zones
datanode run a DFS datanode
debug run a Debug Admin to execute HDFS debug commands
dfsadmin run a DFS admin client
dfs run a filesystem command on the file system
diskbalancer Distributes data evenly among disks on a given node
envvars display computed Hadoop environment variables
erasurecode run a HDFS ErasureCoding CLI
fetchdt fetch a delegation token from the NameNode
fsck run a DFS filesystem checking utility
getconf get config values from configuration
groups get the groups which users belong to
haadmin run a DFS HA admin client
jmxget get JMX exported values from NameNode or DataNode.
journalnode run the DFS journalnode
lsSnapshottableDir list all snapshottable dirs owned by the current user
mover run a utility to move block replicas across storage types
namenode run the DFS namenode
nfs3 run an NFS version 3 gateway
oev apply the offline edits viewer to an edits file
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to a legacy fsimage
oz command line interface for ozone
portmap run a portmap service
scm run the Storage Container Manager service
secondarynamenode run the DFS secondary namenode
snapshotDiff diff two snapshots of a directory or diff the current directory contents with a snapshot
storagepolicies list/get/set block storage policies
version print the version
zkfc run the ZK Failover Controller daemon

 

As you can see, the new subcommands oz and scm are there.

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ bin/hdfs oz
ERROR: oz is not COMMAND nor fully qualified CLASSNAME.

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ bin/hdfs scm
Error: Could not find or load main class

No luck. I was out of ideas, so I wrote to the Hadoop users list. No answers. After that I tried the Hadoop developers list and got help:

Hi Margus,

It looks like there might have been some error when merging trunk into HDFS-7240, which mistakenly
changed some entries in hdfs script. Thanks for the catch!

We will update the branch to fix it. In the meantime, as a quick fix, you can apply the attached
patch file and re-compile, OR do the following manually:

1. open hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
2. between
oiv_legacy)
       HADOOP_CLASSNAME=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer
     ;;
 and
portmap)
       HADOOP_SUBCMD_SUPPORTDAEMONIZATION="true"
       HADOOP_CLASSNAME=org.apache.hadoop.portmap.Portmap
     ;;
add
oz) 
    HADOOP_CLASSNAME=org.apache.hadoop.ozone.web.ozShell.Shell 
;;
3. change this line
CLASS='org.apache.hadoop.ozone.storage.StorageContainerManager'
to
HADOOP_CLASSNAME='org.apache.hadoop.ozone.storage.StorageContainerManager'
4. re-compile.


I rebuilt it, and that fixed the problem.

Let's try to play with the new toy.

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ ./bin/hdfs oz -v -createVolume http://127.0.0.1:9864/margusja -user ozone -quota 10GB -root
Volume name : margusja
{
 "owner" : {
 "name" : "ozone"
 },
 "quota" : {
 "unit" : "GB",
 "size" : 10
 },
 "volumeName" : "margusja",
 "createdOn" : "Fri, 03 Feb 2017 10:13:39 GMT",
 "createdBy" : "hdfs"
}

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ ./bin/hdfs oz -createBucket http://127.0.0.1:9864/margusja/demo -user ozone -v
Volume Name : margusja
Bucket Name : demo
{
 "volumeName" : "margusja",
 "bucketName" : "demo",
 "acls" : null,
 "versioning" : "DISABLED",
 "storageType" : "DISK"
}

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ ./bin/hdfs oz -v -putKey http://127.0.0.1:9864/margusja/demo/key001 -file margusja.txt
Volume Name : margusja
Bucket Name : demo
Key Name : key001
File Hash : 4273b3664fcf8bd89fd2b6d25cdf64ae


[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ ./bin/hdfs oz -v -putKey http://127.0.0.1:9864/margusja/demo/key002 -file margusja2.txt
Volume Name : margusja
Bucket Name : demo
Key Name : key002

[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$ ./bin/hdfs oz -v -listKey http://127.0.0.1:9864/margusja/demo/
Volume Name : margusja
bucket Name : demo
{
 "version" : 0,
 "md5hash" : "4273b3664fcf8bd89fd2b6d25cdf64ae",
 "createdOn" : "Fri, 03 Feb 2017 12:25:43 +0200",
 "size" : 21,
 "keyName" : "key001"
}
{
 "version" : 0,
 "md5hash" : "4273b3664fcf8bd89fd2b6d25cdf64ae",
 "createdOn" : "Fri, 03 Feb 2017 12:26:14 +0200",
 "size" : 21,
 "keyName" : "key002"
}
[ozone@bigdata24 hadoop-3.0.0-alpha2-SNAPSHOT]$


To compare with a filesystem: we created the directory /margusja, then the subdirectory /margusja/demo, and finally added two files under /margusja/demo/.
So the picture is something like:

/margusja (volume)
/margusja/demo (bucket)
/margusja/demo/margusja.txt (key001)
/margusja/demo/margusja2.txt (key002)

Sonoff Pow to Sonoff-MQTT-OTA-Arduino

The Chinese have come out with quite an affordable piece of kit – https://www.itead.cc/sonoff-pow.html. It is a relay (230 V / 16 A) switchable over WiFi – enough for controlling most single-phase gadgets in a household.

[image: 2016-11-27-10-24-33]

If you take the device apart (for the question "why would you?", look for the answer in "Hackers: Heroes of the Computer Revolution" by S. Levy), you will find an interesting port inside:

[image: 2016-11-27-11-05-31]

Besides GND and VDD, the header also carries serial RX and TX.

Nature abhors a vacuum: on GitHub there is the project https://github.com/arendst/Sonoff-MQTT-OTA-Arduino. Thanks to Ull (alias Märt Maiste), who helped me put these two things together.

The rest is simple: download the GitHub project, build it and flash it onto the device. Since I did not have a working FTDI board at hand, an Arduino board saved the day.

[image: 2016-11-27-10-34-08]

[image: screen-shot-2016-11-27-at-10-38-44]

If you now plug the device into the mains and do the rest of the configuration, it should get an IP address from the home DHCP server, and opening that IP in a browser should show a page like this:

[image: screen-shot-2016-11-27-at-11-21-27]

The device supports the MQTT protocol, which provides a much-needed layer between the hardware and the software.

I installed the mosquitto MQTT server on a Raspberry Pi (thanks to Ull for the tip). Now it is possible to listen to the device's status with an MQTT subscribe command – for example whether it is switched on, the voltage, the current consumption and much more. All of this is also available through the web interface.

[image: screen-shot-2016-11-27-at-11-26-29]

[image: screen-shot-2016-11-27-at-11-29-59]

If you now forward a WAN port to port 22 on the Raspberry Pi, you can (provided the internet connection is up and the home LAN also works) control your gadgets remotely.

[image: screen-shot-2016-11-27-at-11-36-25]

In addition, this whole setup should fit together with the OpenHub project.

Basys2 with external clock

Recently I discovered that the internal built-in clock is quite unstable. A simple stopwatch built on it was quite imprecise.

I added an external clock (25 MHz) and the picture is much better.

[image: 20161023_115610]

(Arduino -> Atmega328P) + RTC + MCP23008 + LCD = clock

I got the urge to build a clock.

From before I already had a 16x4 LCD, an RTC and an Arduino R3.

From Ull I had acquired an LCD expander – https://taaralabs.eu/lcd-plug/

The RTC provides the actual timekeeping: if the power is removed, it keeps ticking and keeps the time correct.

The MCP23008 frees up a number of Arduino pins; otherwise there would probably be nowhere to connect the RTC.

For now it is a sprawling prototype like this – who knows whether it will ever get any further.

[image: 2016-01-26 22.09.30]

The code part needs heavy tuning; at the moment I am rather ashamed of it.

For example, the day of the week is off by one day.

The first version of truly bad code – https://github.com/margusja/ArduinoRTC

 

Part 2

Since burying a whole Arduino board under a single clock is an excessive luxury for me, I decided to hand this part over to a separate Atmega328P. The original plan was to go without an external clock, but for now it runs with an external 16 MHz one.

I connected the ISP to the Atmega328P:

[image: icsp_hookup]

Source: http://upvector.com/atmega/

Mine ended up looking like this. My programmer is an AVRISP mkII.

[image: 2016-01-28 21.55.22]

And oh, the joy:

margusja@IRack:~/Documents/Arduino/hardware/breadboard/avr$ avrdude -c avrispmkII -v -p ATMEGA328P -P usb

avrdude: Version 5.11.1, compiled on Feb 12 2013 at 01:24:54
Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
Copyright (c) 2007-2009 Joerg Wunsch

System wide configuration file is “/usr/local/CrossPack-AVR-20130212/etc/avrdude.conf”
User configuration file is “/Users/margusja/.avrduderc”
User configuration file does not exist or is not a regular file, skipping

Using Port : usb
Using Programmer : avrispmkII
avrdude: usbdev_open(): Found AVRISP mkII, serno: 000200133546
AVR Part : ATMEGA328P
Chip Erase delay : 9000 us
PAGEL : PD7
BS2 : PC2
RESET disposition : dedicated
RETRY pulse : SCK
serial program mode : yes
parallel program mode : yes
Timeout : 200
StabDelay : 100
CmdexeDelay : 25
SyncLoops : 32
ByteDelay : 0
PollIndex : 3
PollValue : 0x53
Memory Detail :

Block Poll Page Polled
Memory Type Mode Delay Size Indx Paged Size Size #Pages MinW MaxW ReadBack
———– —- —– —– —- —— —— —- —— —– —– ———
eeprom 65 20 4 0 no 1024 4 0 3600 3600 0xff 0xff
flash 65 6 128 0 yes 32768 128 256 4500 4500 0xff 0xff
lfuse 0 0 0 0 no 1 0 0 4500 4500 0x00 0x00
hfuse 0 0 0 0 no 1 0 0 4500 4500 0x00 0x00
efuse 0 0 0 0 no 1 0 0 4500 4500 0x00 0x00
lock 0 0 0 0 no 1 0 0 4500 4500 0x00 0x00
calibration 0 0 0 0 no 1 0 0 0 0 0x00 0x00
signature 0 0 0 0 no 3 0 0 0 0 0x00 0x00

Programmer Type : STK500V2
Description : Atmel AVR ISP mkII
Programmer Model: AVRISP mkII
Hardware Version: 1
Firmware Version Master : 1.23
Vtarget : 5.1 V
SCK period : 8.00 us

avrdude: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.01s

avrdude: Device signature = 0x1e950f
avrdude: safemode: lfuse reads as FF
avrdude: safemode: hfuse reads as DE
avrdude: safemode: efuse reads as 5

avrdude: safemode: lfuse reads as FF
avrdude: safemode: hfuse reads as DE
avrdude: safemode: efuse reads as 5
avrdude: safemode: Fuses OK

avrdude done. Thank you.

Part 3

After moving the wires around and programming the new chip, the picture is still a mess, but the Arduino is out of the loop now.

[image: 2016-01-29 20.00.32]

Since I plan to change the chip's lock bits and fuses, it seems like the right time to write down the current settings:

avrdude -c avrispmkII -v -p ATMEGA328P -P usb -U lfuse:r:-:h -U hfuse:r:-:h -U efuse:r:-:h

Reading | ################################################## | 100% 0.00s

avrdude: writing output file “<stdout>”
0xff
avrdude: reading hfuse memory:

Reading | ################################################## | 100% 0.00s

avrdude: writing output file “<stdout>”
0xda
avrdude: reading efuse memory:

Reading | ################################################## | 100% 0.00s

avrdude: writing output file “<stdout>”
0x5

avrdude: safemode: lfuse reads as FF
avrdude: safemode: hfuse reads as DA
avrdude: safemode: efuse reads as 5
avrdude: safemode: Fuses OK

And no, I do not calculate these in my head – http://www.engbedded.com/fusecalc/

Part 4.5 – To simplify things, I switched to the internal clock (avrdude: safemode: lfuse reads as C2).

Now the prototype is as simple as possible.

bash shortcuts

I just cannot remember these, so here is a little cheat sheet for myself.

Moving the cursor:

  Ctrl + a   Go to the beginning of the line (Home)
  Ctrl + e   Go to the End of the line (End)
  Ctrl + p   Previous command (Up arrow)
  Ctrl + n   Next command (Down arrow)
   Alt + b   Back (left) one word      or use Option+Left-Arrow
   Alt + f   Forward (right) one word  or use Option+Right-Arrow
  Ctrl + f   Forward one character
  Ctrl + b   Backward one character
  Ctrl + xx  Toggle between the start of line and current cursor position

Editing:

 Ctrl + L   Clear the Screen, similar to the clear command

  Alt + Del Delete the Word before the cursor.
  Alt + d   Delete the Word after the cursor.
 Ctrl + d   Delete character under the cursor
 Ctrl + h   Delete character before the cursor (backspace)

 Ctrl + w   Cut the Word before the cursor to the clipboard.
 Ctrl + k   Cut the Line after the cursor to the clipboard.
 Ctrl + u   Cut/delete the Line before the cursor position.

  Alt + t   Swap current word with previous
 Ctrl + t   Swap the last two characters before the cursor (typo).
 Esc  + t   Swap the last two words before the cursor.

 ctrl + y   Paste the last thing to be cut (yank)
  Alt + u   UPPER capitalize every character from the cursor to the end of the current word.
  Alt + l   Lower the case of every character from the cursor to the end of the current word.
  Alt + c   Capitalize the character under the cursor and move to the end of the word.
  Alt + r   Cancel the changes and put back the line as it was in the history (revert).
 ctrl + _   Undo
 
  TAB       Tab completion for file/directory names

For example, to move to the directory 'sample1': type cd sam, then press TAB and ENTER.
Type just enough characters to uniquely identify the directory you wish to open.

History:

  Ctrl + r   Recall the last command including the specified character(s);
             incrementally searches the command history as you type.
  Ctrl + p   Previous command in history (i.e. walk back through the command history)
  Ctrl + n   Next command in history (i.e. walk forward through the command history)
   Alt + .   Use the last word of the previous command
  Ctrl + s   Search the history forward (the counterpart of Ctrl+r).
             (Beware: in a terminal this also sends XOFF and freezes output
             unless flow control is disabled, e.g. with stty -ixon.)
  Ctrl + o   Execute the command found via Ctrl+r or Ctrl+s
  Ctrl + g   Escape from history searching mode

Process control:

 Ctrl + C   Interrupt/Kill whatever you are running (SIGINT)
 Ctrl + l   Clear the screen
 Ctrl + s   Stop output to the screen (for long running verbose commands)
 Ctrl + q   Allow output to the screen (if previously stopped using command above)
 Ctrl + D   Send an EOF marker; unless disabled by an option, this will close the current shell (EXIT)
 Ctrl + Z   Send the signal SIGTSTP to the current task, which suspends it.
            To return to it later enter fg 'process name' (foreground).