Margus Roo – – Page 19 – If you're inventing and pioneering, you have to be willing to be misunderstood for long periods of time

Spark

Posted on May 21, 2014 - May 21, 2014 by margusja

Alongside Hadoop MapReduce, Spark is a very capable alternative for running parallel computations.

Below are some steps for setting up a Spark standalone cluster.

I am using the newest binary package at the moment, which also includes Hadoop 2 MapReduce support. Spark does support MapReduce2, but for now we will stick with Spark's own standalone mode.

I have three physical servers available: vm37, vm38 and vm24. I pick vm37 as the so-called master server, which in the Spark context is also referred to as the driver.

I download the latest version at the time of writing – http://d3kbcqa49mib13.cloudfront.net/spark-0.9.1-bin-hadoop2.tgz – into the /opt/ directory on vm37 and unpack it.

I repeat the same on all slave servers: download the same package and unpack it into the same location, /opt.
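A minimal sketch of the download-and-unpack step as a shell function. The function name install_spark is made up for illustration; the URL and the /opt target are the ones from this post.

```shell
SPARK_URL="http://d3kbcqa49mib13.cloudfront.net/spark-0.9.1-bin-hadoop2.tgz"

# Download the archive into <dest> (default /opt) and unpack it there.
install_spark() {
  dest="${1:-/opt}"
  wget -P "$dest" "$SPARK_URL" &&
    tar -xzf "$dest/$(basename "$SPARK_URL")" -C "$dest"
}

# Usage, run once on each of vm37, vm38 and vm24:
# install_spark /opt
```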

The master (vm37) must have passwordless SSH access to the slave servers. SSH key-based access helps here.
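One possible way to set up the key-based access, assuming OpenSSH on all machines and the same user account on the master and the slaves. The function name is hypothetical; ssh-keygen and ssh-copy-id are the standard OpenSSH tools.

```shell
# Run on the master (vm37) as the user that will start the cluster.
# Creates an RSA key pair without a passphrase if none exists yet,
# then installs the public key on every host given as an argument.
setup_passwordless_ssh() {
  [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  for host in "$@"; do
    ssh-copy-id "$host"
  done
}

# Usage with the two slaves from this post:
# setup_passwordless_ssh vm38 vm24
```

Afterwards `ssh vm38` and `ssh vm24` from vm37 should log in without asking for a password.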

cd /opt/spark-0.9.1-bin-hadoop2

I configure the slaves: vim conf/slaves – add each slave on its own line.
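For this setup the file simply lists vm38 and vm24, one per line. As a sketch, the same file can be written without an editor; write_slaves is a made-up helper name.

```shell
# Write one worker hostname per line into <spark_home>/conf/slaves.
write_slaves() {
  spark_home="$1"; shift
  mkdir -p "$spark_home/conf"
  printf '%s\n' "$@" > "$spark_home/conf/slaves"
}

# Usage on the master (vm37):
# write_slaves /opt/spark-0.9.1-bin-hadoop2 vm38 vm24
```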

I start the cluster: ./sbin/start-all.sh

If everything went well, a web interface should now be available on the master server at vm37:8081

Spark Master GUI

Using the spark-shell command line, let's run a simple computation session:

SPARK CLI

The session information should also show up in the GUI:

A more detailed view:

Let's load a file and count how many words it contains:

CLI count words
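The screenshot is not reproduced here, so the following is only a sketch of what such a word count could look like, not the exact session. It feeds two lines of Scala to spark-shell on standard input. The helper name count_words, the master URL spark://vm37:7077 (the usual default port) and the input path are assumptions.

```shell
# Count the words of a text file on the standalone cluster.
# $1 = Spark install directory
# $2 = master URL
# $3 = path of a file that every worker can read
count_words() {
  MASTER="$2" "$1/bin/spark-shell" <<EOF
val words = sc.textFile("$3").flatMap(line => line.split(" "))
println("word count: " + words.count())
EOF
}

# Usage (assumes README.md was unpacked to the same path on every server):
# count_words /opt/spark-0.9.1-bin-hadoop2 spark://vm37:7077 /opt/spark-0.9.1-bin-hadoop2/README.md
```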

You can see that two servers, vm24 and vm38, were used for the job. Information about this job is also available in the GUI:

Result 1

Result2

This was a very trivial example. Spark also supports mathematical and machine-learning computations through MLlib.

For processing data in real time you can use Spark Streaming. For example, you can read output streams from a queueing system such as Apache Kafka or Apache Flume, analyze them and store the results in HBase, the database that runs on top of HDFS.

Apache-kafka and Atmel 328p + enc28j60

Posted on May 14, 2014 by margusja

I put together some simple hardware:

atmega328p – runs the program

enc28j60 – Ethernet

and programmed it with the following Arduino C++ code:

…

// Demo using DHCP and DNS to perform a web client request.
// 2011-06-08 <jc@wippler.nl> http://opensource.org/licenses/mit-license.php

#include <EtherCard.h>

// ethernet interface mac address, must be unique on the LAN
static byte mymac[] = { 0x74,0x69,0x69,0x2D,0x30,0x31 };

byte Ethernet::buffer[700];
static uint32_t timer;
Stash stash;

// PROGMEM data must be const with current avr-gcc
const char website[] PROGMEM = "vm37.dbweb.ee";
#define PATH ""
#define VARIABLE "test"

// called when the client request is complete
static void my_callback (byte status, word off, word len) {
  Serial.println(">>>");
  Ethernet::buffer[off+300] = 0;
  Serial.print((const char*) Ethernet::buffer + off);
  Serial.println("...");
}

void setup () {
  Serial.begin(57600);
  Serial.println("\n[webClient]");

  if (ether.begin(sizeof Ethernet::buffer, mymac) == 0)
    Serial.println("Failed to access Ethernet controller");
  if (!ether.dhcpSetup())
    Serial.println("DHCP failed");
  ether.hisport = 8080;
  ether.printIp("IP: ", ether.myip);
  ether.printIp("GW: ", ether.gwip);
  ether.printIp("DNS: ", ether.dnsip);

  if (!ether.dnsLookup(website))
    Serial.println("DNS failed");

  ether.printIp("SRV: ", ether.hisip);
}

void loop () {
  ether.packetLoop(ether.packetReceive());

  // send one message every 10 seconds
  if (millis() > timer) {
    timer = millis() + 10000;

    byte sd = stash.create();
    stash.print("{\"messages\": [{\"value\":{\"key\":\"Margusja\"}}]}");
    // note: these two calls append "test&action=Submit" after the JSON document
    stash.print(VARIABLE);
    stash.print("&action=Submit");
    stash.save();

    // generate the header with payload - note that the stash size is used,
    // and that a "stash descriptor" is passed in as argument using "$H"
    Stash::prepare(PSTR("POST /topics/kafkademo1 HTTP/1.1" "\r\n"
                        "Host: vm37.dbweb.ee:8080" "\r\n"
                        //"User-Agent: margusja" "\r\n"
                        "Accept: */*" "\r\n"
                        "Content-Type: application/json" "\r\n"
                        "Content-Length: $D" "\r\n"
                        "\r\n"
                        "$H"),
                   // the format contains only $D and $H, so exactly two arguments follow
                   stash.size(), sd);

    // send the packet - this also releases all stash buffers once done
    ether.tcpSend();
  }
}

…

I uploaded the code to the microcontroller and powered it up.

Once it had obtained its address via DHCP, I saw a nice picture: it was sending data into my Kafka queue.
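To check the receiving side without the hardware, roughly the same request can be sent from a shell with curl. This is only a sketch: post_message is a made-up helper, and the post does not say which HTTP service listens on port 8080 in front of Kafka, so the URL is simply the host, port and path used in the sketch above.

```shell
# POST one JSON document to the given URL with the same headers
# the microcontroller sends.
# $1 = URL, $2 = JSON body
post_message() {
  curl -sS -X POST \
    -H 'Content-Type: application/json' \
    -H 'Accept: */*' \
    --data "$2" \
    "$1"
}

# Usage:
# post_message http://vm37.dbweb.ee:8080/topics/kafkademo1 '{"messages": [{"value":{"key":"Margusja"}}]}'
```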

The new toy to play with K8200

Posted on May 13, 2014 by margusja

Word cloud of my webpage

Posted on May 12, 2014 by margusja

Apache-Storm cluster demo

Posted on May 12, 2014 by margusja

This is a simple demo of Apache Storm in a cluster environment. Watch it in full-screen mode.

A very simple apache-kafka cluster demo

Posted on May 8, 2014 by margusja

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Try this video in full screen.

Kafka-Storm

Posted on April 29, 2014 - April 29, 2014 by margusja

Apache-Kafka

Posted on April 29, 2014 - April 29, 2014 by margusja

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

[root@sandbox kafka_2.8.0-0.8.1]# bin/kafka-server-start.sh config/server.properties

[root@sandbox kafka_2.8.0-0.8.1]# bin/kafka-topics.sh --create --replication-factor 1 --zookeeper localhost:2181 --partitions 1 --topic demoTopic
Created topic "demoTopic".
[root@sandbox kafka_2.8.0-0.8.1]#

[root@sandbox kafka_2.8.0-0.8.1]# bin/kafka-topics.sh --list --zookeeper localhost:2181
demoTopic
[root@sandbox kafka_2.8.0-0.8.1]#

[root@sandbox kafka_2.8.0-0.8.1]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic demoTopic
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Let's type something here…

And listen:

[root@sandbox kafka_2.8.0-0.8.1]# ./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic demoTopic --from-beginning