Apache Spark hints

scala> val data = sc.textFile("hdfs://path/to/file")

scala> data.foreach(println) // print all lines from file

scala> def myPrint(a: String): Unit = { println(a) }

scala> data.foreach(a => myPrint(a)) // prints all lines from file using myPrint function

 

scala> case class EmailRow(row:String) // create class for row

scala> val df=data.map(x => EmailRow(x) ).toDF() // Create dataframe

// show dataframe

scala> df.show()

scala> df.select("row").show()

scala> df.foreach(println)

 

// Create unique id column for dataset

scala> import org.apache.spark.sql.functions.monotonicallyIncreasingId

scala> val newDf = df.withColumn("id", monotonicallyIncreasingId) // adds a new column at the end of the current dataset

scala> val ds2 = newDf.select("id", "row") // now id is the first column

scala> ds2.select("id", "row").where(df("row").contains("X-")).show() // filter rows containing "X-" and show them

scala> ds2.count() // how many lines are in my dataset
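The steps above (case class, id column, filter, count) can be mirrored in plain Scala without Spark, which is handy for checking the logic; the sample lines below are made up:

```scala
// Plain-Scala sketch of the pipeline above, using zipWithIndex in place of
// monotonicallyIncreasingId (which only guarantees unique, increasing ids,
// not consecutive ones like zipWithIndex gives).
case class EmailRow(id: Long, row: String)

val data = Seq("From: someone", "X-Mailer: foo", "Subject: hi") // made-up lines
val withIds = data.zipWithIndex.map { case (s, i) => EmailRow(i.toLong, s) }
val xHeaders = withIds.filter(_.row.contains("X-")) // rows containing "X-"
val count = withIds.size                            // analogue of ds2.count()
```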

 

val text_file = sc.textFile("hdfs://bigdata21.webmedia.int:8020/user/margusja/titanic_test.csv")
//text_file.map(_.length).collect
//text_file.flatMap(_.split(",")).collect
// map each word (as a key) to 1, then sum the counts per key
text_file.flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _)
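The (word, 1) pairs plus reduceByKey pattern can be sketched in plain Scala, where groupBy followed by a sum plays the role of reduceByKey (the sample lines are made up):

```scala
// Plain-Scala sketch of the word-count pipeline above.
val lines = Seq("a,b,a", "b,c") // made-up sample lines
val counts = lines
  .flatMap(_.split(","))                               // split lines into words
  .map((_, 1))                                         // pair each word with 1
  .groupBy(_._1)                                       // reduceByKey analogue
  .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum the ones
// counts: Map("a" -> 2, "b" -> 2, "c" -> 1)
```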

case class Person(name: String, age: String)
val people = text_file.map(_.split(",")).map(p => Person(p(2), p(5))).toDS().toDF()

// Age is a String and contains empty fields. Let's keep only the numeric values
people.filter($"age" > 0).select(people("age").cast("int")).show()

// let's take the average age
people.filter($"age" > 0).select(avg(people("age").cast("int"))).show()
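The same filter-and-average step can be sketched in plain Scala with made-up ages, which shows why the empty fields have to go before casting:

```scala
// Plain-Scala sketch: keep numeric age strings, cast them, take the average.
val ages = Seq("22", "", "35", "n/a", "40") // made-up values
val numeric = ages.filter(_.matches("""\d+""")).map(_.toInt) // drop non-numeric fields
val avgAge = numeric.sum.toDouble / numeric.size
// avgAge: (22 + 35 + 40) / 3
```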

Posted in IT

Create function in Apache Spark

scala> def myPrint (a: String) : Unit = {println(a)}
myPrint: (a: String)Unit

scala> myPrint("Tere maailm")
Tere maailm

 


HDP-2.5 Smartsense 1.3 installation issue via Ambari

After the automatic installation, only one hst agent started, on the same server where the hst server was installed.

On the other servers I saw the following error:

[root@bigdata19 conf]# hst list-agents
Traceback (most recent call last):
  File "/usr/sbin/hst-agent.py", line 420, in <module>
    main(sys.argv)
  File "/usr/sbin/hst-agent.py", line 403, in main
    list_agents()
  File "/usr/sbin/hst-agent.py", line 285, in list_agents
    agents = server_api.list_agents()
  File "/usr/hdp/share/hst/hst-agent/lib/hst_agent/ServerAPI.py", line 72, in list_agents
    content = self.call(request)
  File "/usr/hdp/share/hst/hst-agent/lib/hst_agent/ServerAPI.py", line 52, in call
    self.cachedconnect = security.CachedHTTPSConnection(self.config)
  File "/usr/hdp/share/hst/hst-agent/lib/hst_agent/security.py", line 111, in __init__
    self.connect()
  File "/usr/hdp/share/hst/hst-agent/lib/hst_agent/security.py", line 116, in connect
    self.httpsconn.connect()
  File "/usr/hdp/share/hst/hst-agent/lib/hst_agent/security.py", line 87, in connect
    raise err
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:765)

The solution for me was simply to copy the files from /var/lib/smartsense/hst-agent/keys/ on a server with a working hst agent to the same directory on each server where the agent was not working. Before copying, delete the old files from /var/lib/smartsense/hst-agent/keys on the non-working agent's server.


ERROR: Bad Request;default/org.apache.falcon.FalconWebException::org.apache.falcon.FalconException: java.lang.RuntimeException: java.lang.IllegalStateException: Cluster entity vertex must exist

Somehow I started to get: ERROR: Bad Request;default/org.apache.falcon.FalconWebException::org.apache.falcon.FalconException: java.lang.RuntimeException: java.lang.IllegalStateException: Cluster entity vertex must exist

I did not find any solution on the internet.

The solution that helped me was to delete the directory /hadoop/falcon/embeddedmq on the Falcon server and restart the Falcon server.


Basys2 and four bit binary to decimal number into seven segment led display

At first, a good idea is to write down the signals from the input to the output. Basically it is the truth table:

In the header we can see the seven-segment display LED signals (ca…cg).

out is the decimal number I want to display, and sw3…sw0 is the input in binary.

-------------------------------------------------
--ca |cb |cc |cd |ce |cf |cg |out|sw3|sw2|sw1|sw0
-------------------------------------------------
-- 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1
-- 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 1 | 0
-- 0 | 0 | 0 | 0 | 1 | 1 | 0 | 3 | 0 | 0 | 1 | 1
-- 1 | 0 | 0 | 1 | 1 | 0 | 0 | 4 | 0 | 1 | 0 | 0
-- 0 | 1 | 0 | 0 | 1 | 0 | 0 | 5 | 0 | 1 | 0 | 1
-- 1 | 1 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 1 | 1 | 0
-- 0 | 0 | 0 | 1 | 1 | 1 | 1 | 7 | 0 | 1 | 1 | 1
-- 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 | 0 | 0 | 0
-- 0 | 0 | 0 | 1 | 1 | 0 | 0 | 9 | 1 | 0 | 0 | 1
-- 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0

Now we have a functional relation between the input and the output, so let's implement it in VHDL:

process (sw)
BEGIN
case sw is
	when "0001" => segment7 <= "1001111"; -- 1
	when "0010" => segment7 <= "0010010"; -- 2
	when "0011" => segment7 <= "0000110"; -- 3
	when "0100" => segment7 <= "1001100"; -- 4
	when "0101" => segment7 <= "0100100"; -- 5
	when "0110" => segment7 <= "1100000"; -- 6
	when "0111" => segment7 <= "0001111"; -- 7
	when "1000" => segment7 <= "0000000"; -- 8
	when "1001" => segment7 <= "0001100"; -- 9
	when "0000" => segment7 <= "0000001"; -- 0
	when others => segment7 <= "1111111"; -- blank
end case;
END process;

Quite easy :) - the full code is located at https://github.com/margusja/binary2decimalLed/blob/master/one.vhd
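For reference, the same case mapping can be written as a lookup table in Scala (the language used elsewhere in these notes); the bit strings are the active-low segments ca..cg from the table above:

```scala
// Scala sketch of the same truth-table lookup: 4-bit input -> segments ca..cg.
val segment7: Map[String, String] = Map(
  "0000" -> "0000001", // 0
  "0001" -> "1001111", // 1
  "0010" -> "0010010", // 2
  "0011" -> "0000110", // 3
  "0100" -> "1001100", // 4
  "0101" -> "0100100", // 5
  "0110" -> "1100000", // 6
  "0111" -> "0001111", // 7
  "1000" -> "0000000", // 8
  "1001" -> "0001100"  // 9
).withDefaultValue("1111111") // blank, like the "when others" branch
```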

But this is not what I want. In hardware we cannot do things like a case statement, so let's move closer to the hardware.

[Screenshot of the resulting schematic]
This is not an optimized solution, but it is much closer to the hardware than the previous one.

Update: I found time and optimized the logic.

And the expressions are much simpler compared to the previous ones:
--ca <= (not sw3 AND not sw2 AND not sw1 AND sw0) OR (not sw3 AND sw2 AND not sw1 AND not sw0);
ca <= (not sw0 AND sw2 AND not sw3) OR (sw0 AND not sw1 AND  not sw2 AND not sw3);
cb <= (sw0 AND not sw1 AND sw2 AND not sw3) OR (not sw0 AND sw1 AND sw2 AND not sw3);
cc <= (not sw3 AND not sw2 AND sw1 AND not sw0);
--cd <= (not sw3 AND not sw2 AND not sw1 AND sw0) OR (not sw3 AND sw2 AND not sw1 AND not sw0) 
--		OR (not sw3 AND sw2 AND sw1 AND sw0) OR (sw3 AND not sw2 AND not sw1 AND sw0);
cd <= (sw0 AND not sw1 AND not sw2) OR (not sw0 AND not sw1 AND sw2 AND not sw3) OR 
		(sw0 AND sw1 AND sw2 AND not sw3);
--ce <= (not sw3 AND not sw2 AND not sw1 AND sw0) OR (not sw3 AND not sw2 AND sw1 AND sw0) 
--			OR (not sw3 AND sw2 AND not sw1 AND not sw0) OR (not sw3 AND sw2 AND not sw1 AND sw0)
--			OR (not sw3 AND sw2 AND sw1 AND sw0) OR (sw3 AND not sw2 AND not sw1 AND sw0);
ce <= (sw0 AND not sw3) OR (not sw1 AND sw2 AND not sw3) OR (sw0 AND not sw1 AND not sw2);
--cf <= (not sw3 AND not sw2 AND not sw1 AND sw0) OR (not sw3 AND not sw2 AND sw1 AND not sw0)
--		OR (not sw3 AND not sw2 AND sw1 AND sw0) OR (not sw3 AND sw2 AND sw1 AND sw0);
cf <= (sw1 AND not sw2 AND not sw3) OR (sw0 AND not sw2 AND not sw3) OR (sw0 AND sw1 AND not sw3);
--cg <= (not sw3 AND not sw2 AND not sw1 AND sw0) OR (not sw3 AND sw2 AND sw1 AND sw0)
--		OR (not sw3 AND not sw2 AND not sw1 AND not sw0);		
cg <= (not sw1 AND not sw2 AND not sw3) OR (sw0 AND sw1 AND sw2 AND not sw3);
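A quick way to sanity-check the optimized equations is to evaluate them for all ten inputs and compare against the truth table. Here is a sketch in Scala for the ca signal; the other signals can be checked the same way:

```scala
// Check the optimized ca expression against the ca column of the truth table.
def ca(sw3: Boolean, sw2: Boolean, sw1: Boolean, sw0: Boolean): Boolean =
  (!sw0 && sw2 && !sw3) || (sw0 && !sw1 && !sw2 && !sw3)

// Expected ca values for digits 0..9, taken from the truth table above.
val expectedCa = Map(
  0 -> false, 1 -> true, 2 -> false, 3 -> false, 4 -> true,
  5 -> false, 6 -> true, 7 -> false, 8 -> false, 9 -> false
)

// Evaluate the expression for every digit and compare with the table.
val allMatch = (0 to 9).forall { d =>
  def bit(b: Int): Boolean = ((d >> b) & 1) == 1
  ca(bit(3), bit(2), bit(1), bit(0)) == expectedCa(d)
}
// allMatch: true
```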

First steps in Hortonworks HDF (NiFi)

I just downloaded the package, unpacked it and ran it.

Opened the GUI and defined a simple workflow:

  1. Listen to one local directory and, if a file appears there, transport it into HDFS
  2. Listen for webpage changes and, if there is a change, transport it into HDFS

So to create the workflow I needed to drag the necessary components onto the NiFi canvas. After that I configured it and ran it. And it still looks awesome.

[Screenshot of the NiFi workflow]


Apache-Flink on my machine

It is easy to set up and run Flink locally.

Download it, unpack it and run it:

[Screenshot: starting local Flink]

Start netcat and insert some content

[Screenshot: netcat session]

 

The GUI gives information about the process:

[Screenshots: Flink web GUI]

And the result:

[Screenshot: the result]

 
