Thanks to Erkki, I found the FlameGraph project.

A profiling session, noted here for my own reference:

su - hdfs

git clone https://github.com/jvm-profiling-tools/async-profiler
git clone https://github.com/BrendanGregg/FlameGraph
cd async-profiler
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/ make
cd ../
mkdir async-profiler-output
cd async-profiler-output
jps    # find the PID of the HDFS DataNode process
# profile for 10 seconds (-d 10), threads separately (-t), write collapsed stacks to /tmp/collapsed.txt
../async-profiler/profiler.sh -t -d 10 -o collapsed -f /tmp/collapsed.txt 231591
../FlameGraph/flamegraph.pl --colors=java /tmp/collapsed.txt > flamegraph_yarn-bigdata40_hdfs_namenode.svg


Got a nice flame graph.
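flamegraph.pl reads the collapsed ("folded") stack format that profiler.sh wrote to /tmp/collapsed.txt: one line per distinct stack, frames joined by semicolons from the outermost caller to the innermost, then a space and the sample count. A minimal Python sketch (the function name and the sample lines are made up for illustration) that sums samples per innermost frame, handy for a quick text summary before opening the SVG:

```python
from collections import Counter


def top_leaf_frames(lines, n=5):
    """Sum samples per leaf (innermost) frame in collapsed-stack lines.

    Each line looks like "outer;caller;callee 42": frames joined by ';'
    (outermost first), then a space and the sample count.
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST space only: frame names may contain spaces.
        stack, _, count = line.rpartition(" ")
        leaf = stack.split(";")[-1]
        totals[leaf] += int(count)
    return totals.most_common(n)


if __name__ == "__main__":
    # Made-up sample; real input would be open("/tmp/collapsed.txt").
    sample = [
        "java.lang.Thread.run;DataXceiver.run;FileInputStream.read 30",
        "java.lang.Thread.run;DataXceiver.run;SocketOutputStream.write 50",
        "java.lang.Thread.run;BPServiceActor.run;FileInputStream.read 20",
    ]
    for frame, samples in top_leaf_frames(sample):
        print(f"{samples:6d}  {frame}")
```

Splitting on the last space keeps frames that contain spaces themselves (thread names, for example, which -t adds to the stacks) inside the stack part.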

Posted in IT

My vim tips

  1. :sp filename for a horizontal split
  2. :vsp filename or :vs filename for a vertical split

Move between splits: Ctrl+w followed by an arrow key.


Enabling LLAP in HDP-2.6

I got the following errors when I tried to enable Hive LLAP via Ambari:


2017-10-30 14:37:31,925 - LLAP app 'llap0' deployment unsuccessful.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/", line 616, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
  File "/var/lib/ambari-agent/cache/common-services/HIVE/", line 123, in start
    raise Fail("Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.")
resource_management.core.exceptions.Fail: Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.

WARN cli.LlapStatusServiceDriver: Watch timeout 200s exhausted before desired state RUNNING is attained.
INFO cli.LlapStatusServiceDriver: LLAP status finished
2017-10-30 14:37:31,924 - LLAP app 'llap0' current state is LAUNCHING.
2017-10-30 14:37:31,925 - LLAP app 'llap0' current state is LAUNCHING.
2017-10-30 14:37:31,925 - LLAP app 'llap0' deployment unsuccessful.

Command failed after 1 tries

In my case, the solution was to raise num_retries_for_checking_llap_status from 10 to 30.
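In the log above the LLAP app was still LAUNCHING when the status watch gave up, so the fix presumably works by giving a slow-starting app more status checks before Ambari declares failure. This is not Ambari's code: it is a minimal, hypothetical polling sketch (wait_for_state, make_fake_app and the interval are invented names) showing why a larger retry count can turn the same startup into a success:

```python
import time


def wait_for_state(get_state, desired="RUNNING", retries=10, interval_s=0.0):
    """Poll get_state() up to `retries` times; return True once it reports `desired`.

    Hypothetical helper: total waiting time is roughly retries * interval_s,
    so raising `retries` gives a slow app longer to come up.
    """
    for attempt in range(1, retries + 1):
        if get_state() == desired:
            return True
        if attempt < retries:
            time.sleep(interval_s)
    return False


def make_fake_app(polls_until_running):
    """Fake app status source: reports LAUNCHING until the Nth poll."""
    calls = {"n": 0}

    def get_state():
        calls["n"] += 1
        return "RUNNING" if calls["n"] >= polls_until_running else "LAUNCHING"

    return get_state


if __name__ == "__main__":
    print(wait_for_state(make_fake_app(15), retries=10))  # gives up while LAUNCHING
    print(wait_for_state(make_fake_app(15), retries=30))  # sees RUNNING on poll 15
```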

Spark to HBase via HBase REST

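import java.nio.charset.StandardCharsets
import java.util.Base64
import scalaj.http._

// HBase REST expects the row key, column and cell value base64-encoded.
// "Y2Y6Y29sMw==" is base64 for "cf:col3".
val hkey = Base64.getEncoder.encodeToString(key.getBytes(StandardCharsets.UTF_8))
val value = Base64.getEncoder.encodeToString(rawValue.getBytes(StandardCharsets.UTF_8))
// No extra spaces inside the quotes around the base64 strings
val data = "{\"Row\":[{\"key\":\"" + hkey + "\", \"Cell\":[{\"column\":\"Y2Y6Y29sMw==\", \"$\":\"" + value + "\"}]}]}"
Http("http://bigdata33.webmedia.int:8080/deepscan_data_1_1/" + key + "/cf:content").postData(data).header("content-type", "application/json").asString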

In case you need PUT instead of POST:
Http("http://bigdata41.webmedia.int:9090/nifi-api/processors/10001184-103f-112d-799b-662b43e70ced").postData(data).header("content-type", "application/json").method("PUT").asString
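The HBase REST body used above is a CellSet: a list of rows, each with a base64-encoded key and cells whose column (family:qualifier) and value ("$") are also base64-encoded. Building it with a JSON library instead of string concatenation avoids stray spaces or unescaped quotes ending up inside the encoded fields. A minimal Python sketch of the same payload (b64 and hbase_cellset are hypothetical helper names; sending it is not shown, it would be a POST with content-type application/json to the table/row URL as in the Scala snippet):

```python
import base64
import json


def b64(text):
    """Base64-encode a UTF-8 string, as HBase REST expects for keys, columns and values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def hbase_cellset(row_key, column, value):
    """Build the JSON body for writing one cell through the HBase REST server."""
    return json.dumps({
        "Row": [{
            "key": b64(row_key),
            "Cell": [{"column": b64(column), "$": b64(value)}],
        }]
    })


if __name__ == "__main__":
    print(hbase_cellset("row1", "cf:col3", "hello"))
```

Note that the Scala snippet's body names column cf:col3 while its URL ends in cf:content; the two should probably agree.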

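Change Git password from the command line

Switch to the store credential helper and push once; Git prompts for the username and the new password and, after a successful push, saves them:
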
$ git config credential.helper store
$ git push https://github.com/repo.git

Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>
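The store helper keeps credentials unencrypted in ~/.git-credentials, one URL per line in the form https://user:password@host, with special characters in the user and password percent-encoded. If a stale password is already stored there, the entry for the host can be replaced (or simply deleted so Git prompts again). A minimal Python sketch of such a rewrite; credential_line and replace_credential are hypothetical helpers, not part of Git:

```python
from urllib.parse import quote, urlsplit


def credential_line(host, username, password, scheme="https"):
    """Format one ~/.git-credentials entry; user and password are percent-encoded."""
    return f"{scheme}://{quote(username, safe='')}:{quote(password, safe='')}@{host}"


def replace_credential(contents, host, username, password):
    """Return credential-file text with every entry for `host` swapped for a new one."""
    kept = [line.strip() for line in contents.splitlines()
            if line.strip() and urlsplit(line.strip()).hostname != host]
    kept.append(credential_line(host, username, password))
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    old = "https://alice:oldpw@github.com\nhttps://bob:x@gitlab.example.com\n"
    print(replace_credential(old, "github.com", "alice", "n3w@pw"), end="")
```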

Ambari install HDF NiFi

I tried to install NiFi using Ambari and got this error message:

File "/var/lib/ambari-agent/cache/common-services/NIFI/1.0.0/package/scripts/params.py", line 47, in <module>
    stack_version_buildnum = get_component_version_with_stack_selector("/usr/bin/hdf-select", "nifi")
NameError: name 'get_component_version_with_stack_selector' is not defined


All hosts in the Ambari Hosts menu showed Stack: HDP, Name: HDP-, Status: Current.


I could not resolve it via Ambari. Even running hdf-select on the node where I tried to install NiFi, I got:

nifi -

The solution for me was to change some lines in /var/lib/ambari-agent/cache/common-services/NIFI/1.0.0/package/scripts/params.py: I commented out the get_component_version_with_stack_selector line and added a get_component_version line instead:


if stack_name == "HDP":
  # Override HDP stack root
  stack_root = "/usr/hdf"
  # Override HDP stack version
  #stack_version_buildnum = get_component_version_with_stack_selector("/usr/bin/hdf-select", "nifi")
  stack_version_buildnum = get_component_version(stack_name, "nifi")
elif not stack_version_buildnum and stack_name:
  stack_version_buildnum = get_component_version(stack_name, "nifi")


R csv to libsvm

library("e1071", lib.loc="~/Library/R/3.3/library")   # provides write.matrix.csr
library("SparseM", lib.loc="~/Library/R/3.3/library") # provides as.matrix.csr
data <- read.csv('/Users/margusja/Downloads/mnist_test.csv')

x <- as.matrix(data[,2:785])  # the 784 pixel columns
y <- data[,1]                 # first column: digit label

xs <- as.matrix.csr(x)
write.matrix.csr(xs, y = y, file = "test.txt")  # sparse libsvm-style output
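The sparse file written above stores one example per line: the label, then index:value pairs for the nonzero features only, with 1-based indices. A minimal Python sketch of the same conversion for small CSV inputs (function names are hypothetical; it assumes the label is the first column, as in the R code, and number formatting may differ from R's output):

```python
import csv
import io


def csv_row_to_libsvm(row):
    """Turn ['label', 'f1', 'f2', ...] into 'label i:v ...' (1-based, zeros skipped)."""
    label, features = row[0], row[1:]
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if float(v) != 0.0]
    return " ".join([label] + pairs)


def csv_to_libsvm(text, has_header=True):
    """Convert CSV text (label first) to a list of libsvm-format lines."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # like read.csv's default header = TRUE
    return [csv_row_to_libsvm(row) for row in reader if row]


if __name__ == "__main__":
    for line in csv_to_libsvm("label,p1,p2,p3\n7,0,128,0\n2,5,0,9\n"):
        print(line)
```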


Apache Spark hints

scala> val data = sc.textFile("hdfs://path/to/file")

scala> data.foreach(println) // print all lines from file (on a cluster the output goes to the executor logs)

scala> def myPrint(a: String): Unit = { println(a) }

scala> data.foreach(a => myPrint(a)) // prints all lines from file using myPrint function


scala> case class EmailRow(row: String) // create class for row

scala> val df = data.map(x => EmailRow(x)).toDF() // create DataFrame

// show DataFrame

scala> df.show()

scala> df.select("row").show()



// Create unique id column for dataset

scala> import org.apache.spark.sql.functions.monotonically_increasing_id // replaces the deprecated monotonicallyIncreasingId

scala> val newDf = df.withColumn("id", monotonically_increasing_id()) // adds a new column at the end of the current dataset

scala> val ds2 = newDf.select("id", "row") // now id is the first column

scala> ds2.select("id", "row").where(ds2("row").contains("X-")).show() // keep rows containing "X-" and show them

scala> ds2.count() // how many lines do I have in my dataset
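The ids from monotonically_increasing_id are unique and increasing but not consecutive: the Spark documentation describes the current implementation as putting the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. A plain-Python sketch of that documented layout (not Spark code; the function names are made up) shows why the second partition starts at 8589934592 rather than right after the first one:

```python
def spark_style_id(partition_index, row_in_partition):
    """Documented layout: partition id in the upper 31 bits, row number in the lower 33."""
    return (partition_index << 33) + row_in_partition


def split_id(generated_id):
    """Recover (partition_index, row_in_partition) from such an id."""
    return generated_id >> 33, generated_id & ((1 << 33) - 1)


if __name__ == "__main__":
    # Two partitions with three rows each
    print([spark_style_id(p, r) for p in range(2) for r in range(3)])
```

If consecutive numbers are needed, the RDD method zipWithIndex is one alternative.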


val text_file = sc.textFile("hdfs://bigdata21.webmedia.int:8020/user/margusja/titanic_test.csv")

case class Person(name: String, age: String)
val people = text_file.map(_.split(",")).map(p => Person(p(2), p(5))).toDS().toDF()

// Age is a String and contains empty fields. Keep only rows with a positive age
people.filter($"age" > 0).select(people("age").cast("int")).show()

// Let's take the average of people's age
import org.apache.spark.sql.functions.avg
people.filter($"age" > 0).select(avg(people("age").cast("int"))).show()
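What the filter and average steps compute can be checked in plain Python on a few lines: split each line on commas, take field 5 as the age as in the Scala code, drop empty or non-numeric ages (such as the header), keep positive ones and average them. A minimal sketch; average_age and the sample rows are made up, and because the Scala version casts to int first, fractional ages such as 34.5 may be treated differently there:

```python
def average_age(lines, age_index=5):
    """Average the positive numeric ages at `age_index` after a naive comma split.

    Rows whose age field is empty or not a number (e.g. the header) are
    skipped, mirroring the filter($"age" > 0) step.
    """
    ages = []
    for line in lines:
        fields = line.split(",")
        if len(fields) <= age_index:
            continue
        try:
            age = float(fields[age_index])
        except ValueError:
            continue
        if age > 0:
            ages.append(age)
    return sum(ages) / len(ages) if ages else None


if __name__ == "__main__":
    # Made-up rows; the quoted name contains a comma, so a naive split shifts the fields
    sample = [
        "PassengerId,Pclass,Name,Sex,Age,SibSp",
        '1,3,"Doe, Mr. John",male,34.5,0',
        '2,1,"Roe, Ms. Jane",female,47,1',
        '3,2,"Poe, Mr. Ed",male,,0',
    ]
    print(average_age(sample))
```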


Create function in Apache Spark

scala> def myPrint (a: String) : Unit = {println(a)}
myPrint: (a: String)Unit

scala> myPrint("Tere maailm")
Tere maailm

