FlameGraph

Thanks to Erkki I found the FlameGraph project.

Just to note down one session for myself:

su - hdfs

git clone https://github.com/jvm-profiling-tools/async-profiler
git clone https://github.com/BrendanGregg/FlameGraph
cd async-profiler
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/ make
cd ../
mkdir async-profiler-output
cd async-profiler-output
jps   # get the HDFS DataNode process ID
../async-profiler/profiler.sh -t -d 10 -o collapsed -f /tmp/collapsed.txt 231591
../FlameGraph/flamegraph.pl --colors=java /tmp/collapsed.txt > flamegraph_yarn-bigdata40_hdfs_namenode.svg

 

Got a nice flame graph.

Posted in IT

OpenCV cv::Mat and cv::transpose

Matrix transpose does some cool things with matrix elements. For example, using OpenCV:

    Mat A = (Mat_<float>(2, 5) << 1, 2, 3, 4, 5, 7, 8, 9, 10, 11);
    cout << "A = " << endl << " " << A << endl << endl;

    Mat A1;
    transpose(A, A1);
    cout << "A1 = " << endl << " " << A1 << endl << endl;

A =
[1, 2, 3, 4, 5;
 7, 8, 9, 10, 11]

A1 =
[1, 7;
 2, 8;
 3, 9;
 4, 10;
 5, 11]

 

Nice yeah.

As OpenCV holds pictures as matrices, let's find out what happens when we transpose an image.

 

 

The picture has quite a few more matrix elements than the previous example.

The program prints "Image height: 720 cols: 1080", i.e. 720 x 1080 elements.

    // Load a color image as grayscale
    Mat img = imread("/Users/margusja/Pictures/faces/margusja2.jpg", 0);

    cv::Size s = img.size();
    int rows = s.height;
    int cols = s.width;

    Mat fimage;
    transpose(img, fimage);

    cout << "Image height: " << rows << " cols: " << cols << endl;

    namedWindow("Margusja 1", WINDOW_AUTOSIZE);
    imshow("Margusja 1", img);

    namedWindow("Margusja transposed", WINDOW_AUTOSIZE);
    imshow("Margusja transposed", fimage);

And result:

Versus

 

Now we have a clue how they rotate our pictures 🙂
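A side note on that: transpose alone only mirrors the image across its main diagonal; the usual 90° rotation is a transpose followed by a flip. A minimal sketch with plain Python lists (no OpenCV; the helper names are mine) to illustrate the idea:

```python
def transpose(m):
    # Swap rows and columns, like cv::transpose
    return [list(row) for row in zip(*m)]

def rotate90_cw(m):
    # Rotate 90 degrees clockwise: transpose, then mirror each row
    return [row[::-1] for row in transpose(m)]

m = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(m))    # [[1, 4], [2, 5], [3, 6]]
print(rotate90_cw(m))  # [[4, 1], [5, 2], [6, 3]]
```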

Simple TensorFlow arithmetic

Let's imagine we have to do some simple arithmetic: (10+20) * (30-40)

For some unclear reason we decided to use TensorFlow.

The default language there is Python.

(tensorflow) margusja@IRack:~/tensorflow/tensorflow_scripts$ python
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 13:19:00)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/Users/margusja/tensorflow/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
>>>
>>> a = tf.constant(10)
>>> b = tf.constant(20)
>>> c = tf.constant(30)
>>> d = tf.constant(40)
>>> e = tf.add(a,b)
>>> f = tf.subtract(c,d)
>>> h = tf.multiply(e,f)
>>> sess = tf.Session()
2017-11-10 19:02:43.777454: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
>>> print(sess.run(h))
-300
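As a sanity check, the same expression in plain Python (no TensorFlow needed) gives the same answer:

```python
a, b, c, d = 10, 20, 30, 40
e = a + b   # tf.add(a, b)
f = c - d   # tf.subtract(c, d)
h = e * f   # tf.multiply(e, f)
print(h)    # -300
```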

We can explore the graph in TensorBoard:

 

my vim tips

  1. :sp filename for a horizontal split
  2. :vsp filename or :vs filename for a vertical split

move between splits: Ctrl+w and an arrow key


Enabling LLAP in HDP-2.6

I got the following errors when I tried to enable Hive LLAP via Ambari:

 

2017-10-30 14:37:31,925 - LLAP app 'llap0' deployment unsuccessful.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 616, in <module>
    HiveServerInteractive().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 123, in start
    raise Fail("Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.")
resource_management.core.exceptions.Fail: Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.

WARN cli.LlapStatusServiceDriver: Watch timeout 200s exhausted before desired state RUNNING is attained.
INFO cli.LlapStatusServiceDriver: LLAP status finished
2017-10-30 14:37:31,924 - LLAP app 'llap0' current state is LAUNCHING.
2017-10-30 14:37:31,925 - LLAP app 'llap0' current state is LAUNCHING.
2017-10-30 14:37:31,925 - LLAP app 'llap0' deployment unsuccessful.

Command failed after 1 tries

In my case the solution was to increase num_retries_for_checking_llap_status from 10 to 30.

Spark to HBase via HBase REST

import java.util.Base64
import java.nio.charset.StandardCharsets
import scalaj.http._

val hkey = Base64.getEncoder.encodeToString(key.getBytes(StandardCharsets.UTF_8))
val value = Base64.getEncoder.encodeToString(rawValue.getBytes(StandardCharsets.UTF_8))
// Row key, column name and cell value must all be Base64-encoded;
// note there must be no stray spaces around the encoded values
val data = "{\"Row\":[{\"key\":\"" + hkey + "\",\"Cell\":[{\"column\":\"Y2Y6Y29sMw==\",\"$\":\"" + value + "\"}]}]}"
Http("http://bigdata33.webmedia.int:8080/deepscan_data_1_1/" + key + "/cf:content").postData(data).header("content-type", "application/json").asString

In case you need PUT instead of POST:
Http("http://bigdata41.webmedia.int:9090/nifi-api/processors/10001184-103f-112d-799b-662b43e70ced").postData(data).header("content-type", "application/json").method("PUT").asString
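For reference, the JSON body the HBase REST gateway expects can be sketched in plain Python — row key, column name, and cell value are all Base64-encoded (the row key and value below are hypothetical; Y2Y6Y29sMw== in the Scala snippet above is just base64 for cf:col3):

```python
import base64
import json

def hbase_rest_payload(row_key, column, value):
    # The HBase REST gateway wants key, column and cell value Base64-encoded
    b64 = lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii")
    return json.dumps(
        {"Row": [{"key": b64(row_key),
                  "Cell": [{"column": b64(column), "$": b64(value)}]}]}
    )

print(hbase_rest_payload("row1", "cf:col3", "some value"))
```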

Change GIT password from command line

$ git config credential.helper store
$ git push https://github.com/repo.git

Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>

Statistics and sample size

Using the R language and the RStudio application, I will show why sample size matters.

Let's take a sample vector:
> myFamilyAges
[1] 43 42 12 8 5

The mean of this vector's elements:
> mean(myFamilyAges)
[1] 22

Now let's use the sample() command to draw five elements from this vector, five times over, and compute the mean each time:
> mean(sample(myFamilyAges, 5, replace = TRUE))
[1] 22.6
> mean(sample(myFamilyAges, 5, replace = TRUE))
[1] 35.8
> mean(sample(myFamilyAges, 5, replace = TRUE))
[1] 21.4
> mean(sample(myFamilyAges, 5, replace = TRUE))
[1] 13.2
> mean(sample(myFamilyAges, 5, replace = TRUE))
[1] 35.4

As we can see, the result varies strongly:

> sd(c(22.6,35.8,21.4,13.2,35.4))
[1] 9.752538

Now let's draw 4000 elements from the same vector in one go and compute their mean:
> mean(sample(myFamilyAges, 4000, replace = TRUE))
[1] 21.8995
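The same experiment can be sketched in Python with just the standard library (the vector mirrors myFamilyAges; the seed is arbitrary, fixed only for reproducibility):

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

ages = [43, 42, 12, 8, 5]  # myFamilyAges

# Five resamples of size 5: the means scatter widely
small_means = [statistics.mean(random.choices(ages, k=5)) for _ in range(5)]
print(small_means)

# One resample of size 4000: the mean lands close to the true mean, 22
big_mean = statistics.mean(random.choices(ages, k=4000))
print(big_mean)
```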

 

Let's also draw the density plots

 

As we can see – it lands very close to the original vector's mean (22).

So – size matters 🙂

Ambari 2.5.0.3 install HDF Nifi

Tried to install NiFi using Ambari.

Got error message:

File "/var/lib/ambari-agent/cache/common-services/NIFI/1.0.0/package/scripts/params.py", line 47, in <module>
    stack_version_buildnum = get_component_version_with_stack_selector("/usr/bin/hdf-select", "nifi")
NameError: name 'get_component_version_with_stack_selector' is not defined

 

All hosts in Ambari Hosts menu showed Stack: HDP, Name: HDP-2.6.1.0, Status: Current.

 

Could not resolve it via Ambari. Even hdf-select on the node where I tried to install NiFi worked when I executed it:

hdf-select

nifi - 3.0.0.0-453

The solution for me was to change some lines in /var/lib/ambari-agent/cache/common-services/NIFI/1.0.0/package/scripts/params.py. I commented out the red line and added the green one:

 

if stack_name == "HDP":
  # Override HDP stack root
  stack_root = "/usr/hdf"
  # Override HDP stack version
  #stack_version_buildnum = get_component_version_with_stack_selector("/usr/bin/hdf-select", "nifi")
  stack_version_buildnum = get_component_version(stack_name, "nifi")
elif not stack_version_buildnum and stack_name:
  stack_version_buildnum = get_component_version(stack_name, "nifi")
