Phoenix org.apache.phoenix.mapreduce.index.IndexTool java.lang.ArrayIndexOutOfBoundsException

A good example of how the community works!

I had a strange error:

Hi

I needed to recreate indexes over a main table containing more than 2.3 x 10^10 records.
I used ASYNC index creation and org.apache.phoenix.mapreduce.index.IndexTool.
One index succeeded, but the other one failed with the following stack trace:
2018-03-20 13:23:16,723 FATAL [IPC Server handler 0 on 43926] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1521544097253_0004_m_000008_0 - exited : java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
        at org.apache.hadoop.hbase.io.ImmutableBytesWritable.write(ImmutableBytesWritable.java:159)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:114)
        at org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:48)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
 
 
Is there any best practice for dealing with situations like this?
 
Br, Margus
And there was an answer out there!
Hard to say at a glance, but this issue is happening down in the MapReduce framework, not in Phoenix itself.

It looks similar to problems I've seen many years ago around mapreduce.task.io.sort.mb. You can try reducing that value. It may also be related to a bug in your Hadoop version.

And it resolved my problem!
Great hint! Looks like it helped!

Such is the power of the community!

Br, Margus
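
For future reference, here is a minimal sketch of how such a rebuild could be launched with a smaller map-side sort buffer, driving IndexTool through ToolRunner from Scala. The schema, table, index and output path names are placeholders, and the exact IndexTool options may differ between Phoenix versions:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.util.ToolRunner
import org.apache.phoenix.mapreduce.index.IndexTool

object RebuildIndexWithSmallerSortBuffer {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // The mapper died inside MapOutputBuffer, so shrink the map-side sort buffer;
    // very large mapreduce.task.io.sort.mb values have been associated with
    // overflow problems in some Hadoop versions.
    conf.setInt("mapreduce.task.io.sort.mb", 256)

    // IndexTool is a standard Hadoop Tool, so it can be driven with ToolRunner.
    // MY_SCHEMA, MY_TABLE, MY_INDEX and the output path are placeholders.
    val exitCode = ToolRunner.run(conf, new IndexTool(), Array(
      "--schema", "MY_SCHEMA",
      "--data-table", "MY_TABLE",
      "--index-table", "MY_INDEX",
      "--output-path", "/tmp/MY_INDEX_HFILES"))
    sys.exit(exitCode)
  }
}

Since IndexTool goes through ToolRunner, the same override should also be possible as a generic -D option on the command line when launching it via the hbase script.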

Posted in IT

Scala higher-order functions

For me personally, it is very difficult to think recursively, so I'll put some examples down here for myself.

First: a higher-order function is a function that takes a function as an argument or returns a function as its result.

Let's create a higher-order function in Scala:

scala> def highF(f: Int => Int, a: Int) = f(a)
highF: (f: Int => Int, a: Int)Int

As we can see, it does not do anything fancy on its own: it just applies f to a and returns the resulting Int.

Let's create two ordinary functions to use as arguments:

scala> def first1(a: Int) : Int = { a.+(a) }
first1: (a: Int)Int

scala> def first2(a: Int) : Int = { a.-(a) }
first2: (a: Int)Int

These functions are easier to explain. The first one takes an Int and adds it to itself, i.e. a => a + a.

The second one just subtracts it from itself, i.e. a => a - a.

Now comes the interesting part: how we use the higher-order function.

scala> highF(first1, 2)
res23: Int = 4

scala> highF(first2, 2)
res24: Int = 0

And there is more: we can use an anonymous function as an argument:

scala> highF(a => a.+(10), 2)
res25: Int = 12
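
The definition above also covers the other direction: a higher-order function may return a function as its result. A small sketch along the same lines (adder is just an illustrative name, not part of the standard library):

def adder(n: Int): Int => Int = (x: Int) => x + n

val addTen = adder(10)   // addTen is itself a function of type Int => Int
addTen(2)                // 12
highF(adder(10), 2)      // also 12, since adder(10) fits highF's Int => Int parameter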

Posted in IT

FlameGraph

Thanks to Erkki, I found the FlameGraph project.

Just to note down one session for myself:

su - hdfs

git clone https://github.com/jvm-profiling-tools/async-profiler
git clone https://github.com/BrendanGregg/FlameGraph
cd async-profiler
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/ make
cd ../
mkdir async-profiler-output
cd async-profiler-output
jps   # get the hdfs datanode process id
../async-profiler/profiler.sh -t -d 10 -o collapsed -f /tmp/collapsed.txt 231591
../FlameGraph/flamegraph.pl --colors=java /tmp/collapsed.txt > flamegraph_yarn-bigdata40_hdfs_namenode.svg

 

Got a nice flame graph.

Posted in IT

my vim tips

  1. :sp filename for a horizontal split
  2. :vsp filename or :vs filename for a vertical split

Move between splits: Ctrl+w followed by an arrow key.

Posted in IT

Enabling LLAP in HDP-2.6

I got the following errors when I tried to enable Hive LLAP via Ambari:

 

2017-10-30 14:37:31,925 - LLAP app 'llap0' deployment unsuccessful.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 616, in <module>
    HiveServerInteractive().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server_interactive.py", line 123, in start
    raise Fail("Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.")
resource_management.core.exceptions.Fail: Skipping START of Hive Server Interactive since LLAP app couldn't be STARTED.

WARN cli.LlapStatusServiceDriver: Watch timeout 200s exhausted before desired state RUNNING is attained.
INFO cli.LlapStatusServiceDriver: LLAP status finished
2017-10-30 14:37:31,924 - LLAP app 'llap0' current state is LAUNCHING.
2017-10-30 14:37:31,925 - LLAP app 'llap0' current state is LAUNCHING.
2017-10-30 14:37:31,925 - LLAP app 'llap0' deployment unsuccessful.

Command failed after 1 tries

In my case the solution was to increase num_retries_for_checking_llap_status from 10 to 30.
Posted in IT

Spark to HBase via HBase REST

import java.util.Base64
import java.nio.charset.StandardCharsets

import scalaj.http._

// HBase REST expects the row key, column qualifier and value inside
// the JSON body to be base64-encoded.
val hkey = Base64.getEncoder.encodeToString(key.getBytes(StandardCharsets.UTF_8))
val value = Base64.getEncoder.encodeToString(rawValue.getBytes(StandardCharsets.UTF_8))

// "Y2Y6Y29sMw==" is the base64-encoded column qualifier "cf:col3".
val data = "{\"Row\":[{\"key\":\"" + hkey + "\", \"Cell\":[{\"column\":\"Y2Y6Y29sMw==\", \"$\":\"" + value + "\"}]}]}"

Http("http://bigdata33.webmedia.int:8080/deepscan_data_1_1/" + key + "/cf:content")
  .postData(data)
  .header("content-type", "application/json")
  .asString

In case you need PUT instead of POST (this example happens to call the NiFi REST API), scalaj-http lets you override the method:

Http("http://bigdata41.webmedia.int:9090/nifi-api/processors/10001184-103f-112d-799b-662b43e70ced")
  .postData(data)
  .header("content-type", "application/json")
  .method("put")
  .asString
Posted in IT

Change GIT password from command line

$ git config credential.helper store
$ git push https://github.com/repo.git

Username for 'https://github.com': <USERNAME>
Password for 'https://USERNAME@github.com': <PASSWORD>

Ambari 2.5.0.3 install HDF Nifi

Tried to install Nifi using Ambari.

Got error message:

File "/var/lib/ambari-agent/cache/common-services/NIFI/1.0.0/package/scripts/params.py", line 47, in <module>
    stack_version_buildnum = get_component_version_with_stack_selector("/usr/bin/hdf-select", "nifi")
NameError: name 'get_component_version_with_stack_selector' is not defined

 

All hosts in Ambari Hosts menu showed Stack: HDP, Name: HDP-2.6.1.0, Status: Current.

 

I could not resolve it via Ambari, even though running hdf-select on the node where I tried to install Nifi gave sensible output:

hdf-select

nifi - 3.0.0.0-453

The solution for me was to change a couple of lines in /var/lib/ambari-agent/cache/common-services/NIFI/1.0.0/package/scripts/params.py. I commented out the failing line and added a replacement:

 

if stack_name == "HDP":
  # Override HDP stack root
  stack_root = "/usr/hdf"
  # Override HDP stack version
  #stack_version_buildnum = get_component_version_with_stack_selector("/usr/bin/hdf-select", "nifi")
  stack_version_buildnum = get_component_version(stack_name, "nifi")
elif not stack_version_buildnum and stack_name:
  stack_version_buildnum = get_component_version(stack_name, "nifi")

Posted in IT

R csv to libsvm

library("e1071", lib.loc="~/Library/R/3.3/library")
library("SparseM", lib.loc="~/Library/R/3.3/library")

data <- read.csv('/Users/margusja/Downloads/mnist_test.csv')

dim(data)

# Column 1 holds the label, columns 2..785 hold the 784 pixel features.
x <- as.matrix(data[, 2:785])
y <- data[, 1]

# Convert to a sparse matrix and write it out in sparse (libsvm) format.
xs <- as.matrix.csr(x)
write.matrix.csr(xs, y = y, file = "test.txt")

Posted in IT