Hive UDF – Margus Roo

Sometimes (often) we need custom functions to work with records. Hive ships with most of the functions you need, but if you find yourself post-processing records in your own programming language after fetching them, it is worth considering a Hive UDF (user-defined function) instead.

For example, suppose we need to add the string “Hello Margusja” in front of a field. Yes, Hive's string functions already include concat, but this is just an example of how to build and deploy UDFs. So let's pretend there is no other way to put two strings together and build our own UDF.

The Java code is very simple: you just have to extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method:

package com.margusja.example;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Prepends "Hello Margusja" to the input string.
public final class DemoUDF extends UDF {

    private final String hello = "Hello Margusja";

    // Hive locates evaluate by name via reflection; NULL in gives NULL out.
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text(hello + " " + s);
    }
}
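Because DemoUDF needs the Hive and Hadoop jars on the classpath, a quick way to check its string logic is to mirror it in plain Java. The sketch below assumes a hypothetical class name DemoUDFLogic and uses String in place of org.apache.hadoop.io.Text; it is not part of the UDF itself, only a dependency-free copy of the same rule (NULL stays NULL, anything else gets the greeting prefixed).

```java
// Hypothetical plain-Java mirror of DemoUDF.evaluate.
// Uses String instead of org.apache.hadoop.io.Text so it runs without Hive/Hadoop jars.
public final class DemoUDFLogic {

    private static final String HELLO = "Hello Margusja";

    // Same rule as DemoUDF: null input returns null, otherwise prefix the greeting.
    public static String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        return HELLO + " " + s;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("input")); // prints: Hello Margusja input
        System.out.println(evaluate(null));    // prints: null
    }
}
```

The first line matches the row Hive returns for my_lower("input") in the session below.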

Build and package it, for example as HiveDemoUDF.jar.

Now add the jar to the classpath from the Hive command line:

hive> add jar /tmp/HiveDemoUDF.jar;

Added /tmp/HiveDemoUDF.jar to class path
Added resource: /tmp/HiveDemoUDF.jar

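Before calling it, register the class under a function name (the query below uses my_lower):

hive> create temporary function my_lower as 'com.margusja.example.DemoUDF';

Now you can use your brand new UDF: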

hive> select my_lower("input");
Query ID = margusja_20141106153636_564cd6c4-01f1-4daa-841c-4388255135a8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there’s no reduce operator
Starting Job = job_1414681778119_0094, Tracking URL = http://nn1.server.int:8088/proxy/application_1414681778119_0094/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414681778119_0094
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-11-06 15:36:21,935 Stage-1 map = 0%, reduce = 0%
2014-11-06 15:36:31,206 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
MapReduce Total cumulative CPU time: 1 seconds 180 msec
Ended Job = job_1414681778119_0094
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.18 sec HDFS Read: 281 HDFS Write: 21 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 180 msec
OK
Hello Margusja input
Time taken: 21.417 seconds, Fetched: 1 row(s)
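When the query calls my_lower("input"), Hive has to find an evaluate method on the registered class that fits the arguments; with the classic UDF base class this lookup is done by reflection, and evaluate may be overloaded for different argument lists. The sketch below is a simplified illustration of that idea, not Hive's resolver code: GreetingFunction, its two-argument overload and resolve are hypothetical names, and only exact parameter-type matches are handled (Hive's real resolver also deals with type conversion).

```java
import java.lang.reflect.Method;

public final class EvaluateDispatchSketch {

    // Hypothetical UDF-like class with two overloaded evaluate methods.
    public static final class GreetingFunction {
        public String evaluate(String s) {
            return s == null ? null : "Hello Margusja " + s;
        }

        public String evaluate(String greeting, String s) {
            return (greeting == null || s == null) ? null : greeting + " " + s;
        }
    }

    // Simplified lookup: find the public "evaluate" whose parameter types
    // exactly match the given argument classes.
    static Method resolve(Class<?> udfClass, Class<?>... argClasses)
            throws NoSuchMethodException {
        return udfClass.getMethod("evaluate", argClasses);
    }

    public static void main(String[] args) throws Exception {
        GreetingFunction udf = new GreetingFunction();
        Method one = resolve(GreetingFunction.class, String.class);
        Method two = resolve(GreetingFunction.class, String.class, String.class);
        System.out.println(one.invoke(udf, "input"));       // prints: Hello Margusja input
        System.out.println(two.invoke(udf, "Hi", "input")); // prints: Hi input
    }
}
```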