Apache Hive-0.12 and Hadoop-2.2.0

http://hive.apache.org/

The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

[root@vm24 ~]# yum install hive
Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: ftp.hosteurope.de * epel: ftp.lysator.liu.se * extras: ftp.hosteurope.de * rpmforge: mirror.bacloud.com * updates: ftp.hosteurope.de Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package hive.noarch 0:0.12.0.2.0.6.1-101.el6 will be installed --> Finished Dependency Resolution


Dependencies Resolved
================================================================================================================================================================================================================================================================================

 Package                                                     Arch                                                          Version                                                                       Repository                                                        Size

================================================================================================================================================================================================================================================================================

Installing:

 hive                                                        noarch                                                        0.12.0.2.0.6.1-101.el6                                                        HDP-2.0.6                                                         44 M
Transaction Summary

================================================================================================================================================================================================================================================================================

Install       1 Package(s)
Total download size: 44 M

Installed size: 207 M

Is this ok [y/N]: y

Downloading Packages:

hive-0.12.0.2.0.6.1-101.el6.noarch.rpm                                                                                                                                                                                                                   |  44 MB     00:19

Running rpm_check_debug

Running Transaction Test

Transaction Test Succeeded

Running Transaction

  Installing : hive-0.12.0.2.0.6.1-101.el6.noarch                                                                                                                                                                                                                         1/1

  Verifying  : hive-0.12.0.2.0.6.1-101.el6.noarch                                                                                                                                                                                                                           1/1 
Installed:

  hive.noarch 0:0.12.0.2.0.6.1-101.el6

Complete!
[root@vm24 ~]#

Olulisemad kataloogid, mis tekkisid (rpm -ql hive)
/usr/lib/hive/ – see peaks olema hive home
/var/lib/hive
/var/lib/hive/metastore
/var/log/hive
/var/run/hive

[root@vm24 ~]# su – hive
[hive@vm24 ~]$ export HIVE_HOME=/usr/lib/hive
[hive@vm24 ~]$ export HADOOP_HOME=/usr/lib/hadoop

[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hive
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hive/warehouse
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chmod g+w /tmp
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chmod g+w /user/hive/warehouse
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chown -R hive /user/hive/
[hdfs@vm24 ~]$
[hive@vm24 ~]$ /usr/lib/hive/bin/hive
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
[hive@vm24 ~]$

Ilmselt olen segamine ajanud hadoop ja hadoop-hdfs
[hive@vm24 ~]$ export HADOOP_HOME=/usr/lib/hadoop
[hive@vm24 ~]$ /usr/lib/hive/bin/hive
Error: JAVA_HOME is not set and could not be found. Unable to determine Hadoop version information. 'hadoop version' returned: Error: JAVA_HOME is not set and could not be found.

[hive@vm24 ~]$
[hive@vm24 ~]$ export JAVA_HOME=/usr
[hive@vm24 ~]$ /usr/lib/hive/bin/hive
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/03/07 11:49:15 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 14/03/07 11:49:15 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 14/03/07 11:49:15 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 14/03/07 11:49:15 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 14/03/07 11:49:15 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 14/03/07 11:49:15 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.12.0.2.0.6.1-101.jar!/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] hive>

Session on hive:
[hive@vm24 ~]$ wget https://hadoop-clusternet.googlecode.com/svn-history/r20/trunk/clusternet/thirdparty/data/ml-data.tar__0.gz
–2014-03-07 11:53:56– https://hadoop-clusternet.googlecode.com/svn-history/r20/trunk/clusternet/thirdparty/data/ml-data.tar__0.gz
Resolving hadoop-clusternet.googlecode.com… 2a00:1450:4001:c02::52, 173.194.70.82
Connecting to hadoop-clusternet.googlecode.com|2a00:1450:4001:c02::52|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 4948405 (4.7M) [application/octet-stream]
Saving to: “ml-data.tar__0.gz”

100%[======================================================================================================================================================================================================================================>] 4,948,405 609K/s in 7.1s

2014-03-07 11:54:03 (681 KB/s) – “ml-data.tar__0.gz” saved [4948405/4948405]

[hive@vm24 ~]$
[hive@vm24 ~]$ tar zxvf ml-data.tar__0.gz
ml-data/
ml-data/README
ml-data/allbut.pl
ml-data/mku.sh
ml-data/u.data
ml-data/u.genre
ml-data/u.info
ml-data/u.item
ml-data/u.occupation
ml-data/u.user
ml-data/ub.test
ml-data/u1.test
ml-data/u1.base
ml-data/u2.test
ml-data/u2.base
ml-data/u3.test
ml-data/u3.base
ml-data/u4.test
ml-data/u4.base
ml-data/u5.test
ml-data/u5.base
ml-data/ua.test
ml-data/ua.base
ml-data/ub.base
[hive@vm24 ~]$
hive> CREATE TABLE u_data (
> userid INT,
> movieid INT,
> rating INT,
> unixtime STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ‘\t’
> STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH ‘ml-data/u.data’
> OVERWRITE INTO TABLE u_data;
Copying data from file:/home/hive/ml-data/u.data Copying file: file:/home/hive/ml-data/u.data Loading data to table default.u_data Table default.u_data stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 1979173, raw_data_size: 0] OK Time taken: 3.0 seconds
hive>
hive> SELECT COUNT(*) FROM u_data;
Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapred.reduce.tasks= Starting Job = job_1394027471317_0016, Tracking URL = http://vm38:8088/proxy/application_1394027471317_0016/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1394027471317_0016 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2014-03-07 11:59:47,212 Stage-1 map = 0%, reduce = 0% 2014-03-07 11:59:57,933 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 11:59:58,998 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:00,094 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:01,157 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:02,212 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:03,268 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:04,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:05,378 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:06,434 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:07,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:08,573 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:09,630 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec 2014-03-07 12:00:10,697 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.14 sec 2014-03-07 12:00:11,745 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.14 sec MapReduce Total cumulative CPU time: 5 seconds 140 msec Ended Job = job_1394027471317_0016 MapReduce Jobs Launched: Job 0: Map: 1 Reduce: 1 Cumulative CPU: 5.14 sec HDFS Read: 1979386 HDFS Write: 7 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 140 msec OK 100000 Time taken: 67.285 seconds, Fetched: 1 row(s)
hive>

Siin on ka näha, et hadoop arvutusosa tegeleb antud tööga(1394027471317_0016):

[hive@vm24 ~]$ hive –service hiveserver
Starting Hive Thrift Server
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
…
Start Web UI

/etc/hive/conf/hive-site.xml hive.hwi.war.file
lib/hive-hwi-0.12.0.2.0.6.1-101.war

[hive@vm24 ~]$ hive –service hwi
14/03/11 15:14:57 INFO hwi.HWIServer: HWI is starting up
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/03/11 15:14:59 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/03/11 15:14:59 INFO mortbay.log: jetty-6.1.26
14/03/11 15:14:59 INFO mortbay.log: Extract /usr/lib/hive/lib/hive-hwi-0.12.0.2.0.6.1-101.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.12.0.2.0.6.1.101.war__hwi__4ykn6s/webapp
14/03/11 15:15:00 INFO mortbay.log: Started SocketConnector@0.0.0.0:9999

http://vm24:9999/hwi/
[hive@vm24 ~]$ hive –service metastore -p 10000
Starting Hive Metastore Server
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

…
Eelnevalt teised hive teenused sulgeda, kuna praeguse seadistusega lukustatakse Derby andmebaas

Metastore