Hadoop HBase

https://hbase.apache.org/

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Name : hbase
Arch : noarch
Version : 0.96.1.2.0.6.1
Release : 101.el6
Size : 44 M
Repo : HDP-2.0.6
Summary : HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware.
URL : http://hbase.apache.org/
License : APL2
Description : HBase is an open-source, distributed, column-oriented store modeled after Google’ Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase
: provides Bigtable-like capabilities on top of Hadoop. HBase includes:
:
: * Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
: * Query predicate push down via server side scan and get filters
: * Optimizations for real time queries
: * A high performance Thrift gateway
: * A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
: * Cascading source and sink modules
: * Extensible jruby-based (JIRB) shell
: * Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX

/etc/hosts
90.190.106.56 vm37.dbweb.ee
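Every node should resolve the cluster hostnames the same way. Here is a minimal sketch for adding the entry above idempotently. It assumes bash and awk. The function name add_host_entry is hypothetical and is not part of HBase or HDP:

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP FQDN  (hypothetical helper, not shipped with HBase/HDP)
# Appends "IP FQDN" to FILE unless FQDN is already mapped on a non-comment
# line, so re-running the script does not create duplicate entries.
add_host_entry() {
  local file=$1 ip=$2 fqdn=$3
  # awk exits 0 if any hostname field (field 2 onwards) equals FQDN
  if awk -v h="$fqdn" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$fqdn" >> "$file"
}

# Usage on a real node (as root):
#   add_host_entry /etc/hosts 90.190.106.56 vm37.dbweb.ee
```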

[root@vm37 ~]# yum install hbase

Resolving Dependencies
--> Running transaction check
---> Package hbase.noarch 0:0.96.1.2.0.6.1-101.el6 will be installed

Total download size: 44 M
Installed size: 50 M
Is this ok [y/N]: y
Downloading Packages:
hbase-0.96.1.2.0.6.1-101.el6.noarch.rpm | 44 MB 00:23
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : hbase-0.96.1.2.0.6.1-101.el6.noarch 1/1
Verifying : hbase-0.96.1.2.0.6.1-101.el6.noarch 1/1

Installed:
hbase.noarch 0:0.96.1.2.0.6.1-101.el6

Complete!
[root@vm37 ~]#

Important directories:
/etc/hbase/ (configuration)
/usr/bin/ (binaries)
/usr/lib/hbase/ (libraries)
/usr/lib/hbase/logs
/usr/lib/hbase/pids
/var/log/hbase
/var/run/hbase

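/etc/hbase/conf.dist/hbase-site.xml:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://vm38.dbweb.ee:8020/user/hbase/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>hdfs://vm38.dbweb.ee:8020/user/hbase/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>

Note: hbase.zookeeper.property.dataDir is only used when HBase manages its own ZooKeeper, and ZooKeeper expects a local directory there, not an hdfs:// URL.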
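To double-check which values a node actually picks up, here is a small sketch that reads one property back out of hbase-site.xml. It assumes bash, sed and tr, and the layout shown above, where each name is immediately followed by its value. It is a text match rather than an XML parser, and site_prop is a hypothetical name. On an installed node, hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.rootdir should print the effective value including defaults, which is the more reliable check.

```shell
#!/usr/bin/env bash
# site_prop FILE NAME  (hypothetical helper)
# Prints the <value> of property NAME from a Hadoop-style XML config file.
# Assumes <name> is directly followed by <value>; this is plain text
# matching, not real XML parsing.
site_prop() {
  local file=$1 name=$2 re
  # escape the dots in the property name so they match literally
  re=$(printf '%s' "$name" | sed 's/\./\\./g')
  # join all lines, then capture the text between <value> and </value>
  tr -d '\n' < "$file" |
    sed -n "s|.*<name>${re}</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Usage:
#   site_prop /etc/hbase/conf/hbase-site.xml hbase.rootdir
```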

[hdfs@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hbase
[hdfs@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hbase/data
[hdfs@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chown -R hbase /user/hbase
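The three HDFS commands above can be wrapped so that they are safe to re-run. This sketch assumes Hadoop 2's hdfs dfs -mkdir -p, which creates missing parents and does not fail if the directory already exists. It also assumes the script runs as the hdfs superuser. The names prepare_hbase_hdfs_dirs and HDFS_BIN are hypothetical. Setting HDFS_BIN=echo prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# prepare_hbase_hdfs_dirs [ROOT]  (hypothetical helper, run as the hdfs user)
# Creates ROOT/data in HDFS and hands ROOT to the hbase user.
# HDFS_BIN defaults to the path used above; HDFS_BIN=echo is a dry run.
HDFS_BIN=${HDFS_BIN:-/usr/lib/hadoop-hdfs/bin/hdfs}

prepare_hbase_hdfs_dirs() {
  local root=${1:-/user/hbase}
  # -p creates parent directories and is a no-op if the path exists
  "$HDFS_BIN" dfs -mkdir -p "$root/data" || return 1
  # hbase must own its root directory (same as the chown above)
  "$HDFS_BIN" dfs -chown -R hbase "$root"
}
```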

[root@vm37 ~]# su - hbase
[hbase@vm37 ~]$ export JAVA_HOME=/usr
[hbase@vm37 ~]$ export HBASE_LOG_DIR=/var/log/hbase/
[hbase@vm37 ~]$ export HADOOP_CONF_DIR=/etc/hadoop/conf
#[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start zookeeper  (no longer needed: ZooKeeper now runs as a separate distributed ensemble)
#starting zookeeper, logging to /var/log/hbase//hbase-hbase-zookeeper-vm37.dbweb.ee.out
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start master
starting master, logging to /var/log/hbase//hbase-hbase-master-vm37.dbweb.ee.out
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start regionserver
starting regionserver, logging to /var/log/hbase//hbase-hbase-regionserver-vm37.dbweb.ee.out
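Instead of exporting the variables in every login shell, you can put them in hbase-env.sh, which the HBase start scripts read. This sketch assumes that /usr/lib/hbase/conf resolves to /etc/hbase/conf on this HDP install, so check that before relying on it. HBASE_PID_DIR and HBASE_MANAGES_ZK are standard hbase-env.sh settings. HBASE_MANAGES_ZK=false matches running ZooKeeper as a separate ensemble:

```shell
# /etc/hbase/conf/hbase-env.sh (excerpt), read by the HBase start scripts
export JAVA_HOME=/usr                      # same value as exported above
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HBASE_LOG_DIR=/var/log/hbase
export HBASE_PID_DIR=/var/run/hbase        # pid directory from the list above
# ZooKeeper runs as its own ensemble, so HBase must not start or stop it
export HBASE_MANAGES_ZK=false
```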

….
Problem:
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.3.1.el6.x86_64
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=hbase
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hbase
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hbase
2014-03-10 10:44:23,333 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=master:60000, quorum=localhost:2181, baseZNode=/hbase
2014-03-10 10:44:23,360 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=master:60000 connecting to ZooKeeper ensemble=localhost:2181
2014-03-10 10:44:23,366 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-10 10:44:23,374 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1072)
2014-03-10 10:44:23,481 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-10 10:44:23,484 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1072)
2014-03-10 10:44:23,491 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2014-03-10 10:44:23,491 INFO [main] util.RetryCounter: Sleeping 1000ms before retry #0…
2014-03-10 10:44:24,585 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-10 10:44:24,585 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
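Solution:
ZooKeeper has to be configured and running before the master is started. The master connects to the quorum from hbase.zookeeper.quorum (here localhost:2181) and keeps retrying with "Connection refused" until a ZooKeeper server answers.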
….
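A start script can enforce this ordering by waiting for the ZooKeeper client port before it starts the master. This minimal sketch assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command. wait_for_port is a hypothetical helper. An open port only shows that something is listening, not that the ensemble is healthy:

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [LIMIT_SECONDS]  (hypothetical helper)
# Returns 0 as soon as a TCP connection to HOST:PORT succeeds, or 1 once
# LIMIT_SECONDS (default 60) have passed without a successful connection.
wait_for_port() {
  local host=$1 port=$2 limit=${3:-60}
  local deadline=$(( SECONDS + limit ))
  while (( SECONDS < deadline )); do
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # timeout caps each attempt at 2 s so an unreachable host cannot hang us
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage sketch, with the quorum and client port from hbase-site.xml above:
#   wait_for_port localhost 2181 120 && /usr/lib/hbase/bin/hbase-daemon.sh start master
```

ZooKeeper 3.4 servers also answer the four-letter command ruok with imok (for example echo ruok | nc localhost 2181). That checks the server itself rather than only the port.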

[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase shell
2014-03-10 10:24:32,720 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.96.1.2.0.6.1-101-hadoop2, rcf3f71e5014c66e85c10a244fa9a1e3c43cef077, Wed Jan 8 21:59:02 PST 2014
hbase(main):001:0>
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 11.6950 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 3.9510 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1420 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0170 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0090 seconds
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1394440138295, value=value1
row2 column=cf:b, timestamp=1394440145368, value=value2
row3 column=cf:c, timestamp=1394440161856, value=value3
3 row(s) in 0.0660 seconds
hbase(main):008:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1394440138295, value=value1
1 row(s) in 0.0390 seconds
hbase(main):009:0> disable 'test'
0 row(s) in 2.6660 seconds
hbase(main):010:0> drop 'test'
0 row(s) in 0.5050 seconds
hbase(main):011:0> exit
[hbase@vm37 ~]$
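You can also replay the same session non-interactively, for example as a smoke test after each restart. This sketch only prints the shell commands. Piping them into hbase shell is an assumption here: it relies on the shell reading commands from standard input. smoke_test_commands is a hypothetical name. The generated script drops the table at the end, so never pass the name of a table you want to keep:

```shell
#!/usr/bin/env bash
# smoke_test_commands TABLE FAMILY  (hypothetical helper)
# Prints HBase shell commands that create TABLE, write three cells, read them
# back and drop the table again, following the interactive session above.
smoke_test_commands() {
  local table=$1 cf=$2
  local quals=(a b c) i
  printf "create '%s', '%s'\n" "$table" "$cf"
  for i in 1 2 3; do
    # rowN gets qualifier a, b or c and valueN, as in the manual example
    printf "put '%s', 'row%d', '%s:%s', 'value%d'\n" "$table" "$i" "$cf" "${quals[i-1]}" "$i"
  done
  printf "scan '%s'\n" "$table"
  printf "get '%s', 'row1'\n" "$table"
  printf "disable '%s'\n" "$table"
  printf "drop '%s'\n" "$table"
  printf "exit\n"
}

# Usage sketch:
#   smoke_test_commands smoke_test cf | /usr/lib/hbase/bin/hbase shell
```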


Problem:
2014-03-10 11:16:33,892 WARN [RpcServer.handler=16,port=60000] master.HMaster: Table Namespace Manager not ready yet
hbase(main):001:0> create 'test', 'cf'

ERROR: java.io.IOException: Table Namespace Manager not ready yet, try again later
at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3092)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1729)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1768)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
Solution: at least one regionserver has to be configured and running before tables can be created.

hbase(main):007:0> status
1 servers, 0 dead, 3.0000 average load
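Table operations fail until a regionserver has checked in, so a start script can wait until the status summary shows at least one live server. This sketch extracts the live count from the one-line summary shown above. It assumes the "N servers, M dead, X average load" format of this HBase version. live_regionservers is a hypothetical name:

```shell
#!/usr/bin/env bash
# live_regionservers STATUS_LINE  (hypothetical helper)
# Prints N from a summary like "1 servers, 0 dead, 3.0000 average load";
# prints nothing if the line does not have that shape.
live_regionservers() {
  sed -n 's/^\([0-9][0-9]*\) servers, [0-9][0-9]* dead,.*$/\1/p' <<< "$1"
}

# Usage sketch (needs a running cluster):
#   line=$(echo status | /usr/lib/hbase/bin/hbase shell 2>/dev/null | grep -m1 'servers,')
#   [ "$(live_regionservers "$line")" -ge 1 ] || echo "no live regionserver yet"
```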

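http://vm37:16010/master-status (16010 is the default master web UI port from HBase 1.0 on; with HBase 0.96 the default, set by hbase.master.info.port, is 60010)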

MapReduce Export
[hbase@vm37 ~]$ hbase org.apache.hadoop.hbase.mapreduce.Export test test_out2
The result will be in hdfs://server/user/hbase/test_out2/ (a relative output path is resolved against the hbase user's HDFS home directory).

hbase(main):001:0> create 'test2', 'cf'
hbase(main):002:0> scan 'test2'
ROW COLUMN+CELL
0 row(s) in 0.0440 seconds

MapReduce Import
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Import test2 hdfs://vm38.dbweb.ee:8020/user/hbase/test_out2

hbase(main):004:0> scan 'test2'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1394445121367, value=value1
row2 column=cf:b, timestamp=1394445137811, value=value2
row3 column=cf:c, timestamp=1394445149457, value=value3
3 row(s) in 0.0230 seconds

hbase(main):005:0>
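You can chain the Export, create and Import steps above in one script. This sketch assumes a single column family. It also assumes the export directory does not exist yet, because Export is a MapReduce job and should fail on an existing output directory. The HBase shell may return status 0 even when create fails, so verify the table afterwards. copy_table_via_export and HBASE_BIN are hypothetical names. Setting HBASE_BIN=echo turns the script into a dry run that only prints the commands:

```shell
#!/usr/bin/env bash
# copy_table_via_export SRC DST FAMILY EXPORT_DIR  (hypothetical helper)
# Export SRC to EXPORT_DIR in HDFS, create DST with one column family, then
# Import the exported cells into DST. HBASE_BIN=echo makes it a dry run.
HBASE_BIN=${HBASE_BIN:-/usr/lib/hbase/bin/hbase}

copy_table_via_export() {
  local src=$1 dst=$2 cf=$3 dir=$4
  # 1. MapReduce job that writes SRC's cells as sequence files under DIR
  "$HBASE_BIN" org.apache.hadoop.hbase.mapreduce.Export "$src" "$dir" || return 1
  # 2. Import does not create tables, so DST must exist with the same family
  printf "create '%s', '%s'\n" "$dst" "$cf" | "$HBASE_BIN" shell || return 1
  # 3. MapReduce job that loads the exported cells into DST
  "$HBASE_BIN" org.apache.hadoop.hbase.mapreduce.Import "$dst" "$dir"
}

# Usage sketch, mirroring the commands above:
#   copy_table_via_export test test2 cf test_out2
```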


Add a new regionserver:

On the master, add the new host to /etc/hbase/conf/regionservers (one hostname per line):

[root@vm37 kafka_2.9.1-0.8.1.1]# vim /etc/hbase/conf/regionservers

In hbase-site.xml on the master and on every regionserver, hbase.zookeeper.quorum must contain at least one ZooKeeper server that they all share.
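All nodes have to agree on the ZooKeeper ensemble. This small sketch prints the hbase.zookeeper.quorum property for a given list of hosts, so that you can paste the same element into hbase-site.xml on the master and every regionserver. Listing every ensemble member rather than a single one lets clients fall back to another server. quorum_property is a hypothetical helper, and zk1.example, zk2.example and zk3.example are placeholder hostnames:

```shell
#!/usr/bin/env bash
# quorum_property HOST...  (hypothetical helper)
# Prints an hbase-site.xml <property> element whose value is the
# comma-separated list of the ZooKeeper hosts given as arguments.
quorum_property() {
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  printf '<property>\n  <name>hbase.zookeeper.quorum</name>\n  <value>%s</value>\n</property>\n' "$*"
}

# Usage sketch with placeholder hosts:
#   quorum_property zk1.example zk2.example zk3.example
```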

On the new slave node, start the regionserver:

/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf start regionserver

Check http://master:16010/master-status to confirm that the new regionserver is listed.