
Margus Roo –

If you're inventing and pioneering, you have to be willing to be misunderstood for long periods of time


Category: Linux

Audio (Estonian) to text with Kaldi

Posted on March 14, 2014 - April 7, 2015 by margusja

https://github.com/alumae/kaldi-offline-transcriber

CentOS release 6.5 (Final) Linux vm38 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[root@h14 ~]# yum groupinstall "Development Tools"
[root@h14 ~]# yum install zlib-devel

[root@h14 ~]# yum install java-1.7.0-openjdk.x86_64

[root@vm38 ~]# yum install ffmpeg

[root@vm38 ~]# yum install sox
[root@vm38 ~]# yum install atlas
[root@vm38 ~]# yum install atlas-devel

[root@vm38 ~]# su - margusja
[margusja@vm38 ~]$ mkdir kaldi
[margusja@vm38 ~]$ cd kaldi/
[margusja@vm38 ~]$ mkdir tools
[margusja@vm38 ~]$ cd tools/
[margusja@vm38 ~]$ svn co svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk // at the moment trunk runs into Problem ID-2 below, so check out an older revision instead:
[margusja@vm38 tools]$ svn co -r 2720 svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk // 4xxxx series build
[margusja@vm38 ~]$ cd kaldi-trunk/
[margusja@vm38 ~]$ cd tools/

Downloaded http://sourceforge.net/projects/math-atlas/files/Stable/3.10.0/atlas3.10.0.tar.bz2 and built it – huge work!
[margusja@vm38 ~]$ make // since this is an old checkout, the references to external resources inside the Makefile have changed and need to be updated
[margusja@vm38 tools]$ cd ../src/
[margusja@vm38 ~]$ ./configure
[margusja@vm38 ~]$ make depend
[margusja@vm38 ~]$ make test (optional)
[margusja@vm38 ~]$ make valgrind (optional – the memory tests may report errors and take a long time)
[margusja@vm38 ~]$ make

[root@h14 ~]# wget http://mirror-fpt-telecom.fpt.net/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
[root@h14 ~]# rpm -i epel-release-6-8.noarch.rpm
[root@vm38 ~]# yum install python-pip

[root@vm38 ~]$ CPPFLAGS="-I/home/margusja/kaldi/tools/kaldi-trunk/tools/openfst/include -L/home/margusja/kaldi/tools/kaldi-trunk/tools/openfst/lib" pip install pyfst
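A quick sanity check that the bindings built against OpenFst actually load (assuming pyfst exposes its module as fst):

[margusja@vm38 ~]$ python -c 'import fst; print fst.__file__'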

[margusja@vm38 ~]$ cd /home/margusja/kaldi/tools/
[margusja@vm38 tools]$ git clone https://github.com/alumae/kaldi-offline-transcriber.git
[margusja@vm38 tools]$ cd kaldi-offline-transcriber/
[margusja@vm38 kaldi-offline-transcriber]$ curl http://www.phon.ioc.ee/~tanela/kaldi-offline-transcriber-data.tgz | tar xvz

[margusja@vm38 kaldi-offline-transcriber]$ vim Makefile.options // inside it, add the line KALDI_ROOT=/home/margusja/kaldi/tools/kaldi-trunk – or whatever your path is
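For reference, the whole file can be just that single override (path from my setup; adjust to yours):

KALDI_ROOT=/home/margusja/kaldi/tools/kaldi-trunk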

[margusja@vm38 kaldi-offline-transcriber]$ make .init

…
Problem ID-1:
sox formats: no handler for file extension `mp3'
Solution:
Convert mp3 to ogg
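For example, with the ffmpeg installed above (input.mp3 and output.ogg are placeholder names):

[margusja@vm38 ~]$ ffmpeg -i input.mp3 output.ogg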

…
Problem ID-2:
steps/decode_nnet_cpu.sh --num-threads 1 --skip-scoring true --cmd "$decode_cmd" --nj 1 \
--transform-dir build/trans/test3/tri3b_mmi_pruned/decode \
build/fst/tri3b/graph_prunedlm build/trans/test3 `dirname build/trans/test3/nnet5c1_pruned/decode/log`
steps/decode_nnet_cpu.sh --num-threads 1 --skip-scoring true --cmd run.pl --nj 1 --transform-dir build/trans/test3/tri3b_mmi_pruned/decode build/fst/tri3b/graph_prunedlm build/trans/test3 build/trans/test3/nnet5c1_pruned/decode
steps/decode_nnet_cpu.sh: feature type is lda
steps/decode_nnet_cpu.sh: using transforms from build/trans/test3/tri3b_mmi_pruned/decode
run.pl: job failed, log is in build/trans/test3/nnet5c1_pruned/decode/log/decode.1.log
make: *** [build/trans/test3/nnet5c1_pruned/decode/log] Error 1

Solution:
[margusja@vm38 tools]$ svn co -r 2720 svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk
…
Problem ID-3
make build/output/[file].txt gives:
EFFECT OPTIONS (effopts): effect dependent; see --help-effect
sox: unrecognized option `--norm'
sox: SoX v14.2.0

Failed: invalid option
Solution: for now, remove the --norm option from the sox command inside the Makefile.
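One way to make that edit (a sketch; it assumes --norm only appears in the sox calls, and keeps a backup to diff against):

[margusja@vm38 kaldi-offline-transcriber]$ sed -i.bak 's/ --norm//g' Makefile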

…
Problem ID-4
Decoding done.
(cd build/trans/test2/nnet5c1_pruned; ln -s ../../../fst/tri3b/graph_prunedlm graph)
rm -rf build/trans/test2/nnet5c1_pruned_rescored_main
mkdir -p build/trans/test2/nnet5c1_pruned_rescored_main
(cd build/trans/test2/nnet5c1_pruned_rescored_main; for f in ../../../fst/nnet5c1/*; do ln -s $f; done)
local/lmrescore_lowmem.sh --cmd "$decode_cmd" --mode 1 build/fst/data/prunedlm build/fst/data/mainlm \
build/trans/test2 build/trans/test2/nnet5c1_pruned/decode build/trans/test2/nnet5c1_pruned_rescored_main/decode || exit 1;
local/lmrescore_lowmem.sh --cmd run.pl --mode 1 build/fst/data/prunedlm build/fst/data/mainlm build/trans/test2 build/trans/test2/nnet5c1_pruned/decode build/trans/test2/nnet5c1_pruned_rescored_main/decode
run.pl: job failed, log is in build/trans/test2/nnet5c1_pruned_rescored_main/decode/log/rescorelm.JOB.log
queue.pl: probably you forgot to put JOB=1:$nj in your script.
make: *** [build/trans/test2/nnet5c1_pruned_rescored_main/decode/log] Error 1

local/lmrescore_lowmem.sh --cmd utils/run.pl --mode 1 build/fst/data/prunedlm build/fst/data/mainlm build/trans/test2 build/trans/test2/nnet5c1_pruned/decode build/trans/test2/nnet5c1_pruned_rescored_main/decode
run.pl: job failed, log is in build/trans/test2/nnet5c1_pruned_rescored_main/decode/log/rescorelm.JOB.log
queue.pl: probably you forgot to put JOB=1:$nj in your script.

…
Problem ID-5:
/usr/bin/ld: skipping incompatible /usr/lib/libz.so when searching for -lz

Solution:
[root@h14 ~]# rpm -qif /usr/lib/libz.so
Name : zlib-devel Relocations: (not relocatable)
Version : 1.2.3 Vendor: CentOS
Release : 29.el6 Build Date: Fri 22 Feb 2013 01:01:21 AM EET
Install Date: Fri 14 Mar 2014 10:21:49 AM EET Build Host: c6b9.bsys.dev.centos.org
Group : Development/Libraries Source RPM: zlib-1.2.3-29.el6.src.rpm
Size : 117494 License: zlib and Boost
Signature : RSA/SHA1, Sat 23 Feb 2013 07:53:47 PM EET, Key ID 0946fca2c105b9de
Packager : CentOS BuildSystem <http://bugs.centos.org>
URL : http://www.gzip.org/zlib/
Summary : Header files and libraries for Zlib development
Description :
The zlib-devel package contains the header files and libraries needed
to develop programs that use the zlib compression and decompression
library.
[root@h14 ~]# yum install zlib-devel
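The "skipping incompatible" warning means the linker found a libz of the wrong architecture (a 32-bit /usr/lib/libz.so on this x86_64 box) and kept searching; installing zlib-devel pulls in the matching x86_64 headers and library. To verify what is actually on disk (file -L follows the symlinks):

[root@h14 ~]# file -L /usr/lib/libz.so /usr/lib64/libz.so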


Posted in Linux

Hadoop HBase

Posted on March 10, 2014 - May 6, 2014 by margusja

https://hbase.apache.org/

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Name : hbase
Arch : noarch
Version : 0.96.1.2.0.6.1
Release : 101.el6
Size : 44 M
Repo : HDP-2.0.6
Summary : HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware.
URL : http://hbase.apache.org/
License : APL2
Description : HBase is an open-source, distributed, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase
: provides Bigtable-like capabilities on top of Hadoop. HBase includes:
:
: * Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
: * Query predicate push down via server side scan and get filters
: * Optimizations for real time queries
: * A high performance Thrift gateway
: * A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
: * Cascading source and sink modules
: * Extensible jruby-based (JIRB) shell
: * Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX

/etc/hosts
90.190.106.56 vm37.dbweb.ee

[root@vm37 ~]# yum install hbase
…
Resolving Dependencies
--> Running transaction check
---> Package hbase.noarch 0:0.96.1.2.0.6.1-101.el6 will be installed
…
Total download size: 44 M
Installed size: 50 M
Is this ok [y/N]: y
Downloading Packages:
hbase-0.96.1.2.0.6.1-101.el6.noarch.rpm | 44 MB 00:23
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : hbase-0.96.1.2.0.6.1-101.el6.noarch 1/1
Verifying : hbase-0.96.1.2.0.6.1-101.el6.noarch 1/1

Installed:
hbase.noarch 0:0.96.1.2.0.6.1-101.el6

Complete!
[root@vm37 ~]#

important directories:
/etc/hbase/ – conf
/usr/bin/ – binaries
/usr/lib/hbase/ – libraries
/usr/lib/hbase/logs
/usr/lib/hbase/pids
/var/log/hbase
/var/run/hbase

/etc/hbase/conf.dist/hbase-site.xml:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://vm38.dbweb.ee:8020/user/hbase/data</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://vm38.dbweb.ee:8020/user/hbase/data</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

[hdfs@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hbase
[hdfs@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hbase/data
[hdfs@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chown -R hbase /user/hbase

[root@vm37 ~]# su - hbase
[hbase@vm37 ~]$ export JAVA_HOME=/usr
[hbase@vm37 ~]$ export HBASE_LOG_DIR=/var/log/hbase/
[hbase@vm37 ~]$ export HADOOP_CONF_DIR=/etc/hadoop/conf
#[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start zookeeper // commented out – we have a distributed ZooKeeper quorum now
starting zookeeper, logging to /var/log/hbase//hbase-hbase-zookeeper-vm37.dbweb.ee.out
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start master
starting master, logging to /var/log/hbase//hbase-hbase-master-vm37.dbweb.ee.out
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start regionserver
starting regionserver, logging to /var/log/hbase//hbase-hbase-regionserver-vm37.dbweb.ee.out
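A quick check that the daemons actually came up (jps ships with the JDK; on plain OpenJDK it may live in the -devel package):

[hbase@vm37 ~]$ jps // expect HMaster and HRegionServer here, plus HQuorumPeer on nodes running HBase-managed ZooKeeper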

….
Problem:
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.3.1.el6.x86_64
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=hbase
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hbase
2014-03-10 10:44:23,331 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hbase
2014-03-10 10:44:23,333 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=master:60000, quorum=localhost:2181, baseZNode=/hbase
2014-03-10 10:44:23,360 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=master:60000 connecting to ZooKeeper ensemble=localhost:2181
2014-03-10 10:44:23,366 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-10 10:44:23,374 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1072)
2014-03-10 10:44:23,481 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-10 10:44:23,484 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1072)
2014-03-10 10:44:23,491 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2014-03-10 10:44:23,491 INFO [main] util.RetryCounter: Sleeping 1000ms before retry #0...
2014-03-10 10:44:24,585 INFO [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-03-10 10:44:24,585 WARN [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
Solution:
ZooKeeper has to be configured and running before the master is started.
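So the working start order on a node like this is simply (same commands as above, ZooKeeper first):

[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start zookeeper
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start master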
….

[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase shell
2014-03-10 10:24:32,720 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.1.2.0.6.1-101-hadoop2, rcf3f71e5014c66e85c10a244fa9a1e3c43cef077, Wed Jan 8 21:59:02 PST 2014
hbase(main):001:0>
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 11.6950 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 3.9510 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1420 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0170 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0090 seconds
hbase(main):007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1394440138295, value=value1
row2 column=cf:b, timestamp=1394440145368, value=value2
row3 column=cf:c, timestamp=1394440161856, value=value3
3 row(s) in 0.0660 seconds
hbase(main):008:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1394440138295, value=value1
1 row(s) in 0.0390 seconds
hbase(main):009:0> disable 'test'
0 row(s) in 2.6660 seconds
hbase(main):010:0> drop 'test'
0 row(s) in 0.5050 seconds
hbase(main):011:0> exit
[hbase@vm37 ~]$

…
Problem:
2014-03-10 11:16:33,892 WARN [RpcServer.handler=16,port=60000] master.HMaster: Table Namespace Manager not ready yet
hbase(main):001:0> create 'test', 'cf'

ERROR: java.io.IOException: Table Namespace Manager not ready yet, try again later
at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3092)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1729)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1768)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
Solution: at least one regionserver has to be configured and running.
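That is, start a regionserver and retry the create:

[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase-daemon.sh start regionserver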
…

hbase(main):007:0> status
1 servers, 0 dead, 3.0000 average load

http://vm37:16010/master-status

Map/Reduce Export
[hbase@vm37 ~]$ hbase org.apache.hadoop.hbase.mapreduce.Export test test_out2 // the result will be in hdfs://server/user/hbase/test_out2/
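A quick way to confirm the export landed (path as above):

[hbase@vm37 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -ls /user/hbase/test_out2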

hbase(main):001:0> create 'test2', 'cf'
hbase(main):002:0> scan 'test2'
ROW COLUMN+CELL
0 row(s) in 0.0440 seconds

Map/Reduce Import
[hbase@vm37 ~]$ /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.Import test2 hdfs://vm38.dbweb.ee:8020/user/hbase/test_out2

hbase(main):004:0> scan 'test2'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1394445121367, value=value1
row2 column=cf:b, timestamp=1394445137811, value=value2
row3 column=cf:c, timestamp=1394445149457, value=value3
3 row(s) in 0.0230 seconds

hbase(main):005:0>

 

Add a new regionserver:

Just add a new record on the master:

[root@vm37 kafka_2.9.1-0.8.1.1]# vim /etc/hbase/conf/regionservers

In hbase-site.xml (on the master and on the regionserver(s)) set at least one common ZooKeeper server in hbase.zookeeper.quorum.
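A simple sanity check that the value really matches on every host (run on the master and on each regionserver):

[root@vm37 ~]# grep -A1 hbase.zookeeper.quorum /etc/hbase/conf/hbase-site.xml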

On the slave, start the regionserver:

/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf start regionserver

Check http://master:16010/master-status to see whether the regionservers are available.

Posted in Linux, Machine Learning

Apache Hive-0.12 and Hadoop-2.2.0

Posted on March 7, 2014 - March 11, 2014 by margusja

http://hive.apache.org/

The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

[root@vm24 ~]# yum install hive

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: ftp.hosteurope.de
* epel: ftp.lysator.liu.se
* extras: ftp.hosteurope.de
* rpmforge: mirror.bacloud.com
* updates: ftp.hosteurope.de
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package hive.noarch 0:0.12.0.2.0.6.1-101.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================================================================================================================================================================
Installing:
hive noarch 0.12.0.2.0.6.1-101.el6 HDP-2.0.6 44 M

Transaction Summary
================================================================================================================================================================================================================================================================================
Install 1 Package(s)

Total download size: 44 M
Installed size: 207 M
Is this ok [y/N]: y
Downloading Packages:
hive-0.12.0.2.0.6.1-101.el6.noarch.rpm | 44 MB 00:19
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : hive-0.12.0.2.0.6.1-101.el6.noarch 1/1
Verifying : hive-0.12.0.2.0.6.1-101.el6.noarch 1/1

Installed:
hive.noarch 0:0.12.0.2.0.6.1-101.el6

Complete!

[root@vm24 ~]#

The most important directories that were created (rpm -ql hive):
/usr/lib/hive/ – this should be the Hive home
/var/lib/hive
/var/lib/hive/metastore
/var/log/hive
/var/run/hive

[root@vm24 ~]# su - hive
[hive@vm24 ~]$ export HIVE_HOME=/usr/lib/hive
[hive@vm24 ~]$ export HADOOP_HOME=/usr/lib/hadoop

[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hive
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -mkdir /user/hive/warehouse
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chmod g+w /tmp
[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chmod g+w /user/hive/warehouse

[hdfs@vm24 ~]$ /usr/lib/hadoop-hdfs/bin/hdfs dfs -chown -R hive /user/hive/
[hdfs@vm24 ~]$
[hive@vm24 ~]$ /usr/lib/hive/bin/hive

Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path

[hive@vm24 ~]$

Apparently I had mixed up hadoop and hadoop-hdfs.
[hive@vm24 ~]$ export HADOOP_HOME=/usr/lib/hadoop
[hive@vm24 ~]$ /usr/lib/hive/bin/hive


Error: JAVA_HOME is not set and could not be found.
Unable to determine Hadoop version information.
'hadoop version' returned:
Error: JAVA_HOME is not set and could not be found.

[hive@vm24 ~]$
[hive@vm24 ~]$ export JAVA_HOME=/usr
[hive@vm24 ~]$ /usr/lib/hive/bin/hive

14/03/07 11:49:15 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/07 11:49:15 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.12.0.2.0.6.1-101.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>

A session in Hive:
[hive@vm24 ~]$ wget https://hadoop-clusternet.googlecode.com/svn-history/r20/trunk/clusternet/thirdparty/data/ml-data.tar__0.gz
--2014-03-07 11:53:56-- https://hadoop-clusternet.googlecode.com/svn-history/r20/trunk/clusternet/thirdparty/data/ml-data.tar__0.gz
Resolving hadoop-clusternet.googlecode.com... 2a00:1450:4001:c02::52, 173.194.70.82
Connecting to hadoop-clusternet.googlecode.com|2a00:1450:4001:c02::52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4948405 (4.7M) [application/octet-stream]
Saving to: "ml-data.tar__0.gz"

100%[======================================================================================================================================================================================================================================>] 4,948,405 609K/s in 7.1s

2014-03-07 11:54:03 (681 KB/s) - "ml-data.tar__0.gz" saved [4948405/4948405]

[hive@vm24 ~]$
[hive@vm24 ~]$ tar zxvf ml-data.tar__0.gz
ml-data/
ml-data/README
ml-data/allbut.pl
ml-data/mku.sh
ml-data/u.data
ml-data/u.genre
ml-data/u.info
ml-data/u.item
ml-data/u.occupation
ml-data/u.user
ml-data/ub.test
ml-data/u1.test
ml-data/u1.base
ml-data/u2.test
ml-data/u2.base
ml-data/u3.test
ml-data/u3.base
ml-data/u4.test
ml-data/u4.base
ml-data/u5.test
ml-data/u5.base
ml-data/ua.test
ml-data/ua.base
ml-data/ub.base
[hive@vm24 ~]$
hive> CREATE TABLE u_data (
> userid INT,
> movieid INT,
> rating INT,
> unixtime STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;

hive> LOAD DATA LOCAL INPATH 'ml-data/u.data'
> OVERWRITE INTO TABLE u_data;


Copying data from file:/home/hive/ml-data/u.data
Copying file: file:/home/hive/ml-data/u.data
Loading data to table default.u_data
Table default.u_data stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 1979173, raw_data_size: 0]
OK
Time taken: 3.0 seconds

hive>
hive> SELECT COUNT(*) FROM u_data;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1394027471317_0016, Tracking URL = http://vm38:8088/proxy/application_1394027471317_0016/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1394027471317_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-07 11:59:47,212 Stage-1 map = 0%, reduce = 0%
2014-03-07 11:59:57,933 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 11:59:58,998 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:00,094 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:01,157 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:02,212 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:03,268 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:04,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:05,378 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:06,434 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:07,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:08,573 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:09,630 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.14 sec
2014-03-07 12:00:10,697 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.14 sec
2014-03-07 12:00:11,745 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.14 sec
MapReduce Total cumulative CPU time: 5 seconds 140 msec
Ended Job = job_1394027471317_0016
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 5.14 sec HDFS Read: 1979386 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 140 msec
OK
100000
Time taken: 67.285 seconds, Fetched: 1 row(s)

hive>

Here you can also see that the Hadoop compute side is working on this job (1394027471317_0016):
[Image: Screen Shot 2014-03-07 at 12.01.52]

[Image: Screen Shot 2014-03-07 at 12.00.13]

[hive@vm24 ~]$ hive --service hiveserver
Starting Hive Thrift Server
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/11 15:21:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
…
Start Web UI

/etc/hive/conf/hive-site.xml:

<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.2.0.6.1-101.war</value>
</property>

[hive@vm24 ~]$ hive --service hwi
14/03/11 15:14:57 INFO hwi.HWIServer: HWI is starting up
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/11 15:14:58 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/03/11 15:14:59 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/03/11 15:14:59 INFO mortbay.log: jetty-6.1.26
14/03/11 15:14:59 INFO mortbay.log: Extract /usr/lib/hive/lib/hive-hwi-0.12.0.2.0.6.1-101.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.12.0.2.0.6.1.101.war__hwi__4ykn6s/webapp
14/03/11 15:15:00 INFO mortbay.log: Started SocketConnector@0.0.0.0:9999

http://vm24:9999/hwi/
[hive@vm24 ~]$ hive --service metastore -p 10000
Starting Hive Metastore Server
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/11 16:00:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

…
Shut down the other Hive services beforehand, since with the current configuration the embedded Derby database gets locked by a single process.

Metastore

Posted in Linux, Machine Learning

Protected: Hadoop 2.2.0 add slave

Posted on March 7, 2014 - March 19, 2014 by margusja

This content is password protected. To view it please enter your password below:

Posted in Linux

Pig install, configure to use remote hadoop-yarn resourcemanager and a simple session

Posted on March 6, 2014 - March 6, 2014 by margusja

https://pig.apache.org/ pig.noarch : Pig is a platform for analyzing large data sets

[root@vm24 ~]# yum install pig

Loading mirror speeds from cached hostfile
* base: mirrors.coreix.net
* epel: ftp.lysator.liu.se
* extras: mirrors.coreix.net
* rpmforge: mirror.bacloud.com
* updates: mirrors.coreix.net
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package pig.noarch 0:0.12.0.2.0.6.1-101.el6 will be installed
--> Processing Dependency: hadoop-client for package: pig-0.12.0.2.0.6.1-101.el6.noarch
--> Running transaction check
---> Package hadoop-client.x86_64 0:2.2.0.2.0.6.0-101.el6 will be installed
--> Processing Dependency: hadoop-yarn = 2.2.0.2.0.6.0-101.el6 for package: hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64
--> Processing Dependency: hadoop-mapreduce = 2.2.0.2.0.6.0-101.el6 for package: hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64
--> Processing Dependency: hadoop-hdfs = 2.2.0.2.0.6.0-101.el6 for package: hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64
--> Processing Dependency: hadoop = 2.2.0.2.0.6.0-101.el6 for package: hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64
--> Running transaction check
---> Package hadoop.x86_64 0:2.2.0.2.0.6.0-101.el6 will be installed
---> Package hadoop-hdfs.x86_64 0:2.2.0.2.0.6.0-101.el6 will be installed
---> Package hadoop-mapreduce.x86_64 0:2.2.0.2.0.6.0-101.el6 will be installed
---> Package hadoop-yarn.x86_64 0:2.2.0.2.0.6.0-101.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================================================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================================================================================================================================================================
Installing:
pig noarch 0.12.0.2.0.6.1-101.el6 HDP-2.0.6 64 M
Installing for dependencies:
hadoop x86_64 2.2.0.2.0.6.0-101.el6 HDP-2.0.6 18 M
hadoop-client x86_64 2.2.0.2.0.6.0-101.el6 HDP-2.0.6 9.2 k
hadoop-hdfs x86_64 2.2.0.2.0.6.0-101.el6 HDP-2.0.6 13 M
hadoop-mapreduce x86_64 2.2.0.2.0.6.0-101.el6 HDP-2.0.6 11 M
hadoop-yarn x86_64 2.2.0.2.0.6.0-101.el6 HDP-2.0.6 9.5 M

Transaction Summary
================================================================================================================================================================================================================================================================================
Install 6 Package(s)

Total download size: 115 M
Installed size: 191 M
Is this ok [y/N]: y
Downloading Packages:
(1/6): hadoop-2.2.0.2.0.6.0-101.el6.x86_64.rpm | 18 MB 00:11
(2/6): hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64.rpm | 9.2 kB 00:00
(3/6): hadoop-hdfs-2.2.0.2.0.6.0-101.el6.x86_64.rpm | 13 MB 00:05
(4/6): hadoop-mapreduce-2.2.0.2.0.6.0-101.el6.x86_64.rpm | 11 MB 00:06
(5/6): hadoop-yarn-2.2.0.2.0.6.0-101.el6.x86_64.rpm | 9.5 MB 00:05
(6/6): pig-0.12.0.2.0.6.1-101.el6.noarch.rpm | 64 MB 00:26
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 2.0 MB/s | 115 MB 00:56
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : hadoop-2.2.0.2.0.6.0-101.el6.x86_64 1/6
Installing : hadoop-yarn-2.2.0.2.0.6.0-101.el6.x86_64 2/6
warning: group yarn does not exist - using root
Installing : hadoop-mapreduce-2.2.0.2.0.6.0-101.el6.x86_64 3/6
Installing : hadoop-hdfs-2.2.0.2.0.6.0-101.el6.x86_64 4/6
Installing : hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64 5/6
Installing : pig-0.12.0.2.0.6.1-101.el6.noarch 6/6
Verifying : hadoop-yarn-2.2.0.2.0.6.0-101.el6.x86_64 1/6
Verifying : hadoop-client-2.2.0.2.0.6.0-101.el6.x86_64 2/6
Verifying : hadoop-2.2.0.2.0.6.0-101.el6.x86_64 3/6
Verifying : hadoop-hdfs-2.2.0.2.0.6.0-101.el6.x86_64 4/6
Verifying : pig-0.12.0.2.0.6.1-101.el6.noarch 5/6
Verifying : hadoop-mapreduce-2.2.0.2.0.6.0-101.el6.x86_64 6/6

Installed:
pig.noarch 0:0.12.0.2.0.6.1-101.el6

Dependency Installed:
hadoop.x86_64 0:2.2.0.2.0.6.0-101.el6 hadoop-client.x86_64 0:2.2.0.2.0.6.0-101.el6 hadoop-hdfs.x86_64 0:2.2.0.2.0.6.0-101.el6 hadoop-mapreduce.x86_64 0:2.2.0.2.0.6.0-101.el6 hadoop-yarn.x86_64 0:2.2.0.2.0.6.0-101.el6

Complete!
[root@vm24 ~]#

[root@vm24 ~]# su - margusja
[margusja@vm24 ~]$ pig
which: no hbase in (:/usr/local/apache-maven-3.1.1/bin:/usr/lib64/qt-3.3/bin:/usr/local/maven/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/margusja/bin)
2014-03-06 10:17:18,392 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0.2.0.6.1-101 (rexported) compiled Jan 08 2014, 22:49:47
2014-03-06 10:17:18,393 [main] INFO org.apache.pig.Main - Logging error messages to: /home/margusja/pig_1394093838389.log
2014-03-06 10:17:18,690 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/margusja/.pigbootup not found
2014-03-06 10:17:19,680 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-03-06 10:17:19,680 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-03-06 10:17:19,680 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-03-06 10:17:19,692 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-03-06 10:17:22,675 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>

—
Making a query at this point would fail, because Pig does not know the essential parameters of the hadoop and yarn environments.
One option, which I use, is to set PIG_CLASSPATH=/etc/hadoop/conf, where in turn:
yarn-site.xml:

<property>
  <name>yarn.application.classpath</name>
  <value>/etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>vm38:8032</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>vm38:8030</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user</value>
</property>

core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://vm38:8020</value>
</property>

Now the Pig client has enough information to submit map-reduce jobs to the hadoop-yarn resource manager, which in turn distributes the work across the resources (nodemanagers) available to it.
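In practice this just means exporting the variable before starting pig (the env listing below shows it set):

[margusja@vm24 ~]$ export PIG_CLASSPATH=/etc/hadoop/conf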

An example Pig session:
[margusja@vm24 ~]$ env
SHELL=/bin/bash
TERM=xterm-256color
HADOOP_HOME=/usr/lib/hadoop
HISTSIZE=1000
QTDIR=/usr/lib64/qt-3.3
QTINC=/usr/lib64/qt-3.3/include
USER=margusja
LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lz=38;5;9:*.xz=38;5;9:*.bz2=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.bz=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.rar=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.axv=38;5;13:*.anx=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.axa=38;5;45:*.oga=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
MAIL=/var/spool/mail/margusja
PATH=/usr/local/apache-maven-3.1.1/bin:/usr/lib64/qt-3.3/bin:/usr/local/maven/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/margusja/bin
PWD=/home/margusja
JAVA_HOME=/usr/lib/jvm/jre-1.7.0
EDITOR=/usr/bin/vim
PIG_CLASSPATH=/etc/hadoop/conf
LANG=en_US.UTF-8
HISTCONTROL=ignoredups
M2_HOME=/usr/local/apache-maven-3.1.1
SHLVL=1
HOME=/home/margusja
LOGNAME=margusja
QTLIB=/usr/lib64/qt-3.3/lib
CVS_RSH=ssh
LESSOPEN=|/usr/bin/lesspipe.sh %s
G_BROKEN_FILENAMES=1
_=/bin/env
[margusja@vm24 ~]$
[margusja@vm24 ~]$ pig
which: no hbase in (:/usr/local/apache-maven-3.1.1/bin:/usr/lib64/qt-3.3/bin:/usr/local/maven/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/margusja/bin)
2014-03-06 11:55:56,557 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0.2.0.6.1-101 (rexported) compiled Jan 08 2014, 22:49:47
2014-03-06 11:55:56,558 [main] INFO org.apache.pig.Main - Logging error messages to: /home/margusja/pig_1394099756554.log
2014-03-06 11:55:56,605 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/margusja/.pigbootup not found
2014-03-06 11:55:57,292 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-03-06 11:55:57,292 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-03-06 11:55:57,292 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://vm38:8020
2014-03-06 11:55:57,304 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-03-06 11:56:02,676 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
grunt> A = load 'passwd' using PigStorage(':'); // the passwd file must already be in the user's DFS home directory: /usr/lib/hadoop-hdfs/bin/hdfs dfs -put /etc/passwd /user/margusja
grunt> B = foreach A generate $0 as id; // take the first field of each passwd line as id
grunt> dump B;
2014-03-06 12:28:36,225 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-03-06 12:28:36,287 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-03-06 12:28:36,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-03-06 12:28:36,499 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-03-06 12:28:36,499 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-03-06 12:28:36,926 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at vm38/90.190.106.33:8032
2014-03-06 12:28:37,167 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-03-06 12:28:37,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-03-06 12:28:37,204 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5693330381910866671.jar
2014-03-06 12:28:45,595 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5693330381910866671.jar created
2014-03-06 12:28:45,595 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-03-06 12:28:45,635 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-03-06 12:28:45,658 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-03-06 12:28:45,658 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2014-03-06 12:28:45,661 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-03-06 12:28:45,737 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-03-06 12:28:45,765 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at vm38/x.x.x.x:8032
2014-03-06 12:28:45,873 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-03-06 12:28:45,875 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-03-06 12:28:45,875 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-03-06 12:28:45,875 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-03-06 12:28:45,875 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
2014-03-06 12:28:45,876 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-03-06 12:28:45,876 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-03-06 12:28:45,876 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
2014-03-06 12:28:45,876 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-03-06 12:28:46,822 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-03-06 12:28:46,822 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-03-06 12:28:46,858 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-03-06 12:28:46,992 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-03-06 12:28:47,008 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-03-06 12:28:47,009 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-03-06 12:28:47,011 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-03-06 12:28:47,014 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-03-06 12:28:47,014 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-03-06 12:28:47,674 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1394027471317_0013
2014-03-06 12:28:48,137 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1394027471317_0013 to ResourceManager at vm38/x.x.x.x:8032
2014-03-06 12:28:48,221 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://vm38:8088/proxy/application_1394027471317_0013/
2014-03-06 12:28:48,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1394027471317_0013
2014-03-06 12:28:48,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2014-03-06 12:28:48,222 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4] C: R:
2014-03-06 12:28:48,293 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-03-06 12:29:06,570 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-03-06 12:29:09,274 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-03-06 12:29:09,277 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.2.0.2.0.6.0-101 0.12.0.2.0.6.1-101 margusja 2014-03-06 12:28:37 2014-03-06 12:29:09 UNKNOWN

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1394027471317_0013 1 0 7 7 7 7 n/a n/a n/a n/a A,B MAP_ONLY hdfs://vm38:8020/tmp/temp1191617276/tmp1745379757,

Input(s):
Successfully read 46 records (2468 bytes) from: "hdfs://vm38:8020/user/margusja/passwd"

Output(s):
Successfully stored 46 records (528 bytes) in: "hdfs://vm38:8020/tmp/temp1191617276/tmp1745379757"

Counters:
Total records written : 46
Total bytes written : 528
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1394027471317_0013

2014-03-06 12:29:09,414 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2014-03-06 12:29:09,419 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-03-06 12:29:09,419 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2014-03-06 12:29:17,690 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-03-06 12:29:17,690 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(root)
(bin)
(daemon)
(adm)
(lp)
(sync)
(shutdown)
(halt)
(mail)
(uucp)
(operator)
(games)
(gopher)
(ftp)
(nobody)
(vcsa)
(saslauth)
(postfix)
(sshd)
(ntp)
(bacula)
(apache)
(mysql)
(web)
(zabbix)
(hduser)
(margusja)
(zend)
(dbus)
(rstudio-server)
(tcpdump)
(postgres)
(puppet)
(ambari-qa)
(hdfs)
(mapred)
(zookeeper)
(nagios)
(yarn)
(hive)
(hbase)
(oozie)
(hcat)
(rrdcached)
(sqoop)
(hue)
grunt>

Posted in Linux, Machine Learning

Centos – how to purge swap on the fly

Posted on March 22, 2013 by margusja

[root@vm37 ~]# free -m
total used free shared buffers cached
Mem: 3881 3383 498 0 89 879
-/+ buffers/cache: 2415 1466
Swap: 991 53 938
[root@vm37 ~]# swapoff -a && swapon -a
[root@vm37 ~]# free -m
total used free shared buffers cached
Mem: 3881 3438 443 0 90 880
-/+ buffers/cache: 2467 1414
Swap: 991 0 991
[root@vm37 ~]#
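Note that swapoff -a has to page everything from swap back into RAM, so only do this when free memory comfortably exceeds used swap. A small guard sketch (compares the free column of Mem: against the used column of Swap:):

[root@vm37 ~]# free -m | awk '/^Mem:/ {free=$4} /^Swap:/ {used=$3} END {exit !(free > used)}' && swapoff -a && swapon -a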

Posted in Linux

Dilbert's salary theorem

Posted on March 1, 2013 by margusja

[Image: Screen Shot 2013-03-01 at 8.55.41 AM]

Posted in Linux

Selenium RC headless server setup (CentOS 6)

Posted on January 17, 2013 by margusja

# yum install xorg-x11-server-Xvfb.x86_64

# yum install firefox.x86_64

# Xvfb :99 -screen 0 800x600x16 (start a virtual X server)

# export DISPLAY=:99 (so that RC knows which display to use)

# java -jar selenium-server-standalone-2.28.0.jar (start the service that serves the Selenium tests)
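To keep everything alive after logging out, the same three steps can be backgrounded (a sketch; selenium.log is just a name I picked):

# Xvfb :99 -screen 0 800x600x16 > /dev/null 2>&1 &
# export DISPLAY=:99
# nohup java -jar selenium-server-standalone-2.28.0.jar > selenium.log 2>&1 &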

…

Then comes the actual testing, for example:

[margusja@vm37 selenium_tests]$ python ./test.py
.
----------------------------------------------------------------------
Ran 1 test in 8.501s

OK
[margusja@vm37 selenium_tests]$

Posted in Linux | Tagged selenium

Tarzan in the big city: on the peculiarities of online communication

Posted on December 26, 2012 - December 26, 2012 by margusja

Electronic communication has been an inseparable part of our everyday lives for the last 30 years. For some less, for some more, but in a developed IT society almost everyone comes into contact with it to some degree.

The aim of this piece is to take one of Virginia Shea's ten commandments of netiquette and relate it to my own IT experience. I will do that, but at the same time I want to stress that in my opinion the whole topic could by now be reduced to the rules that apply in the real world, since the boundary between the virtual world and reality is much thinner today than it was when those ten commandments were written down.

My interpretation:

  1. Remember the human. – Well, that would be welcome in real life too.
  2. Adhere to the same standards of behaviour online as in everyday life. – All ten commandments could be reduced to this one.
  3. Know where you are. – A rule that applies very nicely in the real world as well.
  4. Respect other people's time and bandwidth. – The same holds in real life; by now, time matters more than bandwidth.
  5. Make yourself look good online. – You might as well do so in real life too, and I do not mean only the physical side.
  6. Share your knowledge. – This commandment belongs rather to the free-software world. Personally I take the view that some knowledge should still conveniently guarantee its owner a comfortable income; that drifts into patent territory.
  7. Help keep flame wars under control. – This point perhaps still belongs very much to the world of online communication, even today.
  8. Respect other people's privacy. – This too should hold in the real world.
  9. Don't abuse your power. – This commandment is as important in the real world as it is in the virtual one.
  10. Forgive other people's mistakes. – Even Christians recommend that.

By now several generations have grown up together with the internet, and electronic communication is part of their everyday routine. For these new generations the rules listed above are therefore self-evident. On the other hand, as reality and virtuality keep converging, we are today witnessing situations where things are done through the virtual world that fifty years ago most people would hardly have imagined moving out of the real world at all. It is not unusual for an electronic channel to be used to make a marriage proposal, or just as well to deliver a divorce decision.

That said, the points are not there for nothing, so back to the topic. From the list above I would pick commandment number 8 (respect other people's privacy).

Having been the administrator of many servers for years (as I still am today), I have real access to information that is not meant for me. I have also had to work quite a lot with an environment where people make acquaintances online.

I must admit that I have caught myself thinking how interesting it would be to rummage through other people's private information. I stress that such thoughts crossed my mind at the very beginning of my IT life, when I did not even have an internet connection.

Later came work, and with it the understanding that access to some information is so easy that the information has to be respected; it is simply a matter of ethics not to abuse that power. Or maybe the point really is that when something is too easy, it is not interesting either. Who knows 🙂

To sum up, the gap between the real and the virtual world is much smaller today than it was when the rules above were written. And analysing those rules, I find that the rules that hold in real life also fit the virtual world, and vice versa.

Posted in IT eetilised, Linux

Derivative of a function: an example of logarithmic differentiation

Posted on December 23, 2012 - December 23, 2012 by margusja

[Image: tl4_yl25]

Posted in Linux | Tagged matemaatika
