Audio (Estonian) to text with Kaldi

https://github.com/alumae/kaldi-offline-transcriber

CentOS release 6.5 (Final) Linux vm38 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[root@h14 ~]# yum groupinstall “Development Tools”
[root@h14 ~]# yum install zlib-devel

[root@h14 ~]# yum install java-1.7.0-openjdk.x86_64

[root@vm38 ~]# yum install ffmpeg

[root@vm38 ~]# yum install sox
[root@vm38 ~]# yum install atlas
[root@vm38 ~]# yum install atlas-devel

[root@vm38 ~]# su – margusja
[margusja@vm38 ~]$ mkdir kaldi
[margusja@vm38 ~]$ cd kaldi/
[margusja@vm38 ~]$ mkdir tool
[margusja@vm38 ~]$ cd tools /
[margusja@vm38 ~]$ svn co svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk Hetkel annab alltoodud probleemi ID-2
[margusja@vm38 tools]$ svn co -r 2720 svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk

svn co -r 2720 svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk // 4xxxx series build
[margusja@vm38 ~]$ cd kaldi-trunk/
[margusja@vm38 ~]$ cd tools/

Downloaded http://sourceforge.net/projects/math-atlas/files/Stable/3.10.0/atlas3.10.0.tar.bz2 and build it – huge work!
[margusja@vm38 ~]$ make – Kuna on vana co siis, Makefile sees olevad viited välistele ressursidele on muutunud, mida tuleb uuendada
[margusja@vm38 tools]$ cd ../src/
[margusja@vm38 ~]$ ./configure
[margusja@vm38 ~]$ make depend
[margusja@vm38 ~]$ make test (optional)
[margusja@vm38 ~]$ make valgrind (optional – memory tests can contain errors – takes long time)
[margusja@vm38 ~]$ make

[root@h14 ~]# wget http://mirror-fpt-telecom.fpt.net/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
[root@h14 ~]# rpm -i epel-release-6-8.noarch.rpm
[root@vm38 ~]# yum install python-pip

[root@vm38 ~]$ CPPFLAGS=”-I/home/margusja/kaldi/tools/kaldi-trunk/tools/openfst/include -L/home/margusja/kaldi/tools/kaldi-trunk/tools/openfst/lib” pip install pyfst

[margusja@vm38 ~]$ cd /home/margusja/kaldi/tools/
[margusja@vm38 tools]$ git clone https://github.com/alumae/kaldi-offline-transcriber.git
[margusja@vm38 tools]$ cd kaldi-offline-transcriber/
[margusja@vm38 kaldi-offline-transcriber]$ curl http://www.phon.ioc.ee/~tanela/kaldi-offline-transcriber-data.tgz | tar xvz

[margusja@vm38 kaldi-offline-transcriber]$ vim Makefile.options // Inside it add a line KALDI_ROOT=/home/margusja/kaldi/tools/kaldi-trunk – whatever where is your path

[margusja@vm38 kaldi-offline-transcriber]$ make .init


Problem ID-1:
sox formats: no handler for file extension `mp3′
Solution:
Convert mp3 to ogg


Problem ID-2:
steps/decode_nnet_cpu.sh –num-threads 1 –skip-scoring true –cmd “$decode_cmd” –nj 1 \
–transform-dir build/trans/test3/tri3b_mmi_pruned/decode \
build/fst/tri3b/graph_prunedlm build/trans/test3 `dirname build/trans/test3/nnet5c1_pruned/decode/log`
steps/decode_nnet_cpu.sh –num-threads 1 –skip-scoring true –cmd run.pl –nj 1 –transform-dir build/trans/test3/tri3b_mmi_pruned/decode build/fst/tri3b/graph_prunedlm build/trans/test3 build/trans/test3/nnet5c1_pruned/decode
steps/decode_nnet_cpu.sh: feature type is lda
steps/decode_nnet_cpu.sh: using transforms from build/trans/test3/tri3b_mmi_pruned/decode
run.pl: job failed, log is in build/trans/test3/nnet5c1_pruned/decode/log/decode.1.log
make: *** [build/trans/test3/nnet5c1_pruned/decode/log] Error 1

Solution:
[margusja@vm38 tools]$ svn co -r 2720 svn://svn.code.sf.net/p/kaldi/code/trunk kaldi-trunk

Problem ID-3
make /build/output/[file].txt annab
EFFECT OPTIONS (effopts): effect dependent; see –help-effect
sox: unrecognized option `–norm’
sox: SoX v14.2.0

Failed: invalid option
Solution hetkel Makefile seest eemaldada –norm võti sox käsult.


Problem ID-4
Decoding done.
(cd build/trans/test2/nnet5c1_pruned; ln -s ../../../fst/tri3b/graph_prunedlm graph)
rm -rf build/trans/test2/nnet5c1_pruned_rescored_main
mkdir -p build/trans/test2/nnet5c1_pruned_rescored_main
(cd build/trans/test2/nnet5c1_pruned_rescored_main; for f in ../../../fst/nnet5c1/*; do ln -s $f; done)
local/lmrescore_lowmem.sh –cmd “$decode_cmd” –mode 1 build/fst/data/prunedlm build/fst/data/mainlm \
build/trans/test2 build/trans/test2/nnet5c1_pruned/decode build/trans/test2/nnet5c1_pruned_rescored_main/decode || exit 1;
local/lmrescore_lowmem.sh –cmd run.pl –mode 1 build/fst/data/prunedlm build/fst/data/mainlm build/trans/test2 build/trans/test2/nnet5c1_pruned/decode build/trans/test2/nnet5c1_pruned_rescored_main/decode
run.pl: job failed, log is in build/trans/test2/nnet5c1_pruned_rescored_main/decode/log/rescorelm.JOB.log
queue.pl: probably you forgot to put JOB=1:$nj in your script.
make: *** [build/trans/test2/nnet5c1_pruned_rescored_main/decode/log] Error 1

local/lmrescore_lowmem.sh –cmd utils/run.pl –mode 1 build/fst/data/prunedlm build/fst/data/mainlm build/trans/test2 build/trans/test2/nnet5c1_pruned/decode build/trans/test2/nnet5c1_pruned_rescored_main/decode
run.pl: job failed, log is in build/trans/test2/nnet5c1_pruned_rescored_main/decode/log/rescorelm.JOB.log
queue.pl: probably you forgot to put JOB=1:$nj in your script.


Problem ID-5:
/usr/bin/ld: skipping incompatible /usr/lib/libz.so when searching for -lz

Solution:
[root@h14 ~]# rpm -qif /usr/lib/libz.so
Name : zlib-devel Relocations: (not relocatable)
Version : 1.2.3 Vendor: CentOS
Release : 29.el6 Build Date: Fri 22 Feb 2013 01:01:21 AM EET
Install Date: Fri 14 Mar 2014 10:21:49 AM EET Build Host: c6b9.bsys.dev.centos.org
Group : Development/Libraries Source RPM: zlib-1.2.3-29.el6.src.rpm
Size : 117494 License: zlib and Boost
Signature : RSA/SHA1, Sat 23 Feb 2013 07:53:47 PM EET, Key ID 0946fca2c105b9de
Packager : CentOS BuildSystem <http://bugs.centos.org>
URL : http://www.gzip.org/zlib/
Summary : Header files and libraries for Zlib development
Description :
The zlib-devel package contains the header files and libraries needed
to develop programs that use the zlib compression and decompression
library.
[root@h14 ~]# yum install zlib-devel