CONFIGURING HUE 2.5.0 ON MAPR HADOOP DISTRIBUTION ON AWS EMR
This is a PDF version of the blog published at http://www.agileiss.com/configuring-hue-on-aws-mapr/
This post summarizes our experience installing and configuring Hue 2.5 on a MapR M3 cluster on
Amazon EMR. Our goal was to create a semi-transient cluster on AWS EMR to analyze raw Spotify
data at Universal Music. The cluster would be kept alive for as long as we needed it and then
shut down. The AWS-integrated MapR distribution provided only Hive and Pig, which is enough for batch-processing scenarios, but we needed a more visual environment in which to specify metadata for the raw
text files stored in S3 and to run ad-hoc queries. So we needed Hue and all the components
that it brings together in an integrated visual environment. This step-by-step guide explains
how we did it.
Note: We assume that the reader of this guide knows how to launch AWS EMR clusters, how to connect to a master
node via SSH and how to operate in a Linux environment. If any of that is not familiar, please refer to AWS EMR
documentation and online Linux courses to bring yourself up to speed.
STEP 1: LAUNCHING AWS EMR MAPR M3 CLUSTER
From AWS EMR, launch a new 3-node MapR M3 cluster without installing Hive and Pig as a bootstrap
action. The reason is that we’ll install Hive and Pig from MapR packages to keep the
installation consistent with MapR documentation.
© 2014. Agile ISS. All rights reserved.
Note: Hive and Pig are still installed and configured automatically on the master node as part of the AMI during
cluster configuration, but we’ll ignore them and install newer versions from MapR packages anyway. It’s a bit
messy, since deselecting Hive and Pig at cluster launch doesn’t prevent them from being installed, but we can live with it,
as it doesn’t seem to cause any trouble in the end.
When the cluster reaches the Waiting state, go to https://master-node-public-dns:8453 in your browser.
Accept all the prompts and you should see the MapR Management Console.
STEP 2: GRANTING MYSQL PRIVILEGES
Grant MySQL privileges to the hadoop user:
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'localhost' WITH GRANT OPTION;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'%' WITH GRANT OPTION;
mysql> SET PASSWORD FOR 'hadoop'@'localhost' = PASSWORD('hadoop');
mysql> SET PASSWORD FOR 'hadoop'@'%' = PASSWORD('hadoop');
Note: This is the most straightforward and least secure configuration, and it is not suitable for production. A production
environment will require more carefully thought-out security.
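As a small step away from the hard-coded password, the statements above could be generated from variables. This is only a sketch: grant_sql is a hypothetical helper name, and it assumes the user name and password contain no quote characters. If you choose a different password, the same value has to go into javax.jdo.option.ConnectionPassword in hive-site.xml.

```shell
#!/bin/sh
# grant_sql USER PASSWORD: print the Step 2 statements for USER.
# Hypothetical helper for illustration, not part of MySQL or MapR.
# Assumption: USER and PASSWORD contain no single quotes or backslashes.
grant_sql() {
    user=$1
    password=$2
    # Same two host patterns as in the guide: localhost and the % wildcard.
    for host in localhost %; do
        printf "GRANT ALL PRIVILEGES ON *.* TO '%s'@'%s' WITH GRANT OPTION;\n" "$user" "$host"
        printf "SET PASSWORD FOR '%s'@'%s' = PASSWORD('%s');\n" "$user" "$host" "$password"
    done
}
```

The output can be reviewed first and then piped into the client, for example: grant_sql hadoop "$DB_PASSWORD" | mysql -u root -p (DB_PASSWORD is an assumed variable name).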
STEP 3: MODIFYING CORE-SITE.XML
Add the following properties to the core-site.xml configuration file inside <configuration> tag:
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>[public dns of the master node]</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hadoop</value>
</property>
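Since the same property layout is used in several configuration files in this guide, the snippets can be printed by a small helper and then pasted inside the <configuration> tag. This is a sketch under assumptions: hadoop_property and MASTER_DNS are names made up for illustration, and values are not XML-escaped.

```shell
#!/bin/sh
# hadoop_property NAME VALUE: print one <property> element.
# Hypothetical helper; NAME and VALUE are written as-is (no XML escaping).
hadoop_property() {
    printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' "$1" "$2"
}

# MASTER_DNS is an assumed variable; the default is the guide's placeholder.
MASTER_DNS=${MASTER_DNS:-master-node-public-dns}

# Print the two core-site.xml properties from this step.
hadoop_property hadoop.proxyuser.hadoop.hosts "$MASTER_DNS"
hadoop_property hadoop.proxyuser.hadoop.groups hadoop
```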
STEP 4: MODIFYING MAPRED-SITE.XML
Add the following properties to the mapred-site.xml configuration file inside <configuration> tag:
<property>
<name>jobtracker.thrift.address</name>
<value>0.0.0.0:9290</value>
</property>
<property>
<name>mapred.jobtracker.plugins</name>
<value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
</property>
STEP 5: INSTALLING PIG
Connect to the master node via SSH and run the following commands:
$ sudo apt-get update
$ sudo apt-get install mapr-pig
After the installation completes, run the pig command to verify that it works.
STEP 6: INSTALLING HIVE & HIVE METASTORE
1. Connect to the master node via SSH and run the following commands:
$ sudo apt-get install mapr-hive
$ sudo apt-get install mapr-hivemetastore
2. Add the following line to /etc/profile
export HIVE_HOME=/opt/mapr/hive/hive-0.12
3. Refresh /etc/profile
$ source /etc/profile
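If these steps are repeated, for instance while debugging, the export line would be appended to /etc/profile more than once. A minimal sketch of an idempotent append follows; append_once is a hypothetical helper name, and writing to /etc/profile assumes a root shell.

```shell
#!/bin/sh
# append_once LINE FILE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines only, -F treats LINE as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

From a root shell it could be called as: append_once 'export HIVE_HOME=/opt/mapr/hive/hive-0.12' /etc/profile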
STEP 7: INSTALLING OOZIE
1. Connect to the master node via SSH and run the following commands:
$ sudo apt-get install mapr-oozie mapr-oozie-internal
2. Create libext directory and copy Hadoop jars in there
$ sudo mkdir /opt/mapr/oozie/oozie-4.0.0/libext
$ cd /opt/mapr/oozie/oozie-4.0.0/libext
$ sudo cp /home/hadoop/*.jar /opt/mapr/oozie/oozie-4.0.0/libext
$ sudo cp /home/hadoop/lib/*.jar /opt/mapr/oozie/oozie-4.0.0/libext
3. Download the ExtJS library into the libext directory
$ sudo wget -P /opt/mapr/oozie/oozie-4.0.0/libext http://extjs.com/deploy/ext-2.2.zip
4. Create Oozie WAR file
$ sudo /opt/mapr/oozie/oozie-4.0.0/bin/oozie-setup.sh prepare-war
5. Create Oozie shared library directory in MapRFS
$ hadoop fs -mkdir /oozie/share
$ hadoop fs -chmod 777 /oozie/share
6. Extract the Oozie shared library and copy it to MapRFS
$ cd /opt/mapr/oozie/oozie-4.0.0/
$ sudo tar xzf oozie-sharelib*.tar.gz
$ hadoop fs -copyFromLocal /opt/mapr/oozie/oozie-4.0.0/share/lib/ /oozie/share
7. Copy the Oozie examples to MapRFS
$ hadoop fs -chmod 777 /oozie
$ sudo tar xzf oozie-examples.tar.gz
$ hadoop fs -copyFromLocal /opt/mapr/oozie/oozie-4.0.0/examples/ /oozie
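The WAR file built in item 4 depends on what items 2 and 3 placed in libext, so it may be worth checking that directory before running oozie-setup.sh. The following is a sketch only; check_libext is a hypothetical helper name, and it checks just for the presence of ext-2.2.zip and at least one jar, not for their contents.

```shell
#!/bin/sh
# check_libext DIR: succeed only if DIR holds ext-2.2.zip and at least one jar.
# Hypothetical helper for illustration.
check_libext() {
    dir=$1
    [ -f "$dir/ext-2.2.zip" ] || { echo "missing $dir/ext-2.2.zip" >&2; return 1; }
    # When the glob matches nothing it stays literal and ls fails.
    ls "$dir"/*.jar >/dev/null 2>&1 || { echo "no jars in $dir" >&2; return 1; }
}
```

For example: check_libext /opt/mapr/oozie/oozie-4.0.0/libext && sudo /opt/mapr/oozie/oozie-4.0.0/bin/oozie-setup.sh prepare-war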
STEP 8: MODIFYING OOZIE-SITE.XML
1. In oozie-site.xml, modify the oozie.service.HadoopAccessorService.hadoop.configurations property to
look like the following:
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/opt/mapr/hadoop/hadoop-0.20.2/conf</value>
</property>
2. Inside <configuration> tag add the following properties
<property>
<name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
<value>[public dns of the master node]</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
<value>hadoop</value>
</property>
STEP 9: INSTALLING HUE
1. Connect to the master node via SSH and run the following commands:
$ wget -O /var/tmp/mapr-hue_2.5.0.22919_all.deb http://package.mapr.com/releases/ecosystem-all/ubuntu/mapr-hue_2.5.0.22919_all.deb
$ sudo dpkg -i /var/tmp/mapr-hue_2.5.0.22919_all.deb
$ sudo apt-get install -f
Note: Normally we would install Hue with just the “sudo apt-get install mapr-hue” command, but by the time we were working
on this, the mapr-hue package had been upgraded to Hue 3.5.0, and we couldn’t configure it with the set of steps
described here. So we decided to publish the guide for Hue 2.5.0, figure out how to configure 3.5.0 later, and then
update the guide.
STEP 10: MODIFYING HUE.INI
Modify the following properties in hue.ini file to look like this:
webhdfs_url=http://master-node-public-dns:14000/webhdfs/v1
server_user=hadoop
server_group=hadoop
default_user=hadoop
default_hdfs_superuser=hadoop
hive_home_dir=/opt/mapr/hive/hive-0.12
hive_conf_dir=/opt/mapr/hive/hive-0.12/conf
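These edits can also be scripted. The sketch below assumes each key already appears in hue.ini, either active or commented out with leading # characters, and that a key name is not reused with a different meaning in another section; set_ini_value is a hypothetical helper name. Keys that are absent from the file are left alone, so those still need to be added by hand.

```shell
#!/bin/sh
# set_ini_value FILE KEY VALUE: rewrite every "KEY=..." line in FILE,
# active or commented out with #, as "KEY=VALUE", keeping its indentation.
# Hypothetical helper. Assumption: VALUE contains no "|" or "&" characters,
# which are special in the sed expression below.
set_ini_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    if sed "s|^\([[:space:]]*\)#*[[:space:]]*$key[[:space:]]*=.*|\1$key=$value|" "$file" > "$tmp"; then
        # Write back through cat so the file keeps its owner and permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

For example, run as a user who can write the file: set_ini_value hue.ini server_user hadoop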
STEP 11: MODIFYING HIVE-SITE.XML
1. Add the following properties inside <configuration> tag in hive-site.xml file:
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
<property>
<name>hive.metastore.execute.setugi</name>
<value>true</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/oozie/share/lib</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_12?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
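With this many properties it is easy to miss one. A quick textual check might look like the sketch below; missing_properties is a hypothetical helper name, and it only looks for the <name> elements, so it does not validate the XML or the values.

```shell
#!/bin/sh
# missing_properties FILE NAME...: print each NAME that has no
# <name>NAME</name> element in FILE; return 1 if any is missing.
# Hypothetical helper; a plain text search, not an XML parser.
missing_properties() {
    file=$1
    shift
    missing=0
    for name in "$@"; do
        if ! grep -qF "<name>$name</name>" "$file"; then
            echo "$name"
            missing=1
        fi
    done
    return $missing
}
```

For example: missing_properties hive-site.xml hive.metastore.uris javax.jdo.option.ConnectionURL javax.jdo.option.ConnectionDriverName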
STEP 12: INSTALLING HTTPFS AND HIVESERVER2
Connect to the master node via SSH and run the following commands:
$ sudo apt-get install mapr-httpfs
$ sudo apt-get install mapr-hiveserver2
STEP 13: COPYING HUE PLUGINS
Connect to the master node via SSH and run the following command:
$ sudo cp -f /opt/mapr/hue/hue-2.5.0/desktop/libs/hadoop/java-lib/hue-plugins-*.jar /opt/mapr/hadoop/hadoop-0.20.2/lib/
STEP 14: RESTARTING SERVICES AND TESTING
1. Restart JobTracker
$ maprcli node services -jobtracker restart -nodes [master node public DNS]
2. Check that Thrift plugins are running
$ tail --lines=500 /opt/mapr/hadoop/hadoop*/logs/*jobtracker*.log | grep ThriftPlugin
If the response looks similar to this, the Thrift plugins are running fine:
2014-05-17 20:05:21,953 INFO org.apache.hadoop.thriftfs.ThriftPluginServer: Starting Thrift server
2014-05-17 20:05:21,963 INFO org.apache.hadoop.thriftfs.ThriftPluginServer: Thrift server listening on 0.0.0.0:9290
3. Confirm that JobTracker has started and is listening on the Thrift plugin port
$ lsof -i:9290
The response should look similar to this:
COMMAND  PID   USER   FD  TYPE  DEVICE SIZE/OFF NODE NAME
java    6280 hadoop 118u  IPv4 4024466      0t0  TCP *:9290 (LISTEN)
4. Restart warden
$ sudo service mapr-warden stop
$ sudo service mapr-warden start
5. Confirm that Oozie is working
$ sudo /opt/mapr/oozie/oozie-4.0.0/bin/oozie admin -oozie http://localhost:11000/oozie -status
The response should be:
System mode: NORMAL
You can also open http://master-node-public-dns:11000 in your browser to check that the Oozie Web
Console is running.
6. In your browser, go to http://master-node-public-dns:8888 and log in with hadoop as the username and
mapr as the password
The message should say “All OK. Configuration check passed”. If something is wrong with the
configuration, the message will point out the problems.
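The log and status checks in this step can be wrapped so that a script can act on their result. The sketch below only tests for the two strings quoted in this guide; thrift_listening and oozie_is_normal are hypothetical helper names.

```shell
#!/bin/sh
# thrift_listening LOGFILE...: succeed if any given JobTracker log
# contains the "Thrift server listening" line quoted in this guide.
thrift_listening() {
    grep -q 'ThriftPluginServer: Thrift server listening' "$@"
}

# oozie_is_normal: read the output of the Oozie status command on stdin
# and succeed if it reports NORMAL mode.
oozie_is_normal() {
    grep -q 'System mode: NORMAL'
}
```

For example: thrift_listening /opt/mapr/hadoop/hadoop*/logs/*jobtracker*.log && echo "Thrift plugin OK", or pipe the output of the oozie admin status command into oozie_is_normal.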
This concludes the installation and configuration of Hue 2.5.0 on the MapR M3 Hadoop distribution on
Amazon EMR. In upcoming posts we’ll publish a bootstrap script that fully automates this installation and
configuration. Stay tuned!
Resources:
1. http://doc.mapr.com
2. http://www.hadoopinrealworld.com/building-running-and-testing-apache-oozie-4-0-0-2/