Apache Hadoop
First, set up `ssh` (refer to the separate ssh setup notes).
Next, install the right version of Java:
sudo apt-get install openjdk-8-jdk
java -version # 1.8.x.yyy
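Hadoop 3.1.x is documented as running on Java 8, so it can help to check the major version before going further. The following is a minimal sketch, assuming `java` is on the `PATH`. The helper name `java_major` is hypothetical and not part of any JDK or Hadoop tooling.

```shell
# Hypothetical helper: reduce a Java version string to its major version.
# Legacy strings look like "1.8.0_292" (major 8); newer ones like "11.0.2" (major 11).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;
    *)   echo "${v%%[.+-]*}" ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; the version is the first quoted field.
  version=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if [ "$(java_major "$version")" = "8" ]; then
    echo "Java 8 found: $version"
  else
    echo "Expected Java 8 but found: $version"
  fi
else
  echo "java not found; install openjdk-8-jdk first"
fi
```

If several JDKs are installed, `sudo update-alternatives --config java` on Ubuntu lets you pick which one `java` resolves to.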
Download Hadoop 3.1.2.
Extract it to, say, `/opt/binaries/hadoop/`:
tar xzf hadoop-3.1.2.tar.gz
mv hadoop-3.1.2 hadoop
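The extract-and-rename step can be wrapped so that it refuses to clobber an existing installation. This is only a sketch, assuming the tarball has already been downloaded and that it unpacks to a directory named `hadoop-<version>`. The function name `install_hadoop` and its arguments are hypothetical.

```shell
# Hypothetical helper: unpack hadoop-<version>.tar.gz under <parent> and
# rename the unpacked directory to <parent>/hadoop.
install_hadoop() {
  tarball="$1"
  parent="$2"
  version="$3"
  if [ -e "$parent/hadoop" ]; then
    echo "refusing to overwrite existing $parent/hadoop" >&2
    return 1
  fi
  mkdir -p "$parent" &&
    tar xzf "$tarball" -C "$parent" &&
    mv "$parent/hadoop-$version" "$parent/hadoop"
}

# Example call (writing under /opt may need sudo):
# install_hadoop hadoop-3.1.2.tar.gz /opt/binaries 3.1.2
```

Renaming the extracted directory, not the tarball, is what makes `/opt/binaries/hadoop` a usable `HADOOP_HOME`.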
Edit the following file and add these environment variables before you can start the services:
vim ~/.bashrc
export HADOOP_HOME="/opt/binaries/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
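After saving `~/.bashrc`, the variables only take effect in a new shell or after `source ~/.bashrc`. A small check like the one below can confirm they are visible. It is a sketch only, and `check_hadoop_env` is a hypothetical name, not a Hadoop command.

```shell
# Hypothetical helper: report which of the expected variables are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_hadoop_env() {
  missing=0
  for name in HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

check_hadoop_env || echo "reload the shell (source ~/.bashrc) and try again"
```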
Prepare HDFS by formatting the NameNode (run from `/opt/binaries/hadoop`):
bin/hdfs namenode -format
Start HDFS:
/opt/binaries/hadoop/sbin/start-dfs.sh
Check the NameNode web UI at http://localhost:9870.
Start YARN:
/opt/binaries/hadoop/sbin/start-yarn.sh
Check the ResourceManager web UI at http://localhost:8088.
Use the `jps` command to see the following list of Java processes running:
NameNode
DataNode
ResourceManager
NodeManager
SecondaryNameNode
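Comparing the `jps` output against the expected list by eye is easy to get wrong, so the check can be scripted. This is a rough sketch, assuming `jps` prints one `<pid> <name>` pair per line. The function name `missing_daemons` is hypothetical.

```shell
# Hypothetical helper: print each expected daemon that does not appear in the
# given jps output.
missing_daemons() {
  jps_output="$1"
  for daemon in NameNode DataNode ResourceManager NodeManager SecondaryNameNode; do
    # Compare the whole second field so that "SecondaryNameNode" does not
    # count as a match for "NameNode".
    if ! printf '%s\n' "$jps_output" |
      awk -v d="$daemon" '$2 == d {found=1} END {exit !found}'; then
      echo "$daemon"
    fi
  done
}

if command -v jps >/dev/null 2>&1; then
  missing=$(missing_daemons "$(jps)")
  if [ -z "$missing" ]; then
    echo "all Hadoop daemons are running"
  else
    echo "not running:"
    echo "$missing"
  fi
else
  echo "jps not found; it ships with the JDK"
fi
```

If a daemon is reported as not running, its log file under the `logs` directory of the Hadoop installation is usually the first place to look.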