Apache Hadoop¶

First would be setting up the ssh, which you can refer here
Having the right version of Java

sudo apt-get install openjdk-8-jdk
java -version # 1.8.x.yyy

Download the hadoop 3.1.2 version
Extract to say `/opt/binaries/hadoop/

    tar xzf hadoop-3.1.2.tar.gz
    mv hadoop-3.1.2.tar.gz hadoop

Following files are needed before you can start the service:

vim ~/.bashrc

export HADOOP_HOME="/opt/binaries/hadoop"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin 
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}

Prepare our HDFS bin/hdfs namenode -format
Start HDFS /opt/binaries/hadoop/sbin/start-dfs.sh. Check the url @ http://localhost:9870
Start Yarn /opt/binaries/hadoop/sbin/start-yarn.sh. Check the url @ localhost:8088
Use the command jps to see the following list of Java containers running:
```
NameNode
DataNode
ResourceManager
NodeManager
SecondaryNameNode
```

References

https://data-flair.training/blogs/installation-of-hadoop-3-on-ubuntu/
https://towardsdatascience.com/a-gentle-introduction-to-apache-arrow-with-apache-spark-and-pandas-bb19ffe0ddae