Stack Overflow Exploration
Requirements
Use the Stack Overflow dataset from https://snap.stanford.edu/data/sx-stackoverflow.html
Create an external table with the Hive metastore
Demonstrate sample queries in Spark SQL, the Thrift server, and Hive
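As a rough sketch of the external-table requirement, the snippet below builds a Hive-style `CREATE EXTERNAL TABLE` statement and one sample query. It assumes the SNAP file is a space-separated edge list of source user, target user, and Unix timestamp. The table name `sx_stackoverflow`, the column names, the helper `external_table_ddl`, and the location `/data/sx-stackoverflow` are all hypothetical and chosen only for illustration.

```python
def external_table_ddl(table, columns, location, delimiter=" "):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for delimited text files."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Assumed layout of the edge list: source user, target user, Unix timestamp.
ddl = external_table_ddl(
    "sx_stackoverflow",
    [("src", "BIGINT"), ("tgt", "BIGINT"), ("unixts", "BIGINT")],
    "/data/sx-stackoverflow",  # hypothetical directory holding the data file
)

# Sample query: the ten users who appear most often as the source of an interaction.
sample_query = (
    "SELECT src, COUNT(*) AS interactions "
    "FROM sx_stackoverflow GROUP BY src "
    "ORDER BY interactions DESC LIMIT 10"
)

print(ddl)
print(sample_query)

# With pyspark installed and a Hive-enabled session, these could be run as:
#   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
#   spark.sql(ddl)
#   spark.sql(sample_query).show()
```

Because the table is external, dropping it removes only the metastore entry and leaves the files under the location untouched. The same statement text should also be usable from the Hive CLI or through a JDBC connection to the Thrift server, provided they share the metastore.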
Implementation
A Jupyter Notebook for the task
Explore the data at https://archive.org/details/stackexchange
Download the data from https://archive.org/download/stackexchange
Set up data loading for the XML format
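The XML loading step might look something like the sketch below. It assumes each dump file (for example `Posts.xml`) holds one self-closing `<row .../>` element per record, with the fields stored as attributes. The helper name `iter_rows` and the inline sample data are made up for illustration; a real run would pass the path of a downloaded file instead.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield each <row> element's attributes as a dict, streaming through the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()  # release the element's contents to limit memory use


# Tiny made-up sample in the assumed shape of a dump file.
sample = io.BytesIO(
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<posts>\n"
    b'  <row Id="1" PostTypeId="1" Score="5" />\n'
    b'  <row Id="2" PostTypeId="2" Score="3" />\n'
    b"</posts>\n"
)

rows = list(iter_rows(sample))
print(rows)
```

Streaming with `iterparse` avoids building the whole tree in memory, which matters because the larger dump files can be many gigabytes. All attribute values arrive as strings, so numeric columns such as `Score` would still need casting before being written to a table.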