# Stackoverflow Exploration

## Requirements

- Use the Stack Overflow dataset from [https://snap.stanford.edu/data/sx-stackoverflow.html](https://snap.stanford.edu/data/sx-stackoverflow.html)
- Create an external table with the Hive metastore
- Demonstrate sample queries in Spark SQL, the Thrift Server and Hive

------------------------------------------------------------------------------------------------------------------------

## Implementation

- A **[Jupyter Notebook](https://github.com/gyan42/spark-streaming-playground/tree/master/notebooks/stackoverflow/DataAnalysis.ipynb)** implementing the task
- Explore the data at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)
- Download the data from [https://archive.org/download/stackexchange](https://archive.org/download/stackexchange)
- Set up data loading for the XML format
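
The Stack Exchange dumps are XML files (for example `Posts.xml`) in which each record is a `<row .../>` element with its fields stored as attributes. The XML loading step can be sketched with the Python standard library alone, assuming that layout: the sketch streams the rows and flattens a chosen set of attributes into tab-separated text of the kind a Hive external table could point at. The functions `iter_rows` and `rows_to_tsv`, the sample document, and the column choice are hypothetical illustrations and are not taken from the notebook.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterable, Iterator, List, TextIO


def iter_rows(source: IO[bytes]) -> Iterator[Dict[str, str]]:
    """Yield the attributes of each <row .../> element as a dict (hypothetical helper)."""
    # iterparse emits an "end" event once each element has been fully read,
    # so rows can be consumed while the file is still being parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            # Drop the element's attributes and children after copying them.
            elem.clear()


def rows_to_tsv(rows: Iterable[Dict[str, str]], columns: List[str], out: TextIO) -> None:
    """Write the selected columns as tab-separated lines (hypothetical helper).

    Missing attributes become empty fields; tabs and line breaks inside values
    are replaced with spaces so each row stays on a single delimited line.
    """
    for row in rows:
        fields = []
        for col in columns:
            value = row.get(col, "")
            for ch in ("\t", "\r", "\n"):
                value = value.replace(ch, " ")
            fields.append(value)
        out.write("\t".join(fields) + "\n")


# A tiny made-up document in the assumed Posts.xml layout.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="10" Title="How do I read XML in Spark?" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
</posts>
"""

if __name__ == "__main__":
    buf = io.StringIO()
    rows_to_tsv(iter_rows(io.BytesIO(SAMPLE)), ["Id", "PostTypeId", "Score"], buf)
    print(buf.getvalue(), end="")
```

Streaming with `iterparse` and clearing each row lets the parser work through large dumps without holding every row's data in memory at once.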