ssp.spark.streaming.ml


class ssp.spark.streaming.ml.sentiment_analysis_model.SentimentSparkModel(spark=None, spark_master='spark://IMCHLT276:7077', sentiment_dataset_path='data/dataset/sentiment140/', model_dir='~/ssp/data/model/sentiment/', hdfs_host=None, hdfs_port=None)

Bases: object

Builds a text classification model for tweet sentiment classification. If HDFS details are given, the model is stored in HDFS.

Parameters
  • spark – SparkSession

  • spark_master – Spark master URL

  • sentiment_dataset_path – Path to the Kaggle Twitter sentiment dataset (Sentiment140)

  • model_dir – Model save directory

  • hdfs_host – HDFS host URL

  • hdfs_port – HDFS port
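
The class description says the model is stored in HDFS when HDFS details are given. The sketch below illustrates one way such a choice between a local directory and an HDFS location could be made. The helper name `resolve_model_dir` and the `hdfs://host:port/path` layout are assumptions for illustration only; the class's actual logic is not documented here and may differ.

```python
import os


def resolve_model_dir(model_dir, hdfs_host=None, hdfs_port=None):
    """Hypothetical helper: pick the location where a model would be saved.

    Assumption: when both HDFS host and port are given, the model path is
    placed under hdfs://<host>:<port>/; otherwise the local path is used
    with '~' expanded to the user's home directory.
    """
    if hdfs_host and hdfs_port:
        # Drop the leading '~/' so the path can be appended to the HDFS root.
        relative = model_dir.lstrip("~/")
        return f"hdfs://{hdfs_host}:{hdfs_port}/{relative}"
    # No HDFS details: fall back to the local file system.
    return os.path.expanduser(model_dir)


# Example values only; the host and port are placeholders.
hdfs_path = resolve_model_dir("~/ssp/data/model/sentiment/", "localhost", 9000)
local_path = resolve_model_dir("~/ssp/data/model/sentiment/")
print(hdfs_path)
print(local_path)
```

With both `hdfs_host` and `hdfs_port` left at `None`, as in the constructor defaults, this sketch would keep the model on the local file system.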

build_naive_pipeline(input_col='text')
build_ngrams_wocs(inputcol='text', outputcol='target', n=3)
evaluate(model)
predict(df)
prepare_data()
train()
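
The `n=3` default of `build_ngrams_wocs` suggests that the pipeline works with n-gram features up to size three, although the method itself is undocumented here. As a rough illustration of what word n-grams of a tweet look like, the sketch below uses plain Python; the function `word_ngrams` is hypothetical and is not part of `ssp` or Spark.

```python
def word_ngrams(text, n=3):
    """Return the list of word n-grams in `text` (hypothetical helper).

    Tokenization is a simple lowercase whitespace split, which is an
    assumption made for this sketch only.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens over the text; texts shorter than n
    # tokens yield an empty list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(word_ngrams("I love this new phone", n=3))
# ['i love this', 'love this new', 'this new phone']
```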