ssp.spark.streaming.nlp


class ssp.spark.streaming.nlp.ner_extraction.NerExtraction(kafka_bootstrap_servers='localhost:9092', kafka_topic='ai_tweets_topic', checkpoint_dir='hdfs://localhost:9000/tmp/ssp/data/lake/checkpoint/', bronze_parquet_dir='hdfs://localhost:9000/tmp/ssp/data/lake/bronze/', warehouse_location='/opt/spark-warehouse/', spark_master='spark://IMCHLT276:7077', postgresql_host='localhost', postgresql_port='5432', postgresql_database='sparkstreamingdb', postgresql_user='sparkstreaming', postgresql_password='sparkstreaming', processing_time='5 seconds', is_live_stream=True, is_docker=False)[source]

Bases: ssp.spark.streaming.common.twitter_streamer_base.TwitterStreamerBase

Uses an external REST endpoint to get NER tags.

Parameters
  • kafka_bootstrap_servers – (str) host_url:port

  • kafka_topic – (str) Live stream Kafka topic

  • checkpoint_dir – (str) Spark Streaming checkpoint directory

  • bronze_parquet_dir – (str) Input stream directory path. For local paths, prefix it with “file:///”

  • warehouse_location – (str) Spark warehouse location

  • spark_master – (str) Spark master url

  • postgresql_host – (str) Postgresql host url

  • postgresql_port – (str) Postgres port

  • postgresql_database – (str) Database name

  • postgresql_user – (str) Postgresql user name

  • postgresql_password – (str) Postgresql user password

  • processing_time – (str) Spark Streaming process interval

  • is_live_stream – (bool) Whether to use the live Kafka stream or the streamed directory as input

  • is_docker – (bool) Whether the run environment is the local machine or Docker, so the appropriate host name is used in REST endpoints

hdfs_process()[source]
online_process()[source]
process()[source]
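
A minimal usage sketch (an assumed invocation pattern, not part of the documented API; hosts and paths are placeholders for your environment):

    from ssp.spark.streaming.nlp.ner_extraction import NerExtraction

    # Instantiate the streaming NER job; unspecified arguments fall back to
    # the defaults shown in the class signature above.
    ner_job = NerExtraction(
        kafka_bootstrap_servers="localhost:9092",
        kafka_topic="ai_tweets_topic",
        checkpoint_dir="hdfs://localhost:9000/tmp/ssp/data/lake/checkpoint/",
        bronze_parquet_dir="hdfs://localhost:9000/tmp/ssp/data/lake/bronze/",
        is_live_stream=True,   # read from the live Kafka topic
        is_docker=False,       # resolve REST endpoint host names for a local run
    )

    # process() starts the streaming job; based on is_live_stream it is
    # presumed to delegate to online_process() (live Kafka stream) or
    # hdfs_process() (streamed bronze parquet directory).
    ner_job.process()
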
class ssp.spark.streaming.nlp.spark_dl_text_classification.SreamingTextClassifier(kafka_bootstrap_servers='localhost:9092', kafka_topic='ai_tweets_topic', checkpoint_dir='hdfs://localhost:9000/tmp/ssp/data/lake/checkpoint/', bronze_parquet_dir='hdfs://localhost:9000/tmp/ssp/data/lake/bronze/', warehouse_location='/opt/spark-warehouse/', spark_master='spark://IMCHLT276:7077', postgresql_host='localhost', postgresql_port='5432', postgresql_database='sparkstreamingdb', postgresql_user='sparkstreaming', postgresql_password='sparkstreaming', processing_time='5 seconds', tokenizer_path=<object object>, is_live_stream=True, is_docker=False)[source]

Bases: ssp.spark.streaming.common.twitter_streamer_base.TwitterStreamerBase

Classifies the incoming tweet text using the DL model built with TensorFlow Serving.

Parameters
  • kafka_bootstrap_servers – (str) host_url:port

  • kafka_topic – (str) Live stream Kafka topic

  • checkpoint_dir – (str) Spark Streaming checkpoint directory

  • bronze_parquet_dir – (str) Input stream directory path. For local paths, prefix it with “file:///”

  • warehouse_location – (str) Spark warehouse location

  • spark_master – (str) Spark master url

  • postgresql_host – (str) Postgresql host url

  • postgresql_port – (str) Postgres port

  • postgresql_database – (str) Database name

  • postgresql_user – (str) Postgresql user name

  • postgresql_password – (str) Postgresql user password

  • processing_time – (str) Spark Streaming process interval

  • is_live_stream – (bool) Whether to use the live Kafka stream or the streamed directory as input

  • is_docker – (bool) Whether the run environment is the local machine or Docker, so the appropriate host name is used in REST endpoints

  • tokenizer_path – (str) Path to the saved Keras tokenizer

hdfs_process()[source]
online_process()[source]
process()[source]
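
A minimal usage sketch (an assumed invocation pattern, not part of the documented API; the tokenizer path and connection details are placeholders for your environment):

    from ssp.spark.streaming.nlp.spark_dl_text_classification import SreamingTextClassifier

    # Instantiate the streaming text-classification job; the tokenizer_path
    # must point at the Keras tokenizer saved during model training
    # (the file name below is hypothetical).
    classifier_job = SreamingTextClassifier(
        kafka_bootstrap_servers="localhost:9092",
        kafka_topic="ai_tweets_topic",
        tokenizer_path="/path/to/keras_tokenizer.pickle",
        is_live_stream=False,  # read the streamed bronze parquet directory instead of Kafka
        is_docker=True,        # use Docker host names when calling the TensorFlow Serving endpoint
    )

    # process() starts the streaming job, presumably dispatching to
    # online_process() or hdfs_process() depending on is_live_stream.
    classifier_job.process()
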