ssp.spark.streaming.consumer¶
SSP modules that handles all data at ingestion level from Twitter stream
- 
class 
ssp.spark.streaming.consumer.twiteer_stream_consumer.TwitterDataset(kafka_bootstrap_servers='localhost:9092', kafka_topic='mix_tweets_topic', checkpoint_dir='hdfs://localhost:9000/tmp/ssp/data/lake/checkpoint/', bronze_parquet_dir='hdfs://localhost:9000/tmp/ssp/data/lake/bronze/', warehouse_location='/opt/spark-warehouse/', spark_master='spark://IMCHLT276:7077', postgresql_host='localhost', postgresql_port='5432', postgresql_database='sparkstreamingdb', postgresql_user='sparkstreaming', postgresql_password='sparkstreaming', raw_tweet_table_name_prefix='raw_tweet_dataset', processing_time='5 seconds')[source]¶ Bases:
ssp.spark.streaming.common.twitter_streamer_base.TwitterStreamerBaseTwitter ingestion class
Starts the Spark Structured Streaming against the Kafka topic and dumps the data to HDFS / Postgresql
- Parameters
 kafka_bootstrap_servers – (str) Kafka bootstram server
kafka_topic – (str) Kafka topic
checkpoint_dir – (str) Checkpoint directory for fault tolerance
bronze_parquet_dir – (str) Store path to dump tweet data
warehouse_location – Spark warehouse location
spark_master – Spark Master URL
postgresql_host – Postgresql host
postgresql_port – Postgresql port
postgresql_database – Postgresql Database
postgresql_user – Postgresql user name
postgresql_password – Postgresql user password
raw_tweet_table_name_prefix – Postgresql table name prefix
processing_time – Processing time trigger
- 
ssp.spark.streaming.consumer.twiteer_stream_consumer.check_n_mk_dirs(path, is_remove=False)[source]¶ 
- 
ssp.spark.streaming.consumer.twiteer_stream_consumer.check_table_exists(table_name, schema='public')[source]¶ 
- 
ssp.spark.streaming.consumer.twiteer_stream_consumer.filter_possible_ai_tweet_udf(text)¶ 
- 
ssp.spark.streaming.consumer.twiteer_stream_consumer.get_raw_dataset(path)[source]¶ Combines all parquet file as one in given path :param path: Folder path :return:
SSP modules that handles all data at ingestion level from Twitter stream