ssp.dl.tf.classifier


class ssp.dl.tf.classifier.naive_text_classifier.NaiveTextClassifier(train_df_or_path=None, test_df_or_path=None, dev_df_or_path=None, model_root_dir=None, model_version=1, wipe_old_data=False, text_column='text', label_column='label', num_words=8000, seq_len=128, embedding_size=64, batch_size=64, hdfs_host=None, hdfs_port=None)

Bases: object

Trains a simple deep learning model made of an embedding layer and a feed-forward network. HDFS is used for storage if both the HDFS host and port are given; otherwise the local file system is used.
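The storage rule above can be sketched as a small helper. This is only an illustration of the documented behaviour, not the class's actual code; the function name `choose_storage` is hypothetical.

```python
from typing import Optional


def choose_storage(hdfs_host: Optional[str], hdfs_port: Optional[int]) -> str:
    """Return "hdfs" only when both host and port are given, else "local".

    Hypothetical helper: it mirrors the rule stated in the class description,
    not the library's internal implementation.
    """
    if hdfs_host and hdfs_port:
        return "hdfs"
    return "local"


print(choose_storage(None, None))         # local
print(choose_storage("localhost", 9000))  # hdfs
print(choose_storage("localhost", None))  # local
```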

Parameters
  • train_df_or_path – Train data pandas dataframe or path

  • test_df_or_path – Test data pandas dataframe or path

  • dev_df_or_path – Dev data pandas dataframe or path

  • model_root_dir – Local directory path or HDFS path

  • wipe_old_data – Clean old model data

  • text_column – Name of the text column

  • label_column – Name of the label column

  • num_words – Vocab size

  • seq_len – Max length of the sequence

  • embedding_size – Embedding size

  • batch_size – Train batch size

  • hdfs_host – HDFS host

  • hdfs_port – HDFS port
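A possible way to pass these parameters is shown below. The keyword names and default values come from the signature above; every path is a hypothetical placeholder, and `build_classifier` is an illustrative wrapper rather than part of the package. The import is deferred so the snippet can be read and run even where `ssp` is not installed.

```python
# Keyword arguments mirror the documented signature.
# All paths are hypothetical placeholders; replace them with real data locations.
DEFAULT_KWARGS = dict(
    train_df_or_path="data/train.parquet",
    test_df_or_path="data/test.parquet",
    dev_df_or_path="data/dev.parquet",
    model_root_dir="models/naive_text_classifier",
    model_version=1,
    wipe_old_data=False,
    text_column="text",
    label_column="label",
    num_words=8000,      # vocabulary size
    seq_len=128,         # maximum sequence length
    embedding_size=64,
    batch_size=64,
    hdfs_host=None,      # leave both HDFS values as None to use the local file system
    hdfs_port=None,
)


def build_classifier(**overrides):
    """Construct the classifier with the defaults above plus any overrides.

    Hypothetical wrapper: the import happens here so that merely defining
    this function does not require the ssp package.
    """
    from ssp.dl.tf.classifier.naive_text_classifier import NaiveTextClassifier

    return NaiveTextClassifier(**{**DEFAULT_KWARGS, **overrides})
```

Calling `build_classifier(seq_len=256)` would, under these assumptions, create a classifier that differs from the defaults only in its maximum sequence length.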

define_model()
evaluate()
export_tf_model(model, export_path)
fit_tokenizer()
load()
load_model()

Loads the model. Loads from HDFS if the host and port number are available; otherwise the local file system is used.

Returns

static load_parquet_data(train_file_path, test_file_path, dev_file_path)
load_tokenizer(tokenizer_path=None)

Loads the model and tokenizer. Loads from HDFS if the host and port number are available; otherwise the local file system is used.

Returns

predict(X)
preprocess_train_data()
save()
train()
transform(text_list)
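The num_words (vocab size) and seq_len (max sequence length) parameters suggest that text is mapped to fixed-length sequences of word ids before it reaches the embedding layer. The following pure-Python sketch illustrates that idea only. It is not the library's tokenizer or its transform method; the helper names `build_vocab` and `encode`, and the choice of 0 for padding and 1 for unknown words, are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List

PAD_ID = 0  # assumed id used to pad short sequences
OOV_ID = 1  # assumed id for words outside the vocabulary


def build_vocab(texts: List[str], num_words: int) -> Dict[str, int]:
    """Keep the most frequent words so that ids stay below num_words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(max(num_words - 2, 0))
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(text: str, vocab: Dict[str, int], seq_len: int) -> List[int]:
    """Map words to ids, truncate to seq_len, then pad with PAD_ID."""
    ids = [vocab.get(word, OOV_ID) for word in text.lower().split()][:seq_len]
    return ids + [PAD_ID] * (seq_len - len(ids))


texts = ["spark is fast", "spark is fun", "spark streaming"]
vocab = build_vocab(texts, num_words=4)  # keeps only "spark" and "is"

print(encode("spark is slow", vocab, seq_len=5))                  # [2, 3, 1, 0, 0]
print(encode("spark is fast and spark is fun", vocab, seq_len=3))  # [2, 3, 1]
```

With the documented defaults (num_words=8000, seq_len=128), each text would become a list of 128 ids under this scheme.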