ssp.snorkel

class ssp.snorkel.labelling_function.SSPLabelEvaluator(text_column='text', label_column='label', raw_tweet_table_name_prefix='raw_tweet_dataset', postgresql_host='localhost', postgresql_port='5432', postgresql_database='sparkstreamingdb', postgresql_user='sparkstreaming', postgresql_password='sparkstreaming')

Bases: ssp.posgress.dataset_base.PostgresqlDatasetBase

run_labeler(version=0)

class ssp.snorkel.labelling_function.SSPTweetLabeller(input_col='text', output_col='slabel')

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Snorkel transformer that uses labeling functions (LFs) to train a Label Model, which can then annotate text as AI-related or not AI-related.

Parameters

input_col – Name of the input text column if a DataFrame is used
output_col – Name of the output label column if a DataFrame is used

ABSTAIN = -1

NEGATIVE = 0

POSITIVE = 1
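The three constants above are the votes a labeling function can cast on a single tweet. The following is a minimal pure-Python sketch of that voting convention, not the ssp implementation: the keyword tuples and the function names `lf_mentions_ai` and `lf_off_topic` are hypothetical, and Snorkel itself is deliberately not imported.

```python
# Label values mirror the class constants documented above.
ABSTAIN = -1
NEGATIVE = 0
POSITIVE = 1

# Hypothetical keyword lists, chosen only for illustration.
AI_KEYWORDS = ("machine learning", "deep learning", "neural network")
OFF_TOPIC_KEYWORDS = ("recipe", "football")


def lf_mentions_ai(text):
    """Vote POSITIVE when an AI keyword appears, otherwise abstain."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in AI_KEYWORDS):
        return POSITIVE
    return ABSTAIN


def lf_off_topic(text):
    """Vote NEGATIVE when an off-topic keyword appears, otherwise abstain."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in OFF_TOPIC_KEYWORDS):
        return NEGATIVE
    return ABSTAIN
```

Each function abstains rather than guessing when its keywords are absent, which is what lets a label model weigh several noisy, partially overlapping votes.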

static bigram_check(x, word1, word2)
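Only the signature of `bigram_check` is documented here, so its exact behaviour is not known from this page. As a rough illustration of what a bigram test over tweet text might look like, here is a sketch under the assumption that it checks whether `word1` is immediately followed by `word2`; the name `sketch_bigram_check`, the whitespace tokenisation, and the boolean return value are all assumptions.

```python
def sketch_bigram_check(text, word1, word2):
    """Return True if word1 is immediately followed by word2 in text.

    Hypothetical stand-in for illustration; matching is case-insensitive
    and tokens are split on whitespace only.
    """
    tokens = text.lower().split()
    first, second = word1.lower(), word2.lower()
    # Pair every token with its successor and look for the target bigram.
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))
```

A call such as `sketch_bigram_check(tweet, "machine", "learning")` would then distinguish the phrase from tweets that merely contain both words somewhere.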

evaluate(X, y)

fit(X, y=None)

Parameters

X – (DataFrame or list) Input text
y – None

Returns

NumPy array of shape [number of samples, number of LFs]
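The return shape can be pictured as a label matrix with one row per input text and one column per labeling function. The sketch below builds such a matrix with plain lists, assuming two hypothetical labeling functions (`lf_has_ai`, `lf_has_recipe`) and a hypothetical helper `apply_lfs`; the real method returns a NumPy array and its internals may differ.

```python
ABSTAIN = -1
NEGATIVE = 0
POSITIVE = 1


def lf_has_ai(text):
    # Hypothetical LF: POSITIVE if the token "ai" appears.
    return POSITIVE if "ai" in text.lower().split() else ABSTAIN


def lf_has_recipe(text):
    # Hypothetical LF: NEGATIVE if the token "recipe" appears.
    return NEGATIVE if "recipe" in text.lower().split() else ABSTAIN


def apply_lfs(texts, lfs):
    """Apply every LF to every text: rows are samples, columns are LFs."""
    return [[lf(text) for lf in lfs] for text in texts]


texts = ["AI beats humans at chess", "My favourite pasta recipe", "Good morning"]
label_matrix = apply_lfs(texts, [lf_has_ai, lf_has_recipe])
# label_matrix == [[1, -1], [-1, 0], [-1, -1]]
```

With three texts and two labeling functions the matrix has shape [3, 2]; a row of all `ABSTAIN` values means no labeling function had an opinion on that text.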

is_ai_tweet = LabelingFunction is_ai_tweet, Preprocessors: []

is_not_ai_tweet = LabelingFunction is_not_ai_tweet, Preprocessors: []

static negative_search(data, positive_keywords, false_positive_keywords)

normalize_prob(res)

not_ai = LabelingFunction not_ai, Preprocessors: []

not_big_data = LabelingFunction not_big_data, Preprocessors: []

not_cv = LabelingFunction not_cv, Preprocessors: []

not_data_science = LabelingFunction not_data_science, Preprocessors: []

not_neural_network = LabelingFunction not_neural_network, Preprocessors: []

not_nlp = LabelingFunction not_nlp, Preprocessors: []

static positive_search(data, key_words)

predict(X)

transform(X, y=None)