reverend package
Submodules
reverend.thomas module
-
class reverend.thomas.Bayes(tokenizer=None, combiner=None, data_class=None, training_data=None)
Bases: object
-
get_tokens(obj)
By default, we expect obj to be a string and split it on whitespace.
Note that this does not change the case. In some applications you may want to lowercase everything so that “king” and “King” generate the same token.
Override this in your subclass for objects other than text.
Alternatively, you can pass in a tokenizer as part of instance creation.
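The default tokenization and the lowercasing alternative described above can be sketched without reverend itself. This section does not document the interface a tokenizer object must provide, so the sketch assumes, hypothetically, an object with a `tokenize(text)` method; `WhitespaceTokenizer` and `LowercaseTokenizer` are illustrative names, not part of reverend.

```python
from typing import List


class WhitespaceTokenizer:
    """Hypothetical tokenizer mirroring the documented default: split on whitespace, keep case."""

    def tokenize(self, text: str) -> List[str]:
        # str.split() with no argument splits on runs of any whitespace
        return text.split()


class LowercaseTokenizer(WhitespaceTokenizer):
    """Hypothetical variant that lowercases first, so "king" and "King" become one token."""

    def tokenize(self, text: str) -> List[str]:
        return super().tokenize(text.lower())


if __name__ == "__main__":
    text = "The King met the king"
    print(WhitespaceTokenizer().tokenize(text))  # ['The', 'King', 'met', 'the', 'king']
    print(LowercaseTokenizer().tokenize(text))   # ['the', 'king', 'met', 'the', 'king']
```

Whether such an object is handed to the constructor or the logic is placed in an overridden get_tokens() is a design choice; the constructor route avoids subclassing when only tokenization differs.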
-
guess(message)
Guess which buckets the message belongs to.
Parameters: message (str) – The message string to tokenize and subsequently classify.
Returns: List of tuple pairs, each giving a bucket the message string is guessed to belong to and the certainty of that guess as a probability. As an example, a 99% probability that the input is a fowl would look like [('fowl', 0.99)].
Return type: list of tuple
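A caller usually turns the list returned by guess() into a single decision. The helper below is a hypothetical sketch, not part of reverend: it assumes only the documented shape [(bucket, probability), ...], and its default threshold of 0.5 is an arbitrary choice.

```python
from typing import List, Optional, Tuple


def best_bucket(guesses: List[Tuple[str, float]], threshold: float = 0.5) -> Optional[str]:
    """Hypothetical helper: return the most probable bucket, or None if none reaches threshold."""
    if not guesses:
        return None
    # max() with a key does not rely on the list already being sorted
    bucket, prob = max(guesses, key=lambda pair: pair[1])
    return bucket if prob >= threshold else None


if __name__ == "__main__":
    print(best_bucket([("fowl", 0.99)]))                # fowl
    print(best_bucket([("fowl", 0.3), ("fish", 0.2)]))  # None
```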
-
load(file_path='bayesdata.dat')
Load trained model data from a file path.
Parameters: file_path (str) – Path of database file.
-
load_handler(file_handler)
Load trained model data from an open file handler.
Parameters: file_handler (file) – Open file pointer, or file-like object.
-
merge_pools(dest_pool, source_pool)
Merge an existing pool into another.
The data from source_pool is merged into dest_pool. The arguments are the names of the pools to be merged. The pool named source_pool is left intact, so you may want to call remove_pool() to get rid of it.
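As a rough illustration of what merging means, the sketch below models each pool as a mapping from token to count, which is an assumption (this section does not describe reverend's internal pool data class). Counts from the source are added into the destination and the source is left untouched, as documented. The `pools` dict and both functions are hypothetical.

```python
from collections import Counter
from typing import Dict


def merge_pools(pools: Dict[str, Counter], dest_pool: str, source_pool: str) -> None:
    """Hypothetical sketch: add the token counts of source_pool into dest_pool, in place."""
    # Counter.update() adds counts rather than replacing them
    pools[dest_pool].update(pools[source_pool])


def remove_pool(pools: Dict[str, Counter], name: str) -> None:
    """Hypothetical counterpart to remove_pool(): drop a pool that is no longer needed."""
    del pools[name]


if __name__ == "__main__":
    pools = {"spam": Counter({"viagra": 3, "free": 2}), "junk": Counter({"free": 1, "offer": 4})}
    merge_pools(pools, "spam", "junk")
    print(dict(pools["spam"]))  # {'viagra': 3, 'free': 3, 'offer': 4}
    remove_pool(pools, "junk")  # source was left intact until now
    print(sorted(pools))        # ['spam']
```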
-
pool_names()
Return a sorted list of pool names.
Does not include the system pool ‘__Corpus__’.
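The documented behavior amounts to filtering out the system pool and sorting the rest. A minimal sketch, assuming pools are kept in a dict keyed by name (hypothetical; this is not reverend's code):

```python
from typing import Iterable, List

CORPUS_POOL = "__Corpus__"  # name of the system pool, as documented


def pool_names(names: Iterable[str]) -> List[str]:
    """Hypothetical sketch: sorted pool names, excluding the system pool."""
    return sorted(name for name in names if name != CORPUS_POOL)


if __name__ == "__main__":
    pools = {"spam": {}, "__Corpus__": {}, "ham": {}}
    print(pool_names(pools))  # ['ham', 'spam']
```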
-
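static robinson(probs, _)
Computes the probability of a message being spam (Robinson’s method), where $p_1, \dots, p_n$ are the $n$ probabilities in probs:
$$P = 1 - \Big(\prod_{i=1}^{n} (1 - p_i)\Big)^{1/n}$$
$$Q = 1 - \Big(\prod_{i=1}^{n} p_i\Big)^{1/n}$$
$$S = \frac{1}{2}\left(1 + \frac{P - Q}{P + Q}\right)$$
Courtesy of http://christophe.delord.free.fr/en/index.html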
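The formulas for Robinson's method translate directly into code. This self-contained sketch drops the unnamed second argument and assumes probs is a flat sequence of floats in [0, 1]; the structure the library actually passes (for example, token/probability pairs) is not documented here. It uses math.prod, available from Python 3.8.

```python
import math
from typing import Sequence


def robinson(probs: Sequence[float]) -> float:
    """Sketch of Robinson's combining method over a flat sequence of probabilities."""
    if not probs:
        raise ValueError("need at least one probability")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        # a negative base raised to 1/n would give a complex number
        raise ValueError("probabilities must lie in [0, 1]")
    n = len(probs)
    # P = 1 - (prod(1 - p))^(1/n)
    P = 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / n)
    # Q = 1 - (prod(p))^(1/n)
    Q = 1.0 - math.prod(probs) ** (1.0 / n)
    # S = (1 + (P - Q) / (P + Q)) / 2; P and Q cannot both be 0 for inputs in [0, 1]
    return (1.0 + (P - Q) / (P + Q)) / 2.0


if __name__ == "__main__":
    print(robinson([0.9, 0.9]))  # approximately 0.9
    print(robinson([0.9, 0.1]))  # approximately 0.5
```

For a single probability the method returns that probability unchanged, and opposing evidence of equal strength cancels to 0.5.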
-
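static robinson_fisher(probs, _)
Computes the probability of a message being spam (Robinson-Fisher method), where $p_1, \dots, p_n$ are the $n$ probabilities in probs and $C^{-1}(x, 2n)$ is the inverse chi-square function with $2n$ degrees of freedom:
$$H = C^{-1}\Big(-2 \ln \prod_{i=1}^{n} p_i,\; 2n\Big)$$
$$S = C^{-1}\Big(-2 \ln \prod_{i=1}^{n} (1 - p_i),\; 2n\Big)$$
$$I = \frac{1 + H - S}{2}$$
Courtesy of http://christophe.delord.free.fr/en/index.html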
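A sketch of the Robinson-Fisher combination follows. It assumes, following Robinson's description, that $C^{-1}(x, 2n)$ is the upper-tail probability of a chi-square variable with $2n$ degrees of freedom; for an even number of degrees of freedom this has the closed form $e^{-m}\sum_{i=0}^{k-1} m^i/i!$ with $m = x/2$ and $k = n$. As in the previous sketch, the second argument is dropped and probs is assumed to be a flat sequence of floats, here strictly between 0 and 1 so that the logarithms are defined. The logarithm of each product is computed as a sum of logarithms to avoid underflow on long messages.

```python
import math
from typing import Sequence


def chi2q(x2: float, df: int) -> float:
    """Upper-tail probability P(X >= x2) for X ~ chi-square with an even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)  # i = 0 term of the series
    total = term
    for i in range(1, df // 2):
        term *= m / i  # builds exp(-m) * m**i / i! incrementally
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def robinson_fisher(probs: Sequence[float]) -> float:
    """Sketch of the Robinson-Fisher (chi-square) combining method."""
    if not probs or any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("need at least one probability strictly between 0 and 1")
    n = len(probs)
    # -2 ln prod(p) and -2 ln prod(1 - p), each as a sum of logs
    H = chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    S = chi2q(-2.0 * sum(math.log1p(-p) for p in probs), 2 * n)
    return (1.0 + H - S) / 2.0


if __name__ == "__main__":
    print(robinson_fisher([0.9]))       # approximately 0.9
    print(robinson_fisher([0.9, 0.9]))  # above 0.9: agreeing evidence reinforces
```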
-
save(file_path='bayesdata.dat')
Save the trained model to the given file path.
Parameters: file_path (str) – Path of database file.
-
save_handler(file_handler)
Save the trained model to an open file handler.
Parameters: file_handler (file) – Open file pointer, or file-like object.
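The path-based methods (save, load) and the handler-based ones (save_handler, load_handler) fit a common pattern: the path variants open the file and delegate to the handler variants, which lets the handler variants also accept in-memory buffers. The sketch below is a hypothetical stand-in class, not reverend's code, and the pickle serialization format is an assumption.

```python
import io
import os
import pickle
import tempfile
from typing import BinaryIO, Dict


class ModelStore:
    """Hypothetical stand-in for a trained model; `pools` maps pool name -> token counts."""

    def __init__(self) -> None:
        self.pools: Dict[str, Dict[str, int]] = {}

    def save_handler(self, file_handler: BinaryIO) -> None:
        # Serialization format (pickle) is an assumption for this sketch
        pickle.dump(self.pools, file_handler)

    def load_handler(self, file_handler: BinaryIO) -> None:
        self.pools = pickle.load(file_handler)

    def save(self, file_path: str = "bayesdata.dat") -> None:
        # Open in binary mode and delegate to the handler-based method
        with open(file_path, "wb") as fh:
            self.save_handler(fh)

    def load(self, file_path: str = "bayesdata.dat") -> None:
        with open(file_path, "rb") as fh:
            self.load_handler(fh)


if __name__ == "__main__":
    model = ModelStore()
    model.pools = {"spam": {"free": 2}}
    buf = io.BytesIO()  # a file-like object works as a handler
    model.save_handler(buf)
    buf.seek(0)
    restored = ModelStore()
    restored.load_handler(buf)
    print(restored.pools)  # {'spam': {'free': 2}}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bayesdata.dat")
        model.save(path)
        restored.load(path)
```

If the real format is pickle, note that unpickling data from an untrusted source can execute arbitrary code, so only load model files you trust.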
-