pliers.filters.WordStemmingFilter

class pliers.filters.WordStemmingFilter(stemmer='porter', tokenize=True, case_sensitive=False, *args, **kwargs)

Bases: TextFilter

NLTK-based word stemming and lemmatization filter.

Parameters
  • stemmer (str, Stemmer) – If a string, must be the name of one of the stemming and lemmatization modules available in nltk.stem. Valid values are ‘porter’, ‘snowball’, ‘isri’, ‘lancaster’, ‘regexp’, ‘wordnet’, or ‘rslp’. Alternatively, an initialized nltk StemmerI instance can be passed.

  • tokenize (bool) – If True (default), tokenize the input using nltk.word_tokenize and apply the stemmer/lemmatizer to each token. If False, do not tokenize before stemming/lemmatizing.

  • case_sensitive (bool) – If False (default), the input is lower-cased before stemming or lemmatizing.

  • args – Optional positional arguments passed on to the nltk stemmer/lemmatizer.

  • kwargs – Optional keyword arguments passed on to the nltk stemmer/lemmatizer.
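The interaction between tokenize and case_sensitive can be illustrated with a small standalone sketch. It does not import pliers or nltk: ToySuffixStemmer, stem_text, the regex tokenizer, and the choice to re-join stems with single spaces are all assumptions made for illustration, standing in for a real nltk stemmer, nltk.word_tokenize, and whatever the filter does internally.

```python
import re


class ToySuffixStemmer:
    """Hypothetical stand-in for an nltk stemmer: strips a few English suffixes."""

    SUFFIXES = ("ing", "ed", "s")

    def stem(self, word):
        for suffix in self.SUFFIXES:
            # Only strip when at least three characters would remain.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word


def stem_text(text, stemmer, tokenize=True, case_sensitive=False):
    """Mimic the documented option handling; this is not the pliers implementation."""
    if not case_sensitive:
        text = text.lower()  # lower-case before stemming, as documented
    if not tokenize:
        return stemmer.stem(text)  # the whole string is passed as a single unit
    # Assumption: a simple regex tokenizer replaces nltk.word_tokenize here.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Assumption: the stems are re-joined with single spaces.
    return " ".join(stemmer.stem(tok) for tok in tokens)


stemmer = ToySuffixStemmer()
print(stem_text("Dogs barked loudly", stemmer))                       # dog bark loudly
print(stem_text("Dogs barked loudly", stemmer, case_sensitive=True))  # Dog bark loudly
print(stem_text("Running", stemmer, tokenize=False))                  # runn
```

Any object exposing a stem(word) method can be passed to stem_text in this sketch, which loosely mirrors the option of handing the filter an initialized stemmer instance instead of a name.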

__init__(stemmer='porter', tokenize=True, case_sensitive=False, *args, **kwargs)
transform(stims, validation='strict', *args, **kwargs)

Executes the transformation on the passed stim(s).

Parameters
  • stims (str, Stim, list) –

    One or more stimuli to process. Must be one of:

    • A string giving the path to a file that can be read in as a Stim (e.g., a .txt file or a .jpg image).

    • A Stim instance of any type.

    • An iterable of stims, where each element is either a string or a Stim.

  • validation (str) –

    String specifying how validation errors should be handled. Must be one of:

    • ‘strict’: Raise an exception on any validation error.

    • ‘warn’: Issue a warning for all validation errors.

    • ‘loose’: Silently ignore all validation errors.

  • args – Optional positional arguments to pass on to the internal _transform call.

  • kwargs – Optional keyword arguments to pass on to the internal _transform call.
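The three validation modes can be pictured with the following standalone sketch. It is not taken from pliers: transform_all is a hypothetical helper, the "only strings are valid" check, the TypeError raised in strict mode, and the decision to skip invalid inputs in the other two modes are all assumptions chosen to make the behaviour concrete.

```python
import warnings

VALID_MODES = ("strict", "warn", "loose")


def transform_all(items, validation="strict"):
    """Hypothetical helper: upper-case each string, handling bad inputs per mode."""
    if validation not in VALID_MODES:
        raise ValueError(f"validation must be one of {VALID_MODES}")
    results = []
    for item in items:
        if not isinstance(item, str):  # assumption: only strings are valid here
            message = f"Cannot process input of type {type(item).__name__}"
            if validation == "strict":
                raise TypeError(message)  # assumption: the exception type
            if validation == "warn":
                warnings.warn(message)
            # 'warn' and 'loose' both skip the invalid input (assumption).
            continue
        results.append(item.upper())
    return results


print(transform_all(["a", 1, "b"], validation="loose"))  # ['A', 'B']
```

In this sketch 'warn' and 'loose' return the same results and differ only in whether a warning is emitted, while 'strict' stops at the first invalid input.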