pliers.extractors.WordEmbeddingExtractor

class pliers.extractors.WordEmbeddingExtractor(embedding_file, binary=False, prefix='embedding_dim', unk_vector=None)

Bases: TextExtractor

An extractor that uses a word embedding file to look up embedding vectors for text.

Parameters
  • embedding_file (str) – Path to a word embedding file. Assumed to be in word2vec format compatible with gensim.

  • binary (bool) – Flag indicating whether the embedding file is saved in a binary format.

  • prefix (str) – Prefix for feature names in the ExtractorResult.

  • unk_vector (numpy array or str) – Default vector to use for text not found in the embedding file. If None is specified, a vector of all zeros is used. If 'random' is specified, a vector of random values between -1.0 and 1.0 is used. An array must have the same dimensions as the embeddings.
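
The lookup and fallback behavior described by the parameters above can be sketched in plain Python. This is an illustration only, not the pliers implementation: the helper names load_word2vec_text and lookup, and the sample data, are hypothetical. It assumes the word2vec text layout (a header line giving vocabulary size and dimensionality, followed by one word and its values per line) and uses lists where the real extractor works with numpy arrays.

```python
import random

# Hypothetical sample in word2vec text format: "vocab_size dims", then one row per word.
EMBEDDING_TEXT = """2 3
cat 0.1 0.2 0.3
dog 0.4 0.5 0.6
"""


def load_word2vec_text(lines):
    """Parse word2vec text-format lines into (word -> vector dict, dims)."""
    rows = [line.split() for line in lines if line.strip()]
    vocab_size, dims = int(rows[0][0]), int(rows[0][1])
    vectors = {row[0]: [float(x) for x in row[1:]] for row in rows[1:]}
    if len(vectors) != vocab_size:
        raise ValueError("header vocabulary size does not match the rows")
    if any(len(vec) != dims for vec in vectors.values()):
        raise ValueError("a row does not match the header dimensionality")
    return vectors, dims


def lookup(word, vectors, dims, unk_vector=None):
    """Return the embedding for word, or a fallback if it is unknown."""
    if word in vectors:
        return vectors[word]
    if unk_vector is None:
        # Default: all zeros.
        return [0.0] * dims
    if isinstance(unk_vector, str):
        if unk_vector == "random":
            # Random values between -1.0 and 1.0.
            return [random.uniform(-1.0, 1.0) for _ in range(dims)]
        raise ValueError("unsupported unk_vector string: %r" % unk_vector)
    if len(unk_vector) != dims:
        raise ValueError("unk_vector must have the same dimensions as the embeddings")
    return list(unk_vector)


vectors, dims = load_word2vec_text(EMBEDDING_TEXT.splitlines())
known = lookup("cat", vectors, dims)                         # found in the file
zeros = lookup("zebra", vectors, dims)                       # unk_vector=None
rand = lookup("zebra", vectors, dims, unk_vector="random")   # unk_vector='random'
custom = lookup("zebra", vectors, dims, unk_vector=[9.0, 9.0, 9.0])
```

In this sketch a user-supplied fallback with the wrong length raises an error, mirroring the requirement that it have the same dimensions as the embeddings. How pliers itself reacts to a mismatched vector is not stated here.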

__init__(embedding_file, binary=False, prefix='embedding_dim', unk_vector=None)
transform(stim, *args, **kwargs)

Executes the transformation on the passed stim(s).

Parameters
  • stims (str, Stim, list) –

    One or more stimuli to process. Must be one of:

    • A string giving the path to a file that can be read in as a Stim (e.g., a .txt file, .jpg image, etc.)

    • A Stim instance of any type.

    • An iterable of stims, where each element is either a string or a Stim.

  • validation (str) –

    String specifying how validation errors should be handled. Must be one of:

    • 'strict': Raise an exception on any validation error

    • 'warn': Issue a warning for all validation errors

    • 'loose': Silently ignore all validation errors

  • args – Optional positional arguments to pass on to the internal _transform call.

  • kwargs – Optional keyword arguments to pass on to the internal _transform call.
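
The three validation modes listed above amount to a simple dispatch on the mode string. The following is a minimal sketch of that idea using only the standard library. The function handle_validation_error is hypothetical, and the exception type raised in strict mode is an assumption, so it should not be read as the library's actual code.

```python
import warnings


def handle_validation_error(message, validation="strict"):
    """Hypothetical helper: react to a validation failure according to the mode.

    Returns False when the failure is tolerated, so that a caller could
    skip the offending stim.
    """
    if validation == "strict":
        # Assumption: the concrete exception type pliers raises may differ.
        raise TypeError(message)
    if validation == "warn":
        warnings.warn(message)
        return False
    if validation == "loose":
        return False
    raise ValueError("validation must be 'strict', 'warn', or 'loose'")
```

With this sketch, 'strict' stops processing at the first invalid input, while 'warn' and 'loose' both let processing continue and differ only in whether the problem is reported.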