pliers.extractors.merge_results

pliers.extractors.merge_results(results, format='wide', timing=True, metadata=True, extractor_names=True, object_id=True, extractor_params=False, aggfunc=None, invalid_results='ignore', **to_df_kwargs)

Merges a list of ExtractorResult instances and returns a pandas DataFrame.

Parameters
  • results (list, tuple) – A list of ExtractorResult instances to merge.

  • format (str) – Format to return the data in. Can be either ‘wide’ or ‘long’. In the wide case, every extracted feature is a column, and every Stim is a row. In the long case, every row contains a single Stim/Extractor/feature combination.

  • timing (bool, str) – Whether or not to include columns for onset, order, and duration.

  • metadata (bool) – If True, includes Stim metadata columns in the returned DataFrame. These columns include ‘stim_name’, ‘class’, ‘filename’, ‘history’, and ‘source_file’. Note that these values are often long strings, so the returned DataFrame will be considerably larger.

  • extractor_names (str, bool) –

    How to handle extractor names when returning results. The specific behavior depends on whether format is ‘long’ or ‘wide’. Valid values include:

    • ‘prepend’ or True: In both ‘long’ and ‘wide’ formats, feature names will be prepended with the Extractor name (e.g., “FaceExtractor#face_likelihood”).

    • ‘drop’ or False: In both ‘long’ and ‘wide’ formats, extractor names will be omitted entirely from the result. Note that this can create feature name conflicts when merging results from multiple Extractors, so it is generally discouraged.

    • ‘column’: In ‘long’ format, the extractor name will be included as a separate column. Not valid for ‘wide’ format (and will raise an error).

    • ‘multi’: In ‘wide’ format, a MultiIndex will be used for the columns, with the first level of the index containing the Extractor name and the second level containing the feature name. This value is invalid if format=‘long’ (and will raise an error).

  • object_id (bool) – If True, attempts to intelligently add an ‘object_id’ column that differentiates between multiple objects in the results that may share onsets/orders/durations (and would otherwise be impossible to distinguish). This frequently occurs for ImageExtractors that identify multiple target objects (e.g., faces) within a single ImageStim. Default is ‘auto’, which includes the ‘object_id’ column if and only if it has a non-constant value.

  • extractor_params (bool) – If True, returns serialized extractor_params of the extractor, i.e. log_attributes at time of extraction. If format=’wide’, merge_results returns one column per extractor, each named ExtractorName#FeatureName#extractor_params. If format=’long’, returns only one column named extractor_params.

  • aggfunc (str, Callable) – If format=’wide’ and extractor_names=’drop’, it’s possible for name clashes between features to occur. In such cases, the aggfunc argument is passed onto pandas’ pivot_table function, and specifies how to aggregate multiple values for the same index. Can be a callable or any string value recognized by pandas. By default (None), ‘mean’ will be used for numeric columns and ‘first’ will be used for object/categorical columns.

  • invalid_results (str) –

    Specifies desired action for treating elements of the passed in results argument that are not ExtractorResult objects. Valid values include:

    • ‘ignore’ will ignore them and merge the valid ExtractorResults.

    • ‘fail’ will raise an exception on any invalid input.
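
The interaction between extractor_names and aggfunc described above can be illustrated with a small, pure-Python sketch. It does not call pliers or pandas and is not the library's implementation; the record layout, the extractor names (ExtractorA, ExtractorB) and the helper functions are hypothetical, chosen only to show how dropping extractor names can produce a name clash that then has to be aggregated.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (stim, extractor, feature, value).
records = [
    ("img1.jpg", "ExtractorA", "score", 0.25),
    ("img1.jpg", "ExtractorB", "score", 0.75),
    ("img1.jpg", "ExtractorB", "label", "cat"),
]


def column_name(extractor, feature, extractor_names):
    """Build a column label following the 'prepend'/'drop' conventions."""
    if extractor_names in ("prepend", True):
        return f"{extractor}#{feature}"
    if extractor_names in ("drop", False):
        return feature
    raise ValueError(f"unsupported extractor_names: {extractor_names!r}")


def default_agg(values):
    """Mimic the documented default: mean for numeric values, first otherwise."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return mean(values) if numeric else values[0]


def to_wide_row(records, extractor_names):
    """Collapse the records for one stim into a single wide row."""
    grouped = defaultdict(list)
    for _stim, extractor, feature, value in records:
        grouped[column_name(extractor, feature, extractor_names)].append(value)
    return {name: default_agg(vals) for name, vals in grouped.items()}


prepended = to_wide_row(records, "prepend")  # three distinct columns
dropped = to_wide_row(records, "drop")       # both 'score' values are averaged
```

With prepended names every feature keeps its own column, whereas with dropped names the two ‘score’ values collide and are reduced to a single value, which is why the ‘drop’ option is discouraged when merging several Extractors.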

Returns: a pandas DataFrame. For format details, see the ‘format’ argument.
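
To make the two layouts concrete, the following minimal sketch arranges the same values in a ‘long’ and a ‘wide’ shape using plain Python dictionaries. It is only an illustration of the row/column arrangement described for the format argument; the stim names, feature names and values are made up, and the actual DataFrame returned by merge_results may contain additional columns (timing, metadata, object_id).

```python
# Hypothetical extracted values: (stim, feature, value).
values = [
    ("stim1", "ExtractorA#length", 5),
    ("stim1", "ExtractorB#count", 1),
    ("stim2", "ExtractorA#length", 11),
    ("stim2", "ExtractorB#count", 2),
]

# Long layout: one row per Stim/feature combination.
long_rows = [{"stim": s, "feature": f, "value": v} for s, f, v in values]

# Wide layout: one row per Stim, one column per feature.
wide = {}
for s, f, v in values:
    wide.setdefault(s, {"stim": s})[f] = v
wide_rows = list(wide.values())

for row in wide_rows:
    print(row)
```

The long layout here has four rows (two stims times two features), while the wide layout has two rows with one column per feature.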