pliers.datasets.fetch_dictionary

pliers.datasets.fetch_dictionary(name, url=None, format=None, index=0, rename=None, save=True, force_retrieve=False)

Retrieve a dictionary of text norms from the web or local storage.
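The "web or local storage" behavior, together with the save and force_retrieve parameters described below, amounts to a cache-first lookup. The sketch below illustrates that pattern with the standard library only. It is not pliers' implementation: load_or_fetch, the fetch callable, and the one-CSV-per-name cache layout are hypothetical stand-ins, and a real download is replaced by a function that returns rows.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List


def load_or_fetch(
    name: str,
    fetch: Callable[[], List[Dict[str, str]]],  # hypothetical stand-in for a download
    cache_dir: str,
    save: bool = True,
    force_retrieve: bool = False,
) -> List[Dict[str, str]]:
    """Return rows for `name`, preferring a cached CSV unless force_retrieve is set."""
    path = os.path.join(cache_dir, name + ".csv")
    # Use the local copy when it exists and a fresh download was not requested.
    if not force_retrieve and os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    # Otherwise "download" the data by calling the supplied function.
    rows = fetch()
    # Optionally write the result so later calls can skip the download;
    # opening with "w" overwrites any existing local copy.
    if save and rows:
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return rows


# Usage: count how many times the (fake) download actually runs.
calls = []


def fake_fetch() -> List[Dict[str, str]]:
    calls.append(1)
    return [{"word": "apple", "score": "4.5"}]


with tempfile.TemporaryDirectory() as cache:
    load_or_fetch("demo", fake_fetch, cache)                       # downloads, then caches
    load_or_fetch("demo", fake_fetch, cache)                       # served from the cache
    load_or_fetch("demo", fake_fetch, cache, force_retrieve=True)  # downloads again
print(len(calls))  # 2
```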

Parameters
  • name (str) – The name of the dictionary. If no url is passed, this must match either one of the keys in the predefined dictionary file (see dictionaries.json), or the name assigned to a previous dictionary retrieved from a specific URL.

  • url (str) – The URL of the dictionary file to retrieve. Optional if name matches an existing dictionary.

  • format (str) – One of ‘csv’, ‘tsv’, ‘xls’, or None. Used to read data appropriately. Note that most forms of compression will be detected and handled automatically, so the format string refers only to the format of the decompressed file. When format is None, the format will be inferred from the filename.

  • index (str, int) – The name or numeric index of the column to use as the dictionary index. Passed directly to pd.ix.

  • rename (dict) – An optional dictionary passed to pandas' DataFrame.rename(); can be used to rename columns in the loaded dictionary. Note that the locally saved dictionary will retain the renamed columns.

  • save (bool) – Whether or not to save the dictionary locally the first time it is retrieved.

  • force_retrieve (bool) – If True, the remote dictionary will always be downloaded, even if a local copy exists (and the local copy will be overwritten).
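The format parameter refers to the decompressed file and, when None, is inferred from the filename. A minimal sketch of that kind of inference follows, assuming that a trailing compression suffix (the set below is an assumption) is stripped before the extension is read. infer_format is a hypothetical helper, and pliers' own detection logic may differ.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Assumed compression suffixes; the format describes the file *after* decompression.
COMPRESSION_EXTS = {".gz", ".bz2", ".zip", ".xz"}
KNOWN_FORMATS = {"csv", "tsv", "xls"}


def infer_format(url: str, format: Optional[str] = None) -> Optional[str]:
    """Return an explicit format unchanged, otherwise infer it from the URL's filename."""
    if format is not None:
        return format
    # Take the last path component so query strings such as "?dl=1" are ignored.
    filename = os.path.basename(urlparse(url).path).lower()
    root, ext = os.path.splitext(filename)
    # Peel off one compression suffix, e.g. "norms.csv.gz" -> "norms.csv".
    if ext in COMPRESSION_EXTS:
        root, ext = os.path.splitext(root)
    fmt = ext.lstrip(".")
    # Unknown or missing extensions yield None rather than a guess.
    return fmt if fmt in KNOWN_FORMATS else None


print(infer_format("http://example.com/norms.csv.gz"))  # csv
```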

Returns: A pandas DataFrame indexed by strings (typically words).
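To illustrate the shape of the returned object, the sketch below builds a comparable DataFrame by hand from a tiny made-up CSV, applies a rename mapping, and indexes by a column given by position, which is how index=0 (the default) is described. Because .ix no longer exists in current pandas, the sketch converts the position to a column name and calls set_index instead. This mirrors the documented behavior under those assumptions and is not pliers' code; the column names and values are invented for illustration.

```python
import io

import pandas as pd

# A tiny stand-in for a downloaded norms file (invented values, illustration only).
raw = io.StringIO("Word,AoA_Mean\napple,3.9\nzebra,5.2\n")
df = pd.read_csv(raw)

# rename: relabel columns via DataFrame.rename(columns=...).
df = df.rename(columns={"AoA_Mean": "age_of_acquisition"})

# index=0: map the positional index to a column name, then index rows by that column.
index = 0
df = df.set_index(df.columns[index])

# Rows can now be looked up by word.
print(df.loc["zebra", "age_of_acquisition"])  # 5.2
```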