Provider Shootout

The PyEnchant source distribution contains the script “tools/shootout.py”, which can be used to run a comparative test between the different providers available in an enchant installation. The idea is loosely based on the aspell comparison tests run by Kevin Atkinson (http://aspell.net/test/) and used the test data that he provides, but the code is pure Python and written from scratch.

The following table summarises results from three spellchecking providers, all using PyEnchant 3.2.2 and Enchant 2.7.3:

  • aspell: Aspell backend, version 0.60.8.1

  • hunspell: Hunspell backend, version 1.7.2, with dictionaries from SCOWL

  • nuspell: Nuspell backend, version 5.1.4, with the same dictionaries as Hunspell

Provider

EXISTED

SUGGESTED

FIRST

FIRST5

FIRST10

AVG DIST

TIME

aspell

97.7

88.9

57.1

81.0

85.3

1.61

0.5

hunspell

97.7

76.5

54.7

75.0

76.3

0.58

11.8

nuspell

97.7

77.3

55.3

74.8

76.5

0.70

17.2

The statistics were collected on test data containing over 500 US English words and express the following quantities:

  • EXISTED: percentage of correctly-spelled test words that were marked correct by the spellchecker. This tests the provider’s coverage of the language.

  • SUGGESTED: percentage of misspelled test words for which the correct spelling was suggested by the spellchecker. This tests the ability of the provider to guess the correct spelling of a word.

  • FIRST: percentage of misspelled test words for which the correct spelling was the first suggestion made by the spellchecker.

  • FIRST5: percentage of misspelled test words for which the correct spelling was in the first five suggestions made by the spellchecker.

  • FIRST10: percentage of misspelled test words for which the correct spelling was in the first ten suggestions made by the spellchecker.

  • AVG DIST: average position of the correct spelling of a word within the list of suggestions returned by the spellchecker, with zero meaning the word was at the front of the list.

  • TIME: duration of the test, in seconds, averaged over three runs, on an Intel Core i7-860 processor.