enchant.checker: High-level spellchecking functionality
This package is designed to host higher-level spellchecking functionality than is available in the base enchant package. It should make writing applications that follow common usage idioms significantly easier.
The most useful class is SpellChecker, which implements a spellchecking loop over a block of text. It is capable of modifying the text in-place if given an array of characters to work with.
This package also contains several interfaces to the SpellChecker class, such as a wxPython GUI dialog and a command-line interface.
- class enchant.checker.SpellChecker(lang: Dict | str | None = None, text: str | None = None, tokenize: Type[tokenize] | Filter | None = None, chunkers: List[Chunker] | None = None, filters: List[Filter] | None = None)
Class implementing stateful spellchecking behaviour.
This class is designed to implement a spell-checking loop over a block of text, correcting/ignoring/replacing words as required. This loop is implemented using an iterator paradigm so it can be embedded inside other loops of control.
The SpellChecker object is stateful, and the appropriate methods must be called to alter its state and affect the progress of the spell checking session. At any point during the checking session, the attribute word will hold the current erroneously spelled word under consideration. The action to take on this word is determined by calling methods such as replace(), replace_always() and ignore_always(). Once this is done, calling next() advances to the next misspelled word.
As a quick (and rather silly) example, the following code replaces each misspelled word with the string “SPAM”:
>>> from enchant.checker import SpellChecker
>>> text = "This is sme text with a fw speling errors in it."
>>> chkr = SpellChecker("en_US", text)
>>> for err in chkr:
...     err.replace("SPAM")
...
>>> chkr.get_text()
'This is SPAM text with a SPAM SPAM errors in it.'
Internally, the SpellChecker always works with arrays of (possibly unicode) character elements. This allows the in-place modification of the string as it is checked, and is the closest thing Python has to a mutable string. The text can be set as any of a normal string, unicode string, character array or unicode character array. The get_text() method will return the modified array object if an array is used, or a new string object if a string is used.
Words input to the SpellChecker may be either plain strings or unicode objects. They will be converted to the same type as the text being checked, using Python’s default encoding/decoding settings.
If using an array of characters with this object and the array is modified outside of the spellchecking loop, use the method set_offset() to reposition the internal loop pointer to make sure it doesn’t skip any words.
- add(word: str | None = None) None
Add given word to the personal word list.
If no word is given, the current erroneous word is added.
- add_to_personal(word: str | None = None) None
Add given word to the personal word list.
If no word is given, the current erroneous word is added.
- check(word: str) bool
Check correctness of the given word.
- coerce_string(text: str, enc: str | None = None) str
Coerce string into the required type.
This method can be used to automatically ensure that strings are of the correct type required by this checker - either unicode or standard. If there is a mismatch, conversion is done using python’s default encoding unless another encoding is specified.
- get_text() str
Return the spell-checked text.
- ignore_always(word: str | None = None) None
Add given word to list of words to ignore.
If no word is given, the current erroneous word is added.
- leading_context(chars: int) str
Get chars characters of leading context.
This method returns up to chars characters of leading context - the text that occurs in the string immediately before the current erroneous word.
- next() SpellChecker
Process text up to the next spelling error.
This method is designed to support the iterator protocol. Each time it is called, it will advance the word attribute to the next spelling error in the text. When no more errors are found, it will raise StopIteration.
The method will always return self, so that it can be used sensibly in common idioms such as:
for err in checker:
    err.do_something()
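The "return self" idiom can be illustrated with a toy stand-in. This is only a sketch of the pattern, not pyenchant's implementation: ToyChecker, its known_words argument and its regex tokenizer are all hypothetical names invented for the illustration.

```python
import re


class ToyChecker:
    """Toy stand-in (NOT pyenchant's code) showing an iterator that returns itself."""

    def __init__(self, known_words, text):
        self.known_words = set(known_words)  # hypothetical word list
        self._tokens = re.finditer(r"[A-Za-z']+", text)  # hypothetical tokenizer
        self.word = None  # current misspelled word

    def __iter__(self):
        return self

    def __next__(self):
        # Skip known words; stop at the first unknown one.
        for match in self._tokens:
            if match.group().lower() not in self.known_words:
                self.word = match.group()
                return self  # returning self is what enables `for err in checker`
        # The token stream is exhausted: signal the end of iteration.
        raise StopIteration


checker = ToyChecker({"this", "is", "text"}, "This is sme text")
misspelled = [err.word for err in checker]
print(misspelled)
```

Because each step yields the checker itself, the loop variable exposes the current word and every action method at once.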
- replace(repl: str) None
Replace the current erroneous word with the given string.
- replace_always(word: str, repl: str | None = None) None
Always replace given word with given replacement.
If a single argument is given, this is used to replace the current erroneous word. If two arguments are given, that combination is added to the list for future use.
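A possible use of the two-argument form is to register known corrections before the loop starts. The sketch below assumes that registered pairs are applied automatically as the checker advances; replace_everywhere and its replacements mapping are hypothetical helper names, and checker is assumed to behave like the SpellChecker documented here.

```python
def replace_everywhere(checker, replacements):
    """Register word -> replacement pairs, then run the checker to completion.

    `replacements` is a hypothetical dict such as {"fw": "few"}.
    """
    for wrong, right in replacements.items():
        checker.replace_always(wrong, right)  # two-argument form
    for _ in checker:
        pass  # errors without a registered replacement are left untouched
    return checker.get_text()
```

It could be called as replace_everywhere(SpellChecker("en_US", text), {"fw": "few"}), given an installed en_US dictionary.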
- set_offset(off: int, whence: int = 0) None
Set the offset of the tokenization routine.
For more details on the purpose of the tokenization offset, see the documentation of the module enchant.tokenize. The optional argument whence indicates the method by which to change the offset:
0 (the default) treats off as an increment
1 treats off as a distance from the start
2 treats off as a distance from the end
- set_text(text: str) None
Set the text to be spell-checked.
This method must be called, or the text argument supplied to the constructor, before calling the method next().
- suggest(word: str | None = None) List[str]
Return suggested spellings for the given word.
If no word is given, the current erroneous word is used.
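As a rough sketch of how suggest() might combine with the checking loop, the helper below replaces each error with its first suggestion. The name replace_with_first_suggestion is hypothetical, and checker is assumed to behave like the SpellChecker documented here; only iteration, suggest(), replace() and get_text() are used.

```python
def replace_with_first_suggestion(checker):
    """Replace each misspelled word with its first suggestion, if there is one."""
    for err in checker:
        suggestions = err.suggest()  # no argument: the current erroneous word
        if suggestions:
            err.replace(suggestions[0])
        # Words with no suggestions are left as they are.
    return checker.get_text()
```

The first suggestion is not necessarily the intended word, so an interactive application would more likely present the whole list to the user.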
- trailing_context(chars: int) str
Get chars characters of trailing context.
This method returns up to chars characters of trailing context - the text that occurs in the string immediately after the current erroneous word.
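The phrase "up to chars characters" can be pictured with plain string slicing. The functions below are standalone illustrations rather than the library's code, and the word_start and word_end parameters are hypothetical; the real methods take only chars and use the checker's current position.

```python
def leading_context(text, word_start, chars):
    """Return up to `chars` characters immediately before the word."""
    return text[max(0, word_start - chars):word_start]


def trailing_context(text, word_end, chars):
    """Return up to `chars` characters immediately after the word."""
    return text[word_end:word_end + chars]


text = "This is sme text"
start = text.index("sme")          # position of the misspelled word
end = start + len("sme")

before = leading_context(text, start, 5)    # 5 characters are available
after = trailing_context(text, end, 50)     # fewer than 50 remain
print(repr(before), repr(after))
```

When fewer characters are available than requested, the result is simply shorter, which is useful when showing an error in context near the start or end of a document.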
- wants_unicode() bool
Check whether the checker wants unicode strings.
This method will return True if the checker wants unicode strings as input, False if it wants normal strings. It’s important to provide the correct type of string to the checker.