Create a system for auto-applying spelling fixes
Why didn't I do something like this years ago?
Commit dc1e13ee63 (parent 8f663d2ce8): 221 lines added to config.org.
(advice-add 'jinx-next :after (lambda (_) (left-word))))
#+end_src

**** Saving corrections as abbreviations

#+call: confpkg(after="jinx")

I think it would be neat to be able to save persistent misspellings as
abbreviations. I'm not the first to have this idea, as the jinx wiki has a
[[https://github.com/minad/jinx/wiki#save-misspelling-and-correction-as-abbreviation][code snippet]]
for doing exactly this.

However, I want to do something a bit more persistent. For starters, let's write
corrections to a file.

#+begin_src emacs-lisp
(defvar spelling-correction-history-file
  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
                    "emacs" "spelling-corrections.txt")
  "File where a record of spelling corrections is saved.")
#+end_src

For simplicity of operation, I think we can just append each correction to the
file as =<misspelled> <corrected>= lines. In the Emacs session though, I think
we'll want a hash table of the counts of each correction. We can have the
misspelled words as the keys, and then have each value be an alist of
src_elisp{(correction . count)} pairs. This table can be lazily built and
processed after startup.

#+begin_src emacs-lisp
(defvar spelling-correction-table (make-hash-table :test #'equal))
#+end_src
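
Here is an illustration of the shape described above. Suppose a hypothetical history file contains =teh the= twice, then =form from= and then =form for=. After reading it, the table would hold the entries below. This is only a sketch: the =spelling-example--= names are made up, and the block is marked =:tangle no= so it stays out of the config. It also shows why the table needs an ~equal~ test. Lookups use strings with the same contents as the stored keys, but not the same objects.

#+begin_src emacs-lisp :tangle no
;; Hypothetical table shaped like `spelling-correction-table'.
(defvar spelling-example--table
  (let ((table (make-hash-table :test #'equal)))
    (puthash "teh" (list (cons "the" 2)) table)
    ;; Newer corrections are pushed onto the front of the alist.
    (puthash "form" (list (cons "for" 1) (cons "from" 1)) table)
    table)
  "Illustrative table: misspelling -> alist of (correction . count).")

;; A freshly allocated key is `equal' to the stored one but not `eql',
;; so this lookup only succeeds because of :test #'equal.
(gethash (copy-sequence "form") spelling-example--table)
;; => (("for" . 1) ("from" . 1))

;; With the default `eql' test, the same kind of lookup finds nothing.
(let ((eql-table (make-hash-table)))
  (puthash "teh" 2 eql-table)
  (gethash (copy-sequence "teh") eql-table))
;; => nil
#+end_src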

We probably also want to specify a threshold number of identical misspellings
that triggers entry to the abbrev table, both on load and when made during the
current Emacs session. For now, I'll try a value of three for on-load and two for
misspellings made in the current Emacs session. I think I want to avoid a value
of one, since that makes it easy for a misspelling with multiple valid
corrections to become associated with a single correction too soon. This is a
rare concern, but it would be annoying enough to run into that I think it's worth
requiring a second misspelling.

#+begin_src emacs-lisp
(defvar spelling-correction-history-abbrev-threshold 3
  "The number of recorded identical misspellings needed to create an abbrev.
This applies to misspellings read from the history file.")

(defvar spelling-correction-live-abbrev-threshold 2
  "The number of identical misspellings needed to create an abbrev.
This applies to misspellings made in the current Emacs session.")
#+end_src
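
Further down, both thresholds are applied with the same rule. The misspelling must have exactly one known correction, and that correction must have been seen at least a threshold number of times. As a sketch, the rule could be written as a single predicate. =spelling-example--abbrev-worthy-p= is a hypothetical helper that the config does not define, and the block is marked =:tangle no=.

#+begin_src emacs-lisp :tangle no
(defun spelling-example--abbrev-worthy-p (corrections threshold)
  "Return the correction to abbreviate to, or nil.
CORRECTIONS is an alist of (CORRECTION . COUNT) pairs for a single
misspelling.  An abbrev is only warranted when there is exactly one
correction and it has been seen at least THRESHOLD times."
  (and (= (length corrections) 1)
       (>= (cdar corrections) threshold)
       (caar corrections)))

;; Worked cases, using the live threshold (2) and history threshold (3):
(spelling-example--abbrev-worthy-p '(("the" . 2)) 2)              ; => "the"
(spelling-example--abbrev-worthy-p '(("the" . 2)) 3)              ; => nil
(spelling-example--abbrev-worthy-p '(("for" . 3) ("from" . 1)) 2) ; => nil
#+end_src

The last case is the ambiguity the thresholds guard against. Take a misspelling like =form= that has been corrected to two different words. No abbrev is made, however often either correction occurs.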

At this point we need to actually implement this functionality, starting with
updating the table when a correction is either read from the history file or
occurs live.

#+begin_src emacs-lisp
(defun spelling-correction-update-table (misspelling corrected)
  "Update the MISSPELLING to CORRECTED entry in the table.
Return the number of times this correction has occurred."
  (if-let ((correction-counts
            (gethash misspelling spelling-correction-table)))
      (if-let ((record-cons (assoc corrected correction-counts)))
          ;; Seen this exact correction before: bump its count.
          (setcdr record-cons (1+ (cdr record-cons)))
        ;; A new correction for a known misspelling.
        (puthash misspelling
                 (push (cons corrected 1) correction-counts)
                 spelling-correction-table)
        1)
    ;; The first time we've seen this misspelling.
    (puthash misspelling
             (list (cons corrected 1))
             spelling-correction-table)
    1))
#+end_src
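
For comparison, the same bookkeeping can be written more compactly by nesting the ~alist-get~ generalized variable inside ~gethash~. This is a different idiom from the ~if-let~ version above, shown only as a sketch. =spelling-example--update-table= and its scratch table are hypothetical, and the block is marked =:tangle no=.

#+begin_src emacs-lisp :tangle no
(require 'cl-lib)

(defvar spelling-example--counts (make-hash-table :test #'equal)
  "Scratch correction table for this example.")

(defun spelling-example--update-table (misspelling corrected)
  "Count one correction of MISSPELLING to CORRECTED; return the new count.
An unseen correction is added as (CORRECTED . 1) at the front of the alist."
  ;; Both `gethash' and `alist-get' are generalized variables, so
  ;; `cl-incf' can create or bump the nested entry in one step.  The
  ;; default of 0 covers unseen corrections, and #'equal compares strings.
  (cl-incf (alist-get corrected (gethash misspelling spelling-example--counts)
                      0 nil #'equal)))

(spelling-example--update-table "teh" "the")   ; => 1
(spelling-example--update-table "teh" "the")   ; => 2
(spelling-example--update-table "form" "from") ; => 1
(spelling-example--update-table "form" "for")  ; => 1
(gethash "form" spelling-example--counts)      ; => (("for" . 1) ("from" . 1))
#+end_src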

We could call ~define-abbrev~ directly, but since we'll be doing so in multiple
places, I think it's nice to have a single place where the abbrev table is set
up, so that any changes to the abbrev table (or similar) only need to be made in
one place.

We could use the global abbrev table, but I'd rather have one dedicated to
spelling corrections. Let's manage it entirely separately from the global abbrev
file too.

#+begin_src emacs-lisp
(defvar spelling-correction-abbrev-file
  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
                    "emacs" "spelling-abbrevs.el")
  "File to save the spelling abbrev table in.")

(defvar spelling-correction-abbrev-table nil
  "The spelling abbrev table.")

(defun spelling-correction-setup-abbrevs ()
  "Set up `spelling-correction-abbrev-table'.
Also set it as a parent of `global-abbrev-table'."
  (unless spelling-correction-abbrev-table
    (setq spelling-correction-abbrev-table (make-abbrev-table))
    (abbrev-table-put
     global-abbrev-table :parents
     (cons spelling-correction-abbrev-table
           (abbrev-table-get global-abbrev-table :parents)))
    (add-hook 'kill-emacs-hook #'spelling-correction-save-abbrevs))
  (when (file-exists-p spelling-correction-abbrev-file)
    (read-abbrev-file spelling-correction-abbrev-file t)))

(defun spelling-correction-save-abbrevs ()
  "Write `spelling-correction-abbrev-table' to `spelling-correction-abbrev-file'."
  (unless (file-exists-p spelling-correction-abbrev-file)
    (make-directory (file-name-directory spelling-correction-abbrev-file) t))
  (let ((coding-system-for-write 'utf-8))
    (with-temp-buffer
      (insert-abbrev-table-description 'spelling-correction-abbrev-table nil)
      (when (unencodable-char-position (point-min) (point-max) 'utf-8)
        (setq coding-system-for-write 'utf-8-emacs))
      (goto-char (point-min))
      (insert (format ";;-*-coding: %s;-*-\n\n" coding-system-for-write))
      ;; Overwrite the whole file (APPEND = nil; a numeric APPEND would
      ;; write at that offset without truncating).  VISIT = 0 suppresses
      ;; the "Wrote ..." message.
      (write-region nil nil spelling-correction-abbrev-file nil 0))))
#+end_src
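
Hanging the dedicated table off ~global-abbrev-table~ via ~:parents~ works because Emacs also searches the parents of each active table when looking for an abbrev before point. Below is a small sketch of that behaviour. It uses a hypothetical scratch table, and a buffer-local table in place of ~global-abbrev-table~, so nothing global is modified. The names are made up and the block is marked =:tangle no=.

#+begin_src emacs-lisp :tangle no
(defvar spelling-example--abbrevs (make-abbrev-table)
  "Hypothetical stand-in for `spelling-correction-abbrev-table'.")

(define-abbrev spelling-example--abbrevs "teh" "the")

(defun spelling-example--expand (text)
  "Insert TEXT in a scratch buffer, expand the abbrev before point, return the text.
The buffer's local table has no abbrevs of its own; it only lists
`spelling-example--abbrevs' as a parent, much as the config makes its
table a parent of `global-abbrev-table'."
  (with-temp-buffer
    (setq local-abbrev-table
          (make-abbrev-table (list :parents (list spelling-example--abbrevs))))
    (insert text)
    (expand-abbrev)
    (buffer-string)))

(spelling-example--expand "teh") ; => "the"
(spelling-example--expand "tha") ; => "tha" (no such abbrev)
#+end_src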

Now we can write the update function that's run on a live spelling correction.

#+begin_src emacs-lisp
(defun record-spelling-correction (misspelling corrected)
  "Record the correction of MISSPELLING to CORRECTED."
  (let ((write-region-inhibit-fsync t) ; Quicker writes
        (coding-system-for-write 'utf-8)
        (inhibit-message t))
    (write-region
     (concat misspelling " " corrected "\n")
     nil
     spelling-correction-history-file
     t))
  (when (and (>= (spelling-correction-update-table misspelling corrected)
                 spelling-correction-live-abbrev-threshold)
             (= (length (gethash misspelling spelling-correction-table))
                1))
    (define-abbrev spelling-correction-abbrev-table misspelling corrected)
    (message "Created new abbreviation: %s ⟶ %s" misspelling corrected)))
#+end_src
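
Each live correction is appended to the history file with a single ~write-region~ call. Given a string as START, ~write-region~ writes that string rather than buffer text, and a non-nil APPEND adds it to the end of the file. Below is a standalone sketch of the resulting log, using hypothetical helpers and a temporary file (the block is marked =:tangle no=). Unlike the config, it suppresses the "Wrote ..." message with a VISIT argument that is neither ~t~, ~nil~, nor a string, rather than by binding ~inhibit-message~.

#+begin_src emacs-lisp :tangle no
(defun spelling-example--append-correction (file misspelling corrected)
  "Append a \"MISSPELLING CORRECTED\" line to FILE."
  (let ((coding-system-for-write 'utf-8))
    ;; START is a string, so END is ignored; APPEND = t adds to the end
    ;; of FILE; VISIT = `silent' suppresses the "Wrote FILE" message.
    (write-region (concat misspelling " " corrected "\n") nil file t 'silent)))

(defun spelling-example--log-contents (corrections)
  "Log CORRECTIONS, a list of (MISSPELLING . CORRECTED), to a temp file.
Return the file's contents, deleting the file afterwards."
  (let ((file (make-temp-file "spelling-example-")))
    (unwind-protect
        (progn
          (dolist (c corrections)
            (spelling-example--append-correction file (car c) (cdr c)))
          (with-temp-buffer
            (insert-file-contents file)
            (buffer-string)))
      (delete-file file))))

(spelling-example--log-contents '(("teh" . "the") ("teh" . "the")))
;; => "teh the\nteh the\n"
#+end_src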

The only thing left to do now is to load the history file. I think I'd like to
split the actual reading and the abbrev generation into two parts, though.

#+begin_src emacs-lisp
(defun spelling-correction-read-history ()
  "Read the history file into the correction table."
  (if (file-exists-p spelling-correction-history-file)
      (with-temp-buffer
        (insert-file-contents spelling-correction-history-file)
        (goto-char (point-min))
        ;; Each line is "<misspelling> <corrected>".  Split on the first
        ;; space rather than moving by words: `forward-word' stops at
        ;; apostrophes, and a correction may itself contain a space.
        (while (re-search-forward "^\\([^ \n]+\\) \\([^\n]+\\)$" nil t)
          (spelling-correction-update-table
           (match-string-no-properties 1)
           (match-string-no-properties 2))))
    ;; PARENTS = t: don't fail if the directory already exists.
    (make-directory (file-name-directory spelling-correction-history-file) t)
    (write-region "" nil spelling-correction-history-file)))

(defun spelling-correction-remove-invalid-abbrevs ()
  "Ensure that all entries of the abbrev table are valid."
  (obarray-map
   (lambda (abbrev)
     ;; `obarray-map' passes symbols, not strings.  Skip symbols without a
     ;; value: this covers both deleted abbrevs and the table's internal
     ;; property symbol (whose name is the empty string).
     (when (and (boundp abbrev) (symbol-value abbrev))
       (let* ((misspelling (symbol-name abbrev))
              (corrections (gethash misspelling spelling-correction-table)))
         (unless (and (= (length corrections) 1)
                      (>= (cdar corrections)
                          spelling-correction-history-abbrev-threshold))
           (define-abbrev spelling-correction-abbrev-table misspelling nil)))))
   spelling-correction-abbrev-table))

(defun spelling-correction-create-history-abbrevs ()
  "Apply the history threshold to the current correction table."
  (maphash
   (lambda (misspelling corrections)
     (when (and (= (length corrections) 1)
                (>= (cdar corrections)
                    spelling-correction-history-abbrev-threshold))
       (unless (obarray-get spelling-correction-abbrev-table misspelling)
         (define-abbrev spelling-correction-abbrev-table
           misspelling (caar corrections)))))
   spelling-correction-table))

(defun spelling-correction-load-history ()
  "Read and process the history file into abbrevs."
  (spelling-correction-read-history)
  (spelling-correction-setup-abbrevs)
  (spelling-correction-remove-invalid-abbrevs)
  (spelling-correction-create-history-abbrevs))
#+end_src
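
The history file is plain text with one =<misspelled> <corrected>= pair per line, so reading it back amounts to splitting each line at its first space. Splitting this way, rather than moving over words, copes with two awkward cases. One is corrections containing an apostrophe, which ~forward-word~ treats as a word boundary in a plain buffer. The other is corrections containing a space. Below is a self-contained sketch of that parsing. It uses a hypothetical helper that returns the pairs instead of updating the table, and the block is marked =:tangle no=.

#+begin_src emacs-lisp :tangle no
(defun spelling-example--parse-history (text)
  "Return the (MISSPELLING . CORRECTED) pairs in history TEXT, in order.
Each line has the form \"MISSPELLING CORRECTED\".  The misspelling is
one word, but the correction may contain spaces, so only the first
space on a line separates the two."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (pairs)
      (while (re-search-forward "^\\([^ \n]+\\) \\([^\n]+\\)$" nil t)
        (push (cons (match-string-no-properties 1)
                    (match-string-no-properties 2))
              pairs))
      (nreverse pairs))))

(spelling-example--parse-history "teh the\ndont don't\nalot a lot\n")
;; => (("teh" . "the") ("dont" . "don't") ("alot" . "a lot"))
#+end_src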

We don't want to load the history eagerly, but we do want it available soon
after startup. I think an idle timer would be a good way to do this.

#+begin_src emacs-lisp
(run-with-idle-timer 0.5 nil #'spelling-correction-load-history)
#+end_src

-----

There we go, that's a complete, self-managing system that turns frequent
misspellings into abbrevs. All that's left is to feed it corrections from Jinx.
We can do this by taking note of a helpful
[[https://github.com/minad/jinx/wiki#save-misspelling-and-correction-as-abbreviation][code snippet]]
in the Jinx wiki for immediately saving all corrected misspellings into the
global abbrev list.

We'll also add ~:before~ advice to ~jinx--correct-replace~ and, after getting
the misspelled word from the overlay, just forward the correction to
~record-spelling-correction~.

#+begin_src emacs-lisp
(defun record-jinx-spelling-correction (overlay corrected)
  "Record that the text under OVERLAY was corrected to CORRECTED."
  (let ((text
         (buffer-substring-no-properties
          (overlay-start overlay)
          (overlay-end overlay))))
    (record-spelling-correction text corrected)))

(advice-add 'jinx--correct-replace :before #'record-jinx-spelling-correction)
#+end_src
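
This relies on ~:before~ advice being called with exactly the same arguments as the advised function, and before that function runs. That way the overlay still spans the misspelled text when we read it. Below is a self-contained sketch of that mechanism. =spelling-example--replace= is a hypothetical stand-in for ~jinx--correct-replace~, not its real implementation, and the block is marked =:tangle no=.

#+begin_src emacs-lisp :tangle no
(defvar spelling-example--log nil
  "Corrections captured by the example advice, most recent first.")

(defun spelling-example--replace (overlay word)
  "Hypothetical stand-in for `jinx--correct-replace': put WORD over OVERLAY."
  (save-excursion
    (goto-char (overlay-start overlay))
    (delete-region (overlay-start overlay) (overlay-end overlay))
    (insert word)))

(defun spelling-example--record (overlay word)
  "Log the text under OVERLAY together with its replacement WORD."
  (push (cons (buffer-substring-no-properties (overlay-start overlay)
                                              (overlay-end overlay))
              word)
        spelling-example--log))

;; `:before' advice receives the same arguments and runs first.
(advice-add 'spelling-example--replace :before #'spelling-example--record)

(with-temp-buffer
  (insert "I saw teh cat")
  (spelling-example--replace (make-overlay 7 10) "the") ; overlay covers "teh"
  (list (buffer-string) spelling-example--log))
;; => ("I saw the cat" (("teh" . "the")))
#+end_src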

**** Downloading dictionaries

Let's get a nice big dictionary from [[http://app.aspell.net/create][SCOWL Custom List/Dictionary Creator]] with