Compare commits


2 Commits

Author SHA1 Message Date
TEC dc1e13ee63
Create a system for auto-applying spelling fixes
Why didn't I do something like this years ago?
2024-03-25 17:04:03 +08:00
TEC 8f663d2ce8
Adjust my abbrev config 2024-03-25 17:04:03 +08:00
1 changed files with 241 additions and 8 deletions


@@ -3535,16 +3535,28 @@ let's change that, and do a few other similar tweaks while we're at it.
** Tools
*** Abbrev
#+call: confpkg("Multi-mode abbrev")
#+call: confpkg()
Thanks to [[https://emacs.stackexchange.com/questions/45462/use-a-single-abbrev-table-for-multiple-modes/45476#45476][use a single abbrev-table for multiple modes? - Emacs Stack Exchange]] I
have the following.
Abbrev mode is great, and something I make use of in multiple ways. As such, I
want it on by default.
#+begin_src emacs-lisp :tangle no
(add-hook 'doom-first-buffer-hook
          (defun +abbrev-file-name ()
            (setq-default abbrev-mode t)
            (setq abbrev-file-name (expand-file-name "abbrev.el" doom-private-dir))))
#+begin_src emacs-lisp
(setq-default abbrev-mode t)
#+end_src
Abbrev-mode can save and load abbreviations from an "abbrev file", which I'd
like to locate in my Doom config folder.
#+begin_src emacs-lisp
(setq abbrev-file-name (expand-file-name "abbrev.el" doom-private-dir))
#+end_src
I need to think more on how I want to manage abbrev changes in the current
session, but for now I'm going to be overly cautious and avoid any modifications
to the global abbrev file that I don't make myself.
#+begin_src emacs-lisp
(setq save-abbrevs nil)
#+end_src
*** Very large files
@@ -4072,6 +4084,227 @@ tweaks.
(advice-add 'jinx-next :after (lambda (_) (left-word))))
#+end_src
**** Saving corrections as abbreviations
#+call: confpkg(after="jinx")
I think it would be neat to be able to save persistent misspellings as
abbreviations. I'm not the first to have this neat idea, as the jinx wiki has a
code snippet for doing exactly this.
However, I want to do something a bit more persistent. For starters, let's write
corrections to a file.
#+begin_src emacs-lisp
(defvar spelling-correction-history-file
  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
                    "emacs" "spelling-corrections.txt")
  "File where a spell check record will be saved.")
#+end_src
For simplicity of operation, I think we can just append each correction to the file
as =<misspelled> <corrected>= lines. In the Emacs session though, I think we'll
want to have a hash table of the counts of each correction. We can have the
misspelled words as the keys, and then have each value be an alist of
src_elisp{(correction . count)} pairs. This table can be lazily built and
processed after startup.
#+begin_src emacs-lisp
(defvar spelling-correction-table (make-hash-table :test #'equal))
#+end_src
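To make that shape concrete, here's a small untangled sketch. The helper
~demo-correction-count~ is a hypothetical name that only exists for this
illustration, and it takes the table as an argument so the real
~spelling-correction-table~ is never touched.
#+begin_src emacs-lisp :tangle no
(defun demo-correction-count (table misspelling corrected)
  "Return how often MISSPELLING was corrected to CORRECTED in TABLE.
Hypothetical helper, for illustration only."
  ;; The value for MISSPELLING is an alist keyed by strings, hence `equal'.
  (alist-get corrected (gethash misspelling table) 0 nil #'equal))
(let ((table (make-hash-table :test #'equal)))
  ;; Each key is a misspelling, each value an alist of (correction . count).
  (puthash "teh" (list (cons "the" 4) (cons "then" 1)) table)
  (puthash "recieve" (list (cons "receive" 2)) table)
  (list (demo-correction-count table "teh" "the")
        (demo-correction-count table "teh" "than")))
#+end_src
The final form should evaluate to src_elisp{(4 0)}: four recorded corrections of
"teh" to "the", and none to "than".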
We probably want to also specify a threshold number of misspellings that trigger
entry to the abbrev table, both on load and when made during the current Emacs
session. For now, I'll try a value of three for on-load and two for misspellings
made in the current Emacs session. I think I want to avoid a value of one since
that makes it easy for a misspelling with multiple valid corrections to become
associated with a single correction too soon. This is a rare concern, but it
would be annoying enough to run into that I think it's worth requiring a second
misspelling.
#+begin_src emacs-lisp
(defvar spelling-correction-history-abbrev-threshold 3
  "The number of recorded identical misspellings to create an abbrev.
This applies to misspellings read from the history file.")
(defvar spelling-correction-live-abbrev-threshold 2
  "The number of identical misspellings to create an abbrev.
This applies to misspellings made in the current Emacs session.")
#+end_src
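As a rough sketch of the rule these thresholds are meant to express: a
misspelling only earns an abbrev when it has exactly one known correction, and
that correction has been seen often enough. The predicate ~demo-abbrev-worthy-p~
below is a hypothetical name used purely for illustration, and it isn't tangled.
#+begin_src emacs-lisp :tangle no
(defun demo-abbrev-worthy-p (corrections threshold)
  "Non-nil when CORRECTIONS justifies creating an abbrev.
CORRECTIONS is an alist of (correction . count) pairs, and THRESHOLD
is the minimum count.  Hypothetical helper, for illustration only."
  (and (= (length corrections) 1)          ; unambiguous: a single correction
       (>= (cdar corrections) threshold))) ; and seen often enough
(list (demo-abbrev-worthy-p '(("the" . 3)) 3)
      (demo-abbrev-worthy-p '(("the" . 4) ("then" . 1)) 3)
      (demo-abbrev-worthy-p '(("receive" . 2)) 3))
#+end_src
This should give src_elisp{(t nil nil)}: the second case is rejected because the
misspelling has two competing corrections, and the third because it falls short
of the threshold.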
At this point we need to actually implement this functionality, starting with
updating the table when a correction is either read from the history file or
occurs live.
#+begin_src emacs-lisp
(defun spelling-correction-update-table (misspelling corrected)
  "Update the MISSPELLING to CORRECTED entry in the table.
Returns the number of times this correction has occurred."
  (if-let ((correction-counts
            (gethash misspelling spelling-correction-table)))
      (if-let ((record-cons (assoc corrected correction-counts)))
          (setcdr record-cons (1+ (cdr record-cons)))
        (puthash misspelling
                 (push (cons corrected 1) correction-counts)
                 spelling-correction-table)
        1)
    (puthash misspelling
             (list (cons corrected 1))
             spelling-correction-table)
    1))
#+end_src
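To see what the return value looks like in practice, here's an untangled
sketch. It assumes the ~defvar~ and ~defun~ above have already been evaluated,
and ~demo-update-sequence~ is a hypothetical name used only for this example.
#+begin_src emacs-lisp :tangle no
(defun demo-update-sequence ()
  "Feed three corrections into a scratch table and collect the results.
Hypothetical helper; assumes `spelling-correction-update-table' and
`spelling-correction-table' are already defined."
  ;; Rebind the (special) table variable so the real table is left alone.
  (let ((spelling-correction-table (make-hash-table :test #'equal)))
    (list (spelling-correction-update-table "teh" "the")     ; first sighting
          (spelling-correction-update-table "teh" "the")     ; same correction
          (spelling-correction-update-table "teh" "then")))) ; a new correction
(demo-update-sequence)
#+end_src
I'd expect src_elisp{(1 2 1)} here: the count climbs for a repeated correction,
and a different correction of the same misspelling starts again from one.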
We could call ~define-abbrev~ directly, but since we'll be doing so in multiple
places, I think it's nice to have a single place where the abbrev table is set
up, so any changes to the abbrev table (or similar) only need to be made in one
place.
We could use the global abbrev table, but I'd rather have one dedicated to
spelling corrections. Let's manage this entirely separately to the global abbrev
file too.
#+begin_src emacs-lisp
(defvar spelling-correction-abbrev-file
  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
                    "emacs" "spelling-abbrevs.el")
  "File to save spell check records in.")
(defvar spelling-correction-abbrev-table nil
  "The spelling abbrev table.")
(defun spelling-correction-setup-abbrevs ()
  "Setup `spelling-correction-abbrev-table'.
Also set it as a parent of `global-abbrev-table'."
  (unless spelling-correction-abbrev-table
    (setq spelling-correction-abbrev-table (make-abbrev-table))
    (abbrev-table-put
     global-abbrev-table :parents
     (cons spelling-correction-abbrev-table
           (abbrev-table-get global-abbrev-table :parents)))
    (add-hook 'kill-emacs-hook #'spelling-correction-save-abbrevs))
  (when (file-exists-p spelling-correction-abbrev-file)
    (read-abbrev-file spelling-correction-abbrev-file t)))
(defun spelling-correction-save-abbrevs ()
  "Write `spelling-correction-abbrev-table'."
  (unless (file-exists-p spelling-correction-abbrev-file)
    (make-directory (file-name-directory spelling-correction-abbrev-file) t))
  (let ((coding-system-for-write 'utf-8))
    (with-temp-buffer
      (insert-abbrev-table-description 'spelling-correction-abbrev-table nil)
      (when (unencodable-char-position (point-min) (point-max) 'utf-8)
        (setq coding-system-for-write 'utf-8-emacs))
      (goto-char (point-min))
      (insert (format ";;-*-coding: %s;-*-\n\n" coding-system-for-write))
      ;; APPEND is nil so the file is overwritten; VISIT is 0 to stay quiet.
      (write-region nil nil spelling-correction-abbrev-file nil 0))))
#+end_src
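The ~:parents~ property is what lets the global table fall back on the spelling
table. Here's a minimal untangled sketch of that lookup using two throwaway
tables; ~demo-parent-lookup~ is a hypothetical name, and nothing here touches
~global-abbrev-table~.
#+begin_src emacs-lisp :tangle no
(defun demo-parent-lookup ()
  "Look up an abbrev that is only defined in a parent table.
Hypothetical helper, for illustration only."
  (let ((parent (make-abbrev-table))
        (child (make-abbrev-table)))
    ;; The abbrev lives in PARENT only.
    (define-abbrev parent "teh" "the")
    ;; Make PARENT a parent of CHILD, as is done for the global table above.
    (abbrev-table-put child :parents (list parent))
    ;; Looking up in CHILD falls through to PARENT.
    (abbrev-expansion "teh" child)))
(demo-parent-lookup)
#+end_src
This should return ="the"=, even though the child table itself has no abbrevs
defined.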
Now we can write the update function that's run on a live spelling correction.
#+begin_src emacs-lisp
(defun record-spelling-correction (misspelling corrected)
  "Record the correction of MISSPELLING to CORRECTED."
  (let ((write-region-inhibit-fsync t) ; Quicker writes
        (coding-system-for-write 'utf-8)
        (inhibit-message t))
    (write-region
     (concat misspelling " " corrected "\n")
     nil
     spelling-correction-history-file
     t))
  (when (and (>= (spelling-correction-update-table misspelling corrected)
                 spelling-correction-live-abbrev-threshold)
             (= (length (gethash misspelling spelling-correction-table))
                1))
    (define-abbrev spelling-correction-abbrev-table misspelling corrected)
    (message "Created new abbreviation: %s ⟶ %s" misspelling corrected)))
#+end_src
The only thing left to be done now is load the history file. I think I'd like to
split the actual reading and the abbrev generation into two parts though.
#+begin_src emacs-lisp
(defun spelling-correction-read-history ()
  "Read the history file into the correction table."
  (if (file-exists-p spelling-correction-history-file)
      (with-temp-buffer
        (insert-file-contents spelling-correction-history-file)
        (goto-char (point-min))
        (while (< (point) (point-max))
          (let ((pt (point))
                misspelling corrected)
            (setq misspelling
                  (and (forward-word)
                       (buffer-substring pt (point)))
                  pt (1+ (point)))
            (setq corrected
                  (and (forward-word)
                       (buffer-substring pt (point)))
                  pt (point))
            (when (and misspelling corrected)
              (spelling-correction-update-table misspelling corrected))
            (forward-line 1))))
    ;; PARENTS is t so missing parent directories are created, and an
    ;; already existing directory is not an error.
    (make-directory (file-name-directory spelling-correction-history-file) t)
    (write-region "" nil spelling-correction-history-file)))
(defun spelling-correction-remove-invalid-abbrevs ()
  "Ensure that all entries of the abbrev table are valid."
  (obarray-map
   (lambda (sym)
     ;; `obarray-map' passes symbols, so take the name.  The symbol with
     ;; an empty name only holds the table's properties.
     (let ((misspelling (symbol-name sym)))
       (unless (string= misspelling "")
         (let ((corrections (gethash misspelling spelling-correction-table)))
           (unless (and (= (length corrections) 1)
                        (>= (cdar corrections)
                            spelling-correction-history-abbrev-threshold))
             (define-abbrev spelling-correction-abbrev-table misspelling nil))))))
   spelling-correction-abbrev-table))
(defun spelling-correction-create-history-abbrevs ()
  "Apply the history threshold to the current correction table."
  (maphash
   (lambda (misspelling corrections)
     (when (and (= (length corrections) 1)
                (>= (cdar corrections)
                    spelling-correction-history-abbrev-threshold))
       (unless (obarray-get spelling-correction-abbrev-table misspelling)
         (define-abbrev spelling-correction-abbrev-table
           misspelling (caar corrections)))))
   spelling-correction-table))
(defun spelling-correction-load-history ()
  "Read and process the history file into abbrevs."
  (spelling-correction-read-history)
  (spelling-correction-setup-abbrevs)
  (spelling-correction-remove-invalid-abbrevs)
  (spelling-correction-create-history-abbrevs))
#+end_src
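For reference, each line of the history file is just =<misspelled> <corrected>=.
The reader above walks the buffer with ~forward-word~; the untangled sketch below
uses ~split-string~ instead, which is a different technique, purely to show
what one line turns into. ~demo-parse-history-line~ is a hypothetical name.
#+begin_src emacs-lisp :tangle no
(defun demo-parse-history-line (line)
  "Split LINE into a (MISSPELLING . CORRECTED) pair, or nil if malformed.
Hypothetical helper, for illustration only."
  ;; Split on spaces, dropping empty fields from repeated spaces.
  (let ((fields (split-string line " " t)))
    (when (= (length fields) 2)
      (cons (car fields) (cadr fields)))))
(list (demo-parse-history-line "teh the")
      (demo-parse-history-line "orphan"))
#+end_src
This should evaluate to src_elisp{(("teh" . "the") nil)}, with the malformed
line skipped rather than recorded.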
We don't want to load the history eagerly, but we do want it available soon
after startup. I think an idle timer would be a good way to do this.
#+begin_src emacs-lisp
(run-with-idle-timer 0.5 nil #'spelling-correction-load-history)
#+end_src
-----
There we go, that's a complete self-managing abbrev-run frequent-misspelling
correction system. All that's left is to hook it up to Jinx, which we can do by
taking note of a helpful [[https://github.com/minad/jinx/wiki#save-misspelling-and-correction-as-abbreviation][code snippet]] in
the Jinx wiki for immediately saving all corrected misspellings into the global
abbrev list.
We'll also add ~:before~ advice to ~jinx--correct-replace~, and after getting the
word just forward the correction to ~record-spelling-correction~.
#+begin_src emacs-lisp
(defun record-jinx-spelling-correction (overlay corrected)
  "Record that the text under OVERLAY was corrected to CORRECTED."
  (let ((text
         (buffer-substring-no-properties
          (overlay-start overlay)
          (overlay-end overlay))))
    (record-spelling-correction text corrected)))
(advice-add 'jinx--correct-replace :before #'record-jinx-spelling-correction)
#+end_src
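In case the ~:before~ advice mechanism is unfamiliar, here's a small untangled
sketch with stand-in functions. Everything prefixed ~demo-~ is hypothetical and
has nothing to do with Jinx itself; the point is only that the advice is called
first, with the same arguments as the advised function.
#+begin_src emacs-lisp :tangle no
(defvar demo-seen-args nil
  "Arguments observed by the demo advice.  For illustration only.")
(defun demo-replace (old new)
  "Stand-in for a replacement function; just describes OLD and NEW."
  (format "%s->%s" old new))
(defun demo-record (old new)
  "Remember that OLD was replaced by NEW."
  (push (list old new) demo-seen-args))
;; The advice runs before `demo-replace', receiving the same arguments.
(advice-add 'demo-replace :before #'demo-record)
(demo-replace "teh" "the")
;; Clean up so the stand-in is left unadvised.
(advice-remove 'demo-replace #'demo-record)
#+end_src
After evaluating this, ~demo-seen-args~ should hold src_elisp{(("teh" "the"))},
while the call itself still returns the original function's value.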
**** Downloading dictionaries
Let's get a nice big dictionary from [[http://app.aspell.net/create][SCOWL Custom List/Dictionary Creator]] with