Create a system for auto-applying spelling fixes

Why didn't I do something like this years ago?
Refine hippie-expand config
2024-03-25 23:00:27 +08:00 · 2024-03-25 23:00:27 +08:00
1 changed files with 284 additions and 5 deletions
--- a/config.org
+++ b/config.org
@@ -1485,26 +1485,43 @@ By default, it completes (in order):
 + Dabbrev (kill ring)
 + Known elisp symbols

-I find that "previous lines" completions often appear when I actually want a
+I find that ~try-expand-line~ completions often appear when I actually want a
 dabbrev completion, so let's deprioritise it somewhat. If I actually want to try
 for a line expansion, it's fairly easy to deliberately trigger it --- just
 invoke ~hippie-expand~ after typing a space and there will be no dabbrev
 candidates.

+Speaking of dabbrev, I do think of hippie-expand mostly as "a strangely named
+dabbrev+", so let's prioritise the dabbrev-related expanders a bit. I'll also
+toss in a nice non-default expansion generator as the first dabbrev candidate
+function: ~try-expand-dabbrev-visible~.
+
+There's another cool source of multi-word expansion (actually multi-line) that
+isn't used by default, ~try-expand-dabbrev-from-kill~. I personally think this one
+is quite neat, but don't want it to interfere with more common single-word
+completions, and so will place it just above ~try-expand-line~.
+
 #+begin_src emacs-lisp
 (setq hippie-expand-try-functions-list
-      '(try-complete-file-name-partially
-        try-complete-file-name
-        try-expand-all-abbrevs
-        try-expand-list
+      '(try-expand-list
+        try-expand-dabbrev-visible
        try-expand-dabbrev
+        try-expand-all-abbrevs
        try-expand-dabbrev-all-buffers
+        try-complete-file-name-partially
+        try-complete-file-name
        try-expand-dabbrev-from-kill
+        try-expand-whole-kill
        try-expand-line
        try-complete-lisp-symbol-partially
        try-complete-lisp-symbol))
 #+end_src

+Unfortunately there's one aspect of ~try-expand-dabbrev-from-kill~ that I find
+lets me down a bit, which is that it fails to complete when the killed text
+starts with a newline and the current line does not. I'll see if I can do
+something about this in the future.
+
 *** Buffer defaults

 I'd much rather have my new buffers in ~org-mode~ than ~fundamental-mode~, hence
@@ -4084,6 +4101,268 @@ tweaks.
  (advice-add 'jinx-next :after (lambda (_) (left-word))))
 #+end_src

+**** Autocorrect
+
+#+call: confpkg(after="jinx")
+
+If you want to write without looking like you skipped a chunk of
+primary/secondary school (as I do), then autocorrect is a handy thing to have.
+Beyond just misspellings, it can also help with typos, and lazy capitalisation
+(can you really be bothered to type "Lua\LaTeX" instead of "lualatex" every
+single time?). However, primarily thanks to smartphones, I more often hear
+people cursing autocorrect than praising it. With that in mind, I think it's
+worth giving some thought to how smartphone autocorrect gets its bad reputation
+(despite largely doing a decent job):
+1. Typing is harder on smartphones, and so autocorrect makes bigger (more speculative) guesses
+2. People type (and mistype) differently, but autocorrect tries to have a "one
+   size fits all" profile that is refined over time
+3. As soon as you accept a particular correction, autocorrect can start applying
+   that even when the original typo is ambiguous and has multiple "corrected" forms
+4. It's hard to tell the phone to stop doing a particular autocorrect (see
+   "Emacs" recapitalised as "eMacs" on Apple devices)
+
+I think we can largely alleviate these problems by
+1. Being mainly used on devices with actual keyboards
+2. Starting with an empty autocorrect "profile", built up by the user over time
+3. Having a customisable threshold before a repeated correction is made into an
+   autocorrection, and blacklisting misspellings with multiple distinct corrections.
+4. Making it easy to blacklist certain words from becoming autocorrections
+
+Another complaint about autocorrect is that it lets you develop bad habits, and
+if anything a tool that got you to retype the correct spelling several times
+would be more valuable in the long run. I think this is a pretty reasonable
+complaint, and have two different trains of thought that both justify tracking
+corrections made:
++ I almost never leave Emacs for writing more than a text message, so what if I
+  type worse outside of it?
++ By tracking corrections made, you can also make a personal "most common
+  misspellings" training list to run through at your leisure. Just set the
+  "minimum replacement count" to a stupidly high number.
+
+For starters, let's write a record of all corrections made.
+
+#+begin_src emacs-lisp
+(defvar spelling-correction-history-file
+  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
+                    "emacs" "spelling-corrections.txt")
+  "File where a spell check record will be saved.")
+#+end_src
+
+For simplicity of operation, I think we can just append each correction to the file
+as =<misspelled> <corrected>= lines. This has a number of advantages, such as
+avoiding recalculations while typing, avoiding race conditions with multiple
+Emacs sessions, and making merging data on different machines trivial.
+
+In the Emacs session though, I think we'll want to have a hash table of the
+counts of each correction. We can have the misspelled words as the keys, and
+then have each value be an alist of src_elisp{(correction . count)} pairs. This
+table can be lazily built and processed after startup.
+
+#+begin_src emacs-lisp
+(defvar spelling-correction-table (make-hash-table :test #'equal))
+#+end_src
+
+We probably want to also specify a threshold number of misspellings that trigger
+entry to the abbrev table, both on load and when made during the current Emacs
+session. For now, I'll try a value of three for on-load and two for misspellings
+made in the current Emacs session. I think I want to avoid a value of one since
+that makes it easy for a misspelling with multiple valid corrections to become
+associated with a single correction too soon. This is a rare concern, but it
+would be annoying enough to run into that I think it's worth requiring a second
+misspelling.
+
+#+begin_src emacs-lisp
+(defvar spelling-correction-history-abbrev-threshold 3
+  "The number of recorded identical misspellings to create an abbrev.
+This applies to misspellings read from the history file.")
+(defvar spelling-correction-live-abbrev-threshold 2
+  "The number of identical misspellings to create an abbrev.
+This applies to misspellings made in the current Emacs session.")
+#+end_src
+
+At this point we need to actually implement this functionality, starting with
+updating the table when a correction is either read from the history file or
+occurs live.
+
+#+begin_src emacs-lisp
+(defun spelling-correction-update-table (misspelling corrected)
+  "Update the MISSPELLING to CORRECTED entry in the table.
+Returns the number of times this correction has occurred."
+  (if-let ((correction-counts
+            (gethash misspelling spelling-correction-table)))
+      (if-let ((record-cons (assoc corrected correction-counts)))
+          (setcdr record-cons (1+ (cdr record-cons)))
+        (puthash misspelling
+                 (push (cons corrected 1) correction-counts)
+                 spelling-correction-table)
+        1)
+    (puthash misspelling
+             (list (cons corrected 1))
+             spelling-correction-table)
+    1))
+#+end_src
+
+We could call ~define-abbrev~ directly, but since we'll be doing so in multiple
+places, I think it's nice to have a single place where the abbrev table is set
+up, so any changes to it (or similar) only need to be made in one place.
+
+We could use the global abbrev table, but I'd rather have one dedicated to
+spelling corrections. Let's manage this entirely separately to the global abbrev
+file too.
+
+#+begin_src emacs-lisp
+(defvar spelling-correction-abbrev-file
+  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
+                    "emacs" "spelling-abbrevs.el")
+  "File where the spelling abbrev table is saved.")
+
+(defvar spelling-correction-abbrev-table nil
+  "The spelling abbrev table.")
+
+(defvar spelling-correction-abbrev-table--saved-version 0
+  "The version of `spelling-correction-abbrev-table' saved to disk.")
+
+(defun spelling-correction-setup-abbrevs ()
+  "Setup `spelling-correction-abbrev-table'.
+Also set it as a parent of `global-abbrev-table'."
+  (unless spelling-correction-abbrev-table
+    (setq spelling-correction-abbrev-table (make-abbrev-table))
+    (abbrev-table-put
+     global-abbrev-table :parents
+     (cons spelling-correction-abbrev-table
+           (abbrev-table-get global-abbrev-table :parents)))
+    (add-hook 'kill-emacs-hook #'spelling-correction-save-abbrevs))
+  (when (file-exists-p spelling-correction-abbrev-file)
+    (read-abbrev-file spelling-correction-abbrev-file t)
+    (setq spelling-correction-abbrev-table--saved-version
+          (abbrev-table-get spelling-correction-abbrev-table
+                            :abbrev-table-modiff))))
+
+(defun spelling-correction-save-abbrevs ()
+  "Write `spelling-correction-abbrev-table'."
+  (when (> (abbrev-table-get spelling-correction-abbrev-table
+                             :abbrev-table-modiff)
+           spelling-correction-abbrev-table--saved-version)
+    (unless (file-exists-p spelling-correction-abbrev-file)
+      (make-directory (file-name-directory spelling-correction-abbrev-file) t))
+    (let ((coding-system-for-write 'utf-8))
+      (with-temp-buffer
+        (insert-abbrev-table-description 'spelling-correction-abbrev-table nil)
+        (when (unencodable-char-position (point-min) (point-max) 'utf-8)
+          (setq coding-system-for-write 'utf-8-emacs))
+        (goto-char (point-min))
+        (insert (format ";;-*-coding: %s;-*-\n\n" coding-system-for-write))
+        (write-region nil nil spelling-correction-abbrev-file nil 0)))
+    (setq spelling-correction-abbrev-table--saved-version
+          (abbrev-table-get spelling-correction-abbrev-table
+                            :abbrev-table-modiff))))
+#+end_src
+
+Now we can write the update function that's run on a live spelling correction.
+
+#+begin_src emacs-lisp
+(defun record-spelling-correction (misspelling corrected)
+  "Record the correction of MISSPELLING to CORRECTED."
+  (let ((write-region-inhibit-fsync t) ; Quicker writes
+        (coding-system-for-write 'utf-8)
+        (inhibit-message t))
+    (write-region
+     (concat misspelling " " corrected "\n") nil
+     spelling-correction-history-file t))
+  (when (and (>= (spelling-correction-update-table misspelling corrected)
+                 spelling-correction-live-abbrev-threshold)
+             (= (length (gethash misspelling spelling-correction-table))
+                1))
+    (define-abbrev spelling-correction-abbrev-table misspelling corrected)
+    (message "Created new abbreviation: %s ⟶ %s"
+             (propertize misspelling 'face 'warning)
+             (propertize corrected 'face 'success))))
+#+end_src
+
+The only thing left to be done now is load the history file. I think I'd like to
+split the actual reading and the abbrev generation into two parts though.
+
+#+begin_src emacs-lisp
+(defun spelling-correction-read-history ()
+  "Read the history file into the correction table."
+  (if (file-exists-p spelling-correction-history-file)
+      (with-temp-buffer
+        (insert-file-contents spelling-correction-history-file)
+        (goto-char (point-min))
+        (while (< (point) (point-max))
+          (let ((pt (point))
+                misspelling corrected)
+            (setq misspelling
+                  (and (forward-word)
+                       (buffer-substring pt (point)))
+                  pt (1+ (point)))
+            (setq corrected
+                  (and (forward-word)
+                       (buffer-substring pt (point)))
+                  pt (point))
+            (when (and misspelling corrected)
+              (spelling-correction-update-table misspelling corrected))
+            (forward-line 1))))
+    (make-directory (file-name-directory spelling-correction-history-file) t)
+    (write-region "" nil spelling-correction-history-file)))
+
+(defun spelling-correction-remove-invalid-abbrevs ()
+  "Ensure that all entries of the abbrev table are valid."
+  (obarray-map
+   (lambda (abbrev)
+     (unless (string= (symbol-name abbrev) "") ; Skip the table's property symbol
+       (let ((corrections (gethash (symbol-name abbrev) spelling-correction-table)))
+         (unless (and (= (length corrections) 1)
+                      (>= (cdar corrections)
+                          spelling-correction-history-abbrev-threshold))
+           (define-abbrev spelling-correction-abbrev-table (symbol-name abbrev) nil)))))
+   spelling-correction-abbrev-table))
+
+(defun spelling-correction-create-history-abbrevs ()
+  "Apply the history threshold to the current correction table."
+  (maphash
+   (lambda (misspelling corrections)
+     (when (and (= (length corrections) 1)
+                (>= (cdar corrections)
+                    spelling-correction-history-abbrev-threshold))
+       (unless (obarray-get spelling-correction-abbrev-table misspelling)
+         (define-abbrev spelling-correction-abbrev-table
+           misspelling (caar corrections)))))
+   spelling-correction-table))
+
+(defun spelling-correction-load-history ()
+  "Read and process the history file into abbrevs."
+  (spelling-correction-read-history)
+  (spelling-correction-setup-abbrevs)
+  (spelling-correction-remove-invalid-abbrevs)
+  (spelling-correction-create-history-abbrevs))
+#+end_src
+
+We don't want to load the history eagerly, but we do want it available soon
+after startup. I think an idle timer would be a good way to do this.
+
+#+begin_src emacs-lisp
+(run-with-idle-timer 0.5 nil #'spelling-correction-load-history)
+#+end_src
+
+-----
+
+There we go, that's a complete self-managing abbrev-run frequent-misspelling
+correction system. We can hook this up to Jinx by taking note of a helpful [[https://github.com/minad/jinx/wiki#save-misspelling-and-correction-as-abbreviation][code
+snippet]] in the Jinx wiki for immediately saving all corrected misspellings into
+the global abbrev list.
+
+#+begin_src emacs-lisp
+(defun record-jinx-spelling-correction (overlay corrected)
+  (let ((text
+         (buffer-substring-no-properties
+          (overlay-start overlay)
+          (overlay-end overlay))))
+    (record-spelling-correction text corrected)))
+
+(advice-add 'jinx--correct-replace :before #'record-jinx-spelling-correction)
+#+end_src
+
 **** Downloading dictionaries

 Let's get a nice big dictionary from [[http://app.aspell.net/create][SCOWL Custom List/Dictionary Creator]] with
Author	SHA1	Message	Date
TEC	167886a6a9	Create a system for auto-applying spelling fixes Why didn't I do something like this years ago?	2024-03-25 23:00:27 +08:00
TEC	df76ad127d	Refine hippie-expand config	2024-03-25 23:00:27 +08:00
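
The history-to-abbrev pipeline in the diff above (append-only =<misspelled> <corrected>= lines, a count table keyed by misspelling, and a threshold that only fires when a misspelling has a single correction) can be tried in isolation with a rough sketch like the one below. Every ~demo-~ name is hypothetical and not part of the commit. Note one deliberate swap: this sketch splits lines on spaces with ~split-string~ rather than walking the buffer with ~forward-word~ as the commit does, on the assumption that this keeps corrections containing apostrophes or hyphens intact.

```emacs-lisp
;; Hypothetical sketch: every `demo-' name is illustrative, not from the commit.
(defun demo-parse-history (text)
  "Return (MISSPELLING . CORRECTED) pairs from TEXT, skipping malformed lines."
  (delq nil
        (mapcar (lambda (line)
                  ;; Assumption: fields are separated by spaces only.
                  (let ((fields (split-string line " " t)))
                    (when (= (length fields) 2)
                      (cons (car fields) (cadr fields)))))
                (split-string text "\n" t))))

(defun demo-tally (pairs)
  "Count PAIRS into a hash table of misspelling -> ((CORRECTION . COUNT) ...)."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (pair pairs table)
      (let* ((counts (gethash (car pair) table))
             (entry (assoc (cdr pair) counts)))
        (if entry
            ;; Seen this exact correction before: bump its count in place.
            (setcdr entry (1+ (cdr entry)))
          ;; First time: add a fresh (CORRECTION . 1) record.
          (puthash (car pair) (cons (cons (cdr pair) 1) counts) table))))))

(defun demo-abbrev-candidates (table threshold)
  "Return sorted (MISSPELLING . CORRECTION) pairs from TABLE that qualify.
A misspelling qualifies when it has exactly one recorded correction,
seen at least THRESHOLD times."
  (let (result)
    (maphash (lambda (misspelling counts)
               (when (and (= (length counts) 1)
                          (>= (cdar counts) threshold))
                 (push (cons misspelling (caar counts)) result)))
             table)
    (sort result (lambda (a b) (string< (car a) (car b))))))

(demo-abbrev-candidates
 (demo-tally
  (demo-parse-history
   "teh the\nteh the\nteh the\nform from\nform forum\nform from\nform from\nrecieve receive\n"))
 3)
;; => (("teh" . "the"))
```

Here "form" never qualifies despite four corrections, because it has two distinct corrected forms, and "recieve" stays below the threshold of three. That mirrors the blacklisting of ambiguous misspellings described in the commit.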