Compare commits

2 Commits

Author SHA1 Message Date
TEC 167886a6a9
Create a system for auto-applying spelling fixes
Why didn't I do something like this years ago?
2024-03-25 23:00:27 +08:00
TEC df76ad127d
Refine hippie-expand config
2024-03-25 23:00:27 +08:00
1 changed file with 284 additions and 5 deletions

@ -1485,26 +1485,43 @@ By default, it completes (in order):
+ Dabbrev (kill ring)
+ Known elisp symbols
I find that "previous lines" completions often appear when I actually want a
I find that ~try-expand-line~ completions often appear when I actually want a
dabbrev completion, so let's deprioritise it somewhat. If I actually want to try
for a line expansion, it's fairly easy to deliberately trigger it --- just
invoke ~hippie-expand~ after typing a space and there will be no dabbrev
candidates.
Speaking of dabbrev, I do think of hippie-expand mostly as "a strangely named
dabbrev+", so let's prioritise the dabbrev-related expanders a bit. I'll also
toss in a nice non-default expansion generator as the first dabbrev candidate
function: ~try-expand-dabbrev-visible~.
There's another cool source of multi-word expansion (actually multi-line) that
isn't used by default, ~try-expand-dabbrev-from-kill~. I personally think this one
is quite neat, but don't want it to interfere with more common single-word
completions, and so will place it just above ~try-expand-line~.
#+begin_src emacs-lisp
(setq hippie-expand-try-functions-list
'(try-complete-file-name-partially
try-complete-file-name
try-expand-all-abbrevs
try-expand-list
'(try-expand-list
try-expand-dabbrev-visible
try-expand-dabbrev
try-expand-all-abbrevs
try-expand-dabbrev-all-buffers
try-complete-file-name-partially
try-complete-file-name
try-expand-dabbrev-from-kill
try-expand-whole-kill
try-expand-line
try-complete-lisp-symbol-partially
try-complete-lisp-symbol))
#+end_src
Unfortunately there's one aspect of ~try-expand-dabbrev-from-kill~ that I find
lets me down a bit, which is that it fails to complete when the killed text
starts with a newline and the current line does not. I'll see if I can do
something about this in the future.
*** Buffer defaults
I'd much rather have my new buffers in ~org-mode~ than ~fundamental-mode~, hence
@ -4084,6 +4101,268 @@ tweaks.
(advice-add 'jinx-next :after (lambda (_) (left-word))))
#+end_src
**** Autocorrect
#+call: confpkg(after="jinx")
If you want to write without looking like you skipped a chunk of
primary/secondary school (as I do), then autocorrect is a handy thing to have.
Beyond just misspellings, it can also help with typos, and lazy capitalisation
(can you really be bothered to type "Lua\LaTeX" instead of "lualatex" every
single time?). However, primarily thanks to smartphones, I more often hear
people cursing autocorrect than praising it. With that in mind, I think it's
worth giving some thought to how smartphone autocorrect gets its bad reputation
(despite largely doing a decent job):
1. Typing is harder on smartphones, and so autocorrect makes bigger (more speculative) guesses
2. People type (and mistype) differently, but autocorrect tries to have a "one
size fits all" profile that is refined over time
3. As soon as you accept a particular correction, autocorrect can start applying
that even when the original typo is ambiguous and has multiple "corrected" forms
4. It's hard to tell the phone to stop doing a particular autocorrect (see
"Emacs" recapitalised as "eMacs" on Apple devices)
I think we can largely alleviate these problems by
1. Being mainly used on devices with actual keyboards
2. Starting with an empty autocorrect "profile", built up by the user over time
3. Having a customisable threshold before a repeated correction is made into an
autocorrection, and blacklisting misspellings with multiple distinct corrections.
4. Making it easy to blacklist certain words from becoming autocorrections
Another complaint about autocorrect is that it lets you develop bad habits, and
if anything a tool that got you to retype the correct spelling several times
would be more valuable in the long run. I think this is a pretty reasonable
complaint, and have two different trains of thought that both justify tracking
corrections made:
+ I almost never leave Emacs for writing more than a text message, so what if I
type worse outside of it?
+ By tracking corrections made, you can also make a personal "most common
misspellings" training list to run through at your leisure. Just set the
"minimum replacement count" to a stupidly high number.
For starters, let's write a record of all corrections made.
#+begin_src emacs-lisp
(defvar spelling-correction-history-file
(file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
"emacs" "spelling-corrections.txt")
"File where a spell check record will be saved.")
#+end_src
For simplicity of operation, I think we can just append each correction to the file
as =<misspelled> <corrected>= lines. This has a number of advantages, such as
avoiding recalculations while typing, avoiding race conditions with multiple
Emacs sessions, and making merging data on different machines trivial.
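To make the line format concrete, here's a minimal sketch of splitting one
=<misspelled> <corrected>= line back into its two fields. The function name
~my/demo-parse-correction-line~ is hypothetical (it is not part of this config),
and it assumes that neither field contains a space.

#+begin_src emacs-lisp
;; Hypothetical helper, for illustration only.
(defun my/demo-parse-correction-line (line)
  "Split LINE of the form \"<misspelled> <corrected>\" into a cons cell.
Return nil when LINE does not contain exactly two fields."
  (let ((fields (split-string line " " t)))
    (when (= (length fields) 2)
      (cons (car fields) (cadr fields)))))

(my/demo-parse-correction-line "teh the")   ; => ("teh" . "the")
(my/demo-parse-correction-line "malformed") ; => nil
#+end_src

Since every line stands alone, merging histories from two machines should
amount to concatenating the two files.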
In the Emacs session though, I think we'll want to have a hash table of the
counts of each correction. We can have the misspelled words as the keys, and
then have each value be an alist of src_elisp{(correction . count)} pairs. This
table can be lazily built and processed after startup.
#+begin_src emacs-lisp
(defvar spelling-correction-table (make-hash-table :test #'equal))
#+end_src
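As a rough illustration of that shape, here's a sketch that fills a scratch
table (not the real ~spelling-correction-table~) and pulls out the most
frequently corrected misspellings, which is roughly what a "training list"
would need. The name ~my/demo-top-misspellings~ and the sample data are made up
for this example.

#+begin_src emacs-lisp
(require 'seq)

;; Hypothetical helper, for illustration only.
(defun my/demo-top-misspellings (table n)
  "Return the N most frequently corrected misspellings in TABLE.
TABLE maps misspelling strings to alists of (CORRECTION . COUNT).
Each element of the result is a cons of (MISSPELLING . TOTAL-COUNT)."
  (let (totals)
    (maphash
     (lambda (misspelling corrections)
       ;; Sum the counts over every recorded correction.
       (push (cons misspelling (apply #'+ (mapcar #'cdr corrections)))
             totals))
     table)
    (seq-take (sort totals (lambda (a b) (> (cdr a) (cdr b)))) n)))

(let ((demo (make-hash-table :test #'equal)))
  (puthash "teh" '(("the" . 5)) demo)
  (puthash "form" '(("from" . 2) ("for" . 1)) demo)
  (puthash "recieve" '(("receive" . 4)) demo)
  (my/demo-top-misspellings demo 2))
;; => (("teh" . 5) ("recieve" . 4))
#+end_src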
We probably want to also specify a threshold number of misspellings that trigger
entry to the abbrev table, both on load and when made during the current Emacs
session. For now, I'll try a value of three for on-load and two for misspellings
made in the current Emacs session. I think I want to avoid a value of one since
that makes it easy for a misspelling with multiple valid corrections to become
associated with a single correction too soon. This is a rare concern, but it
would be annoying enough to run into that I think it's worth requiring a second
misspelling.
#+begin_src emacs-lisp
(defvar spelling-correction-history-abbrev-threshold 3
"The number of recorded identical misspellings to create an abbrev.
This applies to misspellings read from the history file.")
(defvar spelling-correction-live-abbrev-threshold 2
"The number of identical misspellings to create an abbrev.
This applies to misspellings made in the current Emacs session.")
#+end_src
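The rule these thresholds feed into can be sketched as a small predicate: a
misspelling only earns an abbrev when it has exactly one distinct correction,
and that correction has been seen often enough. The name
~my/demo-abbrev-eligible-p~ is hypothetical, this is just the condition written
out on its own.

#+begin_src emacs-lisp
;; Hypothetical helper, for illustration only.
(defun my/demo-abbrev-eligible-p (corrections threshold)
  "Return non-nil if CORRECTIONS justifies creating an abbrev.
CORRECTIONS is an alist of (CORRECTION . COUNT) for one misspelling.
It qualifies when there is exactly one distinct correction, and that
correction has been recorded at least THRESHOLD times."
  (and (= (length corrections) 1)
       (>= (cdar corrections) threshold)))

(my/demo-abbrev-eligible-p '(("the" . 3)) 3)              ; => t
(my/demo-abbrev-eligible-p '(("the" . 2)) 3)              ; => nil
(my/demo-abbrev-eligible-p '(("from" . 5) ("for" . 1)) 3) ; => nil
#+end_src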
At this point we need to actually implement this functionality, starting with
updating the table when a correction is either read from the history file or
occurs live.
#+begin_src emacs-lisp
(defun spelling-correction-update-table (misspelling corrected)
"Update the MISSPELLING to CORRECTED entry in the table.
Returns the number of times this correction has occurred."
(if-let ((correction-counts
(gethash misspelling spelling-correction-table)))
(if-let ((record-cons (assoc corrected correction-counts)))
(setcdr record-cons (1+ (cdr record-cons)))
(puthash misspelling
(push (cons corrected 1) correction-counts)
spelling-correction-table)
1)
(puthash misspelling
(list (cons corrected 1))
spelling-correction-table)
1))
#+end_src
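To see how the counts evolve, here's a self-contained sketch of the same
counting idea that works on a table passed as an argument, so it can be tried
without touching the real one. ~my/demo-count-correction~ is a hypothetical
stand-in, not the function defined in this config.

#+begin_src emacs-lisp
;; Hypothetical helper, for illustration only.
(defun my/demo-count-correction (table misspelling corrected)
  "Increment the MISSPELLING to CORRECTED count in TABLE and return it."
  (let* ((counts (gethash misspelling table))
         (record (assoc corrected counts)))
    (if record
        ;; Seen before: bump the count in place (`setcdr' returns it).
        (setcdr record (1+ (cdr record)))
      ;; New correction: add a fresh (CORRECTED . 1) pair.
      (puthash misspelling (cons (cons corrected 1) counts) table)
      1)))

(let ((demo (make-hash-table :test #'equal)))
  (list (my/demo-count-correction demo "teh" "the")
        (my/demo-count-correction demo "teh" "the")
        (my/demo-count-correction demo "form" "from")
        (my/demo-count-correction demo "form" "for")
        (gethash "form" demo)))
;; => (1 2 1 1 (("for" . 1) ("from" . 1)))
#+end_src

Note how "form" ends up with two distinct corrections, which is exactly the
ambiguous case that should never become an abbrev.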
We could call ~define-abbrev~ directly, but since we'll be doing so in multiple
places, I think it's nice to manage the abbrev table in a single place, so that
any changes to it (or similar) only need to be made once.
We could use the global abbrev table, but I'd rather have one dedicated to
spelling corrections. Let's manage this entirely separately to the global abbrev
file too.
#+begin_src emacs-lisp
(defvar spelling-correction-abbrev-file
(file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
"emacs" "spelling-abbrevs.el")
"File to save the spelling correction abbrevs in.")
(defvar spelling-correction-abbrev-table nil
"The spelling abbrev table.")
(defvar spelling-correction-abbrev-table--saved-version 0
"The version of `spelling-correction-abbrev-table' saved to disk.")
(defun spelling-correction-setup-abbrevs ()
"Set up `spelling-correction-abbrev-table'.
Also set it as a parent of `global-abbrev-table'."
(unless spelling-correction-abbrev-table
(setq spelling-correction-abbrev-table (make-abbrev-table))
(abbrev-table-put
global-abbrev-table :parents
(cons spelling-correction-abbrev-table
(abbrev-table-get global-abbrev-table :parents)))
(add-hook 'kill-emacs-hook #'spelling-correction-save-abbrevs))
(when (file-exists-p spelling-correction-abbrev-file)
(read-abbrev-file spelling-correction-abbrev-file t)
(setq spelling-correction-abbrev-table--saved-version
(abbrev-table-get spelling-correction-abbrev-table
:abbrev-table-modiff))))
(defun spelling-correction-save-abbrevs ()
"Write `spelling-correction-abbrev-table'."
(when (> (abbrev-table-get spelling-correction-abbrev-table
:abbrev-table-modiff)
spelling-correction-abbrev-table--saved-version)
(unless (file-exists-p spelling-correction-abbrev-file)
(make-directory (file-name-directory spelling-correction-abbrev-file) t))
(let ((coding-system-for-write 'utf-8))
(with-temp-buffer
(insert-abbrev-table-description 'spelling-correction-abbrev-table nil)
(when (unencodable-char-position (point-min) (point-max) 'utf-8)
(setq coding-system-for-write 'utf-8-emacs))
(goto-char (point-min))
(insert (format ";;-*-coding: %s;-*-\n\n" coding-system-for-write))
(write-region nil nil spelling-correction-abbrev-file nil 0))) ; Overwrite quietly (a numeric APPEND would seek without truncating)
(setq spelling-correction-abbrev-table--saved-version
(abbrev-table-get spelling-correction-abbrev-table
:abbrev-table-modiff))))
#+end_src
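The reason the =:parents= trick works is that abbrev lookups fall through from
a table to its parents. A minimal sketch with two throwaway tables (neither is
the real spelling or global table):

#+begin_src emacs-lisp
(let ((parent (make-abbrev-table))
      (child (make-abbrev-table)))
  ;; Define the correction only in the parent table.
  (define-abbrev parent "teh" "the")
  ;; Make PARENT one of CHILD's parents.
  (abbrev-table-put child :parents (list parent))
  ;; Lookups in CHILD now also consult PARENT.
  (abbrev-expansion "teh" child))
;; => "the"
#+end_src

Here ~child~ plays the role of ~global-abbrev-table~ and ~parent~ the role of
the dedicated spelling table.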
Now we can write the update function that's run on a live spelling correction.
#+begin_src emacs-lisp
(defun record-spelling-correction (misspelling corrected)
"Record the correction of MISSPELLING to CORRECTED."
(let ((write-region-inhibit-fsync t) ; Quicker writes
(coding-system-for-write 'utf-8)
(inhibit-message t))
(write-region
(concat misspelling " " corrected "\n") nil
spelling-correction-history-file t))
(when (and (>= (spelling-correction-update-table misspelling corrected)
spelling-correction-live-abbrev-threshold)
(= (length (gethash misspelling spelling-correction-table))
1))
(define-abbrev spelling-correction-abbrev-table misspelling corrected)
(message "Created new abbreviation: %s ⟶ %s"
(propertize misspelling 'face 'warning)
(propertize corrected 'face 'success))))
#+end_src
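The append-only write is the piece doing the heavy lifting here. As a small
sketch of how ~write-region~ behaves with a string and a non-nil =APPEND=
argument, using a throwaway temp file rather than the real history file:

#+begin_src emacs-lisp
(let ((file (make-temp-file "spelling-demo-"))
      (inhibit-message t))
  (unwind-protect
      (progn
        ;; A string as the first argument is written as-is, and a
        ;; fourth argument of t appends to the end of the file.
        (write-region "teh the\n" nil file t)
        (write-region "recieve receive\n" nil file t)
        (with-temp-buffer
          (insert-file-contents file)
          (buffer-string)))
    ;; Clean up the temp file whatever happens.
    (delete-file file)))
;; => "teh the\nrecieve receive\n"
#+end_src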
The only thing left to be done now is load the history file. I think I'd like to
split the actual reading and the abbrev generation into two parts though.
#+begin_src emacs-lisp
(defun spelling-correction-read-history ()
"Read the history file into the correction table."
(if (file-exists-p spelling-correction-history-file)
(with-temp-buffer
(insert-file-contents spelling-correction-history-file)
(goto-char (point-min))
(while (< (point) (point-max))
(let ((pt (point))
misspelling corrected)
(setq misspelling
(and (forward-word)
(buffer-substring pt (point)))
pt (1+ (point)))
(setq corrected
(and (forward-word)
(buffer-substring pt (point)))
pt (point))
(when (and misspelling corrected)
(spelling-correction-update-table misspelling corrected))
(forward-line 1))))
(make-directory (file-name-directory spelling-correction-history-file) t) ; t: create parents, no error if it exists
(write-region "" nil spelling-correction-history-file)))
(defun spelling-correction-remove-invalid-abbrevs ()
"Ensure that all entries of the abbrev table are valid."
(obarray-map
(lambda (sym) ; `obarray-map' passes symbols, not strings
(let ((misspelling (symbol-name sym)))
(unless (string= misspelling "") ; The ""-named symbol holds the table's properties
(let ((corrections (gethash misspelling spelling-correction-table)))
(unless (and (= (length corrections) 1)
(>= (cdar corrections)
spelling-correction-history-abbrev-threshold))
(define-abbrev spelling-correction-abbrev-table misspelling nil))))))
spelling-correction-abbrev-table))
(defun spelling-correction-create-history-abbrevs ()
"Apply the history threshold to the current correction table."
(maphash
(lambda (misspelling corrections)
(when (and (= (length corrections) 1)
(>= (cdar corrections)
spelling-correction-history-abbrev-threshold))
(unless (obarray-get spelling-correction-abbrev-table misspelling)
(define-abbrev spelling-correction-abbrev-table
misspelling (caar corrections)))))
spelling-correction-table))
(defun spelling-correction-load-history ()
"Read and process the history file into abbrevs."
(spelling-correction-read-history)
(spelling-correction-setup-abbrevs)
(spelling-correction-remove-invalid-abbrevs)
(spelling-correction-create-history-abbrevs))
#+end_src
We don't want to load the history eagerly, but we do want it available soon
after startup. I think an idle timer would be a good way to do this.
#+begin_src emacs-lisp
(run-with-idle-timer 0.5 nil #'spelling-correction-load-history)
#+end_src
-----
There we go, that's a complete self-managing abbrev-run frequent-misspelling
correction system. We can hook this up to Jinx by taking note of a helpful [[https://github.com/minad/jinx/wiki#save-misspelling-and-correction-as-abbreviation][code
snippet]] in the Jinx wiki for immediately saving all corrected misspellings into
the global abbrev list.
#+begin_src emacs-lisp
(defun record-jinx-spelling-correction (overlay corrected)
"Record the correction of the misspelling under OVERLAY to CORRECTED."
(let ((text
(buffer-substring-no-properties
(overlay-start overlay)
(overlay-end overlay))))
(record-spelling-correction text corrected)))
(advice-add 'jinx--correct-replace :before #'record-jinx-spelling-correction)
#+end_src
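For anyone unfamiliar with =:before= advice, the mechanism can be sketched with
a stand-in function: the advice is called with the same arguments as the
advised function, just before it runs. Everything named ~my/demo-...~ below is
hypothetical and has nothing to do with Jinx's internals.

#+begin_src emacs-lisp
(defvar my/demo-seen nil
  "Arguments seen by the demo advice, newest first.")

(defun my/demo-replace (old new)
  "Stand-in for a replacement function; return a description string."
  (format "%s -> %s" old new))

(defun my/demo-record (old new)
  "Record OLD and NEW before `my/demo-replace' runs."
  (push (cons old new) my/demo-seen))

(advice-add 'my/demo-replace :before #'my/demo-record)
(my/demo-replace "teh" "the") ; => "teh -> the"
my/demo-seen                  ; => (("teh" . "the"))
(advice-remove 'my/demo-replace #'my/demo-record)
#+end_src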
**** Downloading dictionaries
Let's get a nice big dictionary from [[http://app.aspell.net/create][SCOWL Custom List/Dictionary Creator]] with