diff --git a/config.org b/config.org
index b4b3e3b..d450d17 100644
--- a/config.org
+++ b/config.org
@@ -4101,6 +4101,268 @@ tweaks.
 (advice-add 'jinx-next :after (lambda (_) (left-word))))
 #+end_src
 
+**** Autocorrect
+
+#+call: confpkg("autocorrect", prefix="", after="jinx")
+
+If you want to write without looking like you skipped a chunk of
+primary/secondary school (as I do), then autocorrect is a handy thing to have.
+Beyond just misspellings, it can also help with typos, and lazy capitalisation
+(can you really be bothered to consistently type "LuaLaTeX" instead of
+"lualatex" and "SciFi" over "scifi"?). However, primarily thanks to smartphones,
+I more often hear people cursing autocorrect than praising it. With that in
+mind, I think it's worth giving some thought to how smartphone autocorrect gets
+its bad reputation (despite largely doing a decent job):
+1. Typing is harder on smartphones, and so autocorrect makes bigger (more speculative) guesses
+2. People type (and mistype) differently, but autocorrect tries to have a "one
+   size fits all" profile that is refined over time
+3. As soon as you accept a particular correction, autocorrect can start applying
+   that even when the original typo is ambiguous and has multiple "corrected" forms
+4. It's hard to tell the phone to stop doing a particular autocorrect (see
+   "Emacs" recapitalised as "eMacs" on Apple devices)
+
+I think we can largely alleviate these problems by:
+1. Being mainly used on devices with actual keyboards
+2. Starting with an empty autocorrect "profile", built up by the user over time
+3. Having a customisable threshold before a repeated correction is made into an
+   autocorrection, and blacklisting misspellings with multiple distinct corrections
+4. Making it easy to blacklist certain words from becoming autocorrections
+
+Another complaint about autocorrect is that it lets you develop bad habits, and
+if anything a tool that got you to retype the correct spelling several times
+would be more valuable in the long run. I think this is a pretty reasonable
+complaint, and have two different trains of thought that both justify tracking
+corrections made:
++ I almost never leave Emacs for writing more than a text message, so what if I
+  type worse outside of it?
++ By tracking corrections made, you can also make a personal "most common
+  misspellings" training list to run through at your leisure. Just set the
+  "minimum replacement count" to a stupidly high number.
+
+For starters, let's write a record of all corrections made.
+
+#+begin_src emacs-lisp
+(defvar autocorrect-history-file
+  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
+                    "emacs" "spelling-corrections.txt")
+  "File where a spell check record will be saved.")
+#+end_src
+
+For simplicity of operation, I think we can just append each correction to the file
+as =misspelling corrected= lines. This has a number of advantages, such as
+avoiding recalculations while typing, avoiding race conditions with multiple
+Emacs sessions, and making merging data on different machines trivial.
+
+In the Emacs session though, I think we'll want to have a hash table of the
+counts of each correction. We can have the misspelled words as the keys, and
+then have each value be an alist of src_elisp{(correction . count)} pairs. This
+table can be lazily built and processed after startup.
+
+#+begin_src emacs-lisp
+(defvar autocorrect-record-table (make-hash-table :test #'equal))
+#+end_src
+
+We probably want to also specify a threshold number of misspellings that trigger
+entry to the abbrev table, both on load and when made during the current Emacs
+session. For now, I'll try a value of three for on-load and two for misspellings
+made in the current Emacs session. I think I want to avoid a value of one since
+that makes it easy for a misspelling with multiple valid corrections to become
+associated with a single correction too soon. This is a rare concern, but it
+would be annoying enough to run into that I think it's worth requiring a second
+misspelling.
+
+#+begin_src emacs-lisp
+(defvar autocorrect-count-threshold-history 3
+  "The number of recorded identical misspellings to create an abbrev.
+This applies to misspellings read from the history file.")
+(defvar autocorrect-count-threshold-session 2
+  "The number of identical misspellings to create an abbrev.
+This applies to misspellings made in the current Emacs session.")
+#+end_src
+
+At this point we need to actually implement this functionality, starting with
+updating the table when a correction is either read from the history file or
+occurs live.
+
+#+begin_src emacs-lisp
+(defun autocorrect-update-table (misspelling corrected)
+  "Update the MISSPELLING to CORRECTED entry in the table.
+Returns the number of times this correction has occurred."
+  (if-let ((correction-counts
+            (gethash misspelling autocorrect-record-table)))
+      (if-let ((record-cons (assoc corrected correction-counts)))
+          (setcdr record-cons (1+ (cdr record-cons)))
+        (puthash misspelling
+                 (push (cons corrected 1) correction-counts)
+                 autocorrect-record-table)
+        1)
+    (puthash misspelling
+             (list (cons corrected 1))
+             autocorrect-record-table)
+    1))
+#+end_src
+
+We could call ~define-abbrev~ directly, but since we'll be doing so in multiple
+places, I think it's nice to have a single place where the abbrev table is defined, so any
+changes to the abbrev table (or similar) only need to be made in one place.
+
+We could use the global abbrev table, but I'd rather have one dedicated to
+spelling corrections. Let's manage this entirely separately to the global abbrev
+file too.
+
+#+begin_src emacs-lisp
+(defvar autocorrect-abbrev-file
+  (file-name-concat (or (getenv "XDG_STATE_HOME") "~/.local/state")
+                    "emacs" "spelling-abbrevs.el")
+  "File to save spell check records in.")
+
+(defvar autocorrect-abbrev-table nil
+  "The spelling abbrev table.")
+
+(defvar autocorrect-abbrev-table--saved-version 0
+  "The version of `autocorrect-abbrev-table' saved to disk.")
+
+(defun autocorrect--setup-abbrevs ()
+  "Setup `autocorrect-abbrev-table'.
+Also set it as a parent of `global-abbrev-table'."
+  (unless autocorrect-abbrev-table
+    (setq autocorrect-abbrev-table (make-abbrev-table))
+    (abbrev-table-put
+     global-abbrev-table :parents
+     (cons autocorrect-abbrev-table
+           (abbrev-table-get global-abbrev-table :parents)))
+    (add-hook 'kill-emacs-hook #'autocorrect-save-abbrevs))
+  (when (file-exists-p autocorrect-abbrev-file)
+    (read-abbrev-file autocorrect-abbrev-file t)
+    (setq autocorrect-abbrev-table--saved-version
+          (abbrev-table-get autocorrect-abbrev-table
+                            :abbrev-table-modiff))))
+
+(defun autocorrect-save-abbrevs ()
+  "Write `autocorrect-abbrev-table'."
+  (when (> (abbrev-table-get autocorrect-abbrev-table
+                             :abbrev-table-modiff)
+           autocorrect-abbrev-table--saved-version)
+    (unless (file-exists-p autocorrect-abbrev-file)
+      (make-directory (file-name-directory autocorrect-abbrev-file) t))
+    (let ((coding-system-for-write 'utf-8))
+      (with-temp-buffer
+        (insert-abbrev-table-description 'autocorrect-abbrev-table nil)
+        (when (unencodable-char-position (point-min) (point-max) 'utf-8)
+          (setq coding-system-for-write 'utf-8-emacs))
+        (goto-char (point-min))
+        (insert (format ";;-*-coding: %s;-*-\n\n" coding-system-for-write))
+        (write-region nil nil autocorrect-abbrev-file 0)))
+    (setq autocorrect-abbrev-table--saved-version
+          (abbrev-table-get autocorrect-abbrev-table
+                            :abbrev-table-modiff))))
+#+end_src
+
+Now we can write the update function that's run on a live spelling correction.
+
+#+begin_src emacs-lisp
+(defun autocorrect-record-correction (misspelling corrected)
+  "Record the correction of MISSPELLING to CORRECTED."
+  (let ((write-region-inhibit-fsync t) ; Quicker writes
+        (coding-system-for-write 'utf-8)
+        (inhibit-message t))
+    (write-region
+     (concat misspelling " " corrected "\n") nil
+     autocorrect-history-file t))
+  (when (and (>= (autocorrect-update-table misspelling corrected)
+                 autocorrect-count-threshold-session)
+             (= (length (gethash misspelling autocorrect-record-table))
+                1))
+    (define-abbrev autocorrect-abbrev-table misspelling corrected)
+    (message "Created new autocorrection: %s ⟶ %s"
+             (propertize misspelling 'face 'warning)
+             (propertize corrected 'face 'success))))
+#+end_src
+
+The only thing left to be done now is load the history file. I think I'd like to
+split the actual reading and the abbrev generation into two parts though.
+
+#+begin_src emacs-lisp
+(defun autocorrect--read-history ()
+  "Read the history file into the correction table."
+  (if (file-exists-p autocorrect-history-file)
+      (with-temp-buffer
+        (insert-file-contents autocorrect-history-file)
+        (goto-char (point-min))
+        (while (< (point) (point-max))
+          (let ((pt (point))
+                misspelling corrected)
+            (setq misspelling
+                  (and (forward-word)
+                       (buffer-substring pt (point)))
+                  pt (1+ (point)))
+            (setq corrected
+                  (and (forward-word)
+                       (buffer-substring pt (point)))
+                  pt (point))
+            (when (and misspelling corrected)
+              (autocorrect-update-table misspelling corrected))
+            (forward-line 1))))
+    (make-directory (file-name-directory autocorrect-history-file) t)
+    (write-region "" nil autocorrect-history-file)))
+
+(defun autocorrect--remove-invalid-abbrevs ()
+  "Ensure that all entries of the abbrev table are valid."
+  (obarray-map
+   (lambda (misspelling)
+     (when (stringp misspelling) ; Abbrev's obarrays start with a symbol
+       (let ((corrections (gethash misspelling autocorrect-record-table)))
+         (unless (and (= (length corrections) 1)
+                      (>= (cdar corrections)
+                          autocorrect-count-threshold-history))
+           (define-abbrev autocorrect-abbrev-table misspelling nil)))))
+   autocorrect-abbrev-table))
+
+(defun autocorrect--create-history-abbrevs ()
+  "Apply the history threshold to the current correction table."
+  (maphash
+   (lambda (misspelling corrections)
+     (when (and (= (length corrections) 1)
+                (>= (cdar corrections)
+                    autocorrect-count-threshold-history))
+       (unless (obarray-get autocorrect-abbrev-table misspelling)
+         (define-abbrev autocorrect-abbrev-table
+           misspelling (caar corrections)))))
+   autocorrect-record-table))
+
+(defun autocorrect-setup ()
+  "Read and process the history file into abbrevs."
+  (autocorrect--read-history)
+  (autocorrect--setup-abbrevs)
+  (autocorrect--remove-invalid-abbrevs)
+  (autocorrect--create-history-abbrevs))
+#+end_src
+
+We don't want to load the history eagerly, but we do want it available soon
+after startup. I think an idle timer would be a good way to do this.
+
+#+begin_src emacs-lisp
+(run-with-idle-timer 0.5 nil #'autocorrect-setup)
+#+end_src
+
+-----
+
+There we go, that's a complete self-managing abbrev-run frequent-misspelling
+correction system. We can hook this up to Jinx by taking note of a helpful [[https://github.com/minad/jinx/wiki#save-misspelling-and-correction-as-abbreviation][code
+snippet]] in the Jinx wiki for immediately saving all corrected misspellings into
+the global abbrev list.
+
+#+begin_src emacs-lisp
+(defun autocorrect-record-jinx-correction (overlay corrected)
+  (let ((text
+         (buffer-substring-no-properties
+          (overlay-start overlay)
+          (overlay-end overlay))))
+    (autocorrect-record-correction text corrected)))
+
+(advice-add 'jinx--correct-replace :before #'autocorrect-record-jinx-correction)
+#+end_src
+
 **** Downloading dictionaries
 
 Let's get a nice big dictionary from [[http://app.aspell.net/create][SCOWL Custom List/Dictionary Creator]] with