How to get the caret position in ContentEditable elements with respect to the innerText?

I am aware that there are a lot of existing questions about getting the caret position in ContentEditable elements. Almost all of those existing solutions provide the caret position with respect to the textContent.
Some examples are this one or this one.

We are currently developing two WebExtensions to do autocorrection as the user types. For example, if the user types :), it could autocorrect it to :grinning:. For the autocorrection to work, it needs to get the caret position with respect to the innerText. The textContent can include whitespace characters and other differences that do not actually appear when the text is rendered, which breaks the autocorrect feature.
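
To illustrate why the index has to match the rendered text, here is a minimal sketch of the replacement step, working on plain strings only. The function name `autocorrectAtCaret` and the shape of the `replacements` table are hypothetical and not taken from our extensions; only the ":)" to ":grinning:" pair comes from the example above.

```javascript
// Hypothetical helper, not taken from the extensions' source.
// Replaces a trigger sequence that ends exactly at the caret and
// returns the new text together with the new caret index.
function autocorrectAtCaret(text, caret, replacements) {
  for (const [trigger, replacement] of Object.entries(replacements)) {
    const start = caret - trigger.length;
    // The trigger must end exactly where the caret is.
    if (start >= 0 && text.slice(start, caret) === trigger) {
      return {
        text: text.slice(0, start) + replacement + text.slice(caret),
        caret: start + replacement.length,
      };
    }
  }
  // Nothing to correct: leave text and caret unchanged.
  return { text, caret };
}

// Example table (assumed shape); the pair itself is from the post.
const replacements = { ":)": ":grinning:" };

// Caret index 8 is correct with respect to the rendered text "Hello :)".
console.log(autocorrectAtCaret("Hello :)", 8, replacements));

// An index that is off by two (for example, because it counted
// whitespace that is not rendered) no longer lines up with the trigger.
console.log(autocorrectAtCaret("Hello :)", 10, replacements));
```

With the correct index the trigger is found and replaced. With an index that is shifted by characters that are not rendered, the lookup misses the trigger entirely, which is the failure we see when using a textContent-based position.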

Our current method is partially adapted from this answer, which provides the caret position with respect to the innerHTML. It clones the selection range, inserts a null character at the caret, determines the index of that character in the innerText, and then removes the null character:

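// document.designMode is handled the same, see
if (target.isContentEditable || document.designMode === "on") {
     const _range = document.getSelection().getRangeAt(0);
     // Only handle a collapsed selection, i.e. a plain caret
     if (!_range.collapsed) {
         return null;
     }
     const range = _range.cloneRange();
     // Insert a null character as a temporary marker at the caret
     const temp = document.createTextNode("\0");
     range.insertNode(temp);
     // The marker's index in the innerText is the caret position
     const caretposition = target.innerText.indexOf("\0");
     // Remove the marker again
     temp.parentNode.removeChild(temp);
     return caretposition;
}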

See our original source code for the full context. This method seems to work fine on 99% of websites, but breaks on Twitter. The cursor is constantly reset to the beginning of the line as the user is typing, which scrambles the text (see the corresponding issue for more information). We are guessing that Twitter does not like the null character, but we tried with other nonprinting characters and had the same issue.

We are looking for another method to determine the caret position with respect to the innerText that will work on all websites, including on Twitter. It needs to support recent versions of both Firefox and Chrome, including Firefox ESR. It also needs to be performant, since it runs on every keypress.

Cross-posted on Stack Overflow. Please also have a look at the answers there, or feel free to answer there.

Bug report suggesting adding new APIs for that use case:
