What happens when you type a single letter into an “ordinary” text box

This morning I saw a tweet (a very unofficial one, more like a rumour) about Slack using ProseMirror for its new message box...

... turns out that’s not the case, as it’s still Quill.js under the hood (Slack does use ProseMirror’s cousin, CodeMirror, for the code snippets), but it led to a short exploration of what modern editors do. Specifically (and very unscientifically), how many things happen when you focus the text area, press the “a” key and see the letter rendered in the browser.

Let's start with the already mentioned team chat behemoth, Slack.

Slack's message box timeline

Whoa. That’s a lot of stuff. It takes almost 4 frames to render (if aiming for 60 fps), and the call tree right at the bottom there is around 80 levels deep! Of course Slack has to do all sorts of processing on text entry, from mentions to formatting to commands, but it’s still surprising that hundreds of function calls are needed to get that one letter to render (and we're even ignoring a whole other universe of things that need to happen from the physical keypress to a fired keypress event in the browser)!
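To put "almost 4 frames" into milliseconds, here is a minimal sketch of the frame budget arithmetic. The function names are made up for illustration, and it assumes a fixed 60 fps refresh rate, which real displays and browsers don't always stick to.

```typescript
// Frame budget arithmetic, assuming a constant refresh rate.
// All names here are hypothetical helpers, not part of any browser API.
const MS_PER_SECOND = 1000;

/** Time available per frame, in milliseconds, at the given refresh rate. */
function frameBudgetMs(fps: number = 60): number {
  return MS_PER_SECOND / fps;
}

/** Number of frames a task of `durationMs` spans, rounded up. */
function framesSpanned(durationMs: number, fps: number = 60): number {
  return Math.ceil(durationMs / frameBudgetMs(fps));
}

/** Longest task, in milliseconds, that still fits into `frames` frames. */
function maxDurationMs(frames: number, fps: number = 60): number {
  return frames * frameBudgetMs(fps);
}

console.log(frameBudgetMs().toFixed(2)); // "16.67" ms per frame at 60 fps
console.log(maxDurationMs(4).toFixed(1)); // "66.7" ms for 4 frames
console.log(framesSpanned(60)); // 4: a 60 ms task spills into a fourth frame
```

So a keystroke that takes almost 4 frames keeps the main thread busy for somewhere under 67 ms, while anything below roughly 16.7 ms has a chance of landing within a single frame.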

Alright, so Slack's text input processing is intense, to say the least, but how about some context? How does this compare? Let's take a look at Atlassian's Editor.

Atlassian's Editor timeline

That's a bit more understandable on the surface of it: the call tree's a bit flatter and there are also some patterns emerging with the Xo calls at the bottom there (Xo being React Fiber's beginWork). In fact, only the narrow light yellowy bit at the beginning is ProseMirror (which powers the text editing bits of Atlassian's editor); all of the green stuff is React. So it does look like you could take this apart and understand it a bit more easily, compared to Slack's implementation (it being open source also helps!). The downside is that it is slower still, taking a good 7 frames to render that letter. 🤷‍♂️

OK, how about something completely different, i.e. not built with a contenteditable under the hood – let's look at Google Docs (its editor, Kix, in a very broad sense, processes events and produces DOM nodes by itself, without using contenteditable)!

Google Docs timeline

Hmm, OK. There's a lot going on here as well – when examined a bit, it boils down to an edit-applied event being dispatched and several things responding to it. For example, the “stalactite” before last is the one that actually creates a text node and sets its content. A lot of this is layout and styles, setting the spaces, moving the cursor (in Google Docs the cursor is just an absolutely positioned div 😱), things that contenteditable would deal with internally if it were used. Despite all that, it looks like Google Docs doesn't even skip a frame while processing that keypress, making it the only “big” editor listed here to be this fast. Remarkable!
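The "one event, several responders" shape seen in that timeline can be sketched as a toy model. To be clear, this is not Kix's actual code: `EditBus`, `EditApplied` and the three listeners are hypothetical stand-ins, and the real editor works with DOM nodes and layout rather than a plain string.

```typescript
// Hypothetical model of an "edit applied" event with independent responders.
// None of these names come from Google Docs; they only illustrate the pattern.
type EditApplied = { offset: number; text: string };
type Listener = (edit: EditApplied) => void;

class EditBus {
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Calls every listener in subscription order with the same edit.
  dispatch(edit: EditApplied): void {
    for (const listener of this.listeners) listener(edit);
  }
}

// Toy document state standing in for text nodes and the cursor div.
const doc = { content: "", cursor: 0 };
const log: string[] = [];

const bus = new EditBus();

// Responder 1: insert the text (the real editor creates a text node here).
bus.subscribe((edit) => {
  doc.content =
    doc.content.slice(0, edit.offset) + edit.text + doc.content.slice(edit.offset);
  log.push("text");
});

// Responder 2: move the cursor to just after the inserted text.
bus.subscribe((edit) => {
  doc.cursor = edit.offset + edit.text.length;
  log.push("cursor");
});

// Responder 3: placeholder for layout and style work.
bus.subscribe(() => {
  log.push("layout");
});

bus.dispatch({ offset: 0, text: "a" });
console.log(doc.content, doc.cursor, log.join(",")); // a 1 text,cursor,layout
```

Each responder shows up as its own "stalactite" in a profile of code shaped like this, which is roughly what the timeline above looks like.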

So we've seen how much the beefy editors do for a single keystroke, but how about the simplest ones? Take the basic ProseMirror example:

Aaah, lovely! Some addTextNode, insertText, some updateState and updateChildren, et voilà! The letter appears in about 10 ms, and all is well. Very nice.

Simpler, you say? Why yes! How about your basic dependable contenteditable, no magic attached:

No words needed.


And that is it, my dear friends, for this short trip down editor lane. While the above timelines are purely exploratory, they do raise interesting questions about the architecture and performance of modern web-based text editors. How close could we come to the simplicity and speed of a pure contenteditable while having to support the many features that the big editors need? And ... is it even worth trying?