📄 Abstract
Writing is a cognitively demanding activity that requires constant
decision-making, heavy reliance on working memory, and frequent shifts between
tasks of different goals. To build writing assistants that truly align with
writers' cognition, we must capture and decode the complete thought process
behind how writers transform ideas into final texts. We present ScholaWrite,
the first dataset of end-to-end scholarly writing, tracing the multi-month
journey from initial drafts to final manuscripts. We contribute three key
advances: (1) a Chrome extension that unobtrusively records keystrokes on
Overleaf, enabling the collection of realistic, in-situ writing data; (2) a
novel corpus of full scholarly manuscripts, enriched with fine-grained
annotations of cognitive writing intentions, comprising LaTeX-based edits
from five computer science preprints and capturing nearly 62K text changes
over four months; and (3) analyses and insights into the micro-dynamics of
scholarly writing, highlighting gaps between human writing processes and the
current capabilities of large language models (LLMs) in providing meaningful
assistance. ScholaWrite underscores the value of capturing end-to-end writing
data to develop future writing assistants that support, not replace, the
cognitive work of scientists.
Authors (6)
Khanh Chi Le
Linghe Wang
Minhwa Lee
Ross Volkov
Luan Tuyen Chau
Dongyeop Kang
Submitted
February 5, 2025
Key Contributions
This paper introduces ScholaWrite, the first dataset capturing the end-to-end scholarly writing process over several months. It includes a novel method for collecting realistic, in-situ writing data via a Chrome extension on Overleaf and a corpus of LaTeX-based edits enriched with cognitive writing intention annotations. The accompanying analyses examine the micro-dynamics of scholarly writing and the gaps between human writing processes and current LLM assistance.
Business Value
Enables the development of more sophisticated AI-powered writing assistants that can better understand and support researchers throughout the entire writing lifecycle, potentially improving productivity and quality in academic publishing.