📄 Abstract
To ensure a balance between open access to justice and personal data
protection, the South Korean judiciary mandates the de-identification of court
judgments before they can be publicly disclosed. However, the current
de-identification process is inadequate for handling court judgments at scale
while adhering to strict legal requirements. Additionally, the legal
definitions and categorizations of personal identifiers are vague and not
well-suited for technical solutions. To tackle these challenges, we propose a
de-identification framework called Thunder-DeID, which aligns with relevant
laws and practices. Specifically, we (i) construct and release the first Korean
legal dataset containing annotated judgments along with corresponding lists of
entity mentions, (ii) introduce a systematic categorization of Personally
Identifiable Information (PII), and (iii) develop an end-to-end deep neural
network (DNN)-based de-identification pipeline. Our experimental results
demonstrate that our model achieves state-of-the-art performance in the
de-identification of court judgments.
Authors (5)
Sungeun Hahm
Heejin Kim
Gyuseong Lee
Hyunji Park
Jaejin Lee
Key Contributions
The paper proposes Thunder-DeID, an end-to-end de-identification framework for Korean court judgments that aligns with relevant laws and practices. It contributes the first Korean legal dataset of annotated judgments with corresponding lists of entity mentions, a systematic categorization of Personally Identifiable Information (PII), and a DNN-based de-identification pipeline intended to handle court judgments at scale.
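To make the idea of de-identifying from annotated entity mentions concrete, here is a minimal Python sketch of the final replacement step only. It is not the Thunder-DeID pipeline: the `Mention` type, the `deidentify` and `mentions_for` helpers, the labels `PERSON` and `LOCATION`, and the placeholder format are all hypothetical, and in a real system the spans would come from a trained model rather than a string search.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical PII category, e.g. "PERSON"


def mentions_for(text: str, surface: str, label: str) -> list[Mention]:
    """Stand-in for a tagger: mark every literal occurrence of `surface`."""
    return [Mention(m.start(), m.end(), label)
            for m in re.finditer(re.escape(surface), text)]


def deidentify(text: str, mentions: list[Mention]) -> str:
    """Replace each span with a placeholder.

    The same (label, surface string) pair always maps to the same
    placeholder, so references stay consistent within one document.
    """
    placeholders: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    pieces: list[str] = []
    cursor = 0
    for m in sorted(mentions, key=lambda m: m.start):
        if m.start < cursor:
            raise ValueError("overlapping mentions are not supported in this sketch")
        key = (m.label, text[m.start:m.end])
        if key not in placeholders:
            counters[m.label] = counters.get(m.label, 0) + 1
            placeholders[key] = f"[{m.label}_{counters[m.label]}]"
        pieces.append(text[cursor:m.start])  # untouched text before the span
        pieces.append(placeholders[key])
        cursor = m.end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)


if __name__ == "__main__":
    text = "Hong Gildong paid Kim Cheolsu. Hong Gildong then left Seoul."
    mentions = (mentions_for(text, "Hong Gildong", "PERSON")
                + mentions_for(text, "Kim Cheolsu", "PERSON")
                + mentions_for(text, "Seoul", "LOCATION"))
    print(deidentify(text, mentions))
    # [PERSON_1] paid [PERSON_2]. [PERSON_1] then left [LOCATION_1].
```

Reusing one placeholder per distinct mention keeps the judgment readable after redaction, since a reader can still tell which party did what. Whether the actual framework uses this convention is not stated in the summary above.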
Business Value
Enables public disclosure of court judgments while protecting personal data, fostering transparency in the judiciary and compliance with privacy regulations.