Deduplicating databases of deaths in war: advances in adaptive blocking, pairwise classification, and clustering

30 mins 54 secs,  56.53 MB,  MP3  44100 Hz,  249.77 kbits/sec

About this item
Description: Ball, P (Human Rights Data Analysis Group)
Monday 12th September 2016 - 12:00 to 12:30
 
Created: 2016-09-15 15:58
Collection: Data Linkage and Anonymisation
Publisher: Isaac Newton Institute
Copyright: Ball, P
Language: eng (English)
Distribution: World     (downloadable)
Explicit content: No
 
Abstract: Violent inter-state and civil wars are documented with lists of casualties, each of which constitutes a partial, non-probability sample of the universe of deaths. There are often several lists, with duplicate entries both within each list and across lists, so record linkage is needed to deduplicate them and produce a unique enumeration of the known dead.

This talk will explore how we do record linkage, including: new advances in generating and learning from training data; an adaptive blocking approach; pairwise classification with string, date, and integer features and several classifiers; and a hybrid clustering method. Assessment metrics will be proposed for each stage, with real-world results from deduplicating more than 420,000 records of Syrian people killed since 2011.
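To make the three stages named in the abstract concrete, here is a minimal sketch in Python. It is not the method presented in the talk: the fixed blocking key stands in for adaptive blocking, a hand-set threshold rule stands in for the trained classifiers, and connected components via union-find stands in for the hybrid clustering method. The records, field layout, function names, and thresholds are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (record id, name, date of death).
RECORDS = [
    (0, "Ahmad Khalil", date(2012, 3, 14)),
    (1, "Ahmed Khalil", date(2012, 3, 15)),
    (2, "Ahmad Khaleel", date(2012, 3, 14)),
    (3, "Sara Haddad", date(2013, 7, 2)),
    (4, "Sarah Haddad", date(2013, 7, 2)),
    (5, "Omar Nasser", date(2012, 3, 14)),
]


def block_key(record):
    """Blocking: a fixed key (first letter of name, year of death).

    Assumption: a real system would learn or adapt its blocking rules.
    """
    _, name, died = record
    return (name[0].lower(), died.year)


def candidate_pairs(records):
    """Yield only the pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for members in blocks.values():
        yield from combinations(members, 2)


def is_match(a, b, name_threshold=0.8, max_days=3):
    """Pairwise classification: a threshold rule on one string feature
    and one date feature (a stand-in for a trained classifier)."""
    name_sim = SequenceMatcher(None, a[1].lower(), b[1].lower()).ratio()
    day_gap = abs((a[2] - b[2]).days)
    return name_sim >= name_threshold and day_gap <= max_days


def cluster(records):
    """Clustering: merge matched pairs into connected components."""
    parent = {record[0]: record[0] for record in records}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in candidate_pairs(records):
        if is_match(a, b):
            parent[find(a[0])] = find(b[0])

    groups = defaultdict(list)
    for record in records:
        groups[find(record[0])].append(record[0])
    return sorted(sorted(group) for group in groups.values())


clusters = cluster(RECORDS)
print(clusters)
```

On this toy input, blocking reduces the 15 possible pairs among six records to 4 candidate pairs, which is the point of the blocking stage: at the scale of hundreds of thousands of records, comparing every pair is impractical. Each resulting cluster is treated as one unique person.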
Available Formats

Format         Quality    Bitrate            Size
MPEG-4 Video   640x360    1.94 Mbits/sec     449.60 MB
WebM           640x360    647.32 kbits/sec   146.42 MB
iPod Video     480x270    522.25 kbits/sec   118.07 MB
MP3            44100 Hz   249.77 kbits/sec   56.53 MB