Deduplicating databases of deaths in war: advances in adaptive blocking, pairwise classification, and clustering
30 mins 53 secs, 146.42 MB, WebM, 640x360, 29.97 fps, 44100 Hz, 647.32 kbits/sec
About this item
Description: | Ball, P (Human Rights Data Analysis Group)
Monday 12th September 2016 - 12:00 to 12:30 |
---|
Created: | 2016-09-15 15:58 |
---|---|
Collection: | Data Linkage and Anonymisation |
Publisher: | Isaac Newton Institute |
Copyright: | Ball, P |
Language: | eng (English) |
Distribution: | World (downloadable) |
Explicit content: | No |
Aspect Ratio: | 16:9 |
Screencast: | No |
Bumper: | UCS Default |
Trailer: | UCS Default |
Abstract: | Violent inter-state and civil wars are documented with lists of casualties, each of which constitutes a partial, non-probability sample of the universe of deaths. There are often several lists, with duplicate entries both within each list and across lists, so record linkage is required to deduplicate them and create a unique enumeration of the known dead.
This talk will explore how we do record linkage, including: new advances in generating and learning from training data; an adaptive blocking approach; pairwise classification with string, date, and integer features and several classifiers; and a hybrid clustering method. Assessment metrics will be proposed for each stage, with real-world results from deduplicating more than 420,000 records of Syrian people killed since 2011. |
---|
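The three stages named in the abstract (blocking, pairwise classification, clustering) can be illustrated with a minimal sketch. This is not the method presented in the talk: it assumes a fixed blocking key in place of adaptive blocking, a hand-set threshold rule in place of trained classifiers, and connected components in place of the hybrid clustering method. The records, field layout, function names, and thresholds are all invented for illustration.

```python
from collections import defaultdict
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations

# Invented toy records (not real data): (record_id, name, date, place)
RECORDS = [
    (0, "john smith", "2012-03-14", "town a"),
    (1, "jon smith", "2012-03-14", "town a"),
    (2, "john smyth", "2012-03-15", "town a"),
    (3, "maria lopez", "2013-07-02", "town b"),
    (4, "maria lopes", "2013-07-02", "town b"),
    (5, "peter brown", "2012-03-14", "town a"),
]


def candidate_pairs(records):
    """Blocking: only compare records that share place and year-month."""
    blocks = defaultdict(list)
    for rec_id, _name, day, place in records:
        blocks[(place, day[:7])].append(rec_id)
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


def is_match(a, b, name_threshold=0.8, max_days=2):
    """Pairwise rule (assumed, not learned): similar name and close date."""
    name_sim = SequenceMatcher(None, a[1], b[1]).ratio()
    day_gap = abs((date.fromisoformat(a[2]) - date.fromisoformat(b[2])).days)
    return name_sim >= name_threshold and day_gap <= max_days


def cluster(records, matched_pairs):
    """Clustering: connected components over matched pairs (union-find)."""
    parent = {rec[0]: rec[0] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in matched_pairs:
        parent[find(i)] = find(j)

    groups = defaultdict(list)
    for rec_id in parent:
        groups[find(rec_id)].append(rec_id)
    return sorted(sorted(g) for g in groups.values())


def deduplicate(records):
    by_id = {rec[0]: rec for rec in records}
    pairs = candidate_pairs(records)
    matches = [(i, j) for i, j in pairs if is_match(by_id[i], by_id[j])]
    return pairs, cluster(records, matches)


pairs, clusters = deduplicate(RECORDS)
print(len(pairs), clusters)
```

On these six toy records, blocking reduces the 15 possible comparisons to 7 candidate pairs. Connected components is the simplest clustering choice, but it can chain weak matches into one oversized cluster, which is one reason a more careful clustering step may be preferred in practice.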
Available Formats
Format | Quality | Bitrate | Size | |||
---|---|---|---|---|---|---|
MPEG-4 Video | 640x360 | 1.94 Mbits/sec | 449.60 MB | View | Download | |
WebM * | 640x360 | 647.32 kbits/sec | 146.42 MB | View | Download | |
iPod Video | 480x270 | 522.25 kbits/sec | 118.07 MB | View | Download | |
MP3 | 44100 Hz | 249.77 kbits/sec | 56.53 MB | Listen | Download | |
Auto | (Allows browser to choose a format it supports) |