DataSHIELD: taking the analysis to the data, not the data to the analysis

46 mins 59 secs,  85.94 MB,  MP3  44100 Hz,  249.74 kbits/sec

About this item
Description: Burton, P (University of Bristol)
Thursday 8th December 2016 - 14:15 to 15:00
 
Created: 2016-12-19 12:30
Collection: Data Linkage and Anonymisation
Publisher: Isaac Newton Institute
Copyright: Burton, P
Language: eng (English)
Distribution: World     (downloadable)
Explicit content: No
Aspect Ratio: 16:9
Screencast: No
 
Abstract: Research in modern biomedicine and social science often requires sample sizes so large that they can only be achieved through a pooled co-analysis of data from several studies. However, pooling individual-level information in a central database that researchers can query raises important governance questions and can be controversial. These questions reflect societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that circumvents some of the most basic challenges in giving researchers and other healthcare professionals access to individual-level data. Commands are sent from a central analysis computer (AC) to several data computers (DCs) that store the data to be co-analysed. Each DC is located at one of the studies contributing data to the analysis. The data sets are analysed simultaneously and in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands that are transmitted back and forth between the DCs and the AC. The technical implementation of DataSHIELD employs a specially modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is then controlled through a standard R environment at the AC. DataSHIELD is most often configured to carry out an analysis, typically fully efficient, that is mathematically equivalent to placing all data from all studies in one central database and analysing them together (with centre effects, of course, where required). Alternatively, it can be set up for study-level meta-analysis: estimates and standard errors are derived independently from each study and then combined in a centralized random-effects meta-analysis at the AC.
DataSHIELD is being developed as a flexible, easily extendible, open-source way to provide secure data access to a single study or data repository as well as for settings involving several studies. Although the talk will focus on the version of DataSHIELD that represents our current standard implementation, it will also explore some of our recent thinking in relation to issues such as vertically partitioned (record linkage) data, textual data and non-disclosive graphical visualisation.
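
The pooled-equivalent analysis described in the abstract can be illustrated with a minimal sketch. It is not DataSHIELD code and does not use its R functions: the names SiteSummary, summarise_site, fit_from_summaries and fit_pooled are hypothetical, and the model is restricted to ordinary least squares with a single predictor as a simplifying assumption. Each simulated data computer returns only aggregate sums, and the analysis computer combines them to obtain the same coefficients it would get from the pooled rows.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregates one data computer (DC) returns; no individual rows."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarise_site(xs, ys):
    """Runs at a DC: reduce the local rows to a handful of sums."""
    return SiteSummary(
        n=len(xs),
        sum_x=sum(xs),
        sum_y=sum(ys),
        sum_xx=sum(x * x for x in xs),
        sum_xy=sum(x * y for x, y in zip(xs, ys)),
    )


def fit_from_summaries(summaries):
    """Runs at the analysis computer (AC): add up the sums and solve
    the least-squares normal equations for y = intercept + slope * x."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_pooled(xs, ys):
    """Reference fit on the pooled rows, for comparison only.
    In the federated setting these rows would never leave the DCs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Made-up illustrative data for two studies.
    site_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
    site_b = ([4.0, 5.0, 6.0, 7.0], [8.1, 9.8, 12.2, 13.9])

    federated = fit_from_summaries(
        [summarise_site(*site_a), summarise_site(*site_b)]
    )
    pooled = fit_pooled(site_a[0] + site_b[0], site_a[1] + site_b[1])
    print("federated:", federated)
    print("pooled:   ", pooled)
```

The two fits agree up to floating-point rounding because the sums are sufficient statistics for this model. More general models would need several rounds of summaries exchanged between the AC and the DCs, which this sketch does not attempt.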
Available Formats
Format         Quality    Bitrate            Size
MPEG-4 Video   640x360    1.94 Mbits/sec     683.62 MB
WebM           640x360    906.66 kbits/sec   311.78 MB
iPod Video     480x270    522.23 kbits/sec   179.52 MB
MP3            44100 Hz   249.74 kbits/sec   85.94 MB