Data perturbation for data science

44 mins 41 secs, 81.76 MB, MP3 44100 Hz, 249.81 kbits/sec

About this item
Description: Samworth, R
Friday 29th June 2018 - 11:00 to 11:45
 
Created: 2018-06-29 17:05
Collection: Statistical scalability
Publisher: Isaac Newton Institute
Copyright: Samworth, R
Language: eng (English)
Distribution: World     (downloadable)
Explicit content: No
Aspect Ratio: 16:9
Screencast: No
 
Abstract: When faced with a dataset and a problem of interest, should we propose a statistical model and use that to inform an appropriate algorithm, or dream up a potential algorithm and then seek to justify it? The former is the more traditional statistical approach, but the latter appears to be becoming more popular. I will discuss a class of algorithms that belong in the second category, namely those that involve data perturbation (e.g. subsampling, random projections, artificial noise, knockoffs, ...). As examples, I will consider Complementary Pairs Stability Selection for variable selection and sparse PCA via random projections. This will involve joint work with Rajen Shah, Milana Gataric and Tengyao Wang.

Available Formats
Format        Quality    Bitrate           Size
MPEG-4 Video  640x360    1.94 Mbits/sec    650.26 MB
WebM          640x360    390.28 kbits/sec  127.63 MB
iPod Video    480x270    522.14 kbits/sec  170.75 MB
MP3 *         44100 Hz   249.81 kbits/sec  81.76 MB