Data perturbation for data science

44 mins 41 secs, 81.76 MB, MP3 44100 Hz, 249.81 kbits/sec

About this item

Description:	Samworth, R. Friday 29th June 2018, 11:00 to 11:45


Created:	2018-06-29 17:05
Collection:	Statistical scalability
Publisher:	Isaac Newton Institute
Copyright:	Samworth, R
Language:	eng (English)
Distribution:	World (downloadable)
Explicit content:	No
Aspect Ratio:	16:9
Screencast:	No
Bumper:	UCS Default
Trailer:	UCS Default


Abstract:	When faced with a dataset and a problem of interest, should we propose a statistical model and use that to inform an appropriate algorithm, or dream up a potential algorithm and then seek to justify it? The former is the more traditional statistical approach, but the latter appears to be becoming more popular. I will discuss a class of algorithms that belong in the second category, namely those that involve data perturbation (e.g. subsampling, random projections, artificial noise, knockoffs, ...). As examples, I will consider Complementary Pairs Stability Selection for variable selection and sparse PCA via random projections. This will involve joint work with Rajen Shah, Milana Gataric and Tengyao Wang.
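
The subsampling idea behind Complementary Pairs Stability Selection can be illustrated with a minimal sketch. It is not taken from the talk: the base selector (top-k absolute marginal correlation), the function names, the parameter values and the synthetic data are all assumptions chosen for illustration. The sketch repeatedly splits the data into two disjoint halves, runs the base selector on each half, and keeps the variables chosen in a large proportion of the subsamples.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Hypothetical base selector: the k columns of X most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[-k:]


def complementary_pairs_selection(X, y, k=5, n_pairs=50, threshold=0.9, seed=0):
    """Selection proportions over complementary (disjoint) half-samples.

    All default values are illustrative assumptions, not recommendations.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    half = n // 2
    counts = np.zeros(p)
    for _ in range(n_pairs):
        perm = rng.permutation(n)
        # The two halves of one permutation are disjoint: a complementary pair.
        for idx in (perm[:half], perm[half:2 * half]):
            counts[top_k_by_correlation(X[idx], y[idx], k)] += 1
    proportions = counts / (2 * n_pairs)
    return np.flatnonzero(proportions >= threshold), proportions


# Synthetic data (an assumption): only variables 0 and 1 carry signal.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

selected, proportions = complementary_pairs_selection(X, y, k=5)
print("selected variables:", selected)
```

Any variable selection procedure could stand in for the base selector; the perturbation step only needs a function that maps a subsample to a set of selected indices.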

Available Formats

Format	Quality	Bitrate	Size
MPEG-4 Video	640x360	1.94 Mbits/sec	650.26 MB
WebM	640x360	390.28 kbits/sec	127.63 MB
iPod Video	480x270	522.14 kbits/sec	170.75 MB
MP3 *	44100 Hz	249.81 kbits/sec	81.76 MB
