Big data integration: challenges and new approaches

56 mins 19 secs, 266.98 MB, WebM 640x360, 29.97 fps, 44100 Hz, 647.25 kbits/sec


About this item
Description: Rahm, E (Universität Leipzig)
Wednesday 14th September 2016 - 09:00 to 10:00
 
Created: 2016-09-21 16:45
Collection: Data Linkage and Anonymisation
Publisher: Isaac Newton Institute
Copyright: Rahm, E
Language: eng (English)
Distribution: World     (downloadable)
Explicit content: No
Aspect Ratio: 16:9
Screencast: No
Bumper: UCS Default
Trailer: UCS Default
 
Abstract: Data integration is a key challenge for Big Data applications, which must semantically enrich and combine large sets of heterogeneous data to enable enhanced data analysis. In many cases the data also come from a very large number of sources, e.g., product offers from many e-commerce websites. We will discuss approaches to the key data integration tasks of (large-scale) entity resolution and schema matching. In particular, we discuss parallel blocking and entity resolution on Hadoop platforms, together with load-balancing techniques for handling data skew. We also discuss challenges and recent approaches for the holistic integration of many data sources, e.g., to create knowledge graphs or to make use of huge collections of web tables.
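The abstract names parallel blocking, entity resolution and load balancing against data skew without showing how they fit together. The following is a minimal single-process Python sketch under stated assumptions, not the speaker's Hadoop implementation. Records are grouped into blocks by a hypothetical blocking key (the first three characters of a title). Any block larger than a hypothetical `max_block_size` is split into sub-blocks, and each sub-block's self-comparison and cross-comparisons become separate match tasks. The tasks are then assigned greedily (largest first, to the least-loaded worker) to simulate reducers. All function names, the string-similarity matcher (`difflib.SequenceMatcher`) and the 0.8 threshold are illustrative assumptions.

```python
import heapq
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def blocking_key(record):
    # Hypothetical blocking key: first three lowercase characters of the title.
    return record["title"].lower()[:3]

def build_blocks(records):
    """Group records by blocking key; only records in the same block are compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    return blocks

def make_tasks(blocks, max_block_size):
    """Turn blocks into (workload, list_a, list_b_or_None) match tasks.
    A block larger than max_block_size (a skewed block) is split into
    sub-blocks; each sub-block is compared with itself and with every later
    sub-block, so no pair inside the original block is lost."""
    tasks = []
    for members in blocks.values():
        if len(members) < 2:
            continue
        if len(members) <= max_block_size:
            n = len(members)
            tasks.append((n * (n - 1) // 2, members, None))
            continue
        subs = [members[i:i + max_block_size]
                for i in range(0, len(members), max_block_size)]
        for i, a in enumerate(subs):
            if len(a) >= 2:  # pairs within this sub-block
                tasks.append((len(a) * (len(a) - 1) // 2, a, None))
            for b in subs[i + 1:]:  # pairs across two sub-blocks
                tasks.append((len(a) * len(b), a, b))
    return tasks

def assign(tasks, num_workers):
    """Greedy balancing: largest task first, always to the least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]
    plan = [[] for _ in range(num_workers)]
    for task in sorted(tasks, key=lambda t: t[0], reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(task)
        heapq.heappush(heap, (load + task[0], w))
    return plan

def similar(r1, r2, threshold=0.8):
    # Assumed matcher: normalized edit-style similarity of lowercase titles.
    ratio = SequenceMatcher(None, r1["title"].lower(), r2["title"].lower()).ratio()
    return ratio >= threshold

def run_worker(worker_tasks, threshold=0.8):
    """Compare all pairs in this worker's tasks and return matching id pairs."""
    matches = set()
    for _, a, b in worker_tasks:
        pairs = combinations(a, 2) if b is None else ((x, y) for x in a for y in b)
        for x, y in pairs:
            if similar(x, y, threshold):
                matches.add(tuple(sorted((x["id"], y["id"]))))
    return matches

records = [
    {"id": 1, "title": "Apple iPhone 6 16GB"},
    {"id": 2, "title": "Apple iPhone 6 16 GB"},
    {"id": 3, "title": "Apple iPad Air"},
    {"id": 4, "title": "Samsung Galaxy S7"},
    {"id": 5, "title": "Samsung Galaxy S7 32GB"},
]
blocks = build_blocks(records)
plan = assign(make_tasks(blocks, max_block_size=2), num_workers=2)
matches = set().union(*(run_worker(t) for t in plan))
print(sorted(matches))  # [(1, 2), (4, 5)]
```

Splitting a skewed block this way keeps every pair comparison while capping the work in any single task, so no single worker has to process an entire oversized block. In a real MapReduce job the tasks would be produced in the map phase and the assignment would decide how keys are routed to reducers. This sketch only mirrors that logic in memory.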
Available Formats
Format          Quality    Bitrate             Size
MPEG-4 Video    640x360    1.94 Mbits/sec      820.00 MB
WebM            640x360    647.25 kbits/sec    266.98 MB
iPod Video      480x270    522.21 kbits/sec    215.40 MB
MP3             44100 Hz   249.76 kbits/sec    103.11 MB