Controlling Leakage and Disclosure Risk in Semantic Big Data pipelines

Published on Jul 28, 2016 | 1721 Views

In many Big Data environments, information is made available as huge data streams, collected and analyzed at different locations, asynchronously and under the responsibility of different authorities.

Chapter list

00:00 Controlling Leakage and Disclosure Risk in Semantic Big Data pipelines
00:20 Outline
00:48 Big data initiative
01:50 SESAR LAB
02:02 Some activities
04:42 Vision
06:06 From classic data warehouse to Big Data - 1
07:33 From classic data warehouse to Big Data - 2
08:51 Internal vs. External data sources
09:20 Processing Models
11:24 Data Models
12:23 Designing Data Representations for Big Data Applications
15:54 Relational denormalization refresher
17:08 Denormalization backsides
17:43 Memcache
18:40 Low-level representation
19:38 Key-value reminder
20:34 Example
20:57 Denormalized Example
22:04 Consensus
23:55 Data Batch processing: Map/Reduce
29:01 Practical MapReduce = HDFS+Hadoop
30:21 Risk and Threats
30:22 Risk Components
30:57 Big Data Threats: Breach
33:09 Big Data Threats: Leak
33:47 Big Data Threats: Degradation
34:52 Big Data Threats as APTs
35:35 The Silos problem
37:12 Data representation
37:59 Breaking the Silos
38:38 Tradeoffs
38:44 Transparent De-normalization
39:15 De-normalization gray area
40:02 Degradation via faulty values
40:47 Need for a de-normalization index
41:25 From D-index to disclosure probability - 1
41:46 From D-index to disclosure probability - 2
43:06 Need for an accrual consensus index
44:47 Independent interpretations of Φ
45:03 Leak vs Breach vs Degradation revisited
47:04 Some ideas
47:29 Table
48:18 A practical example
48:20 Data structure
49:31 Our dataset in Neo4j
50:42 Achieving the desired K-anonymity
51:03 Segmentation
51:43 Problem
53:36 A possible countermeasure: Redundant Relations
54:20 Secret
55:11 Not a panacea w.r.t. distribution checks
55:32 Hashing
55:34 Technology cannot do it alone
57:12 References