Webpage Understanding: an Integrated Approach

Published on Sep 14, 2007 | 6578 Views

Recent work has shown the effectiveness of leveraging layout and tag-tree structure for segmenting webpages and labeling HTML elements. However, how to effectively segment and label the text contents

Chapter list

Webpage Understanding: an Integrated Approach 00:03
Outline 00:32
Motivating Examples 00:50
Characteristics of Webpage 02:10
Tasks of Web Data Extraction 03:39
slide 6 04:49
Existing Attempts – De-coupled Approaches 04:57
Disadvantages 05:36
Why no integrated approach? 06:16
Outline 07:10
Statistical Web Structure Mining Model (KDD 2006) 07:33
Integrated Webpage Understanding Model 08:50
Factorized Distribution 09:59
Separate Learning 13:04
Outline 13:26
Experiments 13:33
Extraction Accuracy 14:20
NP-Chunking Features 15:02
Conclusions & Future Work 15:35