
Smooth, Finite, and Convex Optimization Deep Learning Summer School

Published on Sep 13, 2015 · 7,633 views

Chapter list

Smooth, Finite, and Convex Optimization (00:00)
Context: Big Data and Big Models - 1 (01:10)
Context: Big Data and Big Models - 2 (01:16)
Common Framework: Empirical Risk Minimization - 1 (01:25)
Common Framework: Empirical Risk Minimization - 2 (02:07)
Common Framework: Empirical Risk Minimization - 3 (02:18)
Motivation: Why Learn about Convex Optimization? - 1 (02:27)
Motivation: Why Learn about Convex Optimization? - 2 (02:45)
Motivation: Why Learn about Convex Optimization? - 3 (02:54)
Motivation: Why Learn about Convex Optimization? - 4 (03:03)
Motivation: Why Learn about Convex Optimization? - 5 (03:32)
How hard is real-valued optimization? - 1 (03:54)
How hard is real-valued optimization? - 3 (04:43)
How hard is real-valued optimization? - 4 (04:48)
How hard is real-valued optimization? - 5 (05:33)
How hard is real-valued optimization? - 7 (05:41)
How hard is real-valued optimization? - 8 (05:47)
How hard is real-valued optimization? - 9 (06:09)
How hard is real-valued optimization? - 10 (06:16)
How hard is real-valued optimization? - 11 (07:29)
Convex Functions: Three Characterizations - 1 (07:51)
Convex Functions: Three Characterizations - 2 (07:58)
Convex Functions: Three Characterizations - 3 (08:02)
Convex Functions: Three Characterizations - 4 (08:03)
Convex Functions: Three Characterizations - 5 (08:26)
Convex Functions: Three Characterizations - 6 (08:28)
Convex Functions: Three Characterizations - 7 (08:29)
Convex Functions: Three Characterizations - 8 (08:33)
Convex Functions: Three Characterizations - 9 (08:40)
Convex Functions: Three Characterizations - 10 (08:46)
Convex Functions: Three Characterizations - 11 (08:50)
Convex Functions: Three Characterizations - 12 (08:54)
Convex Functions: Three Characterizations - 13 (09:10)
Convex Functions: Three Characterizations - 14 (09:12)
Convex Functions: Three Characterizations - 15 (09:26)
Convex Functions: Three Characterizations - 16 (09:27)
Examples of Convex Functions - 1 (10:06)
Examples of Convex Functions - 2 (10:21)
Operations that Preserve Convexity - 1 (10:30)
Operations that Preserve Convexity - 2 (10:44)
Operations that Preserve Convexity - 3 (10:51)
Operations that Preserve Convexity - 4 (10:57)
Operations that Preserve Convexity - 5 (10:59)
Outline - 1 (11:11)
Motivation for Gradient Methods - 1 (11:24)
Motivation for Gradient Methods - 2 (11:38)
Motivation for Gradient Methods - 3 (12:05)
Motivation for Gradient Methods - 4 (12:41)
Logistic Regression with 2-Norm Regularization - 1 (12:55)
Logistic Regression with 2-Norm Regularization - 2 (13:15)
Logistic Regression with 2-Norm Regularization - 3 (13:40)
Properties of Lipschitz-Continuous Gradient - 1 (14:21)
Properties of Lipschitz-Continuous Gradient - 2 (15:12)
Properties of Lipschitz-Continuous Gradient - 3 (15:38)
Properties of Lipschitz-Continuous Gradient - 4 (16:16)
Properties of Lipschitz-Continuous Gradient - 5 (16:17)
Properties of Lipschitz-Continuous Gradient - 6 (16:22)
Properties of Lipschitz-Continuous Gradient - 7 (16:33)
Properties of Lipschitz-Continuous Gradient - 8 (16:39)
Properties of Lipschitz-Continuous Gradient - 9 (16:44)
Properties of Strong-Convexity - 1 (16:54)
Properties of Strong-Convexity - 2 (17:05)
Properties of Strong-Convexity - 3 (17:16)
Properties of Strong-Convexity - 4 (17:17)
Properties of Strong-Convexity - 5 (17:18)
Properties of Strong-Convexity - 6 (17:27)
Linear Convergence of Gradient Descent - 1 (17:44)
Linear Convergence of Gradient Descent - 2 (17:52)
Linear Convergence of Gradient Descent - 4 (17:53)
Linear Convergence of Gradient Descent - 3 (17:54)
Linear Convergence of Gradient Descent - 5 (17:58)
Linear Convergence of Gradient Descent - 6 (18:02)
Linear Convergence of Gradient Descent - 7 (18:04)
Linear Convergence of Gradient Descent - 8 (18:29)
Maximum Likelihood Logistic Regression - 1 (19:22)
Maximum Likelihood Logistic Regression - 2 (19:31)
Maximum Likelihood Logistic Regression - 3 (19:45)
Maximum Likelihood Logistic Regression - 4 (19:46)
Maximum Likelihood Logistic Regression - 5 (19:56)
Maximum Likelihood Logistic Regression - 6 (19:57)
Maximum Likelihood Logistic Regression - 7 (20:42)
Gradient Method: Practical Issues - 1 (21:37)
Gradient Method: Practical Issues - 2 (21:51)
Gradient Method: Practical Issues - 3 (22:07)
Gradient Method: Practical Issues - 4 (22:26)
Accelerated Gradient Method - 1 (24:15)
Accelerated Gradient Method - 3 (24:23)
Accelerated Gradient Method - 4 (24:47)
Accelerated Gradient Method - 5 (25:24)
Newton's Method - 1 (25:48)
Newton's Method - 2 (26:03)
Newton's Method - 3 (26:43)
Newton's Method - 4 (27:05)
Newton's Method - 5 (27:06)
Newton's Method - 7 (27:07)
Newton's Method - 8 (27:21)
Newton's Method - 9 (27:27)
Convergence Rate of Newton’s Method - 1 (27:35)
Convergence Rate of Newton’s Method - 2 (28:03)
Newton’s Method: Practical Issues - 1 (28:16)
Numerical Comparison (32:44)
Newton’s Method: Practical Issues - 2 (36:02)
Outline - 2 (36:33)
Big-N Problems - 1 (36:34)
Big-N Problems - 2 (36:38)
Stochastic vs. Deterministic Gradient Methods - 1 (37:05)
Stochastic vs. Deterministic Gradient Methods - 2 (37:11)
Stochastic vs. Deterministic Gradient Methods - 3 (37:33)
Stochastic vs. Deterministic Gradient Methods - 4 (37:46)
Stochastic vs. Deterministic Gradient Methods - 5 (38:13)
Stochastic vs. Deterministic Gradient Methods - 6 (38:19)
Stochastic vs. Deterministic Gradient Methods - 7 (38:30)
Stochastic vs. Deterministic Gradient Methods - 8 (38:35)
Stochastic vs. Deterministic Gradient Methods - 9 (39:42)
Stochastic vs. Deterministic Convergence Rates (41:33)
Stochastic vs. Deterministic for Non-Smooth - 1 (43:28)
Stochastic vs. Deterministic for Non-Smooth - 3 (43:38)
Stochastic vs. Deterministic for Non-Smooth - 4 (48:14)
Sub-Gradients and Sub-Differentials - 1 (48:32)
Sub-Gradients and Sub-Differentials - 2 (48:42)
Sub-Gradients and Sub-Differentials - 3 (48:52)
Sub-Gradients and Sub-Differentials - 4 (49:16)
Sub-Gradients and Sub-Differentials - 5 (49:19)
Sub-Gradients and Sub-Differentials - 6 (49:20)
Sub-Gradients and Sub-Differentials - 7 (49:24)
Sub-Gradients and Sub-Differentials - 8 (49:25)
Sub-Gradients and Sub-Differentials - 9 (49:27)
Sub-Gradients and Sub-Differentials - 10 (49:28)
Sub-Gradients and Sub-Differentials - 11 (49:39)
Sub-Differential of Absolute Value and Max Functions - 1 (50:09)
Sub-Differential of Absolute Value and Max Functions - 2 (50:21)
Sub-Differential of Absolute Value and Max Functions - 3 (50:22)
Sub-Differential of Absolute Value and Max Functions - 4 (50:23)
Sub-Differential of Absolute Value and Max Functions - 5 (50:25)
Sub-Differential of Absolute Value and Max Functions - 6 (50:27)
Sub-Differential of Absolute Value and Max Functions - 7 (50:28)
Sub-Differential of Absolute Value and Max Functions - 8 (50:29)
Sub-Differential of Absolute Value and Max Functions - 9 (50:32)
Sub-Differential of Absolute Value and Max Functions - 10 (50:33)
Sub-Differential of Absolute Value and Max Functions - 11 (50:37)
Sub-Differential of Absolute Value and Max Functions - 12 (50:39)
Subgradient and Stochastic Subgradient Methods - 1 (50:57)
Subgradient and Stochastic Subgradient Methods - 2 (51:05)
Subgradient and Stochastic Subgradient Methods - 3 (51:25)
Subgradient and Stochastic Subgradient Methods - 4 (51:54)
Stochastic Subgradient Methods in Practice - 1 (52:12)
Stochastic Subgradient Methods in Practice - 2 (52:42)
Speeding up Stochastic Subgradient Methods - 1 (55:15)
Stochastic Subgradient Methods in Practice - 3 (58:42)
Speeding up Stochastic Subgradient Methods - 2 (59:32)
Speeding up Stochastic Subgradient Methods - 3 (01:03:27)
Stochastic Newton Methods? - 1 (01:04:39)
Stochastic Newton Methods? - 2 (01:04:53)
Outline - 3 (01:10:33)
Big-N Problems - 1 (01:11:00)
Big-N Problems - 2 (01:11:06)
Big-N Problems - 3 (01:11:14)
Big-N Problems - 4 (01:11:28)
Motivation for Hybrid Methods - 1 (01:11:35)
Motivation for Hybrid Methods - 2 (01:11:37)
Hybrid Deterministic-Stochastic - 1 (01:11:50)
Hybrid Deterministic-Stochastic - 2 (01:11:52)
Hybrid Deterministic-Stochastic - 3 (01:11:58)
Approach 1: Batching - 1 (01:12:26)
Approach 1: Batching - 2 (01:12:27)
Approach 1: Batching - 3 (01:12:29)
Approach 1: Batching - 4 (01:12:30)
Stochastic Average Gradient - 1 (01:12:42)
Stochastic Average Gradient - 2 (01:12:56)
Stochastic Average Gradient - 3 (01:12:58)
Stochastic Average Gradient - 4 (01:13:14)
Stochastic Average Gradient - 5 (01:13:15)
Stochastic Average Gradient - 6 (01:13:40)
Stochastic Average Gradient - 7 (01:13:44)
Convergence Rate of SAG - 1 (01:14:03)
Convergence Rate of SAG - 2 (01:14:13)
Rate of Convergence Comparison - 1 (01:14:55)
Rate of Convergence Comparison - 2 (01:14:56)
Rate of Convergence Comparison - 3 (01:14:57)
Rate of Convergence Comparison - 4 (01:14:58)
Rate of Convergence Comparison - 5 (01:14:58)
Rate of Convergence Comparison - 6 (01:15:23)
Rate of Convergence Comparison - 7 (01:15:37)
Rate of Convergence Comparison - 8 (01:15:38)
Rate of Convergence Comparison - 9 (01:15:45)
Rate of Convergence Comparison - 10 (01:15:51)
Rate of Convergence Comparison - 11 (01:15:52)
Comparing Deterministic and Stochastic Methods (01:17:25)
SAG Compared to FG and SG Methods (01:17:46)
Other Linearly-Convergent Stochastic Methods - 1 (01:18:05)
Other Linearly-Convergent Stochastic Methods - 2 (01:18:20)
SAG Implementation Issues - 1 (01:18:51)
SAG Implementation Issues - 2 (01:18:53)
SAG Implementation Issues - 3 (01:19:26)
Reshuffling and Non-Uniform Sampling - 1 (01:19:31)
Reshuffling and Non-Uniform Sampling - 2 (01:19:38)
Reshuffling and Non-Uniform Sampling - 3 (01:21:37)
Reshuffling and Non-Uniform Sampling - 4 (01:21:57)
Reshuffling and Non-Uniform Sampling - 5 (01:22:01)
Reshuffling and Non-Uniform Sampling - 6 (01:22:13)
Reshuffling and Non-Uniform Sampling - 7 (01:23:45)
SAG with Adaptive Non-Uniform Sampling - 1 (01:24:36)
SAG with Adaptive Non-Uniform Sampling - 2 (01:25:07)
SAG with Mini-Batches - 1 (01:27:55)
SAG with Mini-Batches - 2 (01:28:04)
SAG with Mini-Batches - 3 (01:28:11)
Minimizing Finite Sums: Dealing with the Memory - 1 (01:28:33)
Minimizing Finite Sums: Dealing with the Memory - 2 (01:28:36)
Minimizing Finite Sums: Dealing with the Memory - 3 (01:28:37)
Minimizing Finite Sums: Dealing with the Memory - 4 (01:28:39)
Minimizing Finite Sums: Dealing with the Memory - 5 (01:28:41)
Stochastic Variance-Reduced Gradient - 1 (01:28:42)
Stochastic Variance-Reduced Gradient - 2 (01:28:44)
Stochastic Variance-Reduced Gradient - 3 (01:28:44)
Stochastic Variance-Reduced Gradient - 4 (01:29:40)
Summary - 1 (01:29:53)
Summary - 2 (01:30:12)