
Saifullah Shakeel


When we train models, we iterate over the training samples, make predictions for them, and estimate the error between the predicted label and the real label. Next, we update the weights using the gradient of the error with respect to the weights. In deep models, calculating this gradient usually requires multiplying a lot of terms together. Two problems can arise from this repeated multiplication.
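To make "a lot of terms" concrete, here is a sketch for a simple feed-forward chain. The notation is an assumption for illustration and does not come from the text above: layer outputs $h_k = f_k(h_{k-1}; W_k)$ for $k = 1, \dots, n$, input $h_0$, and error $E$ computed from $h_n$. By the chain rule, the gradient with respect to the first layer's weights is:

$$
\frac{\partial E}{\partial W_1}
= \frac{\partial E}{\partial h_n}
\cdot \frac{\partial h_n}{\partial h_{n-1}}
\cdot \frac{\partial h_{n-1}}{\partial h_{n-2}}
\cdots
\frac{\partial h_2}{\partial h_1}
\cdot \frac{\partial h_1}{\partial W_1}
$$

The number of factors grows with the depth $n$. If most factors are larger than one in magnitude, the product grows with depth, and if most are smaller than one, it shrinks.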

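Suppose we have two vectors, both of which have all of their values greater than one. Once we start multiplying such vectors, the values of the resultant vector will start growing. This problem is called the **exploding gradient problem**.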

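```python
vector1 = [1.3, 2.1, 1.73, 1.42, 1.25]
vector2 = [1.26, 1.35, 2.58, 2.81, 1.32]

# Copy vector1 so the original list is not modified in place
resultant = list(vector1)

# Multiply element-wise by vector2 five times
for _ in range(5):
    for i in range(len(vector1)):
        resultant[i] = resultant[i] * vector2[i]

print("Final product of vector1*(vector2)^5")
print(resultant)
```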

Consider a case where we have two vectors having all values less than one. Once we start multiplying such vectors, the values of the resultant vector will start shrinking. This problem is called the **vanishing gradient problem**.

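```python
vector1 = [0.3, 0.1, 0.73, 0.42, 0.25]
vector2 = [0.26, 0.35, 0.58, 0.81, 0.32]

# Copy vector1 so the original list is not modified in place
resultant = list(vector1)

# Multiply element-wise by vector2 five times
for _ in range(5):
    for i in range(len(vector1)):
        resultant[i] = resultant[i] * vector2[i]

print("Final product of vector1*(vector2)^5")
print(resultant)
```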

Every time we multiply two vectors, we check whether the norm of the resultant vector is above a threshold parameter. If it is, we rescale the vector by dividing it by its norm and multiplying by the threshold, so that its norm equals the threshold. This prevents the resultant from exploding in the next multiplication and keeps training stable. This technique, called gradient clipping, mostly solves the problem of exploding gradients.

Logically:

```
if ||resultant|| > threshold:
    resultant = threshold * resultant / ||resultant||
```

Here, `||resultant||` represents the norm of the vector, which can be *L1*, *L2*, or any other norm.
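The check above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper names `l2_norm` and `clip_by_norm`, the example vectors, and the threshold of `5.0` are all assumptions chosen for the demo, and the L2 norm is used as one possible choice of norm.

```python
import math

def l2_norm(vector):
    """Return the Euclidean (L2) norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vector))

def clip_by_norm(vector, threshold):
    """Rescale `vector` to norm `threshold` if its L2 norm exceeds it."""
    norm = l2_norm(vector)
    if norm > threshold:
        return [v * threshold / norm for v in vector]
    return list(vector)  # already small enough, return an unchanged copy

# Hypothetical values, all greater than one, so the product keeps growing
vector = [1.3, 2.1, 1.73]
factors = [1.26, 1.35, 2.58]
threshold = 5.0  # assumed threshold for this demo

resultant = list(vector)
for _ in range(5):
    # Element-wise multiplication, then clip before the next step
    resultant = [r * f for r, f in zip(resultant, factors)]
    resultant = clip_by_norm(resultant, threshold)

print(resultant)
print(l2_norm(resultant))  # stays at (approximately) the threshold
```

Because the vector is clipped after every multiplication, its norm never exceeds the threshold, while its direction after each step is preserved.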

```python
# TensorFlow syntax
tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)

# PyTorch syntax
torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0, error_if_nonfinite=False)
```

TensorFlow and PyTorch syntax for gradient clipping
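Both library functions above clip by a single norm computed jointly over all the gradients passed in, rather than clipping each tensor separately. The following pure-Python sketch illustrates that idea without depending on either library. The function name `clip_by_global_norm_sketch` and the example gradients are hypothetical, and the code is not the actual implementation of either framework.

```python
import math

def clip_by_global_norm_sketch(grad_list, clip_norm):
    """Scale every gradient by one shared factor so that the joint
    L2 norm of all gradients is at most `clip_norm`.

    Returns the scaled gradients and the joint norm before scaling.
    """
    # Joint L2 norm over every value in every gradient
    global_norm = math.sqrt(sum(g * g for grad in grad_list for g in grad))
    # The factor is 1.0 when the joint norm is already within the limit
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grad_list]
    return clipped, global_norm

# Hypothetical gradients for two parameters; their joint L2 norm is 13
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm_sketch(grads, clip_norm=1.0)

print(norm_before)
print(clipped)
```

Using one shared scale factor keeps the relative sizes of the gradients intact, so the update direction is unchanged and only its length is limited.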
