PyTorch: Custom Forward and Backward Functions
阿新 • Published: 2021-01-12
torch.autograd.Function
- Given random `x`, `y`, `w1`, `w2`, we predict `y` from the input `x` with the model `y_pred = ReLU(x · w1) · w2`, trained by gradient descent to minimize the squared Euclidean distance between `y_pred` and `y`.
- We redefine ReLU as a custom `torch.autograd.Function`, implementing both its forward pass and its backward pass (the gradients it must produce are hand-derived in the sketch below).
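Before the full program, here is a minimal self-contained sketch (not from the original post; the sizes and seed are made up for illustration) of the chain rule that `loss.backward()` evaluates for this model, with the hand-derived gradients checked against autograd:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 6)
y = torch.randn(4, 3)
w1 = torch.randn(6, 5, requires_grad=True)
w2 = torch.randn(5, 3, requires_grad=True)

# Forward pass (built-in ReLU here; the custom MyReLU below computes the same thing).
h = x.mm(w1)             # pre-activation
h_relu = h.clamp(min=0)  # ReLU
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()          # autograd's gradients, for comparison

# The same gradients by hand, via the chain rule.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)      # dL/dy_pred
    grad_w2 = h_relu.t().mm(grad_y_pred)  # dL/dw2
    grad_h = grad_y_pred.mm(w2.t())       # dL/dh_relu
    grad_h[h < 0] = 0                     # ReLU backward: zero where input < 0
    grad_w1 = x.t().mm(grad_h)            # dL/dw1

print(torch.allclose(grad_w1, w1.grad, atol=1e-5),
      torch.allclose(grad_w2, w2.grad, atol=1e-5))  # True True
```

The `grad_h[h < 0] = 0` line is exactly what `MyReLU.backward` has to implement below: pass the incoming gradient through where the input was positive and zero it elsewhere.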
```python
import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward
    passes which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and
        return a Tensor containing the output. ctx is a context object that
        can be used to stash information for backward computation. You can
        cache arbitrary objects for use in the backward pass using the
        ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of
        the loss with respect to the output, and we need to compute the
        gradient of the loss with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU
# torch.backends.cuda.matmul.allow_tf32 = False  # Uncomment this when running on GPU
# The above line disables TensorFloat32. This is a feature that allows
# networks to run at a much faster speed while sacrificing precision.
# Although TensorFloat32 works well on most real models, for our toy model
# in this tutorial, the sacrificed precision causes convergence issues.
# For more information, see:
# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

lr = 1e-6
relu = MyReLU.apply  # use Function.apply, never instantiate the class directly

for i in range(500):
    # Forward pass: compute predicted y using our custom ReLU.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss.
    loss = (y_pred - y).pow(2).sum()
    if i % 100 == 99:
        print(i, loss.item())

    # Backward pass: autograd calls MyReLU.backward for us.
    loss.backward()

    # Parameters are normally updated with `optim.step()`, which updates the
    # `model.parameters()` registered in `optim`. Since we don't use an
    # optimizer here, we update the weights manually. Note that no new
    # gradients are computed in this block; we only apply the gradients
    # that have already been computed.
    with torch.no_grad():
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad

        # Manually zero the gradients after updating the weights.
        w1.grad.zero_()
        w2.grad.zero_()
```
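A quick way to validate a custom backward (a sanity check not in the original post) is `torch.autograd.gradcheck`, which compares the analytical gradient returned by `MyReLU.backward` against numerical finite differences. It requires double-precision inputs, and random inputs almost surely avoid ReLU's non-differentiable point at 0:

```python
# Numerically verify MyReLU.backward (assumes the MyReLU class defined above).
inp = torch.randn(20, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(MyReLU.apply, (inp,), eps=1e-6, atol=1e-4))  # True
```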
Output:
```
99 952.6715087890625
199 6.376166820526123
299 0.06997707486152649
399 0.0012868450721725821
499 0.00012174161383882165
```
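The comment in the training loop notes that parameters are normally updated via `optim.step()`. For comparison, here is a minimal sketch (assuming the same `x`, `y`, `w1`, `w2`, and `relu` as above) of the loop rewritten around `torch.optim.SGD`; plain tensors with `requires_grad=True` can be passed to an optimizer directly:

```python
# Hypothetical variant of the loop above, letting the optimizer do the update.
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

for i in range(500):
    y_pred = relu(x.mm(w1)).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    optimizer.zero_grad()  # replaces w1.grad.zero_() / w2.grad.zero_()
    loss.backward()
    optimizer.step()       # replaces the manual `w -= lr * w.grad` updates
```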