- The tutorial has links to Jupyter notebooks and Google Colab notebooks!
- If running in Colab, be sure to enable the GPU (Runtime > Change runtime type)
Summaries / key takeaways
Make sure you know how to do the following (a quick sketch follows this list):
- Construct empty, random, all-ones, and all-zeros tensors
- Specify the tensor datatype
- Create a tensor with the same shape as another tensor, but a different datatype
- Look up tensor operations
- Add, subtract, and multiply tensors
- Convert between NumPy arrays and tensors
- Move tensors between CPU and GPU
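A minimal sketch of those operations (the shapes and values here are arbitrary examples, not from the tutorial):
import numpy as np
import torch

# Construct empty, random, all-ones, and all-zeros tensors
x = torch.empty(3, 4)            # uninitialized memory
r = torch.rand(3, 4)             # uniform random in [0, 1)
ones = torch.ones(3, 4)
zeros = torch.zeros(3, 4)

# Specify the datatype
longs = torch.zeros(3, 4, dtype=torch.long)

# Same shape as another tensor, but a different datatype
floats = torch.randn_like(longs, dtype=torch.float)

# Add, subtract, and multiply (element-wise); see the torch docs for the full list of ops
total = ones + r
diff = ones - r
prod = ones * r

# Convert between NumPy arrays and tensors
arr = total.numpy()
t = torch.from_numpy(np.ones(5))

# Move tensors between CPU and GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_gpu = total.to(device)
back_on_cpu = on_gpu.to("cpu")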
- Useful takeaways from this tutorial:
- requires_grad determines whether the gradient is computed for a tensor, and with torch.no_grad() temporarily turns gradient tracking off (sketch below).
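A small sketch of both in action (nothing here beyond plain torch):
import torch

x = torch.ones(2, 2, requires_grad=True)   # requires_grad: track operations on x
y = (x * 3).sum()
y.backward()                                # fills in x.grad
print(x.grad)                               # a 2x2 tensor of 3s

with torch.no_grad():                       # gradients are not tracked inside this block
    z = x * 3
print(z.requires_grad)                      # False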
- Don’t know how important the other stuff is. Part 3 will go into how weight updates etc. are done in practice.
- Notice how they define the network as a class.
- Dense (fully connected) layers are called “Linear” (nn.Linear)
- What the network actually does is in “forward”
- Layers that don’t need weights (e.g. max pooling) aren’t members of the class. E.g. F.max_pool2d is used in forward but not defined in __init__.
- .parameters() gets the weights (see the sketch below)
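A minimal sketch of that pattern (the layer sizes and the 28x28 input are made-up examples):
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers with weights live in __init__
        self.conv = nn.Conv2d(1, 8, kernel_size=3)
        self.fc = nn.Linear(8 * 13 * 13, 10)    # a dense (“Linear”) layer

    def forward(self, x):
        # What the network actually does is here; max pooling has no weights,
        # so it is called from F rather than stored as a member
        x = F.max_pool2d(F.relu(self.conv(x)), 2)
        x = torch.flatten(x, 1)
        return self.fc(x)

net = Net()
out = net(torch.randn(1, 1, 28, 28))
print(sum(p.numel() for p in net.parameters()))  # .parameters() yields the weights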
- Using zero_grad at the start of each training loop iteration is important. I copied the training loop from the end of the document; it’s worth remembering:
optimizer.zero_grad() # Zero gradients
output = net(input) # Forward pass
loss = criterion(output, target) # Compute loss
loss.backward() # Backprop
optimizer.step() # Update step
- DataLoader and Dataset. Basically, you create a Dataset that provides individual examples, and a DataLoader wraps it to handle shuffling and batching (sketch below).
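A tiny sketch of the pattern (SquaresDataset is a made-up example, not from the tutorial):
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    # Provides (x, x**2) pairs; only __len__ and __getitem__ are required
    def __init__(self, n=100):
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        self.y = self.x ** 2

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(SquaresDataset(), batch_size=16, shuffle=True)
for xb, yb in loader:    # each iteration yields a shuffled batch
    pass                 # training step would go here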
- DataParallel splits batches and distributes them among GPUs. This can be useful since CSUA allows 2 GPUs per person by default!
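A sketch of the wrapping step (net here is just a stand-in model; the rest of the training loop stays the same):
import torch
import torch.nn as nn

net = nn.Linear(10, 2)                     # stand-in for any model

if torch.cuda.device_count() > 1:          # e.g. the 2 GPUs per person on CSUA machines
    net = nn.DataParallel(net)             # splits each batch across the visible GPUs
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = net.to(device)

x = torch.randn(64, 10).to(device)
out = net(x)                               # the batch is scattered and the outputs gathered automatically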