A transposed convolution works much like an ordinary convolution. It is typically used when we want an output map whose spatial dimensions (width and height) are larger than those of the input, with the input-to-output mapping learned (through filters/kernels) in the way that best suits the problem at hand. Common examples that use transposed convolutions (often loosely called deconvolutions) are convolutional networks for segmentation (e.g., segmenting cars, pedestrians, and sidewalks for self-driving vehicles).
(I took these GIFs and part of the explanation from this Stack Exchange answer)
Consider a transposed convolution with a stride of 1 (i.e., moving 1 unit at a time over the input map), a 2x2 input map (blue), and a single 3x3 kernel (gray, sliding over the input map): the result is the 4x4 output map (green). The white regions are zero padding (a border of zeros added so that the convolution can be computed).
With a stride of 2, we would have:
In this case, the stride determines how much spacing is placed between the units of the input map.
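To make the two cases concrete, here is a minimal pure-Python sketch of a single-channel transposed convolution without padding. It uses the equivalent "scatter" view (each input value stamps a scaled copy of the kernel onto the output) instead of the sliding-kernel view in the GIFs; the two views give the same output shape and differ only in how the kernel is oriented. The function name `conv2d_transpose_naive` and the sample values are made up for illustration and are not part of Keras.

```python
def conv2d_transpose_naive(x, kernel, stride=1):
    """Naive single-channel transposed convolution, no padding.

    x and kernel are lists of lists of numbers. Each input value scales
    a copy of the kernel, which is added into the output starting at
    position (row * stride, col * stride).
    """
    in_rows, in_cols = len(x), len(x[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (in_rows - 1) * stride + k_rows
    out_cols = (in_cols - 1) * stride + k_cols
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(in_rows):
        for j in range(in_cols):
            for a in range(k_rows):
                for b in range(k_cols):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


# 2x2 input and 3x3 kernel, as in the examples above (values are arbitrary).
x = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0] * 3 for _ in range(3)]

out_s1 = conv2d_transpose_naive(x, kernel, stride=1)
out_s2 = conv2d_transpose_naive(x, kernel, stride=2)
print(len(out_s1), len(out_s1[0]))  # 4 4
print(len(out_s2), len(out_s2[0]))  # 5 5
```

With a larger stride the stamped copies of the kernel land further apart, which is why the output grows from 4x4 to 5x5 for the same input and kernel.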
Now on to the Keras parameters:
- filters: the number of filters/kernels that will be learned in this layer. In the examples above, there was only 1 filter. This parameter also sets the number of channels in the output map. If, in your case, you want the output map to have 3 channels, then filters = 3;
- kernel_size: the size of the filter, just as in an ordinary convolution. In the example, the kernel was 3x3;
- strides: the jump/spacing that will be used;
- padding: "valid" or "same"; indicates whether or not to use padding (a border of zeros) around the input map. "valid" means the kernel is only placed at valid positions (i.e., where every kernel position falls on top of an input map position). In practice, "valid" may ignore some pixels at the edges of the input map. "same" adds the padding needed to place the filter at the first position of the input map;
- output_padding: extra padding added along the height and width of the output map.
The other parameters are common to convolutional layers.
At the end of the link you sent in the question, there is a formula for the size of the final map, given the initial map and the parameters:
new_rows = ((rows - 1) * strides[0] + kernel_size[0]
- 2 * padding[0] + output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1]
- 2 * padding[1] + output_padding[1])
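The formula takes a numeric padding, while the layer takes "valid" or "same". The sketch below applies the formula along one axis under the assumption that Keras maps "same" to a padding of kernel_size // 2 and "valid" to 0, and that, when output_padding is left unset, it infers the output size as input * stride for "same" and input * stride + max(kernel_size - stride, 0) for "valid". That is my understanding of the Keras behaviour, not something taken from the link; the helper name `transposed_output_length` is hypothetical.

```python
def transposed_output_length(size, kernel_size, stride, padding, output_padding=None):
    """Hypothetical helper: output length along one spatial axis."""
    if padding not in ("valid", "same"):
        raise ValueError("padding must be 'valid' or 'same'")
    if output_padding is None:
        # Assumed Keras behaviour when output_padding is left unset.
        if padding == "same":
            return size * stride
        return size * stride + max(kernel_size - stride, 0)
    # Explicit output_padding: apply the formula above directly.
    pad = kernel_size // 2 if padding == "same" else 0
    return (size - 1) * stride + kernel_size - 2 * pad + output_padding


print(transposed_output_length(2, 3, 1, "valid"))  # 4
print(transposed_output_length(2, 3, 2, "valid"))  # 5
print(transposed_output_length(4, 5, 3, "same"))   # 12
```

The first two calls reproduce the 2x2 input with a 3x3 kernel at strides 1 and 2; the third shows a 4-unit axis growing to 12 with a stride of 3 and "same" padding.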
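For example, to turn a 4x4x3 input map into a 12x12x3 one, you can use the following (with padding="same", the output size is the input size times the stride, so 4 * 3 = 12):
output = Conv2DTranspose(filters=3, kernel_size=(5,5), strides=3, padding="same")(x)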
Thank you for the reply, it was very enlightening.
– Beto