encoder/decoder/discriminator composition
Unless there is some hidden reasoning, the current composition of the three components seems rather random. In particular:
- encoder: a narrow layer of 256 neurons followed by a wider one (256)
- decoder: an even wider layer first (1024), followed by two narrower ones (512)
- consequently, the encoder and decoder are asymmetric, which should not happen (the decoder is conventionally a mirror image of the encoder)
- discriminator: given its very narrow input, it's pointless to make it so huge (512-256-256)
- a seemingly random mix of activation functions
- specifically, is there a reason for using sigmoid? It saturates easily, and the resulting vanishing gradients are well known to slow down or block convergence
For all of these we either need strong reasons, or, if we simply don't know, we should replace them with something more regular (a rough sketch of what I mean follows below).
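To make the "more regular" point concrete, here is a minimal PyTorch sketch; all widths, `input_dim` and `latent_dim` are made-up placeholders, not values taken from our code. The idea is a mirrored encoder/decoder, one activation (ReLU) throughout, and a discriminator kept slim because its input is narrow:

```python
import torch.nn as nn

# Hypothetical "regular" layout, for illustration only:
# mirrored encoder/decoder, ReLU everywhere, slim discriminator.
latent_dim = 32
input_dim = 784   # assumed flattened input size, adjust to the real data

encoder = nn.Sequential(
    nn.Linear(input_dim, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)

decoder = nn.Sequential(              # exact mirror of the encoder
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, input_dim),
)

discriminator = nn.Sequential(            # small, since its input is only
    nn.Linear(latent_dim, 64), nn.ReLU(),  # latent_dim wide; it returns a
    nn.Linear(64, 1),                      # logit (use BCEWithLogitsLoss)
)
```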
Moreover, IMHO the layer sizes should not be hard-coded; they should somehow reflect the size of the input, e.g. be derived from it as in the second sketch below.
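A sketch of what "derived from the input size" could mean in practice (again hypothetical; the function name and the `shrink` factor are mine, not anything in the repo):

```python
import torch.nn as nn

def make_autoencoder(input_dim: int, latent_dim: int, shrink: int = 4):
    """Widths derived from input_dim instead of hard-coded constants.
    `shrink` (a made-up knob) controls how quickly the layers narrow."""
    h1 = max(latent_dim, input_dim // shrink)   # e.g. 784 -> 196
    h2 = max(latent_dim, h1 // shrink)          # e.g. 196 -> 49
    encoder = nn.Sequential(
        nn.Linear(input_dim, h1), nn.ReLU(),
        nn.Linear(h1, h2), nn.ReLU(),
        nn.Linear(h2, latent_dim),
    )
    decoder = nn.Sequential(                    # mirrored automatically
        nn.Linear(latent_dim, h2), nn.ReLU(),
        nn.Linear(h2, h1), nn.ReLU(),
        nn.Linear(h1, input_dim),
    )
    return encoder, decoder
```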