Is there any time complexity analysis depending on number of parameters for forward propagation and backpropagation? In simpler terms, given an architecture, can we comment whether forward propagation takes larger amount of time or backpropagation takes larger amount of time in PyTorch implementation?

Number of parameters is not determinant.

You can re-use a network as many times as u want, which implies a more complex gradient computation.

Besides gradient computation depends on the function used. There are gradients cheaper to compute than others.