Have you tried carefully studying the documentation? It has tons of examples that probably cover all of your questions. For example, your last question, 3), is clearly answered in the examples for NetTrain and in the related guide:
http://reference.wolfram.com/language/ref/NetTrain.html
http://reference.wolfram.com/language/guide/NeuralNetworks.html
Did you actually read through all the examples in the Predict documentation to get a sense of how it is used?
http://reference.wolfram.com/language/ref/Predict.html