Comprehension questions

These questions should be answerable using only the core resources above. Write your answers in a Google Doc.

  1. In a neural network, a _____ is a number that defines how much one neuron should influence another. Answer: Weight

  2. In a neural network, a _____ is a number that adds a constant amount to a neuron's activation. Answer: Bias

  3. What's the value of A, B and C in this neural network that uses the sigmoid function?

Answer: A = 0.6456563, B = 0.4255575, C = 0.93801480
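The network diagram itself isn't reproduced here, but the calculation can be sketched: the sigmoid function squashes a neuron's weighted input (pre-activation) into the range (0, 1). The pre-activation values 0.6 and -0.3 below are assumptions chosen to reproduce A and B above; the diagram's actual weights and inputs would determine them.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + math.exp(-z))

# Pre-activations of 0.6 and -0.3 (assumed) reproduce A and B:
print(round(sigmoid(0.6), 7))   # 0.6456563  (= A)
print(round(sigmoid(-0.3), 7))  # 0.4255575  (= B)
```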

  1. What does the cost function represent when training a neural network? Answer: The cost function, also known as the loss function, is a mathematical function that takes the model's prediction and the ground-truth (target) value as inputs and outputs the error, or loss, of that prediction. This error measures how far the model's predictions are from the targets, and therefore how much the model has learnt from the data. It is a relative metric, not an absolute one: you cannot meaningfully compare the loss of one model with the loss of a different model.
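As a minimal sketch of this idea, here is one common cost function, mean squared error; the prediction and target values are illustrative:

```python
def mse(predictions, targets):
    # Mean squared error: the average of the squared prediction errors.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Predictions close to the targets give a small loss;
# predictions far from the targets give a larger one.
close = mse([0.9, 0.1], [1.0, 0.0])  # ~0.01
far = mse([0.5, 0.5], [1.0, 0.0])    # ~0.25
```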

  2. Why does minimising the cost function improve the neural network's performance? Answer: Minimising the cost function improves the network's performance because it reduces the difference between the predicted outputs and the actual target values. Training is an iterative process in which an optimisation algorithm such as gradient descent adjusts the network's weights and biases: the gradient of the cost function with respect to each parameter tells the network how to make small adjustments that reduce the error. As the cost decreases, the network's predictions become more accurate, improving its performance on the task at hand. The gradients themselves are computed efficiently by backpropagation, which is fundamental to training neural networks.
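As a toy illustration of cost minimisation (the one-weight model `y_hat = w * x` and the data are hypothetical), the loss falls toward zero as gradient descent repeatedly adjusts the weight:

```python
# Toy model: y_hat = w * x, trained to fit y = 2x with squared error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05

def loss(w):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

history = [loss(w)]
for _ in range(20):
    # dL/dw for squared error: the mean of 2 * (w*x - y) * x.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad
    history.append(loss(w))

# The cost shrinks toward 0 as w approaches the best value, 2.
print(history[0], history[-1])
```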

  3. How does gradient descent work - i.e. minimise the cost function? (at a high level) Answer: Gradient descent is an optimization algorithm used to minimize the cost function in neural networks. At a high level, it works by calculating the gradient (or derivative) of the cost function with respect to the network's parameters (weights and biases). The gradient indicates the direction and rate of the steepest increase in the cost function. By moving in the opposite direction of the gradient, we can reduce the cost. In each iteration, the weights and biases are updated by subtracting a fraction of the gradient (controlled by the learning rate), gradually leading to a minimum cost and improved network performance.
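The update rule described above can be sketched on a one-parameter cost function (the function and learning rate here are illustrative):

```python
# Minimise f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
w = 0.0
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient  # step in the opposite direction of the gradient

print(w)  # ~3.0, the minimum of f
```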

  4. Does gradient descent guarantee finding the best model? Why or why not? Answer: Gradient descent does not guarantee finding the best model due to issues like local minima, saddle points, the choice of learning rate, and the initial weights.

Advanced techniques such as momentum, adaptive learning rates (e.g., Adam, RMSprop), and ensemble methods are often used to improve the chances of finding a better model.
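The local-minimum problem can be demonstrated on a hypothetical one-dimensional cost surface with two minima: depending on where it starts, plain gradient descent settles into different minima with different final costs.

```python
def f(w):
    # A cost surface with two minima; the left one is lower (the global minimum).
    return (w**2 - 1)**2 + 0.3 * w

def grad(w):
    return 4 * w * (w**2 - 1) + 0.3

def descend(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_left, w_right = descend(-0.5), descend(0.5)
# The right-hand start gets stuck in the worse (local) minimum.
print(f(w_left), f(w_right))
```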

  1. Explain how this excerpt from our website might be turned into data to train an LLM: “Axiom Futures was founded in March 2024 in India”

Answer: To turn this excerpt from the website into data to train an LLM, we first have to consider what type of LLM we want to build. In this case, let's assume we are building a causal LLM, where text generation, i.e. next-token prediction, is the goal. Next-token prediction is a self-supervised, autoregressive task: the token predicted from the previous sequence is appended to that sequence, and the result becomes the input sequence for predicting the next token.
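The setup described above can be sketched with a toy whitespace tokenizer (real LLMs use subword tokenizers such as BPE; everything here is illustrative):

```python
text = "Axiom Futures was founded in March 2024 in India"

# Toy whitespace "tokenizer" -- real pipelines use subword tokenizers (e.g. BPE).
tokens = text.split()

# Self-supervised next-token pairs: each prefix predicts the token that follows it,
# so the text provides its own labels with no separate annotation needed.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs[:3]:
    print(context, "->", target)
# ['Axiom'] -> Futures
# ['Axiom', 'Futures'] -> was
# ['Axiom', 'Futures', 'was'] -> founded
```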

Since this task needs no separate labelling, we can proceed straight to data processing, which involves the following steps.