Masters Dissertation
Once upon a time, we did not have ChatGPT and other LLMs. Back then, language models were an active research area, but nobody expected the capabilities they have today. For my masters dissertation, I investigated a novel method for improving language models using synthetic datasets. Synthetic data is now ubiquitous in LLM fine-tuning, so this project reflects early thinking in a direction the field later embraced.
The project explored whether neural networks could generalize over language and, by extension, perform tasks specified in natural language. At the time this was widely seen as a pipe dream, but I was convinced it was possible; as we now know, sufficiently large language models can indeed generalize in this way. The dissertation also examined synthetic datasets as a means of improving the training of these models. The code and paper can be found here, and the paper is also available below: