Masters Dissertation
Once upon a time, we did not have ChatGPT and other LLMs. Back then, language models were an active research area, but nobody expected the capabilities they have today. For my masters dissertation, I investigated a novel method for improving language models using synthetic datasets. Synthetic data is now ubiquitous in LLM fine-tuning, so this project reflects early thinking in a direction the field later embraced.
The project explored whether neural networks could generalize over language and, by extension, perform tasks specified in natural language. At the time this was widely seen as a pipe dream, but I was convinced it was possible; as we now know, sufficiently large language models can indeed generalize in this way. The dissertation also examined synthetic datasets as a means of improving the training of these models. The code and paper can be found here, and the paper is also available below: