Browsing: Yannic Kilcher
Paper: Abstract: While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.…
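The quadratic cost the abstract refers to comes from the n x n attention score matrix; the paper's remedy is a fixed-size compressive memory. Below is a minimal numpy sketch contrasting the two, assuming a generic linear-attention-style memory update; the feature map and update rule are simplifications, not the paper's exact formulation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (n x n) score matrix grows quadratically
    # with sequence length n, which is what limits long-context processing.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def compressive_memory_attention(Q, K, V, M=None, z=None):
    # Linear-attention-style memory: a fixed-size (d x d) state summarizes
    # past keys/values, so per-segment cost stays constant in sequence length.
    # Simplified stand-in for the paper's compressive memory, not its exact form.
    phi = lambda x: np.maximum(x, 0.0) + 1.0   # positive feature map (assumption)
    d = Q.shape[-1]
    if M is None:
        M, z = np.zeros((d, d)), np.zeros(d)
    out = (phi(Q) @ M) / ((phi(Q) @ z) + 1e-6)[:, None]  # read from memory
    M = M + phi(K).T @ V                                  # write current segment
    z = z + phi(K).sum(axis=0)
    return out, M, z

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d))
print(softmax_attention(Q, K, V).shape)                # (8, 4), via an 8x8 score matrix
print(compressive_memory_attention(Q, K, V)[0].shape)  # (8, 4), via a 4x4 memory state
```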
OUTLINE: 0:00 – Intro 0:19 – Our next-generation Meta Training and Inference Accelerator 01:39 – ALOHA Unleashed 03:10 – Apple…
Paper: Abstract: While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for…
#gpt4o #sky #scarlettjohansson After the release of their flagship model GPT-4o, OpenAI finds itself in multiple controversies and an exodus…
xLSTM is an architecture that combines the recurrence and constant memory requirements of LSTMs with the large-scale training of transformers…
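To illustrate the "recurrence with constant memory" idea, here is a toy numpy step loosely modeled on the paper's matrix-memory (mLSTM) cell; the gate values and normalization are simplified assumptions rather than the actual xLSTM parameterization.

```python
import numpy as np

def mlstm_style_step(C, n, q, k, v, i_gate=1.0, f_gate=0.9, o_gate=1.0):
    """One recurrent step with a fixed-size matrix memory, loosely following
    the mLSTM idea described in the xLSTM paper (gating details simplified)."""
    C = f_gate * C + i_gate * np.outer(v, k)     # write key/value pair into matrix memory
    n = f_gate * n + i_gate * k                  # normalizer state
    h = o_gate * (C @ q) / max(abs(n @ q), 1.0)  # read-out with the query
    return C, n, h

d = 4
C, n = np.zeros((d, d)), np.zeros(d)
rng = np.random.default_rng(0)
for _ in range(1000):        # state stays (d x d) no matter how long the sequence gets
    q, k, v = rng.standard_normal((3, d))
    C, n, h = mlstm_style_step(C, n, q, k, v)
print(C.shape, h.shape)      # (4, 4) (4,) -- constant memory, unlike a growing KV cache
```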
#rag #hallucinations #legaltech An in-depth look at a recent Stanford paper examining the degree of hallucinations in various LegalTech tools…
Matrix multiplications (MatMuls) are pervasive throughout modern machine learning architectures. However, they are also very resource-intensive and require special…
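As a toy illustration of why removing MatMuls can help: if weights are quantized to {-1, 0, +1}, a matrix-vector product reduces to additions and subtractions. The quantizer below is a generic BitNet-style sketch, not necessarily the paper's exact scheme.

```python
import numpy as np

def ternary_quantize(W):
    # Round weights to {-1, 0, +1} with one scale per matrix
    # (a generic BitNet-style scheme; the paper's exact quantizer may differ).
    scale = np.mean(np.abs(W)) + 1e-8
    return np.clip(np.round(W / scale), -1, 1), scale

def ternary_matvec(W_t, scale, x):
    # With ternary weights the "matmul" needs no multiplications:
    # each output element is a sum of selected inputs minus another sum.
    pos = (W_t == 1).astype(x.dtype) @ x    # add inputs where the weight is +1
    neg = (W_t == -1).astype(x.dtype) @ x   # subtract inputs where the weight is -1
    return scale * (pos - neg)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((8, 16))
x = rng.standard_normal(16)
W_t, s = ternary_quantize(W)
print(np.max(np.abs(W @ x - ternary_matvec(W_t, s, x))))  # small approximation error
```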
#llm #privacy #finetuning Can you tamper with a base model in such a way that it will exactly remember its…
How can one best use extra FLOPs at test time? Paper: Abstract: Enabling LLMs to improve their outputs by using…
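One simple way to spend extra FLOPs at inference is best-of-N sampling against a verifier; the paper studies several such strategies, and the sketch below only illustrates the general pattern with placeholder generate and verifier_score functions (both are assumptions, not the paper's models).

```python
import random

def generate(prompt):
    # Placeholder for sampling one candidate answer from an LLM (assumption).
    return f"candidate-{random.randint(0, 9)}"

def verifier_score(prompt, answer):
    # Placeholder for a learned verifier / reward model (assumption).
    return random.random()

def best_of_n(prompt, n):
    # Spend extra inference-time FLOPs by sampling n candidates and keeping
    # the one the verifier ranks highest -- one simple way to trade more
    # test-time compute for better outputs.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(prompt, a))

print(best_of_n("What is 17 * 24?", n=8))
```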
This paper poses the interesting question: How much of the performance of Mamba, S4, and other state-space-like models is actually…
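For reference, S4- and Mamba-style layers are built around the discrete linear state-space recurrence below; this toy dense version is only meant to show the mechanism the question is about, not any specific model's parameterization.

```python
import numpy as np

def linear_ssm(A, B, C, u):
    # Discrete linear state-space recurrence that S4/Mamba-style layers build on:
    #   x_t = A x_{t-1} + B u_t,   y_t = C x_t
    # (toy dense version; the real models use structured or input-dependent A).
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

rng = np.random.default_rng(0)
d_state, T = 4, 10
A = 0.9 * np.eye(d_state)            # stable toy transition matrix
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
u = rng.standard_normal(T)           # scalar input sequence
print(linear_ssm(A, B, C, u).shape)  # (10,)
```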