We release OLMoTrace, a tool that lets you trace the outputs of language models back to their full, multi-trillion-token training data in real time. We developed OLMoTrace to improve transparency and trust in LLMs.
On top of a standard chatbot experience, OLMoTrace highlights long spans of LLM output that appear verbatim in the model’s training data, and shows the matching training documents. With OLMoTrace, you can see how LLMs may have learned to generate certain sequences of tokens. OLMoTrace is useful for fact checking ✅, understanding hallucinations 🎃, tracing LLM-generated “creative” expressions 🧑‍🎨, tracing reasoning capabilities 🧮, or just generally helping you understand why LLMs say certain things.
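To make the idea of verbatim matching concrete, here is a toy sketch in Python. It is not OLMoTrace's implementation: it scans a tiny in-memory corpus by brute force, which would not scale to trillions of tokens. The function names, the whitespace tokenization, and the `min_len` threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def contains(doc: Sequence[str], span: Sequence[str]) -> bool:
    """Return True if `span` occurs as a contiguous run of tokens in `doc`."""
    n = len(span)
    return any(list(doc[j:j + n]) == list(span) for j in range(len(doc) - n + 1))


def find_verbatim_spans(
    output_tokens: Sequence[str],
    corpus_docs: Sequence[Sequence[str]],
    min_len: int = 3,  # hypothetical threshold for what counts as a "long" span
) -> List[Tuple[int, int, int]]:
    """Find spans of the output that appear verbatim in some corpus document.

    Returns (start, end, doc_id) triples with `end` exclusive. For each start
    position only the longest match is considered, and spans nested inside an
    earlier reported span are skipped.
    """
    spans = []
    furthest_end = 0  # end of the furthest-reaching span reported so far
    for start in range(len(output_tokens)):
        best_end, best_doc = start, None
        for doc_id, doc in enumerate(corpus_docs):
            end = start
            # Extend the candidate span one token at a time while it still matches.
            while end < len(output_tokens) and contains(doc, output_tokens[start:end + 1]):
                end += 1
            if end > best_end:
                best_end, best_doc = end, doc_id
        if best_end - start >= min_len and best_end > furthest_end:
            spans.append((start, best_end, best_doc))
            furthest_end = best_end
    return spans


# Toy "training data" and "model output", tokenized by whitespace (an assumption).
corpus = [
    "the quick brown fox jumps".split(),
    "over the lazy dog".split(),
]
output = "a quick brown fox ran over the lazy cat".split()

for start, end, doc_id in find_verbatim_spans(output, corpus):
    print(" ".join(output[start:end]), "-> document", doc_id)
# prints:
# quick brown fox -> document 0
# over the lazy -> document 1
```

Each reported span corresponds to a highlight in the chat output, and the document id stands in for the matching training document shown alongside it.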
OLMoTrace is now available for the OLMo 2 and OLMoE families of models on Ai2 Playground. We have also open-sourced our code so that anyone can enable OLMoTrace with their model’s training data.
Paper: https://allenai.org/papers/olmotrace
Blog: https://allenai.org/blog/olmotrace
Try OLMoTrace on Ai2 Playground: https://playground.allenai.org