The team found that ScholarCopilot outperforms most baselines, achieving a top-1 retrieval accuracy of 40.1%. “The system has a 40% chance of guessing what citation you need correctly on the first try,” explains Wang.
They also conducted a user study, recruiting 10 students from various academic backgrounds who had experience with AI writing tools. Each participant wrote a paper on a topic within their expertise and then ranked ScholarCopilot against ChatGPT in several categories, including citation quality and user experience.
ScholarCopilot outranked ChatGPT in all categories, receiving a 100% approval rating for citation quality, particularly citation accuracy and content quality. Around 80% of the participants stated they would use it in the future.
These results were surprising because ScholarCopilot is a 7-billion-parameter model, which is much smaller than most leading models, including ChatGPT and Claude. The key to ScholarCopilot’s success was its tailored training set, which comprised 500,000 academic papers, whereas most AI tools are trained on a wide range of topics to support general use.
Overall, these Waterloo researchers are creating paradigm shifts in AI and academia, saving time and easing the stress of writing gruelling papers.
Despite public discourse on AI impairing students’ skills, Wang emphasizes that “ScholarCopilot is designed not to replace students’ writing but rather assist them in handling mechanical tasks such as finding citations.”
“This allows students to focus more on critical tasks like reading, analytical, critical thinking and generating original ideas. Our design also encourages active human-AI collaboration, enabling students to remain fully engaged in the learning and writing process.”
Recently, the team open-sourced their work, allowing users to download their demo.
The research, ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations, was published in April.