W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models

arXiv:2504.15983v1
Abstract: The demand for efficient natural language processing (NLP) systems has led to the development of lightweight language models. Previous work in this area has primarily focused on manual design or training-based neural architecture search (NAS) methods. Recently, zero-shot NAS methods have been proposed for evaluating language models without the need for training. However, prevailing approaches to zero-shot NAS often face challenges such as biased evaluation metrics and computational inefficiencies. In this paper, we introduce weight-weighted PCA (W-PCA), a novel zero-shot NAS method specifically tailored for lightweight language models. Our approach utilizes two evaluation proxies: the parameter count and the number of principal components with cumulative contribution exceeding $\eta$ in the feed-forward neural (FFN) layer. Additionally, by eliminating the need for gradient computations, we optimize the evaluation time, thus enhancing the efficiency of designing and evaluating lightweight language models. We conduct a comparative analysis on the GLUE and SQuAD datasets to evaluate our approach. The results demonstrate that our method significantly reduces training time compared to one-shot NAS methods and achieves higher scores in the testing phase compared to previous state-of-the-art training-based methods. Furthermore, we perform ranking evaluations on a dataset sampled from the FlexiBERT search space. Our approach exhibits superior ranking correlation and further reduces solving time compared to other zero-shot NAS methods that require gradient computation.
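As a rough illustration of the second proxy described in the abstract, the sketch below counts how many principal components are needed before their cumulative share of variance reaches $\eta$, using only a forward-pass activation matrix and no gradients. Several things here are assumptions rather than details taken from the abstract: the function names `pca_dim` and `w_pca_score`, the use of `>=` for the threshold, summing the counts over layers, and combining the two proxies as a simple product with the parameter count.

```python
import numpy as np


def pca_dim(hidden, eta=0.99):
    """Smallest k such that the top-k principal components explain
    at least `eta` of the total variance of `hidden`.

    hidden: array of shape (num_tokens, hidden_size), e.g. activations
    captured inside an FFN layer during a single forward pass.
    """
    # Center each feature so the SVD corresponds to PCA.
    x = hidden - hidden.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the PCA eigenvalues.
    s = np.linalg.svd(x, compute_uv=False)
    var = s ** 2
    total = var.sum()
    if total == 0:
        return 0  # constant input: no variance to explain
    ratio = np.cumsum(var) / total
    # First index whose cumulative ratio is >= eta, converted to a count.
    k = int(np.searchsorted(ratio, eta)) + 1
    # Guard against rounding leaving the final ratio slightly below eta.
    return min(k, len(var))


def w_pca_score(param_count, layer_hiddens, eta=0.99):
    """Hypothetical combination of the two proxies: parameter count
    times the summed PCA dimension over layers (an assumption)."""
    return param_count * sum(pca_dim(h, eta) for h in layer_hiddens)


# Toy activations: two orthogonal directions carrying 90% and 10% of the variance.
toy = np.array([
    [3.0, 1.0, 0.0],
    [-3.0, 1.0, 0.0],
    [3.0, -1.0, 0.0],
    [-3.0, -1.0, 0.0],
])
print(pca_dim(toy, eta=0.85))  # one component is enough
print(pca_dim(toy, eta=0.95))  # two components are needed
```

Because the computation needs only activations from a forward pass, candidate architectures could be ranked by such a score without any backward pass, which is the efficiency argument the abstract makes against gradient-based zero-shot proxies.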

