Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Tesla brings closure to head-scratching Cybertruck trim

OpenAI launches ‘Grove’ programme to mentor AI Entrepreneurs | Technology News

Floating Point Precision Affects AI Model Training Effectiveness_the_number_of

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
Yannic Kilcher

Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained)

By Advanced AI EditorApril 23, 20256 Comments3 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email



#wikipedia #reinforcementlearning #languagemodels

Transformers have come to overtake many domain-targeted custom models in a wide variety of fields, such as Natural Language Processing, Computer Vision, Generative Modelling, and recently also Reinforcement Learning. This paper looks at the Decision Transformer and shows that, surprisingly, pre-training the model on a language-modelling task significantly boosts its performance on Offline Reinforcement Learning. The resulting model achieves higher scores, can get away with less parameters, and exhibits superior scaling properties. This raises many questions about the fundamental connection between the domains of language and RL.

OUTLINE:
0:00 – Intro
1:35 – Paper Overview
7:35 – Offline Reinforcement Learning as Sequence Modelling
12:00 – Input Embedding Alignment & other additions
16:50 – Main experimental results
20:45 – Analysis of the attention patterns across models
32:25 – More experimental results (scaling properties, ablations, etc.)
37:30 – Final thoughts

Paper:
Code:
My Video on Decision Transformer:

Abstract:
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling with improved results as result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models on other domains (vision, language) when finetuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pretrained and GPT2 language models. We hope that this work not only brings light to the potentials of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.

Authors: Machel Reid, Yutaro Yamada, Shixiang Shane Gu

Links:
Merch:
TabNine Code Completion (Referral):
YouTube:
Twitter:
Discord:
BitChute:
LinkedIn:
BiliBili:

If you want to support me, the best thing to do is to share out the content 🙂

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

source

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous Article1,000,000,000 Parameter Super Resolution AI!
Next Article xAI’s Grok chatbot can now ‘see’ the world around it
Advanced AI Editor
  • Website

Related Posts

AGI is not coming!

August 9, 2025

Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

July 23, 2025

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

July 19, 2025

6 Comments

  1. cargo-ex-61 on September 6, 2025 9:42 pm

    доставка из китая цена карго перевозки из китая

  2. russkoe-porno-130 on September 8, 2025 8:49 am

    русское порно инцест порно русских мам

  3. promocod-iherb-372 on September 8, 2025 9:57 am

    Новые актуальные промокод на первый заказ iherb для выгодных покупок! Скидки на витамины, БАДы, косметику и товары для здоровья. Экономьте до 30% на заказах, используйте проверенные купоны и наслаждайтесь выгодным шопингом.

  4. kursovaya-rabota-107 on September 13, 2025 9:38 am

    курсовые под заказ срочные курсовые работы

  5. onlayn zaym 387 on September 13, 2025 10:14 am

    быстрый займ онлайн займ онлайн срочно

  6. zaym onlayn 667 on September 13, 2025 11:05 am

    взять займ онлайн без займы онлайн на карту

Leave A Reply

Latest Posts

Ohio Auction of Two Paintings Looted By Nazis Halted By Foundation

Lee Ufan Painting at Center of Bribery Investigation in Korea

Drought Reveals 40 Ancient Tombs in Northern Iraqi Reservoir

Artifacts Removed from Gaza Building Before Suspected Israeli Strike

Latest Posts

Tesla brings closure to head-scratching Cybertruck trim

September 13, 2025

OpenAI launches ‘Grove’ programme to mentor AI Entrepreneurs | Technology News

September 13, 2025

Floating Point Precision Affects AI Model Training Effectiveness_the_number_of

September 13, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Tesla brings closure to head-scratching Cybertruck trim
  • OpenAI launches ‘Grove’ programme to mentor AI Entrepreneurs | Technology News
  • Floating Point Precision Affects AI Model Training Effectiveness_the_number_of
  • Promoting the Implementation of Artificial Intelligence Technology in Manufacturing Scenarios_as_its_Tuo
  • OpenAI spending spree powering much of tech. Oracle latest example

Recent Comments

  1. Michaeltrups on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. KennethBiory on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  3. CoreyTinia on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  4. Brucesip on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  5. Scottsnili on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.