Advanced AI News

Testing DeepSeek R1-0528 on the M3 Ultra Mac Studio and Installing Local GGUF Models with Ollama on macOS

By Advanced AI Bot, May 29, 2025


DeepSeek released an updated version of their popular R1 reasoning model (version 0528) with – according to the company – increased benchmark performance, reduced hallucinations, and native support for function calling and JSON output. Early tests from Artificial Analysis report a nice bump in performance, putting it behind OpenAI’s o3 and o4-mini-high in their Intelligence Index benchmarks. The model is available in the official DeepSeek API, and open weights have been distributed on Hugging Face. I downloaded different quantized versions of the full model on my M3 Ultra Mac Studio, and here are some notes on how it went.

Last night, the folks at MLX Community uploaded a 4-bit quant of the full 685B parameter model, which is a ~350 GB download that I immediately tested in LM Studio, which has a built-in MLX interpreter for Apple Silicon. Loading the model in LM Studio lets you set a maximum context window size of 163,840 tokens; I attempted that, but the M3 Ultra with 512 GB of RAM couldn’t do it. Setting the context window to a much smaller 32,000 token size loaded the model with 363 GB of RAM used, but it was swiftly ejected from memory as soon as I tried to chat with it. Going down to a more reasonable 8,192 context window size did the trick. Using a prompt that asked R1-0528 to reformat a long text file (attached to the chat) produced these results:
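As a rough sanity check on those sizes, the footprint of the weights alone can be approximated as parameter count × bits per weight ÷ 8. The Python sketch below is a back-of-the-envelope estimate only: the function name is made up for illustration, it uses decimal gigabytes, and it ignores quantization metadata, the KV cache needed for the context window, and runtime overhead, all of which push real memory use higher.

```python
def weight_size_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


PARAMS = 685_000_000_000  # 685B parameters, as stated for the full model

for bits in (4, 8):
    print(f"{bits}-bit: ~{weight_size_gb(PARAMS, bits):.1f} GB")
```

This prints roughly 342.5 GB for 4-bit and 685.0 GB for 8-bit. The 4-bit figure is in line with the ~350 GB download; whatever is used beyond that at load time presumably goes to the context window and other overhead.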

Prompt: “This is the output of an advanced OCR process of a long PDF. Can you help me clean up cruft? Only remove repetitions, page markers, and other byproducts of OCR. Do NOT change the main body text and sections. Return the full text without truncating it. Explain what you did.”
Results: 15.74 tok/sec • 1344 tokens • 37.70s to first token

And here’s how the updated R1 thought behind the scenes:

Alright, let me tackle this user request step by step. The user shared OCR output from a PDF and wants it cleaned up without altering the main content. They specifically mentioned removing repetitions, page markers, and OCR artifacts while preserving all meaningful text sections.

Hmm, first I notice the citations contain overlapping content about iOS features like Dictation improvements and security enhancements. The key is to deduplicate without losing any unique details—for example, the “And More” section appears in both citations but with different endings. I’ll need to merge those carefully.

The user’s explicit request is clear, but their tone suggests urgency about getting a clean document—probably for professional use. They emphasized keeping all substantive text, so I should avoid over-editing. The repetition of features like “Improved speech recognition” across citations is a clear target for consolidation.

Looking closer, page markers (like “19”) and fragments (“## Overview | Key Features…”) are classic OCR artifacts. Those can go safely since they don’t add value. The headings seem consistent, so I’ll preserve their hierarchy while removing redundancies—like the duplicated “### Dictation” section.

Wait, should I explain why certain deletions happened? Yes—the user asked for explanations, so noting removed page numbers/repeats will build trust. Also, they didn’t mention formatting fixes, so I won’t touch line breaks or spacing unless it’s clearly an artifact (like random numbers mid-text).

Biggest challenge: ensuring the merged “And More” section captures both citations’ unique points (cursor animation + language indicator) without duplication. I’ll cross-check each sentence… done. Finally, a quick verification—no technical details were lost, all sections flow logically, and zero main text edits. Perfect!

The results were pretty good, and R1 reformatted the document as requested…or at least a portion of it: with the small context window, R1 couldn’t fit the entire document in its response, so the output was truncated.

A few hours ago, LM Studio also released a distilled version of R1-0528 – DeepSeek-R1-0528-Qwen3-8B. From their model page:

we distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3 8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B. This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial development focused on small-scale models.

I also attempted to load the full-blown version of R1-0528 in Ollama, my favorite CLI for locally-installed models that is also compatible with Simon Willison’s LLM CLI. When I looked earlier this morning, only “sharded” (i.e. multi-part) GGUF versions of the full 685B parameter model were available, specifically these by unsloth. That led me down a fascinating rabbit hole that involved learning how to download multiple GGUF open weights and merge them into a unified file that can be used with Ollama on macOS, which I’ll document here for posterity (including my future self).

First, before trying to merge multiple GGUFs, I had to install the llama.cpp CLI, and I did so with brew:

brew install llama.cpp

Since Homebrew doesn’t automatically add llama.cpp to your system PATH, I manually made a note of where the dedicated llama-gguf-split utility was located on my Mac, which would be:

/opt/homebrew/bin/llama-gguf-split

Then, I started downloading the 15 files required for the 8-bit version of R1-0528 I wanted to test – a 700 GB download that took a few hours to complete. After all the files were saved in my Downloads folder, I ran:

/opt/homebrew/bin/llama-gguf-split --merge DeepSeek-R1-0528-Q8_0-00001-of-00015.gguf outfile.gguf

After waiting several minutes, I ended up with a single 713 GB file called ‘outfile.gguf’ that I renamed to ‘DeepSeek-R1-0528-Q8_0’ and placed in my Home folder. That’s the unified model file we need to use in Ollama.
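With a download this large, it can be worth confirming that every shard is present before starting the merge. Here is a minimal Python sketch of such a pre-flight check, assuming the shards follow the `-00001-of-00015.gguf` naming seen in the merge command above. The helper names and the Downloads folder location are illustrative and not part of llama.cpp.

```python
from pathlib import Path


def shard_names(prefix: str, total: int) -> list[str]:
    """Expected shard filenames, e.g. prefix-00001-of-00015.gguf."""
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(folder: Path, prefix: str, total: int) -> list[str]:
    """Names of expected shards that are not present in the folder."""
    return [name for name in shard_names(prefix, total) if not (folder / name).is_file()]


def merge_command(tool: str, prefix: str, total: int, outfile: str) -> list[str]:
    """argv for the merge step: the tool is pointed at the first shard only."""
    return [tool, "--merge", shard_names(prefix, total)[0], outfile]


if __name__ == "__main__":
    folder = Path.home() / "Downloads"  # assumed download location
    missing = missing_shards(folder, "DeepSeek-R1-0528-Q8_0", 15)
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print(" ".join(merge_command("/opt/homebrew/bin/llama-gguf-split",
                                     "DeepSeek-R1-0528-Q8_0", 15, "outfile.gguf")))
```

The script only prints the merge command rather than running it, so nothing is written to disk until you decide to run the command yourself.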

Before doing that, however, I had to create a “model file” for Ollama to install the model – essentially, a template that tells Ollama what to load and how to use it. I wasn’t familiar with the structure of model files, so to get started, I copied a model file from one of the existing models I already had in Ollama with:

ollama show --modelfile deepseek-r1:1.5b >> Modelfile

I then grabbed the Modelfile file, opened it in TextEdit, and modified it with the necessary parameters and the local path of my GGUF file:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM DeepSeek-R1-0528-Q8_0

FROM /Users/viticci/GGUFs/DeepSeek-R1-0528-Q8_0.gguf
TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<|User|>{{ .Content }}
{{- else if eq .Role "assistant" }}<|Assistant|>
{{- if and $.IsThinkSet (and $last .Thinking) -}}
<think>
{{ .Thinking }}
</think>
{{- end }}{{ .Content }}{{- if not $last }}<|end▁of▁sentence|>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|Assistant|>
{{- if and $.IsThinkSet (not $.Think) -}}
<think>

</think>

{{ end }}
{{- end -}}
{{- end }}"""
PARAMETER stop <|begin▁of▁sentence|>
PARAMETER stop <|end▁of▁sentence|>
PARAMETER stop <|User|>
PARAMETER stop <|Assistant|>
SYSTEM You are a helpful assistant.
PARAMETER temperature 0.6
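The non-template part of that file is regular enough to generate programmatically when repeating the process for another quantization. The Python sketch below assembles only the FROM, PARAMETER, and SYSTEM lines shown above and deliberately leaves out the TEMPLATE block, which is best copied from an existing model as described. The function name is made up for illustration.

```python
def build_modelfile(gguf_path: str, stops: list[str], temperature: float, system: str) -> str:
    """Assemble a bare-bones Modelfile (without a TEMPLATE block) as a string."""
    lines = [f"FROM {gguf_path}"]
    lines += [f"PARAMETER stop {token}" for token in stops]
    lines.append(f"SYSTEM {system}")
    lines.append(f"PARAMETER temperature {temperature}")
    return "\n".join(lines) + "\n"


print(build_modelfile(
    "/Users/viticci/GGUFs/DeepSeek-R1-0528-Q8_0.gguf",
    ["<|begin▁of▁sentence|>", "<|end▁of▁sentence|>", "<|User|>", "<|Assistant|>"],
    0.6,
    "You are a helpful assistant.",
))
```

Swapping in a different GGUF path is then a one-argument change, with the stop tokens and temperature carried over unchanged.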

I saved the file, then ran:

ollama create DeepSeek-R1-0528-Q8_0 -f /Users/viticci/Desktop/Modelfile

Then I waited several minutes for Ollama to gather all the components and install the local model. At the end, Ollama showed a success message…

…but that’s when my lucky streak ended: I finally found a local model that couldn’t run on my maxed-out M3 Ultra Mac Studio! As it turns out, the 8-bit, 700 GB version of DeepSeek R1 is too big even for this machine. After trying to load the model in memory and going over 400 GB of RAM used, Ollama consistently terminated the session.

Fair enough – I imagined that the 8-bit version of R1 was too large for this machine to handle. I went back to the beginning, and downloaded the multi-part, 4-bit version of DeepSeek R1, which is considerably smaller at “only” 400 GB of storage instead of 700 GB. After repeating all of the above steps (with different filenames for this version of the model), I installed the smaller model and ran:

ollama run DeepSeek-R1-0528-Q4_K_M:latest --verbose

Then I waited for the model to load, and it worked.

I then asked a simple question:

What is MacStories?

And this was R1’s response, along with its thinking process and statistics about the response itself since I added the --verbose flag:

Okay, the user asked “What is MacStories?” This seems like a straightforward informational query about a tech-focused website. Hmm, judging by how basic the question is, they might be completely new to Apple ecosystem coverage or just heard the name somewhere. No complex context here – probably genuine curiosity without hidden layers. Let me mentally outline key aspects: First clarify it’s an independent site (not affiliated with Apple), then highlight their specialization in deep-dive content across all Apple platforms. Should emphasize their unique value proposition like Federico Viticci’s legendary iOS reviews and Club MacStories subscription model. Wait – important to mention they’re not just news regurgitators but creators of original workflows and app showcases. The automation tutorials are particularly signature. Oh, and must include that Marco Arment connection since Underflow is historically significant. For tone: Keep it enthusiastic but precise. They deserve props for longevity (founded 2009!) in this volatile niche. User likely wants concise yet comprehensive overview – no fluff, just meaty facts about their content pillars and influence. Better double-check my memory: Yes, still based in Italy despite global reach. And Club MacStories does have that exclusive automation system they developed… what was it called again? Actions! Right. Final structure: Definition → Founder → Content types → Business model → Special sauce (workflow focus). No need for criticism here since user just asked for basics.

**MacStories** is a well-respected **independent technology website and podcast network**, primarily focused on in-depth coverage of Apple’s ecosystem, including:

1. **Apple Hardware & Software:** Detailed reviews, tutorials, and news about iPhones, iPads, Macs, Apple Watch, Apple TV, iOS/iPadOS, macOS, watchOS, tvOS.
2. **Third-Party Apps:** Extensive app discovery, reviews (especially for productivity and creative tools), developer interviews, and curation of high-quality software.
3. **Workflows & Automation:** A major specialty is exploring how to use Apple devices effectively, particularly through automation using Shortcuts on iOS/iPadOS/Mac and other tools.
4. **Apple Services:** Coverage of services like iCloud+, Apple Music, Apple TV+, etc.

**Key Distinguishing Features:**

– **Depth & Quality:** Known for exceptionally thorough reviews (especially Federico Viticci’s multi-thousand-word iPad Pro reviews) and insightful analysis beyond basic news reporting.
– **Focus on Productivity & Creativity:** Strong emphasis on how users can leverage technology to get work done, create content, and streamline tasks. Automation is a central theme.
– **Founder-Driven:** Founded in 2009 by Federico Viticci (Editor-in-Chief), who remains its most prominent voice alongside Managing Editor John Voorhees and other talented writers/editors.
– **Club MacStories:** A popular subscription service offering exclusive content, including:

– Weekly newsletters (MacStories Weekly & AppStories Weekly)
– In-depth articles and tutorials
– Access to a powerful Shortcuts automation library (“Actions”)
– Exclusive podcast episodes (AppStories)

– **Podcasts:** Hosts podcasts like Connected (Federico Viticci, Myke Hurley, Stephen Hackett), MacStories Unwind, and the Club-exclusive AppStories.
– **Origin of Underflow:** The concept for Marco Arment’s influential iOS app discovery platform “Underflow” originated from an idea first discussed by Federico Viticci on MacStories.

**In essence, MacStories is a go-to resource for Apple enthusiasts, power users, and professionals seeking deep dives into Apple products, apps (especially productivity-focused), automation techniques, and thoughtful perspectives on how to make the most of their technology.** It’s known for its high-quality writing, editorial independence, and passionate community.

And here are the stats:

total duration: 1m46.627582375s
load duration: 30.969542ms
prompt eval count: 15 token(s)
prompt eval duration: 52.491389792s
prompt eval rate: 0.29 tokens/s
eval count: 810 token(s)
eval duration: 54.103737291s
eval rate: 14.97 tokens/s
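The two rate lines follow directly from the counts and durations. Here is a small Python sketch that re-derives them, assuming the durations use the compact style shown above (such as `1m46.627582375s` or `30.969542ms`). The parser is a simplified illustration, not Ollama's own code.

```python
import re

# Seconds per unit. In the pattern, "ms" is listed before "m" and "s" so it wins.
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|µs|ns|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '1m46.627582375s' to seconds."""
    return sum(float(num) * _UNITS[unit] for num, unit in _PART.findall(text))


def tokens_per_second(count: int, duration: str) -> float:
    """Token count divided by the parsed duration in seconds."""
    return count / parse_duration(duration)


print(f"prompt eval rate: {tokens_per_second(15, '52.491389792s'):.2f} tokens/s")
print(f"eval rate: {tokens_per_second(810, '54.103737291s'):.2f} tokens/s")
```

This reproduces the reported 0.29 and 14.97 tokens/s. Adding the load, prompt eval, and eval durations gives about 106.63 seconds, which matches the reported total duration of 1m46.6s.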

As you can see, for an offline model that runs on my computer with ~370 GB of RAM, it’s quite good – but there are hallucinations. While R1-0528 got several aspects of MacStories and our team correct, there are some weird things in the response, such as the “Marco Arment connection” with “Underflow” (what?), an “AppStories Weekly” newsletter that does not exist, and a section of our Shortcuts coverage apparently called “Actions”. None of these are true! But the rest – including when I founded MacStories, the Club, and our editorial values – is pretty spot-on. It’s always very strange to see the complete chain-of-thought of these models.

In any case, while it was fun to learn all this and test the full version of R1-0528, I wouldn’t recommend doing this unless you truly want to push the limits of an M3 Ultra Mac Studio with 512 GB of RAM.

I’m keen to play around with the updated R1 some more, but I’ll wait for official versions to show up in Ollama’s directory with different sizes and better model files compared to the one I put together this morning. Regardless, the fact that this Mac Studio can run the full version of DeepSeek R1-0528 with slightly less than 400 GB of RAM is pretty remarkable, and once again confirms that Apple is making the best consumer-grade computers for local AI development.


