Running local models on an M4 with 24GB memory

TL;DR

A software engineer has demonstrated that it is possible to run certain AI language models locally on a MacBook Pro with 24GB RAM. While these models are not state-of-the-art, they can perform basic tasks with acceptable speed, reducing dependence on cloud services. This development highlights potential for more accessible local AI use, though with limitations.

A software engineer has demonstrated that it is feasible to run a smaller AI language model, Qwen 3.5 9B, locally on an M4 MacBook Pro with 24GB of memory, achieving functional performance for basic tasks without internet access.

The experiment involved configuring models on various local AI frameworks such as LM Studio and Pi, with the best results obtained using Qwen 3.5-9B (Q4) with specific settings for thinking mode and context window size. The model runs at approximately 40 tokens per second, enabling interactive tasks like coding assistance and research.

Compared to larger state-of-the-art models, the local setup is less capable of handling complex, multi-step reasoning tasks or long-term problem solving. The engineer notes that while the model is not as powerful as cloud-based SOTA models, it still offers meaningful utility for basic research and coding, with the advantage of offline operation and reduced reliance on external cloud services.

Why It Matters

This development matters because it demonstrates that accessible, smaller-scale AI models can be run locally on consumer hardware, expanding options for privacy-conscious users and reducing dependence on large cloud providers. It also indicates a potential shift toward more flexible AI deployment, though with acknowledged performance limitations.

Apple 2024 MacBook Pro with Apple M4 Pro Chip (16-inch, 24GB RAM, 512GB SSD Storage) (QWERTY English) Space Black (Renewed)

Powerful 16-inch MacBook Pro with M4 Pro/Max chip, stunning Liquid Retina XDR display, and all-day battery life for professional workflows.

ProcessorM4 Pro or M4 Max chip

Display16.2-inch Liquid Retina XDR

Memory24GB RAM

Storage512GB SSD

Battery LifeAll-day performance

PortsThunderbolt 5, HDMI, SDXC, MagSafe 3

As an affiliate, we earn on qualifying purchases.

Background

Recent years have seen rapid advances in large language models (LLMs), primarily hosted in the cloud due to their computational demands. Smaller models have been available but often require significant tuning and configuration to run efficiently on consumer hardware. The experiment by Johanna Larsson, a software engineer, builds on ongoing efforts to democratize AI by making it more accessible and private, especially as cloud costs and privacy concerns grow.

“It’s surprisingly good for something that can run on a 24GB MacBook Pro while leaving space for lots of other things running too.”

— Johanna Larsson

“While it’s not a 10x productivity boost, it’s something, and it’s interesting.”

— Johanna Larsson

What Remains Unclear

It is not yet clear how well these local models will perform across diverse real-world tasks or how scalable the setup is for more complex applications. The long-term stability and ease of use also remain to be tested across different hardware configurations and user expertise levels.

What’s Next

Further testing and optimization are expected to improve performance and usability. Developers and researchers may explore integrating these models into workflows for specific tasks, and hardware improvements could expand the capabilities of local AI deployment.

Key Questions

Can I run larger models locally on my M4 MacBook?

Currently, models larger than around 20-30B parameters are unlikely to run efficiently on 24GB RAM, but ongoing optimizations may extend this limit in the future.

What are the main limitations of running local models like Qwen 3.5 9B?

These models are less capable of complex reasoning, long-term memory, and multi-step problem solving compared to state-of-the-art cloud models. They also require careful configuration and tuning.

Is this setup suitable for production or critical tasks?

No, these models are primarily experimental and suitable for research, coding, or basic tasks. They are not reliable substitutes for larger, cloud-hosted models for critical applications.

What hardware is needed to run these models locally?

A MacBook Pro or similar laptop with at least 24GB RAM, a capable CPU, and sufficient storage is required. GPU acceleration is not necessary but can improve performance.

Running local models on an M4 with 24GB memory

Up next

Why Studio Lighting Can Transform Video Quality

Author

AI Espionage Team

Share article

Why It Matters

Apple 2024 MacBook Pro with Apple M4 Pro Chip (16-inch, 24GB RAM, 512GB SSD Storage) (QWERTY English) Space Black (Renewed)

Background

What Remains Unclear

What’s Next

Key Questions

Can I run larger models locally on my M4 MacBook?

What are the main limitations of running local models like Qwen 3.5 9B?

Is this setup suitable for production or critical tasks?

What hardware is needed to run these models locally?

The Real Cost of a Local-Inference Rig in 2026

Microsoft reports are exposing AI’s real cost problem: Using the tech is more expensive than paying human employees

Thunderbolt-ibverbs: We Have InfiniBand At Home

Dangerous App Targets Android Devices, Harvesting All Data

Cyber Awareness Army Surges In Global Coverage

Since Chromium 148, Math.tanh is now fingerprintable to link underlying OS

What Heavy-Duty Laser Printers Still Do Better Than Alternatives

Since Chromium 148, Math.tanh is now fingerprintable to link underlying OS

Running local models on an M4 with 24GB memory

Up next

Author

AI Espionage Team

Share article

Why It Matters

Apple 2024 MacBook Pro with Apple M4 Pro Chip (16-inch, 24GB RAM, 512GB SSD Storage) (QWERTY English) Space Black (Renewed)

Background

What Remains Unclear

What’s Next

Key Questions

Can I run larger models locally on my M4 MacBook?

What are the main limitations of running local models like Qwen 3.5 9B?

Is this setup suitable for production or critical tasks?

What hardware is needed to run these models locally?

You May Also Like