Apple MacBook Pro M1 Max – Is it worth it for Machine Learning?


Another new MacBook? Didn’t I just buy the Air? Yes, and it’s still under warranty, which makes it all the easier to sell. I’m a big fan of the Air form factor, and I’ve never quite warmed up to the Pro models. But the MacBook Air’s 16 GB RAM limit was hard to accept, and at the time there was no alternative. So, on the evening the new MacBook Pros with M1 Pro and M1 Max were announced, I immediately ordered one – a 14″ MacBook Pro M1 Max with 10 CPU cores, 24 GPU cores, a 16-core Neural Engine, 64 GB of RAM (!!!), and a 2 TB drive. My MacBook Air has 16 GB of RAM and the first M1 chip with 8 cores.

Why 64 GB of RAM?

I regularly work with large datasets, ranging from 10 to 50 GB. But even a 2 GB file can cause issues, depending on the data transformations and computations you perform. Over time, working on a computer with too little RAM becomes frustrating. A local installation of Apache Spark lets me use multiple cores at once, but the lack of RAM is always the limiting factor. For the less technically inclined among my readers: data is loaded from the drive into RAM, and the drive’s speed determines how fast that happens, because even an SSD is much slower than RAM.

However, if there isn’t enough RAM, for example, if I try to load a 20 GB file into 16 GB of RAM, the operating system starts swapping objects from the RAM to the hard drive. This means data is moved back and forth between the RAM and the hard drive, but the hard drive now serves as slower “RAM.” Writing and reading data from the hard drive simultaneously doesn’t speed up the process either. Plus, there’s the overhead, because the program that needs the RAM doesn’t move objects itself—the operating system does. And the operating system also needs RAM. So, if the operating system is constantly moving objects around, it also consumes CPU time. In short, too little RAM means everything slows down.
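As a rough rule of thumb, you can check in advance whether a file is even likely to fit into memory. A minimal Python sketch of that back-of-the-envelope check (the 25% overhead for the OS and other programs is my own assumption, not a measured value):

```python
import os

def likely_to_swap(path: str, ram_bytes: int, os_overhead: float = 0.25) -> bool:
    """Rough heuristic: will loading this file push the machine into swapping?

    Assumes the OS and other programs already occupy ~25% of RAM, and that
    the in-memory representation is at least as large as the file on disk.
    """
    usable = ram_bytes * (1 - os_overhead)
    return os.path.getsize(path) > usable
```

With 16 GB of RAM, a 20 GB file fails this check immediately; with 64 GB, it passes comfortably.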

At one point, I considered building a cluster myself. There are some good guides online about how to do this with inexpensive Raspberry Pis. It can look cool, too. But I have little time. I might still do this at some point, if only to try it out. Just for the math: 8 Raspberry Pis with 8 GB of RAM plus accessories would probably cost me close to €1,000. Plus, I’d have to learn a lot of new things. So, putting it off isn’t the same as giving up.

How did I test it?

To clarify, I primarily program in R, a statistical programming language. Here, I have two scenarios:

  • An R script running on a single core (not parallelized).
  • An R script that’s parallelized and can thus run on a cluster.

For the cluster, I use Apache Spark, which works excellently locally. For those less familiar with the tech: with Spark, I can create a cluster in which computational tasks are divided up and sent to individual nodes for processing. This allows parallel processing. I can either build a cluster from multiple computers (which requires sending the data over the network), or install the cluster locally and use my CPU’s cores as the nodes. A local installation has the huge advantage of no network latency.

For those who want to learn more about R and Spark, here is the link to my book on R and Data Science!

For the first test, a script without parallelization, I use a famous dataset from the history of search engines, the AOL data. It contains 36,389,575 rows, just under 2 GB. Many generations of my students have worked with this dataset. In this script, the search queries are broken down, the number of terms per query is calculated, and correlations are computed. Of course, this could all be parallelized, but here, we’re just using one core.
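In miniature, the single-core script does something like the following. A hedged Python sketch (the real script is written in R, and the helper names and sample queries are mine):

```python
def terms_per_query(queries):
    # Break each search query into terms and count them.
    return [len(q.split()) for q in queries]

def pearson(x, y):
    # Plain Pearson correlation, no libraries needed.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

queries = ["new york pizza", "weather", "cheap flights to berlin"]
n_terms = terms_per_query(queries)          # [3, 1, 4]
query_lengths = [len(q) for q in queries]   # [14, 7, 23]
r = pearson(n_terms, query_lengths)         # longer queries have more terms
```

Nothing here needs more than one core; the point of the first test is precisely how fast a single core chews through 36 million such rows.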

For the second test, I use a nearly 20 GB dataset from Common Crawl (150 million rows and 4 columns) and compare it with data from Wikipedia, just under 2 GB. Here, I use the previously mentioned Apache Spark. My M1 Max has 10 cores, and even though I could use all of them, I’ll leave one core for the operating system, so we’ll only use 9 cores. To compare with the M1 in my MacBook Air, we’ll also run a test where the M1 Max uses the same number of cores as the Air.
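Choosing the number of cores to hand to the local cluster can be done programmatically. A small Python sketch of that choice (Spark's local master string really does take the form `local[n]`; everything else here is illustration):

```python
import os

# Leave one core for the operating system, use the rest as cluster nodes.
available = os.cpu_count() or 1
n_workers = max(1, available - 1)    # 9 on a 10-core M1 Max
print(f"local[{n_workers}]")         # Spark master string for a local cluster
```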

How do I measure? There are several ways to measure, but I choose the simplest one: I look at what time my script starts and when it ends, then calculate the difference. It’s not precise, but we’ll see later that the measurement errors don’t really matter.
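The measurement itself is nothing more than a start/end difference. In Python terms (my real scripts do the equivalent in R; `run_analysis` is a placeholder):

```python
import time

def run_analysis():
    # Placeholder for the actual analysis script.
    time.sleep(0.1)

start = time.perf_counter()
run_analysis()
elapsed = time.perf_counter() - start
print(f"Runtime: {elapsed / 60:.2f} minutes")
```

With differences measured in tens of minutes, a few seconds of imprecision at start and end are simply noise.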

Results: Is it worth it?

It depends. The first test is somewhat disappointing. The larger RAM doesn’t seem to make much of a difference here, even though transformed copies of the AOL dataset are created and loaded into memory. The old M1 completes the script in 57.8 minutes, while the M1 Max takes 42.5 minutes. The data are probably loaded into RAM a bit faster thanks to the faster SSD, but that difference is only a few seconds; the rest seems to come from the CPU. At this price, the M1 Max doesn’t justify itself (it’s twice as expensive as the MacBook Air).

Things get more interesting when I use the same number of cores on both sides for a cluster and then use Spark. The differences are drastic: 52 minutes for the old M1 with 16 GB of RAM, 5.4 minutes for the new M1 Max with 64 GB of RAM. The “old” M1, with its limited RAM, takes many minutes just to load the large dataset, while the new M1 Max with 64 GB handles it in under 1 minute. By the way, I’m not loading a simple CSV file here but rather a folder full of small partitions, so the nodes can read the data independently. It’s not the case that the nodes are getting in each other’s way when loading the large file.
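Writing the data as a folder of partitions instead of one big CSV is easy to sketch. A toy Python version (the file naming follows Spark's `part-00000` convention; the round-robin splitting is my own simplification of what Spark does):

```python
import os

def write_partitions(rows, out_dir, n_parts=4):
    # Distribute rows round-robin over n_parts files so that each
    # cluster node can later read its own partition independently.
    os.makedirs(out_dir, exist_ok=True)
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    names = []
    for i, part in enumerate(parts):
        name = f"part-{i:05d}.csv"
        with open(os.path.join(out_dir, name), "w") as f:
            f.write("\n".join(part) + "\n")
        names.append(name)
    return names
```

Because every node opens its own file, there is no single 20 GB file that all nine cores have to queue up behind.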

First experiences with the Apple Silicon Macs with the M1


I have already experienced a processor change at Apple. My Apple career began in 1996 with a PowerBook 5300, which I absolutely loved despite its 640×480-pixel grayscale display. On the one hand, a Mac laptop at that time was still something special and rare (admittedly at an exorbitant price, though my then employer provided it), and it had a keyboard that felt incredibly good and, above all, sounded wonderful. On the other hand, compared to the Windows PCs I had used before, it was extremely reliable. With 8 MB of RAM and a 500 MB hard drive, it was also quite well equipped. The PowerBook 5300 was the first with a Motorola PowerPC processor, so a transition of sorts had already happened shortly before.

In 2006, Apple switched to Intel processors, a move that was extraordinary at the time, especially since in the 90s, Apple had aired commercials where a Motorola processor roasted an Intel processor. For the transition, Apple offered a program called Rosetta, which allowed PowerPC applications to run on Intel-based Macs. Typically, these programs ran slower. The commercial was actually referenced again when the first Intel Macs were introduced, around minute 1:05 of the presentation.

Now another transition. In 2019, I bought the 16″ MacBook Pro after many years with a MacBook Air. I hadn’t kept any other Apple computer as long as the Air, but over time it had become too slow for what I was doing with it (R, a lot of work in the terminal with sed, awk, Lightroom, etc.). I hadn’t upgraded earlier because I absolutely didn’t want the awful butterfly keyboard. The return to scissor-switch keyboards began with the 16″ MacBook Pro, but I still couldn’t get used to the huge device. Not to mention, it became incredibly hot and loud, and the battery life was far from Apple’s claims. When I trained a machine learning model, for example, the MacBook got so hot that I no longer needed to heat my office. And during any Zoom or Webex call, the battery drained faster than an ice cube melts in the summer heat.

I spend quite a bit of time waiting for the results of a calculation, even if it’s only 20 or 30 seconds sometimes, but it adds up over the day, and sometimes it’s several minutes or even hours. I usually know in advance how long it will take, but I don’t start another task for just half a minute because it disrupts my train of thought. Data analysis is also a meditative act for me. So, the speed of a computer is extremely important. Not just for data analysis, but for all other tasks on the computer as well. It just has to feel smooth.

The speed of a calculation in R depends on many factors:

  • Memory (yes, R loads everything into memory)
  • Processor speed
  • Parallelization

For memory, the first Apple Silicon models aren’t particularly well equipped: 16 GB is the maximum, and the especially short path from processor to memory doesn’t make up for that. The operating system uses part of the memory and the running programs use some more, so not much is left. Especially when working with large files, as I often do, which can sometimes reach 50 GB or more, swapping is almost inevitable. Parallelization is not possible yet either, as the necessary packages are not available; Homebrew, for example, still isn’t available.

Additionally, R is not yet available natively for the new Macs. There is (still) no Fortran compiler, which is a problem not only for R but also for many machine learning extensions for Python. Who would have thought that this old programming language could still have such a big influence today? Of course, R also runs via Rosetta, but then I might as well not have bought a new Mac and let Apple use me as a beta tester 🙂 But, small spoiler: even under Rosetta, the Intel version of R runs faster on the M1, and it seems that’s not just the case for me.

I initially purchased a Mac mini with 8GB of RAM and a 512GB SSD to test how good the performance actually is and whether I could make the transition. I was able to pick up the Mac mini the same day from the Apple Store, and from the start, I was amazed at how smooth everything felt on this computer. R worked flawlessly, though RStudio showed error messages frequently. No big deal. But it soon became clear that the memory limitation was an issue. When trying to process a 200GB file (using sort, awk, sed in the shell), at some point, the hard drive filled up with swapping, and the process failed. Okay, maybe the mini is just a bit too weak for that task. What surprised me, though, was that not once did the fan kick in—this would not have been the case with the 16″ MacBook Pro. So, all in all, everything seemed great…
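The shell job in question was of the classic sort/awk/sed variety. A toy version of the same shape, in the shell (the real input was ~200 GB; the file name and column layout here are made up):

```shell
# Tiny sample standing in for the real ~200 GB file.
printf 'a\t"foo"\nb\t"bar"\nc\t"foo"\n' > queries.tsv

# Extract column 2, strip the quotes, and keep unique values, sorted.
awk -F '\t' '{ print $2 }' queries.tsv | sed 's/"//g' | sort -u > unique.txt

cat unique.txt
```

With a file that size, `sort` spills temporary runs to disk, so a nearly full SSD plus heavy swapping is exactly the combination that kills the job.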

…except for the Bluetooth. My Mac mini also had the well-known Bluetooth problems. Specifically, the mouse loses its connection multiple times a day, which is extremely inconvenient when you’re showing a demo during a video conference. Not good, very frustrating. I tried all sorts of tips, including using a wired connection to the network instead of Wi-Fi. No improvement. It’s unclear whether this is a hardware or software issue. A chat with Apple Support dropped multiple times, and eventually, I got tired of it because, you know, I have a job too. An update to the Big Sur beta helped a little, and as of yesterday, the computer is running on 11.1, so I’m hoping it will be better now, and that it’s not a hardware issue.

Another not-so-pleasant experience was the sound. I have never experienced an Apple computer with such poor sound quality—my old PowerBook 5300 probably sounded better. They could have definitely done much more with the sound.

Despite the Bluetooth issues, after 2 days, I decided to also buy a portable Apple Silicon Mac. In full configuration (16GB RAM, 2TB SSD), it costs about the same as I could sell my 16″ MacBook Pro for on the used market, and at the same time, I get double the storage space. There used to be a rule that you should calculate how much storage you might need and then multiply that size by 4. Unfortunately, there are no 8TB SSDs for these computers yet.

The computer arrived after almost 3 weeks, one week earlier than expected. Here, I noticed a small speed boost, likely due to the doubled RAM. The 200GB file also went through smoothly now, thanks to enough space on the SSD. And, just like with the Mac mini, the computer hardly seemed to break a sweat. Only once did the computer get a little warm, but not hot, and certainly not as hot as the 16″ MacBook Pro. This is also reflected in the battery life. I have yet to drain the battery in a single day. No kidding. I plug the computer in at night, and I usually still have a few hours of battery life left. It’s a completely new feeling.

The Bluetooth issue also exists with the MacBook Air. This is unpleasant, and I wonder how it could have gone unnoticed in the tests Apple conducts. That a transition doesn’t go completely smoothly is understandable, and you’re always somewhat of a guinea pig when buying the first model after a major shift. For me, it’s a trade-off: How much time do I gain by having a fast computer versus how much time do I lose when something occasionally doesn’t work. The mouse connection is of course a hygiene factor; it should just work. But with the MacBook Air, I’m not as reliant on it. So far, I’m happy with my decision, though I would have preferred 32GB or even 64GB of RAM. But those options aren’t available yet.

The sound of the MacBook Air is much better than my old Air’s, but it doesn’t compare to the 16″ MacBook Pro. No surprise, the speakers are much smaller. Still, it’s better than the mini’s sound.

The instant wake feature actually works, and sometimes I wonder if the computer was even “asleep.” The keyboard sounds almost as nice as that of the PowerBook 5300, and if anyone wonders why a keyboard should sound good, well, aesthetics don’t stop at just the visual 🙂