Update: OpenAI has renamed the Code Interpreter to “Advanced Data Analysis”.
Dr. Tom Alby, PMP: Data, Machine Learning, AI
I have been using Apple products almost exclusively since the mid-90s. Now and then, I engage in debates about the pros and cons of Apple products compared to their competitors, especially regarding the price difference. And of course, the question arises whether minimalism and using Apple products even go together. There is an ambivalence here between Apple’s design culture and the contradiction of consumption.
Continue reading “How do minimalism and Apple products go together, when Apple is so expensive?”

Mastodon and the Fediverse had maintained a niche existence for many years until they were thrust into the spotlight by Musk’s acquisition of Twitter and the ensuing turbulence. Since then, the Mastodon community has not been growing like a hockey stick, as it’s called in investor jargon, but like a rocket. This is a big win for those who champion open-source principles. However, this rapid growth might also become a curse, and for several reasons.
Continue reading “Eternal November: Will Mastodon Suffer the Same Fate as Usenet?”

I was actually really excited about the Kindle Scribe I ordered, because it seemed to solve two problems I have with using my reMarkable:
The Scribe does offer a light, but otherwise, it has been a very disappointing experience for me. Of course, I don’t really want to throw money into Amazon’s pockets or store my data in their cloud, but the topics of “working through paper” and “reading” are very important to me. Since I don’t jot down anything confidential… one must choose the lesser evil. Perhaps someday there will be a solution that works without the cloud. But how good is the Scribe really?
Continue reading “Why I Will Return My Kindle Scribe”

September was essentially a good month. The only new purchase I made was a pair of fingerless gloves, as it sometimes gets a bit chilly in the office. However, I didn’t want to turn on the heating just yet.
Then there was the Braun Atelier investment, which I had already written about and am still very pleased with.
However, there’s also an order I placed in September, which won’t arrive until December—the Kindle Scribe, which I might exchange for my reMarkable 2. Is the purchase necessary? Certainly not. I could print any article I want or need to read, and use a paper notebook. Can I work better and faster with paper tablets than with paper? Definitely. What I hope to achieve with the Scribe, I have already described in the article. If the Scribe doesn’t meet my expectations, it will go back. My reMarkable has a very low cost per use, since I use it multiple times a day. In the end, it’s about considering beforehand whether a technology actually improves something, or if it just serves blind consumption.
Update: I have now tested the Kindle Scribe, and you can find the full report here!
I had one of the first Kindles in Germany and even wrote an app for it. I also had one of the first reMarkables and now own a reMarkable 2. Apparently, I’m susceptible to tech gadgets, especially when I hope they could potentially boost my productivity. Now, Amazon is entering this market with the Kindle Scribe, directly competing with companies like reMarkable. Here’s the introduction video from an Amazon event:
With the reMarkable, I became critical when they suddenly introduced a subscription model. While this didn’t affect me, since early buyers could keep the Connect subscription “for free” for life, reMarkable clearly realized that they weren’t getting good karma points for this move and changed their model. With the Kindle, I got one of the devices that had a built-in SIM card for which you didn’t have to pay any fees worldwide. That was really convenient, being able to read my newspaper every day no matter where I was in the world.
Will the new Kindle Scribe replace the reMarkable? I haven’t yet received a Kindle Scribe for testing, but already a few interesting aspects are noticeable. Both devices offer a tremendous advantage: focusing on the essentials. I’m not familiar with the current Kindle devices, but my old Kindle displayed books wonderfully, and it only had a web browser for Wikipedia—pure focus. Annotating texts was easier on my Kindle since it had a keyboard. But, of course, it wasn’t as simple as writing a note with a pen. However, I could easily export these notes using my tool.
Let’s take a closer look at the specs:
What interests me about the Scribe? Over the last few months, I’ve been exploring Luhmann’s Zettelkasten method and now have such a system at home. With the reMarkable, it bothered me that I couldn’t capture the notes I wanted to make on it: not the permanent notes, but my working notes. So I always carry index cards with me, which is pretty unwieldy alongside the reMarkable. Writing on virtual index cards would be possible with the Scribe, as you can attach a note to a text snippet and export it later. For me, that’s the killer app. I also hope that importing and exporting documents will be easier. I’ll test it and report back here.
For reMarkable, Amazon’s entry into the market means this technology will reach the masses, but reMarkable won’t benefit from that. Quite the opposite: Amazon offers a convenient way to access content through its store, and its awareness campaign will convert potential reMarkable customers.
The question for power users will be how convenient it will be to manage notes and books on the Kindle Scribe. reMarkable offers folders that can also be created and managed on the desktop. The tags functionality, which reMarkable recently introduced, is really good, but unfortunately, it only works on the device itself. As for the Kindle, the software on the Mac, at least, is a disaster; there’s no recognizable organization.
R often comes in a wonderful mix of languages, as seen in the screenshot:
The simplest way to change that:
Sys.setenv(LANG = "de")
Then the R session will be in German. But only for this one session! If you want to change the language permanently, you can specify a preference in the .Renviron
file (the file must be created if it doesn’t already exist). To do this, open the terminal and type:
vi .Renviron
Press "i" to enter insert mode, then type the following line:
LANG=de
Then press ESC and type ":wq!" (without the quotes) to save and quit. Restart R, and it will stay in German.
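If you’d rather avoid vi altogether, the same entry can be written from within R itself. A minimal sketch, assuming the default ~/.Renviron location:

```r
# Append the language setting to ~/.Renviron from within R
# (the file is created automatically if it doesn't exist yet).
renviron <- file.path(Sys.getenv("HOME"), ".Renviron")
cat("LANG=de\n", file = renviron, append = TRUE)
# .Renviron is only read at startup, so restart R afterwards.
```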
The presentation will be provided through SMX.
Additionally, in this context, the Data Strategy Canvases by Datentreiber are also interesting.
I would be very happy to welcome you to my mailing list. Don’t worry, I only write very rarely, and only when there’s a new book of mine to share. Here’s the sign-up link.
Another new MacBook? Didn’t I just buy the Air? Yes, and it’s still under warranty, so it makes even more sense to sell it. I’m a big fan of the Air form factor, and I’ve never quite warmed up to the Pro models. However, the limitation of 16GB of RAM in the MacBook Air was hard to accept at the time, but there were no other alternatives. So, on the evening when the new MacBook Pros with M1 Pro and M1 Max were announced, I immediately ordered one – a 14″ MacBook Pro M1 Max with 10 cores, 24 GPU cores, a 16-core Neural Engine, 64 GB of RAM (!!!), and a 2TB drive. My MacBook Air has 16 GB of RAM and the first M1 chip with 8 cores.
I regularly work with large datasets, ranging from 10 to 50 GB. But even a 2 GB file can cause issues, depending on what kind of data transformations and computations you perform. Over time, using a computer with little RAM becomes frustrating. While a local installation of Apache Spark helps me utilize multiple cores simultaneously, the lack of RAM is always a limiting factor. For the less technically inclined among my readers: Data is loaded from the hard drive into the RAM, and the speed of the hard drive determines how fast this happens because even an SSD is slower than RAM.
However, if there isn’t enough RAM, for example, if I try to load a 20 GB file into 16 GB of RAM, the operating system starts swapping objects from the RAM to the hard drive. This means data is moved back and forth between the RAM and the hard drive, but the hard drive now serves as slower “RAM.” Writing and reading data from the hard drive simultaneously doesn’t speed up the process either. Plus, there’s the overhead, because the program that needs the RAM doesn’t move objects itself—the operating system does. And the operating system also needs RAM. So, if the operating system is constantly moving objects around, it also consumes CPU time. In short, too little RAM means everything slows down.
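To get a feel for these numbers, you can check how much RAM an object actually occupies before loading something larger. A small sketch in R:

```r
# A numeric vector in R needs 8 bytes per value, plus a small header.
x <- rnorm(1e7)                        # 10 million doubles
format(object.size(x), units = "MB")   # about 76 MB
```

Scaling this up: a data frame with 150 million rows and a handful of numeric columns quickly lands in the multi-gigabyte range, and intermediate copies created during transformations can double or triple that footprint.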
At one point, I considered building a cluster myself. There are some good guides online about how to do this with inexpensive Raspberry Pis. It can look cool, too. But I have little time. I might still do this at some point, if only to try it out. Just for the math: 8 Raspberry Pis with 8 GB of RAM plus accessories would probably cost me close to €1,000. Plus, I’d have to learn a lot of new things. So, putting it off isn’t the same as giving up.
To clarify, I primarily program in R, a statistical programming language. Here, I have two scenarios:
For the cluster, I use Apache Spark, which works excellently locally. For those less familiar with the tech: With Spark, I can create a cluster where computational tasks are divided and sent to individual Nodes for processing. This allows for parallel processing. I can either build a cluster with multiple computers (which requires sending the data over the network), or I can install the cluster locally and use the cores of my CPU as the nodes. A local installation has the huge advantage of no network latency.
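As a minimal sketch of such a local setup, assuming the sparklyr package as the R interface to Spark (the core count and memory limit here are placeholders, not a recommendation):

```r
# Minimal sketch: a local Spark "cluster" where the worker nodes are the
# cores of the local CPU, so no data travels over the network.
library(sparklyr)

conf <- spark_config()
conf$`sparklyr.cores.local` <- 9               # leave one core for the OS
conf$`sparklyr.shell.driver-memory` <- "48G"   # stay below the machine's RAM

sc <- spark_connect(master = "local", config = conf)
```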
For those who want to learn more about R and Spark, here is the link to my book on R and Data Science!
For the first test, a script without parallelization, I use a famous dataset from the history of search engines, the AOL data. It contains 36,389,575 rows, just under 2 GB. Many generations of my students have worked with this dataset. In this script, the search queries are broken down, the number of terms per query is calculated, and correlations are computed. Of course, this could all be parallelized, but here, we’re just using one core.
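The single-core script itself is not shown here, but its core steps could be sketched roughly like this; the file name and column name are assumptions, not the actual AOL schema:

```r
# Hypothetical sketch of the single-core test with data.table.
library(data.table)

aol <- fread("aol_queries.tsv")   # ~36 million rows loaded into RAM
# Break each query into terms and count them:
aol[, num_terms := lengths(strsplit(Query, " ", fixed = TRUE))]
# One example correlation: number of terms vs. query length in characters.
cor(aol$num_terms, nchar(aol$Query))
```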
For the second test, I use a nearly 20 GB dataset from Common Crawl (150 million rows and 4 columns) and compare it with data from Wikipedia, just under 2 GB. Here, I use the previously mentioned Apache Spark. My M1 Max has 10 cores, and even though I could use all of them, I’ll leave one core for the operating system, so we’ll only use 9 cores. To compare with the M1 in my MacBook Air, we’ll also run a test where the M1 Max uses the same number of cores as the Air.
How do I measure? There are several ways to measure, but I choose the simplest one: I look at what time my script starts and when it ends, then calculate the difference. It’s not precise, but we’ll see later that the measurement errors don’t really matter.
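In R, this simple approach amounts to nothing more than two timestamps:

```r
# Record wall-clock time before and after the script body
# and take the difference.
start <- Sys.time()
# ... the actual script body runs here ...
end <- Sys.time()
difftime(end, start, units = "mins")   # runtime in minutes
```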
It depends. The first test is somewhat disappointing. The larger RAM doesn’t seem to make much of a difference here, even though mutations of the AOL dataset are created and loaded into memory. The old M1 completes the script in 57.8 minutes, while the M1 Max takes 42.5 minutes. The data are probably loaded into RAM a bit faster thanks to the faster SSDs, but the difference is only a few seconds. The rest seems to come from the CPU. But for this price, the M1 Max doesn’t justify itself (it’s twice as expensive as the MacBook Air).
Things get more interesting when I use the same number of cores on both sides for a cluster and then use Spark. The differences are drastic: 52 minutes for the old M1 with 16 GB of RAM, 5.4 minutes for the new M1 Max with 64 GB of RAM. The “old” M1, with its limited RAM, takes many minutes just to load the large dataset, while the new M1 Max with 64 GB handles it in under 1 minute. By the way, I’m not loading a simple CSV file here but rather a folder full of small partitions, so the nodes can read the data independently. It’s not the case that the nodes are getting in each other’s way when loading the large file.
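Reading a folder of partitions instead of a single large file looks like this in sparklyr, sketched with assumed names and paths and an existing connection `sc`:

```r
# Each Spark node reads its own partitions from the folder independently.
cc <- spark_read_csv(sc, name = "common_crawl",
                     path = "data/common_crawl_parts/",
                     memory = TRUE)   # cache the table in cluster memory
```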
On October 13, 2021, reMarkable announced that the previously free cloud service would now be limited, and the truly exciting features would become paid for new users. I had suspected this earlier, just as I had with tado. tado had announced a subscription in August 2018, but they backtracked for the first customers. While I had to purchase the new app for about 20 euros to use the new features, at least I don’t have to pay any subscription fees.
With both companies, I wasn’t sure why they didn’t include a subscription model from the start, because in both cases it was clear that costs would increase as more users accessed the servers. For reMarkable, the costs would be even higher since they offer 8 GB of cloud storage. It should have been obvious from the beginning that at some point, a subscription would have to be introduced to offset the growing costs associated with the increasing number of users. Did both companies avoid the subscription model because they thought it might deter buyers? Aren’t the first customers usually early adopters who are less price-sensitive?
I sold my reMarkable a few months ago, not because of the impending subscription model, but because I simply want fewer gadgets, and it didn’t fit into my workflow. At the end of the day, reMarkable is a niche product, because the desire for focus, in a time when distraction is either sought out or readily accepted, is only present in a small number of users. Even though I think it’s a great product, I don’t believe it will ever be widely adopted by the masses.