<antirez>

News posted by antirez

Redis is open source again

antirez 10 hours ago.
Five months ago, I rejoined Redis and quickly started to talk with my colleagues about a possible switch to the AGPL license, only to discover that there was already an ongoing discussion, a very old one, too. Many people within the company had the feeling that the AGPL was a better pick than the SSPL, and while Redis eventually switched to the SSPL license, the internal discussion continued.

I tried to add weight to the pro-AGPL side of the ongoing discussion. My feeling was that the SSPL, in practical terms, failed to be accepted by the community. The OSI wouldn’t accept it, nor would the software community regard the SSPL as an open license. In a short time, I saw the hypothesis gaining more and more traction, at all levels of the company hierarchy.

Reproducing Hacker News writing style fingerprinting

antirez 15 days ago.
About three years ago I saw a quite curious and interesting post on Hacker News. A student, Christopher Tarry, was able to use cosine similarity against vectors of top-word frequencies in comments, in order to detect similar HN accounts, and sometimes even accounts actually controlled by the same user: that is, fake accounts whose writing style ends up exposing the real identity of the writer.

This is the original post: https://news.ycombinator.com/item?id=33755016

I was not aware, back then, of the Burrows-Delta method for style detection: it seemed kinda magical that you just needed to normalize a frequency vector of top words to reach such remarkable results. I read a few Wikipedia pages and took a mental note of it. Then, as I was working on vectors for Redis, I remembered that post and searched the web, only to discover that the original page was gone and that the author, in the original post and website, didn’t really explain very well how the data was processed, how the top words were extracted (and, especially, how many were used), and so forth. I thought I could reproduce the work with Vector Sets once I was done with the main work. Now the new data type is in the release candidate, and I found some time to work on the problem. This is a report of what I did but, before continuing, the mandatory demo site: you can play with it at the following link:
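As a rough illustration of the kind of processing involved, here is a minimal Python sketch, not the exact pipeline behind the demo: a frequency vector of the top words is built for each account, each dimension is normalized across accounts (the Burrows-Delta idea), and accounts are compared with cosine similarity, where higher values suggest a more similar style. The corpus layout, the number of top words, and the z-score step are assumptions.

```python
# Minimal sketch: top-word frequency vectors, z-scored per word across
# accounts, compared with cosine similarity. Not the demo's exact pipeline.
from collections import Counter
import math

def top_words(texts, n=350):
    counts = Counter()
    for t in texts:
        counts.update(t.lower().split())
    return [w for w, _ in counts.most_common(n)]

def freq_vector(text, vocab):
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]

def zscore_columns(vectors):
    # Normalize each word frequency across all accounts (mean 0, stddev 1).
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# comments_by_user maps an account name to the list of its comments.
comments_by_user = {
    "account_a": ["I think that this is fine", "but the real issue is latency"],
    "account_b": ["I think the issue is that latency is fine here"],
    "account_c": ["buy cheap followers now", "best deal of the year, buy now"],
}
vocab = top_words([c for cs in comments_by_user.values() for c in cs])
raw = [freq_vector(" ".join(cs), vocab) for cs in comments_by_user.values()]
norm = zscore_columns(raw)
# Compare account_a with account_b and with account_c.
print(cosine(norm[0], norm[1]), cosine(norm[0], norm[2]))
```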

Vector Sets are part of Redis

antirez 28 days ago.
Yesterday we finally merged vector sets into Redis; here you can find the README that explains in detail what you get:

https://github.com/redis/redis/blob/unstable/modules/vector-sets/README.md

The goal of the new data structure is, in short, to create a new “Set alike” data type, similar to Sorted Sets, but where instead of having a scalar as a score, each element has a vector. You can add and remove elements the Redis way, without caring about anything except the properties of the abstract data structure Redis implements, ask for the elements most similar to a given query vector (or to the vector associated with some element already in the set), and so forth (see the usage sketch below). But more about that later, a bit of background first:
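Before that background, to make the abstraction concrete, here is a hedged usage sketch. It assumes the VADD / VSIM commands documented in the README above and a Redis build with the module available, and it goes through redis-py's generic execute_command() escape hatch, since dedicated client helpers may not exist yet.

```python
# Hedged usage sketch of vector sets via redis-py's generic command interface.
import redis

r = redis.Redis(decode_responses=True)

# Add elements the Redis way: each element carries a vector instead of a
# scalar score ("VALUES 3" means a 3-dimensional vector follows).
r.execute_command("VADD", "word_embeddings", "VALUES", "3", "0.1", "0.2", "0.7", "apple")
r.execute_command("VADD", "word_embeddings", "VALUES", "3", "0.1", "0.3", "0.6", "pear")
r.execute_command("VADD", "word_embeddings", "VALUES", "3", "0.9", "0.1", "0.1", "car")

# Ask for the elements most similar to a query vector...
print(r.execute_command("VSIM", "word_embeddings", "VALUES", "3", "0.1", "0.25", "0.65"))

# ...or most similar to the vector of an element already in the set.
print(r.execute_command("VSIM", "word_embeddings", "ELE", "apple"))
```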

AI is useless, but it is our best bet for the future

antirez 39 days ago.
I used AI with success 5 minutes ago.

Just five minutes ago, I was writing a piece of software and relied on AI for assistance. Yet, here I am, starting this blog post by telling you that artificial intelligence, so far, has proven somewhat useless. How can I make such a statement if AI was just so helpful a moment ago? Actually, there's no contradiction here if we clarify exactly what we mean.

Here’s the thing: at this very moment, artificial intelligence can support me significantly. If I'm struggling with complicated code or need to understand an advanced scientific paper on math, I can turn to AI for clarity. It can help me generate an image for a project, translate a text, or clean up a YouTube transcript. Clearly, it’s practical and beneficial in these everyday tasks.

Big LLMs weights are a piece of history

antirez 46 days ago.
By multiple accounts, the web is losing pieces: every year a fraction of old web pages disappears, lost forever. We should regard the Internet Archive as one of the most valuable pieces of modern history; instead, many companies and entities make it harder and harder for the Archive to survive and to accumulate what would otherwise be lost. I understand that the Archive headquarters are located in what used to be a church: well, there is no better way to think of it than as a sacred place.

Reasoning models are just LLMs

antirez 81 days ago.
It’s not new, but it’s accelerating. People who used to say that LLMs were a fundamentally flawed way to reach any useful reasoning and, in general, to develop any useful tool with some degree of generality, are starting to shuffle the deck, in the hope of looking less wrong. They say: “the progress we are seeing is due to the fact that models like OpenAI o1 or DeepSeek R1 are not just LLMs”. This is false, and it is important to expose this misrepresentation as soon as possible.

First, DeepSeek R1 (I don’t want to talk about o1 / o3, since those are private systems we don’t have access to, but it’s very likely the same) is a pure decoder-only autoregressive model. It’s the same next-token prediction that was so strongly criticized. There isn’t, anywhere in the model, any explicit symbolic reasoning or representation.
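To be concrete about what “pure decoder-only autoregressive” means, here is a minimal sketch of the generation loop, with model() and the tokenizer as stand-ins for the real network and vocabulary rather than DeepSeek’s actual API. A reasoning model produces its chain of thought with exactly this loop; the “reasoning” is just more generated tokens.

```python
# The model is just a function from the tokens emitted so far to a
# probability distribution over the next token, sampled in a loop.
# model() and tokenizer are placeholders for the real network/vocabulary.
import random

def generate(model, tokenizer, prompt, max_tokens=256, eos_id=0):
    tokens = tokenizer.encode(prompt)
    for _ in range(max_tokens):
        probs = model(tokens)  # P(next token | all previous tokens)
        next_id = random.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_id)
        if next_id == eos_id:  # stop at the end-of-sequence token
            break
    return tokenizer.decode(tokens)
```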

We are destroying software

antirez 82 days ago.
We are destroying software by no longer taking complexity into account when adding features or optimizing some dimension.

We are destroying software with complex build systems.

We are destroying software with an absurd chain of dependencies, making everything bloated and fragile.

We are destroying software by telling new programmers: “Don’t reinvent the wheel!” But reinventing the wheel is how you learn how things work, and it is the first step toward making new, different wheels.

We are destroying software by no longer caring about backward API compatibility.

From where I left

antirez 142 days ago.
I’m not the kind of person that develops a strong attachment to their own work. When I decided to leave Redis, about 1620 days ago (~ 4.44 years), I never looked at the source code, commit messages, or anything related to Redis again. From time to time, when I needed Redis, I just downloaded it and compiled it. I just typed “make” and I was very happy to see that, after many years, building Redis was still so simple.

My detachment was not the result of me hating my past work. While in the long run my creative work became less and less important and the “handling the project” activities became more and more substantial (a shift that many programmers are able to make, but it’s not my bread and butter), I still enjoyed doing Redis stuff when I left. However, I don’t share the vision that most people my age (I’m 47 now) have: that they are still young. I wanted to do new stuff, especially writing. I wanted to stay more with my family and help my relatives. I definitely needed a break.

Playing audio files in a Pi Pico without a DAC

antirez 421 days ago.
The Raspberry Pico is suddenly becoming my preferred chip for embedded development. It is well made, durable hardware, with a ton of features that appear designed with smartness and passion (the state machines driving the GPIOs are a killer feature!). Its main weakness, the lack of connectivity, is now resolved by the W variant. The data sheet is excellent and documents every aspect of the chip. Moreover, it is well supported by MicroPython (which I’m using a lot), and the C SDK environment is decent, even if full of useless complexities, as today’s fashion demands: a cmake build system that in turn generates a Makefile, files to define this and that (used libraries, debug outputs, …), and in general huge overkill for the goal of compiling tiny programs for tiny devices. No, it’s worse than that: all this complexity to generate programs for FIXED hardware with a fixed set of features (apart from the W / non-W variant). Enough with the rant about how much today’s software sucks, but it must be remembered.

First Token Cutoff LLM sampling

antirez 475 days ago.
From a theoretical standpoint, the best reply provided by an LLM is obtained by always picking the token associated with the highest probability. This approach makes the LLM output deterministic, which is not a good property for a number of applications. For this reason, in order to balance LLM creativity while preserving adherence to the context, different sampling algorithms have been proposed in recent years.

Today one of the most used ones, more or less the default, is called top-p: it is a form of nucleus sampling where top-scoring tokens are collected up to a total probability sum of “p”, then random weighted sampling is performed.
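As a rough illustration of the mechanism just described (top-p itself, not the First Token Cutoff method named in the title): collect the highest-probability tokens until their cumulative probability reaches p, then perform a weighted random pick among them. The toy distribution below is made up.

```python
# Sketch of top-p (nucleus) sampling over a toy next-token distribution.
import random

def top_p_sample(probs, p=0.9):
    # probs: dict mapping token -> probability, assumed to sum to 1.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:  # stop once the cumulative probability reaches p
            break
    tokens, weights = zip(*nucleus)
    # Weighted random pick among the kept tokens; random.choices
    # renormalizes the weights implicitly.
    return random.choices(tokens, weights=weights, k=1)[0]

# A made-up distribution over four candidate tokens.
print(top_p_sample({"the": 0.5, "a": 0.3, "this": 0.15, "banana": 0.05}, p=0.9))
```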