February 21, 2025
What makes entrepreneurs entrepreneurial?
Entrepreneurs think and act differently from managers and strategists.
This 2008 paper argues that entrepreneurs use effectual reasoning, the polar opposite of causal reasoning taught in business schools. Causal reasoning starts with a goal and finds the best way to achieve it. Effectual reasoning starts with available resources and lets goals emerge along the way. Entrepreneurs are explorers, not generals. Instead of following fixed plans, they experiment and adapt to seize whatever opportunities the world throws at them.
Consider this example from the paper. A causal thinker starts an Indian restaurant by following a fixed plan: researching the market, choosing a prime location, targeting the right customers, securing funding, and executing a well-designed strategy. The effectual entrepreneur doesn’t start with a set goal. She starts with what she has (her skills, knowledge, and network), and she experiments. She might begin by selling homemade lunches to friends’ coworkers. If that works, she expands. If not, she watches what excites her customers. Maybe they care less about the food and more about her cultural insights and life experiences. She doesn’t force the restaurant idea. She unasks the question and asks a new one: What do people really want from me? (This is what Airbnb did with its "Experiences". Instead of just renting out rooms, hosts began offering cooking classes, city tours, and adventures, things people didn’t know they wanted until they saw them.)
Key principles of effectual reasoning
The author, Prof. Saras D. Sarasvathy, interviewed 30 seasoned entrepreneurs and identified key principles of effectual reasoning.
The first is affordable loss. Forget maximizing returns. Entrepreneurs focus on what they can afford to lose. Instead of wasting time on market research, they test ideas in the real world. They fail cheap, fail fast, and learn faster.
The second is partnerships over competition. Entrepreneurs don’t assume a market exists for their idea. They build networks of collaborators who help shape the business. This lowers risk, opens doors, and provides instant feedback. This contrasts with the corporate world, which focuses on outmaneuvering rivals.
The third is leveraging surprises. Managers hate surprises. Entrepreneurs love them. The unexpected isn’t an obstacle, it’s an opening. A pivot. A new market. A better business.
So, effectual reasoning flips traditional business thinking on its head. Most strategies assume that predicting the future gives control. Entrepreneurs believe the opposite: take action, and you shape the future. Rather than waiting for ideal conditions, they start with what they have and improvise.
Discussion
Here is what I make of this. Some people are born entrepreneurs. Maybe you can teach it. Maybe you can’t. Some minds thrive in uncertainty and make it up as they go. Others crave a plan, a map, a well-paved road.
It’s the old fox-versus-hedgehog dilemma from Grand Strategy. The fox knows many things. The hedgehog knows one big thing.
A smart friend of mine (Mahesh Balakrishnan, who has a related Managing Skunks post here) put it perfectly. On his dream software team, he wants either Jeeps or Ferraris. Jeeps go anywhere. No roads, no directions—just point them at a problem, and they’ll tear through it. That’s effectual reasoning. Ferraris, on the other hand, need smooth roads and a clear destination. But once they have that, they move fast. He doesn’t want Toyota Corollas. Corollas are slow. Worse, they still need roads.
So here’s my corollary (admire my beautiful pun).
If you've got to be a causal thinker, be a Ferrari. Not a Corolla.
Postamble
You can tell Steve Jobs was the textbook "effectual thinker" just from this quote.
“The thing I would say is, when you grow up, you tend to get told that the world is the way it is, and your life is just to live your life inside the world. Try not to bash into the walls too much. Try to have a nice family life, have fun, save a little money. But life, that's a very limited life. Life can be much broader once you discover one simple fact, and that is: Everything around you that you call life was made up by people that were no smarter than you. And you can change it. You can influence it. You can build your own things that other people can use. And the minute that you understand that you can poke life, and actually something will, you know, if you push in, something will pop out the other side, that you can change it. You can mold it. That's maybe the most important thing is to shake off this erroneous notion that life is there and you're just going to live in it, versus embrace it. Change it. Improve it. Make your mark upon it. I think that's very important. And however you learn that, once you learn it, you'll want to change life and make it better. Because it's kind of messed up in a lot of ways. Once you learn that, you'll never be the same again.”
February 20, 2025
How to find Lua scripts for sysbench using LUA_PATH
sysbench is a great tool for benchmarks and I appreciate all of the work the maintainer (Alexey Kopytov) put into it as that is often a thankless task. Today I struggled to figure out how to load Lua scripts from something other than the default location that was determined when sysbench was compiled. It turns out that LUA_PATH is the thing to set, but the syntax isn't what I expected.
My first attempt was the following, because the PATH in LUA_PATH implies a list of directory names, as in the shell's PATH. But that failed.
LUA_PATH="/mnt/data/sysbench.lua/lua" sysbench ... oltp_insert run
It turns out that LUA_PATH uses special semantics and this worked:
LUA_PATH="/mnt/data/sysbench.lua/lua/?.lua" sysbench ... oltp_insert run
The usage above replaces the existing search path. The usage below prepends the new path to the existing (compiled in) path:
LUA_PATH="/mnt/data/sysbench.lua/lua/?.lua;;" sysbench ... oltp_insert run
Geoblocking the UK with Debian & Nginx
A few quick notes for other folks who are geoblocking the UK. I just set up a basic geoblock with Nginx on Debian. This is all stuff you can piece together, but the Maxmind and Nginx docs are a little vague about the details, so I figure it’s worth an actual writeup. My Nginx expertise is ~15 years out of date, so this might not be The Best Way to do things. YMMV.
First, register for a free MaxMind account; you’ll need this to subscribe to their GeoIP database. Then set up a daemon to maintain a copy of the lookup file locally, and Nginx’s GeoIP2 module:
apt install geoipupdate libnginx-mod-http-geoip2
Create a license key on the MaxMind site, and download a copy of the config file you’ll need. Drop that in /etc/GeoIP.conf. It’ll look like:
AccountID XXXX
LicenseKey XXXX
EditionIDs GeoLite2-Country
The package sets up a cron job automatically, but we should grab an initial copy of the file. This takes a couple minutes, and writes out /var/lib/GeoIP/GeoLite2-Country.mmdb:
geoipupdate
The GeoIP2 module should already be loaded via /etc/nginx/modules-enabled/50-mod-http-geoip2.conf. Add a new config snippet like /etc/nginx/conf.d/geoblock.conf. The first part tells Nginx where to find the GeoIP database file, and then extracts the two-letter ISO country code for each request as a variable. The map part sets up an $osa_geoblocked variable, which is set to 1 for GB, otherwise 0.
geoip2 /var/lib/GeoIP/GeoLite2-Country.mmdb {
    $geoip2_data_country_iso_code country iso_code;
}

map $geoip2_data_country_iso_code $osa_geoblocked {
    GB 1;
    default 0;
}
Write an HTML file somewhere like /var/www/custom_errors/osa.html, explaining the block. Then serve that page for HTTP 451 status codes: in /etc/nginx/sites-enabled/whatever, add:
server {
    ...

    # UK OSA error page
    error_page 451 /osa.html;
    location /osa.html {
        internal;
        root /var/www/custom_errors/;
    }

    # When geoblocked, return 451
    location / {
        if ($osa_geoblocked = 1) {
            return 451;
        }
    }
}
Test your config with nginx -t, and then service nginx reload. You can test how things look from the UK using a VPN service, or something like locabrowser.
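You can also sanity-check the database itself from the shell. A minimal check, assuming Debian's mmdb-bin package (which provides the mmdblookup tool); 81.2.69.142 is just an example address, substitute an IP you expect to map to GB:
apt install mmdb-bin
mmdblookup --file /var/lib/GeoIP/GeoLite2-Country.mmdb --ip 81.2.69.142 country iso_code
If the lookup prints "GB", the database file and path are good, and any remaining problems are on the Nginx side.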
This is, to be clear, a bad solution. MaxMind’s free database is not particularly precise, and in general IP lookup tables are chasing a moving target. I know for a fact that there are people in non-UK countries (like Ireland!) who have been inadvertently blocked by these lookup tables. Making those people use Tor or a VPN sucks, but I don’t know what else to do in the current regulatory environment.
Multi-tenant vector search with Amazon Aurora PostgreSQL and Amazon Bedrock Knowledge Bases
Self-managed multi-tenant vector search with Amazon Aurora PostgreSQL
February 19, 2025
I've helped huge companies scale logs analysis. Here’s how.
My database communities
I have been working on databases since 1996. In some cases I just worked on the product (Oracle & Informix), in others I consider myself a member of the community (MySQL, Postgres & RocksDB). And for MongoDB I used to be in the community.
I worked on Informix XPS in 1996. I chose Informix because I could live in Portland, OR and walk to work. I was fresh out of school and didn't know much about DBMSs, but I got a great starter project (star query optimization). The company wasn't in great shape so I left for Oracle by 1997. I never used Informix in production and didn't consider myself part of the Informix community.
I was at Oracle from 1997 to 2005. The first 3 years were in Portland implementing JMS for the app server team, and the last 5 were at Oracle HQ working on query execution. I fixed many bugs, added support for IEEE 754 types, rewrote sort, and maintained the sort and bitmap index row sources. The people there were great and I learned a lot, but I did not enjoy the code base and left for a startup. I never used Oracle in production and don't consider myself part of the Oracle community.
I led the MySQL engineering teams at Google for 4 years and at Facebook/Meta for 10 years. I was very much immersed in production and have been active in the community since 2006. The MySQL teams got much done at both Google (GTID, semi-sync, crash-safe replication, a rewrite of the InnoDB rw-lock) and Facebook/Meta (MyRocks and too many other things to mention). Over the years at FB/Meta my job duties got in the way of programming, so I used performance testing as a way to remain current. I also filed many bugs and might still be in the top 10 for bug reports. While Oracle has been a great steward for the MySQL project, I have been critical about the performance regressions from older MySQL to newer MySQL. I hope that eventually stops because it will become a big problem.
I contributed some code to RocksDB, mostly for monitoring. I spent much more time doing performance QA for it, and filing a few bugs. I am definitely in the community.
I don't use Postgres in production but have spent much time doing performance QA for it over the past ~10 years. A small part of that was done while at Meta, where I had a business case and was able to use some of their HW and my time. But most of this has been a volunteer effort -- more than 100 hours of my time and 10,000+ hours of server time. Some of those server hours are in public clouds (Google, Hetzner), so I am also spending a bit of my own money on this. I found a few performance bugs. I have not found large performance regressions over time, which is impressive. I have met many of the contributors working on the bits I care about, and that has been a nice benefit.
I used to be a member of the MongoDB community. Like Postgres, I never supported it in production but I spent much time doing performance QA with it. I wrote mostly positive blog posts, filed more than a few bugs and even won the William Zola Community Award. But I am busy enough with MySQL, Postgres and RocksDB so I haven't tried to use it for years. Regardless, I continue to be impressed by how fast they pay down tech debt, with one exception (no cost-based optimizer).
February 18, 2025
Alternatives to MongoDB Atlas: More Control, Lower Costs
February 17, 2025
Outgrowing Postgres: How to evaluate the right OLAP solution for analytics
My Time at MIT
Twenty years ago, in 2004-2005, I spent a year at MIT’s Computer Science department as a postdoc working with Professor Nancy Lynch. It was an extraordinary experience. Life at MIT felt like paradise, and leaving felt like being cast out.
MIT Culture
MIT’s Stata Center was the best CS building in the world at the time. Designed by Frank Gehry, it was a striking masterpiece of abstract architecture (although, like all abstractions, it was a bit leaky). Furniture from Herman Miller complemented this design. I remember seeing price tags of $400 on simple yellow chairs.
The building buzzed with activity. Every two weeks, postdocs were invited to the faculty lunch on Thursdays, and alternating weeks we had group lunches. Free food seemed to materialize somewhere in the building almost daily, and the food trucks outside were also good. MIT thrived on constant research discussions, collaborations, and talks. Research talks were advertised on posters at the urinals, as a practical touch of MIT's hacker culture I guess.
Our research group occupied the 6th floor, which was home to theory and algorithms. From there, I would see Tim Berners-Lee meeting with colleagues on the floor below. The building’s open spaces and spiral staircases connected pairs of floors to foster interaction. The place radiated strong academic energy. One evening, I saw Piotr Indyk discussing something in front of one of the many whiteboards on the 6th floor. The next morning, he was still there, having spent the night working toward a paper deadline. Eric Demaine was on the same floor too. Once, I accidentally sent a long print job (a PhD thesis) to his office printer, and he was angry about the wasted paper.
Nancy Lynch set a great example for us. She is a very detail-oriented person and could find even the tiniest mistakes in a paper with ease. She once told me that her mind worked like a debugger when reading a paper, and the bugs jumped out at her. When she worked with a student, she would dedicate herself solely to that student and paper for an entire week. During that week she would avoid thinking about or listening to other work and students, even when she wanted to participate. She wanted to immerse herself, keep every parameter of the paper in her mind, and grok it.
People in Nancy's group were also incredibly sharp—Seth Gilbert, Rui Fan, Gregory Chockler, Cal Newport, and many other students and visiting researchers. Yes, that Cal Newport of "Deep Work" fame was a fresh PhD student back then. Looking back, I regret not making more friends, and not forging deeper connections.
Lessons Learned
Reflecting on my time at MIT, I wish I had been more intentional, more present, and more engaged. The experience was a gift, but I see now how much more I could have made of it.
I was young, naive, and plagued by impostor syndrome. I held back instead of exploring more, engaging more deeply, and seeking out more challenges. I allowed myself to be carried along by the current, rather than actively charting my own course. Youth is wasted on the young.
Why pretend to be smart and play it safe? True understanding is rare and hard-won, so why claim it before you are sure of it? Isn't it more advantageous to embrace your stupidity/ignorance and be underestimated? In research and academia, success often goes not to the one who understands first, but to the one who understands best. Even when speed matters, the real advantage comes from the deep, foundational insights that lead there.
When you approach work with humility and curiosity, you learn more and participate more fully. Good collaborators value these qualities. A beginner’s mind is an asset. Staying close to your authentic self helps you find your true calling.
February 15, 2025
Vector indexes, large server, dbpedia-openai dataset: MariaDB, Qdrant and pgvector
My previous post has results for MariaDB and pgvector on the dbpedia-openai dataset. This post adds results from Qdrant. This uses ann-benchmarks to compare MariaDB, Qdrant and Postgres (pgvector) with a larger dataset, dbpedia-openai at 500k rows. The dataset has 1536 dimensions and uses angular (cosine) as the distance metric. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.
tl;dr
- I am new to Qdrant so the chance that I made a mistake is larger than for MariaDB or Postgres
- If you already run MariaDB or Postgres then I suggest you also use them for vector indexes
- MariaDB usually gets ~2X more QPS than pgvector and ~1.5X more than Qdrant
- Production is expensive -- you have to worry about security, backups, operational support
- A new DBMS is expensive -- you have to spend time to learn how to use it
For Qdrant I decided to try the Docker container they provide, and I ended up not changing the configuration that ships with it. I spent some time doing performance debugging and didn't see anything to indicate that a config change was needed. For example, I didn't see disk IO during queries. But the performance debugging was harder because that Docker image doesn't come with my favorite debug tools installed. Some of the tools were easy to install, others (perf) were not.
This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit.
The ann-benchmarks config files are here for MariaDB, Postgres and Qdrant. For Postgres I set values for both M and ef_construction, but MariaDB doesn't support ef_construction so I only specify values for M. While pgvector requires ef_construction to be >= 2*M, I do not know whether Qdrant has a similar requirement; regardless, I only test cases where that constraint holds.
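For context, this is roughly what those parameters map to in pgvector's DDL. A minimal sketch, assuming a hypothetical items table with an embedding column; these are not the exact statements ann-benchmarks generates:
-- build an HNSW index with explicit M and ef_construction (64 >= 2*16)
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- per session, ef_search trades recall for QPS at query time
SET hnsw.ef_search = 100;
MariaDB's vector index exposes an analogous M-style parameter but, as noted above, no ef_construction.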
- MariaDB uses 16-bit integers rather than float32
- pgvector uses float32, pgvector halfvec uses float16
- For Qdrant I used none (float32) and scalar (int8)
The command lines to run the benchmark using my helper scripts are:
These charts show the best QPS for a given recall. MariaDB gets more QPS than Qdrant and pgvector but that is harder to see as the recall approaches 1, so the next section has a table for best QPS per DBMS at a given recall.
Results: create index
- index sizes are similar between MariaDB and pgvector with halfvec
- time to create the index varies a lot, and it is better to consider this in the context of recall, which is done in the next section. But Qdrant creates indexes a lot faster than MariaDB or pgvector.
- I did not find an accurate way to determine index size for Qdrant. There is a default method in ann-benchmarks that a DBMS can override. The default just compares process RSS before and after creating an index, which isn't accurate for small indexes. The MariaDB and Postgres code override the default and query the data dictionary to get a more accurate estimate.
More details on index size and index create time for MariaDB and Postgres are in my previous post.
With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target, along with the configuration parameters (M, ef_construction, ef_search) at which that occurs, for each of the algorithms -- MariaDB, pgvector with float32 and float16/halfvec, and Qdrant with no quantization and with scalar quantization.
- Qdrant with scalar quantization does not get a result for recall=1.0 for the values of M, ef_construction and ef_search I used
- MariaDB usually gets ~2X more QPS than pgvector and ~1.5X more than Qdrant
- Index create time was much less for Qdrant (described above)
- recall, QPS - best QPS at that recall
- rel2ma - (QPS for this DBMS / QPS for MariaDB)
- m= is the value for M when creating the index
- ef_cons= is the value for ef_construction when creating the index
- ef_search= is the value for ef_search when running queries
- quant= is the quantization used by Qdrant
- dbms
- MariaDB - MariaDB, there is no option for quantization
- PGVector - Postgres with pgvector and float32
- PGVector_halfvec - Postgres with pgvector and halfvec (float16)
- Qdrant(..., quant=none) - Qdrant with no quantization
- Qdrant(..., quant=scalar) - Qdrant with scalar quantization
From web developer to database developer in 10 years
Last month I completed my first year at EnterpriseDB. I'm on the team that built and maintains pglogical and who, over the years, contributed a good chunk of the logical replication functionality that exists in community Postgres. Most of my work, our work, is in C and Rust with tests in Perl and Python. Our focus these days is a descendant of pglogical called Postgres Distributed which supports replicating DDL, tunable consistency across the cluster, etc.
This post is about how I got here.
Black boxes
I was a web developer from 2014-2021†. I wrote JavaScript and HTML and CSS and whatever server-side language: Python or Go or PHP. I was a hands-on engineering manager from 2017-2021. I was pretty clueless about databases and indeed database knowledge was not a serious part of any interview I did.
Throughout that time (2014-2021) I wanted to move my career forward as quickly as possible so I spent much of my free time doing educational projects and writing about them on this blog (or previous incarnations of it). I learned how to write primitive HTTP servers, how to write little parsers and interpreters and compilers. It was a virtuous cycle because the internet (Hacker News anyway) liked reading these posts and I wanted to learn how the black boxes worked.
But I shied away from data structures and algorithms (DSA) because they seemed complicated and useless to the work that I did. That is, until 2020 when an inbox page I built started loading more and more slowly as the inbox grew. My coworker pointed me at Use The Index, Luke and the DSA scales fell from my eyes. I wanted to understand this new black box so I built a little in-memory SQL database with support for indexes.
I'm a college dropout so even while I was interested in compilers and interpreters earlier in my career I never dreamed I could get a job working on them. Only geniuses and PhDs did that work and I was neither. The idea of working on a database felt the same. However, I could work on little database side projects like I had done before on other topics, so I did. Or a series of explorations of Raft implementations, others' and my own.
Startups
From 2021-2023 I tried to start a company and when that didn't pan out I joined TigerBeetle as a cofounder to work on marketing and community. It was during this time I started the Software Internals Discord and /r/databasedevelopment which have since kind of exploded in popularity among professionals and academics in database and distributed systems.
TigerBeetle was my first job at a database company, and while I contributed bits of code I was not a developer there. It was a way into the space. And indeed it was an incredible learning experience both on the cofounder side and on the database side. I wrote articles with King and Joran that helped teach and affirm for myself the basics of databases and consensus-based distributed systems.
Holding out
When I left TigerBeetle in 2023 I was still not sure if I could get a job as an actual database developer. My network had exploded since 2021 (when I started my own company that didn't pan out) so I had no trouble getting referrals at database companies.
But my background kept leading hiring managers to suggest putting me on cloud teams doing orchestration in Go around a database rather than working on the database itself.
I was unhappy with this type-casting so I held out while unemployed and continued to write posts and host virtual hackweeks messing with Postgres and MySQL. I started the first incarnation of the Software Internals Book Club during this time, reading Designing Data Intensive Applications with 5-10 other developers in Bryant Park. During this time I also started the NYC Systems Coffee Club.
Postgres
After about four months of searching I ended up with three good offers, all to do C and Rust development on Postgres (extensions) as an individual contributor. Working on extensions might sound like the definition of not-sexy, but Postgres APIs are so loosely abstracted it's really as if you're working on Postgres itself.
You can mess with almost anything in Postgres so you have to be very aware of what you're doing. And when you can't mess with something in Postgres because an API doesn't yet exist, companies have the tendency to just fork Postgres so they can. (This tendency isn't specific to Postgres, almost every open-source database company seems to have a long-running internal fork or two of the database.)
EnterpriseDB
Two of the three offers were from early-stage startups, and after more than 3 years being part of the earliest stages of startups I was happy for a break. But the third offer was from one of the biggest contributors to Postgres, a 20-year-old company called EnterpriseDB. (You can probably come up with different rankings of companies using different metrics so I'm only saying EnterpriseDB is one of the biggest contributors.)
It seemed like the best place to be to learn a lot and contribute something meaningful.
My coworkers are a mix of Postgres veterans (people who contributed the WAL to Postgres, who contributed MVCC to Postgres, who contributed logical decoding and logical replication, who contributed parallel queries; the list goes on and on) but also my developer-coworkers are people who started at EnterpriseDB on technical support, or who were previously Postgres administrators.
It's quite a mix. Relatively few geniuses or PhDs, despite what I used to think, but they certainly work hard and have hard-earned experience.
Anyway, I've now been working at EnterpriseDB for over a year so I wanted to share this retrospective. I also wanted to cover what it's like coming from engineering management and founding companies to going back to being an individual contributor. (Spoiler: incredibly enjoyable.) But it has been hard enough to make myself write this much so I'm calling it a day. :)
† From 2011-2014 I also did contract web development but this was part-time while I was in school.