A Git story: Not so fun this time

Linus Torvalds once wrote in a book that he created Linux just for fun, but it ended up sparking a revolution. Git, his second major creation, was also an accidental revolution. It’s now a standard tool for software engineers, but its origin story wasn’t so much fun this time, at least for Linus.

Linus doesn’t scale

1998 was a big year for Linux. Major companies like Sun, IBM, and Oracle started getting involved with Linux. That spring, Linus’s second daughter was born. It had been almost a year since his family moved from Finland to California, settling into their new life. Even though Linux hadn’t brought Linus much financial gain yet, he was doing well in both his career and family life.

On the other hand, the Linux kernel developer community was growing, and the existing collaboration methods were becoming insufficient. Linus started struggling to keep up with the pace of code changes from developers, becoming a bottleneck in the process.

On September 28, 1998, Linus was reading the Linux kernel mailing list as usual when he came across this message:

Please don’t waste your time on creating these patches. These things are functional in the vger tree.

This message annoyed Linus. Linux code changes relied heavily on Linus himself. If you wanted to make a change, you’d email the mailing list, and if Linus saw and approved it, he’d incorporate your patch into his version and release new versions on FTP from time to time. Linus liked to work this way because it allowed him to control all changes. And everyone trusted Linus to manage Linux.

However, since David Miller, a senior Linux kernel developer, set up a CVS server called vger, some people thought they could bypass Linus and just submit changes to vger. This wasn’t the first time Linus encountered this issue, and he responded unhappily on the mailing list:

Note that saying “it’s in vger, so you’re wasting your time” is still completely and utterly stupid. The fact that it is in vger has absolutely no bearing, especially as there’s a lot of stuff in vger that will probably never make it into 2.2.

A heated debate ensued between Linus and a few developers, who complained about Linus’s slow responses, sometimes requiring three emails to get a reply from the “benevolent dictator.”

“These people should look at themselves in the mirror,” Linus thought. “I have to read so many emails a day. If sending three emails is too much trouble, I’d rather not have the patch.” He left this message before storming off:

Quite frankly, this particular discussion (and others before it) has just made me irritable, and is ADDING pressure.

Go away, people. Or at least don’t Cc me any more. I’m not interested, I’m taking a vacation, and I don’t want to hear about it any more. In short, get the hell out of my mailbox.

Linus’s emotional outburst prompted some people to offer help.

One of the open source movement leaders, Eric S. Raymond, author of the famous essay “The Cathedral and the Bazaar,” calmly urged:

People, these are the early-warning signs of potential burnout. Heed them and take warning. Linus’s stamina has been astonishing, but it’s not limitless. All of us (and yes, that means you too, Linus) need to cooperate to reduce the pressure on the critical man in the middle, rather than increasing it.

Larry McVoy also extended a helping hand. In an email titled “A solution for growing pains,” he wrote:

The problem is that Linus doesn’t scale. We can’t expect to see the rate of change to the kernel, which gets more complex and larger daily, continue to increase and expect Linus to keep up. But we also don’t want to have Linus lose control and final say over the kernel, he’s demonstrated over and over that he is good at that.

Figure out a means by which Linus can surround himself with some number of people who do part of his job. Add tools which make that possible.

The mechanism which allows all this to happen is a distributed source management system…

Larry was developing a new version control system called BitKeeper.

The origin of BitKeeper

In the early 1990s, Sun Microsystems introduced an internal tool called Network Software Environment (NSE) to manage their code, but NSE was slow and had a terrible user experience. Some engineers even quit in frustration.

Larry McVoy, a seasoned operating system developer with a background in performance work, was approached by Sun’s management to improve NSE’s performance.

When Larry looked at NSE’s code, he was surprised. “This thing wasn’t designed with performance in mind at all.” He also discovered that NSE was built on top of SCCS, a version control system from the 1970s, older than CVS and Subversion. Instead of trying to fix the deeply flawed NSE, Larry chose a different path: he wrote NSElite in Perl, implementing resync/resolve commands on top of SCCS, similar to Git’s clone/pull/push commands today.

NSElite was much faster than NSE, so Sun’s engineers started abandoning NSE for NSElite. Seeing this, a Sun VP saw a business opportunity and formed an eight-person team to rewrite Larry’s Perl scripts in C++ and turn them into a product called TeamWare.

TeamWare was likely the first distributed version control system (DVCS), and it eventually became essential for developing Sun’s Solaris operating system. Engineers who used TeamWare couldn’t go back: unlike CVS and Subversion, TeamWare allowed you to clone the entire project to your local machine, make changes locally, and merge your version back into the remote version when ready.

The team consisted of eight experienced C programmers. Since C++ was the hot new language, they learned C++ while developing TeamWare. Before TeamWare was completed, Larry continued developing NSElite, which made the TeamWare team look bad: one guy with Perl was outpacing eight people with C++. The VP then told Larry, “This has been reported to Scooter (Scott McNealy, Sun’s CEO). If you release it again, you’re fired.”

In 1991, Larry stopped developing NSElite but couldn’t shake the idea of building a DVCS. He thought commercial software would follow TeamWare’s lead, but none did. In 1997, Larry began developing a DVCS called BitKeeper. However, it wasn’t until September 1998, when he saw Linus on the verge of burnout on the mailing list, that he felt truly motivated to take BitKeeper seriously.

Linux kernel adopts BitKeeper

One fall day in 1998, Larry invited Linus Torvalds, David Miller, and Richard Henderson to his home. After dinner, they sat on the floor and started brainstorming ways to reduce Linus’s workload. They drew diagrams on the floor for three or four hours, mostly based on how TeamWare had been working within Sun Microsystems, a workflow Larry knew well.

In this framework, developers could use BitKeeper to work independently without interfering with each other. When Linus did the final integration, he wouldn’t lose the history of the changes, making it easier for him to review the code.

“Alright, if you build it and it works as you say, I’ll use it,” Linus said.
“No problem, I’ve done this before. It should take about six months,” Larry replied.

Larry quickly realized he had underestimated the complexity of the task. He founded a company called BitMover and recruited some version control experts to help build BitKeeper. Nineteen months later, in May 2000, BitKeeper’s first version was released. By then, BitMover was a team of seven people.

The first version of BitKeeper included a command-line tool called bk and some graphical interface tools. The bk clone/pull/push commands functioned similarly to git clone/pull/push.

At that time, Sun’s TeamWare was already well-regarded, and BitKeeper was TeamWare on steroids. For example, while TeamWare only allowed data transfer over NFS file systems, BitKeeper could transfer files over HTTP, which made it more distributed. As a result, BitKeeper soon brought BitMover a healthy cash flow. By 2002, BitMover had grown to a 25-person team, completely self-sufficient without external funding.

Larry McVoy, Linux Expo, 1999

In January 2002, Linus’s workload issue resurfaced. Patches submitted by developers were either taking too long to get a response or were being ignored. Someone wrote “a modest proposal” to try to address the problem. In the discussion, someone casually mentioned, “BitKeeper is a really nice tool,” reminding Linus of that dinner three years ago at Larry’s house. Linus asked, “How many other people are actually using bitkeeper already for the kernel?”

As it turned out, some kernel developers had already been using BitKeeper. The Linux PowerPC (PPC) team started testing BitKeeper in December 1999, and BitMover set up a bkbits.net server to support them.

A few days later, on February 5, 2002, the mailing list saw Linus begin testing BitKeeper. From then on, the main Linux kernel developers started adopting BitKeeper. You didn’t have to use BitKeeper to contribute to development, but if you did, the process went something like this:

# Download the repository
bk clone bk://linux.bkbits.net/linux-2.5 linux-2.5
bk clone linux-2.5 alpha-2.5

# Pull changes from another place
cd alpha-2.5
bk pull bk://gkernel.bkbits.net/alpha-2.5

# Edit files and push changes back to the remote
bk vi fs/inode.c 
bk push bk://gkernel@bkbits.net/alpha-2.5 

To send changes to Linus, you’d email the mailing list with something like:

Here is an update for something something...

Please pull from: bk://gkernel.bkbits.net/alpha-2.5

example/file1.c | 6 ++++++
example/file2.c | 4 ----
2 files changed, 6 insertions(+), 4 deletions(-)

No more free BitKeeper

Larry McVoy allowed Linux kernel developers to use BitKeeper for free, but there were some strings attached. For example, under the free-use license:

  • You couldn’t turn off Open Logging, which sent usage logs to the BitMover server.
  • You couldn’t use BitKeeper for version control if you were working on version control software.
  • You had to get BitMover’s permission if you wanted to run BitKeeper alongside other similar software.

The Linux community, full of free software advocates, had mixed reactions. Some scoffed at these terms, while others avoided them. However, for Linus and the main kernel developers, the key point was that BitKeeper reduced their workload. Since there were no better alternatives at the time, they accepted BitKeeper’s terms for the convenience it offered.

Linus had always been open-minded about proprietary software. He had chosen the GPL for the Linux kernel purely to keep the commercial market from “tainting” Linux. The GPL fit his needs, so he used it. But he never thought all software had to be free software; he believed authors had the right to distribute their software however they wanted. To him, software use wasn’t a social movement.

Free software advocates didn’t share his view. Some extreme ones even considered proprietary software to be evil. These hackers preferred the freedom to modify software over the convenience of BitKeeper.

Larry felt the pressure from the community. To address the issue, the BitKeeper team set up a BitKeeper-to-CVS mirror in 2003, allowing those who didn’t want to install BitKeeper to access code history via CVS. However, the history from CVS was incomplete compared to BitKeeper, and people weren’t satisfied. “Why should our data be locked in BitKeeper’s proprietary format, and why are we prohibited from using other tools to read our own data?”

In response, Andrew Tridgell (Tridge), the Australian programmer behind Samba and rsync, started developing a free BitKeeper client in February 2005 to solve the problems faced by free software users.

Tridge did the following.

“Here’s a BitKeeper address, bk://thunk.org:5000. Let’s try connecting with telnet.”

$ telnet thunk.org 5000
Trying 69.25.196.29...
Connected to thunk.org.
Escape character is '^]'.

“We’re connected. Why not type the help command?”

help
? - print this help
abort - abort resolve
check - check repository
clone - clone the current repository
help - print this help
httpget - http get command
[...]

“The BitKeeper server is kind enough to list all the commands.”
“So clone must be the command to download the repository?”

He typed clone and found that the output was a series of SCCS-formatted files. With that, the “reverse engineering” was mostly done; the rest was just writing the program.

Linus somehow learned what Tridge was doing; perhaps Tridge told him privately. Linus then informed his friend Larry. Larry was not impressed: a free third-party client would ruin BitKeeper’s business model. He turned to Linus and to Stuart Cohen, then CEO of OSDL (which later became part of the Linux Foundation), for help in getting Tridge to stop.

Stuart Cohen chose to stay out of it, considering it none of OSDL’s business. But Linus didn’t want to lose BitKeeper, so he worked hard to mediate, trying to find a compromise acceptable to both sides. Tridge firmly believed he wasn’t doing anything wrong, thinking a third-party client would be a win-win for BitKeeper and kernel developers. In April 2005, he released SourcePuller on Freshmeat (later merged into SourceForge). In the README, he wrote:

An open client combined with the ability to accurately import into other source code management tools would have been a big step forward, and should have allowed BitMover to flourish in the commercial environment while still being used by the free software community.

I would also like to say that BitMover is well within its rights to license BitKeeper as it sees fit. I am of course disappointed at how BitMover has portrayed some of my actions, but please understand that they are under a lot of pressure. Under stress people sometimes say things that perhaps they shouldn’t.

Larry disagreed with the win-win idea. Supporting kernel development cost money. Not only did BitMover make no money from it, but the company was also risking harm to its existing business model. To protect BitMover’s livelihood, he chose to revoke BitKeeper’s free-use license.

After weeks of negotiation, Linus was fed up with playing mediator. With no more free BitKeeper available, Linus was furious. He publicly blamed Tridge on a forum, saying he “just tore down something new” and “screwed people over.” Linus could have paid for BitKeeper himself, but he couldn’t ask other kernel developers to do the same, so he needed a new solution. He went on:

Now, I’m dealing with the fall-out, and I’ll write my own kernel source tracking tool because I can’t use the best any more. That’s ok - I deal with my own problems, thank you very much.

On April 6, 2005, Linus announced on the mailing list that the Linux kernel was parting ways with BitKeeper. He first thanked Larry and his team for their help over the past three years. Then he said he would be offline for a week to find a replacement. Finally, he added:

Don’t bother telling me about subversion. If you must, start reading up on “monotone”. That seems to be the most viable alternative.

Monotone

Monotone was created by Graydon Hoare. In 2001, Graydon, who lived in Canada, wanted to work more easily with his Australian friend. So they developed a system similar to today’s Continuous Integration (CI), which wasn’t widely known yet. Their system ensured that the code always passed tests.

In 2002, Graydon became interested in combining version control with CI. At that time, only Aegis had such a concept. Graydon also saw his friends using BitKeeper and thought that merging Aegis with DVCS could be an opportunity. That’s how Monotone came into being.

Remarkably, Graydon later joined Mozilla and created the Rust programming language.

Unfortunately, Linus picked a bad time to play with Monotone. Monotone 0.7 had decent performance, but starting from v0.14, developers began adding many validation mechanisms. Just before Linus downloaded Monotone, Graydon released v0.17 and then went on vacation. This version included a lot of rigorous checks to ensure data integrity before writing to the database, but these checks hadn’t been optimized, slowing down performance. The release notes for 0.17 mentioned:

not yet fully optimized; “pull” may be very slow and use lots of cpu

Someone tested downloading Monotone with Monotone itself, and it took two hours, with 71 minutes of CPU time. “A heavily sedated sloth with no legs is probably faster,” they commented.

Linus reported the performance issues to the Monotone developers. On April 10, 2005, Monotone 0.18 was released, with many operations running at least twice as fast. Linus was listed as a contributor in the 0.18 release, but as Monotone developer Nathaniel Smith explained:

Linus hasn’t actually contributed any code to Monotone, or, to the best of my knowledge, any SCM besides git. He didn’t really provide any suggestions either, beyond “this is too slow!” ;-). He’s credited there because it was in discussions with him that I found the right test case to track down one of our major performance bugs. I debated for a bit whether I should actually credit him by name for that, exactly because it was likely to give people strange ideas, but, figured, if it had been anyone else I would have, so… *shrug*.

Meanwhile, inspired by Monotone’s design, Linus started writing some C code from scratch.

Git v0.01: First look

On April 7, 2005, Linus uploaded a thing called Git and wrote on the mailing list:

here’s a quick challenge for you, and any crazy hacker out there: if you want to play with something really nasty (but also very very fast), take a look at kernel.org:/pub/linux/kernel/people/torvalds/.

First one to send me the changelog tree of sparse-git (and a tool to commit and push/pull further changes) gets a gold star, and an honorable mention. I’ve put a hell of a lot of clues in there.

This was Linus’s first public mention of Git.

The URL had the following files and directories:

git.git/                  09-Apr-2005 16:09    -
sparse.git/               07-Apr-2005 20:07    -
git-0.01.tar.bz2          07-Apr-2005 14:25   39K
git-0.01.tar.bz2.sign     07-Apr-2005 14:25  248
git-0.01.tar.gz           07-Apr-2005 14:25   40K
...
sparse-git.tar.bz2        08-Apr-2005 17:26   15M

git-0.01.tar.bz2 had about 1,000 lines of C code:

---------------------------------------------------------------------
File                             blank        comment           code
---------------------------------------------------------------------
./read-cache.c                      31             14            219
./update-cache.c                    32             23            198
./commit-tree.c                     23             26            128
./show-diff.c                        8              5             73
./cache.h                           17             23             53
./write-tree.c                      11              7             53
./read-tree.c                        4              5             39
./init-db.c                          4             14             38
./Makefile                          14              0             26
./cat-file.c                         2              5             21
---------------------------------------------------------------------
SUM:                               146            122            848
---------------------------------------------------------------------

Unlike today’s Git, which uses a single executable file, the earliest version of Git that Linus uploaded compiled into seven separate executables:

  • init-db
  • update-cache
  • show-diff
  • write-tree
  • read-tree
  • commit-tree
  • cat-file

The init-db command was simple. It created a directory named .dircache/objects in the current directory and then created 256 subdirectories numbered in hexadecimal from 00 to ff inside .dircache/objects.
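To make the layout concrete, here is a minimal Python sketch of what init-db is described as doing above. It is not Linus’s C code; the function name init_db and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile


def init_db(root):
    """Create <root>/.dircache/objects and its 256 subdirectories, 00 through ff.

    Hypothetical helper that mirrors the behavior described in the text,
    not the original C implementation.
    """
    objects = os.path.join(root, ".dircache", "objects")
    os.makedirs(objects, exist_ok=True)
    for i in range(256):
        # f"{i:02x}" formats 0..255 as two lowercase hex digits: 00, 01, ..., ff
        os.makedirs(os.path.join(objects, f"{i:02x}"), exist_ok=True)
    return objects


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names = sorted(os.listdir(init_db(tmp)))
        print(len(names), names[0], names[-1])  # prints: 256 00 ff
```

The two-character subdirectories keep any single directory from accumulating every object in the database.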

The .dircache/objects directory represented an object database with the following types of objects:

  • Blob: the file contents.
  • Tree: directories, essentially containing names of files (blobs) and directories (trees).
  • Changeset: defined by the names of two trees, representing the change from tree A to tree B. “Changeset” was an early term for what later became known as a “commit.”

Here object names are not file or directory names but the SHA-1 hash of their compressed content. Linus borrowed this idea from Monotone, but while Monotone used SQLite to store SHA-1 object names and contents, Linus chose to use system calls and the filesystem directly.

SHA-1’s collision resistance meant that the Git database would almost never have two objects with the same name but different content. If an object’s name was ba93e701c0fe6dcd181377068f6b3923babdc150, Git would store it in .dircache/objects/ba/ as a file named 93e701c0fe6dcd181377068f6b3923babdc150.
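The naming scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not the original code: the helpers object_name and object_path are hypothetical, and the sketch hashes the zlib-compressed bytes directly, leaving out any header the real tool wrote in front of the content.

```python
import hashlib
import zlib


def object_name(content: bytes) -> str:
    """Name an object by the SHA-1 of its compressed content, as described above.

    Simplification: no object header is added before compressing.
    """
    compressed = zlib.compress(content)
    return hashlib.sha1(compressed).hexdigest()


def object_path(name: str) -> str:
    """Map a 40-character object name to its location in the object database."""
    # The first two hex digits pick the subdirectory; the remaining 38 are the file name.
    return f".dircache/objects/{name[:2]}/{name[2:]}"


if __name__ == "__main__":
    print(object_path("ba93e701c0fe6dcd181377068f6b3923babdc150"))
    # prints: .dircache/objects/ba/93e701c0fe6dcd181377068f6b3923babdc150
    print(object_path(object_name(b"hello, world\n")))
```

Because the name is derived from the content, storing the same content twice lands on the same path, so duplicates cost nothing.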

The seven executables focused on different operations around this “content-addressable” filesystem. For example:

  • write-tree: Creates a tree object, writing the snapshot of a directory into the database.
  • commit-tree: Creates a changeset, linking two trees in the database, similar to today’s git commit.
  • update-cache: Adds a file to the .dircache/index, akin to today’s git add to the staging area.

Linus liked the SHA-1-based naming idea as soon as he saw it in Monotone — because it was simple. Simplicity was also what Linus liked about Unix. In his book “Just for Fun,” he described Unix:

This simple design is what intrigued me, and most people, about Unix (well, at least us geeks). Pretty much everything you do in Unix is done with only six basic operations (called “system calls,”…

It gives you the building blocks that are sufficient for doing everything. That’s what having a clean design is all about.

Git followed this philosophy, with a data model simpler than CVS, Subversion, or BitKeeper. It essentially stored the state of the directory before and after changes, without tracking what specific files or lines changed. That information could always be derived by comparing the before and after states of the trees.
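A small Python sketch shows why storing only snapshots is enough. It assumes a simplified model in which each tree is flattened into a dictionary from file path to object name; the function diff_trees and the sample file names are hypothetical, and real trees are nested objects rather than flat dictionaries.

```python
import hashlib


def name(content: bytes) -> str:
    """Stand-in for an object name: the SHA-1 of the content (illustrative only)."""
    return hashlib.sha1(content).hexdigest()


def diff_trees(before: dict, after: dict) -> dict:
    """Derive the per-file changes from two snapshots.

    Each snapshot maps a path to the object name of that file's content.
    Nothing about the change itself was stored; it is recomputed on demand.
    """
    changes = {}
    for path in sorted(before.keys() | after.keys()):
        old, new = before.get(path), after.get(path)
        if old == new:
            continue  # same object name means same content, so nothing changed
        if old is None:
            changes[path] = "added"
        elif new is None:
            changes[path] = "deleted"
        else:
            changes[path] = "modified"
    return changes


if __name__ == "__main__":
    tree_a = {"Makefile": name(b"all:\n"), "README": name(b"hi\n")}
    tree_b = {"Makefile": name(b"all: git\n"), "cat-file.c": name(b"int main;\n")}
    print(diff_trees(tree_a, tree_b))
```

Comparing object names is cheap: two files with the same name in both snapshots are identical without reading their contents.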

The Git prototype Linus wrote in two days was simple. No extra validation. No relational database. Just C code, SHA-1 hashes, and system calls, completely tailored to Linus’s needs. Meanwhile, the Monotone project, entering its third year, was feature-rich and had to cater to a wide range of use cases. Plus, Monotone’s original author Graydon had added a bunch of unoptimized code before going on vacation, leaving it without leadership. At such a disadvantage, Monotone couldn’t match Git’s speed.

The sparse-git.tar.bz2 file Linus initially uploaded was likely the first Git repository ever. Sparse was a static analyzer for C that Linus wrote in 2003. If you’re still interested in Linus’s challenge, you can slightly modify the extracted sparse-git.tar.bz2 and use today’s git log command to read the change history:

# Assuming you are in the sparse-git directory
mv .dircache .git
mkdir .git/refs
git log

Git’s early contributors

The initial version of Git sparked lively discussion. A few days later, Linus created a dedicated mailing list for Git, allowing the Linux kernel mailing list to get back on track. In the first month, the Git mailing list saw around 2,600 messages, while the Linux kernel, the most collaborative software project in history, had 7,000-9,000 messages each month during the same period.

For experts in the version control field, Git was just another project. Linus’s first upload of Git contained only a few low-level operations. It lacked essential commands like clone and merge, making it far from a usable version control system. And Linus’s constant praise of Git unintentionally belittled other version control systems. This irritated Bram Cohen, the creator of BitTorrent.

At the time, Bram was promoting his own version control system, Codeville. Codeville was already a mature DVCS comparable to Monotone and featured an advanced merge algorithm. Seeing how Linus and his followers talked about merge algorithms, Bram felt that Linus, an outsider, was reinventing the wheel. “Git is weekend hack which looks like a weekend hack,” Bram wrote.

Bram had a point, but this wasn’t just any weekend hack: it was Linus Torvalds’s hack, the Linux kernel creator’s hack. As a folk hero in the open-source software world, Linus had his every move under the spotlight. Young developers looked up to him, seeing him as a role model. Consequently, after Linus uploaded Git, it quickly attracted a group of young developers eager to join the discussion and development.

One of the early contributors was Petr Baudis from the Czech Republic. The day Linus announced Git, Petr downloaded the code, became enchanted, and started contributing. Given the early Git’s usability issues, Petr developed git-pasky (pasky being Petr’s alias), which eventually became Cogito. If Git’s foundation was the plumbing, Cogito was the porcelain — the user-friendly interface.

In software development terminology, the origin of comparing low-level infrastructure to plumbing is hard to trace, but the use of “porcelain” to describe high-level packaging originated on the Git mailing list. To this day, Git uses the terms “plumbing” and “porcelain” to refer to low-level and high-level commands, respectively.

Additionally, Petr set up the first project homepage for Git, git.or.cz, and a code hosting service, repo.or.cz. These websites were the “official” Git sites until GitHub took over.

Petr contributed externally by building on top of Git and creating services around it. Another early contributor, Junio Hamano, contributed directly from within Git itself. He later took over as Git’s maintainer from Linus. He still holds this position today.

The successor

Junio Hamano is a software engineer from Japan. Around 1995, about a year after graduating, he was sent to Los Angeles by his employer, Twin Sun, and has lived in the US ever since. There, he met Paul Eggert, who was also working at the same company at the time.

Paul Eggert has maintained many free or open-source software projects, including RCS (an early version control system) and Tar. He is currently a professor at UCLA and the maintainer of the timezone database.

Influenced by Paul, Junio developed an interest in the world of open-source software. Although he wasn’t a kernel developer, he subscribed to mailing lists for open-source projects like the Linux kernel just for fun.

In April 2005, Junio saw Linus’s announcement on the mailing list that the Linux kernel was parting ways with BitKeeper. Junio had always wanted to make a mark in the open-source world, and this new project called Git seemed like a great opportunity: brand new, no historical baggage, easy to get into. He downloaded the tarball and spent about two hours reading through the initial Git code in one sitting. He was impressed by how well it was written.

After the initial release of Git, Linus quickly added commit and diff commands, but there was no merge yet.

Although Linus had never written version control software before, he had used BitKeeper for three years. Before that, he had ten years of “version control human” experience. He knew what kind of merge algorithm he wanted. However, since the merge logic was more involved, Linus thought it might be better suited to a scripting language, writing:

I’ve been avoiding doing the merge-tree thing, in the hope that somebody else does what I’ve described. I really do suck at scripting things, yet this is clearly something where using C to do a lot of the stuff is pointless.

A week went by with no takers. Junio, between projects at the time, had some free time, so he wrote what Linus wanted in Perl and posted it to the mailing list.

I now have a Perl script that uses rev-tree, cat-file, … and merge (from RCS). Quick and dirty.

Junio probably picked up some knowledge of RCS from his mentor Paul, so he had some understanding of version control software. In his email, he also detailed about 30 test cases covering various code branches.

It was already 1 AM, and with his kids waking up at 7 AM, Linus usually went to bed by 10 PM. But seeing Junio’s Perl script, Linus was thrilled and couldn’t help but reply:

That’s exactly what I wanted. Q’n’D is how the ball gets rolling.

He eagerly continued the discussion with Junio.

The thread, “Merge with git-pasky II,” originally started with Petr asking Linus about merging his version, but it soon turned into a discussion on merge algorithms. During the discussion, Linus also explained why Git’s internals didn’t need to handle file renames.

Over the next 48 hours, starting at midnight on April 14, Junio and Linus exchanged a dozen emails in that thread. Junio patiently revised the code to meet Linus’s vision for merging. It was clear from his words that Junio was a big fan of Linus. For example, Junio would quote Linus’s words from four years earlier, “I’m always right,” and would flatter Linus where it was due.

At midnight on April 16, Linus had a brainwave and mentioned he had a “cunning plan.”

Damn, my cunning plan is some good stuff.

Or maybe it is so cunning that I just confuse even myself. But it looks like it is actually working, and that it allows pretty much instantaenous merges.

Linus cleverly reused the existing index, introducing the concept of “stages” on top of it, which significantly simplified the implementation of merge.

Junio marveled at Linus’s solution:

I really like this a lot. It is so simple, clear, flexible and an example of elegance. This is one of the things I would happily say “Sheeeeeeeeeeeeeesh! Why didn’t I think of THAT first!!!” to.

This meant that Junio’s previous Perl code would be wasted, but the new solution was so brilliant that Junio accepted it wholeheartedly.
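The “stages” idea can be sketched in Python. This is a simplified illustration, not the 2005 implementation: it assumes the stage numbering documented in today’s Git (1 for the common ancestor, 2 for our side, 3 for their side, 0 for a resolved entry), flattens each tree into a dictionary from path to object name, and treats every deletion as trivially resolvable, which real Git does not always do. The function name merge_into_index is hypothetical.

```python
def merge_into_index(base: dict, ours: dict, theirs: dict) -> dict:
    """Three-way merge of flattened trees into an index with stages.

    Returns {path: {stage: object_name}}. A path with only stage 0 is merged;
    a path with stages 1 to 3 is a conflict left for a later tool to resolve.
    """
    index = {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            resolved = o  # both sides agree (possibly both deleted the file)
        elif b == o:
            resolved = t  # only their side changed the file
        elif b == t:
            resolved = o  # only our side changed the file
        else:
            # Both sides changed it differently: keep all versions side by side.
            stages = ((1, b), (2, o), (3, t))
            index[path] = {s: n for s, n in stages if n is not None}
            continue
        if resolved is not None:  # None means the merged result deletes the file
            index[path] = {0: resolved}
    return index


if __name__ == "__main__":
    base = {"a.c": "a1", "b.c": "b1", "c.c": "c1"}
    ours = {"a.c": "a2", "b.c": "b1", "c.c": "c2"}
    theirs = {"a.c": "a1", "b.c": "b2", "c.c": "c3"}
    for path, stages in merge_into_index(base, ours, theirs).items():
        print(path, stages)
```

Because every comparison is between object names, most of a merge is decided without opening a single file, which fits Linus’s remark about near-instant merges.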

The merge algorithm was just the beginning. Junio continued to contribute more patches to Linus, gradually earning Linus’s trust.

Linus had mentioned before that he wouldn’t maintain Git long-term. Once the time was right, he would hand Git over to someone else and return to his main job with the Linux kernel. Junio was the obvious choice, as Linus appreciated his “good taste” in writing code. So, three months later, on July 26, Linus announced that he was passing the role of Git maintainer to Junio.

Junio also posted an announcement:

As some of you seem to have noticed even before the announcement by Linus, the official GIT repository at kernel.org is now owned by me. As Linus said in his message, this does not mean he is leaving us, so please do not panic.

I would also like to thank Twin Sun Inc (my employer) and NEC for promising to support me working on GIT on a part time basis from now on. I expect to be spending 8 to 12 day-job hours per week; evenings and weekends are my own time as before. My tentative plan is to make Wednesdays and Saturdays my primary GIT days.

Earlier, I was producing as many patches as ideas cross my mind, throwing all of them at the list to see which ones stick, relying on somebody with a good taste upstream to drop all the bad ones. Although it has been fun working that way with Linus, regrettably, I ended up wasting a lot of his time.

I will slow down and be more careful as the “shepherd of the main repository” from now on. At least for now, you will see patches from me posted on the list like everybody else’s, before they hit the main repository.

Under Junio’s leadership, Git 1.0.0 was released on December 21, 2005. Nineteen years later (as of July 2024), Junio works at Google and still maintains Git.

This article mentions Linus more often than Junio, but Git as it exists today owes the most to the persistent efforts of Junio and other developers in the background. “1% inspiration and 99% perspiration” might be a cliché, but it holds very true for successful projects like Git and the Linux kernel.

GitHub and the Ruby people

Although Git garnered a fair amount of attention early on, it was still pretty niche. In January 2006, the X Window System team switched from CVS to Git. That alone was enough to wow Junio. He didn’t expect a project as big as the X Window System to go through the trouble of changing version control systems.

After BitKeeper, many DVCSes started to emerge. Apart from Monotone, there were Mercurial, Darcs, Bazaar, Arch, and Fossil. The most notable among them was Mercurial, created by Olivia Mackall. It was released just a few days after Git, with more complete features and a more user-friendly interface. It also had backing from Google Code and BitBucket. The version control market was like the Wild West, with each system holding its own.

What truly pushed Git to the top and made it mainstream was GitHub. Or, as Linus put it, it was the Ruby people, strange people, who made Git an overnight success.

In February 2007, Git 1.5 was released, finally making Git more usable for normal people. At the time, Git was the hot new thing being talked about at Ruby meetups in San Francisco. Tom Preston-Werner, co-founder of GitHub, first heard about Git from his colleague Dave Fayram. Tom considered Dave to be the “patient zero” for the spread of Git adoption in the Ruby community.

Despite Git’s popularity in the Ruby community, the only Git hosting service available at the time was Petr Baudis’s repo.or.cz, which was quite basic. For example, your code had to be public with no option for private repositories. Tom saw a big opportunity there.

In 2007, social media was booming with Facebook, YouTube, and Twitter leading the way. Tom came up with an idea called GitHub: a social media hub for programmers for sharing Git repositories and exchanging ideas.

One day in October 2007, Tom met Chris Wanstrath at a sports bar in San Francisco. They had met before at Ruby meetups but didn’t know each other well. Tom struck up a conversation and shared his GitHub idea. Chris found it interesting and agreed to join.

At that time, both Tom and Chris had full-time jobs, so they spent their evenings and Saturdays building GitHub. Tom designed the user interface and used a Ruby library called Grit to interact with Git repositories, while Chris built the website with Ruby on Rails.

Three months later, they started sending out invites to friends to test GitHub. In February 2008, PJ Hyett joined as the third co-founder. On April 10, GitHub officially launched with the tagline “Social Code Hosting.”

GitHub, August 2008

Rails, the killer app for Ruby, switched from Subversion to Git and moved to GitHub just before GitHub’s launch, giving Git an even bigger boost in the Ruby community. Most people writing Ruby at the time were developing Rails applications. When they saw their go-to framework on GitHub, more Ruby developers followed suit.

Merging Git and GitHub with conflict

Scott Chacon wasn’t your typical Git + Ruby guy. Besides writing Ruby code, he was an excellent speaker, writer, and evangelist. He created videos, wrote documentation, and taught people how to use Git. He also had an in-depth understanding of Git internals and wrote an ebook called “Git Internals.”

For three years, the “official” homepage for Git had been git.or.cz, set up by Petr Baudis in 2005. Scott wanted to create a more user-friendly homepage for beginners. In July 2008, he reorganized the contents from git.or.cz and launched a new homepage, git-scm.com. He then asked for feedback from Git core developers (especially Petr) on the Git mailing list.

While Git had been popular in the Ruby community for a while, Ruby people were rarely seen on the Git mailing list. Most Git core developers were experienced C programmers active on the mailing list, while the Ruby crowd, mostly younger web developers, hung out at meetups, web forums, and GitHub, and had probably never used a mailing list in their lives. The two groups didn’t interact much, and Scott’s post about git-scm.com on the Git mailing list was one of the few early interactions between them.

Another issue for Git core developers was that Tom, without any discussion, forked and customized the Git daemon using Erlang to meet GitHub’s own needs. This was because, first, Tom wasn’t familiar with C, and second, posting on the mailing list was terrifying. The list was full of people smarter than you, and if you didn’t format your messages correctly, you’d look like an idiot. The process was just too slow, so Tom decided to handle it himself.

Git-scm.com had a banner saying “hosting donated by GitHub,” which led some to question Scott’s motives. Some expressed dissatisfaction that GitHub was making money off Git while core Git developers didn’t see a share. However, most of the feedback was positive, and eventually, git-scm.com became the official Git homepage, with git.or.cz retiring.

Tom met Scott at a Ruby meetup. He thought, “This guy could become either a powerful ally or a dangerous foe.” In October 2008, Scott joined GitHub, continuing his mission to spread the word about Git. He wrote more documentation, offered consulting services, and taught companies how to use Git. He also wrote the book “Pro Git,” which became the official Git book. GitHub’s evangelism strategy worked perfectly, expanding Git’s reach beyond the Ruby community. And GitHub itself was the biggest beneficiary.

In October 2008, Google sponsored the first GitTogether conference. About 20 people from both the Git and GitHub teams met at Google’s headquarters in Mountain View. They put aside their differences, knowing that only by working together could they all become stronger.

GitTogether 2008

Epilogue

Unable to compete with Git and GitHub, BitKeeper eventually had to exit the market. In 2016, the team open sourced its code. This grandparent of DVCS, which inspired Git, Mercurial, Monotone, and others, is now a piece of history for people to observe and study. When asked for his thoughts, Larry McVoy responded:

Hind sight is 20-20. The BitKeeper business had a good run, we were around for 18 years. It made enough that I and my business guy are retired off of what we made.

On the other hand, we didn’t make enough for everyone to retire if they wanted to. We had a github like offering and it’s pretty clear that we should have put a bunch of money into that and open sourced BitKeeper.

All I can say is it is incredibly hard to make that choice when you have something that is paying the bills.

Shoulda, coulda, woulda, my biggest regret is not money, it is that Git is such an awful excuse for an SCM. It drives me nuts that the model is a tarball server. Even Linus has admitted to me that it’s a crappy design. It does what he wants, but what he wants is not what the world should want.

Now, Larry enjoys his retirement. He likes to spend his time fishing with his kids.

In a 2022 survey by Stack Overflow, Git had a market share of 94%. Its dominance was so complete that the following year, Stack Overflow stopped asking which version control system people used.

Never in history has a version control system dominated the market like Git. What will eventually replace Git? Many say it might be related to AI, but no one can say for sure. What we can be sure of is that the transition will likely involve a series of chance events and a group of talented hackers.

References

Other than the links in the article, additional references are listed below:

  1. “Just for Fun: The Story of an Accidental Revolutionary” by Linus Torvalds and David Diamond, 2002
  2. “Show HN: BitKeeper – Enterprise-ready version control, now open-source” on Hacker News
  3. Larry McVoy Interview with KernelTrap
  4. Larry McVoy Interview with LinuxWorld
  5. Linuxexpo 1999: Day 2: BitKeeper
  6. “Hierarchy, Laboratory and Collective: Unveiling Linux as Innovation, Machination and Constitution” by Tony Cornford, Maha Shaikh, and Claudio Ciborra (2010)
  7. “How Tridge reverse engineered BitKeeper” on LWN.net
  8. “Tridgell drops Bitkeeper bombshell” on The Register
  9. “No More Free BitKeeper” on KernelTrap
  10. “The kernel and BitKeeper part ways” on LWN.net
  11. “not rocket science (the story of monotone and bors)”, Graydon Hoare on LiveJournal
  12. “A Git Origin Story” by Zack Brown
  13. “10 Years of Git: An Interview with Git Creator Linus Torvalds” from the Linux Foundation
  14. Petr Baudis on LinkedIn
  15. Junio Hamano on LinkedIn
  16. “Version Control Shenanigans”, Bram Cohen on LiveJournal
  17. “Gitメンテナ 濱野 純”, 技術評論社
  18. “How I Turned Down $300,000 from Microsoft to go Full-Time on GitHub” by Tom Preston-Werner
  19. “Replacing Git with Git” featuring Scott Chacon, the Changelog podcast
  20. GitTogether Group Photo, Junio Hamano on LiveJournal
  21. Larry McVoy’s resume
