PS: Anyone who's thinking of trolling on this thread, at least please waste some of your useless time reading what is actually posted here before wasting any more time.
Con Kolivas is a prominent developer on the Linux kernel and strong proponent of Linux on the desktop. But recently, he left it all behind. Why?
In this interview with APCMag.com, Con gives insightful answers exploring the nature of the hardware and software market, the problems the Linux kernel must overcome for the desktop, and why despite all this he's now left it all behind.
Read on for an honest appraisal of Linux, and why it has some way to go yet.
You wouldn't know it from seeing him online, but by day Con Kolivas works as an anaesthetist at a hospital in Melbourne. Linux kernel hacking is just one of his hobbies.
Despite this, Con is one of the most well known names in Linux kernel development -- and with good reason. His focus on improving the kernel for desktop performance has won him a legion of fans, and his patchsets for the Linux kernel (marked as -ck) have had a significant impact. So much so that some of his changes have been directly incorporated into the kernel, and some of his ideas inspire changes still taking place.
Recently, however, Con announced he was leaving it all behind. Interested in hearing what prompted the move, I contacted Con to talk about the reasons for his leaving, what it takes to be a kernel developer, and the future as he sees it.
The response I got was more than I bargained for -- in the conversation that followed, Con explored not just why he left, but also the challenges the Linux kernel must overcome as he sees it, and the very nature of the hardware and software market that led to the computing environment we have today. Whether you're a Windows user or Linux user, he makes some excellent points.
Rather than break up Con's responses, we're publishing them as they stand. So grab a coffee, make yourself comfortable, and read on.
APC: How did you get started developing for the Linux kernel, did you enjoy it, and what's the passion that drives you?
Normally this would be a straightforward question to answer. However at this time, I've had the opportunity to reflect on what really got me into kernel development, and I'll be able to answer it more thoroughly. So if you give me the latitude to answer it I might end up answering all your potential questions as well. This is going to be my view on personal computing history.
The very first computer I owned was 24 years ago. I've been fortunate to have been involved in the personal computer scene since not long after it actually began. In the time since then, I've watched the whole development of the PC, first with excitement at not knowing what direction it would take, then with anticipation at knowing where it would head and waiting for the developments.
In the late 1980s it was a golden era for computing. There were so many different manufacturers entering the PC market that each offered new and exciting hardware designs, unique operating system features and enormous choice and competition. Sure, they all shared lots of software and hardware design ideas but for the most part they were developing in competition with each other. It is almost frightening to recall that at that time in Australia the leading personal computer in numbers owned and purchased was the Amiga for a period.
Anyone who lived through the era of the first Amiga personal computers will recall how utterly unique an approach they had to computing, and what direction and advance they took the home computer to. Since then there have been many failed attempts at resuscitating that excitement. But this is not about the Amiga, because it ultimately ended up being a failure for other reasons. My point about the Amiga was that radical hardware designs drove development and achieved things that software evolution on existing designs would not take us to.
At that time the IBM personal computer and compatibles were still clunky, expensive, glorified word processing DOS machines. Owners were always putting in different graphics and sound cards yearly, upgrading their hardware to try and approach what was built into hardware like the Amiga and the Atari PCs.
Enter the dark era. The hardware driven computer developments failed due to poor marketing, development and a whole host of other problems. This is when the software became king, and instead of competing, all hardware was slowly being designed to yield to the software and operating system design.
We're all aware of what became the de facto operating system standard at the time. As a result there was no market whatsoever for hardware that didn't work within the framework of that operating system. As a de facto operating system did take over, all other operating system markets and competition failed one after the other and the hardware manufacturers found themselves marketing for an ever shrinking range of software rather than the other way around.
Hardware has since become subservient to the operating system. It started around 1994 and is just as true today 13 years later. Worse yet, all the hardware manufacturers slowly bought each other out, further shrinking the hardware choices. So now the hardware manufacturers just make faster and bigger versions of everything that has been done before. We're still plugging in faster CPUs, more RAM, bigger hard drives, faster graphics cards and sound cards just to service the operating system. Hardware driven innovation cannot be afforded by the market any more. There is no money in it. There will be no market for it. Computers are boring.
Enter Linux. We all know it started as a hobby. We all know it grew bigger than anyone ever imagined it. It would be fair to say that it is now one of the most important of the very few competing pieces of software/operating system that remains and drives development of the de facto standard -- Windows. However, I believe it never deserved to become this. Had the innovative hardware driven development and operating system competition continued, there is no way it would have attracted as much following, developers and time to evolve into what it has become. The hardware has barely changed in all that time. PCs are ludicrously powerful compared to what they were when Linux first booted in 1991, but that's an issue of increased speed, not increased functionality or innovation.
So what about the PC? Well the PC was 'dying' according to all accounts 20 years ago. We all know now that is crap and for the foreseeable future at least, the one all-encompassing information processing and communication device (and frustration creator) is here to stay. The internet certainly has cemented that position for the PC.
Inferior but dominant: familiar story? The technology inside the PC was inferior and poorly architected compared to other competing systems of the day, but Microsoft's clever business tactics ensured its success in the long term because it ran Windows.
So Linux was created to service the home desktop personal computer, and the PC is here to stay. For those who were looking for some excitement and enjoyment in using their computer, the de facto operating system just doesn't cut it. We want to tinker, we want control, we want power over everything. Or alternatively we believe in some sort of freedom or some combination of the above. So we use Linux. That is certainly how I got involved in Linux; I wanted something to use on the home desktop PC.
However, the desktop PC is crap. It's rubbish. The experience is so bloated and slowed down in all the things that matter to us. We all own computers today that were considered supercomputers 10 years ago. 10 years ago we owned supercomputers of 20 years ago... and so on. So why on earth is everything so slow? If they're exponentially faster why does it take longer than ever for our computers to start, for the applications to start and so on? Sure, when they get down to the pure number crunching they're amazing (just encode a video and be amazed). But in everything else they must be unbelievably slower than ever.
Computers of today may be 1,000 times faster than they were a decade ago, yet the things that matter are slower.
The standard argument people give me in response is 'but they do so much more these days it isn't a fair comparison'. Well, they're 10 times slower despite being 1000 times faster, so they must be doing 10,000 times as many things. Clearly the 10,000 times more things they're doing are all in the wrong place.
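The arithmetic in that reply can be written out as a small sketch. The 1,000 and 10 are Con's own round, rhetorical figures rather than measurements, and the function name is made up for illustration.

```python
def implied_work_factor(hardware_speedup, perceived_slowdown):
    """How many times more work a task must be doing if the hardware is
    `hardware_speedup` times faster yet the task takes
    `perceived_slowdown` times longer to finish."""
    return hardware_speedup * perceived_slowdown


# Con's round numbers: 1000 times faster hardware, 10 times slower in use.
print(implied_work_factor(1000, 10))  # 10000
```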
APC: So, the performance problems of Linux on the desktop became a key motivator for you?
Yes. I started to tinker with improving Linux on the desktop. "Surely if I have complete control over all the software I'll be able to speed things up," I thought. There must be a way to tweak this, tune that, optimise this and get more speedups? Userspace improvements seemed so limited when I got started in Linux. It barely worked on the desktop half the time so trying to get speed out of it on top would mean that nothing worked. The UNIX legacy was evident. We were shaping an operating system never designed for the desktop and it was going to hurt... a lot.
Eventually the only places I noticed any improvements in speed were kernel developments. They were never huge, but caused slightly noticeable changes in things like snappiness, behaviour under CPU load and so on. The first patchset I released to the public contained none of my own code and was for kernel 2.4.18 which was about February 2002. I didn't even know what C code looked like back then, having never actually been formally taught any computer science.
So I stuck with that for a while until the 2.6 development process was under way (we were still in a 2.5 kernel at the time). I watched the development and to be honest... I was horrified. The names of all the kernel hackers I had come to respect and observe were all frantically working away on this new and improved kernel and pretty much everyone was working on all this enterprise crap that a desktop cares not about.
Even worse than that, while I obviously like to see Linux run on 1024 CPUs and 1000 hard drives, I loathe the fact that to implement that we have to kill performance on the desktop. What's that? Kill performance? Yes, that's what I mean.
If we numerically quantify it with all the known measurable quantities, performance is better than ever. Yet all it took was to start up an audio application and wonder why on earth if you breathed on it the audio would skip. Skip! Jigabazillion bagigamaherz of CPU and we couldn't play audio?
Or click on a window and drag it across the screen and it would spit and stutter in starts and bursts. Or write one large file to disk and find that the mouse cursor would move and everything else on the desktop would be dead without refreshing for a minute. I felt like crying.
I even recall one bug report we tried to submit about this and one developer said he couldn't reproduce the problem on his quad-CPU 4GB RAM machine with 4 striped RAID array disks... think about the sort of hardware the average user would have had four years ago. Is it any wonder the desktop sucked so much?
The developers were all developing for something that wasn't the desktop. They had all been employed by big name manufacturers who couldn't care less about the desktop (and still don't) but want their last 1% on their database benchmark or throughput benchmark or whatever.
Linux had won. We were now the biggest competition in the server and database market out there and all the big names cared about Linux. Money was pouring into development from all these big names into developing Linux's performance in these areas.
The users had lost. The desktop PC, for which Linux was originally developed, had fallen by the wayside. Performance, as home desktop users understand performance, was gone. Worse yet, there was no way to quantify it, and the developers couldn't care if we couldn't prove it. The one place I found some performance was to be gained on the desktop (the kernel) was now doing the opposite.
Con at the Kernel Summit in Ottawa, Canada, 2005
APC: So what did you do next to try to convince the Linux kernel devs of the need for more focus on end-users?
I set out to invent some benchmark to try and quantify performance problems in Linux on the desktop PC. The first benchmark I created was called 'Contest' (pun intended). It was horribly complicated to use and set up and the results were difficult to interpret but at least I tried. It did help. Some changes did come about as a result of these benchmarks, but mostly on the disk I/O front. So I was still left looking at this stuttering CPU scheduling performance.
I had some experience at merging patches from previous kernels and ironically most of them were code around the CPU scheduler. Although I'd never learnt how to program, looking at the code it eventually started making sense.
I was left pointing out to people what I thought the problem was from looking at that particular code. After ranting and raving and saying what I thought the problem was, I figured I'd just start tinkering myself and try and tune the thing myself.
After a few failed experiments I started writing some code which helped... a lot. As it turns out people did pay attention and eventually my code got incorporated. I was never very happy with how the CPU scheduler tackled interactivity but at least it was now usable on the desktop.
Not being happy with how the actual underlying mechanism worked I set out to redesign that myself from scratch, and the second generation of the -ck patchset was born. This time it was mostly my own code. So this is the story of how I started writing my own code for the linux kernel.
In brief, following this I found myself writing code which many many desktop users found helped, but I had no way to quantify these changes. There is always a placebo element which makes the end user experience difficult to clarify as to whether there is an improvement or not.
However I have tried very hard to make myself relatively resistant to this placebo effect myself from years of testing, and I guess the fact that my website has close to 1 million hits suggests there are people who agree it makes a difference. This inability to prove quantitatively the advantage that -ck offered, though, was basically what would eventually spell out the death of it.
APC: What code did get incorporated into the mainline kernel?
Looking back, while the patchset has been well known, little of the actual code itself ended up in the mainline kernel, even if it spurred interest and development from other people in that time. Most of what did end up going in were changes to the CPU scheduler to improve interactivity, fairness, SMP user fairness, making 'nice' behave itself, hyperthread fairness and so on. There were lots of other minor contributions in other areas, such as the virtual memory subsystem, software suspend, disk I/O scheduling and random bugfixes.
The areas I got involved in are still where most of the -ck patches sit in the last release I ever made (2.6.22-ck1). The remaining patches reflect where I still believe massive problems exist for the desktop, and ironically, despite my efforts to the contrary, the same problems seem to get magnified with each passing year.
It seems that the emerging challenges for the linux kernel on the desktop never seem to get whole-heartedly tackled by any full time developer, and only get a sideways glance when the problems are so obvious that even those on the linux kernel mailing list are willing to complain about them.
APC: Was there a toll on you for this voluntary involvement?
Yes. There is no doubt that whenever I was heavily involved, I would stay awake at night thinking up code solutions, I would be sleep deprived, and it had the possibility to impact on my work and family life. I tried very hard to never compromise those two for the sake of kernel development, but perhaps on occasion I did push it too far. I make an issue of never regretting anything I have done so I'm still pleased with my involvement till now.
APC: What was the driving force that caused you to hang up your kernel coding keyboard and the -ck patchset?
That's one of those 'I wish I had a dime for every time I was asked that' sort of questions.
As most people are well aware, my involvement in kernel development was motivated on three fronts, and it has zero to do with my full time career and life.
First, it was lots of fun. I've always been a computer geek and enjoyed spending hours in front of the computer doing... well whatever really. So spending it on something that had become a passion for me was an obvious source of great enjoyment for me.
Second, it was an intellectual challenge. There seemed to be things that I always wish were done in the Linux kernel, and there were issues people didn't really care to tackle and so on. Trying to confront them head-on I was doing something I really hadn't done before in terms of high level programming. I learnt a heck of a lot about operating system design and all other really basic computer science things that most CS students probably are bored with.
However it was all new to me. There is precious little (and some would argue zero) new research in operating system design. Looking at linux there is innovation in the approach to tackling things, but it's not like there's anything new about the problems being fixed. The same issues have always been there in hardware vs software designs, and all the research has been done. I've seen many people accuse me of claiming I invented fair scheduling. Let me set the record straight. I make no such ridiculous claim.
What I did was take what was known and try and apply it to the constraints of current hardware designs and the Linux kernel framework. While the 'global picture' of the problems and solutions has been well known for many years, actually narrowing down on where the acute differences in hardware vs software have evolved has not. Innovation only lies in the application of those ideas. Academic approaches to solutions tend not to be useful in the real world. On the flip side, pure hackery also tends to be a long term disaster even if initially it's a quick fix. Finding the right balance between hackery and not ignoring the many years of academic research is what is needed.
APC: Did you get that balance right?
Heck no. But that's what I strived for. I think there are precious few developers who do. If there is one failing in the human driven decision making in choosing what code goes where it is that it is impossible to recognise that (again I'm not saying it was my code).
Third, it was an ego trip. If I improved something that I cared about, I found that usually lots of other people cared about it. The reason for that obviously is that I was actually an ordinary desktop PC user that pretended to be a kernel hacker. So whenever I made improvements that affected desktop users, lo and behold lots of people cared about them. I recall a kernel hacker that I respected very much joking about my 'fans' and asking how he could attract a crowd. He has contributed possibly 1000 times more lines of code to the Linux kernel than myself, yet I had a 'following'.
I only attracted a following because I hacked on things I cared about. And the users told me so. The one thing that drove my development over the many years was the 'thank you's that I got from the users of the -ck patchset.
So I still haven't answered your question about what made me stop kernel development have I? I guess explaining my motivations helps me explain why I stopped.
The user response was still there. If anything, the users got more vocal than ever as I was announcing quitting kernel development.
The intellectual challenge? Well that still existed of course.
The fun? Yes that's what was killed. It stopped being fun. In my now quite public email announcing that I was quitting I explained it briefly. The -ck patchset was, for quite a while, a meaningless, out of mainline's spotlight playground for my experiments. As the scope of changes got larger, the improvements became more drastic and were more acutely noticeable. This led to more and more people asking me when the changes would be merged into mainline. As the number of requests grew, my resolve to not get mainline involved diminished. So I'd occasionally post patches as examples to the Linux kernel mailing list. This generated more interest and requests to get mainline involved. So I tried.
You asked before what patches from -ck got into mainline and I listed a whole lot of random minor patches. The magnitude of the changes in the patches that did _not_ get involved stands out far more than those that did get in.
APC: Was there something that was the 'final straw' for you?
My first major rejection was the original staircase CPU scheduler. It stood out as being far better in interactivity than the mainline CPU scheduler, but ultimately, just as the mainline CPU scheduler, it had corner cases that meant it was not perfect. While I was still developing it, the attention moved away from the CPU scheduler at that time. The reason given by Andrew Morton (the maintainer and second last gateway into the mainline kernel) at the time was that the kernel had more burning issues and bugs to address.
Of course it did. There were so many subsystems being repeatedly rewritten that there was never-ending breakage. And rewriting working subsystems and breaking them is far more important than something that might improve the desktop right? Oops, some of my bitterness crept in there. I'll try and keep emotion out and just tell the rest of the story as objectively as I can.
The main problem was that there simply was not a convincing way to prove that staircase was better on the desktop. User reports were not enough. There was no benchmark. There was no way to prove it was better, and the user reports if anything just angered the kernel maintainers further for their lack of objectivity. I even tried writing a benchmark called Interbench. Note that interactivity is not responsiveness (I have a little summary of the difference as I see it with the Interbench code). This was much better code than Contest but even though I could demonstrate advantage with my scheduler on Interbench, the magnitude of the differences was difficult if not impossible to elicit based on the raw numbers generated by Interbench.
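To give a rough idea of what measuring this kind of thing involves, here is a minimal sketch that asks to be woken at a fixed interval, the way an audio player would, and records how late each wakeup arrives. It is not Interbench's code; the function name, the 10 ms period and the sample count are all assumptions for illustration.

```python
import time


def wakeup_overshoot(period_s=0.01, samples=50):
    """Sleep for `period_s` repeatedly and return, for each sleep, how much
    longer than requested it took (in seconds)."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(period_s)
        elapsed = time.perf_counter() - start
        overshoots.append(elapsed - period_s)
    return overshoots


if __name__ == "__main__":
    results = wakeup_overshoot()
    print("average lateness: %.3f ms" % (1000 * sum(results) / len(results)))
    print("worst lateness:   %.3f ms" % (1000 * max(results)))
```

Running it once on an idle machine and once during a large file copy or a compile gives two sets of numbers to compare, which is the general shape of the problem Con describes: the difference is easy to feel and harder to express as one convincing figure.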
With a truckload of help from William Lee Irwin III (who wrote the main architecture) I posted a pluggable CPU scheduler framework that would allow you to build as many CPU schedulers as you like into the kernel and choose at boot time which one to run. I planned to extend that to runtime selection as well. This is much like the modular pluggable I/O scheduler framework that the Linux kernel currently has. It was flat out refused by both Linus and Ingo (who is the CPU scheduler maintainer) as leading to specialisation of CPU schedulers and they both preferred there to be one CPU scheduler that was good at everything. I guess you can say the CPU scheduler is a steamroller that we as desktop users use to crack nuts with, and they didn't want us to build a nutcracker into the kernel.
The purpose of Plugsched was simply to provide a seamless way for the staircase CPU scheduler to be integrated into the mainline kernel. I didn't feel strongly about the presence of pluggable CPU schedulers, but many people still do and the code is still maintained today mainly by Peter Williams.
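The idea of a pluggable framework can be illustrated with a toy sketch: several scheduling policies are registered side by side and one is chosen by name at start-up. This is not Plugsched's code or interface; the registry, the policy names and the task fields below are all hypothetical.

```python
# Hypothetical registry: policy name -> function that picks the next task
# from a run queue. Tasks are plain dicts, e.g. {"name": "make", "waited": 1}.
SCHEDULERS = {}


def register(name):
    """Decorator that adds a pick-next function to the registry."""
    def wrap(fn):
        SCHEDULERS[name] = fn
        return fn
    return wrap


@register("fifo")
def pick_fifo(run_queue):
    # Run whichever task was queued first.
    return run_queue[0]


@register("longest_wait")
def pick_longest_wait(run_queue):
    # Run whichever task has been waiting the longest.
    return max(run_queue, key=lambda task: task["waited"])


def select_scheduler(boot_option, default="fifo"):
    """Return the policy named by a (made-up) boot option, or the default."""
    return SCHEDULERS.get(boot_option, SCHEDULERS[default])


queue = [{"name": "make", "waited": 1}, {"name": "mplayer", "waited": 7}]
print(select_scheduler("longest_wait")(queue)["name"])  # mplayer
```

The objection described above is visible even in the toy: once policies can be swapped, each one is free to be good at only one kind of workload.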
Then along came swap prefetch. I spent a long time maintaining and improving it. It was merged into the -mm kernel 18 months ago and I've been supporting it since. Andrew to this day remains unconvinced it helps and thinks it 'might' have negative consequences elsewhere. No bug report or performance complaint has been forthcoming in the last 9 months. I even wrote a benchmark that showed how it worked, which managed to quantify it! In a hilarious turnaround Linus asked me offlist 'yeah but does it really help'. Well, user reports and benchmarks weren't enough... It's still in limbo now but they'll have no choice but to drop it without anyone maintaining it.
Finally the nail in the coffin. The Staircase Deadline CPU scheduler. Initially started as a side project from the Staircase CPU scheduler I soon realised that it was possible to have excellent interactivity while fixing the horrible fairness issues that an unfair design had. Furthermore, it actually improved interactivity issues elsewhere that ended up being fairness problems, and fairness is of course paramount to servers and multiuser environments.
A lot of users and even kernel developers found that many long lasting complaints with the mainline and other schedulers were fixed by this code and practically begged me to push it into mainline, and one user demanded Linus merge it as soon as possible as a bugfix. So I supported the code and fixed it as problems arose and did many bugfixes and improvements along the way.
Then I hit an impasse. One very vocal user found that the unfair behaviour in the mainline scheduler was something he came to expect. A flamewar of sorts erupted at the time, because to fix 100% of the problems with the CPU scheduler we had to sacrifice interactivity on some workloads. It wasn't a dramatic loss of interactivity, but it was definitely there. Rather than use 'nice' to proportion CPU according to where the user told the operating system it should be, the user believed it was the kernel's responsibility to guess. As it turns out, guessing means that no matter how hard you try and how smart you make the CPU scheduler, it will get it wrong some of the time. The more it tries to guess, the worse will be the corner cases of misbehaving.
The option is to throttle the guessing, or not guess at all. The former option means you have a CPU scheduler which is difficult to model, and the behaviour is right 95% of the time and ebbs and flows in its metering out of CPU and latency. The latter option means there is no guessing and the behaviour is correct 100% of the time... it only gives what you tell it to give. It seemed so absurdly clear to me, given that interactivity mostly was better anyway with the fair approach, yet the maintainers demanded I address this as a problem with the new design. I refused. I insisted that we had to compromise a small amount to gain a heck of a great deal more. A scheduler that was deterministic and predictable and still interactive is a much better option long term than the hack after hack approach we were maintaining.
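The 'no guessing' option can be sketched as a pure function of the nice values: the same inputs always give the same split of the CPU. This is not the Staircase Deadline algorithm; the weighting rule (each nice level scales a task's weight by a factor of 1.25) and the names are assumptions chosen only to illustrate the principle.

```python
def cpu_shares(nice_by_task, step=1.25):
    """Split the CPU among runnable tasks using only their nice values.
    Lower nice means a larger weight; nothing about past behaviour is used."""
    weights = {name: step ** (-nice) for name, nice in nice_by_task.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = cpu_shares({"audio player": -5, "browser": 0, "compile job": 10})
for name, share in shares.items():
    print("%-12s %5.1f%%" % (name, 100 * share))
```

Because the result depends on nothing but what the user asked for, it is easy to model and to test, which is the property Con argues for. A scheduler that guesses would also need each task's recent behaviour as input, and its output would shift as that behaviour changed.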
Then I became very unwell. A disc prolapse in my neck basically meant I needed to lie flat on my back for about 6 weeks. Yes, kernel development did contribute to this problem. So I was drugged to the eyeballs and spent most of the day and night lying down... having nothing to think about but stew over the kernel. Whenever I could I snuck back on the PC to try and support the code I had thrown out there. I refused to budge on the fairness aspect, and I kept getting that thrown back in my face as an unfixable failure in the design.
Then one day presumably Ingo decided it was a good idea and the way forward and... wrote his own fair scheduling interactive design with a modular almost pluggable CPU scheduling framework... and had help with the code from the person who refused to accept fair behaviour in my flamewar.
So I had plenty of time lying on my back to reflect on what I was doing and why, and whether I was going to regret it from that point on. I decided to complete the work on Staircase Deadline to make sure it was the reference for comparison, instead of having the CPU scheduler maintainer's new code compared to the old clunky scheduler. Then I quit forever.
APC: Are developer egos a problem in the open source development model in general?
I think any problem with any development model has multiple factors, and ultimately, it is humans that make decisions. I won't comment on the humans themselves.
If there is any one big problem with kernel development and Linux it is the complete disconnection of the development process from normal users. You know, the ones who constitute 99.9% of the Linux user base.
The Linux kernel mailing list is the way to communicate with the kernel developers. To put it mildly, the Linux kernel mailing list (lkml) is about as scary a communication forum as they come. Most people are absolutely terrified of mailing the list lest they get flamed for their inexperience, an inappropriate bug report, being stupid or whatever. And for the most part they're absolutely right. There is no friendly way to communicate normal users' issues that are kernel related. Yes of course the kernel developers are fun loving, happy-go-lucky friendly people. Just look at any interview with Linus and see how he views himself.
I think the kernel developers at large haven't got the faintest idea just how big the problems in userspace are. It is a very small brave minority that are happy to post to lkml, and I keep getting users telling me on IRC, in person, and via my own mailing list, what their problems are. And they've even become fearful of me, even though I've never viewed myself as a real kernel developer.
Just trawl the normal support forums (which I did for Gentoo users as a way of finding bug reports often because the users were afraid to tell me) and see how many obvious kernel related issues there are. I'd love to tell them all to suddenly flood lkml with their reports of failed boots with various kernels, hardware disappearing, stopping working suddenly, memory disappearing, trying to use software suspend and having your balls blown off by your laptop, and so on.
And there are all the obvious bug reports. They're afraid to mention these. How scary do you think it is to say 'my Firefox tabs open slowly since the last CPU scheduler upgrade'? To top it all off, the enterprise users are the opposite. Just watch each kernel release and see how quickly some $bullshit_benchmark degraded by .1% with patch $Y gets reported. See also how quickly it gets attended to.
APC: What are you doing in your spare time now, and what future projects do you have planned?
I had some small userspace toys that I was experimenting with (compression tools, video encoding/conversion tools and a few others) that I thought I'd get back to. However just looking at any code gives me a bad taste in the mouth so that's not happening.
My current passion is learning Japanese. It's fun, it's a huge intellectual challenge, will allow me to eventually understand lots of interesting Japanese media in its native language (short stories, movies, manga and anime) and comes with no strings attached. I never really know what hobby I'll end up taking up but I usually get completely engrossed in whatever it was.
Thank you for your time Con.