Sunday, May 12, 2013

The Systems "Top 50"


The computer architecture community, for some time, has maintained a "Hall of Fame", which counts the number of papers published in ISCA by various authors. It is neat to see who has been active and for how long; the only problem, of course, is that the list is just architects, and the conference is ISCA.

Yale Patt: #1 in ISCA
(followed by a bunch of Wisconsin colleagues)

Thus I decided the systems community should do the same for our top conferences, SOSP and OSDI. Thanks to DBLP, this is mostly easy; a few Python scripts later, I had some results. And here they are!

The Systems "Top 50"
If you'd like, click on these for a download: jpg PDF txt.

The figure is rather easy to read: it is a list of researchers ordered from most to fewest publications in SOSP/OSDI (total), grouped into equivalence classes. The number of publications is listed year by year along the horizontal (x) axis; OSDI years are shaded gray (and thus you can see OSDI only started in 1994). Also marked are the Turing Award winners (T) and Weiser Award winners (W) in the group.
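For the curious, the ranking described above can be sketched in a few lines of Python. This is only an illustration, not the scripts actually used for the figure: the `(year, venue, authors)` record format, the function names, and the sample data are all made up, and real input would have to come from a DBLP export.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up records in the form (year, venue, authors).
papers = [
    (1999, "SOSP", ["A. Author", "B. Author"]),
    (2000, "OSDI", ["A. Author"]),
    (2001, "SOSP", ["C. Author", "A. Author"]),
    (2002, "OSDI", ["B. Author", "C. Author"]),
]

def count_papers(papers):
    """Return a Counter mapping each author to a total paper count."""
    counts = Counter()
    for _year, _venue, authors in papers:
        # set() guards against a name appearing twice on one paper
        counts.update(set(authors))
    return counts

def equivalence_classes(counts):
    """Group authors with the same total, ordered from most to fewest papers."""
    by_total = defaultdict(list)
    for author, total in counts.items():
        by_total[total].append(author)
    return [(total, sorted(by_total[total]))
            for total in sorted(by_total, reverse=True)]

counts = count_papers(papers)
ranking = equivalence_classes(counts)
for total, authors in ranking:
    print(total, ", ".join(authors))
```

In practice the hard part is name disambiguation (the same person listed under slightly different spellings), which this sketch ignores entirely.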

Some fun facts to observe from this data:

  • Getting to 6 publications in SOSP/OSDI puts you in some seriously awesome company. In this group of people, there are two Turing Award winners (Lampson, Liskov), virtually all Weiser Award winners, and a host of other highly accomplished folks.
  • Getting to #1 will be hard, thanks to one M. Frans Kaashoek, who has 24 total published papers at these venues, a notable step ahead of Hank Levy (Washington) at 16. For the past 20 years, only a few years have passed where Frans did not have a publication at OSDI/SOSP; in many years, he's had multiple publications. Most impressive! Frans also has the longest streak, at 8 (2005 through current); this will also be hard to match.
  • As for institutions, Microsoft has the most people on the list, with 8. Note that this is a bit misleading, as many of the people on the Microsoft list did most of their work elsewhere (e.g., Lampson). However, it is still an impressive group!
  • As for universities, the number of people on this "top 50" list goes as follows:
    • #1. MIT (6)
    • #2. Stanford (5)
    • #3. Washington (4)
    • #4. Wisconsin, Texas, UCSD (3)
    • #7. Berkeley, CMU, Columbia, MPI-SWS, Michigan, Princeton (2)
  • Thus, perhaps unsurprisingly, MIT has the highest number of folks on the list, with 6 (Kaashoek, Liskov, Morris, Zeldovich, Gifford, Balakrishnan). Stanford is #2, Washington #3, then Wisconsin (including Andrea, myself, and now Shan Lu), UCSD, and Texas are tied for #4 with three people each, and a large group is tied at #7 with 2 researchers. 
  • Publishing three at any one OSDI or SOSP is a rare feat. This prestigious "Triple Club" includes just six people: Frans Kaashoek (MIT), Roger Needham (Cambridge), YY Zhou (UCSD), Nickolai Zeldovich (MIT), Junfeng Yang (Columbia), and Greg Ganger (CMU). Also quite a group of which to be a part! 
  • The fastest climber on this list is undoubtedly Dr. Zeldovich, who went from 0 before '06 to 10 currently (I suspect a few more are coming!). What a quick climb up systems mountain! And if there is somebody who might catch Frans...
  • My esteemed wife and I (Remzi) are tied at #19. When we came to Wisconsin, we had yet to publish in either OSDI or SOSP, so we feel pretty good about how far we've come (indeed, there had only been three total papers with Wisconsin authors at SOSP or OSDI at that point). That said, it is still easy to think back on papers that were near misses: semantically-smart disks at OSDI '02, and D-GRAID at SOSP '03. Both got a ton of reviews (many positive) but were ultimately rejected at these fine venues, and ended up going to FAST instead (D-GRAID even won a best-paper award there). Alas, the bitter sting of rejection does lessen over the years, but does not (entirely) go away.
  • The "Longevity Award" goes to Barbara Liskov, who first published at SOSP in 1971, and last at OSDI in 2010 (when I was co-chair with Brad Chen). Pretty amazing for someone to stay active across four decades in our field! But perhaps no surprise for this Turing Award winner.
  • Finally, putting a single affiliation for authors is not always accurate; many of these people have moved around and published papers while at different institutions. Thus I simply recorded some kind of recent affiliation. Even some of these might change soon, as there is a bit of a move from the academy to industry taking place in our field. 
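Two of the facts above, the longest streak and the "Triple Club", are easy to compute once the data is in hand. Here is a minimal sketch under stated assumptions: a streak is taken to mean consecutive calendar years with at least one paper (the exact definition behind the numbers above may differ), and the record format, function names, and sample data are made up for illustration.

```python
from collections import Counter

def longest_streak(years):
    """Longest run of consecutive calendar years present in `years`."""
    best = run = 0
    prev = None
    for y in sorted(set(years)):
        run = run + 1 if prev is not None and y == prev + 1 else 1
        best = max(best, run)
        prev = y
    return best

def triple_club(papers, threshold=3):
    """Authors with at least `threshold` papers at a single conference edition."""
    per_event = Counter()
    for year, venue, authors in papers:
        for author in set(authors):
            per_event[(author, year, venue)] += 1
    return sorted({a for (a, _y, _v), n in per_event.items() if n >= threshold})

# Hypothetical, made-up records in the form (year, venue, authors).
papers = [
    (2011, "SOSP", ["A. Author", "B. Author"]),
    (2011, "SOSP", ["A. Author"]),
    (2011, "SOSP", ["A. Author", "C. Author"]),
    (2012, "OSDI", ["B. Author"]),
]

streak = longest_streak([2005, 2006, 2007, 2009, 2010])
members = triple_club(papers)
print(streak, members)
```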
I also thought it might be fun to include some other statistics about publishing in our esteemed conferences. Here are some facts:

There have been 1,658 unique authors who have published at SOSP or OSDI or both; 419 of those have published twice (or more); 209 three times (or more); 112 four times (or more); 67 five times (or more); the rest is shown in the list above (the "top 50", or actually top 53). Thus even getting to four total publications puts you in a group of just over 100 people in the history of the world (though it is probably the case that early Cro-Magnon tribes were not working on SOSP submissions). Below is a graph summarizing the same.
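As a small arithmetic aside, the "or more" counts above are cumulative, so subtracting neighbors gives the number of authors with exactly k papers. The sketch below uses only the five figures quoted above; the function name is just for illustration.

```python
# Cumulative counts quoted above: authors with at least k papers.
at_least = {1: 1658, 2: 419, 3: 209, 4: 112, 5: 67}

def exactly_k(at_least):
    """Turn 'at least k' counts into 'exactly k' counts by differencing.

    The largest k is omitted, since its count also covers everyone above it.
    """
    ks = sorted(at_least)
    return {k: at_least[k] - at_least[nxt] for k, nxt in zip(ks, ks[1:])}

exact = exactly_k(at_least)
share_single = exact[1] / at_least[1]  # fraction of authors with one paper

print(exact)
print(f"{share_single:.1%} of authors have exactly one paper")
```

So 1,239 of the 1,658 authors, roughly three quarters, have appeared exactly once.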


Author lists are growing. While single-author and few-author papers were common in the early days, they are increasingly rare. Here is the average author list size plotted per year:



As you can see, the average in the early days was between 1 and 2; today it is nearing 6 authors. Also included is the best-fit line of the average author data; extrapolating out, the average author list will be about 13 in the year 2100. Better start building those big systems groups! The full author list per year is also a fun chart:

This graph plots a circle proportional to the number of papers with a given author-list length for each year; also included are two solid red lines for the max and min number of authors, and the dashed line (faint) for the best-fit line as in the previous graph. As you can see, in the early years, single-author papers were common; lately, there have just been a few, and many years have no single-author papers. The max line is perhaps most outrageous, with groups of 27 (2011) and 26 (2012) collaborating to publish a paper, those being the Windows Azure and Google Spanner papers, respectively. Hard to compete with that when you have a couple of graduate students!
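The best-fit line and its extrapolation are an ordinary least-squares fit of average author count against year. The sketch below shows the mechanics with made-up data points chosen to sit exactly on a line, so its answer will not match the figure of about 13 quoted above; the real per-year averages would have to be plugged in instead.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up illustrative data, NOT the real per-year averages.
years = [1970, 1980, 1990, 2000, 2010]
avg_authors = [2.0, 3.0, 4.0, 5.0, 6.0]

slope, intercept = fit_line(years, avg_authors)
predicted_2100 = slope * 2100 + intercept
print(f"slope = {slope:.3f} authors/year, 2100 estimate = {predicted_2100:.1f}")
```

Extrapolating a straight line nearly 90 years out is, of course, meant in the same spirit as the original prediction.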

In any case, that's it for now; I'm sure there are more fun numbers to extract from this dataset; I'll be thinking of some myself, and of course would be happy to hear suggestions in the comments. I'd also be happy to hear about any mistakes (and will of course fix them). Finally, I'll also be doing this same analysis for some other conferences, likely starting with FAST - you can probably even guess why.

Comments:

  1. It would be interesting to see the industry/academia split over time. And maybe the geographical spread of where the papers come from.

    If you're interested in playing with the FAST data, let me know. I've got most of it in a couple spreadsheets...

  2. Cool stuff.

    It would also be interesting to see a list by citation count and, perhaps, citations divided by publications (quality over quantity, who's guilty of LPU dumping, etc).

    I've also long been curious to see the correlation between citation counts and best papers awards, as well as review scores and citation count. That is, how good a job are the PCs doing in predicting what's most interesting, novel, relevant, influential, etc?
