Intro
Having just served on the OSDI '12 PC (and indeed, the last three OSDI PCs, including OSDI '10, where Brad Chen and I were co-chairs), I thought I might share a few thoughts and some data on the reviewing process. There has certainly been a fair amount of belly-aching over the topic of reviewing the past few years; unlike these fine missives, I come (mostly) to praise reviewing, not to bury it. And while I think reviewing in systems is generally quite well done, there are a few new issues that PCs are dealing with these days w.r.t. round-based reviewing, and thus I wanted to explore a little more carefully how such rounds should be managed.
Background
Some background first. Years ago (at least, as sage veterans such as Hank Levy tell me), each member of the SOSP program committee would read every paper that was submitted. Let's go over that again: they would each read every paper! Amazing. I can only imagine a PC meeting where literally every person in the room was relatively well-informed about the paper under discussion. Contrast this to some PCs (not in systems!) where three people are talking and everyone else is checking their email. Or tweeting. Or even worse, tweeting about the PC meeting and how boring it is!
Of course, times have changed. As Schneider and Birman point out, an increasing load is being placed upon systems committees, particularly at flagship conferences such as SOSP and OSDI. OSDI, for example, now regularly receives over 200 submissions; in contrast, SOSP in '01 received only 87. Clearly, PCs have to be run differently to handle such a large increase in submissions (many of them of high quality).
Round-Based Reviewing
The major change that has arisen to handle submission overload is a system based on rounds. Basically, assuming a simple three-round review process, this works as follows. In the first round, each paper gets a few reviews (e.g., three). The PC chairs then look through these reviews and decide which papers to drop, based mostly on numerical scoring and some online discussion. The same process takes place again in a second round, with papers receiving an additional set of reviews and being ranked again; some more papers are culled from the pool. Finally, papers that reach the third round get a few more reviews; some of these too will be dropped based on scores and online discussion, and the remaining papers make the PC-meeting discussion*. Once a paper makes it to the meeting, it has a real chance of being accepted into the conference; each paper is usually discussed at some length, and either by vote or consensus accepted or rejected.
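To make the mechanics concrete, here is a toy simulation of such a round-based pipeline. The random scoring model and the per-round score bars are invented stand-ins, not any real PC's policy; the sketch is meant only to show the structure of the process.

```python
import random

# A toy sketch of a three-round review pipeline. Each "review" is a
# random 1-5 score, and a paper is culled when its mean score falls
# below an (assumed) per-round bar that rises as the stakes go up.
def run_rounds(num_papers=200, reviews_per_round=3, bars=(2.0, 2.5, 3.0)):
    papers = {pid: [] for pid in range(num_papers)}   # paper id -> scores
    pool = set(papers)
    for bar in bars:                                  # one bar per round
        for pid in pool:
            papers[pid] += [random.randint(1, 5)
                            for _ in range(reviews_per_round)]
        # cull papers whose average score falls below this round's bar
        pool = {pid for pid in pool
                if sum(papers[pid]) / len(papers[pid]) >= bar}
    return pool                                       # survivors reach the PC meeting

print(len(run_rounds()), "papers make the PC-meeting discussion")
```

The interesting design question, of course, is hidden inside the cut rule: how strict should the early rounds be?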
A natural question arises out of this round-based system: what should the criteria be for accepting or rejecting a paper in an earlier round of the review process? To gain some numerical insight into this question, I collected the following data from the OSDI '10 submissions**. The basic question the graphs try to answer is: what are the chances that a paper that ended up in the final PC discussion (or was accepted into the conference) would have been rejected by this round-based system in Round 1?
Some Actual Data
I present the results across three graphs. In the first graph, I take all 80-odd papers that made the discussion (including the 32 that were accepted) and plot the chances that each paper would survive Round 1 of reviewing, assuming only a single one of the three first-round reviewers has to like the paper in order for it to move to Round 2. The second graph requires that a more stringent criterion be met: two of three reviewers must have liked the paper in order for it to advance. Finally, the third graph plots the case where all three reviewers must like the paper to advance it to Round 2. In all cases, "like" is defined as giving a rating of 3 out of 5 or higher; 3 is a "weak accept" (and 2 a "weak reject"), and that is thus where the line is drawn. One additional note: each accepted paper is shown in light blue; each paper that was discussed but ultimately rejected is shown in orange.
The way I calculate the odds of surviving Round 1 reviews is simple: each paper has N reviews (in this case, usually seven); I then take all combinations of three reviews (i.e., N choose 3) and in each group of three, I compute whether at least one reviewer (or two, or three, depending on the graph) was positive, which I call promotion to Round 2; finally, I total the number of promotions and divide by the total number of combinations. The resulting promotion percentage, which is plotted along the y-axis, answers this simple question: given a random set of three reviews (chosen from the final seven or eight), how likely was the paper to have survived Round 1?
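For the curious, here is a minimal sketch of that calculation. The function name, parameters, and sample scores are illustrative, not the actual OSDI '10 tooling; the only assumption baked in is the one stated above, that a score of 3 ("weak accept") or higher counts as liking the paper.

```python
from itertools import combinations

def promotion_odds(scores, liked_needed=1, round_size=3, threshold=3):
    """Fraction of all round_size-review subsets in which at least
    liked_needed reviewers scored the paper at threshold or above."""
    groups = list(combinations(scores, round_size))
    promoted = sum(
        1 for group in groups
        if sum(1 for s in group if s >= threshold) >= liked_needed
    )
    return promoted / len(groups)

# Hypothetical paper with seven final reviews, mostly lukewarm.
reviews = [2, 2, 3, 2, 4, 2, 3]
print(promotion_odds(reviews, liked_needed=1))  # one positive voice suffices
print(promotion_odds(reviews, liked_needed=2))  # stricter two-of-three rule
```

Running this on the hypothetical scores above shows how sharply the stricter rule bites: the same paper that almost always survives under the one-voice rule is usually culled under the two-of-three rule.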
The graphs tell an interesting story. I think the high-level take-away is this: in a first-round review, if even a single person likes the paper, that should merit promotion to Round 2. This conclusion is derived from the first two graphs. In the first, you can see that even the lower-ranked papers that got accepted (light blue and towards the left) very likely had at least one reviewer who liked the paper in any group of three. In the second graph, you can see that a more stringent requirement, that two of three reviewers must like a paper for it to be promoted to Round 2, leads to many discussed/accepted papers being rejected in Round 1. For example, 3 of the 32 papers ultimately accepted had less than a 40% chance of making it past the first round! (And though I can't share intimate details about these papers, suffice it to say these were good papers, now more-or-less universally liked.)
Limitations
Of course, there are some limitations to this study. First of all, not all reviews are done in the same way, especially as the review process goes on: what you might have marked a weak accept early in the process you might later mark as a reject, because you know the stakes are higher and that the group of papers is being narrowed down. The study is also limited to just a single OSDI; doing it across more conferences would be interesting. The study also only examines papers that made the discussion, and thus cannot comment on the other papers submitted to OSDI that year. Finally, I don't take into account that more lenient promotion criteria increase the reviewing burden; perhaps this is a topic for a future post. There are many other limitations of this study, which you can figure out for yourself.
Summary
Over the past years, I've seen one constant present in the review cycle which truly impresses me: when a large number of PC members read and review a paper (and in systems, papers that make the final round often end up with 6, 9, or even more reviews; we've gotten 12 a few times!), the PC generally does a good job of understanding what the paper is about, including its merits and flaws. Though there is sometimes disagreement (e.g., on the scope of the contribution, or on the final verdict itself), the overall discussion process is, I think, usually well done: I certainly just saw a number of wonderful, careful discussions take place during the OSDI '12 PC, led by Amin Vahdat and Chandu Thekkath. Discussion of papers by many well-regarded researchers is actually a solid and robust process, of which the systems community should be proud. If only we could discuss each paper in this way!
However, due to the load placed on small-ish PCs these days, the round-based system has become commonplace, which raises the question of how to decide on promotion. The initial study here shows that PC chairs should be lenient early on in the process: a single positive voice (out of three) should be enough to merit promotion. Not following this guideline will lead to a number of worthy papers getting rejected early on, thus not making the discussion, and thus missing out on the opportunity for the true deep inspection that transpires during the PC meeting itself.
From the author's perspective, here is another view of the same result: if someone liked your paper but you still got rejected in the first round, don't feel too bad. It might just be the case that the PC made a mistake.
And finally, congratulations! You're the only person to have read this far, except for my dad, and he kind of had to, because, you know***.
Footnotes
* Getting your paper to the PC meeting is a big accomplishment! You should be proud if your paper got discussed. But perhaps that doesn't remove the sting of rejection, if your paper did get a lot of reviews but still was turned down. Sorry about that; such is the life of the researcher.
** I chose OSDI '10 because, as co-chair, I had access to many more reviews than I do on a typical PC (where I have conflicts with many papers).
*** This is not true; not even my dad reads this blog!
Comments
Very interesting and encouraging article.
Also, as further consolation to submitters with rejected papers, many reputable conferences have other processes to ensure submissions get the notice they deserve (e.g., independent shadow PCs with students/researchers at several places, whose aggregated results are cross-checked with the real PC's outcome).
Remzi, I'm surprised there haven't been any comments on this blog. Having seen lots of rejections so far in my grad-school career, and the appalling quality of most of the reviews, I am fairly convinced the round-based system is broken. In fact, even a two-round system is broken where the first round consists of some 5 reviewers.
While program committees seem to, consciously or unconsciously, keep acceptance rates below 20% (which is the case with most top systems conferences), many good papers become casualties of a review process where there is no accountability for those 5 reviewers in a 2-round system (or 3 reviewers in a 3-round system). 10 years ago, with every PC member reading each and every paper, accountability would've been a moot point, because everybody read the paper. Today, there are silent rejections, and a good paper may actually get rejected again on resubmission even if the authors put a lot of effort into it. On top of that, today we get reviewers who prefer to remain 'on the fence' yet mark themselves as knowledgeable in the field. Then you have reviewers who are simply positive by nature, and then there are the ones for whom even Einstein's work would've been incremental. Atrocious!
With an increasing number of submissions and not commensurately large PCs (and that's understandable, because it's hard to get people to sign up as members to begin with), why don't PC chairs instruct PC members to loosen up a bit? By instructing, I mean explicitly instructing them in no uncertain terms. I'm seconding your point about relaxing the criteria for the first round here. Moreover, it should be the responsibility of the PC chair to ensure that everybody loosens up, not just half of the members while the other half still operates in binary weak-reject/reject mode.
Good post. It should have had lots of comments in the past 365 days.