New publishing media require new reviewing media

Jaume Gudayol
E-mail:[email protected]

Oct 15, 2002

Abstract

Changes in publishing media have rendered the old `journal' method of publishing, and the associated `peer review', obsolete. In this article the author examines `peer review' as a method for evaluating the work of a scientist, and then proposes new ways of reviewing, better suited to the new media, that should replace it.

1  Introduction

Ten years ago, the main method scientists used to make their discoveries public was the journal article. In recent years, first because of e-mail and lately thanks to e-print repositories, the situation has been changing. A good account of the present situation can be found in [Jac,02]. In a few years the now usual strolls to the library will be replaced by typing on a keyboard. In this near future, scientific journals stored in libraries (but not ordinary journals, the ones people buy and keep at home) are bound to be considered obsolete, and thus to disappear. This will not be a great loss. In the best cases, a journal was an expensive medium that a college produced in order to exchange it for other journals and thus keep the college's library up to date. In other cases it was a medium produced by a third party to increase its wealth at the expense of the government's science budget.

Journals used for entertainment purposes are likely to survive, and maybe even those journals mainly dedicated to (political) analysis. Specialised journals, or journals whose main goal is to store and transfer information, will be replaced once the same information can be found faster by other means. Does anyone remember the newsreels that used to run in cinemas before TV?

Journals, however, perform another task apart from the communication of results: they are generally used to evaluate the quality of a scientist's production. Thus the foreseeable disappearance of journals as media for communicating scientific results will force a change in the way scientific production is evaluated. There is always the possibility that the existing journals mutate into electronic journals, but this will not last. A (maybe forced, but classical) analogy is the evolution of typography at the beginning of the Gutenberg era: for fifty years the types tried to reproduce handwritten letters. Then types suddenly began to change, and in less than a century printers were using letters very close to the ones we use now. This pattern of evolution by jumps is what is going to happen to the scientific publication and evaluation enterprise any time soon. Therefore the question is not whether we can keep this journal business going, but whether we can control the new ways that are going to appear and make them fairer and better fitted to their purpose.

Therefore free electronic journals are dead ducks.

2  A bit of history.

The first published scientific journals were the Philosophical Transactions of the Royal Society of London and the Journal des sçavans. It is remarkable that, with only two science journals in existence, each of the two fulfilled a different goal. The Philosophical Transactions were in fact the proceedings that kept account of the meetings of the Royal Society, whereas the Journal des sçavans was more of a medium to communicate the novelties of a group of philosophes. This difference meant that the former journal's purpose was mainly to state intellectual ownership of the subjects presented by the Fellows, while the latter was meant to replace private letters as a medium.

Neither of the two, nor any of the journals that appeared later, such as the Acta Eruditorum or the Journal de Trevoux (by the Jesuits), had any reviewing procedure. Obviously a Libertine would have no chance of publishing anything in the Journal de Trevoux, but he could publish freely in other places. Thus the correctness, interest and even the intellectual ownership of the ideas expressed in an article were not decided by a reviewer. What usually happened was that the article was followed by a series of replies by other members of the République des Lettres until a conclusion was reached or, more often, the authors became involved in another dispute. There were quarrels on subjects ranging from the correctness of a way to compute a tangent, to the priority in the invention of the calculus, to the relationship between reason and faith. Such polemics occupied a significant part (even half) of the space of the journals.

The Philosophical Transactions of the Royal Society of London was the first journal to establish a review of articles before publication (in 1752). There were complaints about the decaying quality of the journal, so it was decided to set up a board of editors. Many journals followed this example during the next two centuries. The role of these boards of editors was not only to refuse weak articles but also to convince (qualified) authors to publish there. The members of the boards sought articles among their friends, thus reinforcing the trend of journals to become clubs.

At the same time, subject-specific journals were beginning to be published. Already during the last quarter of the 17th century many medicine-related journals were being published, although most of them were addressed to the general public. The first mathematical journals were the Archiv der reinen und angewandten Mathematik (published from 1795 to 1805 by K. H. Hindenburg) and the Annales de mathématiques pures et appliquées (a.k.a. Journal de Gergonne, published by J. D. Gergonne). Around the same time, journals such as the Journal für die reine und angewandte Mathematik (a.k.a. Journal de Crelle, published by Leopold Crelle) and the Journal de mathématiques pures et appliquées (published by Joseph Liouville) began to be published. All of these people created their journals as a means to publish their own work (they sometimes had trouble publishing it in more general journals) as well as a means to state their views on a subject.

The will of a single person (for whatever reasons) has remained until now one of the three main reasons for which a journal is created. The other two reasons are merely incidental to the creation of science. One of them is that institutions that have a journal (of whatever quality) exchange copies of it for copies of other journals, whereas institutions without a journal have to pay much more expensive subscription fees. This fills libraries with piles of printed paper. The other is that, in the last third of the 20th century, commercial publishers entered the `science journals' business. This entrance was possible because parts of the `scientific community' have been willing to: i) work for them for free (thus making the business profitable); and ii) buy their products (out of necessity). Such a peculiar collaboration must have peculiar reasons, about which we will talk later.

On the other hand, some of the journals created by a single person (or by a small group) disappear when that person (or group) ceases to exist. Those journals that survive the disappearance of their creator do so by being institutionalized by the institution that fostered the late editor. It is in this way that most of today's non-commercial core journals appeared. Since the late editor is likely to have been a good researcher, they are in the hands of institutions where research has a long tradition. From then on the journal represents the opinions of that well-established institution.

3  Journals as a means of evaluation.

We have seen that no journal has ever been created to serve as a means of evaluation. In fact, scientific journals are no different from other kinds of journals. All journals (all media) are means of expressing certain points of view held by certain groups. But nowadays the number of articles a scientist has published, and the names of the journals where he/she has published, are the main criteria for the evaluation of his/her work. Few, if any, evaluation committees will ever have a look at the work itself. Instead, the committees usually rely on the decisions made by the author (who may be unwise, or simply not in a position to push his/her article through) and by the referee (who has the power to do whatever he/she pleases, likes, or finds to be in his/her interest). Thus `peer review', as this evaluation procedure is called, is the cornerstone on which the whole building of modern science stands. Let us explain what this expression means.

The fact that most committees are not doing the task for which they were appointed is irrelevant, though. Evaluation committees are usually groups of five or at most seven persons, faced with many candidates with whose work they are not acquainted. Thus the task they are given would require much more work and time than they are willing, or even able, to spend. Relying on a previous evaluation of untested quality may be ethically dubious, but the other option, that is, resigning, is never considered. Everyone knows in advance that they are appointed to perform a task they cannot fulfil. Thus only those who accept and approve of this fact accept the appointment in the first place. Their reasons for accepting range from naivety, prestige-seeking, and a sense of `that's the way it is done' inevitability (in most cases) to greed for power and an aim to push their own candidates. In any case, corruption is only a by-product of a wrong procedure.

The expression `peer review' means the following. Assume you are a scientist and you have just written an article. Besides making it public by electronic means, and patenting it if applicable, you have to send it to a journal. There is a list of (some) journals, ordered (we will later explain how) by a `quality index'. Using (or not) this list, you choose a journal by one of the following two methods:


    i) According to the quality you assume your article has, you pick a journal at the corresponding level.
    ii) Alternatively, you choose a journal where you know that your article will end up in the hands of someone (either a member of the editorial board or a referee) who will look at it favourably.

You can use any mix of the two previous methods. Then you send the article to the journal you chose. The journal's editor will either refuse it or send it to a referee of his choosing. In the second case, the referee will read your name and your paper, and will decide whether your work deserves to be published in that journal. If your article is refused by that journal, you repeat the procedure until one journal accepts it for publication. Notice, though, that once your article is accepted by a journal you cannot send it to another journal lying higher on the list. Hence the process is not as symmetrical as the word `peer' may suggest. Another asymmetry lies in the fact that you will never formally know who your referee is. His name will remain a secret for ever to everyone but the publisher, and so will his work.

That's the way things work in mathematics. It is the hard `peer review' version.

Humanities and social sciences, where the correctness and interest of a work are more pliable, use a softer form of this method. Several (that means two or three) referees are used instead of a single one (but they are all chosen by the same person). Another improvement of the method is the so-called `double blind' refereeing. `Double blind' means that the referee does not know your name; the editor hides it from him. There have been so many arguments against the degree of `blindness' that hiding your name provides that I have little to add. I can however testify that in certain cases I can guess the author of an article just after reading the title.

Praisers of the `peer review' method tend to forget or minimize the ii) technique. They claim that scientists form a community of very honest people where such misbehaviour seldom, if ever, happens. In that way they seem to claim that scientists are different from the rest of the people, an ethical class of better human beings. I cannot see on what grounds they hold such opinions. I, on statistical grounds alone, would never assert that a large group of people has a stricter morality only because of its profession. I do not see why there should be a smaller proportion of swindlers among scientists than among shopkeepers. Nonetheless, scientists have managed up to now to work essentially without any external control.

On the other hand, critics of the `peer review' method seem to think that the existence of the ii) technique is the main problem the method has. They recognize that not every scientist behaves honestly every time, and they suggest solutions to these problems (the `double blind' refereeing was one of them).

In my opinion, both of them miss the point. It is true that the `peer review' method of evaluation has many more honesty problems than most scientists would recognize in public. After all, scientists are human beings, with their interests, position in the community, power, and pride to care for. Moreover, the usual claim that `peer review' has worked well for centuries is completely groundless. To begin with, it was never invented to evaluate the professional careers of scientists, but to decide which articles were fit for certain journals. But what is important is that `peer review' appeared gradually and in such a way that no other evaluation method (or the no-evaluation method) was tried at the same time. Thus we do not know whether `peer review' is better than any other method, or than simply no method at all (which would mean that each committee would set up its own criteria). In fact, the method (of evaluating a person's work by the position that the journals in which he published hold in a certain list) appeared simply because the committees charged with the evaluation were unable to judge that work properly themselves.

Even in the `perfect world' of none but honest people in which some scientists claim to live, `peer review' as it is applied would be defective. Here are some of its flaws:

4  Wild guesses about the future

To begin with, is a public discussion on how to replace the current refereeing method needed? After all, some will say, the current method evolved without any planning to become not only generally accepted, but the only existing method. Moreover, since the `peer review' method began its path towards domination (that is, since 1945) the number of published articles has increased, and so has their quality. Thus, they will conclude, we have a method that has produced good results, and the minor changes that have to be made in order to adapt it to the new media will also appear by natural evolution.

This often-heard preaching is useful for believers, since it gives them some assurance that they are behaving rightly. Non-believers, though, may remark that the period during which the number and assumed quality of articles has increased, and during which the `peer review' method has become general, has also been a period during which the proportion of the G.N.P. dedicated to research has increased steadily, which means that (the G.N.P. itself increasing) the increase in constant money has been impressive. Thus those who argue that the generalization of the `peer review' method accounts for the better (if so) quality of science production nowadays could on the same grounds (that is, with the same correlation) attribute this increase in quality to the number of cars scientists own. I would however advocate the simpler explanation that the production of scientific results is directly related to the budget dedicated to it, and leave correlations with other factors to be examined with care. Moreover, the `peer review' method has by its own nature a peculiar property that we have already remarked: it has no quality control. Despite the fact that many people (who at some point in their careers have been reviewers) will say that the method is essentially fair, there is no possibility of any external review of the method. Some will claim that articles appearing in core journals are clearly the ones that get more citations, but this is a self-fulfilling prophecy: those articles get more citations because they appeared in a core journal (you can easily guess how it works; I am not going to go into the details here). Hence the assumption that the method works well is based only on the declarations of people who have an interest in claiming so.

But let us assume that the `laissez faire' position succeeded. How would the current system evolve, when left to itself but interacting with a new environment? The future is always uncertain, as you know. But one can try to identify the factors that may have some influence on it, and see what would result if only those factors were present.

The first and main factor that will affect the evolution of research, and in particular that of any reviewing method, is the evolution of the amount of money dedicated to research. During the last fifty years this amount has increased both nominally and as a proportion of the G.N.P., and in the U.S. it is now close to 3% of the G.N.P. This progression is not likely to continue, and in fact 3% is about the maximum we can expect. For European countries, where social expenses take most of the budget, the mean is now 2%, and it could happen that it never gets even close to that 3%. This stabilization of the money dedicated to research will affect employment in several ways. Left alone, it would produce a corresponding stabilization of the number of researchers, the new ones replacing those who retire. But since the last 40 years have seen a continual growth in the number of researchers, the current age distribution of researchers is pyramidal. Thus for the next 20 years there will be a remarkable scarcity of jobs (this is already happening in Germany and other parts of Europe).

On the other hand, and as someone (guess who!) said, scientists want to multiply subordinates, not rivals. That is, most (but not all) scientists believe that their work will be more relevant if the number of fellow scientists that follow their research path (and thus cite them) increases. There are many ways to produce such a dissemination of one's work, but for the purposes of the present disquisition we will reduce them to two:

A combination of both methods (as far as it is possible, in particular given the difficulty of the first one) can transform a researcher into a pope in his research area. Once someone has reached such a position, he will remain there until the age of retirement. For the second method to succeed, though, he needs to place his (direct or indirect) pupils in relevant positions. But this has never been difficult, as there have always been more positions than anyone could fill with his own candidates.

Such a state of things has never been too dangerous, even though the levels of corruption that method II induces are higher than most would recognize. But when the needs of those semi-mediocre main researchers are combined with a non-increasing staff, the situation becomes explosive. A researcher who needs to apply method II within a stagnant staff, and who knows that the blindness of the reviewing method gives him power without accountability, can empty out a whole research area.

Thus the `natural evolution' of the peer review method will take place in a quite unhealthy environment. In this it will differ from `peer review' itself, which evolved amid the increase of resources that took place after World War II.

When people talk of electronic publishing and the future of the peer review method there is a point that is usually missed. Everyone seems to consider the article as the research unit and the journal as the publication unit. From the point of view of the researcher as producer, though, the article is usually a subunit of the body of research that he has carried out on a certain subject. And for the researcher as consumer the journal is a subunit of the library, which is the real unit of information. However, the researcher/producer is likely to keep on producing articles (although with a modified appearance that takes advantage of the new technologies) for as long as he is evaluated in terms of the quantity and quality of his articles. Thus the ways in which the article as unit may evolve depend too much on the ways evaluation evolves to be considered here. The evolution of the journal as unit, if left to itself (that is, without a political will to change it), is easier to guess.

As we have seen, the first scientific journals appeared with the clear intention of providing the reader with all the (new) scientific information available in the journal's language. Later, when subject-specific journals began to be published, they had the same aim, even if their scope was restricted to a specific area. An issue of a journal has, however, obvious physical (size) limits: it has to be manageable in the hand. Those physical limits made it necessary to have several journals sharing not only the same area of knowledge but also the same views and purposes. Unbound by size, any journal could grow until it became a library in itself. Even nowadays, some journals have several series with distinct `quality levels'. Similarly, the American Mathematical Society maintains three journals with a clear gradation of quality. In the new technological context, and once the need of many institutions to maintain a journal disappears, concentration of journals is the most likely outcome. This is not surprising, since power has a nasty tendency to concentrate when it is not subjected to external control. In the end, each area of research would have at most three or four journals, and areas in which there is little disagreement among the ruling popes could have a single journal. Each of these journals would have several series (from A to Z) providing different `quality levels'. An author would submit his paper to one of these series, but the (anonymous) referee would, whenever he pleases, suggest/recommend submission to a different series. In this way, researchers would have even less control than at present over the grading of their work. Moreover, once the (reasonable) claims of arbitrariness began to appear, the standard procedure would be to create a board of editors for the different series that would take charge of all the evaluation of the articles submitted to that journal. But as is always the case with large boards, the real decision power would in fact be held by a very small group. The situation would be even worse in some sense (but not necessarily on the whole) in those areas in which the leading journal was in the hands of a private company.

If you are a scientist, I would like you to ask yourself two questions. Of all the popes in your area, which one is the most power-greedy? Hence, who is most likely to become the editor in chief of the leading journal in your area? Scary, isn't it?

5  Possible alternatives to the `peer review' method

Before you read this section, let me remark that I am not an expert on this subject. Thus most of what you will find here is a compilation of the proposals that have already been made by other people, plus some hopefully witty remarks.

Before we make any proposal regarding alternatives to the so-called `peer review' method, it would be convenient to clarify which goals we would like the new method to achieve. The standard list of tasks the current method claims to fulfil is more or less the following:

In fact, most `wrong' papers never get to be written. Results are usually presented in seminars, congresses and other fora (forums) before being written up. Most errors are caught at that stage. On the other hand, the refereeing process gives a false sense of security about the correctness of a paper. Many referees do not work through the details of the paper, especially if the author has some reputation. Since errata notices appear in a later issue, they are hardly ever read. Important articles are known to be wrong not because of the published errata but because that knowledge lies in the lore. Regarding plagiarism, a referee can usually fight it only if the plagiarized article is already published, something that is nowadays a clear limitation. Finally, we have already said that we do not consider the method a reliable source of information on the quality of a paper. But, in any case, some variation of the former list should be considered as part of the demands the scientific community would make of any reviewing method.

On the other hand, `peer review' has fulfilled its purpose up to now, even if somewhat defectively. The aim therefore has to be not to destroy it completely but to reform it. To this end it is useful to keep in mind the list of flaws that we produced in section 3. With these two lists in mind, the suggested list of desirable characteristics that any reviewing method should have would be the following:

There is a problem that I have not faced in this list, namely, that `peer review' promotes conservatism. This is so because what I am advocating is a true `peer review' method. And, as Tocqueville already remarked, democracy by itself promotes conservatism. Thus to avoid this problem what we need is not a new reviewing system but a new scientific culture, with less respect for authority and more critical conscience. Otherwise, we are bound to remain in a situation in which most of the truly revolutionary discoveries are made by serendipity.

In fact, it is enough to allow a minority with such an irreverent attitude to exist. Majorities are conservative by their own nature. But in my experience I would rather say that such minorities are now less likely to survive in the scientific milieu than they were in the past. I do not want to argue here in terms of majorities and minorities since I do not believe such arguments to be ethically correct.

Let us now try to see what could be done to provide a reviewing method that, taking into account the possibilities of the new electronic media, fulfils at least some of the requirements we have made. Since I am a mathematician, I will have in mind the existing arXiv. Other readers may substitute for it the corresponding dominant e-print archive in their areas. Throughout what follows, we will also keep in mind that the reviewing/follow-up system would be the basis, and that any evaluation system should be built on top of it, but not intermingled with it. Also, for the present purposes, I would like you to allow me to assume that there is an authentication method (an electronic signature) that makes fraud by impersonation difficult. This can be easily implemented.

To begin with, we have the arXiv, that is, a place where people put their articles for the public to read. It has some defects (only the abstracts, and not the full articles, are searchable), but it works fairly well. Work is in progress so that one can see which previous papers each article cites, and in which papers each article is cited. Observe that this provides the first method of evaluation: one can count how many articles cite a given paper. This method is quite unreliable, though. Many people cite arbitrarily. But in the arXiv a careful examiner can detect such arbitrary citations, if he has some time to spend. There is also the possibility, there or at some other place, of asking the author to give the three (as a maximum) most relevant citations. Then one could more easily build, not only an evaluation method, but a tree of knowledge in each area that would be useful to all researchers.
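
The counting described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper identifiers and the two dictionaries are invented, and nothing here corresponds to an actual arXiv interface. `cited_by` inverts the `cites' relation, `citation_counts` gives the crude count, and `knowledge_tree` keeps only the (at most three) citations that each author declares most relevant.

```python
from collections import defaultdict

MAX_KEY_CITATIONS = 3  # the "three (as a maximum)" limit mentioned in the text


def cited_by(cites):
    """Invert the 'cites' relation: for each paper, list the papers citing it."""
    inverse = defaultdict(list)
    for paper, references in cites.items():
        inverse[paper]  # touch the key so uncited papers appear with an empty list
        for ref in references:
            inverse[ref].append(paper)
    return {paper: sorted(citers) for paper, citers in inverse.items()}


def citation_counts(cites):
    """Crude evaluation: how many papers cite each paper."""
    return {paper: len(citers) for paper, citers in cited_by(cites).items()}


def knowledge_tree(key_cites):
    """Map each paper to the papers that declare it a key citation."""
    for paper, keys in key_cites.items():
        if len(keys) > MAX_KEY_CITATIONS:
            raise ValueError(
                f"{paper} declares more than {MAX_KEY_CITATIONS} key citations"
            )
    return cited_by(key_cites)


# Hypothetical data: every citation made by each paper ...
cites = {
    "paper-A": ["paper-B", "paper-C"],
    "paper-B": ["paper-C"],
    "paper-C": [],
    "paper-D": ["paper-A", "paper-C"],
}
# ... and the few citations each author marks as essential.
key_cites = {
    "paper-A": ["paper-C"],
    "paper-B": ["paper-C"],
    "paper-C": [],
    "paper-D": ["paper-A"],
}

print(citation_counts(cites))
print(knowledge_tree(key_cites))
```

Comparing the two outputs shows the point of the proposal: the raw count treats every citation alike, whereas the restricted relation records only what each author considers essential.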

The first, simplest, evaluation method would be a voting system on the articles. What I have in mind would be something like the current voting methods that work on the web, each article being rated from 1 to 10, but any method could be considered. This is quite simple, but also easily manipulable. There is, though, a simple way to reduce the possibilities of manipulation. Each vote should be properly identified, in a way that allows everybody to know who cast each vote. Additionally, and if the evolution of such a system proved it necessary, one could limit the number of votes each person could cast each month.
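
A minimal sketch of such a register of identified votes follows. All names (`Vote`, `VoteRegister`, the voter and article labels) are hypothetical, the authentication of voters is assumed to happen elsewhere, and the rule of one vote per person per article is my own assumption rather than part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str    # public: everybody can see who cast the vote
    article: str
    score: int    # from 1 to 10
    month: str    # e.g. "2002-10"


class VoteRegister:
    def __init__(self, monthly_limit=None):
        # monthly_limit=None means no limit on votes per person per month
        self.monthly_limit = monthly_limit
        self.votes = []

    def cast(self, voter, article, score, month):
        if not 1 <= score <= 10:
            raise ValueError("score must be between 1 and 10")
        # Assumption: one vote per person per article.
        if any(v.voter == voter and v.article == article for v in self.votes):
            raise ValueError(f"{voter} has already voted on {article}")
        if self.monthly_limit is not None:
            used = sum(1 for v in self.votes
                       if v.voter == voter and v.month == month)
            if used >= self.monthly_limit:
                raise ValueError(f"{voter} has no votes left for {month}")
        vote = Vote(voter, article, score, month)
        self.votes.append(vote)
        return vote

    def votes_for(self, article):
        """All votes on an article, with the voters' names attached."""
        return [v for v in self.votes if v.article == article]

    def mean_score(self, article):
        scores = [v.score for v in self.votes_for(article)]
        return sum(scores) / len(scores) if scores else None


register = VoteRegister(monthly_limit=2)
register.cast("alice", "article-1", 8, "2002-10")
register.cast("bob", "article-1", 6, "2002-10")
print(register.mean_score("article-1"))
print([v.voter for v in register.votes_for("article-1")])
```

Because each `Vote` carries the voter's name, anyone can inspect who voted how, which is the safeguard against manipulation suggested above; the monthly limit is optional and can be switched on only if it proves necessary.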

The next step, one that we have come to see as natural in electronic media, would be to allow people to send comments on each article. Such comments could range from the three words `It looks nice' to a full three-page report on the article. These reports should be evaluated in a way similar to the way articles are evaluated. Moreover, if someone feels the need to do so, he should be able to file a report on a given report, and so on.
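
The `report on a report, and so on' structure is naturally recursive, and can be sketched as a tree in which articles and reports are the same kind of object. The class name `Item` and its fields are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """An article or a report; both can be scored and reported on."""
    author: str
    text: str
    scores: List[int] = field(default_factory=list)
    reports: List["Item"] = field(default_factory=list)

    def add_report(self, author, text):
        """Attach a signed report to this item and return it."""
        report = Item(author, text)
        self.reports.append(report)
        return report

    def depth(self):
        """1 for an item without reports, 2 if it has reports, and so on."""
        return 1 + max((r.depth() for r in self.reports), default=0)

    def walk(self, level=0):
        """Yield (level, item) pairs for this item and everything below it."""
        yield level, self
        for report in self.reports:
            yield from report.walk(level + 1)


article = Item("author-1", "An article on tangents")
first = article.add_report("reader-1", "It looks nice")
first.add_report("reader-2", "This report is too short to be useful")
article.add_report("reader-3", "A three-page report ...")

for level, item in article.walk():
    print("  " * level + f"{item.author}: {item.text}")
```

Since a report is itself an `Item`, the same scoring that applies to articles applies to reports, which is what evaluating reports `in a way similar to the way articles are evaluated' amounts to in this sketch.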

The question that arises at this point is: why would people agree to spend their time reviewing other people's work? Let us try to answer that question realistically. Why do people write articles to begin with? Because their articles are an essential part of their curricula, without which they cannot prosper in their careers. Thus people will freely choose to write reports on other people's work only if and when this is important for their careers. At this point a recommendation by the professional societies of a standard curriculum that includes, let us say, 15% refereeing would do the job.

The idea of letting the people choose which articles they want to write a report on could have a nice side effect if used properly. Assume that the number of reports does not exceed the number of articles by a large amount. Then, since people will (should) choose the articles they report because they are either interesting or easy to work on, the least interesting articles would be left unreported. This fact would force authors to try to write more interesting and well written articles in order to maximize the chances their articles have of being reported. If we manage to mix this method with a standard curriculum that does not take into account the number of articles but their quality, we would have a method that would by itself reduce the present tendency to overproduction of dull papers.

Many other evaluation methods could be, and almost surely will be, proposed. I am not trying to make a complete list. But it would be desirable to try them all and, moreover, to keep using several methods at once even if a few are more successful than the others, because a method that seems unfit for a given political and scientific environment may prove optimal when this environment changes, and vice versa. Thus sticking to only one method just because at some time it worked fine could be disastrous.

I would like to make a final digression on the cost of evaluation. Not all of the work scientists do is worth doing. Others (or even myself sometimes) would say that most of the work scientists do is not worth doing. On the other hand, scientists work mostly with public money, and the public should have the right to know that their money is not wasted. Thus some kind of evaluation is a necessity. But evaluation is a job that has some costs. Right now we have no idea of either the cost or the quality of the evaluations we are performing. Thus the present situation is unfair to the taxpayers and also to scientists. For ethical reasons (and for practical ones) scientists should make an effort to change this situation in order to make the costs of evaluation explicit, and this can be done only through more openness and transparency.

  Note (Oct 15, 2002)

I have recently come across an article ([Kup,02]) by Greg Kuperberg, where he deals with the same subject. Even if he is less critical of the present state of scientific publication, the conclusions he arrives at are not different from mine. His proposal is, in short, to transform (some of?) the existing journals into `open journals' which would in fact act as evaluation boards, with non-anonymous referees. Observe that in order for such a method to work, the referee's reports should be considered as a part of his/her CV (as a reward). This would ultimately lead (even if he does not say so) to an evaluation of the referees themselves, and then to a cascading evaluation. But, if commercial publishers remained in the business, there would be a danger of replacing the recognition reward the referee would receive with a monetary reward (thus making the method more, instead of less, expensive).

References

Many articles on the subject of peer review and its reform can be found by (wisely) searching on Google. I have explicitly cited the following:

[Jac,02]
Allyn Jackson, From preprints to e-prints: the rise of electronic preprint servers in mathematics, Notices of the AMS, Vol. 49, n.1 (2002), pp. 23-31.

[Kup,02]
Greg Kuperberg, Scholarly mathematical communication at a crossroads.

Last modified: Tue Oct 15 22:44:44 CEST 2002