Users should note that the situation with respect to the Eagle in December 1992 is still as described below. However, any user who feels that his work is seriously hampered by the scheduler should contact the Help Desk (ext 34681 or via userid ADVISER), who can make special administrative arrangements in case of need.

P.Callow 22.12.92

Eagle Development Status          March 1987
------------------------

This is a standard reply to SUGGESTions and private comments about the Eagle job scheduler, the time spent composing individual replies in essentially the same vein having become unreasonable. All suggestions are, however, read and noted by the appropriate people.

The Eagle is currently being accorded minimal CS staff effort, on the grounds that (a) it is by and large good enough, and (b) there are more important things to be done. It is recognised that there are definite deficiencies in the present Eagle, but the known deficiencies cannot be remedied easily, and are deemed to be such as we can live with.

The Eagle does receive maintenance in emergencies, when it appears to have gone very badly wrong, and when other system developments definitely require it. Otherwise, the only improvements come as a result of goodwill in staff spare time, and there is little enough of the latter.

--------------------------------------------------------------

Some known deficiencies
-----------------------

The following list is by no means the totality of known problems, but includes some common subjects of complaint.

Failure to place the job so as to minimise the charge

This is often expressed as "I submitted the job with parameters xxx, killed it, and resubmitted with less restrictive parameters, whereupon it was scheduled more expensively". The algorithm used for finding the minimum is fundamentally incorrect. Although for the most part it gets near enough the right answer, occasionally it is significantly wrong. The algorithm does, however, have the virtue of being cheap to run.
None the less, the CS recognises that it was a mistake, and would in principle like it rectified. The code is, however, complicated, and would need rewriting in its entirety, together with all the debugging that this would imply.

Failure to live up to promises

The turnround values are estimates, not promises, and this is a fundamental pillar of the whole apparatus. Although there is some very substantial and complicated code in support of producing and realising these estimates, there is no actual commitment to them. The only commitment is to the price offered.

Very large jobs are seen running during the day

True. Occasionally these are deliberate and fully paid for. More commonly they were started early in the day or at mid-day when there was no other eligible work to do, and the alternative would have been to leave the machine idle. Some effort is made to minimise this when more deserving work is expected to arrive later.

Large clusters of jobs for a single user are seen

True, and they could indeed not all execute at the same time, nor possibly even all complete in one night. Although it is clearly wrong that the Eagle should have placed them so, placing jobs so as to avoid clashing in this way while remaining within all the other constraints is a very complicated problem, and the Eagle has no apparatus that would begin to cope with it. The consequences of these clusters are not, however, as severe as might be supposed, since various components of the Eagle do notice them and allow for the implications. In particular, they are not regarded as a point load on the system.

Lack of job sequencing facilities

Again, the CS recognises the desirability of such facilities, but both the semantic design and the implementation are complicated. There is no doubt that this would be a major development project.
In the absence of these facilities, the recommended technique for those who must have automatic job sequencing is to have each job submit its successor. Various levels of sophistication can be built privately on top of this, and perhaps should be, to avoid well-known pitfalls such as accidental multiplication of jobs and propagation of a job that will certainly fail.

N.B. There is currently no way of guaranteeing the order in which two jobs on the queue actually run. The queue can be, and is, shuffled when relieving overload.

FAST jobs are not always fast

Our mistake, perhaps, was the use of the word FAST, but when it was adopted it was indeed fast compared with the expectations of those days. None the less, the Eagle does try to keep the FAST queue within reasonable bounds, and by and large succeeds, within our definition of "reasonable". Without actually reserving power, and thus wasting it, it is difficult to avoid the overcommitments to FAST that are sometimes seen. The only mechanism the Eagle has for holding off the load is to wind the price up, and lacking precognition it can do this only in response to the arrival of work on the queue. Immediate and gross fluctuations of price would be both unreasonable and unacceptable, so the price does not always rise fast enough to stave off the load.

Lack of a special fast tape queue

This would not help significantly, the problem being a fundamental inability to meet the demand in these terms rather than any scheduling problem. There is particular contention for the use of tape decks during the day, especially in the afternoon, and no tape job is really quick by the time the tape has been found, mounted, and wound back and forth. The use of DFHSM, IBM's file archive management system, has lessened, and is continuing to lessen, the need for day-time use of tapes.
Meanwhile, users who find themselves needing to shunt files frequently between tape and disc during the day are encouraged to discuss with User Services whether they should have more file space.

XJOB lacks sundry useful features

XJOB currently supplies all the information that is readily accessible. It is widely regarded as sufficient, though perhaps not ideal.

XEAGLE has bugs/infelicities/deficiencies

XEAGLE was primarily written for the benefit of the maintenance team, and is made available to the wider public on a "why not?" basis only. Even outright bugs are not regarded as a serious matter, except in so far as they are a problem to the maintenance team.
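The following is a minimal sketch, in Python and purely for illustration, of the self-submitting job chain recommended above under "Lack of job sequencing facilities". The Eagle's own submission interface is not described here, so `submit` is a hypothetical callback standing in for it; `claim_step`, `end_of_job` and the marker-file layout are likewise invented for this example, and the sketch assumes that all jobs in the chain can see a shared directory. Each job calls `end_of_job` as its last action. The successor is queued only if the current step succeeded, which avoids propagating a job that will certainly fail, and only if no earlier run has already queued it, which avoids accidental multiplication of jobs (for example when a step is rerun).

```python
import os
import tempfile
from typing import Callable


def claim_step(state_dir: str, step: int) -> bool:
    """Atomically create a marker for `step`; return False if one already exists.

    Hypothetical helper: the marker file records that `step` has been queued.
    """
    path = os.path.join(state_dir, f"step{step}.submitted")
    try:
        # O_CREAT | O_EXCL fails if the file exists, so check-and-create is one step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def end_of_job(state_dir: str, step: int, last_step: int, succeeded: bool,
               submit: Callable[[int], None]) -> bool:
    """Run as the final action of job `step`; queue step + 1 at most once.

    `submit` is a hypothetical stand-in for the real job submission mechanism.
    Returns True only if a successor was submitted.
    """
    if not succeeded:
        return False  # do not propagate a chain that will certainly fail
    nxt = step + 1
    if nxt > last_step:
        return False  # chain complete
    if not claim_step(state_dir, nxt):
        return False  # successor already queued (e.g. this step was rerun)
    submit(nxt)
    return True


if __name__ == "__main__":
    queue: list[int] = []
    with tempfile.TemporaryDirectory() as d:
        print(end_of_job(d, 1, 3, True, queue.append))   # True: step 2 queued
        print(end_of_job(d, 1, 3, True, queue.append))   # False: rerun does not duplicate step 2
        print(end_of_job(d, 2, 3, False, queue.append))  # False: step 2 failed, step 3 not queued
    print(queue)  # [2]
```

Since only one step of the chain is ever on the queue at a time, the sketch does not depend on queue order, which, as noted above, is not guaranteed. Creating the marker with `O_CREAT | O_EXCL` makes the check and the marking a single atomic operation on a local file system, so two concurrent reruns of the same step cannot both submit the successor.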