A Paper Presented in Partial Fulfillment of the Requirements for

RM501 Survey of Research Methodology

Capella University, September, 2001

By D.L. Jackson

CRITIQUE OF RESEARCH ARTICLES

(Note: Changes suggested by the course tutor will be added later in October, 2001.)

Introduction

A problem for novice researchers is that published papers can be intimidating in the sense that their format and vocabulary may be difficult to comprehend. This would seem particularly relevant for social science students whose own work may reflect a more qualitative approach to research problems. Case studies, questionnaires, surveys, and interviews are a more common feature within their learning programs while statistical tests are used, more frequently, for other subject areas. Sadly, in addition to not feeling comfortable with qualitative terminology, they may feel a need to defend their choice of research design should they choose a qualitative format.

However, the author of this paper believes that the option of either format should be open to all subject areas and that researchers should not need to defend their methodology except in terms of defining its relevance. To judge qualitative research as "better" than quantitative is, in this author's opinion, shortsighted. It would be far better for researchers to adopt the most appropriate design and then to do it well. It may be that the issue is not a qualitative vs. quantitative issue but more of a good vs. bad research problem that is, unfortunately, more easily noticed in a qualitative report. It is hoped that the critique of the following articles will enable the author to become more familiar with the terminology and better able to distinguish good from bad research.

Critique of Articles

Article 1. Citation: De Beer, M. & Visser, D., (March, 1998). Comparability of the paper-and-pencil and computerized adaptive versions of the General Scholastic Aptitude Test (GSAT) senior, South African Journal of Psychology, 28(1), 21-28.

Article Summary: A computerized adaptive test was constructed from two existing parallel paper-and-pencil versions of the General Scholastic Aptitude Test (GSAT) Senior. Achievement in the GSAT computerized adaptive test was compared to achievement in one form of the GSAT paper-and-pencil test. In computerized adaptive testing the program tailors each test to the examinee's ability level. Based on a statistical method known as Item Response Theory (IRT), the program interactively selects test items which are at the appropriate difficulty level for the individual being tested, thereby allowing a considerable reduction in test length without forfeiting measurement accuracy. The study was undertaken to investigate the equivalence of results obtained with three versions of the GSAT: a paper-and-pencil version, a standard computerized version, and a computerized adaptive version. The standard computerized GSAT was included to study the effects of computerization apart from adaptive testing. The results indicate that achievement in the paper-and-pencil GSAT and the standard computerized version of the GSAT were not equivalent because the examinees performed better in the paper-and-pencil version of the GSAT than in the standard computerized version of the GSAT. Following this investigation, certain adjustments were made to the CAT version of the GSAT. The linear adjustments that gave the best results when compared to the paper-and-pencil version were incorporated into the program software of the CAT, thereby ensuring that scores obtained with the CAT version are equivalent to paper-and-pencil scores.
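The item-selection idea summarized above can be sketched in a few lines. This is a minimal illustration and not the MicroCAT procedure used by the authors: it assumes a two-parameter logistic IRT model and a "maximum information" selection rule, and the item bank, the parameter values, and the function names are all hypothetical.

```python
import math

def prob_correct(theta, a, b):
    """Probability of a correct response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information an item contributes at ability level theta."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Pick the not-yet-used item that is most informative at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Hypothetical bank of (discrimination a, difficulty b) pairs.
bank = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]

# An examinee of roughly average ability first gets the medium item;
# once the ability estimate rises, the hardest remaining item is chosen.
first = select_next_item(0.1, bank, administered=set())
second = select_next_item(1.9, bank, administered={first})
```

Because each item is chosen close to the current ability estimate, few items are wasted on questions that are far too easy or far too hard, which is the source of the reduction in test length.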

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. Mere computerization of a paper-and-pencil test for administration by computer does not necessarily optimally use the facilities that computers offer. The purpose was to obtain the information necessary to publish a computerized adaptive version of the GSAT that would provide results equivalent to those of the paper-and-pencil versions of the test.

Is the problem adequately narrowed down into a researchable problem? Yes. The aim of this study was, therefore, to compare results on the CAT version of the GSAT to results on the paper-and-pencil version of the GSAT in order to make adjustments to the computerized adaptive version if required, thereby ensuring equivalent measurement by the two versions of the test.

Is the problem significant enough to warrant a formal research effort? Yes. Recent advances in microcomputer technology have not only increased general access to computers, and encouraged the development of item response theory (IRT), but have made computerized tests and computerized adaptive tests (CATs) viable alternatives to paper-and-pencil testing. This has resulted in large-scale computerization of existing tests. However, generally computerized tests are direct copies of paper-and-pencil tests in content, format and sequence, the only difference being that one version is administered by computer, whereas the other is administered in the more traditional paper-and-pencil format.

Is the relationship between the identified problem and previous research clearly described? Yes. In research on comparisons between conventional tests and CATs, it has generally been found that the reliability and validity of adaptive tests are equivalent to or even better than those of conventional tests. Even though large differences exist between conventional test administration and computerized adaptive testing, comparable estimations of ability should, in principle, be obtained with both versions.

Step 2. Literature Review

Is the literature review logically organized? Yes. The literature review begins with an introduction to testing, then refers to advances in microcomputer studies as well as computerized adaptive tests (CATs) as alternatives to paper-and-pencil testing. The authors point out that the Binet-Simon test was, in principle, an adaptive test. The final point addressed concerned the problem of equivalence because earlier studies have indicated the different test formats are not necessarily equivalent.

Does the review provide a critique of the relevant studies? Not really. By "critique" I would expect that some of the literature review would include material that is contrary to the purposes of this study. The cited research studies, however, appeared to correlate with the potential findings of this particular study.

Are gaps in knowledge about the research problem identified? Possibly. The problem of balancing the content of CATs is an issue that has not been resolved because achieving statistical equivalence entails the adjustment of test scores such that the resulting score distributions are comparable. This process is not straightforward in the case of equating paper-and-pencil tests with CATs, because IRT has shown that paper-and-pencil tests are less accurate than CATs at the extremes.

Are important relevant references omitted? I don't know as I am just beginning to look into this particular area of research.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? I can't identify a theoretical framework in this study. I would have thought that the differences between the testing processes of paper-and-pencil vs. computerized or computerized adaptive testing would have provided one.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? Not relevant.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? The specific terms, independent and dependent variables, are not specifically defined. However, based on the discussion of the method used, the reader assumes the independent variable is the type of testing offered and the dependent variable is the score achieved by each student.

Are any confounding variables present? If so, are they identified? The confounding variables are not specifically identified; however, I believe there are several. One deals with the verbal tests: three of the six sub-tests of the GSAT are verbal tests, and therefore no black pupils were included in the sample because English or Afrikaans is often not their first language. (Note: This survey was completed before May, 1998 at which time apartheid was abolished!) Another confounding variable may have been the following: there was some attrition of the original sample size due to testing taking place on separate days.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? No. The hypotheses are never stated because the purpose of the study was to obtain the information necessary to publish a computerized adaptive version of the GSAT that would provide results equivalent to those of the paper-and-pencil versions of the test.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Not relevant.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? Depends. The population consisted of 16-year-old English-speaking and Afrikaans-speaking high school pupils; however, no black students were included, even though the study was conducted in South Africa. The sample size was 613.

Is the sample representative of the defined population? This is difficult for me to know as I don't know what the total population is.

Is the method for selection of the sample appropriate? A random sample of 20 schools was selected for the study, omitting those schools which were included in the original standardization of the GSAT paper-and-pencil test, and also omitting schools with fewer than 500 pupils. Within each school, 18 boys and 18 girls were drawn randomly from the 16-year-old pupils. Having lived in South Africa, I am fairly confident that schools with fewer than 500 pupils were those with people "of color" because the "white" schools were much bigger.

Is there any sampling bias in the chosen method? Yes, racial and language bias.

Are the criteria for selecting the sample clearly identified? Yes. The authors indicate the sample of 18 boys and 18 girls were drawn randomly but there is no indication as to how this breakdown of exactly 50% compares with the total population of English-speaking and Afrikaans-speaking pupils. Additionally, the sample was split into two groups of 242 and 371 pupils for the comparisons to evaluate the equivalence of the different versions of the GSAT. However, there was no explanation as to how the groups were divided. I have assumed that the two groups were tested on different days as well but this data is not provided.

Step 7. Research Design

Is the research design adequately described? No. The rationale for the particular research design was not discussed. Most of the description of the methodology concerned the implementation of the test itself.

Is the design appropriate for the research problem? No. I was concerned about the description of data analysis because few statistical techniques were applied within this study and so it was unclear to me whether the results of the study were warranted.

Does the research design address issues related to the internal and external validity of the study? Not really. The paper-and-pencil version of the GSAT had previously been standardized on a large, representative sample of Afrikaans and English-speaking pupils between the ages of 13 years 6 months and 18 years 6 months. However, I am uncertain as to what is meant by the term "standardized". The terms internal and external validity are not mentioned within the study.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? There was just the one data collection method described and, although the two groups were tested on separate days, there was no indication as to why they had to be tested on separate days.

Are the data collection instruments described adequately? Yes. The collection instruments were adequately described and the students were allowed to "practice" for several questions before beginning the computerized tests.

Do the measurement tools have reasonable validity and reliability? Not sure. The computerized test was designed using pre-standardized paper-and-pencil samples. The MicroCAT program, used to design the adaptive test, was not fully described and was said to be the only commercial package available at the time. The authors made the comment that the paper-and-pencil form of the GSAT had been extensively investigated and documented in the manuals and that, therefore, it was considered sufficient to prove the equivalence between scores of the CAT version of the GSAT and one of the paper-and-pencil versions of the GSAT which would then ensure that the validity and reliability of the GSAT paper-and-pencil version could be claimed for the computerized adaptive GSAT. However, earlier in the study the authors had indicated that the use of the computerized adaptive GSAT employed a completely different testing process and so if the process is different, how can the validity and reliability be transferred?

Step 9. Data Analysis

Is the results section clearly and logically organized? No. The results section was clearly labeled and yet no specific data were described. Instead, there were statements such as: "In Table 1, the correlations between the P&P and PC versions and results of t-tests for related groups are also provided." One had to look to the tables to find out that means and standard deviations were used, as well as a level of significance and possibly a Pearson product-moment correlation.
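For readers who, like this author, are still learning the terminology, the two statistics named in that statement can be sketched as follows. The scores are invented for illustration and are not taken from the article; the function names are hypothetical, and the sketch assumes each pupil has one score on each version of the test.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic and degrees of freedom for a related-groups (paired) t-test."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for the same five pupils on two test versions.
paper = [102, 98, 110, 95, 105]
computer = [100, 97, 106, 95, 101]

t_stat, df = paired_t(paper, computer)
r = pearson_r(paper, computer)
```

The t statistic asks whether the mean difference between versions is larger than chance would suggest, while the correlation asks whether pupils keep the same rank order across versions. The two can disagree: scores can correlate highly and still differ systematically in level.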

Is the type of analysis appropriate for the level of measurement for each variable? The type of analysis does not appear complete to me because there are no additional techniques to evaluate the correlation between the variables. A normalized standard scale was used to convert the theta scores of the adaptive test to integer scores. Following this, and using the converted theta scores as raw scores, norms with a mean of 100 and standard deviation of 15 were calculated to convert the scores obtained on the computerized adaptive GSAT to SA scaled scores.
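The final conversion step described above (norms with a mean of 100 and a standard deviation of 15) can be illustrated with a simple linear standardization. This is only a sketch under stated assumptions: the article reportedly used a normalized scale whose exact procedure is not given here, so the linear z-score transformation, the function name `to_scaled_scores`, and the sample theta values are all assumptions.

```python
from statistics import mean, pstdev

def to_scaled_scores(raw, target_mean=100.0, target_sd=15.0):
    """Linearly rescale raw scores to a chosen mean and standard deviation."""
    m = mean(raw)
    s = pstdev(raw)  # standard deviation of the norm group itself
    return [round(target_mean + target_sd * (x - m) / s) for x in raw]

# Hypothetical ability (theta) estimates for a tiny norm group.
thetas = [-1.0, 0.0, 1.0]
scaled = to_scaled_scores(thetas)
```

A pupil at the group mean receives 100, and each standard deviation above or below the mean adds or subtracts 15 points.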

Are the tables and figures clear and understandable? The tables are clear and understandable but this clarity is belied by the incompleteness of the actual article itself.

Is the statistical test the correct one for answering the research question? I believe that the analysis of the data is not complete. Because the tables provide additional information that is not explained in the report, I would assume that the individuals did not do their own number crunching and were, therefore, unable to explain their work.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? The authors indicated that computerized testing changes the dynamics of the testing situation and that this supported earlier findings of other researchers. They explained that the differences in the scores could at least partially be explained by the unfamiliar testing environments. They also indicated that a second explanation could be the fact that the MicroCAT does not allow the students to go back and change answers, which would affect the computerized version more than the adaptive version because the student would eventually reach his/her appropriate level. However, one assumes that the paper-and-pencil test-takers were allowed to change their answers, which would seem to compromise the testing results, particularly because the researchers determined that the computerized testing changed the results and indicated the tests were not comparable.

Are the interpretations based on the data obtained? The authors don't discuss the major interpretations which are the different scores; however, they discuss the unfamiliar testing situations which were never really discussed within the study.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Sometimes. As mentioned above, some of the results are related to previous research but I saw no indications of relations to a conceptual/theoretical framework.

Are unwarranted generalizations made beyond the study sample? Yes. Without specifically testing for this situation, the authors indicated that the present study could support the findings of another researcher who reported that approximately 32% of the examinees were able to improve their ability scores in a classroom test when they were allowed to review their answers on a computerized test.

Are the limitations of the results identified? No limitations are mentioned.

Are implications of the results discussed? No, with the exception that the linear adjustments are vaguely described so as to ensure equivalence between the two testing versions.

Are recommendations for future research identified? No.

Are the conclusions justified? No. The major conclusion was that considering the time saving of approximately 75% in test administration time, the computerized adaptive GSAT is a useful alternative to the GSAT paper-and-pencil version; however, this was never mentioned in the actual test design segment as a possibility.

Article 2. Citation: Ponsoda, V., Olea, J., Rodriguez, M.S., & Revuelta, J., (1999). The effects of test difficulty manipulation in computerized adaptive testing and self-adapted testing, Applied Measurement in Education, 12(2), 167-185.

Article Summary: One aim of this work is to gather new data and to carry out a new CAT (computerized adaptive test) versus SAT (self-adapted test) comparison for estimated ability, standard error of estimated ability, posttest anxiety, and testing time. Previous works were concerned with an efficiency comparison between SAT and CAT, so that time spent on choice of item difficulty was included in testing time. The testing time variable in this study excludes time spent on difficulty choices, and allows us to compare time invested in answering the item only, under both SAT and CAT conditions. Any differences appearing may be due to the psychological processes involved in responding to the type of test in question. A second aim is to check the effects of the number of items passed on proficiency and anxiety measures. To achieve a different number of items passed, easy and difficult versions of each type of test were needed as no previous manipulation has been carried out in SATs. One of the aims of this work was to study the motivational and psychometric effects of changing test difficulty in SATs and in CATs. The mean number of items passed was higher in the ECAT and ESAT conditions, as compared to DCAT and DSAT, respectively. This means that tests differing in difficulty were obtained, despite procedural differences. The study did not find clear positive SAT effects on ability and state anxiety. Ability and posttest anxiety correlations did not show the expected pattern, as stronger correlations have been found in the two easy conditions, rather than in the CATs. Suggestions for future research were offered.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. Research is trying to obtain new computerized test formats in which motivational aspects of the examinees are not disadvantaged by comparison with traditional testing formats. The aim is to find SATs without motivational drawbacks and with null or acceptable loss in precision and efficiency when compared to CATs.

Is the problem adequately narrowed down into a researchable problem? Yes. Easy and difficult versions of SATs and CATs were compared with respect to estimated ability, posttest state anxiety, number of correct responses, testing time, anxiety change, and standard error of ability.

Is the problem significant enough to warrant a formal research effort? Yes. Current research trends clearly indicate a need to determine best-practices with regards to both SATs and CATs.

Is the relationship between the identified problem and previous research clearly described? Yes. There is a careful delineation between the problem of computerized testing and previous research comparing the different variables.

Step 2. Literature Review

Is the literature review logically organized? Yes. The literature was logically organized and covered a wide range of topics ranging from estimated ability of SATs and CATs; invariance property of item response theory; ability precision; relationship between type of test and test anxiety to algorithm selection and comparisons of CATs and SATs.

Does the review provide a critique of the relevant studies? Yes. On one occasion, the authors indicate that a particular research "attempted to…" while, in another case, research is reported as inconclusive with literature on both sides cited. Additionally, the research on anxiety has followed three approaches which are identified.

Are gaps in knowledge about the research problem identified? Yes. Posttest anxiety was higher in CAT as determined by Ponsoda (1997) but that difference did not reach the significance level. However, information in this respect provided by other authors is said to be scarce and inconclusive. Part of the problem appears to be found in the testing time where different techniques, in various studies, are employed for what to include in the testing time.

Are important relevant references omitted? I don't know as I have just begun searching this topic and am not aware of the seminal works in this area.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? I don't believe that the theoretical framework as to why computerized testing may be valid is addressed. The cited literature generally refers to a particular issue being tested e.g. a comparison between SAT and CAT rather than the motivation for computerized testing.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? It is clear, throughout the study, that the concept of computerized testing forms the basis of this research.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? There are two independent variables which are defined and well established: type of test (CAT vs SAT) and test difficulty (easy vs difficult).

Are any confounding variables present? If so, are they identified? As difficulty levels are achieved by different procedures in each type of test, crossing type of test and test difficulty would not be correct because the difficulty level obtained in both difficult conditions (or in both easy conditions) may not be the same. The design allows the researchers to compute the significance of the main factor type of test, the pooled effect of the factor test difficulty, and simple effects of the second factor inside each level of the main factor. No other information on the interaction effect between the two factors is provided by this design.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? Interestingly, no hypotheses are stated; however, it is assumed that there will be statistically significant differences within the four conditions being tested: easy CAT, difficult CAT, easy SAT, and difficult SAT.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? There is no predicted relationship for this study; however, the authors do suggest explanations for findings related to specific literature citations.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? Unknown.

Is the sample representative of the defined population? Unknown.

Is the method for selection of the sample appropriate? Unknown.

Is there any sampling bias in the chosen method? Unknown.

Are the criteria for selecting the sample clearly identified? No. A total of 187 high school students (127 boys, 60 girls) took part in the study. The sample was taken from a Spanish private school in Galicia, Spain. Ages ranged from 17 to 19 years but there is no additional information provided.

Step 7. Research Design

Is the research design adequately described? The design is adequately described and includes a brief discussion of the independent variables, the resulting four conditions, and the use of one factor (test difficulty) nested in type of test. This design allows the researchers to compute the significance of (a) the main factor type of test, (b) the pooled effect of the factor test difficulty, and (c) simple effects of the second factor inside each level of the main factor.
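The comparisons that such a nested design makes available can be sketched descriptively. The sketch below only computes the mean contrasts, not the significance tests reported in the article, and the ability values, condition labels, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical estimated-ability values for the four conditions.
scores = {
    ("CAT", "easy"): [0.2, 0.4],
    ("CAT", "difficult"): [0.1, 0.3],
    ("SAT", "easy"): [0.5, 0.7],
    ("SAT", "difficult"): [0.0, 0.2],
}

def type_means(data):
    """Mean per test type, pooling over difficulty (main factor)."""
    pooled = {}
    for (test_type, _), values in data.items():
        pooled.setdefault(test_type, []).extend(values)
    return {t: mean(v) for t, v in pooled.items()}

def simple_effects(data):
    """Easy-minus-difficult mean difference inside each test type."""
    types = {t for t, _ in data}
    return {
        t: mean(data[(t, "easy")]) - mean(data[(t, "difficult")])
        for t in types
    }

main = type_means(scores)
nested = simple_effects(scores)
```

Difficulty is compared only within a test type (easy CAT against difficult CAT, easy SAT against difficult SAT) and never across types, which reflects the point that "easy" need not mean the same thing in a CAT as in a SAT.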

Is the design appropriate for the research problem? The design appears to allow the researchers the opportunity to address their problem.

Does the research design address issues related to the internal and external validity of the study? The research design does not address internal and external validity of the study with one exception. The item bank for questions was apparently calibrated according to a three parameter logistic model and those details as well as additional information are provided in a cited article which I did not recheck.
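For reference, the three-parameter logistic model is usually written as shown below. This is the standard textbook form and is not reproduced from the article: \theta is the examinee's ability, and a_i, b_i, and c_i are the conventional symbols for the discrimination, difficulty, and pseudo-guessing parameters of item i. Some presentations also include a scaling constant in the exponent.

```latex
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}
```

"Calibrating" an item bank under this model means estimating the three parameters for every item from earlier response data.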

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? Yes, although there were no different types of data collection described.

Are the data collection instruments described adequately? The procedure is described and included verbal directions that were repeated to each group in order to eliminate the possibility of extraneous instructions biasing the results.

Do the measurement tools have reasonable validity and reliability? There is no indication of validity and reliability with the exception of the item test bank which had been tested earlier but, even so, there is no mention of validity and reliability. An additional test, the State Anxiety Scale, was administered pre- and post-test; however, no validity and reliability results are indicated.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes. A .05 level of significance was used in all the statistical analyses, two-factor hierarchical analyses of variance were applied, and whenever the nested factor was significant, t tests were applied. Correlations between estimated ability and posttest anxiety were also calculated for each test type.

Is the type of analysis appropriate for the level of measurement for each variable? The type of analysis appeared simple in that the mean, standard deviation, level of significance and t test were the only measures included. No other parametric statistics appeared to be used.

Are the tables and figures clear and understandable? Yes.

Is the statistical test the correct one for answering the research question? I would have assumed that once a statistical level of significance was discovered, additional parametric statistics might have been used to examine the correlations, e.g. a factor and/or path analysis.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Yes, the investigators broke down each variable being tested and discussed the effect on the overall outcome of the study.

Are the interpretations based on the data obtained? Yes; however, 4% of the results had to be discounted, as it appeared the subjects were, in some cases, picking either significantly more difficult or less difficult questions that did not match up with their estimated ability.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? The researchers make four major conclusions that are linked to either earlier research findings or future research studies.

Are unwarranted generalizations made beyond the study sample? No.

Are the limitations of the results identified? Yes. For example, the study did not find clear positive SAT effects on ability and state anxiety; however, the researchers felt the unique aspects of the study should be taken into account, e.g. non-U.S. population, younger mean age, and limitations of the test parameters themselves. However, I felt that at least some of the limitations should have been addressed earlier in the study. At the very least, there should have been more of an explanation regarding the population choice or the tests themselves should have been validated and reliability results published for this particular group.

Are implications of the results discussed? No.

Are recommendations for future research identified? Yes. However, the recommendations appeared to deal with the relationship between anxiety and test results rather than specifically on the differences seen in SAT vs. CAT.

Are the conclusions justified? Yes, but not conclusive and the actual methodology could have been improved to minimize the effect of non-validated variables, particularly in the choice of subjects, that might have biased the outcome.

Article 3. Citation: Eggen, T.J.H.M. & Straetmans, G.J.J.M., (October, 2000). Computerized adaptive testing for classifying examinees into three categories, Educational and Psychological Measurement, 60(5), 713-734.

Article Summary: The objective of this study was to explore the possibilities for using computerized adaptive testing in situations in which examinees are to be classified into one of three categories. Testing algorithms with two different statistical computation procedures are described and evaluated. The first computation procedure is based on statistical testing and the other on statistical estimation. Item selection methods based on maximum information considering content and exposure control are considered. The measurement quality of the proposed testing algorithms is reported. The results of the study are that a reduction of at least 22% in the mean number of items can be expected in a computerized adaptive test compared to an existing paper-and-pencil placement test. Furthermore, statistical testing is a promising alternative to statistical estimation. Finally, it is concluded that imposing constraints on the maximum information selection strategy does not negatively affect the quality of the testing algorithms.
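The "statistical estimation" approach to three-way classification can be illustrated with a confidence-interval rule. This is one plausible reading and not the authors' algorithm: the cutoff values, the 95% interval, and the function name `classify` are assumptions made for illustration.

```python
def classify(theta_hat, se, cut_low, cut_high, z=1.96):
    """Classify an examinee once the interval around the ability estimate
    falls entirely inside one category; otherwise return None."""
    lower = theta_hat - z * se
    upper = theta_hat + z * se
    if upper < cut_low:
        return "low"
    if lower > cut_high:
        return "high"
    if lower > cut_low and upper < cut_high:
        return "middle"
    return None  # interval still straddles a cutoff: give another item

# Hypothetical cutoffs separating the three categories.
CUT_LOW, CUT_HIGH = -1.0, 1.0

decided = classify(0.0, 0.3, CUT_LOW, CUT_HIGH)
undecided = classify(0.9, 0.3, CUT_LOW, CUT_HIGH)
```

Examinees whose ability lies far from both cutoffs are classified after only a few items, while those near a cutoff need more, which is how an adaptive classification test can reduce the mean number of items.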

Step 1. The Problem

Is the problem clearly and concisely stated? Very clearly. The purpose of this article is to explore the possibilities for CAT based on item response theory in a situation in which examinees are to be classified into one of three categories.

Is the problem adequately narrowed down into a researchable problem? Yes. The researchers address the fact that although CATs were originally developed to obtain an efficient estimate of an examinee's ability, they can also be used to help classify individuals.

Is the problem significant enough to warrant a formal research effort? Yes. With more and more computers in use, it is imperative that we understand the significance of different testing techniques in determining which is most effective given a particular environment and/or desired end result.

Is the relationship between the identified problem and previous research clearly described? Yes. The research question is described along with the algorithms behind CAT, and the citations include the different results obtained from CAT studies.

Step 2. Literature Review

Is the literature review logically organized? Yes. However, it is brief. Of the 19 references cited, only 6 were used in the literature review and, of those, several were mentioned more than once.

Does the review provide a critique of the relevant studies? Yes, although few studies are mentioned. In some cases, the studies cited were said to limit their conclusions to the specific situation studied and several of the studies were published in the 1980s.

Are gaps in knowledge about the research problem identified? It would appear that the literature review is not comprehensive and some references are old but possibly not of a seminal nature, as I have not seen them referenced in other articles.

Are important relevant references omitted? I don't know but I would assume that at least with respect to the algorithmic nature of CAT, there were many other studies that could have been utilized.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? I don't believe this article addresses a theoretical framework. Arguably, the discussion of the CAT algorithms might fit this category, but it was not then included within the parameters of the study as a whole.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? The conceptual framework explained concerned a few brief statements regarding the nature of the educational system in the Netherlands and the expression of a need to maintain confidentiality and increase the measurement accuracy for large groups which is quite difficult for paper-and-pencil test scenarios.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? Specific variables, whether independent or dependent, are not mentioned; however, it is assumed that the independent variable would be the type of test taken by the subject although this area is not discussed.

Are any confounding variables present? If so, are they identified? Although confounding variables may be present, this is dependent on, among other things, the selection process for the subjects but this is not described.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? The hypotheses are listed as H₀₁: θ ≤ θ₁₁ etc. I am sure that someone with more experience could determine the meaning; however, it gave me great difficulty. I needed the hypotheses to be stated in words as well and, although I also checked the tables and charts, the hypotheses were always mentioned in that fashion.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? I don't know because I couldn't interpret the hypotheses although there were several research questions mentioned: (1) Which testing algorithm is most suitable for the computerized adaptive placement test for mathematics, given a number of practical requirements?; (2) Which statistical computation procedures are suitable for classifying examinees into one of three different levels?; (3) Which item selection methods should be considered?; and (4) How do the testing algorithms operate in terms of measurement accuracy, the number of misclassifications, measurement efficiency, adherence to content specifications, and the distribution of exposure rates over the item bank?

Do the hypotheses logically flow from the theoretical or conceptual framework? As the intended purpose of the study is to determine the viability of statistical testing or statistical estimation in the testing algorithm, the aforementioned questions do seem appropriate and to fit within the conceptual framework of determining the best testing procedures for large groups of subjects.

Step 6. Sampling

Is the sample size adequate? I was confused here. The only mention of a sample size was the following: "In the calibration study, 268 items were administered to a sample of 1,198 students in an incomplete design in which each student was administered one of 16 different, though overlapping, booklets with about 43 items" (Eggen & Straetmans, p. 716). However, upon first reading, I had assumed that "calibration study" referred to a pilot study and then expected to see more information later on in the study, which was not forthcoming.
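The arithmetic behind the quoted incomplete design can be sketched as follows. This is a rough illustration only: it assumes that students were spread evenly across the 16 booklets and that every booklet held exactly 43 items, neither of which the article states, and the function name is hypothetical.

```python
def incomplete_design_summary(n_items, n_students, n_booklets, items_per_booklet):
    """Rough per-booklet and per-item coverage for an incomplete calibration design.

    Assumes (not stated in the article) equal booklet allocation and
    equal booklet length.
    """
    students_per_booklet = n_students / n_booklets
    # Total item positions across all booklets; overlap means this exceeds n_items.
    item_slots = n_booklets * items_per_booklet
    booklets_per_item = item_slots / n_items
    responses_per_item = students_per_booklet * booklets_per_item
    return {
        "students_per_booklet": students_per_booklet,
        "booklets_per_item": booklets_per_item,
        "responses_per_item": responses_per_item,
    }


summary = incomplete_design_summary(268, 1198, 16, 43)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, each booklet would be taken by roughly 75 students and each item would be answered by roughly 190 students. In item response theory, calibration usually refers to estimating item parameters from response data of this kind.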

Is the sample representative of the defined population? Unknown.

Is the method for selection of the sample appropriate? Unknown.

Is there any sampling bias in the chosen method? Unknown.

Are the criteria for selecting the sample clearly identified? No.

Step 7. Research Design

Is the research design adequately described? The design was referred to as "incomplete" with regard to the calibration study and as a "simulation study" when the performance of the computation procedures and item selection methods was investigated.

Is the design appropriate for the research problem? It appeared, to me, that the focus of this study was not so much on the overall design but on the evaluation of the algorithms being investigated.

Does the research design address issues related to the internal and external validity of the study? The internal and external validity issues were not specifically addressed. However, the mathematics item bank was said to be calibrated although the term "calibration" was not defined.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? I am unsure as to how the data were collected. There is mention made of the OPLM computer program that was used in the scaling of an item bank with 250 items that was established by imposing the constraints for the mean item difficulty and the discrimination indices.

Are the data collection instruments described adequately? No.

Do the measurement tools have reasonable validity and reliability? Unknown.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes, the results are clearly organized with cited tables located near the text rather than as a set of appendices at the end. The organization included the following sections: (1) measurement accuracy with statistical estimation; (2) the algorithms in the conditions of the placement test; (3) statistical estimation; (4) statistical testing; (5) comparison of statistical estimation and statistical testing; and (6) exposure data.

Is the type of analysis appropriate for the level of measurement for each variable? It appeared to me that the critical component of this study was the work designed to address the algorithmic features of CAT. Certainly, each subject's choice of questions was carefully analyzed via a confidence interval, standard error, examinee's ability, examinee's true ability, etc.

Are the tables and figures clear and understandable? Yes, there were several tables and charts presented that illustrated the formulas used. Someone of more experience would have had no difficulty understanding the nature of the charts and tables.

Is the statistical test the correct one for answering the research question? Although the algorithms for item testing and the analysis of the data are comprehensive, I see little reference to inferential statistical procedures such as regression, ANCOVA, etc., and I wondered whether there might be a difference between the statistical tests used in international studies and those used in American studies.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Yes. Each section is clearly labeled and the results for each segment are discussed; however, the interpretations are presented in the discussion section.

Are the interpretations based on the data obtained? Yes. With regard to the testing algorithms used for the classification of examinees into three categories, the conclusion is that statistical testing as a computation procedure is a promising alternative to the more traditional statistical estimation procedure.
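The article's own algorithm is not reproduced in this critique, so the following is only a minimal sketch of the general idea behind statistical testing as a stopping rule: a sequential probability ratio test (SPRT) around a single cut point. The Rasch model, the parameter values, and the function names are all assumptions chosen for illustration, not the authors' implementation, and the three-category case discussed in the article (two cut points) is not shown.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct answer under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def sprt_classify(responses, difficulties, cut, delta=0.3, alpha=0.05, beta=0.05):
    """Sequential probability ratio test around a single cut point.

    H0: theta = cut - delta (below the cut); H1: theta = cut + delta (above).
    Returns (decision, items_used); decision is "above", "below", or
    "undecided" when the responses run out before a boundary is crossed.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = 0.0  # running log-likelihood ratio of H1 against H0
    used = 0
    for correct, b in zip(responses, difficulties):
        p0 = rasch_probability(cut - delta, b)
        p1 = rasch_probability(cut + delta, b)
        if correct:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        used += 1
        if llr >= upper:
            return "above", used
        if llr <= lower:
            return "below", used
    return "undecided", used


# Hypothetical examinee: 25 items of difficulty 0, all answered correctly.
print(sprt_classify([1] * 25, [0.0] * 25, cut=0.0))  # ('above', 10)
```

In this hypothetical case, ten consistent answers settle the classification before the 25-item maximum is reached, which is the kind of saving in test length that the reviewed study reports.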

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Yes, for some aspects. Results reported in this study on the relation between the size of the acceptable decision error rates and the width of the indifference zone and the performance of the test are consistent with those of earlier studies for two-way classification.

Are unwarranted generalizations made beyond the study sample? No. In fact, as noted below, the authors indicate that there is a problem with the statistical basis for this type of study.

Are the limitations of the results identified? Yes. The authors point out that the comparison between statistical testing and estimating does not have a proper statistical basis and that, therefore, the generalizability of the results concerning estimation is not guaranteed. The authors point out that the problem is that, in estimation, there are no formal relationships between indifference zones and acceptable decision error rates. They indicated there was a newer study that employed a different procedure that leads to a matching of the accuracy of testing and estimating.

Are implications of the results discussed? Yes. The following implications are mentioned: (1) the quality of the item bank is satisfactory; (2) the maximum of 25 items for each test is realistic; (3) the reduction in the number of required items can be expected to amount to between 22% and 44%; (4) applying the double SPRT; (5) the imposition of constraints on item selection in the form of content or exposure control; and (6) the final implementation of a CAT in the placement test should be used only after determining whether or not the algorithms operate the same in simulations as in real testing situations.

Are recommendations for future research identified? Yes. The authors suggest further work on three areas: expanding the classification system to three categories rather than the previously studied two; the quality of the testing algorithms; and the consequences of truncating algorithms at a maximum test length for acceptable decision error rates.

Are the conclusions justified? Yes. However, the conclusions do not appear comprehensive and the discussion appears to address the difficulties with the study more than the appropriateness of the conclusions.

Article 4. Citation: Lloyd, D., & Martin, J.G., (March, 1996). The introduction of computer-based testing on an engineering technology course, Assessment & Evaluation in Higher Education, 21(1), 83-91.

Article Summary: The authors indicate that lecturers are exploring the possibility of using non-traditional methods in some aspects of their work to deal with the increasing number of students along with the concurrent reduction of resources. One possibility is to apply new technology in the assessment of students. This paper presents a controlled comparison between traditional paper-based tests and those using a computer. It concludes that the new technique is acceptable to students and produces results with no deterioration in their validity and has great potential for using staff time in other areas rather than in assessment.

Step 1. The Problem

Is the problem clearly and concisely stated? The problem for the engineering department is that the program uses phase testing to assess students, which means that a series of written examinations is set throughout the year to assess different syllabus sections. Although the system worked with 30-40 students, there are now upwards of 100 students, which is causing problems with increased staff workload and raising a question of uniformity of assessment when students are taught in several different groups.

Is the problem adequately narrowed down into a researchable problem? Yes.

Is the problem significant enough to warrant a formal research effort? Yes, although, clearly, the problem is set up as a case study for this particular university and is not intended to be generalized to other areas.

Is the relationship between the identified problem and previous research clearly described? There is no previous research cited. The only explanatory notes regard the previous assessment procedure at the University.

Step 2. Literature Review

Is the literature review logically organized? The cited literature regarding computerized testing, although minimal, is cited when the authors discuss their testing strategies.

Does the review provide a critique of the relevant studies? No.

Are gaps in knowledge about the research problem identified? No.

Are important relevant references omitted? The authors do not appear to have wanted a literature review component in this particular study so one would assume that relevant references have been omitted. In particular, one would have expected some more citations dealing with the nature of computerized adaptive testing. However, it is important to note that this article was published in 1996 which is a long time ago with reference to computerized testing research studies.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? No theoretical framework is specified.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? No conceptual framework is established with the exception of a perceived need to change the system for the current lecturers and to adopt a system of computerized testing.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? No. Neither independent nor dependent variables are specifically addressed.

Are any confounding variables present? If so, are they identified? Although the term is not used, certainly one confounding variable was that the computing skills of the students were said to be minimal. However, there were no allowances for that built into the design.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? The hypotheses are not clearly identified; however, it is assumed that the students' performance on both modes of testing as well as their self-assessment comments would form the basis of the study.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? There does not appear to be any discussion of the relationship between the students' estimated and/or known ability and their performance on the tests.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? Unknown.

Is the sample representative of the defined population? Unknown. However, one assumes the sample was representative of at least the engineering students who were enrolled in the program.

Is the method for selection of the sample appropriate? Unknown. However, there was a control group (N=?) who took the pencil-and-paper test in 1992, while the second phase took place in 1993 with three groups who were assigned "on a random basis" to Group A (traditional test); Group B (computer-based examination); and Group C (computer-based but over a four-day period of time). Interestingly, the group size varied, with Group C the smallest so as to minimize the extra load on the open access computer facilities, which were always in demand.

Is there any sampling bias in the chosen method? There would appear to be significant sampling bias. For example, the group sizes differed; the computer test, once begun, had to be completed in one session; and the Group C participants, who had a four-day period in which to complete their test, might have spoken with others involved in the study.

Are the criteria for selecting the sample clearly identified? No.

Step 7. Research Design

Is the research design adequately described? The research design is not adequately described; however, the procedure is fairly detailed.

Is the design appropriate for the research problem? It would appear that this was indeed a simplistic design and appeared to be organized more around a case study or even an action research problem.

Does the research design address issues related to the internal and external validity of the study? No issues of validity were addressed.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? There is no description of the data collection for the control group, and the only description of data collection for the Phase Two groups indicates that the computer program marked the results.

Are the data collection instruments described adequately? No.

Do the measurement tools have reasonable validity and reliability? No. Although I have read the study several times, there were no indications of validity and reliability measurements.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes, but there are hardly any results.

Is the type of analysis appropriate for the level of measurement for each variable? Well, not exactly. The only analysis I see is the average score.

Are the tables and figures clear and understandable? There are none.

Is the statistical test the correct one for answering the research question? I can't say that there was an adequate and/or appropriate test used for answering the research question. It appears they only wanted to determine the difference in scoring but did not determine the reliability or validity of their measurement tools which means, in effect, that any differences are clearly not correlated.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? I suppose so but the findings were minimal. Additionally, the researchers used a questionnaire at the end to determine which test the examinees had preferred but there are no notes indicating the problems with self-assessment as a reliable tool of measurement.

Are the interpretations based on the data obtained? Yes, but I believe the testing was faulty and, therefore, the interpretations cannot be correct.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? No other significant links here except a few made about early computer testing. There are, however, no descriptions of instructional design differences between paper-and-pencil and computerized tests.

Are unwarranted generalizations made beyond the study sample? Yes. The statement, "Comparison of results from the two modes of assessment show that computer testing techniques can be valid and also by their nature ensure reliability" (Lloyd & Martin, p. 89) cannot be made based on the study described in this paper.

Are the limitations of the results identified? No limitations are identified.

Are implications of the results discussed? The implied result is that computerized testing can be used to save staff time and that the results will be reliable.

Are recommendations for future research identified? While unsupervised computer examinations did not show any bias from the results, it was felt that further safeguards against student collusion will need to be developed before such open-access testing can be performed on a wider population.

Are the conclusions justified? If the foundation is faulty, the conclusions cannot be justified.

Article 5. Citation: Desai, M.S., (December, 2000). A field experiment: Instructor-based training vs. computer-based training, Journal of Instructional Psychology, 27(4), 239-244.

Article Summary: This article evaluates the impact of instructor-based training vs computer-based training. One of the major issues of end-user computing is training individuals to use it effectively and so the researchers have looked at key variables such as training support, delivery, techniques, and individual differences that can be manipulated to enhance the training program. The findings indicated that the major differences between IBT and CBT subjects were attributed to the performance, enrollment for the classes, motivation and general attitude toward training method, and satisfaction with the facility.

Step 1. The Problem

Is the problem clearly and concisely stated? The authors distinguish between education and training and indicate that the market forces are causing the corporate world to act in a defensive manner where they must constantly struggle to keep their staff trained on the newest software. This, they indicate, is getting to be more of a problem as administrators strive to determine the most efficacious training methods.

Is the problem adequately narrowed down into a researchable problem? Yes. It is clear that, although this is termed a longitudinal study, the aims are clear-cut in that businesses need to find an effective training methodology for computer applications.

Is the problem significant enough to warrant a formal research effort? Yes. There is an ongoing search to distinguish between methodologies appropriate to face-to-face (F2F) vs computer-based environments.

Is the relationship between the identified problem and previous research clearly described? Previous research is not addressed until the conclusions are explained.

Step 2. Literature Review

Is the literature review logically organized? There is no apparent literature review although the bibliography includes 19 references. At the end of the study, 6 citations are made that deal with effective training techniques.

Does the review provide a critique of the relevant studies? No.

Are gaps in knowledge about the research problem identified? No.

Are important relevant references omitted? As the literature review is parsimonious at best and the article was published in 2000, I am certain that important references are omitted.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? There was no theoretical framework proposed.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? There was no conceptual framework proposed. In fact, this study appeared to be instigated solely on the basis of need. The bottom line of corporate America means efficiency is critical to their success (contrary to most educational institutions!) and, therefore, money spent on such studies adds up to increased profits for them and an efficiently trained workforce. It would seem, therefore, that this study was motivated not by theoretical or conceptual frameworks but by a business model.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? "Key" variables were defined as the training methods and tasks; however, the terms independent and dependent variables were not mentioned.

Are any confounding variables present? If so, are they identified? With the exception of highest level of education achieved, which appeared to be a factor in whether the individuals chose CBT or IBT, there were no apparent confounding variables. Interestingly, although the level of education is addressed, one would have thought the researchers would have worked that into the design itself because the subjects chose the method themselves.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? No hypotheses were presented.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Not relevant.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? I don't believe so, although I don't know how large the defined population was, as the authors stated, "The target population for this study was the end-users of information systems technology. The subjects for this study were employees of a Fortune 100 corporation located in a major southwestern city in the United States" (Desai, p. 239).

Is the sample representative of the defined population? I don't believe so. Even if the defined population size were given, the samples for the two groups, IBT and CBT, were not comparable as they numbered 90 and 21 respectively.
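One way to see why the imbalance between 90 and 21 subjects matters is to compare the standard error of a difference between two group means under different splits of the same total of 111 subjects. The sketch below assumes a common, hypothetical standard deviation of 10 score points, because the article does not report one; the function name is likewise only illustrative.

```python
import math


def se_of_mean_difference(sd, n1, n2):
    """Standard error of the difference between two independent group means,
    assuming both groups share the same standard deviation."""
    return sd * math.sqrt(1.0 / n1 + 1.0 / n2)


sd = 10.0  # hypothetical; not reported in the article
unbalanced = se_of_mean_difference(sd, 90, 21)
balanced = se_of_mean_difference(sd, 56, 55)
print(f"90 vs 21: {unbalanced:.2f}")  # 90 vs 21: 2.42
print(f"56 vs 55: {balanced:.2f}")    # 56 vs 55: 1.90
```

With the same total number of subjects, the uneven split gives a noticeably larger standard error, because the uncertainty is dominated by the small CBT group.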

Is the method for selection of the sample appropriate? It doesn't appear that the end result was obtained from an optimal sample size, particularly from the CBT group. The authors indicate a "self-selecting and convenience sample was employed" (Desai, p. 239); however, there was a definite difference between the groups, e.g. in highest level of education achieved, which was not addressed.

Is there any sampling bias in the chosen method? There does not appear to be any justification for the self-selecting model.

Are the criteria for selecting the sample clearly identified? Some criteria are indicated. For example, the subjects chose the type of training most closely aligned to both their workload and the actual training schedule itself.

Step 7. Research Design

Is the research design adequately described? I don't believe so. There is clear information about the beginnings of the study but there is no indication as to how long the study was maintained, only that the subjects were tested at the beginning and end of training as well as one month after training completion.

Is the design appropriate for the research problem? The design does not appear to be well described.

Does the research design address issues related to the internal and external validity of the study? No issues of internal and/or external validity were addressed.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? The authors indicated that the subjects' scores were recorded along with demographic data on each subject. It was stated that the demographic data were used in assessing the prior knowledge of the subjects.

Are the data collection instruments described adequately? Not really. Scores were recorded but how the score was assigned is not made clear and neither is the difference between computer- and instructor-based assessment noted.

Do the measurement tools have reasonable validity and reliability? Although not clearly stated, it would appear that the computerized software used was of a commercial nature. In that case, there should have been an indication of validity and reliability measures but nothing was stated.

Step 9. Data Analysis

Is the results section clearly and logically organized? Although the results are presented, they are not clearly organized.

Is the type of analysis appropriate for the level of measurement for each variable? The employees' performance data were analyzed using a one-way ANOVA and the employees' satisfaction data were analyzed using Kruskal-Wallis and Mann-Whitney tests at a significance level of 5%.
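Rank-based tests such as Mann-Whitney suit ordinal satisfaction ratings because they use only the ordering of the scores. A minimal sketch of the U statistic follows; the ratings are invented for illustration and are not data from the article, the function name is hypothetical, and no p-value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U statistic for sample_a: the number of (a, b) pairs
    in which a exceeds b, counting each tie as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical 1-5 satisfaction ratings, not taken from the article
ibt = [4, 5, 3, 4, 5]
cbt = [2, 3, 3, 1]
u_ibt = mann_whitney_u(ibt, cbt)
u_cbt = mann_whitney_u(cbt, ibt)
print(u_ibt, u_cbt)  # 19.0 1.0
```

The two U values always sum to the product of the group sizes (here 5 x 4 = 20), so a value near either extreme means that one group's ratings almost always rank above the other's.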

Are the tables and figures clear and understandable? The only table presented was one that indicated the breakdown of males and females and the totals of those who chose IBT or CBT. No other tables were included.

Is the statistical test the correct one for answering the research question? With such a small sample size, particularly for the CBT group, I would have thought the level of significance would have been lowered to 1% to reduce the possibility of a Type I error. Additionally, there is no indication as to what the level of achievement was prior to the testing, particularly given that all employees were expected to complete the training regardless of position or prior knowledge.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? This is difficult to determine because the results are not presented in either tabular or expository format.

Are the interpretations based on the data obtained? The authors often make comparisons between the results achieved in MS Word and those obtained in MS Excel, which I find strange because the skills necessary for MS Word excellence would not, I assume, be similar to those necessary to prove excellence in MS Excel. Interestingly, although the authors state the results show CBT to be more effective than IBT, the sample size was quite small and, more importantly, there is apparently no statistically significant difference between the end-of-training results and the one-month-after-training results for any teaching mode.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? The authors employ citations of former research studies which are all intended to address the implications of the results.

Are unwarranted generalizations made beyond the study sample? Yes, the findings indicated that the major differences between IBT and CBT subjects were attributed to the performance, enrollment for the classes, motivation and general attitude toward the training method and satisfaction with the facility; however, these issues were not clearly addressed within the body of this report. The authors report that motivation and attitude were not measured but were interpreted based on the general comments made by the subjects which, surely, cannot be used to indicate an official finding.

Are the limitations of the results identified? No limitations are identified by the authors.

Are implications of the results discussed? Yes. Several implications were identified: (1) CBT as a methodology is difficult to "sell" to the employees; (2) the viability of any training over the long term; (3) identification of those who need training vs those who do not; (4) a consideration of learning styles; (5) the mixture of CBT and IBT to enhance training; and (6) the need for training managers to work in close alliance with software vendors.

Are recommendations for future research identified? Except as indicated above with regard to implications, no further future research recommendations are identified.

Are the conclusions justified? The conclusion, "It was determined by the research that CBT is an effective means of training; however, its acceptance as a formal training tool was not favorable" (Desai, p. 233) was indicative of the types of statements made throughout the study that did not appear to be fully explained.

Article 6. Citation: Brown, K.G., (Summer, 2001). Using computers to deliver training: Which employees learn and why? Personnel Psychology, 54(2), 271-297.

Article Summary: Note: This paper was adapted from the author's doctoral dissertation. The author states that computer-delivered training typically offers learners more control over their instruction. In learner-controlled environments, learner choices regarding practice level, time on task, and attention are expected to be critical determinants of training effectiveness. To examine the effect of learner choices in computer-based training, a study was conducted with 78 employees taking an Intranet-delivered training course. Learner choices were assessed and predicted with goal orientation (mastery and performance) and learning self-efficacy, as well as age, education, and computer experience. Results indicated considerable variability among trainees in practice level and time on task, which both predict knowledge gain. Performance orientation interacted with learning self-efficacy to determine practice level, and mastery orientation had an unexpected negative effect. Implications for the use of computers to deliver training and for future research were discussed.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. The author examines the different aspects of computer-based instruction but, more importantly, the author addresses the fact that a new model will need to be incorporated into learning theory that specifically addresses learner choices.

Is the problem adequately narrowed down into a researchable problem? Yes. This study provides two contributions to existing research. The first contribution is the introduction of individual differences and learning theory to research on computer-based training. The model presented provides a conceptual framework for research on computer-based training that is consistent with prior theory-driven research on individual differences and learning. This study asserts practice level and time on task as classic yet understudied constructs that are central to determining learning outcomes. The second contribution of this study is that it examines learner choices and goal orientation with adult employees enrolled in an organizationally sponsored course.

Is the problem significant enough to warrant a formal research effort? Yes. Enough studies have been compiled on computer based training for us to realize that now a different phase has been reached. Clearly, the technology is available for the creation of virtually any course but we do not yet understand what makes one learner more successful than another and while paper-and-pencil assessments, etc. will yield some data based on good practices, we must now learn how to use the computer to evaluate the data as well as the learner.

Is the relationship between the identified problem and previous research clearly described? Yes, contrary to other studies I have read, the author of this study breaks down previous research and describes both the main points as well as possible explanations for any abnormalities and/or unexplained findings.

Step 2. Literature Review

Is the literature review logically organized? Yes. The author presents the literature review in a topical fashion whereby each area to be addressed in the current study is examined and justified from an earlier research point of view.

Does the review provide a critique of the relevant studies? Yes. At various times, the author makes suggestions regarding the findings of earlier researchers. For example, "the failure of goal orientation to predict these choices may have resulted from the limited duration of the training and the limited number of practice opportunities available" (Brown, p. 275). In another example, "Again, differences among learners may have been restricted because all students were given the same amount of time for practice" (Brown, p. 275).

Are gaps in knowledge about the research problem identified? I can see no gaps related to this particular study.

Are important relevant references omitted? This is unknown to me; however, the bibliographic list is extensive and there are many references to earlier authors.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? Goal orientation theory forms the framework for this study because the author is concerned with choices made by learners.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? I believe that the emphasis of this study is on the goal orientation theory mentioned above.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? Age, education, and computer experience were all assessed with adequate measures. In this particular study, the emphasis was on the learner's processing during the testing and, therefore, the focus was constantly on the learner because, the author believed, effective learning strategies need to be identified and then taught to others.

Are any confounding variables present? If so, are they identified? I did not notice any confounding variables.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? Yes. Several hypotheses are clearly stated: (1) Individual differences are expected to predict behavior and cognition of learners during the learning experience, and these behaviors and cognition are expected to be the most proximal predictors of learning; (2) The more learners practice and spend time on task, the more they will learn; and (3) the more learners engage in off-task attention, the less they are hypothesized to learn.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Yes. In some cases, the author points out that a particular hypothesis will need to be supported in a number of laboratory studies but that it must also be studied within the context of organizationally sponsored training programs as well.

Do the hypotheses logically flow from the theoretical or conceptual framework? Yes. All are related to goal orientation theory.

Step 6. Sampling

Is the sample size adequate? Although it is not clear how large the original population was, the sample consisted of 78 technical employees.

Is the sample representative of the defined population? It appeared that employees were given the choice of taking or not taking the computer version and this might have added bias into the results because one is not aware of why an individual chose the computer version over the paper-and-pencil versions.

Is the method for selection of the sample appropriate? I don't know how the selection process was completed. It appears that it was of a strictly volunteer nature.

Is there any sampling bias in the chosen method? If, indeed, the subjects were self-chosen, then bias might be possible because their reasons for the decisions to do CBT would not have been known.

Are the criteria for selecting the sample clearly identified? No.

Step 7. Research Design

Is the research design adequately described? The design is described by procedure rather than by name, with indications as to what each learner was going to face during the two-day exercise. However, in the conclusion section, the research model was said to be a mediated model that depicted learning choices as the key process by which individual differences influence learning.

Is the design appropriate for the research problem? Yes. The procedural explanation explained by the author appears to incorporate all of the areas to be considered when addressing the problem of learner motivation.

Does the research design address issues related to the internal and external validity of the study? Again, although the specific terms were not mentioned, the descriptions of the methodologies chosen indicate that the internal validity would be sufficient to support an accurate result, while the external validity appears to be supported by having the study performed in a computer laboratory and a corporate training center.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? Yes. Learners began by completing a computerized survey that included individual difference measures as well as pretests. The self-report measure of off-task attention was collected after learners completed 3 of the 9 modules. The computer recorded the practice activities completed and time on task. Posttests were available on the computer when a learner had completed the course. Scoring focused on technical accuracy and evidence of application while ignoring presentation (e.g., spelling). Two advanced graduate students were trained to grade the responses. Following training, the raters coded all answers independently. Ratings were correlated to examine multi-rater, multi-question relationships. The average cross-rater, same-question correlation was .74 on the pretest and .82 on the posttest, indicating that raters provided consistent ratings.
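To make the rater-consistency figure above more concrete, the following is a minimal sketch of how an average cross-rater, same-question correlation could be computed. The scores, the question labels, and the function names are all invented for illustration; they are not the study's data, and the author's exact procedure may have differed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one rater gives constant scores")
    return cov / math.sqrt(var_x * var_y)


def mean_same_question_r(rater_a, rater_b):
    """Average the cross-rater correlation over questions.

    Each argument maps a question label to that rater's scores,
    one score per learner, in the same learner order.
    """
    correlations = [pearson_r(rater_a[q], rater_b[q]) for q in rater_a]
    return sum(correlations) / len(correlations)


# Hypothetical scores from two raters for five learners on two questions.
rater_a = {"q1": [1, 2, 3, 4, 5], "q2": [2, 2, 3, 5, 4]}
rater_b = {"q1": [1, 3, 3, 4, 5], "q2": [1, 2, 4, 5, 5]}

average_r = mean_same_question_r(rater_a, rater_b)
print(round(average_r, 2))
```

A value near 1 would mean the two raters ranked learners almost identically on each question, which is the sense in which .74 and .82 were read as consistent ratings.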

Are the data collection instruments described adequately? Yes. See above.

Do the measurement tools have reasonable validity and reliability? Yes.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes. The results were indicated by measure and the following were included: age, education, computer experience, goal orientations, self-efficacy, time on task, practice level, off-task attention, and knowledge.

Is the type of analysis appropriate for the level of measurement for each variable? Yes.

Are the tables and figures clear and understandable? Yes. There are several tables and figures presented which represent descriptive statistics and correlations, standardized coefficients for regression of learning choices and knowledge posttest on individual differences, and standardized coefficients for regression of knowledge posttest on learning choices.

Is the statistical test the correct one for answering the research question? Yes. The effects of individual differences on the three learner choices and the learning outcome were examined using multiple regression. Then, hierarchical regression was used to test the effects of learner choices on knowledge gain by regressing posttest score on choices, controlling for pretest score. In addition, the mediation model was examined using the procedures recommended by an earlier, published research author.
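The hierarchical regression described above can be sketched as two nested ordinary least squares fits: first posttest on pretest alone, then posttest on pretest plus the learner choices, with the gain in R-squared taken as the contribution of the choices. The sketch below rests on assumptions: the numbers are invented, the variable names are hypothetical, and only two of the three choices are included. It is not a reconstruction of the author's analysis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def r_squared(predictors, y):
    """R-squared of a least squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data for eight learners (not taken from the study).
pretest = [40, 55, 35, 60, 50, 45, 65, 30]
practice = [3, 8, 2, 9, 4, 7, 6, 5]            # practice activities completed
minutes = [50, 90, 45, 95, 70, 85, 80, 60]     # time on task
posttest = [52, 78, 44, 85, 63, 72, 80, 50]

r2_step1 = r_squared([pretest], posttest)                     # control variable only
r2_step2 = r_squared([pretest, practice, minutes], posttest)  # control plus choices
delta_r2 = r2_step2 - r2_step1                                # gain due to the choices
print(round(r2_step1, 3), round(r2_step2, 3), round(delta_r2, 3))
```

Entering the pretest first is what allows the second step to be read as the effect of learner choices on knowledge gain rather than on raw posttest scores.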

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Yes. Age and computer experience were not strongly associated with learner choices; however, computer experience was positively related to both pre- and posttest scores while practice level and time on task both were related to knowledge test scores in the expected direction.

Are the interpretations based on the data obtained? Yes. For example, it was hypothesized that mastery orientation would predict practice level and time on task, and performance orientation would predict off-task attention. Off-task attention was predicted by performance orientation and mastery orientation. The relationships among goal orientations, practice level, and time on task were small and not statistically significant, except for an unexpected negative relationship between mastery goal orientation and practice level. Therefore, the hypotheses regarding the prediction of off-task attention were supported but the hypotheses regarding the prediction of time on task and practice level were not.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Yes. This study addresses several earlier research questions including the need for more research on goal orientation with adults as well as the need to examine goal orientation and self-efficacy as predictors of training outcomes.

Are unwarranted generalizations made beyond the study sample? No.

Are the limitations of the results identified? Yes. For example, in the discussion about high mastery-oriented learners, the author addressed one particular limitation. Although high mastery-oriented learners may have failed to practice and learn key features of the problem-solving process, they may have become familiar with the process as a whole, gained more proficiency with the training program, and used the system more following training. This, the author proposes, may have been the case with the mastery-oriented learners; however, no data were collected to test these hypotheses. Future research might then explore these possibilities by studying a broader range of training outcomes, including use of the course web site for performance support. Additionally, this study did not use a control group and so conclusions cannot be drawn about the effectiveness of this particular course relative to other instructional designs or formats. A third limitation was that the sample consisted of volunteers who may have differed from the general population. The last major limitation concerned the small sample size and modest reliabilities of some measures.

Are implications of the results discussed? Yes. The results suggest that employees may not use control over their learning wisely, as indicated by the number who skipped practice activities and moved quickly through the training and who, as a result, obtained much lower scores.

Are recommendations for future research identified? Yes. Research should address how learner choices regarding practice level and time on task influence more distal training outcomes, such as skill maintenance and generalization.

Are the conclusions justified? Yes. The results of this study suggest considerable variability in learner choices and, as a consequence, learning. The author suggests that as responsibility for learning is shifted from trainers to learners, learner choices will become an increasingly important determinant of overall training effectiveness.

Article 7. Citation: Barab, S.A., Young, M.F., & Wang, J., (1999). The effects of navigational and generative activities in hypertext learning on problem solving and comprehension, International Journal of Instructional Media, 26(3), 283-310.

Article Summary: The study examined learning while using a linear text, navigational hypertext, or a generative hypertext system. In one experiment, students were assigned either the linear or navigational hypertexts and expected to learn the information to solve a posed problem, while students in the second experiment learned the information to pass a reading comprehension test. In the third experiment, the effect of carrying out generative activities on problem solving and reading comprehension was examined. The results indicated that the number of generative activities explained a significant amount of the variance in problem-solving and reading comprehension scores.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. Although hypertext environments are becoming omnipresent, the research related to the benefits of hypertext learning environments is divided. Some studies have found increased benefits of learner-controlled instruction while others have found that students with high degrees of learner control performed less effectively than those receiving program control. The research does indicate, however, that a more appropriate use of hypertext systems might be that involving problem-solving tasks rather than reading comprehension tasks.

Is the problem adequately narrowed down into a researchable problem? Yes.

Is the problem significant enough to warrant a formal research effort? Yes, most certainly. It is not enough just to be able to use computers as an enhanced learning environment; we must conduct more studies to determine which Internet facilities are useful for which type of learning and for which type of learner.

Is the relationship between the identified problem and previous research clearly described? Yes. In particular, the authors do a good job of identifying both the positive and negative research findings for this particular issue.

Step 2. Literature Review

Is the literature review logically organized? Yes. In particular, the authors do a good job of identifying both the positive and negative research findings for this particular issue.

Does the review provide a critique of the relevant studies? Not particularly. However, in this instance there are a number of studies that show contradictory results. A critique of some of the studies to determine whether or not the methodologies might have been one of the reasons for the discrepancies would have been helpful; however, this particular study is actually three-in-one and perhaps there was some sort of inherent word-count limitation, so that the authors decided the results of the studies were more important than a critique of the available literature.

Are gaps in knowledge about the research problem identified? Not really except to point out the divergent results of previous studies.

Are important relevant references omitted? Unknown.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? One of the theories put forth by the authors is that of generative learning which has, in the recent past, appeared to help significantly increase both retention and text comprehension compared to control groups who did not work with a generative model.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? I believe that the theoretical components of this study were more relevant than any conceptual framework.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? Yes, although not through those terms. The independent variables included: total time, self-determination, problem-solving, and reading comprehension.

Are any confounding variables present? If so, are they identified? I don't know whether or not there are other confounding variables. I couldn't find evidence of any pre-testing and/or post-testing which would have helped to clarify the issue.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? Several hypotheses were presented, including that learners who engaged in the generative activities would do better at recalling information due to their active processing while reading.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Yes. In the aforementioned hypothesis, for example, the authors sought to address the issue under two conditions: (1) when subjects had a specific problem solving goal in mind for reading, and (2) when their goal was simply to comprehend the text in the context of preparing for a traditional test of reading comprehension.

Do the hypotheses logically flow from the theoretical or conceptual framework? Yes because the basis of this study is the effect of hypertext on generative learning in the hopes that learning is actually enhanced and of a long-term nature.

Step 6. Sampling

Is the sample size adequate? Not known.

Is the sample representative of the defined population? Not known. The participants for this study were recruited from psychology and education classes at a northeastern US university. The process of recruitment was not explained.

Is the method for selection of the sample appropriate? Unknown except that, as an added incentive, the undergraduate students were given extra course credit and then randomly assigned to one of three conditions. There were 13 males and 35 females.

Is there any sampling bias in the chosen method? Although this was not mentioned, there are approximately 51% females and 49% males in the universities. I don't know what the breakdown is for education and psychology classes, but the sample's split of 13 males and 35 females (about 27% and 73%) does not appear to be representative. Additionally, the extra course credit might have attracted a certain type of individual to participate in the study.

Are the criteria for selecting the sample clearly identified? No.

Step 7. Research Design

Is the research design adequately described? The students were randomly assigned to one of three treatments and the design regarding the linear, navigational, and generative was well described. I do not believe there was a control group and there don't appear to have been any duplicate studies done.

Is the design appropriate for the research problem? I thought it was interesting that the authors didn't question the fact that the chosen subjects were all liberal arts students. I would have thought that an interesting extension of this design would have been to include some math and/or science students as well.

Does the research design address issues related to the internal and external validity of the study? I believe the internal validity for this study is consistent; however, it is not an extensive study as the sample size is quite small and does not seem to be representative of the whole population. Regarding the external validity, it is difficult to assess because, although the students were randomly assigned to a treatment, there is no indication as to whether or not the course was required or just "extra". In fact, even if they did receive extra course credit, their motivation should have been addressed within the design.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? The data collection was obtained from a self-determination questionnaire, a problem-solving condition, meaningful generations, and a 52 reading comprehension measure.

Are the data collection instruments described adequately? Yes.

Do the measurement tools have reasonable validity and reliability? Unknown because there is no in-depth description of how the scorers were moderated although the authors did mention there was a content expert, a high school history teacher, and two educational psychologists.

Step 9. Data Analysis

Is the results section clearly and logically organized? There was a short but descriptive results section that included a brief discussion of the statistical analysis tools employed and the reasons for each choice.

Is the type of analysis appropriate for the level of measurement for each variable? Yes.

Are the tables and figures clear and understandable? Yes. There were several tables, including a means and standard deviation chart, a chart illustrating the number of generative activities, and one on reading comprehension.

Is the statistical test the correct one for answering the research question? Scatter plots reflecting a positive linear relationship between the variables, MANOVA, standard deviations, Wilks' criterion, discriminant function analysis, Tukey's honestly significant difference procedure, t-tests, C-values, path analysis, and linear regression were all employed. However, I am not familiar with all of those tests and so am unable to answer whether they are correct for this particular situation.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Yes. In particular, the researchers were surprised that the generative condition did not yield as positive a result as was obtained from those with navigational control only.

Are the interpretations based on the data obtained? Yes.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Yes. For example, students in the navigational condition were found to have significantly higher degrees of self-determination than students in the other two conditions which is consistent with earlier studies confirming that students improved with increased learner control.

Are unwarranted generalizations made beyond the study sample? No.

Are the limitations of the results identified? In some cases. For example, when students had reading comprehension goals (as contrasted with problem-solving goals), there was no apparent difference between the navigational, linear, or generative conditions. The authors indicated that this might have been attributed to poor design, measurement error, other sample-related problems, or the lack of a treatment effect. Interestingly, there was no comparison of the samples used for each treatment although comparisons were made across treatments.

Are implications of the results discussed? Yes. At the time of the study, the researchers indicated that generative activities, although useful for some students, were less than optimal and that, in some cases, might have even hindered the progress of some because of time spent off-task. The authors go on to say that although the literature clearly indicates learners perform better when they have more control, educators must pay attention to the actual goals of the learners because those appear to have a bearing on the success and/or failure of the student in that particular task.

Are recommendations for future research identified? Yes. Future research should continue to compare navigational paths of different groups of individuals, including high and low achievers, knowledge seekers, and confused hypermedia users or users who have adopted different goals for using hypermedia.

Are the conclusions justified? It would appear that the conclusions are justified; however, the results really did little to dispel the controversy surrounding the use of hypertext in the educational arena and this issue was not addressed. Certainly, one suggested study should be to investigate why the results across the various studies are inconclusive.

Article 8. Citation: Brownlee, J., Purdie, N., & Boulton-Lewis, G., (April, 2001). Changing epistemological beliefs in pre-service teacher education students, Teaching in Higher Education, 6(2), 247-269.

Article Summary: A teaching program designed to foster the reflection on and development of epistemological beliefs was implemented with 29 pre-service graduate teacher education students in Australia. As part of the year-long teaching program, students were required to reflect on the content of an educational psychology unit in relation to the epistemological beliefs. The students were interviewed at the beginning and conclusion of the teaching program. The questionnaire used was designed to measure beliefs about knowing. The results of both the quantitative and qualitative data analysis indicated that the group of students engaged in the teaching program experienced more growth in epistemological beliefs. Certainly, the success of the teaching program has implications for how teacher educators develop learning environments.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. The authors indicate that the intention of the study was to improve learning for pre-service teacher education students but that in order to accomplish this, they would have to have a clearer idea as to the development of current students regarding epistemological beliefs.

Is the problem adequately narrowed down into a researchable problem? Yes. Current research indicates that teaching programs aimed at improving learning may need to focus on students' epistemological beliefs and that the focus of interventions should attempt to help students see that critical interpretation is sometimes necessary.

Is the problem significant enough to warrant a formal research effort? At the present time, there is sufficient research available to guarantee an interest in epistemological interventions, particularly in light of the fact that there is not yet an agreed upon benchmark upon which to work although Perry's work comes close to being seminal in this area.

Is the relationship between the identified problem and previous research clearly described? Yes. The different research theories regarding epistemological beliefs are clearly explained.

Step 2. Literature Review

Is the literature review logically organized? Yes.

Does the review provide a critique of the relevant studies? No. The intent of the authors does not appear to be that of critics but that of organizers and contributors to the growing body of epistemological intervention research.

Are gaps in knowledge about the research problem identified? Unknown.

Are important relevant references omitted? Unknown.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? Not relevant.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? The purpose of this study was to supplement already existing (but not conclusive) epistemological belief interventions. As such, the concept addressed would involve how interventions could change established belief systems, particularly for educators.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? The independent variables included: age, gender, areas of study, and teaching experiences. The dependent variable was to involve the amount of change observed over a period of one year.

Are any confounding variables present? If so, are they identified? None were identified.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? The hypothesis was that, with intervention, the subjects' epistemological belief systems would become more sophisticated.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Although the aforementioned hypothesis was assumed, the issue was not described.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? There were 29 students in the treatment group and 25 in the comparison group although the overall population size isn't mentioned.

Is the sample representative of the defined population? Unknown.

Is the method for selection of the sample appropriate? The purpose of the study was explained to the students who were then given the opportunity to opt out for another group but none withdrew.

Is there any sampling bias in the chosen method? The undergraduate qualifications of the students were varied and included: business, social science, psychology, visual and performing arts, science, literature, and nursing. However, there were only 3 males and 26 females. The authors do not indicate whether this gender imbalance is irregular; however, they do indicate the two groups have a similar make-up.

Are the criteria for selecting the sample clearly identified? No. The only criterion mentioned was that of being a graduate student in a pre-teaching training course.

Step 7. Research Design

Is the research design adequately described? As the purpose of the study is to determine the epistemological beliefs of individuals and to track any changes, the design incorporating both qualitative and quantitative components appears appropriate.

Is the design appropriate for the research problem? Yes.

Does the research design address issues related to the internal and external validity of the study? The internal validity is addressed because this study forms a component of a program already known to the students and the external validity is established because of the real-life situation of a tutorial group within a school setting.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? Regular journal reflections and the Schommer questionnaire were used, and interviews were conducted twice during the year.

Are the data collection instruments described adequately? Yes.

Do the measurement tools have reasonable validity and reliability? The test-retest reliability on the Schommer is .70. Inter-item reliabilities for individual items within each factor range from .63 to .85. The comparison group wrote two written statements about their beliefs two times during the year.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes. The authors indicated that the analysis used a predominantly inductive approach. The results were presented both from a qualitative and quantitative point of view.

Is the type of analysis appropriate for the level of measurement for each variable? Yes.

Are the tables and figures clear and understandable? Yes.

Is the statistical test the correct one for answering the research question? Yes. This was one of the few studies that used qualitative information which is the reason I chose it to see how the journals and interviews were interpreted.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Yes. In this particular study, the students appeared to experience a stronger growth in inconsistent beliefs which were not necessarily the expected results although the authors did indicate that even a change to inconsistent beliefs signified a change in their original belief system.

Are the interpretations based on the data obtained? Yes.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Yes. In particular, the authors believed that an understanding of metacognition would enable the teachers to become better trained and more capable of dealing with the ill-defined problems found in today's educational environment.

Are unwarranted generalizations made beyond the study sample? Possibly. Although the authors presented a fairly comprehensive literature summary of epistemological studies, there didn't appear to be any research that indicated the appropriate change from one level to another. In fact, one of the researchers had indicated that a possible problem with, for example, Perry's work was that it was of a linear nature and the boundaries between one stage and the other were not clearly delineated. Therefore, when the authors indicate that changes to one's belief system, even though of an inconsistent pattern, were indicative of change, this was not clearly explained anywhere else within the study and seemed to be more representative of a rationalization than a reasoned argument.

Are the limitations of the results identified? Yes. The fact that the students experienced stronger growth of inconsistent beliefs was discussed.

Are implications of the results discussed? The authors indicate that teaching programs should be developed that will encourage students to reflect on epistemological beliefs in order to help the students become more metacognitive.

Are recommendations for future research identified? Not really although a more thorough discussion of the actual intervention techniques would have been appropriate as well as a discussion as to whether or not different teachers were in charge of each group.

Are the conclusions justified? Potentially, this study could have been useful in helping to refine teaching education programs but it didn't seem to actually go anywhere although the design appeared appropriate. There was too much left unexplained in terms of the actual interventions, the number of teachers, and the types of reflections offered by the students. Additionally, although the students were participating in the study for one year, it apparently wasn't long enough to show viable and consistent changes. At best, the conclusions were that self-reflection may provide an atmosphere of potential personal change.

Article 9. Citation: Welch, M., (November/December, 2000). Descriptive analysis of team teaching in two elementary classrooms: A formative experimental approach, Remedial & Special Education 21(6), 366-377.

Article Summary: This article reports the results of a descriptive analysis of team teaching in two classrooms. The study employed a relatively new approach to field based research, referred to as formative experiments, to conduct formative and summative evaluation procedures. Results of quantitative and qualitative analyses to assess student outcomes, teaching procedures, and teacher impressions are presented. Descriptive information regarding planning time, type of instructional format of team teaching, student groupings, and follow-up evaluation time was obtained through weekly teacher logs. Focus groups and written teacher comments provided information regarding teacher satisfaction with the team-teaching experience. Performance of typical students and students with learning disabilities on curriculum-based assessment measures given pre- and post-team teaching suggests academic gains in reading and spelling for all students.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. Through a critical review of current literature, the author provides several areas that need additional work and that will be addressed in this current study.

Is the problem adequately narrowed down into a researchable problem? Yes. The author identifies several research questions: (1) How much time did teams of teachers spend planning, implementing, and assessing their team teaching activities?; (2) What formats of team teaching were employed by teaching teams?; (3) What student grouping formats were employed during team teaching?; (4) Will there be an overall improvement in student performance on criterion-referenced assessment scores?; (5) Will teachers achieve their instructional objectives using team teaching?; and (6) What will be teachers' impressions and levels of satisfaction regarding the use of team teaching?

Is the problem significant enough to warrant a formal research effort? Yes. Recent literature suggests that the number of adults in a teaching room may improve the students' performance and, as it is physically impossible to minimize the numbers of students per classroom, this option may prove viable in improving students' outcomes.

Is the relationship between the identified problem and previous research clearly described? Definitely. The author presents several areas that are not closely examined within the literature and incorporates those areas into this study.

Step 2. Literature Review

Is the literature review logically organized? The organization appears to follow the needs of the researcher rather than a chronological or thematic organization.

Does the review provide a critique of the relevant studies? The author doesn't critique the relevant studies so much as indicate the gaps present in current research.

Are gaps in knowledge about the research problem identified? Yes. As noted above, the author indicates the gaps present in current research.

Are important relevant references omitted? Unknown.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? Interestingly, the author appeared more interested in justifying the methodology, which included alternative approaches and recommendations, than in discussing team teaching itself. Additionally, the type of qualitative research done, in this case, would seem to add more to the body of research while, at the same time, clarifying issues not consistently addressed in earlier studies.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? Not relevant.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? Not really. Although the author refers to this study as a combination qualitative/quantitative experimental design, I find little evidence for the quantitative portion.

Are any confounding variables present? If so, are they identified? Not really.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? The author indicates that the purpose of this study is to be descriptive and it appears to be so. I suppose that an assumed hypothesis would be that team teaching, whatever the approach, will provide more successful and long-term student success.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Not relevant.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? Two schools were chosen as the team teaching sites and two teachers were chosen for each of the two schools.

Is the sample representative of the defined population? The teachers were apparently a subset of a larger group that had completed a video-based staff development training program. However, the total number of staff participating in the program is unknown. The total of schools in the area is unknown as well.

Is the method for selection of the sample appropriate? Although the teachers are a subset as mentioned above, it is not known how they were chosen.

Is there any sampling bias in the chosen method? There were only two schools chosen and both were said to be in the middle to upper class socioeconomic level. Additionally, all the teachers were female Caucasian and the composition of the classes was such that different nationalities were listed (with no indication as to whether or not there were ESL difficulties) as well as students requiring special education and those with learning disabilities. Despite all of this, the descriptive statements don't appear to describe any potential bias. Perhaps this was due to the fact that a descriptive study is just that, a descriptive statement.

Are the criteria for selecting the sample clearly identified? No.

Step 7. Research Design

Is the research design adequately described? The research is described as a mixed methodology of both qualitative and quantitative research. The authors stated that the descriptive information is necessary for social validation and to add to the body of literature about team teaching.

Is the design appropriate for the research problem? Journals, log recordings, power entries, and student performance data were necessary. Because of the dependence on the opinions and thoughts of the teachers, the descriptive nature was appropriate. However, I am not as sure about the relevance of the student performance because I can see no evidence that variables such as learning disabilities, special needs, English language proficiency, etc. were factored into the study.

Does the research design address issues related to the internal and external validity of the study? Certainly the use of journals to measure trends and opinions offers acceptable internal validity; however, I do have some questions as to the external validity of the study based on problems with the study design mentioned in other sections.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? The formative and summative evaluation variables included time analysis, teaching formats and student grouping formats, student performance, and instructional outcomes and teacher impressions and satisfaction. Data regarding planning and teaching formats were collected on the same weekly log. Instructional planning consisted of recording the academic activity as well as the materials used and determining how student performance on the activity would be measured. Student performance was measured by a comparison of pre- and post-team teaching mean scores.

Are the data collection instruments described adequately? Yes, there were descriptions of the time analysis and teaching formats, student performance and instructional objectives, and teacher impressions and satisfaction. However, there was one monthly meeting with the teachers that would last from 60-90 minutes and it doesn't appear that these meetings were recorded.

Do the measurement tools have reasonable validity and reliability? Unknown.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes. The results section is divided into team-teaching and student-grouping formats, student performance and instructional objectives, and teacher impressions and satisfaction.

Is the type of analysis appropriate for the level of measurement for each variable? The level of analysis appears to have been at a basic level. For example, the results on student performance included paired t-tests, significant differences, and means at both pretest and posttest. However, there was no description of the type of testing provided. Additionally, although the purpose of the study was to provide descriptive information on the teachers, there does not appear to have been much analysis beyond the groupings of certain comments. For example, themes do not seem to have developed and there appeared to be direct comparisons between the two schools and two groups of teachers although there is no evidence to support the compatibility of the two environments.
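For readers unfamiliar with the paired t-test mentioned above, the following is a minimal sketch of how such a statistic is computed from pretest and posttest scores for the same students. The scores are hypothetical and invented purely for illustration; they are not taken from the Welch article, and the function name `paired_t_statistic` is my own. The article may well have used a statistics package rather than a hand calculation.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The test works on the difference within each pair (post minus pre).
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for eight students (invented for illustration only).
pre_scores = [42, 55, 38, 61, 47, 50, 44, 58]
post_scores = [51, 60, 45, 66, 49, 59, 52, 63]

t_value, df = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_value:.2f}, df = {df}")
```

The resulting t value would then be compared against a critical value for the given degrees of freedom. Note that a significant result only shows that scores changed between the two testing points; without a comparison group it cannot show that team teaching caused the change.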

Are the tables and figures clear and understandable? There are several tables included depicting the results indicated above.

Is the statistical test the correct one for answering the research question? Not as detailed as one would have expected.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Some of the data interpretations, particularly with regards to special needs students, appear generous at best. Eight students were classified as learning disabled and yet the researchers indicate impressive growth, such as a 72% gain in reading fluency…. However, the environment was not controlled for different variables and it doesn't appear possible that team teaching alone could have accounted for that degree of improvement. Indeed, if true, then team teaching would truly become the panacea for all special education students' needs! Additionally, I don't believe that generalizations such as, "Consequently, it appears that team teaching can supplement rather than exclusively supplant segregated service delivery in specialized settings" (p. 370), are warranted based on this particular study.

Are the interpretations based on the data obtained? Sometimes. See above statement.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Sometimes. However, the relationships are somewhat obscure. For example,

This implies that team teaching could have a positive impact on all students' performance in inclusive settings. These results appear to support the study by Self, Benning, Marston, and Magnusson (1991) that also incorporated curriculum-based assessment to measure students' reading skills before and after team teaching. However, it is not possible to discern whether student achievement in this study would have occurred anyway, without team teaching, as no comparison group was utilized. (p. 371)

Are unwarranted generalizations made beyond the study sample? At times.

Are the limitations of the results identified? Yes. A number of variables could not be controlled. Students could not be randomly assigned to groups. Similarly, comparison or control groups were not employed due to the voluntary nature of the implementation. Likewise, there was no observation of the team teaching to validate the integrity of the team-teaching procedures or the information that was self-reported on the planning logs. Finally, the results of this study cannot be generalized to other settings, grades, or student populations.

Are implications of the results discussed? The authors suggest that perhaps the most important contribution of this study is the student outcome data, which suggest that students have demonstrated improvement in each of the academic areas. Additionally, they state that team teaching does not appear to have an adverse effect on the academic performance of special needs students. The data logs appeared to indicate that station teaching is perhaps the most effective team teaching tool for both large and small group environments. Finally, the authors offer the suggestion that teacher education programs must continue to emphasize and examine factors such as attitudes, beliefs, values, and role expectations before establishing team teaching.

Are recommendations for future research identified? Yes. This section was particularly helpful and well written. Many suggestions for future research are identified including: (1) focus groups and interviews may also shed light on why a particular format of team teaching is used or preferred over others; (2) it is important to assess outcomes of all students and not just those eligible for special education services; (3) continued efforts to understand complex social variables associated with team teaching; and (4) continued use of log journals coupled with qualitative methods such as interviews may shed light on why some teaching teams require less time to plan than others, as well as in what ways teachers find the time to plan.

Are the conclusions justified? It would appear that the suggestions for future research are indeed valid as are the limitations of this particular research. However, I would hesitate to apply any of the direct conclusions, particularly those related to the special education students, due to the very small sample size as well as lack of a control group.

Article 10. Citation: Reeves, J., (August, 2000). Tracking the links between pupil attainment and development planning, School Leadership & Management, 20(3), 315-333.

Article Summary: The study was part of a more extensive research project commissioned by the Scottish Office Education Department in 1994 called the Improving School Effectiveness Project. The project gathered extensive quantitative and other data on 80 schools and qualitative data on a subset of 24 schools. The research brief for the study was to examine the theme of development planning within the context of the overall project. The article tracks through the development of a set of associations between the value-added attainment results of 12 primary and 12 secondary schools and some characteristics of their approach to development and features of their culture and organization.

Step 1. The Problem

Is the problem clearly and concisely stated? Not particularly.

Is the problem adequately narrowed down into a researchable problem? Apparently the problem addresses development planning.

Is the problem significant enough to warrant a formal research effort? An interesting question because, certainly, development plans are important to all organizations. I chose this article because it was, in fact, part of a much larger research project that had been ongoing for a number of years.

Is the relationship between the identified problem and previous research clearly described? No, not clearly described. However, throughout the report, references to previous research are woven into the results of this current research.

Step 2. Literature Review

Is the literature review logically organized? Not really, as citations are made throughout the report but are not organized in and of themselves.

Does the review provide a critique of the relevant studies? No. Studies are cited that coincide with either the hypotheses or the rationalization for this current study.

Are gaps in knowledge about the research problem identified? No.

Are important relevant references omitted? Unknown.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? Not relevant.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? This was an interesting study because it appeared to evolve throughout the years and still maintain a basic framework. The researchers hoped to obtain a conceptual framework for determining why certain schools were more successful than others in their approaches to change.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? The study was designed along two strands: one based on capacity building and the other on the quality of the school's development plan. As such, several variables were defined including: (1) people's level of understanding; (2) attitudes; (3) skills; (4) resource availability; (5) internal structures; and (6) internal procedures.

Are any confounding variables present? If so, are they identified? Apparently, according to the authors, there were many problems that evolved out of the study that were, in fact, partly due to the longevity of the program in addition to the gender imbalance, among other areas.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? There were two primary hypotheses mentioned: (1) Schools which adopt strategies which have a high level of impact on the capabilities of staff and the school's capacity to accommodate change are more likely to add value to their pupils' attainments than those which do not; and (2) Schools which produce development plans which conform with models of good practice are more likely to add value to their pupils' attainments than those which produce poor development plans.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? No.

Do the hypotheses logically flow from the theoretical or conceptual framework? Yes.

Step 6. Sampling

Is the sample size adequate? It would appear that the size was adequate.

Is the sample representative of the defined population? Eighty schools across Scotland were involved in providing attainment and attitudinal data from a cohort of pupils and attitudinal data from teachers and parents. From these 80 schools, a representative sample of 24 schools were chosen to be the case study schools which would provide the evidence for the qualitative strand of the Improving School Effectiveness Project.

Is the method for selection of the sample appropriate? There is no method discussed.

Is there any sampling bias in the chosen method? Not known.

Are the criteria for selecting the sample clearly identified? No criteria are described.

Step 7. Research Design

Is the research design adequately described? Yes. However, as was noted in the study, this work has extended over a period of time and, therefore, much of the basic descriptions have been either amended or edited over time.

Is the design appropriate for the research problem? The use of both qualitative and quantitative research is appropriate for this study as it involves an analysis of change and is an attempt to develop a change model for both primary and secondary schools.

Does the research design address issues related to the internal and external validity of the study? This is difficult to determine; however, one would assume that the internal validity of this study is sound and that its grounding in real-life situations supports the external validity. However, it appears that there were many changes occurring within the study throughout the years.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? Yes.

Are the data collection instruments described adequately? Yes. For each school, qualitative data about development planning were collected using the development analysis interview (DAI) and the school's development plan. As the basis of the DAI, each school was asked to choose a major development which had taken place in the school in the last few years. The purpose of this interview was to find out how the school leaders perceived the management of change. Two other sources of information were included in the data collection: (1) teacher interviews and (2) teacher questionnaires.

Do the measurement tools have reasonable validity and reliability? There are no validity and reliability scores provided.

Step 9. Data Analysis

Is the results section clearly and logically organized? This section was logically ordered, with information about both the leaders of the school and the teachers. Not all of the analysis was detailed because it had been published in a previous article, but that article was cited.

Is the type of analysis appropriate for the level of measurement for each variable? There was an interesting analysis of the interview transcripts. The first analysis led to the calculation of a strategy impact score for the development and the second analysis identified key features associated with the description of the initiative. The authors said that by using such a simplistic approach they were unable to account for variance, particularly with the secondary school data, and there was evidence of sharp conflict between staff over the individual innovations. By reading through the transcripts several times, different factors could be applied. However, I saw no evidence of the more traditional analysis using parametric and nonparametric statistics. Perhaps this is due to a particular methodology applicable to the school system in Scotland or the fact that this was not the original descriptive study article.

Are the tables and figures clear and understandable? Yes. The tables are incorporated into the report.

Is the statistical test the correct one for answering the research question? I saw no evidence of a particular statistical test.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? In the analysis of change, several factors were determined to be quite important and included: (1) the importance of new people; (2) a high level of resistance; (3) involvement of parents; (4) involvement of students; (5) involvement of learning support; and (6) problems with sustaining resources. Of note is that these factors may have had a positive or negative influence on the implementation of change. There were significant descriptions of the differences between the primary and secondary data.

Are the interpretations based on the data obtained? Because this study was quite extensive and long term, I assume the interpretations are based on the obtained data. However, it is important to note that not all of the details were present in this report.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? In some cases, the findings were related to previous research e.g. the complexity of change and the importance of staff involvement in the development process, etc.

Are unwarranted generalizations made beyond the study sample? I could see no evidence of unwarranted generalizations.

Are the limitations of the results identified? Limitations of the study were mentioned throughout the report and included such items as an imbalanced gender grouping which may have affected results.

Are implications of the results discussed? Although the study did succeed in terms of the original brief in showing a correlation between the processes of planning for development in schools and school effectiveness, the author believed this to be a generalization that left more questions than answers. One point mentioned was that this particular study dealt only with internal factors affecting change and made no attempt to analyze external factors which, the author stated, made the issue of change even more complex.

Are recommendations for future research identified? The author makes some interesting suggestions regarding development plan future research studies. Some of those suggestions include: (1) looking at whether or not primary and secondary development should involve different improvement strategies; (2) whether the focus of development should also include the content development of change as well as the structure of the implementation itself.

Are the conclusions justified? It would appear that, at least based on this study, the described findings are minimal in contrast to the energy and longevity of the study. However, this may be due to the fact that this was not the original description of the study results but an additional article written to supplement the findings.

Article 11. Citation: Retalis, S., Psaromiligkos, Y., & Avgeriou, P., (2000). Web engineering: New discipline, new educational challenges, Information Services & Use, 20(2/3), 95-109.

Article Summary: Sophisticated applications are being deployed in increasing numbers on the WWW without having been developed according to appropriate methodologies and quality standards. The main reason for this ad hoc development philosophy is the lack of specialized training/education on the web engineering subject domain. This discipline is new and has recently started getting the attention of researchers, developers, and of the major players in the web-based application development market and training market. There is now justifiable and increasing concern about the manner in which students and lifelong learners are educated and trained in this new discipline. It was only one year ago that a few universities started providing courses on this discipline and offering seminars to lifelong learners. The Department of Electrical and Computer Engineering at National Technical University of Athens began offering a one semester course called "Internet Publishing". This paper presents an overview of the course and its web-enriched delivery method, as well as the quantitative and qualitative results extracted after the completion of the evaluation study in 1999-2000.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. Web engineering is concerned with the establishment and use of sound scientific, engineering and management principles and disciplined and systematic approaches to the successful development, deployment and maintenance of high quality Web based systems and applications. However, in higher education today, there is no discipline in the current curriculum that teaches all or even an extensive part of these skills. Even at universities where there are a lot of practical classes, the student has to make specific choices in courses during his/her studies such as: programming language courses, database management systems, software engineering and object-oriented design, information systems, networking and Internet protocols, in order to have adequate background as a web engineer.

Is the problem adequately narrowed down into a researchable problem? Yes.

Is the problem significant enough to warrant a formal research effort? Yes. The WWW is clearly a new electronic frontier, much like the Wild West, in which the pioneers have been testing out the parameters of the system. Now, however, the sheer number of users and sites demands a more methodical approach to course design on the Internet.

Is the relationship between the identified problem and previous research clearly described? Yes.

Step 2. Literature Review

Is the literature review logically organized? The literature review is not organized in the traditional sense but consists of a review of web engineering courses.

Does the review provide a critique of the relevant studies? Apparently, an earlier review of the literature revealed several gaps in the research which are addressed below. However, there does not appear to be any special critique of the studies.

Are gaps in knowledge about the research problem identified? Yes. The authors point out three gaps in current research on the impact of technology in education. These include: (1) a lack of theoretical or conceptual framework; (2) the different learning styles of students related to the use of different technologies are not taken into consideration; and (3) the feelings and attitudes of the students are not adequately investigated.

Are important relevant references omitted? Unknown.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? As indicated in the introduction, the authors do not believe there is a theoretical or conceptual framework in the area of web engineering.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? Not relevant.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? Variables are listed including: (1) usage of the learning environment; (2) effect of the instructional delivery mode to students' learning styles; (3) contribution of the learning resources to the acquisition of knowledge and skills; (4) effect of the instructional delivery mode to the acquisition of knowledge and skills; (5) quality of the learning resources; and (6) a comparison of the enriched classroom delivery mode with the traditional ex-cathedra one.

Are any confounding variables present? If so, are they identified? No confounding variables are identified.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? No hypotheses were proposed, although the assumed hypothesis is that the WWW learning environment is a potentially enriching environment that needs categorization and that web engineering courses will facilitate the development of web courses that follow a particular theoretical or conceptual framework.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? Not shown.

Do the hypotheses logically flow from the theoretical or conceptual framework? Not relevant.

Step 6. Sampling

Is the sample size adequate? The sample size seems small: 16 individuals (2 women and 14 men).

Is the sample representative of the defined population? The defined population would appear to be the number of students who successfully completed the course, 40.

Is the method for selection of the sample appropriate? There is no indication about the selection method described.

Is there any sampling bias in the chosen method? Clearly, the size is a problem. In addition, the gender imbalance would appear to warrant a study in and of itself. There was also a 21.6% attrition rate for the course itself, and the individuals who left were not included in the study.
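To put the sample figures noted in this step side by side, the following is a rough calculation offered only as a sketch. The 16 respondents, the 40 students who completed the course, and the 21.6% attrition rate are the figures reported above; the assumption that attrition was calculated against initial enrollment is mine, so the implied enrollment is an estimate and not a number given in the article.

```python
# Figures reported in the critique above.
respondents = 16       # students in the evaluation sample
completers = 40        # students who successfully completed the course
attrition_rate = 0.216  # reported attrition for the course

# Share of completers who are represented in the sample.
response_share = respondents / completers

# ASSUMPTION: attrition = dropouts / initial enrollment.
# Under that assumption, completers = enrollment * (1 - attrition).
implied_enrolled = completers / (1 - attrition_rate)

# Share of everyone who started the course that the sample represents.
share_of_enrolled = respondents / implied_enrolled

print(f"Sample as share of completers: {response_share:.1%}")
print(f"Implied initial enrollment:    {implied_enrolled:.0f}")
print(f"Sample as share of enrollment: {share_of_enrolled:.1%}")
```

Under that assumption, the sample covers well under half of the completers and a smaller share still of those who began the course, which is why the excluded dropouts matter for any claim about the learning effectiveness of the course.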

Are the criteria for selecting the sample clearly identified? No. It appears to have been by default.

Step 7. Research Design

Is the research design adequately described? The evaluation study followed a specific methodology, called CADMOS-E which is a pre-test and post-test method incorporating some aspects of the illuminative evaluation approach. It is a stepwise method supported by specially developed pre- and post-test questionnaires, which provide data for both quantitative and qualitative analysis.

Is the design appropriate for the research problem? According to the authors, this is appropriate because the focus of this evaluation is on the learning effectiveness of the course and its delivery mode as well as the identification of extensions and reviews that were required to take place.

Does the research design address issues related to the internal and external validity of the study? The specific issues of internal and external validity are not described, although internal validity would be assumed through the use of questionnaires and external validity would appear to be present because of the real-life environment of the course.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? The pre- and post-test questionnaires consisted of 29 closed-ended questions but also included several open-ended questions on feelings and emotions. There are no reliability or validity results for this questionnaire.

Are the data collection instruments described adequately? Not described.

Do the measurement tools have reasonable validity and reliability? Not mentioned.

Step 9. Data Analysis

Is the results section clearly and logically organized? Sort of. However, the Likert-type scale was presented on each and every occasion, and this seemed gratuitous given that it had already been explained.

Is the type of analysis appropriate for the level of measurement for each variable? The authors decided that the size of the sample was not statistically appropriate for quantitative analysis and so a comparative statistical analysis of the data was performed. The basic statistical analysis depicted the trends of the learners' opinions.

Are the tables and figures clear and understandable? The tables are absolutely necessary because the data are not presented within the report itself but the reader is referred to the tables.

Is the statistical test the correct one for answering the research question? No statistical test was used and there do not appear to have been any correlations described between the questions and trends and so one is left with just the continuum of answers for each question.

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? The authors do not appear to offer interpretations of the data as the findings are presented.

Are the interpretations based on the data obtained? I could not see any indication of interpretations.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? No. This was disappointing because, in the introduction, one of the complaints about earlier research was that there were no theoretical and/or conceptual frameworks presented and yet nothing was presented here either. This report, I believe, was more of a descriptive statement about one individual course rather than a research study taking its place among many others.

Are unwarranted generalizations made beyond the study sample? No.

Are the limitations of the results identified? Except for the sample size, no limitations were discussed.

Are implications of the results discussed? None discussed.

Are recommendations for future research identified? No.

Are the conclusions justified? Not really. For example, the authors indicated that the evaluation study showed that the course was of high quality and that the open learning mode was most appropriate for the postgraduate students, and yet there is no discussion of either the high attrition rate, almost 22%, or the lack of female participants.

Article 12. Citation: Dominguez, P.S., & Ridley, D.R., (March, 2001). Assessing distance education courses and discipline differences in their effectiveness, Journal of Instructional Psychology, 28(1), 15-20.

Article Summary: This study illustrated a new, "parsimonious" (p. 15) model that investigators interested in distance education can use to ask meaningful questions about the relative quality of distance education courses. The approach removed the emphasis from student-level data and placed it upon course-based data. Sample data comparing online and traditional higher education courses covering nine disciplines were reported. These data revealed that preparation for advanced courses was statistically equivalent whether the course prerequisites were online courses or their traditional classroom counterparts. The article further explored the usefulness of this framework for identifying a significant discipline-related difference in the relative effectiveness of online and traditional prerequisites as preparation for advanced courses.

Step 1. The Problem

Is the problem clearly and concisely stated? Yes. Focusing on student-level data tells only a limited tale. For example, generating a profile of the successful distance education student does not provide institutions with practical information for program improvement or refinement. The authors remove the emphasis from distance education students and place it on the course itself. Additionally, they expand the scope of investigations to include distance education students' subsequent performance in other classes.

Is the problem adequately narrowed down into a researchable problem? Yes.

Is the problem significant enough to warrant a formal research effort? Yes. The point is well made: What does the research say to universities about improving the quality of their on-line courses? I did, however, think it was interesting to note that the authors mention nothing about the recent growth in the literature on on-line instructional design models.

Is the relationship between the identified problem and previous research clearly described? Not really. There are only two sources cited in the bibliography. Clearly, the intent of the authors is not to do a literature review.

Step 2. Literature Review

Is the literature review logically organized? There is no literature review.

Does the review provide a critique of the relevant studies? Not relevant.

Are gaps in knowledge about the research problem identified? No.

Are important relevant references omitted? Yes. There are no references whatsoever to current research dealing with instructional design methodologies for the Internet.

Step 3. Theoretical or Conceptual Framework

Is the theoretical framework easily linked with the problem, or does it seem forced? Not relevant.

If a conceptual framework is used, are the concepts adequately defined, and are the relationships among these concepts clearly identified? The conceptual framework used in this study is clearly identified. The authors suggest a shift from a focus on the students to determine the validity of a course to the course itself and to the students' subsequent performances in future classes.

Step 4. Research Variables

Are the independent and dependent variables operationally defined? Traditional or online courses were two variables discussed within this study.

Are any confounding variables present? If so, are they identified? I would humbly suggest that the confounding variables are ignored, if present at all. For example, there is no indication as to the success of particular students before taking certain courses. There is also no discussion as to whether certain students or certain learning styles favored traditional over online courses or vice versa.

Step 5. Hypotheses

Are the hypotheses clear, testable, and specific? One hypothesis was discussed: One reason for a finding of no overall difference between course delivery formats might be that the effectiveness of online instruction varies with the department or discipline being studied.

Does each hypothesis describe a predicted relationship between two or more variables included in each hypothesis? The one variable that appeared relevant to this study concerned the discipline of study.

Do the hypotheses logically flow from the theoretical or conceptual framework? Although the hypothesis does not necessarily flow from the conceptual framework, it does flow from the examination of the records the researchers were conducting.

Step 6. Sampling

Is the sample size adequate? The sample size was determined by the study itself and could not be varied within the given time period.

Is the sample representative of the defined population? The sample is not so much a representative subset as it is the total population fitting the particular criteria.

Is the method for selection of the sample appropriate? The authors chose courses that were offered at the advanced level in a more traditional format where the students had completed the prerequisite course online. Hence, the guiding question: Do online courses prepare students for advanced study as well as traditionally accepted forms of prerequisites?

Is there any sampling bias in the chosen method? No.

Are the criteria for selecting the sample clearly identified? Yes.

Step 7. Research Design

Is the research design adequately described? I am assuming that the research design presented is the "parsimonious" model mentioned earlier in the report.

Is the design appropriate for the research problem? Interestingly, I enjoyed the analysis component of this report because the question was clear-cut and each statistical test was explained. Therefore, I assume it was an appropriate design for this particular research problem.

Does the research design address issues related to the internal and external validity of the study? No.

Step 8. Data Collection Methods

Are the data collection methods appropriate for the study? The data collection was of past records and so involved searching rather than collecting new data.

Are the data collection instruments described adequately? Not relevant.

Do the measurement tools have reasonable validity and reliability? I am not aware of how the original data were collected.

Step 9. Data Analysis

Is the results section clearly and logically organized? Yes. The results section centered around the statistical analysis itself.

Is the type of analysis appropriate for the level of measurement for each variable? Yes.

Are the tables and figures clear and understandable? Yes. There are two tables presented: (1) final grades and (2) tabular results by department.

Is the statistical test the correct one for answering the research question? First, the frequencies were presented and, using Fisher's Exact test of significance, a probability of .09 would be associated with the distribution. Although there was no statistically significant difference, the question of a statistical interaction needed to be tested. The statistical interaction between the method and the discipline (relative advantage) appeared to show the non-Management courses were positive and Management courses negative. Two chi-squared analyses of the distributions of online passes and fails were performed to test for an interaction between target discipline and prerequisite format and, finally, a second method was used that set the expected frequencies proportionately with the rates of passes and fails.
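
To make the two kinds of test named above more concrete, the following is a minimal sketch of a two-sided Fisher's exact test and a Pearson chi-squared statistic for a 2x2 table of prerequisite format by pass/fail. The counts are hypothetical and are not the data reported by Dominguez and Ridley, and the function names are my own. The chi-squared function uses the standard expected counts derived from the row and column totals, which may or may not correspond to the second method the authors describe.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every possible table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def chi_squared_statistic(table):
    """Pearson chi-squared statistic, expected counts taken from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are prerequisite format, columns are pass / fail
hypothetical_table = [
    [8, 2],   # online prerequisite
    [1, 5],   # traditional prerequisite
]

p_value = fisher_exact_two_sided(8, 2, 1, 5)
chi2 = chi_squared_statistic(hypothetical_table)
print(f"Fisher's exact p-value (two-sided): {p_value:.4f}")
print(f"Chi-squared statistic (1 df): {chi2:.3f}")
```

Fisher's exact test is usually preferred when cell counts are small, because the chi-squared approximation becomes unreliable when expected frequencies are low.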

Step 10. Interpretation and Discussion of the Findings

Does the investigator clearly distinguish between actual findings and interpretations? Yes. No interpretations of the results are offered in the results section.

Are the interpretations based on the data obtained? Yes. In particular, the authors noted possible explanations for the results shown in the Management section.

Are the findings discussed in relation to previous research and to the conceptual/theoretical framework? Not really, except to confirm that the alternative model of evaluating courses will provide effective information to administrators.

Are unwarranted generalizations made beyond the study sample? No. All interpretations are clearly shown to be possibilities.

Are the limitations of the results identified? No.

Are implications of the results discussed? Yes. The authors indicate that there appears to be no significant difference in results in higher-level courses between students who completed the prerequisites in the traditional manner and those who completed them online. Therefore, institutions may continue to expand their online course offerings. However, the authors do point out that some course material may not be conducive to online management and that administrators should consider that option.

Are recommendations for future research identified? Yes. The authors suggest that perhaps if a particular format is not suitable for a specific subject, the expectations of the instructors might be influenced. For example, if an online course provided less effective preparation for a target course in the past, teachers might come to expect that result. The researchers suggest that this phenomenon should be studied.

Are the conclusions justified? Yes.

Summary

While evaluating these 12 articles, the author was surprised at the lack of clarity within several of the studies. For example, several studies (Desai, 2000; Welch, 2000) did not appear to link their discussion of the results with the original research objectives. Other studies (De Beer, 1998; Eggen, 2000; Lloyd, 1996; Ponsoda, 1999) did not appear to handle the data in a manner conducive to successful statistical analysis. It was, in fact, disappointing to see that several authors (Brownlee, 2001; De Beer, 1998; Desai, 2000; Lloyd, 1996) made broad-based generalizations not based on the results of their own studies. Other studies appeared to excel in certain areas while skipping over the details of the study, such as an adequate description of the design itself. One problematic area appeared to be the sample size which, in many cases (Barab, 1999; De Beer, 1998; Eggen, 2000; Lloyd, 1996; Ponsoda, 1999; Retalis, 2000; Welch, 2000), appeared far too small to have been of any value. As the author mentioned in the introduction to this paper, one purpose was to evaluate both qualitative and quantitative research studies. It was discomforting to see the lack of specific text analysis in the qualitative studies (Barab, 1999; Brownlee, 2001; Retalis, 2000; Welch, 2000). In fact, most of those studies were said to be of a combination "qualitative and quantitative" design as if to provide legitimacy to their own work. Another problematic area concerned the validity and reliability measures of statistical tests. Most authors (De Beer, 1998; Ponsoda, 1999) never mentioned some of the critical terms that would have helped to establish the reliability of the study as a whole.

Bibliography

Barab, S.A., Young, M.F., & Wang, J., (1999). The effects of navigational and generative activities in hypertext learning on problem solving and comprehension, International Journal of Instructional Media, 26(3), 283-310.

Brown, K.G., (Summer, 2001). Using computers to deliver training: Which employees learn and why? Personnel Psychology, 54(2), 271-297.

Brownlee, J., Purdie, N., & Boulton-Lewis, G., (April, 2001). Changing epistemological beliefs in pre-service teacher education students, Teaching in Higher Education, 6(2), 247-269.

De Beer, M. & Visser, D., (March, 1998). Comparability of the paper-and-pencil and computerized adaptive versions of the General Scholastic Aptitude Test (GSAT) senior, South African Journal of Psychology, 28(1), 21-28.

Desai, M.S., (December, 2000). A field experiment: Instructor-based training vs. computer-based training, Journal of Instructional Psychology, 27(4), 239-244.

Dominguez, P.S., & Ridley, D.R., (March, 2001). Assessing distance education courses and discipline differences in their effectiveness, Journal of Instructional Psychology, 28(1), 15-20.

Eggen, T.J.H.M. & Straetmans, G.J.J.M., (October, 2000). Computerized adaptive testing for classifying examinees into three categories, Educational and Psychological Measurement, 60(5), 713-734.

Lloyd, D., & Martin, J.G., (March, 1996). The introduction of computer-based testing on an engineering technology course, Assessment & Evaluation in Higher Education, 21(1), 83-91.

Ponsoda, V., Olea, J., Rodriguez, M.S., & Revuelta, J., (1999). The effects of test difficulty manipulation in computerized adaptive testing and self-adapted testing, Applied Measurement in Education, 12(2), 167-185.

Reeves, J., (August, 2000). Tracking the links between pupil attainment and development planning, School Leadership & Management, 20(3), 315-333.

Retalis, S., Psaromiligkos, Y., & Avgeriou, P., (2000). Web engineering: New discipline, new educational challenges, Information Services & Use, 20(2/3), 95-109.

Welch, M., (November/December, 2000). Descriptive analysis of team teaching in two elementary classrooms: A formative experimental approach, Remedial & Special Education, 21(6), 366-377.
