A Paper Presented in Partial
Fulfillment Of the Requirements for
RM501 Survey of Research
Methodology
Capella University, September,
2001
By D.L. Jackson
CRITIQUE OF RESEARCH ARTICLES
(Note: Changes
suggested by the course tutor will be added later in October, 2001.)
Introduction
A problem for novice researchers is that published papers can be intimidating
in the sense that their format and vocabulary may be difficult to comprehend.
This would seem particularly relevant for social science students whose own work
may reflect a more qualitative approach to research problems. Case studies,
questionnaires, surveys, and interviews are a more common feature within their
learning programs while statistical tests are used, more frequently, for other
subject areas. Sadly, in addition to not feeling comfortable with quantitative
terminology, they may feel a need to defend their choice of research design
should they choose a qualitative format.
However, the author of this paper believes that the option of either format
should be open to all subject areas and that researchers should not need to
defend their methodology except in terms of defining its relevance. To judge
qualitative research as "better" than quantitative is, in this
author's opinion, shortsighted. It would be far better for researchers to adopt
the most appropriate design and then to do it well. It may be that the issue is
not a qualitative vs. quantitative issue but more of a good vs. bad research
problem that is, unfortunately, more easily noticed in a qualitative report. It
is hoped that critiquing the following articles will help the author become
more familiar with the terminology and better able to distinguish good from bad
research.
Critique of Articles
Article 1. Citation: De Beer, M. & Visser, D., (March, 1998).
Comparability of the paper-and-pencil and computerized adaptive versions of the
General Scholastic Aptitude Test (GSAT) senior, South African Journal of
Psychology, 28(1), 21-28.
Article Summary: A computerized adaptive test was constructed from two
existing parallel paper-and-pencil versions of the General Scholastic Aptitude
Test (GSAT) Senior. Achievement in the GSAT computerized adaptive test was
compared to achievement in one form of the GSAT paper-and-pencil test. In
computerized adaptive testing the program tailors each test to the examinee's
ability level. Based on a statistical method known as Item Response Theory (IRT),
the program interactively selects test items which are at the appropriate
difficulty level for the individual being tested, thereby allowing a
considerable reduction in test length without forfeiting measurement accuracy.
The study was undertaken to investigate the equivalence of results obtained with
three versions of the GSAT: A paper-and-pencil version, a standard computerized
version, and a computerized adaptive version. The standard computerized GSAT was
included to study the effects of computerization apart from adaptive testing.
The results indicate that achievement in the paper-and-pencil GSAT and the
standard computerized version of the GSAT were not equivalent because the
examinees performed better in the paper-and-pencil version of the GSAT than in
the standard computerized version of the GSAT. Following this investigation,
certain adjustments were made to the CAT version of the GSAT. Specifically, the
linear adjustments that gave the best results when compared to the
paper-and-pencil version were incorporated into the program software of the
CAT, thereby ensuring that scores obtained with the CAT version are equivalent to
paper-and-pencil scores.
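The item-selection idea described in the summary can be illustrated with a minimal sketch. It assumes a one-parameter (Rasch) IRT model and a tiny, hypothetical item bank; the article itself does not state which IRT model or selection rule the GSAT software used, so the function names and numbers here are illustrative only.

```python
import math

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, difficulties, administered):
    """Return the index of the unused item that is most informative at theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))

# Hypothetical item bank: difficulties on the same scale as ability.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]

first = next_item(0.2, bank, set())       # ability estimate near the middle
second = next_item(1.3, bank, {first})    # estimate revised upward after a correct answer
```

Under the Rasch model the most informative item is the one whose difficulty lies closest to the current ability estimate, which is why an adaptive test can be shorter than a fixed test without losing accuracy.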
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. Mere computerization
of a paper-and-pencil test for administration by computer does not necessarily
optimally use the facilities that computers offer. The purpose was to obtain
the information necessary to publish a computerized adaptive version of the
GSAT that would provide results equivalent to those of the paper-and-pencil
versions of the test.
- Is the problem adequately narrowed down into a researchable problem?
Yes. The aim of this study was, therefore, to compare results on the CAT
version of the GSAT to results on the paper-and-pencil version of the GSAT in
order to make adjustments to the computerized adaptive version if required,
thereby ensuring equivalent measurement by the two versions of the test.
- Is the problem significant enough to warrant a formal research effort?
Yes.
Recent advances in microcomputer technology have not only increased general
access to computers, and encouraged the development of item response theory (IRT),
but have made computerized tests and computerized adaptive tests (CATs) viable
alternatives to paper-and-pencil testing. This has resulted in large-scale
computerization of existing tests. However, generally computerized tests are
direct copies of paper-and-pencil tests in content, format and sequence, the
only difference being that one version is administered by computer, whereas
the other is administered in the more traditional paper-and-pencil format.
- Is the relationship between the identified problem and previous research
clearly described?
Yes. In research on comparisons between conventional
tests and CATs, it has generally been found that the reliability and validity
of adaptive tests are equivalent to or even better than those of conventional
tests. Even though large differences exist between conventional test
administration and computerized adaptive testing, comparable estimations of
ability should, in principle, be obtained with both versions.
Step 2. Literature Review
- Is the literature review logically organized?
Yes. The literature
review begins with an introduction to testing, then refers to advances in
microcomputer studies as well as computerized adaptive tests (CATs) as
alternatives to paper-and-pencil testing. The authors point out that the Binet-Simon
test was, in principle, an adaptive test. The final point addressed concerned
the problem of equivalence because earlier studies have indicated the
different test formats are not necessarily equivalent.
- Does the review provide a critique of the relevant studies?
Not
really. By "critique" I would expect that some of the literature
review would include material that is contrary to the purposes of this study.
The cited research studies, however, appeared to correlate with the potential
findings of this particular study.
- Are gaps in knowledge about the research problem identified?
Possibly.
The problem of balancing the content of CATs is an issue that has not been
resolved because achieving statistical equivalence entails the adjustment of
test scores such that the resulting score distributions are comparable. This
process is not straightforward in the case of equating paper-and-pencil tests
with CATs, because IRT has shown that paper-and-pencil tests are less accurate
than CATs at the extremes.
- Are important relevant references omitted?
I don't know as I am just
beginning to look into this particular area of research.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
I can't identify a theoretical framework in this study. I
would have thought that the differences between the testing processes of
paper-and-pencil vs. computerized or computerized adaptive testing would have
provided one.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
Not
relevant.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
The
specific terms, independent and dependent variables, are not specifically
defined. However, based on the discussion of the method used, the reader
assumes the independent variable is the type of testing offered and the
dependent variable is the score achieved by each student.
- Are any confounding variables present? If so, are they identified?
The
confounding variables are not specifically identified; however, I believe there are
several. One of these concerns the verbal tests: three of the six sub-tests
of the GSAT are verbal tests, and therefore no black pupils were included in the
sample, because English or Afrikaans is often not their first language.
(Note: This survey was completed before May, 1998 at which time apartheid was
abolished!) Another confounding variable may have been the following: There
was some attrition of the original sample size due to testing taking place on
separate days.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
No. The hypotheses
are never stated because the purpose of the study was to obtain the
information necessary to publish a computerized adaptive version of the GSAT
that would provide results equivalent to those of the paper-and-pencil
versions of the test.
- Does each hypothesis describe a predicted relationship between two or
more variables included in each hypothesis? Not relevant.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
Depends. The population consisted of 16-year-old
English-speaking and Afrikaans-speaking high school pupils in South Africa;
however, no black students were included. The sample
size was 613.
- Is the sample representative of the defined population?
This is
difficult for me to know as I don't know what the total population is.
- Is the method for selection of the sample appropriate?
A random sample
of 20 schools was selected for the study, omitting those schools which were
included in the original standardization of the GSAT paper-and-pencil test,
and also omitting schools with fewer than 500 pupils. Within each school, 18
boys and 18 girls were drawn randomly from the 16-year-old pupils. Having
lived in South Africa, I am fairly confident that schools with fewer than 500
pupils were those with people "of color" because the
"white" schools were much bigger.
- Is there any sampling bias in the chosen method?
Yes, racial and
language bias.
- Are the criteria for selecting the sample clearly identified?
Yes. The
authors indicate that 18 boys and 18 girls were drawn randomly from each school, but
there is no indication as to how this breakdown of exactly 50% compares with
the total population of English-speaking and Afrikaans-speaking pupils.
Additionally, the sample was split into two groups of 242 and 371 pupils for
the comparisons to evaluate the equivalence of the different versions of the
GSAT. However, there was no explanation as to how the groups were divided. I
have assumed that the two groups were tested on different days as well but
this data is not provided.
Step 7. Research Design
- Is the research design adequately described?
No. The rationale
for the particular research design was not discussed. Most of the description
of the methodology concerned the implementation of the test itself.
- Is the design appropriate for the research problem?
No. I was
concerned about the description of data analysis because few statistical
techniques were applied within this study and so it was unclear to me whether
the results of the study were warranted.
- Does the research design address issues related to the internal and
external validity of the study?
Not really. The paper-and-pencil version
of the GSAT had previously been standardized on a large, representative sample
of Afrikaans and English-speaking pupils between the ages of 13 years 6 months
and 18 years 6 months. However, I am uncertain as to what is meant by the term
"standardized". The terms internal and external validity are not mentioned
within the study.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
There was
just the one data collection method described and, although the two groups
were tested on separate days, there was no indication as to why they had to be
tested on separate days.
- Are the data collection instruments described adequately?
Yes. The
collection instruments were adequately described and the students were allowed
to "practice" for several questions before beginning the
computerized tests.
- Do the measurement tools have reasonable validity and reliability?
Not
sure. The computerized test was designed using pre-standardized
paper-and-pencil samples. The MicroCAT program, used to design the adaptive
test, was not fully described and was said to be the only commercial package
available at the time. The authors made the comment that the paper-and-pencil
form of the GSAT had been extensively investigated and documented in the
manuals and that, therefore, it was considered sufficient to prove the
equivalence between scores of the CAT version of the GSAT and one of the
paper-and-pencil versions of the GSAT which would then ensure that the
validity and reliability of the GSAT paper-and-pencil version could be claimed
for the computerized adaptive GSAT. However, earlier in the study the authors
had indicated that the use of the computerized adaptive GSAT employed a
completely different testing process and so if the process is different, how
can the validity and reliability be transferred?
Step 9. Data Analysis
- Is the results section clearly and logically organized?
No. The
results section was clearly labeled and yet there were no specific data
described. The text offers only statements such as: In Table 1, the correlations between the
P&P and PC versions and results of t-tests for related groups are also
provided. One had to look to the tables to find out that means and standard
deviations were used, as well as a level of significance and possibly a Pearson
product-moment correlation.
- Is the type of analysis appropriate for the level of measurement for each
variable?
The type of analysis does not appear complete to me because
there are no additional techniques to evaluate the correlation between the
variables. A normalized standard scale was used to convert the theta scores of
the adaptive test to integer scores. Following this, and using the converted
theta scores as raw scores, norms with a mean of 100 and standard deviation of
15 were calculated to convert the scores obtained on the computerized adaptive
GSAT to SA scaled scores.
- Are the tables and figures clear and understandable?
The tables are
clear and understandable but this clarity is belied by the incompleteness of
the actual article itself.
- Is the statistical test the correct one for answering the research
question?
I believe that the analysis of the data is not complete.
Because the tables provide additional information that is not explained in the
report, I would assume that the individuals did not do their own number
crunching and were, therefore, unable to explain their work.
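The score conversion mentioned under this step (norms with a mean of 100 and a standard deviation of 15) can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain linear z-score transformation, whereas the article refers to a "normalized" scale whose exact procedure is not described, and the theta values are invented.

```python
import statistics

def to_scaled_scores(thetas, mean=100.0, sd=15.0):
    """Linearly rescale ability estimates to a scale with the given mean and SD."""
    mu = statistics.mean(thetas)
    sigma = statistics.pstdev(thetas)  # population SD of the norm group
    return [round(mean + sd * (t - mu) / sigma) for t in thetas]

# Hypothetical theta estimates for a three-person norm group.
thetas = [-1.0, 0.0, 1.0]
scaled = to_scaled_scores(thetas)
```

Each examinee's theta is expressed as a distance from the group mean in standard deviation units and then mapped onto the reporting scale, so an average examinee receives 100.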
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
The authors indicated that computerized testing changes
the dynamics of the testing situation and that this supported earlier findings
of other researchers. They explained that the differences in the scores could
at least partially be explained by the unfamiliar testing environments. They
also indicated that a second explanation could be the fact that the MicroCAT
does not allow the students to go back and change answers which would affect
the computerized version more than the adaptive version because the student
would eventually reach his/her appropriate level. However, one assumes that
the paper-and-pencil test-takers were allowed to change their answers, which
would seem to compromise the testing results, particularly because the
researchers determined that the computerized testing changed the
results and indicated the tests were not comparable.
- Are the interpretations based on the data obtained?
The authors don't
discuss the major interpretations, which concern the different scores; however,
they do discuss the unfamiliar testing situations, which were never really
examined within the study.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Sometimes. As mentioned above, some of
the results are related to previous research but I saw no indications of
relations to a conceptual/theoretical framework.
- Are unwarranted generalizations made beyond the study sample?
Yes.
Without specifically testing for this situation, the authors indicated the
present study could support the findings of another researcher who reported that
approximately 32% of the examinees were able to improve their ability scores
in a classroom test when they were allowed to review their answers on a
computerized test.
- Are the limitations of the results identified?
No limitations are
mentioned.
- Are implications of the results discussed? No, with the exception
that the linear adjustments are vaguely described as a way to ensure
equivalence between the two testing versions.
- Are recommendations for future research identified?
No.
- Are the conclusions justified?
No. The major conclusion was that
considering the time saving of approximately 75% in test administration time,
the computerized adaptive GSAT is a useful alternative to the GSAT
paper-and-pencil version; however, this was never mentioned in the actual test
design segment as a possibility.
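Because the linear adjustments are only vaguely described in the article, the following sketch shows one common way such an adjustment can be derived. It assumes a mean-and-standard-deviation (linear equating) approach; this is not confirmed as the authors' method, and the scores are invented for illustration.

```python
import statistics

def linear_adjustment(cat_scores, pp_scores):
    """Return slope a and intercept b so that a * cat + b matches the
    mean and standard deviation of the paper-and-pencil scores."""
    a = statistics.pstdev(pp_scores) / statistics.pstdev(cat_scores)
    b = statistics.mean(pp_scores) - a * statistics.mean(cat_scores)
    return a, b

# Hypothetical scores for the same examinees on both versions.
cat = [90.0, 100.0, 110.0]
pp = [96.0, 104.0, 112.0]

a, b = linear_adjustment(cat, pp)
adjusted = [a * x + b for x in cat]  # CAT scores expressed on the paper-and-pencil scale
```

After the adjustment the CAT scores have the same mean and spread as the paper-and-pencil scores, which is one sense in which the two versions could be called equivalent.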
Article 2. Citation: Ponsoda, V., Julio, O., Rodriguez, M.S., &
Revuelta, J., (1999). The effects of test difficulty manipulation in
computerized adaptive testing and self-adapted testing, Applied Measurement
in Education, 12(2), 167-185.
Article Summary: One aim of this work is to gather new data and to carry out
a new CAT (computerized adaptive test) versus SAT (self-adapted test) comparison
for estimated ability, standard error of estimated ability, posttest anxiety,
and testing time. Previous works were concerned with an efficiency comparison
between SAT and CAT, so that time spent on choice of item difficulty was
included in testing time. The testing time variable in this study excludes time
spent on difficulty choices, and allows us to compare time invested in answering
the item only, under both SAT and CAT conditions. Any differences appearing may
be due to the psychological processes involved in responding to the type of test
in question. A second aim is to check the effects of the number of items passed
on proficiency and anxiety measures. To achieve a different number of items
passed, easy and difficult versions of each type of test were needed as no
previous manipulation has been carried out in SATs. One of the aims of this work
was to study the motivational and psychometric effects of changing test
difficulty in SATs and in CATs. The mean number of items passed was higher in
the ECAT and ESAT conditions, as compared to DCAT and DSAT, respectively. This
means that tests differing in difficulty were obtained, despite procedural
differences. The study did not find clear positive SAT effects on ability and
state anxiety. Ability and posttest anxiety correlations did not show the
expected pattern, as stronger correlations have been found in the two easy
conditions, rather than in the CATs. Suggestions for future research were
offered.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. Research is trying
to obtain new computerized test formats in which motivational aspects of the
examinees are not disadvantaged by comparison with traditional testing
formats. The aim is to find SATs without motivational drawbacks and with null
or acceptable loss in precision and efficiency when compared to CATs.
- Is the problem adequately narrowed down into a researchable problem?
Yes. Easy and difficult versions of SATs and CATs were compared with respect
to estimated ability, posttest state anxiety, number of correct responses,
testing time, anxiety change, and standard error of ability.
- Is the problem significant enough to warrant a formal research effort?
Yes. Current research trends clearly indicate a need to determine
best practices with regard to both SATs and CATs.
- Is the relationship between the identified problem and previous research
clearly described?
Yes. There is a careful delineation between the
problem of computerized testing and previous research comparing the different
variables.
Step 2. Literature Review
- Is the literature review logically organized?
Yes. The literature was
logically organized and covered a wide range of topics ranging from estimated
ability of SATs and CATs; invariance property of item response theory; ability
precision; relationship between type of test and test anxiety to algorithm
selection and comparisons of CATs and SATs.
- Does the review provide a critique of the relevant studies?
Yes. On
one occasion, the authors indicate that a particular research "attempted
to…" while, in another case, research is reported as inconclusive with
literature on both sides cited. Additionally, the research on anxiety has
followed three approaches which are identified.
- Are gaps in knowledge about the research problem identified?
Yes.
Posttest anxiety was higher in CAT as determined by Ponsoda (1997) but that
difference did not reach the significance level. However, information in this
respect provided by other authors is said to be scarce and inconclusive. Part
of the problem appears to be found in the testing time where different
techniques, in various studies, are employed for what to include in the
testing time.
- Are important relevant references omitted?
I don't know as I have just
begun searching this topic and am not aware of the seminal works in this area.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
I don't believe that the theoretical framework as to why
computerized testing may be valid is addressed. The cited literature generally
refers to a particular issue being tested e.g. a comparison between SAT and
CAT rather than the motivation for computerized testing.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
It is
clear, throughout the study, that the concept of computerized testing forms
the basis of this research.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
There
are two independent variables which are defined and well established: type of
test (CAT vs SAT) and test difficulty (easy vs difficult).
- Are any confounding variables present? If so, are they identified?
As difficulty levels are achieved by different procedures in each type of
test, crossing type of test and test difficulty would not be correct because
the difficulty level obtained in both difficult conditions (or in both easy
conditions) may not be the same. The design allows the researchers to
compute the significance of the main factor type of test, the pooled effect
of the factor test difficulty, and simple effects of the second factor
inside each level of the main factor. No other information on the
interaction effect between the two factors is provided by this design.
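The nested design described above can be illustrated by splitting the total variation into three parts: the main factor (type of test), the difficulty factor nested inside each type, and the residual within each condition. The sketch below is an assumption-laden illustration with invented scores; it shows only the sums of squares, not the authors' full analysis.

```python
from statistics import mean

def nested_sums_of_squares(data):
    """data maps test type -> difficulty level -> list of scores."""
    all_scores = [x for levels in data.values() for cell in levels.values() for x in cell]
    grand = mean(all_scores)
    ss_type = ss_nested = ss_error = 0.0
    for levels in data.values():
        type_scores = [x for cell in levels.values() for x in cell]
        type_mean = mean(type_scores)
        # Variation between test types (main factor).
        ss_type += len(type_scores) * (type_mean - grand) ** 2
        for cell in levels.values():
            cell_mean = mean(cell)
            # Variation between difficulty levels inside this test type (nested factor).
            ss_nested += len(cell) * (cell_mean - type_mean) ** 2
            # Variation among examinees inside one condition.
            ss_error += sum((x - cell_mean) ** 2 for x in cell)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    return {"type": ss_type, "difficulty_within_type": ss_nested,
            "error": ss_error, "total": ss_total}

# Hypothetical scores for the four conditions.
scores = {
    "CAT": {"easy": [1.0, 3.0], "difficult": [2.0, 4.0]},
    "SAT": {"easy": [5.0, 7.0], "difficult": [2.0, 4.0]},
}
ss = nested_sums_of_squares(scores)
```

The three components add up to the total, and there is no separate interaction term, which matches the observation that this design gives no information on the interaction between the two factors.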
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
Interestingly, no
hypotheses are stated; however, it is assumed that there will be statistically
significant differences among the four conditions being tested: easy CAT,
difficult CAT, easy SAT, and difficult SAT.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
There is no predicted
relationship for this study; however, the authors do suggest explanations for
findings related to specific literature citations.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
Unknown.
- Is the sample representative of the defined population?
Unknown.
- Is the method for selection of the sample appropriate?
Unknown.
- Is there any sampling bias in the chosen method?
Unknown.
- Are the criteria for selecting the sample clearly identified?
No. A
total of 187 high school students (127 boys, 60 girls) took part in the study.
The sample was taken from a Spanish private school in Galicia, Spain. Ages
ranged from 17 to 19 years but there is no additional information provided.
Step 7. Research Design
- Is the research design adequately described?
The design is adequately
described and includes a brief discussion of the independent variables,
resulting four conditions, and the use of one factor (test difficulty) being
nested in type of test. This design allows the researchers to compute the
significance of (a) the main factor type of test, (b) the pooled effect of the
factor test difficulty, and (c) simple effects of the second factor inside each
level of the main factor.
- Is the design appropriate for the research problem?
The design appears
to allow the researchers the opportunity to address their problem.
- Does the research design address issues related to the internal and
external validity of the study?
The research design does not address
internal and external validity of the study with one exception. The item bank
for questions was apparently calibrated according to a three parameter
logistic model and those details as well as additional information are
provided in a cited article which I did not recheck.
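For readers unfamiliar with the three-parameter logistic model mentioned under this step, a minimal sketch of its standard form follows. The parameter names a (discrimination), b (difficulty), and c (guessing) follow the usual textbook convention; the item values are invented, and some presentations include an extra scaling constant that is omitted here.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: average difficulty, with a 20% chance of guessing correctly.
p_average = p_3pl(0.0, a=1.0, b=0.0, c=0.2)
p_very_low = p_3pl(-50.0, a=1.0, b=0.0, c=0.2)
```

An examinee whose ability equals the item difficulty has a probability halfway between the guessing level and 1, and a very weak examinee's probability falls to the guessing level rather than to zero.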
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
Yes
although there were no different types of data collection described.
- Are the data collection instruments described adequately?
The
procedure is described and included verbal directions that were repeated to
each group in order to eliminate the possibility of extraneous instructions
biasing the results.
- Do the measurement tools have reasonable validity and reliability?
There is no indication of validity and reliability with the exception of the
item bank, which had been tested earlier but, even so, there is no mention
of validity and reliability. An additional test, the State Anxiety Scale, was
administered pre- and post-test; however, no validity and reliability results are
indicated.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes. A .05
level of significance was used in all the statistical analyses, two-factor
hierarchical analyses of variance were applied, and whenever the nested factor
was significant, t tests were applied. Correlations
between estimated ability and posttest anxiety were also calculated for each test type.
- Is the type of analysis appropriate for the level of measurement for each
variable?
The type of analysis appeared simple in that the mean,
standard deviation, level of significance and t test were the only measures
included. No other parametric statistics appeared to be used.
- Are the tables and figures clear and understandable?
Yes.
- Is the statistical test the correct one for answering the research
question?
I would have assumed that once a statistical level of
significance was discovered, additional parametric statistics might have been
used to examine the correlations, e.g., a factor and/or path analysis.
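The correlations between estimated ability and posttest anxiety mentioned under this step are presumably of the Pearson product-moment type, although the critique above does not confirm this. A minimal sketch of that calculation, with invented data, is:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: higher ability paired with lower posttest anxiety.
ability = [-1.0, 0.0, 1.0, 2.0]
anxiety = [30.0, 26.0, 24.0, 20.0]
r = pearson_r(ability, anxiety)
```

A value close to -1 would indicate that more able examinees reported less anxiety after the test; a value near 0 would indicate no linear relationship.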
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Yes, the investigators broke down each variable being
tested and discussed the effect on the overall outcome of the study.
- Are the interpretations based on the data obtained?
Yes; however, 4%
of the results had to be discounted because it appeared the subjects were, in some
cases, picking either significantly more difficult or less difficult questions
that did not match up with their estimated ability.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
The researchers make four major
conclusions that are linked to either earlier research findings or future
research studies.
- Are unwarranted generalizations made beyond the study sample?
No.
- Are the limitations of the results identified?
Yes. For example, the
study did not find clear positive SAT effects on ability and state anxiety;
however, the researchers felt the unique aspects of the study should be taken
into account, e.g., the non-U.S. population, younger mean age, and limitations of
the test parameters themselves. However, I felt that at least some of the limitations
should have been addressed earlier in the study. At the very least, there
should have been more of an explanation regarding the population choice or the
tests themselves should have been validated and reliability results published
for this particular group.
- Are implications of the results discussed? No.
- Are recommendations for future research identified?
Yes. However, the
recommendations appeared to deal with the relationship between anxiety and
test results rather than specifically on the differences seen in SAT vs. CAT.
- Are the conclusions justified?
Yes, but they are not conclusive, and the actual
methodology could have been improved to minimize the effect of non-validated
variables, particularly in the choice of subjects, that might have biased the
outcome.
Article 3. Citation: Eggen, T.J.H.M. & Straetmans, G.J.J.M.,
(October, 2000). Computerized adaptive testing for classifying examinees into
three categories, Educational and Psychological Measurement, 60(5),
713-734.
Article Summary: The objective of this study was to explore the possibilities
for using computerized adaptive testing in situations in which examinees are to
be classified into one of three categories. Testing algorithms with two
different statistical computation procedures are described and evaluated. The
first computation procedure is based on statistical testing and the other on
statistical estimation. Item selection methods based on maximum information
considering content and exposure control are considered. The measurement quality
of the proposed testing algorithms is reported. The results of the study are
that a reduction of at least 22% in the mean number of items can be expected in
a computerized adaptive test compared to an existing paper-and-pencil placement
test. Furthermore, statistical testing is a promising alternative to statistical
estimation. Finally, it is concluded that imposing constraints on the maximum
information selection strategy does not negatively affect the quality of the
testing algorithms.
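The estimation-based approach to classifying examinees into three categories can be sketched roughly as follows. This is an illustration under assumptions, not the article's algorithm: the cut points, the 95% confidence level, and the category labels are all hypothetical, and the statistical-testing alternative is not shown.

```python
def classify(theta_hat, se, cut_low, cut_high, z=1.96):
    """Classify an examinee once the confidence interval around the ability
    estimate lies entirely inside one category; otherwise return None."""
    lower = theta_hat - z * se
    upper = theta_hat + z * se
    if upper < cut_low:
        return "low"
    if lower > cut_high:
        return "high"
    if lower > cut_low and upper < cut_high:
        return "middle"
    return None  # interval still straddles a cut point: administer another item

# Hypothetical cut points at -0.5 and 0.5 on the ability scale.
decided = classify(0.0, 0.2, -0.5, 0.5)
undecided = classify(0.4, 0.3, -0.5, 0.5)
```

Testing stops as soon as a category is returned, which is how an adaptive test can reach a classification with fewer items than a fixed-length placement test.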
Step 1. The Problem
- Is the problem clearly and concisely stated?
Very clearly. The purpose
of this article is to explore the possibilities for CAT based on item response
theory in a situation in which examinees are to be classified into one of
three categories.
- Is the problem adequately narrowed down into a researchable problem?
Yes.
The researchers address the fact that although CATs were originally developed
to obtain an efficient estimate of an examinee's ability, they can also be
used to help classify individuals.
- Is the problem significant enough to warrant a formal research effort?
Yes. With more and more computers in use, it is imperative that we understand
the significance of different testing techniques in determining which is most
effective given a particular environment and/or desired end result.
- Is the relationship between the identified problem and previous research
clearly described?
Yes. The research question is described along with a
description of the algorithms behind CAT, and the citations also cover the
different results obtained from CAT studies.
Step 2. Literature Review
- Is the literature review logically organized?
Yes. However, it is
brief. Of the 19 references cited, only 6 were used in the literature review
and, of those, several were mentioned more than once.
- Does the review provide a critique of the relevant studies?
Yes,
although few studies are mentioned. In some cases, the studies cited were said
to limit their conclusions to the specific situation studied and several of
the studies were published in the 1980s.
- Are gaps in knowledge about the research problem identified?
It would
appear that the literature review is not comprehensive. Some references are
old, and they are possibly not seminal, as I have not seen them referenced in
other articles.
- Are important relevant references omitted?
I don't know but I would
assume that at least with respect to the algorithmic nature of CAT, there were
many other studies that could have been utilized.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
I don't believe this article addresses a theoretical
framework. Arguably, the discussion of the CAT algorithms might fit this
category, but it was not then carried through into the parameters of the study
as a whole.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
The
conceptual framework consisted of a few brief statements regarding the
nature of the educational system in the Netherlands and the expressed
need to maintain confidentiality and increase measurement accuracy for
large groups, which is quite difficult in paper-and-pencil test scenarios.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
Specific variables, whether independent or dependent, are not mentioned;
however, it is assumed that the independent variable would be the type of test
taken by the subject although this area is not discussed.
- Are any confounding variables present? If so, are they identified?
Although
confounding variables may be present, this depends on, among other
things, the selection process for the subjects, which is not described.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
The hypotheses are
listed as H0_1:0<=011 etc. I am sure that someone with more
experience could determine the meaning; however, it gave me great difficulty.
I needed the hypotheses to be stated in words as well and, although I checked
the tables and charts as well, the hypotheses were always mentioned in that
fashion.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
I don't know because I couldn't
interpret the hypotheses although there were several research questions
mentioned: (1) Which testing algorithm is most suitable for the computerized
adaptive placement test for mathematics, given a number of practical
requirements?; (2) Which statistical computation procedures are suitable for
classifying examinees into one of three different levels?; (3) Which item
selection methods should be considered?; and (4) How do the testing algorithms
operate in terms of measurement accuracy, the number of misclassifications,
measurement efficiency, adherence to content specifications, and the
distribution of exposure rates over the item bank?
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
As the intended purpose of the study is to determine the
viability of statistical testing or statistical estimation in the testing
algorithm, the aforementioned questions do seem appropriate and to fit within
the conceptual framework of determining the best testing procedures for large
groups of subjects.
Step 6. Sampling
- Is the sample size adequate?
I was confused here. The only mention of
a sample size was the following: "In the calibration study, 268 items
were administered to a sample of 1,198 students in an incomplete design in
which each student was administered one of 16 different, though overlapping,
booklets with about 43 items" (Eggen & Straetmans, p. 716). However,
upon first reading, I had assumed that "calibration study" referred
to a pilot study and then expected to see more information later on in the
study, which was not forthcoming.
- Is the sample representative of the defined population?
Unknown.
- Is the method for selection of the sample appropriate?
Unknown.
- Is there any sampling bias in the chosen method?
Unknown.
- Are the criteria for selecting the sample clearly identified?
No.
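To make sense of the calibration figures quoted under the sample-size question above, it helps to work through the arithmetic of an incomplete design. The following is a rough sketch only, written in Python. It assumes that students were spread evenly over the 16 booklets and that every booklet held exactly 43 items, neither of which the article states. The function and variable names are my own.

```python
def calibration_summary(items=268, students=1198, booklets=16, items_per_booklet=43):
    """Rough averages for an incomplete calibration design.

    Assumes (hypothetically) equal numbers of students per booklet and
    exactly `items_per_booklet` items in every booklet.
    """
    item_slots = booklets * items_per_booklet
    return {
        "students_per_booklet": students / booklets,
        "item_slots": item_slots,
        "booklets_per_item": item_slots / items,
        "responses_per_item": students * items_per_booklet / items,
    }


summary = calibration_summary()
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, each student answers only about 43 of the 268 items, yet each item is still answered by roughly 190 students. This suggests that the 1,198 students served to calibrate the item bank rather than as subjects of the later simulation study.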
Step 7. Research Design
- Is the research design adequately described?
The design was referred
to as "incomplete" with regard to the calibration study and as a
"simulation study" when the performance of the computation
procedures and item selection methods was investigated.
- Is the design appropriate for the research problem?
It appeared, to
me, that the focus of this study was not so much on the overall design but on
the evaluation of the algorithms being investigated.
- Does the research design address issues related to the internal and
external validity of the study?
The internal and external validity
issues were not specifically addressed. However, the mathematics item bank was
said to be calibrated although the term "calibration" was not
defined.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
I am unsure
as to how the data were collected. There is mention made of the OPLM computer
program that was used in the scaling of an item bank with 250 items that was
established by imposing the constraints for the mean item difficulty and the
discrimination indices.
- Are the data collection instruments described adequately?
No.
- Do the measurement tools have reasonable validity and reliability?
Unknown.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes, the
results are clearly organized with cited tables located near the text rather
than as a set of appendices at the end. The organization included the
following sections: (1) measurement accuracy with statistical estimation; (2)
the algorithms in the conditions of the placement test; (3) statistical
estimation; (4) statistical testing; (5) comparison of statistical estimation
and statistical testing; and (6) exposure data.
- Is the type of analysis appropriate for the level of measurement for each
variable?
It appeared to me that the critical component of this study
was the work designed to address the algorithmic features of CAT. Certainly,
each subject's choice of questions was carefully analyzed via a confidence
interval, standard error, examinee's ability, examinee's true ability, etc.
- Are the tables and figures clear and understandable?
Yes, there were
several tables and charts presented that illustrated the formulas used.
Someone of more experience would have had no difficulty understanding the
nature of the charts and tables.
- Is the statistical test the correct one for answering the research
question?
Although the algorithms for item testing and the analysis of
the data are comprehensive, I see little reference to inferential statistical
procedures such as regression, ANCOVA, etc., and I wondered whether
there might be a difference between the statistical tests used in international
studies and those used in American studies.
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Yes. Each section is clearly labeled and the results
for each segment are discussed; however, the interpretations are presented in
the discussion section.
- Are the interpretations based on the data obtained?
Yes. With regard
to the testing algorithms used for the classification of examinees into three
categories, the conclusion is that statistical testing as a computation
procedure is a promising alternative to the more traditional statistical
estimation procedure.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Yes, for some aspects. Results
reported in this study on the relation between the size of the acceptable
decision error rates and the width of the indifference zone and the
performance of the test are consistent with those of earlier studies for
two-way classification.
- Are unwarranted generalizations made beyond the study sample?
No. In
fact, as noted below, the authors indicate that there is a problem with the
statistical basis for this type of study.
- Are the limitations of the results identified?
Yes. The authors point
out that the comparison between statistical testing and estimation does not
have a proper statistical basis and that, therefore, the generalizability of
the results regarding estimation is not guaranteed. The problem, they explain,
is that, in estimation, there are no formal relationships between
indifference zones and acceptable decision error rates. They indicated there
was a newer study that employed a different procedure that leads to a matching
of the accuracy of testing and estimating.
- Are implications of the results discussed? Yes. The following
implications are mentioned: (1) the quality of the item bank is
satisfactory; (2) the maximum of 25 items for each test is realistic; (3)
the reduction in the number of required items can be expected to amount to
between 22%-44%; (4) applying the double SPRT; (5) the imposition of
constraints on item selection in the form of content or exposure control;
and (6) the final implementation of a CAT in the placement test should be
used only after determining whether or not the algorithms operate the same
in simulations as in real testing situations.
- Are recommendations for future research identified?
Yes. The authors
suggest further work on expanding the classification system to three categories
rather than the previously studied two; on the quality of the testing algorithms;
and on the consequences of truncating algorithms at a maximum test length for
acceptable decision error rates.
- Are the conclusions justified?
Yes. However, the conclusions do not
appear comprehensive and the discussion appears to address the difficulties
with the study more than the appropriateness of the conclusions.
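Because the sequential probability ratio test (SPRT) named among the implications above was new to me, I found it helpful to sketch the general idea in Python. This is a minimal illustration of Wald's SPRT for a single cut point under a Rasch model, not the authors' algorithm. The function names, the indifference half-width `delta`, the error rates, and the example item difficulties are all my own assumptions. The 25-item cap simply echoes the maximum test length mentioned in the article.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct answer under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def sprt_classify(responses, difficulties, cut, delta=0.5,
                  alpha=0.05, beta=0.05, max_items=25):
    """Wald SPRT of H0: theta = cut - delta against H1: theta = cut + delta.

    Returns (decision, items_used). If no bound is crossed before the
    item cap, the sign of the log-likelihood ratio decides (a hypothetical
    truncation rule chosen for this sketch).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = 0.0
    used = 0
    for x, b in zip(responses[:max_items], difficulties[:max_items]):
        p0 = rasch_p(cut - delta, b)
        p1 = rasch_p(cut + delta, b)
        # Add this item's contribution to the log-likelihood ratio.
        llr += math.log(p1 / p0) if x else math.log((1.0 - p1) / (1.0 - p0))
        used += 1
        if llr >= upper:
            return "above", used
        if llr <= lower:
            return "below", used
    return ("above" if llr > 0 else "below"), used


# Hypothetical examinee who answers every item correctly;
# all items are located exactly at the cut point.
decision, used = sprt_classify([1] * 25, [0.0] * 25, cut=0.0)
print(decision, used)
```

In this toy run, a string of correct answers on items located at the cut point pushes the log-likelihood ratio past the upper bound well before the 25-item cap, which illustrates why a classification test can stop early. A three-category version would presumably combine two such tests, one at each cut point, which is not shown here.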
Article 4. Citation: Lloyd, D., & Martin, J.G., (March, 1996). The
introduction of computer-based testing on an engineering technology course,
Assessment & Evaluation in Higher Education, 21(1), 83-91.
Article Summary: The authors indicate that lecturers are exploring the
possibility of using non-traditional methods in some aspects of their work to
deal with the increasing number of students along with the concurrent reduction
of resources. One possibility is to apply new technology in the assessment of
students. This paper presents a controlled comparison between traditional
paper-based tests and those using a computer. It concludes that the new
technique is acceptable to students and produces results with no deterioration
in their validity and has great potential for using staff time in other areas
rather than in assessment.
Step 1. The Problem
- Is the problem clearly and concisely stated?
The problem for the
engineering department is that the program uses phase testing to assess
students, which means that a series of written examinations is set throughout
the year to assess different syllabus sections. Although the system worked
with 30-40 students, there are now upwards of 100 students, which is
causing problems with increased staff workload and raising a question of
uniformity of assessment when students are taught in several different groups.
- Is the problem adequately narrowed down into a researchable problem?
Yes.
- Is the problem significant enough to warrant a formal research effort?
Yes although, clearly, the problem is set up as a case study for this
particular university and is not intended to be generalized to other areas.
- Is the relationship between the identified problem and previous research
clearly described?
There is no previous research cited. The only
explanatory notes regard the previous assessment procedure at the University.
Step 2. Literature Review
- Is the literature review logically organized?
The literature
on computerized testing, although minimal, is cited where the authors
discuss their testing strategies.
- Does the review provide a critique of the relevant studies?
No.
- Are gaps in knowledge about the research problem identified?
No.
- Are important relevant references omitted?
The authors do not appear
to have wanted a literature review component in this particular study so one
would assume that relevant references have been omitted. In particular, one
would have expected some more citations dealing with the nature of
computerized adaptive testing. However, it is important to note that this
article was published in 1996 which is a long time ago with reference to
computerized testing research studies.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
No theoretical framework is specified.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
No
conceptual framework is established with the exception of a perceived need to
change the system for the current lecturers and to adopt a system of
computerized testing.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
No.
Neither independent nor dependent variables are specifically addressed.
- Are any confounding variables present? If so, are they identified?
Although
the term is not used, one confounding variable was certainly that the
computing skills of the students were said to be minimal. However, no
allowances for that were built into the design.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
The hypotheses are
not clearly identified; however, it is assumed that the students' performance
on both modes of testing as well as their self-assessment comments would form
the basis of the study.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
There does not appear to be any
discussion of the relationship between students' estimated and/or known
ability and their performance on the tests.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
Unknown.
- Is the sample representative of the defined population?
Unknown.
However, one assumes the sample was representative of at least the engineering
students who were enrolled in the program.
- Is the method for selection of the sample appropriate? Unknown.
However, there was a control group (N=?) who took the pencil-and-paper test in
1992, while the second phase took place in 1993 with three groups who were
assigned "on a random basis" to Group A (traditional test), Group
B (computer-based examination), and Group C (computer-based but over a
four-day period). Interestingly, the group size varied, with Group C
the smallest so as to minimize the extra load on the open access computer
facilities, which were always in demand.
- Is there any sampling bias in the chosen method?
There would appear to
be significant sampling bias. Examples include the unequal group sizes; the
fact that, once begun, the computer test had to be completed in one session;
and the possibility that Group C participants spoke with others involved in
the study, because they had a four-day period in which to complete their
test.
- Are the criteria for selecting the sample clearly identified?
No.
Step 7. Research Design
- Is the research design adequately described?
The research design is
not adequately described; however, the procedure is fairly detailed.
- Is the design appropriate for the research problem?
It would appear
that this was a simplistic design, organized more
around a case study or even an action research problem.
- Does the research design address issues related to the internal and
external validity of the study?
No issues of validity were addressed.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
There is no
description of the data collection for the control group, and the only
description of data collection for the Phase Two groups indicates that the
computer program marked the results.
- Are the data collection instruments described adequately?
No.
- Do the measurement tools have reasonable validity and reliability?
No.
Although I have read the study several times, there were no indications of
validity and reliability measurements.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes, but there
are hardly any results.
- Is the type of analysis appropriate for the level of measurement for each
variable?
Well, not exactly. The only analysis I see is the average
score.
- Are the tables and figures clear and understandable?
There are none.
- Is the statistical test the correct one for answering the research
question?
I can't say that there was an adequate and/or appropriate test
used for answering the research question. It appears they only wanted to
determine the difference in scoring but did not determine the reliability or
validity of their measurement tools, which means, in effect, that any
differences found cannot be interpreted with confidence.
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
I suppose so but the findings were minimal.
Additionally, the researchers used a questionnaire at the end to determine
which test the examinees had preferred but there are no notes indicating the
problems with self-assessment as a reliable tool of measurement.
- Are the interpretations based on the data obtained?
Yes, but I believe
the testing was faulty and, therefore, the interpretations cannot be correct.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework? No significant links are made
except a few remarks about early computer testing. There are, however, no
descriptions of instructional design differences between paper-and-pencil and
computerized tests.
- Are unwarranted generalizations made beyond the study sample?
Yes. The
statement, "Comparison of results from the two modes of assessment show
that computer testing techniques can be valid and also by their nature ensure
reliability" (Lloyd & Martin, p. 89) cannot be made based on the
study described in this paper.
- Are the limitations of the results identified?
No limitations are
identified.
- Are implications of the results discussed? The implied result is
that computerized testing can be used to save staff time and that the
results will be reliable.
- Are recommendations for future research identified?
While unsupervised
computer examinations did not show any bias from the results, it was felt that
further safeguards against student collusion will need to be developed before
such open-access testing can be performed on a wider population.
- Are the conclusions justified?
If the foundation is faulty, the
conclusions cannot be justified.
Article 5. Citation: Desai, M.S., (December, 2000). A field experiment:
Instructor-based training vs. computer-based training, Journal of
Instructional Psychology, 27(4), 239-244.
Article Summary: This article evaluates the impact of instructor-based
training vs computer-based training. One of the major issues of end-user
computing is training individuals to use it effectively and so the researchers
have looked at key variables such as training support, delivery, techniques, and
individual differences that can be manipulated to enhance the training program.
The findings indicated that the major differences between IBT and CBT subjects
were attributed to the performance, enrollment for the classes, motivation and
general attitude toward training method, and satisfaction with the facility.
Step 1. The Problem
- Is the problem clearly and concisely stated?
The authors distinguish
between education and training and indicate that the market forces are causing
the corporate world to act in a defensive manner where they must constantly
struggle to keep their staff trained on the newest software. This, they
indicate, is getting to be more of a problem as administrators strive to
determine the most efficacious training methods.
- Is the problem adequately narrowed down into a researchable problem?
Yes. It is clear that, although this is termed a longitudinal study, the aims
are clear-cut in that businesses need to find an effective training
methodology for computer applications.
- Is the problem significant enough to warrant a formal research effort?
Yes. There is an ongoing search to distinguish between methodologies
appropriate to face-to-face (F2F) vs computer-based environments.
- Is the relationship between the identified problem and previous research
clearly described?
Previous research is not addressed until the
conclusions are explained.
Step 2. Literature Review
- Is the literature review logically organized?
There is no apparent
literature review although the bibliography includes 19 references. At the end
of the study, 6 citations are made that deal with effective training
techniques.
- Does the review provide a critique of the relevant studies?
No.
- Are gaps in knowledge about the research problem identified?
No.
- Are important relevant references omitted?
As the literature review is
parsimonious at best and the article was published in 2000, I am certain that
important references are omitted.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
There was no theoretical framework proposed.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
There
was no conceptual framework proposed. In fact, this study appeared to be
instigated solely on the basis of need. The bottom line of corporate America
means efficiency is critical to their success (contrary to most educational
institutions!) and, therefore, money spent on such studies adds up to
increased profits for them and an efficiently trained workforce. It would
seem, therefore, that this study was motivated not by theoretical or
conceptual frameworks but by a business model.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
"Key" variables were defined as the training methods and tasks;
however, the terms independent and dependent variables were not mentioned.
- Are any confounding variables present? If so, are they identified?
With the exception of highest level of education achieved, which appeared to
be a factor in whether the individuals chose CBT or IBT, there were no
apparent confounding variables. Interestingly, although the level of
education is addressed, one would have thought the researchers would have
worked that into the design itself because the subjects chose the method
themselves.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
No hypotheses were
presented.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
Not relevant.
- Do the hypotheses logically flow form the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
I don't believe so, although I don't
know how large the defined population was, as the authors stated, "The
target population for this study was the end-users of information systems
technology. The subjects for this study were employees of a Fortune 100
corporation located in a major southwestern city in the United States"
(Desai, p. 239).
- Is the sample representative of the defined population? I don't
believe so. Even if the defined population size had been given, the samples for
the two groups, IBT and CBT, were not comparable, as they numbered 90 and 21
respectively.
- Is the method for selection of the sample appropriate?
It doesn't
appear that the end result was obtained from an optimal sample size,
particularly for the CBT group. The authors indicate a "self-selecting
and convenience sample was employed" (Desai, p. 239); however, there was
a definite difference between the groups (e.g., highest level of education
achieved) which was not addressed.
- Is there any sampling bias in the chosen method?
There does not appear
to be any justification for the self-selecting model.
- Are the criteria for selecting the sample clearly identified?
Some
criteria are indicated. For example, the subjects chose the type of training
most closely aligned to both their workload and the actual training schedule
itself.
Step 7. Research Design
- Is the research design adequately described?
I don't believe so. There
is clear information about the beginnings of the study but there is no
indication as to how long the study was maintained, only that the subjects
were tested at the beginning and end of training as well as one month after
training completion.
- Is the design appropriate for the research problem?
The design does
not appear to be well described.
- Does the research design address issues related to the internal and
external validity of the study?
No issues of internal and/or external
validity were addressed.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
The authors
indicated that the subjects' scores were recorded along with demographic data
on each subject. It was stated that the demographic data were used in
assessing the prior knowledge of the subjects.
- Are the data collection instruments described adequately?
Not really.
Scores were recorded but how the score was assigned is not made clear and
neither is the difference between computer- and instructor-based assessment
noted.
- Do the measurement tools have reasonable validity and reliability?
Although not clearly stated, it would appear that the computerized software
used was of a commercial nature. In that case, there should have been an
indication of validity and reliability measures but nothing was stated.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Although the
results are presented, they are not clearly organized.
- Is the type of analysis appropriate for the level of measurement for each
variable?
The employees' performance data were analyzed using a one-way
ANOVA test, and the employees' satisfaction data were analyzed using
Kruskal-Wallis and Mann-Whitney tests at a significance level of 5%.
- Are the tables and figures clear and understandable?
The only table
presented was one that indicated the breakdown of males and females and the
totals of those who chose IBT or CBT. No other tables were included.
- Is the statistical test the correct one for answering the research
question? With such a small sample size, particularly for the CBT group, I
would have thought the level of significance would have been lowered to 1%
to reduce the possibility of a Type I error. Additionally, there is no
indication as to what the level of achievement was prior to the testing,
particularly given that all employees were expected to complete the training
regardless of position or prior knowledge.
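To see what the choice between a 5% and a 1% significance level means in practice for groups as unequal as 90 and 21, a small sketch may help. The Python code below runs a simple permutation test on the difference in mean scores. It is not one of the tests the authors used, and the scores are invented purely for illustration. The function name and all parameters are my own.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_gap(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        if mean_gap(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented scores: 90 "IBT" and 21 "CBT" trainees (not data from the article).
data_rng = random.Random(1)
ibt_scores = [data_rng.gauss(70, 10) for _ in range(90)]
cbt_scores = [data_rng.gauss(75, 10) for _ in range(21)]

p = permutation_p_value(ibt_scores, cbt_scores)
for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: p={p:.3f}, reject H0: {p < alpha}")
```

The stricter 1% threshold makes a false positive (Type I error) less likely, but with only 21 people in one group it also makes a real difference harder to detect.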
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
This is difficult to determine because the results are
not presented in either tabular or expository format.
- Are the interpretations based on the data obtained?
The authors often
make comparisons between the results achieved in MS Word and those obtained in
MS Excel, which I find strange because the skills necessary for MS Word
excellence would not, I assume, be similar to those necessary to demonstrate
excellence in MS Excel. Interestingly, although the authors state the results
show CBT to be more effective than IBT, the sample size was quite small and,
more importantly, there is apparently no statistically significant difference
between the end-of-training and one-month-after-training results for any
teaching mode.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
The authors employ citations of former
research studies which are all intended to address the implications of the
results.
- Are unwarranted generalizations made beyond the study sample?
Yes, the
findings indicated that the major differences between IBT and CBT subjects
were attributed to the performance, enrollment for the classes, motivation and
general attitude toward the training method and satisfaction with the
facility; however, these issues were not clearly addressed within the body of
this report. The authors report that motivation and attitude were not measured
but were interpreted based on the general comments made by the subjects which,
surely, cannot be used to indicate an official finding.
- Are the limitations of the results identified?
No limitations are
identified by the authors.
- Are implications of the results discussed? Yes. Several
implications were identified: (1) CBT as a methodology is difficult to
"sell" to the employees; (2) the viability of any training over
the long term; (3) identification of those who need training vs those who do
not; (4) a consideration of learning styles; (5) the mixture of CBT and IBT
to enhance training; and (6) the need for training managers to work in close
alliance with software vendors.
- Are recommendations for future research identified?
Except as
indicated above with regard to implications, no recommendations for future
research are identified.
- Are the conclusions justified?
The conclusion, "It was determined
by the research that CBT is an effective means of training; however, its
acceptance as a formal training tool was not favorable" (Desai, p. 233)
was indicative of the types of statements made throughout the study that did
not appear to be fully explained.
Article 6. Citation: Brown, K.G., (Summer, 2001). Using computers to
deliver training: Which employees learn and why? Personnel Psychology, 54(2),
271-297.
Article Summary: Note: This paper was adapted from the author's doctoral
dissertation. The author states that computer delivered training typically
offers learners more control over their instruction. In learner-controlled
environments, learner choices regarding practice level, time on task, and
attention are expected to be critical determinants of training effectiveness. To
examine the effect of learner choices in computer based training, a study was
conducted with 78 employees taking an Intranet-delivered training course.
Learner choices were assessed and predicted with goal orientation (mastery and
performance) and learning self-efficacy, as well as age, education, and computer
experience. Results indicated considerable variability among trainees in practice
level and time on task, both of which predict knowledge gain. Performance
orientation interacted with learning self-efficacy to determine practice level,
and mastery orientation had an unexpected negative effect. Implications for the
use of computers to deliver training and for future research were discussed.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. The author examines
the different aspects of computer-based instruction but, more importantly, the
author addresses the fact that a new model will need to be incorporated into
learning theory that specifically addresses learner choices.
- Is the problem adequately narrowed down into a researchable problem?
Yes.
This study provides two contributions to existing research. The first
contribution is the introduction of individual differences and learning theory
to research on computer-based training. The model presented provides a
conceptual framework for research on computer-based training that is
consistent with prior theory-driven research on individual differences and
learning. This study asserts practice level and time on task as classic yet
understudied constructs that are central to determining learning outcomes. The
second contribution of this study is that it examines learner choices and goal
orientation with adult employees enrolled in an organizationally sponsored
course.
- Is the problem significant enough to warrant a formal research effort?
Yes. Enough studies have been compiled on computer based training for us to
realize that now a different phase has been reached. Clearly, the technology
is available for the creation of virtually any course but we do not yet
understand what makes one learner more successful than another and while
paper-and-pencil assessments, etc. will yield some data based on good
practices, we must now learn how to use the computer to evaluate the data as
well as the learner.
- Is the relationship between the identified problem and previous research
clearly described?
Yes, contrary to other studies I have read, the
author of this study breaks down previous research and describes both the main
points as well as possible explanations for any abnormalities and/or
unexplained findings.
Step 2. Literature Review
- Is the literature review logically organized?
Yes. The author presents
the literature review in a topical fashion whereby each area to be addressed
in the current study is examined and justified from an earlier research point
of view.
- Does the review provide a critique of the relevant studies?
Yes. At
various times, the author makes suggestions regarding the findings of earlier
researchers. For example, "the failure of goal orientation to predict
these choices may have resulted from the limited duration of the training and
the limited number of practice opportunities available" (Brown, p. 275).
In another example, "Again, differences among learners may have been
restricted because all students were given the same amount of time for
practice" (Brown, p. 275).
- Are gaps in knowledge about the research problem identified?
I can see
no gaps related to this particular study.
- Are important relevant references omitted?
This is unknown to me;
however, the bibliographic list is extensive and there are many references to
earlier authors.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
Goal orientation theory forms the framework for this study
because the author is concerned with choices made by learners.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
I
believe that the emphasis of this study is on the goal orientation theory
mentioned above.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
Age, education, and computer experience were all assessed with adequate
measures. In this particular study, the emphasis was on the learner's
processing during the testing and, therefore, the focus was constantly on the
learner because, the author believed, effective learning strategies need to be
identified and then taught to others.
- Are any confounding variables present? If so, are they identified?
I
did not notice any confounding variables.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
Yes. Several
hypotheses are clearly stated: (1) Individual differences are expected to
predict behavior and cognition of learners during the learning experience, and
these behaviors and cognition are expected to be the most proximal predictors
of learning; (2) The more learners practice and spend time on task, the more
they will learn; and (3) the more learners engage in off-task attention, the
less they are hypothesized to learn.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
Yes. In some cases, the author
points out that a particular hypothesis will need to be supported in a number
of laboratory studies but that it must also be studied within the context of
organizationally sponsored training programs as well.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Yes. All are related to goal orientation theory.
Step 6. Sampling
- Is the sample size adequate?
Although it is not clear how large the
original population was, the sample consisted of 78 technical employees.
- Is the sample representative of the defined population?
It appeared
that employees were given the choice of taking or not taking the computer
version and this might have added bias into the results because one is not
aware of why an individual chose the computer version over the
paper-and-pencil versions.
- Is the method for selection of the sample appropriate?
I don't know
how the selection process was completed. It appears that it was of a strictly
volunteer nature.
- Is there any sampling bias in the chosen method?
If, indeed, the
subjects were self-chosen, then bias might be possible because their reasons
for the decisions to do CBT would not have been known.
- Are the criteria for selecting the sample clearly identified?
No.
Step 7. Research Design
- Is the research design adequately described?
The design is described
by procedure rather than by name with indications as to what each learner was
going to face during the two-day exercise. However, during the conclusion
section, the research model was said to be a mediated model that depicted
learning choices as the key process by which individual differences influence
learning.
- Is the design appropriate for the research problem?
Yes. The
procedure described by the author appears to incorporate all of
the areas to be considered when addressing the problem of learner motivation.
- Does the research design address issues related to the internal and
external validity of the study?
Again, although the specific terms were
not mentioned, certainly the descriptions of the methodologies chosen indicate
that the internal validity would be sufficient to guarantee an accurate result,
while the external validity appears to be supported by having the study
performed in a computer laboratory and a corporate training center.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
Yes.
Learners began by completing a computerized survey that included individual
difference measures as well as pretests. The self-report measure of off-task
attention was collected after learners completed 3 modules out of a total 9
modules. The computer recorded the practice activities completed and time on
task. Posttests were available on the computer when a learner had completed
the course. Scoring focused on technical accuracy and evidence of application
while ignoring presentation (e.g., spelling). Two advanced graduate students were
trained to grade the responses. Following training, the raters coded all
answers independently. Ratings were correlated to examine multi-rater,
multi-question relationships. The average cross-rater, same-question
correlation was .74 on the pretest and .82 on the posttest, indicating that
raters provided consistent ratings.
- Are the data collection instruments described adequately?
Yes. See
above.
- Do the measurement tools have reasonable validity and reliability?
Yes.
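The cross-rater, same-question correlation described under data collection can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from the study, and the function name pearson_r is chosen only for illustration. The sketch simply shows how agreement between two raters who scored the same answers might be summarized as a Pearson correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores given by two raters to the same six answers.
rater_a = [3, 4, 2, 5, 4, 1]
rater_b = [3, 5, 2, 4, 4, 2]

r = pearson_r(rater_a, rater_b)
print(round(r, 2))
```

A value near 1 would suggest that the two raters ranked the answers similarly. Averaging such correlations across questions is one plausible reading of the "average cross-rater, same-question correlation" reported in the article.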
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes. The
results were indicated by measure and the following were included: age,
education, computer experience, goal orientations, self-efficacy, time on
task, practice level, off-task attention, and knowledge.
- Is the type of analysis appropriate for the level of measurement for each
variable?
Yes.
- Are the tables and figures clear and understandable?
Yes. There are
several tables and figures presented which represent descriptive statistics
and correlations, standardized coefficients for regression of learning choices
and knowledge posttest on individual differences, and standardized
coefficients for regression of knowledge posttest on learning choices.
- Is the statistical test the correct one for answering the research
question?
Yes. The effects of individual differences on the three
learner choices and the learning outcome were examined using multiple
regression. Then, hierarchical regression was used to test the effects of
learner choices on knowledge gain by regressing posttest score on choices,
controlling for pretest score. In addition, the mediation model was examined
using the procedures recommended by an earlier, published research author.
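The hierarchical regression step described above (regressing posttest score on learner choices while controlling for pretest score) can be sketched as follows. This is only an illustration under stated assumptions: the data are hypothetical, a single learner choice (practice level) stands in for the full set of choices, and the helper names are invented for the example. With one control variable and one added predictor, the gain in explained variance equals the squared semipartial correlation, which is what the sketch computes.

```python
def mean(values):
    return sum(values) / len(values)


def r_squared(x, y):
    """R^2 from a simple (one-predictor) least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def residuals(x, y):
    """Residuals of y after a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


# Hypothetical scores for eight learners (not data from the study).
pretest = [2, 3, 3, 4, 5, 5, 6, 7]
practice = [1, 4, 2, 5, 3, 6, 4, 7]
posttest = [3, 6, 4, 8, 6, 9, 8, 11]

# Step 1: variance in posttest explained by pretest alone.
r2_step1 = r_squared(pretest, posttest)

# Step 2: remove pretest from practice, then see how much posttest variance
# the leftover part of practice explains (squared semipartial correlation).
practice_unique = residuals(pretest, practice)
delta_r2 = r_squared(practice_unique, posttest)
r2_step2 = r2_step1 + delta_r2

print(round(r2_step1, 3), round(delta_r2, 3), round(r2_step2, 3))
```

The quantity delta_r2 is the part of the outcome that practice level accounts for beyond prior knowledge, which is the sense in which the pretest is "controlled for" in this kind of analysis.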
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Yes. Age and computer experience were not strongly
associated with learner choices; however, computer experience was positively
related to both pre- and posttest scores while practice level and time on task
both were related to knowledge test scores in the expected direction.
- Are the interpretations based on the data obtained?
Yes. For example,
it was hypothesized that mastery orientation would predict practice level and
time on task, and performance orientation would predict off-task attention.
Off-task attention was predicted by performance orientation and mastery
orientation. The relationships among goal orientations, practice level, and
time on task were small and statistically nonsignificant, except for an
unexpected negative relationship between mastery goal orientation and practice
level. Therefore, the hypotheses regarding the prediction of off-task
attention were supported but the hypotheses regarding the prediction of time
on task and practice level were not.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Yes. This study addresses several
earlier research questions including the need for more research on goal
orientation with adults as well as the need to examine goal orientation and
self-efficacy as predictors of training outcomes.
- Are unwarranted generalizations made beyond the study sample?
No.
- Are the limitations of the results identified?
Yes. For example, in
the discussion about high mastery-oriented learners, the author addressed one
particular limitation. Although high mastery oriented learners may have failed
to practice and learn key features of the problem solving process, they may
have become familiar with the process as a whole, gained more proficiency with
the training program and used the system more following training. This, the
author proposes, may have been the case with the mastery-oriented learners;
however, no data were collected to test these hypotheses. Future research might
then explore these possibilities by studying a broader range of training
outcomes including use of the course web site for performance support.
Additionally, this study did not use a control group and so conclusions cannot
be drawn about the effectiveness of this particular course relative to other
instructional designs or formats. A third limitation was that the sample
consisted of volunteers who may, themselves, have differed from
the general population. The last major limitation concerned the small
sample size and modest reliabilities of some measures.
- Are implications of the results discussed?
Yes. The results suggest
that employees may not use control over their learning wisely, as indicated
by those who skipped practice activities, moved quickly through training, and,
therefore, obtained much lower scores.
- Are recommendations for future research identified?
Yes. Research
should address how learner choices regarding practice level and time on task
influence more distal training outcomes, such as skill maintenance and
generalization.
- Are the conclusions justified?
Yes. The results of this study suggest
considerable variability in learner choices and, as a consequence, learning.
The author suggests that as responsibility for learning is shifted from
trainers to learners, learner choices will become an increasingly important
determinant of overall training effectiveness.
Article 7. Citation: Barab, S.A., Young, M.F., & Wang, J., (1999).
The effects of navigational and generative activities in hypertext learning on
problem solving and comprehension, International Journal of Instructional
Media, 26(3), 283-310.
Article Summary: The study examined learning while using a linear text,
navigational hypertext, or a generative hypertext system. In one experiment,
students were assigned either the linear or navigational hypertexts and expected
to learn the information to solve a posed problem, while students in the second
experiment were expected to learn the information to pass a reading comprehension test. In the
third experiment, the effect of carrying out generative activities on problem
solving and reading comprehension was examined. The results indicated that the
number of generative activities explained a significant amount of the variance
in problem-solving and reading comprehension scores.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. Although hypertext
environments are becoming omnipresent, the research related to the benefits of
hypertext learning environments is divided. Some studies have found increased
benefits of learner-controlled instruction while others have found that
students with high degrees of learner control performed less effectively than
those receiving program control. The research does indicate, however, that a
more appropriate use of hypertext systems might be that involving
problem-solving tasks rather than reading comprehension tasks.
- Is the problem adequately narrowed down into a researchable problem?
Yes.
- Is the problem significant enough to warrant a formal research effort?
Yes, most certainly. It is not enough just to be able to use computers as an
enhanced learning environment; we must conduct more studies to determine which
Internet facilities are useful for which type of learning and for which type
of learner.
- Is the relationship between the identified problem and previous research
clearly described?
Yes. In particular, the authors do a good job of
identifying both the positive and negative research findings for this
particular issue.
Step 2. Literature Review
- Is the literature review logically organized?
Yes. In particular, the
authors do a good job of identifying both the positive and negative research
findings for this particular issue.
- Does the review provide a critique of the relevant studies?
Not
particularly. However, in this instance there are a number of studies that
show contradictory results. A critique of some of the studies to determine
whether or not the methodologies might have been one of the reasons for the
discrepancies would have been helpful; however, this particular study is
actually three-in-one and perhaps there was some sort of inherent word-count
limitation, so that the authors determined the results of the studies were
more important than a critique of the available literature.
- Are gaps in knowledge about the research problem identified?
Not
really except to point out the divergent results of previous studies.
- Are important relevant references omitted?
Unknown.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
One of the theories put forth by the authors is that of
generative learning which has, in the recent past, appeared to help
significantly increase both retention and text comprehension compared to
control groups who did not work with a generative model.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
I
believe that the theoretical components of this study were more relevant than
any conceptual framework.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
Yes, although not through those terms. The independent variables included:
total time, self-determination, problem-solving, and reading comprehension.
- Are any confounding variables present? If so, are they identified?
I
don't know whether or not there are other confounding variables. I couldn't
find evidence of any pre-testing and/or post-testing which would have helped
to clarify the issue.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
There were several
hypotheses presented including: that learners who engaged in the generative
activities would do better at recalling information due to their active
processing while reading.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
Yes. In the aforementioned
hypothesis, for example, the authors sought to address the issue under two
conditions: (1) when subjects had a specific problem solving goal in mind for
reading, and (2) when their goal was simply to comprehend the text in the
context of preparing for a traditional test of reading comprehension.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Yes, because the basis of this study is the effect of
hypertext on generative learning in the hopes that learning is actually
enhanced and of a long-term nature.
Step 6. Sampling
- Is the sample size adequate?
Not known.
- Is the sample representative of the defined population?
Not known. The
participants for this study were recruited from psychology and education
classes at a northeastern US university. The process of recruitment was not
explained.
- Is the method for selection of the sample appropriate?
Unknown except
that, as an added incentive, the undergraduate students were given extra
course credit and then randomly assigned to one of three conditions. There
were 13 males and 35 females.
- Is there any sampling bias in the chosen method?
Although this was not
mentioned, there are approximately 51% females and 49% males in the
universities; however, I don't know what the breakdown is for education and
psychology classes, but certainly the 13 males and 35 females in this study
(roughly 27% and 73%) do not appear
to be a representative sample. Additionally, the extra course credit
might have attracted a certain type of individual to participate in the study.
- Are the criteria for selecting the sample clearly identified?
No.
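The gender imbalance raised under sampling bias can be checked with a short worked calculation. The sketch below is illustrative only: it takes the 13 males and 35 females reported for the sample and compares them with the approximate 49%/51% split quoted in this critique, which is itself an assumption rather than a verified figure for the population of interest. It uses a chi-square goodness-of-fit statistic with one degree of freedom.

```python
# Observed counts reported for the sample.
observed = {"male": 13, "female": 35}

# Approximate population split quoted in the critique (an assumption,
# not a verified figure for education and psychology classes).
expected_share = {"male": 0.49, "female": 0.51}

n = sum(observed.values())
sample_share = {group: count / n for group, count in observed.items()}

# Chi-square goodness of fit: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[g] - n * expected_share[g]) ** 2 / (n * expected_share[g])
    for g in observed
)

# Standard chi-square critical value for df = 1 at the .05 level.
CRITICAL_05_DF1 = 3.841

print({g: round(share, 2) for g, share in sample_share.items()})
print(round(chi_square, 2), chi_square > CRITICAL_05_DF1)
```

If the statistic exceeds the critical value, the sample's gender split departs from the assumed population split by more than chance alone would readily explain, although the appropriate comparison group remains the unknown makeup of the education and psychology classes.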
Step 7. Research Design
- Is the research design adequately described?
The students were
randomly assigned to one of three treatments and the design regarding the
linear, navigational, and generative conditions was well described. I do not believe
there was a control group and there don't appear to have been any duplicate
studies done.
- Is the design appropriate for the research problem?
I thought it was
interesting that the authors didn't question the fact that the chosen subjects
were all liberal arts students. I would have thought that an interesting
extension of this design would have been to include some math and/or science
students as well.
- Does the research design address issues related to the internal and
external validity of the study?
I believe the internal validity for this
study is consistent; however, it is not an extensive study as the sample size
is quite small and does not seem to be representative of the whole population.
Regarding the external validity, it is difficult to assess because, although
the students were randomly assigned to a treatment, there is no indication as
to whether or not the course was required or just "extra". In fact,
even if they did receive extra course credit, their motivation should have
been addressed within the design.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
The data
were collected from a self-determination questionnaire, a
problem-solving condition, meaningful generations, and a 52 reading
comprehension measure.
- Are the data collection instruments described adequately?
Yes.
- Do the measurement tools have reasonable validity and reliability?
Unknown because there is no in-depth description of how the scorers were
moderated although the authors did mention there was a content expert, a high
school history teacher, and two educational psychologists.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
There was a
short but descriptive results section that included a brief discussion of the
statistical analysis tools employed and the reasons for each choice.
- Is the type of analysis appropriate for the level of measurement for each
variable?
Yes.
- Are the tables and figures clear and understandable?
Yes. There were
several tables including: a means and standard deviation chart, a chart
illustrating the number of generative activities, and one on reading
comprehension.
- Is the statistical test the correct one for answering the research
question?
Scatter plots reflecting positive linear relationships
between the variables, MANOVA, standard deviations, Wilks' criterion,
discriminant function analysis, Tukey's honestly significant difference
procedure, t-tests, C-values, path analysis, and linear regression were all
employed. However, I am not familiar with all of those tests and so am unable
to answer whether they are correct for this particular situation.
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Yes. In particular, the researchers were surprised that
the generative condition did not yield as positive a result as was obtained
from those with navigational control only.
- Are the interpretations based on the data obtained?
Yes.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Yes. For example, students in the
navigational condition were found to have significantly higher degrees of
self-determination than students in the other two conditions which is
consistent with earlier studies confirming that students improved with
increased learner control.
- Are unwarranted generalizations made beyond the study sample?
No.
- Are the limitations of the results identified?
In some cases. For
example, when students had reading comprehension goals (as contrasted with
problem-solving goals), there was no apparent difference between the
navigational, linear, or generative conditions. The authors indicated that
this might have been attributed to poor design, measurement error, other
sample-related problems, or the lack of a treatment effect. Interestingly,
there was no comparison of the samples used for each treatment although
comparisons were made across treatments.
- Are implications of the results discussed?
Yes. At the time of the
study, the researchers indicated that generative activities, although useful
for some students, were less than optimal and that, in some cases, might
have even hindered the progress of some because of time spent off-task. The
authors go on to say that although the literature clearly indicates learners
perform better when they have more control, educators must pay attention to
the actual goals of the learners because those appear to have a bearing on
the success and/or failure of the student in that particular task.
- Are recommendations for future research identified?
Yes. Future
research should continue to compare navigational paths of different groups of
individuals, including high and low achievers, knowledge seekers, and confused
hypermedia users or users who have adopted different goals for using
hypermedia.
- Are the conclusions justified?
It would appear that the conclusions
are justified; however, the results really did little to dispel the
controversy surrounding the use of hypertext in the educational arena and this
issue was not addressed. Certainly, one suggested study should
investigate why the results are inconclusive across the various studies.
Article 8. Citation: Brownlee, J., Purdie, N., & Boulton-Lewis, G.,
(April, 2001). Changing epistemological beliefs in pre-service teacher education
students, Teaching in Higher Education, 6(2), 247-269.
Article Summary: A teaching program designed to foster the reflection on and
development of epistemological beliefs was implemented with 29 pre-service
graduate teacher education students in Australia. As part of the year-long
teaching program, students were required to reflect on the content of an
educational psychology unit in relation to their epistemological beliefs. The
students were interviewed at the beginning and conclusion of the teaching
program. The questionnaire used was designed to measure beliefs about knowing.
The results of both the quantitative and qualitative data analysis indicated
that the group of students engaged in the teaching program experienced more
growth in epistemological beliefs. Certainly, the success of the teaching program
has implications for how teacher educators develop learning environments.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. The authors indicate
that the intention of the study was to improve learning for pre-service
teacher education students but that in order to accomplish this, they would
have to have a clearer idea as to the development of current students
regarding epistemological beliefs.
- Is the problem adequately narrowed down into a researchable problem?
Yes. Current research indicates that teaching programs aimed at improving
learning may need to focus on students' epistemological beliefs and that the
focus of interventions should attempt to help students see that critical
interpretation is sometimes necessary.
- Is the problem significant enough to warrant a formal research effort?
At the present time, there is sufficient research available to guarantee an
interest in epistemological interventions, particularly in light of the fact
that there is not yet an agreed upon benchmark upon which to work although
Perry's work comes close to being seminal in this area.
- Is the relationship between the identified problem and previous research
clearly described?
Yes. The different research theories regarding
epistemological beliefs are clearly explained.
Step 2. Literature Review
- Is the literature review logically organized?
Yes.
- Does the review provide a critique of the relevant studies?
No. The
intent of the authors does not appear to be that of critics but that of
organizers and contributors to the growing body of epistemological
intervention research.
- Are gaps in knowledge about the research problem identified?
Unknown.
- Are important relevant references omitted?
Unknown.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
Not relevant.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
The
purpose of this study was to supplement already existing (but not conclusive)
epistemological belief interventions. As such, the concept addressed would
involve how interventions could change established belief systems,
particularly for educators.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
The
independent variables included: age, gender, areas of study, and teaching
experiences. The dependent variable was to involve the amount of change
observed over a period of one year.
- Are any confounding variables present? If so, are they identified?
None
were identified.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
The hypothesis was
that, with intervention, the subjects' epistemological belief systems would
become more sophisticated.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
Although the aforementioned
hypothesis was assumed, the issue was not described.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
There were 29 students in the treatment
group and 25 in the comparison group although the overall population size
isn't mentioned.
- Is the sample representative of the defined population?
Unknown.
- Is the method for selection of the sample appropriate?
The purpose of
the study was explained to the students who were then given the opportunity to
opt out for another group but none withdrew.
- Is there any sampling bias in the chosen method?
The undergraduate
qualifications of the students were varied and included: business, social
science, psychology, visual and performing arts, science, literature, and
nursing. However, there were only 3 males and 26 females. The authors do not
indicate whether this gender imbalance is irregular; however, they do indicate
the two groups have a similar make-up.
- Are the criteria for selecting the sample clearly identified?
No. The
only criterion mentioned was that of being a graduate student in a pre-teaching
training course.
Step 7. Research Design
- Is the research design adequately described?
As the purpose of the
study is to determine the epistemological beliefs of individuals and to track
any changes, the design incorporating both qualitative and quantitative
components appears appropriate.
- Is the design appropriate for the research problem?
Yes.
- Does the research design address issues related to the internal and
external validity of the study?
The internal validity is addressed because
this study forms a component of a program already known to the students and
the external validity is established because of the real-life situation of a
tutorial group within a school setting.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
Regular
journal reflections and the Schommer questionnaire were used, and interviews
were conducted twice during the year.
- Are the data collection instruments described adequately?
Yes.
- Do the measurement tools have reasonable validity and reliability?
The
test-retest reliability on the Schommer is .70. Inter-item reliabilities for
individual items within each factor range from .63 to .85. The comparison
group wrote two written statements about their beliefs two times during the
year.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes. The
authors indicated that the analysis used a predominantly inductive approach.
The results were presented both from a qualitative and quantitative point of
view.
- Is the type of analysis appropriate for the level of measurement for each
variable?
Yes.
- Are the tables and figures clear and understandable?
Yes.
- Is the statistical test the correct one for answering the research
question?
Yes. This was one of the few studies that used qualitative
information which is the reason I chose it to see how the journals and
interviews were interpreted.
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Yes. In this particular study, the students appeared to
experience a stronger growth in inconsistent beliefs which were not
necessarily the expected results although the authors did indicate that even a
change to inconsistent beliefs signified a change in their original belief
system.
- Are the interpretations based on the data obtained?
Yes.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Yes. In particular, the authors
believed that an understanding of metacognition would enable the teachers to
become better trained and more capable of dealing with the ill-defined
problems found in today's educational environment.
- Are unwarranted generalizations made beyond the study sample?
Possibly. Although the authors presented a fairly comprehensive literature
summary of epistemological studies, there didn't appear to be any research
that indicated the appropriate change from one level to another. In fact, one
of the researchers had indicated that a possible problem with, for example,
Perry's work was that it was of a linear nature and the boundaries between one
stage and the other were not clearly delineated. Therefore, when the authors
indicate that changes to one's belief system, even though of an inconsistent
pattern, were indicative of change, this was not clearly explained anywhere
else within the study and seemed to be more representative of a
rationalization rather than a reasoned argument.
- Are the limitations of the results identified?
Yes. The fact that the
students experienced stronger growth of inconsistent beliefs was discussed.
- Are implications of the results discussed? The authors indicate
that teaching programs should be developed that will encourage students to
reflect on epistemological beliefs in order to help the students become more
metacognitive.
- Are recommendations for future research identified?
Not really,
although a more thorough discussion of the actual intervention techniques
would have been appropriate as well as a discussion as to whether or not
different teachers were in charge of each group.
- Are the conclusions justified?
Potentially, this study could have been
useful in helping to refine teaching education programs but it didn't seem to
actually go anywhere although the design appeared appropriate. There was too
much left unexplained in terms of the actual interventions, the number of
teachers, and the types of reflections offered by the students. Additionally,
although the students were participating in the study for one year, it
apparently wasn't long enough to show viable and consistent changes. At best,
the conclusions were that self-reflection may provide an atmosphere of
potential personal change.
Article 9. Citation: Welch, M., (November/December, 2000). Descriptive
analysis of team teaching in two elementary classrooms: A formative experimental
approach, Remedial & Special Education 21(6), 366-377.
Article Summary: This article reports the results of a descriptive analysis
of team teaching in two classrooms. The study employed a relatively new approach
to field based research, referred to as formative experiments, to conduct
formative and summative evaluation procedures. Results of quantitative and
qualitative analyses to assess student outcomes, teaching procedures, and
teacher impressions are presented. Descriptive information regarding planning
time, type of instructional format of team teaching, student groupings, and
follow-up evaluation time was obtained through weekly teacher logs. Focus groups
and written teacher comments provided information regarding teacher satisfaction
of the team-teaching experience. Performance of typical students and students
with learning disabilities on curriculum-based assessment measures given pre- and
post-team teaching suggests academic gains in reading and spelling for all
students.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. Through a critical
review of current literature, the author provides several areas that need
additional work and that will be addressed in this current study.
- Is the problem adequately narrowed down into a researchable problem?
Yes. The author identifies several research questions: (1) How much time did
teams of teachers spend planning, implementing, and assessing their team
teaching activities?; (2) What formats of team teaching were employed by
teaching teams?; (3) What student grouping formats were employed during team
teaching?; (4) Will there be an overall improvement in student performance on
criterion-referenced assessment scores?; (5) Will teachers achieve their
instructional objectives using team teaching?; and (6) What will be teachers'
impressions and levels of satisfaction regarding the use of team teaching?
- Is the problem significant enough to warrant a formal research effort?
Yes. Recent literature suggests that the number of adults in a teaching room
may improve the students' performance and, as it is physically impossible to
minimize the numbers of students per classroom, this option may prove viable
in improving students' outcomes.
- Is the relationship between the identified problem and previous research
clearly described?
Definitely. The author presents several areas that
are not closely examined within the literature and incorporates those areas
into this study.
Step 2. Literature Review
- Is the literature review logically organized?
The organization appears
to follow the needs of the researcher rather than a chronological or thematic
organization.
- Does the review provide a critique of the relevant studies?
The author
doesn't critique the relevant studies so much as indicate the gaps present in
current research.
- Are gaps in knowledge about the research problem identified?
- Are important relevant references omitted?
Unknown.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
Interestingly, the author appeared more interested in
justifying the methodology which included alternative approaches and
recommendations rather than discussions of team teaching itself. Additionally,
the type of qualitative research done, in this case, would seem to add more to
the body of research while, at the same time, clarify issues not consistently
addressed in earlier studies.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
Not
relevant.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
Not
really. Although the author refers to this study as a combination
qualitative/quantitative experimental design, I find little evidence for the
quantitative portion.
- Are any confounding variables present? If so, are they identified?
Not
really.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
The author indicates
that the purpose of this study is to be descriptive and it appears to be so. I
suppose that an assumed hypothesis would be that team teaching, whatever the
approach, will provide more successful and long-term student success.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
Not relevant.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
Two schools were chosen as the team
teaching sites and two teachers were chosen for each of the two schools.
- Is the sample representative of the defined population?
The teachers
were apparently a subset of a larger group that had completed a video-based
staff development training program. However, the total number of staff
participating in the program is unknown. The total of schools in the area is
unknown as well.
- Is the method for selection of the sample appropriate?
Although the
teachers are a subset as mentioned above, it is not known how they were
chosen.
- Is there any sampling bias in the chosen method?
There were only two
schools chosen and both were said to be in the middle to upper class
socioeconomic level. Additionally, all the teachers were female Caucasian and
the composition of the classes was such that different nationalities were
listed (with no indication as to whether or not there were ESL difficulties)
as well as students requiring special education and those with learning
disabilities. However, with all that, the descriptive statements don't appear
to describe any potential bias. Perhaps this was due to the fact that a
descriptive study is just that, a descriptive statement.
- Are the criteria for selecting the sample clearly identified?
No.
Step 7. Research Design
- Is the research design adequately described?
The research is described
as a mixed methodology of both qualitative and quantitative research. The
authors stated that the descriptive information is necessary for social
validation and to add to the body of literature about team teaching.
- Is the design appropriate for the research problem?
Journals, log
recordings, power entries, and student performance data were necessary.
Because of the dependence on the opinions and thoughts of the teachers,
the descriptive nature was appropriate. However, I am not as sure about the
relevance of the student performance because I can see no evidence that
variables such as learning disabilities, special needs, English language
proficiency, etc. were factored into the study.
- Does the research design address issues related to the internal and
external validity of the study?
Certainly the use of journals to measure
trends and opinions offers acceptable internal validity; however, I do have
some questions as to the external validity of the study based on problems with
the study design mentioned in other sections.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
The
formative and summative evaluation variables included time analysis, teaching
formats and student grouping formats, student performance and instructional
outcomes, and teacher impressions and satisfaction. Data regarding planning and
teaching formats were collected on the same weekly log. Instructional planning
consisted of recording the academic activity as well as the materials used and
determining how student performance on the activity would be measured. Student
performance was measured by a comparison of pre- and post-team-teaching mean
scores.
- Are the data collection instruments described adequately?
Yes, there
were descriptions of the time analysis and teaching formats, student
performance and instructional objectives, and teacher impressions and
satisfaction. However, there was one monthly meeting with the teachers that
would last from 60-90 minutes and it doesn't appear that these meetings were
recorded.
- Do the measurement tools have reasonable validity and reliability?
Unknown.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes. The
results section is divided into team-teaching and student-grouping formats,
student performance and instructional objectives, and teacher impressions and
satisfaction.
- Is the type of analysis appropriate for the level of measurement for each
variable?
The level of analysis appears to have been at a basic level.
For example, the results on student performance included paired t-tests,
significant differences, and means at both pretest and posttest.
However, there was no description of the type of testing provided.
Additionally, although the purpose of the study was to provide descriptive
information on the teachers, there does not appear to have been much analysis
beyond the groupings of certain comments. For example, themes do not seem to
have developed and there appeared to be direct comparisons between the two
schools and two groups of teachers although there is no evidence to support
the compatibility of the two environments.
- Are the tables and figures clear and understandable?
There are several
tables included depicting the results indicated above.
- Is the statistical test the correct one for answering the research
question?
Not as detailed as one would have expected.
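Because the article reports paired t-tests on pretest and posttest means without describing the testing itself, a minimal sketch of what such a test computes may help the reader judge those results. The scores below are hypothetical and are not taken from the article; the function name is my own.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test.

    Each student contributes one difference score (post minus pre); the test
    asks whether the mean of those differences is distinguishable from zero.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1


# Hypothetical pretest and posttest scores for five students (assumed data).
pre_scores = [10, 12, 9, 14, 11]
post_scores = [13, 14, 10, 17, 13]

t_stat, df = paired_t(pre_scores, post_scores)
print(f"t = {t_stat:.2f}, df = {df}")
```

A large t value only shows that scores changed between the two testing occasions. Without a comparison group, it cannot show that team teaching caused the change.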
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Some of the data interpretations, particularly with
regards to special needs students, appear generous at best. Eight students
were classified as learning disabled and yet the researchers indicate
impressive growth, such as a 72% gain in reading fluency…. However, the
environment was not controlled for different variables and it doesn't appear
possible that team teaching alone could have accounted for that degree of
improvement. Indeed, if true, then team teaching would truly become the
panacea for all special education students' needs! Additionally, I don't
believe that generalizations such as, "Consequently, it appears that team
teaching can supplement rather than exclusively supplant segregated service
delivery in specialized settings" (p. 370), are warranted based on this
particular study.
- Are the interpretations based on the data obtained?
Sometimes. See
above statement.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Sometimes. However, the relationships
are somewhat obtuse. For example,
This implies that team teaching could have a positive impact on all
students' performance in inclusive settings. These results appear to support
the study by Self, Benning, Marston, and Magnusson (1991) that also
incorporated curriculum-based assessment to measure students' reading skills
before and after team teaching. However, it is not possible to discern
whether student achievement in this study would have occurred anyway,
without team teaching, as no comparison group was utilized. (p. 371)
- Are unwarranted generalizations made beyond the study sample?
At
times.
- Are the limitations of the results identified?
Yes. A number of
variables could not be controlled. Students could not be randomly assigned to
groups. Similarly, comparison or control groups were not employed due to the
voluntary nature of the implementation. Likewise, there was no observation of
the team teaching to validate the integrity of the team-teaching procedures or
the information that was self-reported on the planning logs. Finally, the
results of this study cannot be generalized to other settings, grades, or
student populations.
- Are implications of the results discussed? The authors suggest that
perhaps the most important contribution of this study is the student outcome
data, which suggest that students have demonstrated improvement in each of
the academic areas. Additionally, they state that team teaching does not
appear to have an adverse effect on the academic performance of special
needs students. The data logs appeared to indicate that station teaching is
perhaps the most effective team teaching tool for both large and small group
environments. Finally, the authors offer the suggestion that teacher
education programs must continue to emphasize and examine factors such as
attitudes, beliefs, values, and role expectations before establishing team
teaching.
- Are recommendations for future research identified?
Yes. This section
was particularly helpful and well written. Many suggestions for future
research are identified including: (1) focus groups and interviews may also
shed light on why a particular format of team teaching is used or preferred
over others; (2) it is important to assess outcomes of all students and not
just those eligible for special education services; (3) continued efforts to
understand complex social variables associated with team teaching; and (4)
continued use of log journals coupled with qualitative methods such as
interviews may shed light on why some teaching teams require less time to plan
than others, as well as in what ways teachers find the time to plan.
- Are the conclusions justified?
It would appear that the suggestions
for future research are indeed valid as are the limitations of this particular
research. However, I would hesitate to apply any of the direct conclusions,
particularly those related to the special education students, due to the very
small sample size as well as lack of a control group.
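The 72% gain in reading fluency questioned in Step 10 is a relative figure, so its size depends heavily on the starting score. The sketch below assumes the gain was computed as (posttest mean minus pretest mean) divided by pretest mean; the article's exact formula is not given in this critique, and the scores are hypothetical.

```python
def percent_gain(pre_mean, post_mean):
    """Relative gain from pretest to posttest, expressed as a percentage."""
    if pre_mean == 0:
        raise ValueError("percent gain is undefined for a pretest mean of zero")
    return (post_mean - pre_mean) / pre_mean * 100


# Hypothetical group means (assumed, not from the article): both groups
# improve by the same 18 points, but from different starting scores.
low_baseline_gain = percent_gain(25.0, 43.0)
high_baseline_gain = percent_gain(50.0, 68.0)

print(f"low baseline:  {low_baseline_gain:.0f}% gain")
print(f"high baseline: {high_baseline_gain:.0f}% gain")
```

The same absolute improvement looks twice as large for the group that started lower, which is one reason a percentage gain for eight students with learning disabilities should be read cautiously.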
Article 10. Citation: Reeves, J., (August, 2000). Tracking the links
between pupil attainment and development planning, School Leadership &
Management, 20(3), 315-333.
Article Summary: The study was part of a more extensive research project
commissioned by the Scottish Office Education Department in 1994 called the
Improving School Effectiveness Project. The project gathered extensive
quantitative and other data on 80 schools and qualitative data on a subset of 24
schools. The research brief for the study was to examine the theme of
development planning within the context of the overall project. The article
tracks through the development of a set of associations between the value-added
attainment results of 12 primary and 12 secondary schools and some
characteristics of their approach to development and features of their culture
and organization.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Not particularly.
- Is the problem adequately narrowed down into a researchable problem?
Apparently
the problem addresses development planning.
- Is the problem significant enough to warrant a formal research effort?
An
interesting question because, certainly, development plans are important to
all organizations. I chose this article because it was, in fact, part of a
much larger research project that had been ongoing for a number of years.
- Is the relationship between the identified problem and previous research
clearly described?
No, not clearly described. However, throughout the
report, references to previous research are woven into the results of this
current research.
Step 2. Literature Review
- Is the literature review logically organized?
Not really as citations
are made throughout the report but are not organized in and of themselves.
- Does the review provide a critique of the relevant studies?
No.
Studies are cited that coincide with either the hypotheses or the
rationalization for this current study.
- Are gaps in knowledge about the research problem identified?
No.
- Are important relevant references omitted?
Unknown.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
Not relevant.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
This
was an interesting study because it appeared to evolve throughout the years
and still maintain a basic framework. The researchers hoped to obtain a
conceptual framework for determining why certain schools were more successful
than others in their approaches to change.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
The
study was designed along two strands: one based on capacity building and the
other on the quality of the school's development plan. As such, several
variables were defined including: (1) people's level of understanding; (2)
attitudes; (3) skills; (4) resource availability; (5) internal structures; and
(6) internal procedures.
- Are any confounding variables present? If so, are they identified?
Apparently, according to the authors, there were many problems that evolved
out of the study that were, in fact, partly due to the longevity of the
program in addition to the gender imbalance, among other areas.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
There were two
primary hypotheses mentioned: (1) Schools which adopt strategies which have a
high level of impact on the capabilities of staff and the school's capacity to
accommodate change are more likely to add value to their pupils' attainments
than those who do not; and (2) Schools which produce development plans which
conform with models of good practice are more likely to add value to their
pupils' attainments than those which produce poor development plans.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
No.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Yes.
Step 6. Sampling
- Is the sample size adequate?
It would appear that the size was
adequate.
- Is the sample representative of the defined population?
Eighty schools
across Scotland were involved in providing attainment and attitudinal data
from a cohort of pupils and attitudinal data from teachers and parents. From
these 80 schools, a representative sample of 24 schools was chosen to be the
case study schools which would provide the evidence for the qualitative strand
of the Improving School Effectiveness Project.
- Is the method for selection of the sample appropriate?
There is no
method discussed.
- Is there any sampling bias in the chosen method?
Not known.
- Are the criteria for selecting the sample clearly identified?
No
criteria are described.
Step 7. Research Design
- Is the research design adequately described?
Yes. However, as was
noted in the study, this work has extended over a period of time and,
therefore, many of the basic descriptions have been either amended or edited
over time.
- Is the design appropriate for the research problem?
The use of both
qualitative and quantitative research is appropriate for this study as it
involves an analysis of change and is an attempt to develop a change model for
both primary and secondary schools.
- Does the research design address issues related to the internal and
external validity of the study?
This is difficult to determine; however,
one would assume that the study's internal validity is acceptable and that
its real-life setting supports its external validity. However, it
appears that there were many changes occurring within the study throughout the
years.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
Yes.
- Are the data collection instruments described adequately?
Yes. For each school, qualitative data about development
planning was collected using the development analysis interview and the
school's development plan. As the basis of the DAI, each school was asked to
choose a major development which had taken place in the school in the last few
years. The purpose of this interview was to find out how the school leaders
perceived the management of change. Two other sources of information were
included in the data collection: (1) teacher interviews and (2) teacher
questionnaires.
- Do the measurement tools have reasonable validity and reliability?
There are no validity and reliability scores provided.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
This section
was logically ordered, with information about both the leaders of the school and
the teachers. Not all of the analysis was detailed because it had been
published in a previous article but that article was cited.
- Is the type of analysis appropriate for the level of measurement for each
variable?
There was an interesting analysis of the interview
transcripts. The first analysis led to the calculation of a strategy impact score
for the development, and the second analysis identified key features associated
with the description of the initiative. The authors said that by using such a
simplistic approach they were unable to account for variance, particularly with the
secondary school data, and there was evidence of sharp conflict between staff
over the individual innovations. By reading through the transcripts several
times, different factors could be applied. However, I saw no evidence of the
more traditional analysis using parametric and nonparametric statistics. Perhaps
this is due to a particular methodology applicable to the school system in
Scotland or the fact that this was not the original descriptive study article.
- Are the tables and figures clear and understandable?
Yes. The tables
are incorporated into the report.
- Is the statistical test the correct one for answering the research
question?
I saw no evidence of a particular statistical test.
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
In the analysis of change, several factors were
determined to be quite important and included: (1) the importance of new
people; (2) a high level of resistance; (3) involvement of parents; (4)
involvement of students; (5) involvement of learning support; and (6) problems
with sustaining resources. Of note, is that these factors may have had a
positive or negative influence on the implementation of change. There were
significant descriptions of the differences between the primary and secondary
data.
- Are the interpretations based on the data obtained?
Because this study
was quite extensive and long term, I assume the interpretations are based on
the obtained data. However, it is important to note that not all of the
details were present in this report.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
In some cases, the findings were
related to previous research e.g. the complexity of change and the importance
of staff involvement in the development process, etc.
- Are unwarranted generalizations made beyond the study sample?
I could
see no evidence of unwarranted generalizations.
- Are the limitations of the results identified?
Limitations of the
study were mentioned throughout the report and included such items as an
imbalanced gender grouping which may have affected results.
- Are implications of the results discussed? Although the study did
succeed in terms of the original brief in showing a correlation between the
processes of planning for development in schools and school effectiveness,
the author believed this to be a generalization that left more questions
than answers. One point mentioned was that this particular study dealt only
with internal factors affecting change and made no attempt to analyze
external factors which, the author stated, made the issue of change even
more complex.
- Are recommendations for future research identified?
The author makes
some interesting suggestions for future research on development planning.
Some of those suggestions include: (1) looking at whether or not
primary and secondary development should involve different improvement
strategies; (2) whether the focus of development should also include the
content development of change as well as the structure of the implementation
itself.
- Are the conclusions justified?
It would appear that, at least based on
this study, the described findings are minimal in contrast to the energy and
longevity of the study. However, this may be due to the fact that this was not
the original description of the study results but an additional article
written to supplement the findings.
Article 11. Citation: Retalis, S., Psaromiligkos, Y., & Avgeriou, P.,
(2000). Web engineering: New discipline, new educational challenges, Information
Services & Use, 20(2/3), 95-109.
Article Summary: Sophisticated applications are being deployed in increasing
numbers on the WWW without having been developed according to appropriate
methodologies and quality standards. The main reason for this ad hoc development
philosophy is the lack of specialized training/education on the web engineering
subject domain. This discipline is new and has recently started getting the
attention of researchers, developers, and of the major players in the web-based
application development market and training market. There is now justifiable and
increasing concern about the manner in which students and lifelong learners are
well educated and trained in this new discipline. It was also only one year ago
that a few universities started providing courses on this discipline and
offering seminars to lifelong learners. The Department of Electrical and Computer
Engineering at National Technical University of Athens began offering a one
semester course called "Internet Publishing". This paper presents an
overview of the course, its web-enriched delivery method, and the
quantitative and qualitative results extracted after the completion of the
evaluation study in 1999-2000.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. Web engineering is
concerned with the establishment and use of sound scientific, engineering and
management principles and disciplined and systematic approaches to the
successful development, deployment and maintenance of high quality Web based
systems and applications. However, in higher education today, there is no
discipline in the current curriculum that teaches all or even an extensive
part of these skills. Even at universities where there are a lot of practical
classes, the student has to make specific choices in courses during his/her
studies such as: programming language courses, database management systems,
software engineering and object-oriented design, information systems,
networking and Internet protocols, in order to have adequate background as a
web engineer.
- Is the problem adequately narrowed down into a researchable problem?
Yes.
- Is the problem significant enough to warrant a formal research effort?
Yes. The WWW is clearly a new electronic frontier, much like the Wild West, in
which the pioneers are testing out the parameters of the system while now the
sheer number of users and sites demands a more methodical approach to course
design on the Internet.
- Is the relationship between the identified problem and previous research
clearly described?
Yes.
Step 2. Literature Review
- Is the literature review logically organized?
The literature review is
not organized in the traditional sense but consists of a review of web
engineering courses.
- Does the review provide a critique of the relevant studies?
Apparently, an earlier review of the literature revealed several gaps in the
research which are addressed below. However, there does not appear to be any
special critique of the studies.
- Are gaps in knowledge about the research problem identified?
Yes. The
authors point out three gaps in current research on the impact of technology
in education. These include: (1) a lack of theoretical or conceptual
framework; (2) the different learning styles of students related to the use of
different technologies are not taken into consideration; and (3) the feelings
and attitudes of the students are not adequately investigated.
- Are important relevant references omitted?
Unknown.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
As indicated in the introduction, the authors do not
believe there is a theoretical or conceptual framework in the area of web
engineering.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
Not
relevant.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
Variables are listed including: (1) usage of the learning environment; (2)
effect of the instructional delivery mode to students' learning styles; (3)
contribution of the learning resources to the acquisition of knowledge and
skills; (4) effect of the instructional delivery mode to the acquisition of
knowledge and skills; (5) quality of the learning resources; and (6) a
comparison of the enriched classroom delivery mode with the traditional
ex-cathedra one.
- Are any confounding variables present? If so, are they identified?
No
confounding variables are identified.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
No hypotheses were
proposed although the assumed hypothesis is that the WWW learning environment
is a potential enriching environment that needs categorization and that web
engineering courses will facilitate the development of web courses that follow
a particular theoretical or conceptual framework.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
Not shown.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Not relevant.
Step 6. Sampling
- Is the sample size adequate?
The sample size seems small: 16
individuals (2 women and 14 men).
- Is the sample representative of the defined population?
The defined
population would appear to be the number of students who successfully
completed the course, 40.
- Is the method for selection of the sample appropriate?
There is no
indication about the selection method described.
- Is there any sampling bias in the chosen method?
Clearly, the size is
a problem. In addition, the gender imbalance would appear to warrant a study in
and of itself, as would the fact that there was a 21.6% attrition rate for the
course and these individuals were not included in the study.
- Are the criteria for selecting the sample clearly identified?
No. It
appears to have been by default.
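As a rough illustration of why a sample of 16 drawn from a population of 40
seems small, the following Python sketch estimates the margin of error for a
reported proportion. The 95% confidence level (z = 1.96), the worst-case
proportion of 0.5, the normal approximation, and the function name are all
assumptions made here for illustration; none of them come from the article,
and the normal approximation is itself only a rough guide at this sample size.

```python
import math


def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Uses the normal approximation with a finite population correction,
    which matters when the sample is a large share of a small population.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt(
        (population_size - sample_size) / (population_size - 1)
    )
    return z * standard_error * correction


# Figures from the critique: 16 respondents, 40 students completed the course.
moe = margin_of_error(sample_size=16, population_size=40)
print(f"Sample covers {16 / 40:.0%} of the population")
print(f"Approximate margin of error: +/- {moe:.1%}")
```

Under these assumptions the margin of error is roughly 19 percentage points,
so a reported figure such as "60% agreed" would be compatible with a wide
range of true opinions among the 40 students.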
Step 7. Research Design
- Is the research design adequately described?
The evaluation study
followed a specific methodology, called CADMOS-E, which is a pre-test and
post-test method incorporating some aspects of the illuminative evaluation
approach. It is a stepwise method supported by specially developed pre- and
post-test questionnaires, which provide data for both quantitative and
qualitative analysis.
- Is the design appropriate for the research problem?
According to the
authors, this is appropriate because the focus of this evaluation is on the
learning effectiveness of the course and its delivery mode as well as the
identification of extensions and reviews that were required to take place.
- Does the research design address issues related to the internal and
external validity of the study?
The specific issues of internal and
external validity are not described although the internal validity would be
assumed through the use of questionnaires and the external would appear to be
present because of the real-life environment of the course.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
The pre- and
post-test questionnaires consisted of 29 closed-ended questions but also
included several open-ended questions on feelings and emotions. There are no
reliability or validity results for this questionnaire.
- Are the data collection instruments described adequately?
Not
described.
- Do the measurement tools have reasonable validity and reliability?
Not
mentioned.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Somewhat.
However, on each and every occasion the Likert-type scale was presented, and
this seemed gratuitous given that it had already been
explained.
- Is the type of analysis appropriate for the level of measurement for each
variable?
The authors decided that the size of the sample was not
statistically appropriate for quantitative analysis and so a comparative
statistical analysis of the data was performed. The basic statistical analysis
depicted the trends of the learners' opinions.
- Are the tables and figures clear and understandable?
The tables are
absolutely necessary because the data are not presented within the report
itself but the reader is referred to the tables.
- Is the statistical test the correct one for answering the research
question?
No statistical test was used and there do not appear to have
been any correlations described between the questions and trends and so one is
left with just the continuum of answers for each question.
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
The authors do not appear to offer interpretations of
the data as the findings are presented.
- Are the interpretations based on the data obtained?
I could not see
any indication of interpretations.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
No. This was disappointing because, in
the introduction, one of the complaints about earlier research was that there
were no theoretical and/or conceptual frameworks presented and yet nothing was
presented here either. This report, I believe, was more of a descriptive
statement about one individual course rather than a research study taking its
place among many others.
- Are unwarranted generalizations made beyond the study sample?
No.
- Are the limitations of the results identified?
Except for the sample
size, no limitations were discussed.
- Are implications of the results discussed?
None discussed.
- Are recommendations for future research identified?
No.
- Are the conclusions justified?
Not really. For example, the authors
indicated that the evaluation study showed that the course was of high quality
and that the open learning mode was most appropriate for the postgraduate
students and yet there is no discussion regarding the high attrition rate,
almost 22%, nor the lack of female participants.
Article 12. Citation: Dominguez, P.S., & Ridley, D.R., (March, 2001).
Assessing distance education courses and discipline differences in their
effectiveness, Journal of Instructional Psychology, 28(1), 15-20.
Article Summary: This study illustrated a new, "parsimonious" (p.
15) model that investigators interested in distance education can use to ask
meaningful questions about the relative quality of distance education courses.
The approach removed the emphasis from student-level data and placed it upon
course-based data. Sample data comparing online and traditional higher education
courses covering nine disciplines were reported. These data revealed that
preparation for advanced courses was statistically equivalent whether the course
prerequisites were online courses or their traditional classroom counterparts.
The article further explored the usefulness of this framework for identifying a
significant discipline-related difference in the relative effectiveness of
online and traditional prerequisites as preparation for advanced courses.
Step 1. The Problem
- Is the problem clearly and concisely stated?
Yes. Focusing on
student-level data tells only a limited tale. For example, generating a
profile of the successful distance education student does not provide
institutions with practical information for program improvement or refinement.
The authors remove the emphasis from distance education students and place it
on the course itself. Additionally, they expand the scope of investigations to
include distance education students' subsequent performance in other classes.
- Is the problem adequately narrowed down into a researchable problem?
Yes.
- Is the problem significant enough to warrant a formal research effort?
Yes. The point is well made: What does the research say to universities about
improving the quality of their on-line courses? I did, however, think it was
interesting to note that the authors mentioned nothing about the recent
growth in literature about on-line instructional design models.
- Is the relationship between the identified problem and previous research
clearly described?
Not really. There are only two sources cited in the
bibliography. Clearly, the intent of the authors is not to do a literature
review.
Step 2. Literature Review
- Is the literature review logically organized?
There is no literature
review.
- Does the review provide a critique of the relevant studies?
Not
relevant.
- Are gaps in knowledge about the research problem identified?
No.
- Are important relevant references omitted?
Yes. There are no
references whatsoever to current research dealing with instructional design
methodologies for the Internet.
Step 3. Theoretical or Conceptual Framework
- Is the theoretical framework easily linked with the problem, or does it
seem forced?
Not relevant.
- If a conceptual framework is used, are the concepts adequately defined,
and are the relationships among these concepts clearly identified?
The
conceptual framework used in this study is clearly identified. To determine
the validity of a course, the authors suggest shifting the focus from the
students to the course itself and to the students' subsequent performance in
future classes.
Step 4. Research Variables
- Are the independent and dependent variables operationally defined?
Traditional or online courses were two variables discussed within this study.
- Are any confounding variables present? If so, are they identified?
I
would humbly suggest that the confounding variables are ignored, if present at
all. For example, there is no indication as to the success of particular
students before taking certain courses. There is also no discussion as to
whether or not certain students or certain learning styles favored traditional
over online or vice versa.
Step 5. Hypotheses
- Are the hypotheses clear, testable, and specific?
One hypothesis was
discussed: One reason for a finding of no overall difference between course
delivery formats might be that the effectiveness of online instruction varies
with the department or discipline being studied.
- Does each hypothesis describe a predicted relationship between two or more
variables included in each hypothesis?
The one variable that appeared
relevant to this study concerned the disciplines of study.
- Do the hypotheses logically flow from the theoretical or conceptual
framework?
Although the hypothesis does not necessarily flow from the
conceptual framework, it does flow from the researchers' examination of the
records.
Step 6. Sampling
- Is the sample size adequate?
The sample size was determined by the
study itself and could not be varied within the given time period.
- Is the sample representative of the defined population?
The sample is
not so much representative of the population as it is the total population
fitting the particular criteria.
- Is the method for selection of the sample appropriate?
The authors
chose courses that were offered at the advanced level in a more traditional
format where the students had completed the prerequisite course online. Hence,
the guiding question: Do online courses prepare students for advanced study as
well as traditionally accepted forms of prerequisites?
- Is there any sampling bias in the chosen method?
No.
- Are the criteria for selecting the sample clearly identified?
Yes.
Step 7. Research Design
- Is the research design adequately described?
I am assuming that the
research design presented is the "parsimonious" model mentioned
earlier in the report.
- Is the design appropriate for the research problem?
Interestingly, I
enjoyed the analysis component of this report because the question was
clear-cut and each statistical test was explained. Therefore, I assume it was
an appropriate design for this particular research problem.
- Does the research design address issues related to the internal and
external validity of the study?
No.
Step 8. Data Collection Methods
- Are the data collection methods appropriate for the study?
The data
collection was of past records and so involved searching rather than
collecting new data.
- Are the data collection instruments described adequately?
Not
relevant.
- Do the measurement tools have reasonable validity and reliability?
I
am not aware of how the original data were collected.
Step 9. Data Analysis
- Is the results section clearly and logically organized?
Yes. The
results section centered around the statistical analysis itself.
- Is the type of analysis appropriate for the level of measurement for each
variable?
Yes.
- Are the tables and figures clear and understandable?
Yes. There are
two tables presented: (1) final grades and (2) tabular results by department.
- Is the statistical test the correct one for answering the research
question?
First, the frequencies were presented and, using Fisher's
Exact test of significance, a probability of .09 would be associated with the
distribution. Although there was no statistically significant difference, the
question of a statistical interaction needed to be tested. The statistical
interaction between the method and the discipline (relative advantage)
appeared to show the non-Management courses were positive and Management
courses negative. Two chi-squared analyses of the distributions of online
passes and fails were performed to test for an interaction between target
discipline and prerequisite format and, finally, a second method was used that
set the expected frequencies proportionately with the rates of passes and
fails.
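For readers unfamiliar with Fisher's Exact test, the following is a minimal
Python sketch of the two-sided version for a 2 x 2 table of passes and fails.
The counts used below are hypothetical and are NOT the data reported by
Dominguez and Ridley; the function name is likewise an assumption made for
illustration. The sketch sums the probabilities of every table that has the
same row and column totals and is no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def ways(x):
        # Number of arrangements that put x in the top-left cell
        # while keeping all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = ways(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Integer comparison avoids floating-point ties between tables.
    as_extreme = sum(
        ways(x) for x in range(lowest, highest + 1) if ways(x) <= observed
    )
    return as_extreme / comb(total, col1)


# Hypothetical counts (not from the article):
# rows = prerequisite format (online, traditional); columns = (pass, fail).
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")
```

A p-value above the conventional .05 threshold, such as the .09 reported in
the article, would be read as no statistically significant difference between
the two prerequisite formats.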
Step 10. Interpretation and Discussion of the Findings
- Does the investigator clearly distinguish between actual findings and
interpretations?
Yes. No interpretations of the results are offered in
the results section.
- Are the interpretations based on the data obtained?
Yes. In
particular, the authors noted possible explanations for the results shown in
the Management section.
- Are the findings discussed in relation to previous research and to the
conceptual/theoretical framework?
Not really, except to confirm that the
alternative model of evaluating courses will provide effective information to
administrators.
- Are unwarranted generalizations made beyond the study sample?
No. All
interpretations are clearly shown to be possibilities.
- Are the limitations of the results identified?
No.
- Are implications of the results discussed?
Yes. The authors
indicate that there appears to be no significant difference in results in
higher-level courses between students who completed the prerequisites in the
traditional manner and those who completed them online. Therefore,
institutions may continue to expand their online course offerings. However,
the authors do point out that some course material may not be conducive to
online management and that administrators should consider that option.
- Are recommendations for future research identified?
Yes. The authors
suggest that perhaps if a particular format is not suitable for a specific
subject, the expectations of the instructors might be influenced. For example,
if an online course provided less effective preparation for a target course in
the past, teachers might come to expect that result. The researchers suggest
that this phenomenon should be studied.
- Are the conclusions justified?
Yes.
Summary
While evaluating these 12 articles, the author was surprised at the lack of
clarity within several of the studies. For example, several studies (Desai,
2000; Welch, 2000) did not appear to link their discussion of the results with
the original research objectives. Other studies (De Beer, 1998; Eggen, 2000;
Lloyd, 1996; Ponsoda, 1999) did not appear to handle the data in a manner
conducive to successful statistical analysis. It was, in fact, disappointing to
see that several authors (Brownlee, 2001; De Beer, 1998; Desai, 2000; Lloyd,
1996) made broad-based generalizations not based on the results of their own
studies. Other studies appeared to excel in certain areas while skipping over
the details of the study, such as an adequate description of the design itself.
One problematic area appeared to be the sample size which, in many cases (Barab,
1999; De Beer, 1998; Eggen, 2000; Lloyd, 1996; Ponsoda, 1999; Retalis, 2000;
Welch, 2000), appeared far too small to have been of any value. As the author
mentioned in the introduction to this paper, one purpose was to evaluate both
qualitative and quantitative research studies. It was discomforting to see the
lack of specific text analysis in the qualitative studies (Barab, 1999;
Brownlee, 2001; Retalis, 2000; Welch, 2000). In fact, most of those studies were
said to be of a combination "qualitative and quantitative" design as
if to provide legitimacy to their own work. Another problematic area concerned
the validity and reliability measures of statistical tests. Most authors (De Beer,
1998; Ponsoda, 1999) never mentioned some of the critical terms that would have
helped to establish the reliability of the study as a whole.
Bibliography
Barab, S.A., Young, M.F., & Wang, J., (1999). The effects of navigational
and generative activities in hypertext learning on problem solving and
comprehension, International Journal of Instructional Media, 26(3),
283-310.
Brown, K.G., (Summer, 2001). Using computers to deliver training: Which
employees learn and why? Personnel Psychology, 54(2), 271-297.
Brownlee, J., Purdie, N., & Boulton-Lewis, G., (April, 2001). Changing
epistemological beliefs in pre-service teacher education students, Teaching
in Higher Education, 6(2), 247-269.
De Beer, M. & Visser, D., (March, 1998). Comparability of the
paper-and-pencil and computerized adaptive versions of the General Scholastic
Aptitude Test (GSAT) senior, South African Journal of Psychology, 28(1),
21-28.
Desai, M.S., (December, 2000). A field experiment: Instructor-based training
vs. computer-based training, Journal of Instructional Psychology, 27(4),
239-244.
Dominguez, P.S., & Ridley, D.R., (March, 2001). Assessing distance
education courses and discipline differences in their effectiveness, Journal
of Instructional Psychology, 28(1), 15-20.
Eggen, T.J.H.M. & Straetmans, G.J.J.M., (October, 2000). Computerized
adaptive testing for classifying examinees into three categories, Educational
and Psychological Measurement, 60(5), 713-734.
Lloyd, D., & Martin, J.G., (March, 1996). The introduction of
computer-based testing on an engineering technology course, Assessment &
Evaluation in Higher Education, 21(1), 83-91.
Ponsoda, V., Julio, O., Rodriguez, M.S., & Revuelta, J., (1999). The
effects of test difficulty manipulation in computerized adaptive testing and
self-adapted testing, Applied Measurement in Education, 12(2), 167-185.
Reeves, J., (August, 2000). Tracking the links between pupil attainment and
development planning, School Leadership & Management, 20(3), 315-333.
Retalis, S., Psaromiligkos, Y., & Avgeriou, P., (2000). Web engineering:
New discipline, new educational challenges, Information Services & Use,
20(2/3), 95-109.
Welch, M., (November/December, 2000). Descriptive analysis of team teaching
in two elementary classrooms: A formative experimental approach, Remedial
& Special Education, 21(6), 366-377.