Why was the study conducted? What hypotheses were being tested? Archives

WISC-V Stability Study

Find and read a peer-reviewed research journal article using intelligence or achievement testing in research and share what you learned from this article with your classmates. Specifically (and in your own words):

1. Why was the study conducted? What hypotheses were being tested?

2. What test(s) were used?

3. What findings were reported, and what conclusions were drawn?

Response (concise, in my own words)

1) Purpose & hypotheses. Watkins et al. (2021) examined the long-term temporal stability of WISC-V scores in a clinical outpatient sample because most published reliability evidence for the WISC-V focuses on short retest intervals or normative samples. The study tested whether WISC-V composite scores (e.g., Full-Scale IQ, index scores) and subtest scores remain sufficiently stable over a multi-year interval (mean ≈ 2.6 years) to support clinical decisions. Implicitly, the authors expected omnibus and broad indices to be more stable than individual subtests or within-person difference (profile/ipsative) measures.

2) Tests used. The researchers administered the ten primary WISC-V subtests on two occasions to 225 children/adolescents seen in an outpatient neuropsychology clinic. From those subtests they derived the five primary index scores and the Full-Scale IQ (FSIQ). Analyses included mean comparisons, test-retest correlations (stability coefficients), and measures of replication for intraindividual (idiographic) score patterns.

3) Findings & conclusions. Mean composite scores were relatively constant, but subtest stability was modest (average r ≈ .66). Only the Verbal Comprehension Index (VCI), Visual Spatial Index (VSI), and the FSIQ exceeded the commonly cited .80 threshold for long-term stability. Intraindividual difference scores and “profile scatter” showed poor replication across administrations (low kappa), indicating that observed strengths/weaknesses on subtests often did not recur at retest. The authors conclude that while the FSIQ and some broad indices may be defensible for nomothetic (between-person) comparisons over years, ipsative (person-relative) interpretations, such as treating a single subtest peak as a stable strength, are not reliable enough for confident clinical decision-making. Practically, clinicians should rely more on composite scores for long-term, high-stakes decisions and be cautious about using subtest or profile differences to guide eligibility or intervention without corroborating evidence or repeat assessment.
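To make the two statistics in this summary concrete, here is a minimal sketch in Python. It assumes the stability coefficient is an ordinary Pearson correlation between test and retest scores, and that replication of a strength/weakness flag is summarized with Cohen's kappa. All scores and flags below are invented for illustration and are not data from Watkins et al. (2021); the function names are my own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def cohens_kappa(first, second):
    """Cohen's kappa: agreement between two sets of labels, corrected for chance."""
    n = len(first)
    categories = set(first) | set(second)
    observed = sum(a == b for a, b in zip(first, second)) / n
    expected = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical FSIQ scores for eight children at test and retest (not study data).
fsiq_t1 = [85, 92, 100, 104, 110, 96, 120, 78]
fsiq_t2 = [88, 90, 103, 101, 113, 99, 116, 82]
stability = pearson_r(fsiq_t1, fsiq_t2)

# Hypothetical flags: was one subtest a relative strength at each occasion?
strength_t1 = [True, True, False, False, True, False, False, True]
strength_t2 = [True, False, False, True, False, False, True, True]
kappa = cohens_kappa(strength_t1, strength_t2)

print(f"stability r = {stability:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the composite score is highly stable, while the strength flag agrees across occasions no more often than chance would predict (kappa of zero). That is the same kind of contrast the authors describe between composite scores and ipsative profiles.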

Reference (APA 7)
Watkins, M. W., Canivez, G. L., Dombrowski, S. C., McGill, R. J., Pritchard, A. E., Holingue, C. B., & Jacobson, L. A. (2021). Long-term stability of Wechsler Intelligence Scale for Children–Fifth Edition scores in a clinical sample. Applied Neuropsychology: Child, 11(3), 422–428. https://doi.org/10.1080/21622965.2021.1875827

Intelligence Predicts Grades

1. Why was the study conducted? What hypotheses were being tested?

2. What test(s) were used?

3. What findings were reported, and what conclusions were drawn?

Answers / What I Learned

1. Why was the study conducted? What hypotheses were being tested?

The study I looked at is “The Predictive Validity of Four Intelligence Tests for School Grades” (Frontiers, 2017).

Purpose / motivation: The authors wanted to examine how well different intelligence tests can predict later school performance (grades). Over many decades, psychologists have assumed that higher intelligence should lead to better academic outcomes, but the strength of that prediction can vary depending on which intelligence test is used and what subject or grade is being predicted.
Hypotheses tested: The authors hypothesized that each intelligence test would significantly predict average school grades over time, and that some tests might predict specific subject grades (e.g. math or language) better than others. They also expected that longitudinal prediction (predicting grades several years later) would hold for at least some of the tests.

2. What test(s) were used?

They used four intelligence tests commonly used in German-speaking countries:

Intelligence and Development Scales (IDS)
Reynolds Intellectual Assessment Scales (RIAS)
Snijders-Oomen Nonverbal Intelligence Test (SON-R 6-40)
Wechsler Intelligence Scale for Children (WISC-IV)

These were administered to children at around age 9. The researchers then collected the children's school grades (average grades, plus specific grades in mathematics and language) about 3 years later.

So this was a longitudinal prediction design.

3. What findings were reported, and what conclusions were drawn?

Findings:

All four intelligence tests significantly predicted the average school grades measured 3 years later.
For specific subjects:
• The IDS and RIAS predicted both mathematics and language grades.
• The SON-R 6-40 predicted math grades.
• Interestingly, the WISC-IV did not show a significant association with later math or language grades when these were considered separately (though it did predict the overall average grade).
The sample for the 3-year follow-up was modest (54 children for whom longitudinal data were available).
The authors caution that because of this small follow-up sample, conclusions should be tentative.
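To see why a follow-up sample of 54 calls for caution, here is a rough sketch of how large a correlation has to be before it counts as statistically significant at that sample size. It assumes a simple two-tailed Pearson correlation test at alpha = .05 and uses an approximate critical t value of 2.007 for 52 degrees of freedom, taken from a standard t table. The study's actual analyses may have differed, and the function names are my own.

```python
from math import sqrt

N_FOLLOW_UP = 54      # follow-up sample size reported in the article
T_CRIT_DF52 = 2.007   # assumed: approximate two-tailed critical t, alpha = .05, df = 52

def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def min_significant_r(t_crit, n):
    """Smallest absolute correlation that reaches the critical t at sample size n."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

r_needed = min_significant_r(T_CRIT_DF52, N_FOLLOW_UP)
print(f"With n = {N_FOLLOW_UP}, |r| must be at least about {r_needed:.2f}")

# Hypothetical correlations (not values from the article) on either side of that cutoff.
for r in (0.25, 0.30):
    t = t_from_r(r, N_FOLLOW_UP)
    print(f"r = {r:.2f}: t = {t:.2f}, significant = {abs(t) > T_CRIT_DF52}")
```

Under these assumptions, a correlation of roughly .27 or larger is needed to reach significance with 54 children, so a weaker but real association with a single subject grade could easily go undetected. This is one possible reading of the non-significant WISC-IV results for math and language, not a conclusion stated by the authors.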

Conclusions:

Intelligence test scores have useful predictive validity for later academic performance (grades), especially when using tests like IDS, RIAS, and SON-R.
Some tests are better predictors for specific subjects than others.
The fact that WISC-IV failed to predict individual subject grades suggests that not all intelligence measures are equally good for all predictions.
In psychological practice (for guidance, placement, or interventions), intelligence tests can help anticipate academic difficulties or strengths, but one must interpret results carefully and in context.
Because of study limitations (small longitudinal sample, focus on German-speaking context), the results should be seen as preliminary evidence.

My reflections / what I learned generally

Intelligence tests are not perfect, but they do provide meaningful information about future academic success.
The choice of the test matters: some tests may be more predictive in certain domains (math, language) or contexts.
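Longitudinal designs (testing intelligence early, then measuring achievement later) are powerful because they establish temporal order: intelligence is measured before the grades it is used to predict, which a correlation at a single time point cannot show. They still do not prove causation on their own.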
Even with significant predictive power, intelligence is only one piece of the puzzle — many other factors (motivation, teaching quality, environment, effort) also influence achievement.
When interpreting test results, especially in educational settings, one must consider sample sizes, cultural context, and whether the test was standardized in a comparable population.
