Assessing efficacy of stuttering treatments
C. Thomas, P. Howell
Department of Psychology, University College London, Gower Street, London WC1E 6BT, UK
Received 20 September 2000; received in revised form 27 February 2001; accepted 15 June 2001
Efficacy has been defined as the extent to which a specific intervention, procedure, regimen, or service produces a beneficial result under ideally controlled conditions when administered or monitored by experts. Studies on efficacy can be divided into those that study methods of conducting treatment (i.e., treatment process research) and those that are concerned with the effects of treatments (i.e., treatment outcome research). This review covers both areas, emphasizes the former, and considers such key determinants of efficacy as measurement, treatment integrity, and design issues. A set of criteria is given, and a meta-analysis of whether studies published since 1993 meet these criteria is reported (incorporating some pragmatic and ethical considerations). The review ends by considering directions that warrant further investigation in the future.
Educational objectives: The reader will learn about and be able to describe (1) measurements appropriate for evaluating treatment efficacy studies; (2) how to evaluate reports of stuttering treatment programs; and (3) different designs used in treatment efficacy studies. © 2001 Elsevier Science Inc. All rights reserved.
Keywords: Efficacy; Treatment; Stuttering
1. Introduction

The treatment of stuttering has been described as a controversial and perplexing issue for speech-language pathologists (Ingham & Riley, 1998), and
recent concerns have been expressed about the absence of rigorous documentation regarding the efficacy of particular interventions (Ansel, 1993; Conture, 1996; Conture & Guitar, 1993; Cordes & Ingham, 1998; Starkweather, 1993). It has even been asserted that the state of stuttering treatment research, at least up to early 1996, was abysmal and that some leaders in the field appear to have abandoned basic scientific principles that are at the heart of any attempt to establish treatment efficacy (Cordes, 1998). Efficacy is the extent to which a specific intervention, procedure, regimen, or service produces a beneficial result under ideally controlled conditions when administered or monitored by experts (Last, 1983). In contrast, treatment effectiveness is the extent to which an intervention or treatment employed in the field does what it is intended to do for a specific population (Last, 1983). Treatment efficacy research can be characterized as an investigative tool for examining the effects of environmental variables (i.e., treatment) on organismic variables (i.e., communication behaviors). Moreover, it has been suggested that the beauty of efficacy research is its ability to address both theoretical and clinical questions simultaneously (Olswang, 1993).
The dawn of a behavioral orientation to stuttering treatment in the 1960s
introduced a set of principles and practices for determining treatment efficacy. This model was based primarily upon the quantification of the target of treatments, plus systematic evaluations of relevant behaviors across clinically important settings for meaningful periods of time (Ingham & Andrews, 1973; Kazdin, 1978) and did, to some extent, transcend theoretical orientations (Bloodstein, 1987). However, Schwartz (1976) published his account of "solving stuttering," with its accompanying claim that the disorder had been treated with an 89% success rate. Reaction throughout the field was "principally directed at a glaring absence of data-based therapy evaluation" (Ingham, 1993, p. 134).
Then, a second catalyst occurred in 1987, when Cooper claimed that "at least two out of every five adolescent and adult abnormally disfluent individuals are incurable stutterers" (Cooper, 1987, p. 381), which was attacked on similar grounds. It has since been suggested that the procedures recommended for evaluating the efficacy of stuttering treatment have become overwhelmingly complex, while at the same time, prevailing notions about the nature of stuttering have become increasingly biological (Ingham & Cordes, 1997). As a result, Ingham and Cordes (1997) claimed that even the most recent studies of stuttering treatment seem to have been conducted without evaluation procedures and that treatments are now being recommended with little or no empirical support. In support of this claim, Cordes (1998) reviewed 88 selected publications and reported that treatments that were most often recommended were not treatments that had been the most comprehensively researched.
Systematic assessments of the efficacy of treatments utilized by a profession are "essential to the maintenance of the clinical integrity of any profession" (Curlee, 1993, p. 328). Hence, the purpose of this article is to identify some of the fundamental issues that should form the bases of evaluating treatment efficacy for
stuttering. As Purser (1987) noted, evaluations of treatment efficacy involve both treatment process research (i.e., study of methods of conducting treatment) and treatment outcome research (i.e., study of the effects of treatments). This review, therefore, examines both aspects, with particular emphasis on the former, as well as the issues that need to be resolved or addressed before treatment efficacy can be assessed.
Our review endeavors to provide an impartial look at studies of efficacy across
treatments of stuttering that will complement recent discussions in the literature in this area (Yaruss, 1998) and in wider aspects of health care services (Kazdin & Kendall, 1998). We do not intend to promote any particular theory or to evaluate in detail programs of treatment. These objectives rule out, respectively, our consideration of Steps 2, 3, 6, and 7 of Kazdin and Kendall's (1998) list of "Steps Toward Developing Effective Treatments." This review, instead, critically addresses: (i) measurement issues, (ii) treatment integrity, and (iii) design issues, that correspond to Kazdin and Kendall's conceptualization of dysfunction (Step 1), specification of treatment (Step 4), and tests of treatment outcome (Step 5). Yaruss (1998) provided a framework for describing the etiology and range of problems that a speaker may experience using language that other health professionals employ. However, this does not readily lend itself to assessment of the activities of professionals engaged in delivering treatments, which fall under topics (ii) and (iii).
2. Clinical issues in the measurement of stuttering
It was once thought that stuttering was a comparatively simple disorder to
measure (Ingham & Andrews, 1973), and the counting of moments of stuttering, which began in the 1930s, not only operationalized the measurement of stuttering but brought with it the rigor of scientific inquiry. There is general agreement in the literature, as well as considerable content validity, supporting the notion that reductions in stuttering frequency and severity are associated with effective treatment outcome (Conture & Guitar, 1993; Ingham & Costello, 1984; Yairi, 1993, 1997). However, growing concern has also been expressed about the reliability and validity of clinic-based perceptual measures of stuttering, stemming primarily from problems that independent observers had in agreeing satisfactorily on the loci of stutters, thereby threatening the validity of a study's results (see Cooper, 1990; Ingham, 1990). This problem was not resolved even when observers were given a definition of stuttering (Curlee, 1981; Martin & Haroldson, 1981; Young, 1975a), were required to repeat their judgments over several sessions (Cordes, Ingham, Frank, & Ingham, 1992; Young, 1975b), or listened to slowed nondistorted recordings of stuttered speech (Kroll & O'Keefe, 1985). The implications of such findings have been described as undermining measurement of treatment success with stuttering (Bloodstein, 1990; Cooper, 1990; Ingham, 1990).
In addition, Kully and Boberg (1988) had 10 audio recordings of speech, eight of stuttered and two of nonstuttered speech, evaluated at 10 clinics throughout the world, including Australia, Canada, England, and the US. They reported not only large interclinic discrepancies (e.g., the percentage of syllables stuttered ranged from 3.80% SS to 13.70% SS) but also large discrepancies in % SS between samples of the two normal speakers. Such high variability in stuttering measures called into question the value of data that depict outcome evaluations; moreover, the fact that some clinics rated fluent speakers with relatively high % SS (e.g., one clinic scored a fluent speaker at 4.79% SS) raised further concerns about the clinical significance of posttreatment measures of stuttering reduction. These findings were later replicated by Ingham and Cordes (1992), and a study by Ham (1989) also reported large differences in the ways that clinical researchers quantify stuttering events.
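The % SS metric at the centre of these comparisons is straightforward to compute, which makes the size of the reported discrepancies all the more striking: the disagreement lies in judging which syllables are stuttered, not in the arithmetic. The sketch below is a minimal illustration of the calculation only; the function name and the idea of passing per-syllable judgments are our own illustrative assumptions, not a procedure taken from any of the studies cited.

```python
def percent_syllables_stuttered(judgments):
    """Compute %SS from a list of per-syllable judgments.

    `judgments` is a sequence of booleans, one per spoken syllable,
    where True marks a syllable judged to contain a stutter.
    """
    total = len(judgments)
    if total == 0:
        raise ValueError("speech sample contains no syllables")
    stuttered = sum(1 for j in judgments if j)
    return 100.0 * stuttered / total


# Example: 12 syllables judged stuttered in a 300-syllable sample -> 4.0 %SS,
# a value in the same range as the 4.79% SS one clinic assigned to a fluent speaker.
sample = [True] * 12 + [False] * 288
print(round(percent_syllables_stuttered(sample), 2))  # 4.0
```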
The identification of stuttering behaviors has an established literature of its
own (e.g., see Costello & Ingham, 1984; Smith, 1990; Starkweather, 1987 for further discussion). It is apparent that the reliability of clinical measures of stuttering provides a shaky foundation for evaluating treatment efficacy (Ingham, 1990). Nevertheless, some authors have proposed that such measures do not necessarily pose a serious threat to the assessment of treatment efficacy (e.g., Starkweather, 1993). Kully and Boberg clearly showed that different clinicians employ different protocols in counting stutters, but the purpose of collecting such speech measures in clinical practice is to document trends in clients' speech performance before, during, and after treatment (Onslow, 1996). Kully and Boberg's data showed disparate percentages of stuttered syllables; however, their data also indicated that clinicians generally identify the same trends, which suggests that several measurement issues may play a more critical role in evaluations of treatment efficacy.
Even though interjudge reliability is an important issue, intrajudge reliability
may be more important in clinical practice because measures of stuttering need to be internally consistent from session to session across extended periods of time. Packman, Ingham, and Onslow (1993) examined seven clinicians who worked in the same clinic and found that the number of stutters counted in various speech samples differed when the clinicians re-counted the same samples at a later time; however, the relatively high levels of intrajudge agreement indicated that the clinicians were capable of making consistent clinical measures. As might be expected, the most valid measures of stuttering are those based on the perceptual judgments of reliable observers who are well acquainted with the clinical signs of stuttering, and systematic comparisons of such measures across situations and time are appropriate for evaluating treatment efficacy (Curlee, 1993). In addition, Cordes et al. (1992) reported that a judge's amount of experience with stuttering and ability to repeatedly review recorded speech samples play an important part in determining levels of agreement.
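Agreement of the kind examined by Packman et al. (1993) and Cordes et al. (1992) can be indexed in several ways. The sketch below shows one simple option, point-by-point percent agreement over binary judgments of fixed speech intervals; it is an illustration under our own assumptions, not the time-interval procedure reported by Cordes et al. (1992).

```python
def percent_agreement(judge_a, judge_b):
    """Point-by-point agreement between two sets of binary stutter judgments.

    Both arguments are equal-length sequences of booleans, one entry per
    speech interval (True = interval judged to contain stuttering).
    Returns the percentage of intervals on which the two sets agree.
    """
    if len(judge_a) != len(judge_b):
        raise ValueError("judgment sequences must cover the same intervals")
    agreements = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * agreements / len(judge_a)


# Interjudge reliability: two clinicians rating the same sample once.
clinician_1 = [True, False, False, True, False, False, True, False]
clinician_2 = [True, False, True, True, False, False, False, False]
print(percent_agreement(clinician_1, clinician_2))  # 75.0

# Intrajudge reliability: one clinician re-rating the same sample later.
clinician_1_repeat = [True, False, False, True, False, True, True, False]
print(percent_agreement(clinician_1, clinician_1_repeat))  # 87.5
```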
Due to the well-documented variability of stuttering within subjects, speech
samples should ideally be obtained under multiple conditions and on multiple
occasions (Conture, 1997; Conture & Guitar, 1993; Yaruss, 1997). This can be particularly important for young children, as stuttering has been reported to fluctuate greatly over time and sometimes cease entirely (Ingham & Riley, 1998). Druce, Debney, and Byrt's (1997) study of 6- to 8-year-old children obtained two pretreatment measures during conversational interactions with a family member and with an unknown person to represent low- and high-stress speaking situations, respectively. Boberg and Sawyer (1977) reported that follow-up measures are likely to be biased if collected in the same clinical environment in which treatment was administered because clients were observed to stutter more when conversing outside the clinical setting than when speaking with a stranger in the familiar surroundings of the clinic.
It has also been recommended that speech measures be collected without
clients' knowledge that their speech is being evaluated, so that they do not react to being assessed and try to create a favorable outcome. Ingham (1972) compared covert and overt assessments and found that stutterers generally did speak more fluently when they were aware that their speech was being evaluated. However, these findings have been contested by studies (Andrews & Craig, 1982; Howie, Tanner, & Andrews, 1981; Howie, Woods, & Andrews, 1982) which have found that such behavior occurs only in some individuals. Perhaps more important is that speech samples be obtained in clients' natural environments, as ecological validity is uncertain when only measures obtained in the clinic are used (Conture & Guitar, 1993; Costello & Ingham, 1984; Starkweather, 1993). In an experimental trial of an operant treatment of early stuttering, Onslow, Andrews, and Lincoln (1994) collected (a) home recordings of children speaking to family members in the family home; (b) away-from-home recordings of children speaking to nonfamily members in the homes of family friends and relatives; (c) covert recordings of children speaking to family members at home without the children's knowledge; and (d) recordings of children conversing with an investigator. Without a broad collection of measures in various settings, investigators cannot be certain that clinic-collected data can be generalized to many outside-of-clinic speaking situations. These authors were able to use the mean/median posttreatment speech measures obtained in everyday speaking situations of these children to show significant decreases from those collected pretreatment in the same situations.
Many types of dependent variables have been measured, in addition to the
percentage of syllables stuttered and syllable counts, and have played an increasingly important role in evaluating treatment efficacy. Profiles of treatment outcomes should include stuttering severity, speech rate, and speech naturalness. Various methods of assessing stuttering severity have been described (Conture, 1997; Costello & Ingham, 1984), and severity can be influenced by a number of factors such as parental concerns, teasing from peers, and a child's own frustrations. Furthermore, if a treatment employs techniques that aim to alter speech rate (e.g., prolonged speech treatments), speech rate measures are critical in evaluating outcomes because such treatments may result in unnaturally slow or monotone speech.
Some researchers believe that assessments of treatment efficacy should also
include measures of speech naturalness (Costello & Ingham, 1984; Martin, Haroldson, & Triden, 1984; Starkweather, 1993), which can be affected by speech rate, inflection, articulation, resonance, loudness, vocabulary, and sentence structure (Ingham & Riley, 1998). Because so many treatments incorporate changes in clients' speech patterns to induce fluency, measures of speech naturalness allow clinicians to ensure that spontaneous, normal-sounding speech has not been sacrificed or overlooked by efforts to objectify the efficacy of a given treatment. Druce et al. (1997) collected measures of % SS, speech rate, speech naturalness, and subjective ratings of stuttering severity in evaluating an intensive treatment program for 15 young children who stuttered. This combination of measures enabled them to conclude that the reduction in % SS was consistent with improvements in both severity and naturalness ratings, thereby offering complementary support of the children's improved fluency.
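A multi-measure outcome profile of the sort Druce et al. (1997) assembled can be represented very simply. The sketch below is a hypothetical record combining the measures discussed above; the field names, the 9-point naturalness convention, and all numerical values are invented for illustration and are not data from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class OutcomeProfile:
    """One assessment occasion for one client (illustrative fields only)."""
    occasion: str                # e.g., "pretreatment", "12-month follow-up"
    setting: str                 # e.g., "clinic", "home", "telephone"
    percent_ss: float            # percent syllables stuttered
    syllables_per_minute: float  # speech rate
    naturalness: int             # 1 (highly natural) to 9 (highly unnatural)
    severity_rating: int         # clinician/parent severity rating, e.g., 1-7


pre = OutcomeProfile("pretreatment", "clinic", 9.6, 140.0, 6, 5)
post = OutcomeProfile("12-month follow-up", "home", 1.2, 188.0, 3, 2)

# A reduction in %SS is more convincing when naturalness and severity move
# in the same direction, the pattern Druce et al. (1997) reported.
improved = (post.percent_ss < pre.percent_ss
            and post.naturalness < pre.naturalness
            and post.severity_rating < pre.severity_rating)
print(improved)  # True
```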
Issues of reliability are important in treatment efficacy studies because
stuttering frequency, severity, speech rate, and speech naturalness are filtered through what Ingham and Riley (1998) described as "the complexities of human perception," which are subject more often to error and bias than are measures collected with objective instruments. This again raises the issue of adequate inter- or intrajudge reliability and the need to report such key variables as the level of training and qualifications of clinicians and the procedures used to ensure independent judgments (Cordes, 1994; Cordes et al., 1992; Lewis, 1994) if the efficacy of treatments is to be critically examined.
One variable, which was noted as underinvestigated elsewhere (Woods, Fuqua,
& Waltz, 1997), is clients' use of avoidance behaviors to diminish stuttering severity, which may be particularly relevant to some operant-based treatments. For example, clients may undergo avoidance conditioning whenever they perform certain behaviors to avoid stuttering (Bandura, 1969). Woods et al. (1997) reported that a 6-year-old boy's frequent replies of "I don't know" were used to avoid more extensive answers to conversational questions. Starkweather (1993) cited such behaviors as stalling for time, forcing words out, changing words altogether, or losing eye contact as ways to minimize the aversive experience of stuttering. If such avoidance behaviors displace stuttering, they may limit or prevent effective application of treatment as well as limit assessment of therapeutic efficacy. Future research should include assessment of avoidance behaviors through functional analyses of avoidance responses, which can be accomplished by monitoring the occurrence of various avoidance responses in stuttering-prone situations and during treatment (Woods et al., 1997) so that avoidance responses can be identified and procedures for their elimination introduced.
Conture and Guitar (1993) argued that neither the short-term, medium-term,
nor long-term efficacy of therapy can be documented without objective measures. Subjective measures from experimenters, people who stutter, family, or friends are inadequate on their own and may often be colored by expectations, hope, etc. Bloodstein (1995) stated that variables which can affect subjective judgments
may be subtle and difficult to predict. For example, Lanyon, Lanyon, and Goldsworthy (1979) found that clinicians' predictions of how successful a person who stutters would be in mastering a biofeedback treatment procedure were significantly related to the clients' eagerness to present themselves in a favorable light, as indicated by K scores on the MMPI. Findings also revealed that the K scores of persons who stutter had little relationship to objective measures of their progress. Substantial individual variation is often present at onset and in the development of stuttering, which encourages the development of individualized treatment programs. However, this may mislead clinicians to believe that a client has achieved more than he/she has. Starkweather (1993) claimed that the only way to control this problem is to maintain clear identification and measurement of treatment goals. Research has been unable to provide a conclusive demonstration that one method of counting stuttering is more valid than another (Ingham & Riley, 1998). Therefore, studies of treatment efficacy must describe how stutters were identified and recorded, the clinicians who made the judgments, their experience, and the reliability of their counts.
3. Treatment integrity

Another important issue that needs to be addressed is whether a clinician administers an intervention correctly because this determines if the treatment that was administered is the same as that for which efficacy data are reported (Conture & Guitar, 1993). The call for high-quality, detailed descriptions in published reports of the specific treatments, procedures, and measures employed (Ingham & Riley, 1998) to support the reliability and validity of efficacy data is the hallmark of science that would ensure that others applying the same procedures would obtain comparable treatment effects. Siegel (1990) has argued that replication is the most stringent test of reliability and a means of extending the external validity of research findings (Ventry & Schiavetti, 1986). Similarly, Meyers (1990, p. 178) emphasized the importance of replication in noting that "Replication is worth 1000 t tests."
Onslow (1992) critically examined the literature on stuttering intervention and
found that only a few of the many studies conducted have been replicated even once, over a period of 20 years. Muma's (1993) survey of one fourth of the studies published in the Journal of Speech and Hearing Disorders and Journal of Speech and Hearing Research over a decade (1979–1989) led him to claim that Type I and II errors likely account for 50–250 false findings in the 1712 studies surveyed and could be misdirecting treatment in the field. Only 12 direct replications were found, in contrast to the unknown number of false findings estimated by Muma, revealing an urgent need for more replications in this area. Replications that yield conflicting results raise considerable questions about which study is to be believed, but Muma pointed out that different outcomes could be obtained because of different subject samples, different performance samples, invalid measures, variations in procedures, and inappropriate data
analyses. For example, Onslow, Adams, and Ingham (1992) may have failed to replicate the results of Martin et al. (1984) for a number of these reasons. In contrast, if replications yield comparable results, such as Vihman and Greenlee's (1987) replication of Grunwell's (1981) work on children's phonological development, the substantive base of the field is extended. Researchers' replications of the work of others not only increase their own knowledge but may also expand their research interests and test new hypotheses.
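The order of magnitude of Muma's estimate can be checked with a back-of-the-envelope calculation. The sketch below is our own illustration, assuming a single key significance test per study and the conventional α of .05; it is not Muma's (1993) actual procedure, which also took Type II errors into account.

```python
# Back-of-the-envelope check on the plausibility of Muma's (1993) range of
# 50-250 false findings among 1712 studies (illustrative assumptions only).

n_studies = 1712
alpha = 0.05          # nominal Type I error rate per test

# If every study reported one key test and no genuine effects existed,
# the expected number of spurious "significant" results would be:
expected_type_i = n_studies * alpha
print(round(expected_type_i))  # ~86, inside the 50-250 range

# Type II errors (missed real effects) come on top of this and depend on
# statistical power, which is why the range quoted by Muma is wider.
```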
The implications for treatment efficacy research are highly significant: replications would ascertain the likelihood of false findings, or would extend the generalizability of studies with relatively few participants, which applies to the large proportion of stuttering treatment research that has involved single-subject research designs. A number of researchers have commented that replication is ignored too often in treatment efficacy research (e.g., Attanasio, 1994; Meline & Schmidt, 1997; Muma, 1993; Onslow, 1992). Moreover, a critical review of children's treatment success by Craig, Chang, and Hancock (1992) found that a considerable number of treatment studies, ranging from response-contingent to prolonged speech methods, lacked replication, thereby reducing confidence in the success of such treatments.
Another important issue has been described as the assurance of treatment fidelity (Yeaton & Sechrest, 1981). Ingham and Riley (1998) maintained that information on the training and supervision of clinicians is important but that empirical evidence that a treatment was, in fact, administered correctly is much more powerful because it focuses on the reliability of administration of the independent variables. In recent years, parents' presence in therapy sessions, as observers or as active agents in home programs, has significantly increased with the aim of facilitating generalization and maintenance of treatment effects (Felsenfeld, 1997). As a result, several findings have been identified that may have implications for treatment efficacy. Studies of fathers' and mothers' paralinguistic behaviors have shown that fathers tend to talk more during parent–child interactions (Kelly, 1993; Kelly & Conture, 1992; Schulze, 1991) and that reductions in parents' speech rates are correlated (r = .47) with reductions in children's disfluencies (Starkweather & Gottwald, 1993). Several other variables have also been found to correlate with children's disfluencies, including interruptions (Rustin & Cook, 1983), methods of discipline (Prins, 1983), negativity, and excessive questioning (Fosnot, 1993). A key variable that can affect studies of the efficacy of various treatment approaches is the accuracy with which clinicians deliver prescribed treatments. Ryan and Van Kirk Ryan (1983) described several discrepancies in clinicians' behavior, even after training, which included failing to teach slow, prolonged speech patterns during initial stages of treatment, and undercounting or failing to count stutters in later stages. As Ingham and Riley (1998) pointed out, if a treatment efficacy study does not include evidence of treatment fidelity in all aspects of administration, readers of the study cannot be sure that the results reported are the product of the treatment applied.
4. Design issues

Treatment efficacy research requires designs which can establish that the treatment effects observed are clearly the product of the treatments applied. The advantages and disadvantages of the different designs used in stuttering treatment research have an extensive literature base (e.g., Barlow & Hersen, 1984; Ingham & Riley, 1998; Schiavetti & Metz, 1997); therefore, this discussion focuses on factors or components within designs that are critical for evaluating treatment efficacy. It should be noted, however, that single-subject experiments are suitable for demonstrating overall effects of a given treatment and differential effects of individual components of a treatment (e.g., Costello, 1975). Group studies are best done once single-subject studies have convincingly demonstrated that a treatment produces desirable clinical effects (Ingham & Riley, 1998). Although effect-size statistics (Meline & Schmidt, 1997; Young, 1993), which estimate the magnitude of statistically significant differences, have improved the clinical relevance of group data analyses, this is still a less sensitive demonstration of treatment efficacy than are findings from single-subject studies.
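As an illustration of the effect-size statistics referred to above, the sketch below computes Cohen's d for a hypothetical comparison of posttreatment % SS between a treated group and an untreated group. The data are invented, and the calculation is offered only as an example of the kind of magnitude estimate that supplements a significance test.

```python
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (((na - 1) * stdev(group_a) ** 2 +
                   (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical posttreatment %SS values.
control = [8.1, 6.9, 9.4, 7.6, 8.8, 7.2]
treated = [2.3, 3.1, 1.8, 2.7, 3.6, 2.2]

d = cohens_d(control, treated)
print(round(d, 2))  # a large effect by conventional benchmarks (d > 0.8)
```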
An efficient way of illustrating "good" and "bad" designs is to examine key criticisms that have been directed toward various therapy approaches. Traditional approaches to treating stuttering, especially for children who stutter, involve parent counseling. These approaches attempt to enhance fluency indirectly through manipulations aimed at improving parent–child relationships (Bloodstein, 1987). Such manipulations may include play therapy, reducing anxiety, increasing confidence, or parental modeling of communication behaviors (Gregory & Hill, 1980). Most reports of this work are anecdotal or present only case history evidence of a treatment's success, and the lack of objective evidence severely limits conclusions that can be drawn about the efficacy of such treatment. Response-contingent methods, which are viewed as scientifically rigorous, have attracted their share of criticisms of the conclusions drawn about treatment success. For example, Manning, Trutna, and Shaw (1976) attempted to determine if tangible forms of rewards were more effective than verbal rewards in reducing stuttering. Both forms of reward appeared to be successful, but their unique contributions could not be measured because treatment involved a number of therapy procedures. As a result, the efficacy of individual treatment components could not be isolated and assessed. Response-contingent procedures have also received substantial criticism for small sample sizes, lack of control groups (Martin, Kuhl, & Haroldson, 1972; Reed & Godden, 1977), poor external validity and lack of replication (Costello, 1975), and failure to monitor speech rate and progress in the long term (Costello, 1975; Onslow, Costa, & Rue, 1990).
The Gradual Increase in Length and Complexity of Utterance (GILCU)
program is another treatment reported to have significant clinical success (Ryan, 1971). Ryan and Van Kirk Ryan (1983) compared the effectiveness of four treatment programs (i.e., programmed traditional, delayed auditory feedback,
time-out contingency, and GILCU) with 16 school-age children who stuttered. All four treatments were reported to be effective in reducing stuttering, but only 4 of 16 children completed follow-up stages. The study's large dropout rate and its lack of a control group (which avoided raising ethical concerns about withholding treatment) raised questions about the success of these treatments. However, this study does illustrate some good design principles. For example, treatment effects were assessed not only in the clinic but also in the children's classroom across the four treatment groups. In addition, replications of the GILCU program have shown that it can be a successful treatment for stuttering (Rustin, Ryan, & Ryan, 1987).
Treatment variants of prolonged speech such as smooth speech have also
produced high success rates, but many studies failed to obtain long-term outcome data (e.g., Debney & Druce, 1987), failed to use case history reporting (e.g., Casteel & McMahon, 1978), and lacked matched control groups and replication support (e.g., Cooper, 1987; Culp, 1984). Regulated breathing techniques have also come under criticism for using a number of treatments in combination. Azrin and Nunn (1974) used a mixture of relaxation therapies, self-awareness of stuttering, and regulated breathing techniques, together with parent and social support. Although almost total elimination of stuttering was reported, there were no control groups and only subjective measures of outcome. Also, Ladouceur and Martineau (1982) reported that regulated breathing was more effective in a combination treatment program than as a single treatment procedure. Several other approaches to stuttering therapy have fallen into similar design traps, which has led to questions about the treatments' efficacy (see Craig et al., 1992).
Whatever treatment is applied, long-term monitoring of clients, especially
children who stutter, should not be ignored. Because of the high proportion of children who evidence spontaneous remissions of stuttering, plus the high dropout rates in studies, treatment success is difficult to measure. Furthermore, Andrews and Harvey (1981) claimed that regression to the mean invalidates findings of intervention studies that do not establish stable pretreatment baselines or use adequate randomly assigned or matched untreated control groups. Moreover, Hanna and Owen (1977) reported that stutterers and their parents have a tendency to seek help when stuttering appears to be at its worst. After treatment is sought and before it begins, there appears to be a spontaneous return of symptoms to their average level, which may reflect nonspecific benefits of having sought help. Andrews and Harvey (1981) described six studies that assessed stuttering 2 months prior to treatment. All six showed a trend (i.e., differences were not statistically significant) toward less stuttering on the second assessment. Nevertheless, the authors concluded that study designs should allow time for regression of stuttering to mean severity levels in pre- to posttreatment outcome designs to avoid inflating the magnitude of treatment effects and confounding estimates of improvement that are due to therapy. They suggested collecting time series data until a stable baseline is achieved or holding people who stutter on a waiting list for at least 3 months. As noted earlier, speech
samples assessing treatment outcomes should be obtained in nontreatment settings (Ingham & Riley, 1998).
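Andrews and Harvey's recommendation to collect time series data until a stable baseline is reached can be operationalized in many ways. The sketch below is one minimal, hypothetical rule: fit a least-squares trend to repeated pretreatment % SS probes and treat the baseline as stable only if the slope is near zero and the probes are not too variable. The thresholds are arbitrary placeholders rather than values taken from the literature.

```python
from statistics import mean, pstdev


def baseline_is_stable(probes, max_slope=0.2, max_cv=0.25):
    """Crude stability check on repeated pretreatment %SS probes.

    `probes` are %SS values from successive pretreatment sessions.
    The baseline is called stable if the least-squares slope (in %SS per
    session) is small and the coefficient of variation is modest.
    Thresholds are illustrative placeholders only.
    """
    n = len(probes)
    if n < 3:
        return False  # too few probes to judge stability
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(probes)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, probes))
             / sum((x - x_bar) ** 2 for x in xs))
    cv = pstdev(probes) / y_bar if y_bar else float("inf")
    return abs(slope) <= max_slope and cv <= max_cv


# Falling %SS across pretreatment weeks (regression toward the mean?)
print(baseline_is_stable([9.5, 8.1, 6.6, 5.2]))   # False: clear downward trend
# Fluctuating but trend-free baseline
print(baseline_is_stable([6.8, 7.4, 6.5, 7.1]))   # True
```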
Another issue is how to control for spontaneous recovery in young children, which occurs in a relatively large proportion of them within the first year of onset and presents a serious complication for treatment efficacy studies of preschool-age children (Andrews & Harris, 1964; Bloodstein, 1995; Yairi & Ambrose, 1992). This undermines confidence in the findings reported by treatment efficacy studies, as stuttering may disappear irrespective of any treatment received. Curlee and Yairi (1997) argued that a control group is essential in such cases so that a treatment group would have to "beat the odds" to provide convincing evidence of treatment effects (Ingham & Riley, 1998). It has been further suggested that groups should be matched on a number of key variables that may be predictive of spontaneous recovery, such as duration of stuttering, age at onset, gender, and family history of stuttering recoveries (Curlee & Yairi, 1997; Yairi, Ambrose, Paden, & Throneburg, 1996). There is, however, an ethical question that needs to be addressed: is it unfair to assign young people who stutter to untreated control groups, denying them treatment? Some parents whose children were in control groups opted for their treatment (Onslow et al., 1994).
Given the nature and number of therapy approaches employed for stuttering,
it is not surprising that no one design has emerged as the best for assessing treatment efficacy. It is unfortunate that few firm conclusions can be drawn about most treatments because there has been little attention paid to assessing long-term outcomes, a reliance on single-subject designs without replications or larger numbers, and group research lacking adequate controls, all of which may create false impressions and beliefs. Although incorporating adequate controls, pre- and posttreatment measures, and many of the features noted earlier in this review appears critical for improving studies of treatment efficacy, investigators face a number of challenges, not the least of which is balancing ethical with methodological issues. Nevertheless, long-term assessments are essential (Starkweather, 1993). Onslow (1996) sets the minimum posttreatment interval at one year, with some advocating follow-up assessments of at least 2 to 5 years before confidence can be placed in a treatment's success (Bloodstein, 1995; Craig et al., 1992).
5. Criteria for assessing efficacy studies: comparison with Moscicki (1993) and a meta-analysis of studies from that date
The issue of balancing ethical and methodological considerations is picked up
in this section. A previous review on efficacy research in the Journal of Fluency Disorders (Moscicki, 1993) presented a list of criteria that studies should meet:
(a) Careful attention should be paid to the selection and representativeness of
participants, and estimates of the reliability and validity of all measures should be provided;
(b) Sample sizes must be sufficiently large to show statistically significant
differences between experimental and control groups;
(c) Studies should have clear operational definitions of outcomes, with
careful attention given to the instrumentation used in assessing and classifying outcomes;
(d) Treatment must be provided using a standard treatment protocol by
clinicians who have received standardized training in administering the procedures;
(e) Administration of treatment should include procedures for monitoring adherence to the treatment protocol by clinicians and by study participants;
(f) Conducting pilot studies is essential to resolving design difficulties and
other issues prior to initiation of full-scale experimental studies;
(g) Follow-up assessments must be of sufficient duration to determine the long-term effects of treatment;
(h) Potential sources of bias need to be anticipated and accounted for in the
study’s design and by analytic models; and
(i) Data analysis must be appropriate to the study’s design.
There are some differences of emphasis between Moscicki’s methodological
criteria and those suggested in this review as a consequence of differences in focus. First, Moscicki's remarks are directed specifically at randomized control designs. Second, her criteria are not divided into those pertaining to measurement, integrity, and design. Third, we believed it would be unfair to use her criteria in evaluating published efficacy studies. After 8 years, however, it did seem appropriate to see if some efficacy studies had met at least some of these criteria. An evaluation of such studies will employ the criteria we propose, which include both pragmatic and ethical factors but still overlap those of Moscicki to a large degree. As Robinson Crusoe learned, there is no point in building something, a boat in his case, of such quality that it cannot be used in practice (i.e., the weight of Robinson's boat prevented it being moved into the sea). On the other hand, a boat that is too flimsy is a danger to its passengers. The real question is how to strike the right balance between the ideal and the practical.
As will be seen, there is agreement with Moscicki on theoretical perspective as
well as broad agreement on criteria pertaining to efficacy and practice. However, some of Moscicki's criteria involve requirements that are rarely met in empirical studies, which can sometimes be justified on pragmatic or ethical grounds. An example is "representativeness" in criterion (a), which is often ignored when speakers' frequency of stuttering must be high enough to allow the effects of a treatment to be shown (e.g., Ryan & Van Kirk Ryan, 1995). Few studies involve control groups because of concerns about the ethics of denying treatment to certain speakers who stutter. Although Hancock et al.'s (1998) study included a control group, control subjects were not prohibited from treatment in the long term, which prevented the study from obtaining long-term control data.
Efficacy studies are costly to perform, and there are alternative ways of obtaining the information that pilot studies provide, such as single-subject experimental designs (criterion f). Our criteria are not specific to randomized control designs, attempt to recognize the ethical and pragmatic constraints, and are organized into three categories: measurement, treatment integrity, and design. Correspondences with Moscicki's criteria are designated at the end of each of the following criteria by the letter from her list:
(1) Clear identification and measurement of treatment goals must be
provided, for example, by checks on the administration of the treatment and/or detailed descriptions of specific treatments. Other researchers performing the treatments can then check and see whether they obtain comparable effects. Assessment of treatment outcome should at least include reduction in stuttering frequency and preferably the influence of treatment on severity [c].
(2) Measures must be reliable and valid. Methods for assessing reliability
and validity need to be evaluated separately and reported in studies [a].
(3) Sampling and satisfactory analysis of speech measures (which is
essential for reliability and consistency). Although it is often recommended that a range of samples from different contexts be obtained (see number 13, below), this may be impractical at times, and fewer data analysed properly are more valuable than a large data set analysed badly [h, i].
(4) Accurate data analysis is essential. Valid data have little or no value
when analysed inappropriately [h, i].
(5) Adequate supervision and training should be provided to clinicians to
ensure the reliability of administration of the treatment [d].
(6) Empirical evidence should be collected to establish that a treatment was
administered properly, as diverse administrations of treatment can affect outcome. This criterion assumes that treatment procedures were properly described [e]. (Note: Subjects' adherence, as Moscicki mentions, can be checked only if subjects are self-administering treatments.)
(7) Adequate sample size is necessary. Few current studies include effect-size statistics, which needs to be remedied in the near future. Single-subject experiments may be appropriate for individualized treatments. Group studies may best be done after single-subject experiments demonstrate that a procedure produces desirable effects [b].
(8) Control groups need to be appropriately matched on key variables
(e.g., factors known to be associated with spontaneous recovery) [b].
(9) Potential confounding factors in the treatment design must be
identified, monitored, and reduced or eliminated when possible. Such factors can gravely affect the results. Treatment programs need to be carefully examined in terms of the impact of different rewards and punishments, avoidance behaviors, the impact of attrition on outcome, and the effects of individual treatments in combined treatment programs [d, e, h, i].
(10) Follow-up assessments should be conducted for at least a year after
treatment, especially for children. If treatment is unsuccessful, however, it may be unethical to hinder a client from trying other treatments in the long term (Hancock et al., 1998) [g].
(11) Stable, pretreatment baselines or matched or randomly assigned control
group data are needed. If the need for treatment is believed to be urgent, collecting baseline data may not be ethical.
(12) Measures of intrajudge reliability should be collected, which should
show higher reliability than interjudge measures. On occasion, it may not be possible or practical to have the same judge, for example, when there are larger samples. Single-subject studies should employ intrajudge assessments, unless more than one judge participates in the study. If interjudge measures are used, it is important to focus on the trends or patterns of measures, rather than specific numbers.
(13) Speech assessments should be obtained under multiple conditions and
on various occasions. This may not be feasible if particular equipment can be used only in laboratory settings or if parents are not able to bring a child to the clinic, necessitating that clinicians visit a child at home.
This review supports the need for all but one of Moscicki’s criteria (i.e., pilot
studies are not warranted). Also, different emphases are placed on some criteria, such as the need to document adherence to treatments or to examine avoidance behaviors. Some criteria are always met to the letter but not necessarily in the spirit of the criterion. For example, describing how treatment is provided and its outcomes is covered in all studies, but only a few invested effort to fully meet these criteria. Reliability varies substantially across studies, with some reporting reliability statistics for the overall number of stutters and others for individual stuttering events. Relatively little work has been done on intrajudge rather than interjudge reliability, which may be problematic because a high proportion of studies report judgements from a single judge, suggesting that intrajudge measures are more appropriate, but may also report interjudge reliability on subsets of samples. Similar problems occur in reports of the statistics that most, but not all, studies include at present. For instance, it is not always clear if researchers have checked their data to see that the assumptions for parametric tests are met. Studies can be
divided into those that use single-subject designs and those that use group designs. For reasons that will be apparent later, we agree with Moscicki's (1993) preference for group designs. Some of the criteria may interrelate. For example, ethical considerations may encourage early reports about promising investigations on small samples, which may outweigh statistical criteria.
Table 1
Summary of whether eight studies since 1993 (identifiable by the letters along the top) met the criteria along the side (y) or not (n) when the criterion is appropriate for evaluating a study (NA indicates that a criterion was not appropriate)
The studies are (a) Eichstadt et al. (1998), (b) Druce et al. (1997), (c) Craig and Kearns (1995), (d) Stager, Ludlow, Gordon, Cotelingam, and Rapoport (1995), (e) Hancock et al. (1998), (f) Ingham et al. (1997), (g) Ryan and Ryan (1995), and (h) Onslow et al. (1996).
a A criterion was set for two judges to meet. If they did not, the speech was reassessed.
b The time interval measure used in study (f) does not provide a valid estimate of stuttering (Howell, Staveley, Sackin, & Rustin, 1998).
c Study (b) included a high number of bilinguals, though analysis indicated that this had little effect.
d Reporting no statistics in single case studies may be an overstrict criterion, so "–" is added to studies (a) and (f). Craig and Kearns (1995) (study c), however, did perform statistical analyses. They found significant effects of treatment that they dismissed (raising the possibility of a Type II error).
e "Control data" means different things in different designs; Moscicki regards such data as coming from nontreated individuals because she considered randomized control designs. (This criterion was only met by Hancock et al., 1998 [study e], and then only in the initial phase; see footnote f.) In the Stager et al. (1995) drug study (d), control was appropriate (a placebo phase was followed by a phase in which one of two drugs was administered, an active experimental drug or a nonactive control drug, and the experimenters were blind as to which drug a subject was receiving). In the single case studies (c) and (f), the control condition was the period when no treatment was administered between periods where treatment was given (control was not satisfactory in study a). In studies (b), (f), and (h), a control derived from a single baseline measure (e.g., as in study h) may not be satisfactory, as people who stutter may seek help when their problem is worst. Nevertheless, at this time, a single baseline control was categorised as adequate. When the length of time between diagnosis and treatment is not specified, this concern is signified by a "?" after y.
f The Hancock et al. study was a 4-year follow-up study. This made for some ethical problems about what to do with the controls, as the authors recognized. They did not deny subjects who had initially served as controls subsequent treatment in the follow-up period. This seems justifiable from a pragmatic perspective (bearing in mind the length of the follow-up period).
Having voiced these concerns, we see no reason to avoid the issue of assessing how adequately studies published since 1993 have met the revamped criteria. Table 1 summarises the assessment of eight studies with respect to the first 11 of our criteria listed above. The studies were obtained from the Journal of Fluency Disorders and Journal of Speech, Language, and Hearing Research between 1993 and 2000. A total of nine were found, one of which was a follow-up of a study that appeared earlier during this period, so the earlier one was not included (i.e., Hancock et al., 1998, was used instead of Craig et al., 1996). The studies are: (a) Eichstadt, Watt, and Gibson (1998); (b) Druce et al. (1997); (c) Craig and Kearns (1995); (d) Stager et al. (1995); (e) Hancock et al. (1998); (f) Ingham, Moglia, Frank, Ingham, and Cordes (1997); (g) Ryan and Van Kirk Ryan (1995); (h) Onslow, Costa, Andrews, Harrison, and Packman (1996). Note that the countries in which these studies were performed are limited to Australia, South Africa, and the US. Evaluating them with the criteria we proposed is somewhat subjective for the reasons discussed above. A minimum criterion of acceptability was used for all criteria. For instance, any indication that judges' reliability was measured was deemed satisfactory, implicit definitions of treatment outcomes were considered acceptable, and if statistics were reported, they were considered appropriate even if there were no indications of a check that the assumptions of the test had or had not been met. Clearly, there is an urgent need to tighten the acceptability criterion before the next review appears. In Table 1, we present a breakdown of whether each study met (y) or did not meet (n) each criterion. If a criterion was not appropriate for assessing a study (e.g., sample size in a single-subject design), NA indicates that the criterion was not applicable. In the single case studies (i.e., a and f), statistics were not reported (criterion 4), but descriptive data were given. Statistical analysis is possible in single-subject studies (e.g., time series analyses if there are enough data points or nonparametric randomization tests if there are fewer data). Few studies of conditioning treatments do this, so rather than indicating that such studies had not met the criterion, a "–" was entered. The principal finding of this analysis is that a high number of studies met the minimal criteria. So, with the proviso of minimum criteria being used, the standard of efficacy research published since Moscicki's (1993) review seems high. One other factor, not apparent in the table, is that the Ryan and Van Kirk Ryan (1995) study was a replication of a previous report of their own work.
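For the nonparametric randomization tests mentioned above as an option for single-subject data, the sketch below shows one simplified variant: the observed difference in mean % SS between no-treatment and treatment sessions is compared with differences obtained under random reshuffles of the phase labels. This is our own illustration; it ignores serial dependence between sessions and is not the analysis used in any of the studies in Table 1.

```python
import random


def randomization_test(no_treatment, treatment, n_permutations=10_000, seed=1):
    """One-sided randomization test on session-level %SS scores.

    Tests whether %SS during treatment sessions is lower than during
    no-treatment sessions by shuffling the phase labels. A simplified
    illustration: real single-case randomization tests usually randomize
    the intervention start point and respect the design's structure.
    """
    rng = random.Random(seed)
    observed = (sum(no_treatment) / len(no_treatment)
                - sum(treatment) / len(treatment))
    pooled = list(no_treatment) + list(treatment)
    n_no = len(no_treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_no]) / n_no
                - sum(pooled[n_no:]) / len(treatment))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical %SS per session in an ABAB-style sequence, grouped by phase.
baseline_sessions = [7.9, 8.4, 7.2, 8.8, 7.5]
treatment_sessions = [3.1, 2.6, 3.8, 2.9, 3.4]
print(randomization_test(baseline_sessions, treatment_sessions))  # small p-value
```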
6. Conclusions and future directions

Few disorders pose greater challenges to the assessment of treatment efficacy than does stuttering (Curlee, 1993), and it is important to note that this review has addressed only some of the key determinants for efficacy evaluations. For example, there is little discussion of the sampling of speech measures, of data or power analyses (Murphy & Myors, 1999), of parents as treatment components or
administrators of treatment, of self-administration of treatment, or of the influence of nontreatment variables on treatment effectiveness (Yairi, 1997). It is evident that methodological considerations are fundamental for documenting treatment efficacy. The issues discussed illustrate a significant point, however: a considerable amount of work is needed to refine, document, and assess treatment procedures and their efficacy. In most cases, treatment research has not provided rigorous experimental evidence (Zebrowski & Conture, 1998). Furthermore, as Onslow (1996) noted, posttreatment periods of at least 1 year are the minimum for adequate outcome evaluation, and few studies met this methodological standard for examining long-term treatment effects. In addition, Conture and Wolk (1990) noted that long-term treatment effectiveness research is long overdue, especially studies of the relationship of specific behavioral and attitude changes to changes in stuttering.
Baer (1990) suggested that treatment research would be improved if clinical
researchers discerned what clients' main complaints or concerns are. It is not until this is done that variables that need changing can be identified (e.g., children may want to reduce feelings of self-consciousness). A number of measures (i.e., stuttering frequency, speech rate, abnormal speech quality) may not evidence significant differences between the stuttered speech and normally fluent speech of stutterers and nonstutterers, respectively (Ingham & Cordes, 1997). Thus, assessment of stuttering might benefit from a renewed focus on self-judged measures. On the whole, however, the need for well-controlled research aimed at developing and investigating the success of stuttering treatment is evident, with variables not directly involved in treatment considered in parallel with treatment variables.
Many treatment programs employ several treatment approaches, such as the
Monterey Fluency Program (Ryan & Van Kirk Ryan, 1995) and the Stuttering Intervention Program (Pindzola, 1999). Therefore, when evaluating their efficacy, the contribution of each treatment component should be determined in order to evaluate their combined effects. Although systematic replications are missing from the stuttering literature on treatment efficacy, Attanasio (1994) suggested that this reflects, in part, the reluctance of some journals to publish replication studies and an unwillingness to give journal space to unsuccessful or negative treatment findings. Modifying such publication practices would be a positive move, encouraging clinicians to focus on sound methodological treatment efficacy issues in their work. There has been little systematic study of treatment failures, which could, in fact, provide crucial information for improving the efficacy of stuttering treatments. If a treatment is not effective, there is a need to know if the fault lies with the treatment procedures, the clinician's application of those procedures, the client's practice habits, motivation, or other variables.
Initiation of a systematic program of treatment efficacy research on stuttering treatments presents an important challenge and is one that the field must embrace in order to advance (Moscicki, 1993). It is critical that standards are
adopted for evaluating treatment outcomes so that significant variations in outcomes can be reliably identified. On the whole, studies should focus on identifying specific therapy procedures that contribute the most to successful treatment outcomes as well as variables that are responsible for treatment failures. Children and adults who stutter deserve nothing less than rigorously tested and empirically supported treatments (Cordes, 1998); therefore, future research needs to incorporate these basic principles in its designs and to identify new critical variables for study if sounder treatment efficacy evaluations are to become available.
References

Andrews, G., & Craig, A. (1982). Stuttering: overt and covert measurement of the speech of treated
subjects. Journal of Speech and Hearing Disorders, 47, 96 – 99.
Andrews, G., & Harris, M. (1964). The syndrome of stuttering. London: Heinemann.
Andrews, G., & Harvey, R. (1981). Regression to the mean in pre-treatment measures of stuttering.
Journal of Speech and Hearing Disorders, 46, 204 – 207.
Ansel, B. (1993). Treatment efficacy research in stuttering. Journal of Fluency Disorders, 18,
Attanasio, J. S. (1994). Inferential statistics and treatment efficacy studies in communication disorders.
Journal of Speech and Hearing Research, 37, 755 – 759.
Azrin, N. H., & Nunn, R. G. (1974). A rapid method of eliminating stuttering by a regulated breathing
approach. Behaviour Research and Therapy, 12, 279 – 286.
Baer, D. M. (1990). The critical issue in treatment efficacy is knowing why treatment was applied. In:
L. B. Olswang, C. K. Thompson, S. F. Warren, & N. J. Minghetti (Eds.), Treatment efficacy research in communication disorders (pp. 31 – 39) (Rockville, MD).
Bandura, A. (1969). Principles of behavior modification. New York: Holt, Rinehart, and Winston.
Barlow, D. H., & Hersen, M. (1984). Single case experimental designs. New York: Pergamon.
Bloodstein, O. (1987). A handbook on stuttering (4th ed.). Chicago: National Easter Seal Society.
Bloodstein, O. (1990). On pluttering, skivvering and floggering: a commentary. Journal of Speech and
Bloodstein, O. (1995). A handbook on stuttering (5th ed.). Chicago: National Easter Seal Society.
Boberg, E., & Sawyer, L. (1977). The maintenance of fluency following intensive therapy. Human
Casteel, R. L., & McMahon, R. L. (1978). The modification of stuttering in a public school setting.
Journal of Childhood Communication Disorders, 2, 6 – 17.
Conture, E. G. (1996). Treatment efficacy: stuttering. Journal of Speech and Hearing Research, 39,
Conture, E. G. (1997). Evaluating childhood stuttering. In: R. F. Curlee, & G. M. Siegel (Eds.), Nature
and treatment of stuttering: new directions (2nd ed., pp. 239 – 256). Needham Heights, MA: Allyn and Bacon.
Conture, E. G., & Guitar, B. E. (1993). Evaluating efficacy of treatment of stuttering: school-age
children. Journal of Fluency Disorders, 18, 253 – 287.
Conture, E., & Wolk, L. (1990). Efficacy of intervention by speech – language pathologists: stuttering.
Seminars in Speech and Language, 11, 200 – 211.
Cooper, E. B. (1987). The chronic perseverative stuttering syndrome: incurable stuttering. Journal of
Cooper, J. A. (1990). Research needs in stuttering: roadblocks and future directions. ASHA report 18.
Cordes, A. (1994). The reliability of observational data: I. Theories and methods for speech – language
pathology. Journal of Speech and Hearing Research, 37, 264 – 278.
Cordes, A. (1998). Current status of the stuttering treatment literature. In: A. K. Cordes, & R. J.
Ingham (Eds.), Treatment efficacy for stuttering: a search for empirical bases. San Diego: Singular Publishing Press.
Cordes, A. K., & Ingham, R. J. (1998). Treatment efficacy for stuttering: a search for empirical bases.
San Diego: Singular Publishing Group.
Cordes, A. K., Ingham, R. J., Frank, P., & Ingham, J. C. (1992). Time interval analysis of interjudge
and intrajudge agreement for stuttering event judgements. Journal of Speech and Hearing Research, 35, 483 – 494.
Costello, J. M. (1975). Time-out procedures for the modification of stuttering: three case studies.
Journal of Speech and Hearing Disorders, 40, 216 – 231.
Costello, J. M., & Ingham, R. J. (1984). Assessment strategies for stuttering. In: R. Curlee, & W. H.
Perkins (Eds.), Nature and treatment of stuttering: new directions (pp. 303 – 333). San Diego: College-Hill Press.
Craig, A., Chang, E., & Hancock, K. (1992). Treatment success for children who stutter: a critical
review. Australian Journal of Human Communication Disorders, 20, 81 – 92.
Craig, A., Hancock, K., Chang, E., McCready, C., Shepley, A., McCaul, A., Costello, D., Harding, S.,
Kehren, R., Masel, C., & Reilly, K. (1996). A controlled clinical trial for stuttering in persons aged 9 to 14 years. Journal of Speech and Hearing Research, 39, 808 – 826.
Craig, A. R., & Kearns, M. (1995). Results of a traditional acupuncture intervention for stuttering.
Journal of Speech and Hearing Research, 38, 572 – 578.
Culp, D. (1984). The preschool fluency development program: assessment and treatment. In: M.
Peins (Ed.), Contemporary approaches to stuttering therapy (pp. 39 – 71). Boston: Little, Brown, and Company.
Curlee, R. (1981). Observer agreement on stuttering and disfluency. Journal of Speech and Hearing
Curlee, R. F. (1993). Evaluating treatment efficacy for adults: assessment of stuttering disability.
Journal of Fluency Disorders, 18, 319 – 331.
Curlee, R. F., & Yairi, E. (1997). Early intervention with early childhood stuttering: a critical exami-
nation of the data. American Journal of Speech – Language Pathology, 6, 8 – 18.
Debney, S., & Druce, T. (1987). Intensive fluency programme long-term follow-up. Poster presentation
Druce, T., Debney, S., & Byrt, T. (1997). Evaluation of an intensive treatment program for stuttering in
young children. Journal of Fluency Disorders, 22, 169 – 186.
Eichstadt, A., Watt, N., & Gibson, J. (1998). Evaluation of the efficacy of a stutter modification
program with particular reference to two new measures of secondary behaviours and control of stuttering. Journal of Fluency Disorders, 23, 231 – 246.
Felsenfeld, S. (1997). Epidemiology and genetics of stuttering. In: R. F. Curlee, & G. M. Siegel (Eds.),
Nature and treatment of stuttering: new directions (2nd ed., pp. 3 – 23). Needham Heights, MA: Allyn and Bacon.
Fosnot, S. M. (1993). Research design for examining treatment efficacy in fluency disorders. Journal
of Fluency Disorders, 18, 221 – 251.
Gregory, H. H., & Hill, D. (1980). Stuttering therapy for children. Seminars in Speech, Language and
Grunwell, P. (1981). The development of phonology. First Language, 2, 161 – 191.
Ham, R. E. (1989). What are we measuring? Journal of Fluency Disorders, 14, 231 – 243.
Hancock, K., Craig, A., McCready, C., McCaul, A., Costello, D., Campbell, K., & Gilmore, G.
(1998). Two- to six-year controlled-trial stuttering outcomes for children and adolescents. Journal of Speech, Language and Hearing Research, 41, 1242 – 1252.
Hanna, R., & Owen, N. (1977). Facilitating transfer and maintenance of fluency in stuttering therapy.
Journal of Speech and Hearing Disorders, 42, 65 – 76.
Howell, P., Staveley, A., Sackin, S., & Rustin, L. (1998). Methods of interval selection, presence of
noise and their effects on detectability of repetitions and prolongations. Journal of the Acoustical Society of America, 104, 3558 – 3567.
Howie, P. M., Tanner, S., & Andrews, G. (1981). Short- and long-term outcome in an
intensive treatment program for adult stutterers. Journal of Speech and Hearing Disorders,46, 104 – 109.
Howie, P. M., Woods, C. L., & Andrews, G. (1982). Relationship between covert and overt speech
measures immediately before and immediately after stuttering treatment. Journal of Speech and Hearing Disorders, 47, 419 – 422.
Ingham, J. C., & Riley, G. (1998). Guidelines for documentation of treatment efficacy for young
children who stutter. Journal of Speech, Language and Hearing Research, 41, 753 – 770.
Ingham, R. J. (1972). A comparison of covert and overt assessment procedures in stuttering therapy
outcome evaluation. Journal of Speech and Hearing Research, 18, 346 – 354.
Ingham, R. J. (1990). Stuttering. In: A. S. Bellack, M. Hersen, & A. E. Kazdin (Eds.), International
handbook of behavior modification and therapy (pp. 599 – 631). New York: Plenum.
Ingham, R. J. (1993). Stuttering treatment efficacy: paradigm-dependent or -independent. Journal of
Ingham, R. J., & Andrews, G. (1973). Behavior therapy and stuttering: a review. Journal of Speech
and Hearing Disorders, 38, 405 – 441.
Ingham, R. J., & Cordes, A. K. (1992). Interclinic differences in stuttering-event counts. Journal of
Ingham, R. J., & Cordes, A. K. (1997). Self-measurement and evaluating stuttering treatment
efficacy. In: R. F. Curlee, & G. M. Siegel (Eds.), Nature and treatment of stuttering: new directions (2nd ed., pp. 413 – 438). Needham Heights, MA: Allyn and Bacon.
Ingham, R. J., & Costello, J. M. (1984). Stuttering treatment outcome evaluation. In: J. M.
Costello (Ed.), Speech disorders in children: recent advances (pp. 313 – 346). San Diego: College-Hill Press.
Ingham, R. J., Moglia, R. A., Frank, P., Ingham, J. C., & Cordes, A. K. (1997). Experimental
investigation of the effects of frequency-altered auditory feedback on the speech of adults who stutter. Journal of Speech, Language and Hearing Research, 40, 361 – 372.
Kazdin, A. E. (1978). History of behaviour modification. Baltimore: University Park Press.
Kazdin, A. E., & Kendall, P. C. (1998). Current progress and future plans for developing effective
treatments: comments and perspectives. Journal of Clinical Child Psychology, 27, 217 – 226.
Kelly, E. M. (1993). Speech rates and turn-taking behaviors of children who stutter and their parents.
Seminars in Speech and Language, 14, 203 – 214.
Kelly, E. M., & Conture, E. G. (1992). Speaking rates, response time latencies, and interrupting
behaviours of young stutterers, nonstutterers, and their mothers. Journal of Speech and Hearing Research, 35, 1256 – 1267.
Kroll, R., & O’Keefe, B. (1985). Molecular self-analyses of stuttered speech via speech time expan-
sion. Journal of Fluency Disorders, 10, 93 – 105.
Kully, D., & Boberg, E. (1988). An investigation of interclinic agreement in the identification of fluent
and stuttered syllables. Journal of Fluency Disorders, 13, 309 – 318.
Ladouceur, R., & Martineau, G. (1982). Evaluation of regulated breathing method with and without
parental assistance in the treatment of child stutterers. Journal of Behaviour Therapy and Experimental Psychiatry, 13, 301 – 306.
Lanyon, R. I., Lanyon, B. P., & Goldsworthy, R. J. (1979). Outcome predictors in the behavioural
treatment of stuttering. Journal of Fluency Disorders, 4, 131 – 139.
Last, J. M. (1983). A dictionary of epidemiology. New York: Oxford University Press.
Lewis, K. E. (1994). Reporting observer agreement on stuttering event judgments: a survey and
evaluation of current practice. Journal of Fluency Disorders, 19, 269 – 284.
Manning, W. H., Trutna, P. A., & Shaw, C. K. (1976). Verbal versus tangible reward for children who
stutter. Journal of Speech and Hearing Disorders, 41, 52 – 62.
Martin, R. R., & Haroldson, S. K. (1981). Stuttering identification: standard definition and moment of
stuttering. Journal of Speech and Hearing Research, 24, 59 – 63.
Martin, R. R., Haroldson, S. K., & Triden, K. A. (1984). Stuttering and speech naturalness. Journal of
Speech and Hearing Disorders, 49, 53 – 58.
Martin, R. R., Kuhl, P., & Haroldson, S. (1972). An experimental treatment with two preschool
stuttering children. Journal of Speech, Language and Hearing Research, 15, 743 – 752.
Meline, T., & Schmidt, J. F. (1997). Case studies for evaluating statistical significance in group
designs. American Journal of Speech – Language Pathology, 6, 33 – 41.
Meyers, S. (1990). Tempest in a t test? A reply to Finn and Gow. Journal of Speech and Hearing
Moscicki, E. K. (1993). Fundamental methodological considerations in controlled clinical trials. Journal of Fluency Disorders, 18, 183 – 196.
Muma, J. R. (1993). The need for replication. Journal of Speech and Hearing Research, 36,
Murphy, K. R., & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects:
minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234 – 248.
Olswang, L. B. (1993). Treatment efficacy research: a paradigm for investigating clinical practice and
theory. Journal of Fluency Disorders, 18, 125 – 131.
Onslow, M. (1992). Choosing a treatment procedure for early stuttering: issues and future directions.
Journal of Speech and Hearing Research, 35, 983 – 993.
Onslow, M. (1996). Behavioral management of stuttering. London: Singular Publishing Group.
Onslow, M., Adams, R., & Ingham, R. (1992). Reliability of speech naturalness ratings of stuttered
speech during treatment. Journal of Speech and Hearing Disorders, 35, 994 – 1001.
Onslow, M., Andrews, C., & Lincoln, M. (1994). A control/experimental trial of operant treatment for
early stuttering. Journal of Speech and Hearing Research, 37, 1244 – 1259.
Onslow, M., Costa, L., Andrews, C., Harrison, E., & Packman, A. (1996). Speech outcomes of a
prolonged-speech treatment for stuttering. Journal of Speech and Hearing Research, 39, 734 – 749.
Onslow, M., Costa, L., & Rue, S. (1990). Direct early intervention with stuttering. Journal of Speech
and Hearing Disorders, 55, 405 – 416.
Packman, A., Ingham, R. J., & Onslow, M. (1993). Reliability of listeners’ stuttering counts: the effect
of instructions designed to reduce ambiguity. Submitted for publication.
Pindzola, R. (1999). The stuttering intervention program. In: M. Onslow, & A. Packman (Eds.), The
handbook of early stuttering intervention. San Diego: Singular Publishing Group.
Prins, D. (1983). Continuity, fragmentation and tension: hypothesis applied to evaluation and inter-
vention with pre-school disfluent children. In: D. Prins, & R. J. Ingham (Eds.), Treatment of stuttering in early childhood: methods and issues (pp. 21 – 42). San Diego: College-Hill Press.
Purser, H. (1987). The psychology of treatment evaluation studies. In: L. Rustin, H. Purser, & D.
Rowley (Eds.), Progress in the treatment of fluency disorders (pp. 258 – 273). London: Taylor and Francis.
Reed, C. G., & Godden, A. L. (1977). An experimental treatment using verbal punishment with two
preschool stutterers. Journal of Fluency Disorders, 2, 225 – 233.
Rustin, L., & Cook, F. (1983). Intervention procedures for the disfluent child. In: P. Dalton (Ed.),
Approaches to the treatment of stuttering (pp. 47 – 75). London: Croom Helm.
Rustin, L., Ryan, B., & Ryan, B. (1987). Use of the Monterey programmed stuttering treatment in
Great Britain. British Journal of Disorders of Communication, 22, 151 – 162.
Ryan, B. (1971). Operant procedures applied to stuttering treatment for children. Journal of Speech
and Hearing Disorders, 36, 264 – 280.
Ryan, B. P., & Van Kirk Ryan, B. (1983). Programmed stuttering therapy for children: comparison of
four establishment programs. Journal of Fluency Disorders, 8, 291 – 321.
Ryan, B. P., & Van Kirk Ryan, B. (1995). Programmed stuttering treatment for children: comparison
of two establishment programs through transfer, maintenance, and follow-up. Journal of Speechand Hearing Research, 38, 61 – 75.
Schiavetti, N., & Metz, D. (1997). Evaluating research in speech pathology and audiology (3rd ed.).
Schulze, H. (1991). Time pressure variables in the verbal parent – child interaction patterns of
fathers and mothers of stuttering, phonologically disordered and normal preschool children. In: H. F. M. Peters, W. Hulstijn, & C. W. Starkweather (Eds.), Speech motor control and stuttering (pp. 441 – 451). New York: Elsevier.
Schwartz, M. F. (1976). Stuttering solved. New York: McGraw-Hill.
Siegel, G. M. (1990). Concluding remarks. In: L. B. Olswang, C. K. Thompson, S. F. Warren, & N. J. Minghetti (Eds.), Treatment efficacy research in communication disorders. Rockville, MD: American Speech – Language – Hearing Foundation.
Smith, A. (1990). Toward a comprehensive theory of stuttering: a commentary. Journal of Speech and
Stager, S. V., Ludlow, C. L., Gordon, C. T., Cotelingam, M., & Rapoport, J. L. (1995). Fluency
changes in persons who stutter following a double blind trial of clomipramine and desipramine. Journal of Speech and Hearing Research, 38, 516 – 525.
Starkweather, C. W. (1987). Fluency and stuttering. Englewood Cliffs, NJ: Prentice-Hall.
Starkweather, C. W. (1993). Issues in the efficacy of treatment for fluency disorders. Journal of
Starkweather, C. W., & Gottwald, S. R. (1993). A pilot study of relations among specific measures
obtained at intake and discharge in a program of prevention and early intervention for stuttering. American Journal of Speech – Language Pathology, 2, 51 – 58.
Ventry, I. M., & Schiavetti, N. (1986). Evaluating research in speech pathology and audiology
Vihman, M., & Greenlee, M. (1987). Individual differences in phonological development: ages one to
three years. Journal of Speech and Hearing Research, 30, 503 – 521.
Woods, D. W., Fuqua, R. W., & Waltz, T. J. (1997). Evaluation and elimination of an avoidance
response in a child who stutters: a case study. Journal of Fluency Disorders, 22, 287 – 297.
Yairi, E. (1993). Epidemiological and other considerations in treatment efficacy research with pre-
school children who stutter. Journal of Fluency Disorders, 18, 197 – 219.
Yairi, E. (1997). Disfluency characteristics of childhood stuttering. In: R. F. Curlee, & G. M. Siegel (Eds.),
Nature and treatment of stuttering: new directions (2nd ed., pp. 49 – 78). London: Allyn and Bacon.
Yairi, E., & Ambrose, N. (1992). A longitudinal study of stuttering in children: a preliminary report.
Journal of Speech and Hearing Research, 36, 521 – 528.
Yairi, E., Ambrose, N., Paden, E., & Throneburg, R. (1996). Predictive factors of persistence and
recovery: pathways of childhood stuttering. Journal of Communication Disorders, 29, 51 – 77.
Yaruss, J. S. (1997). Clinical implications of situational variability in preschool children who stutter.
Journal of Fluency Disorders, 22, 187 – 203.
Yaruss, J. S. (1998). Describing the consequences of disorders: stuttering and the international clas-
sification of impairments, disabilities and handicaps. Journal of Speech, Language and Hearing Research, 41, 249 – 257.
Yeaton, W. H., & Sechrest, L. (1981). Critical dimensions in the choice and maintenance of successful
treatments: strength, integrity and effectiveness. Journal of Consulting and Clinical Psychology, 49, 156 – 176.
Young, M. A. (1975a). Onset, prevalence, and recovery from stuttering. Journal of Speech and
Young, M. A. (1975b). Observer agreement for marking moments of stuttering. Journal of Speech and
Young, M. A. (1993). Supplemental tests of statistical significance: variation accounted for. Journal of
Speech and Hearing Research, 36, 644 – 656.
Zebrowski, P. M., & Conture, E. G. (1998). Influence of nontreatment variables on treatment effec-
tiveness for school-age children who stutter. In: A. K. Cordes, & R. J. Ingham (Eds.), Treatment efficacy for stuttering: a search for empirical bases. San Diego: Singular Publishing Group.
Assessing efficacy of stuttering treatments
1. Lanyon, Lanyon & Goldsworthy's (1979) study suggested that:
a. Biofeedback was an effective form of treatment for stuttering
b. Patients should be given the MMPI prior to enrollment in fluency therapy
c. Objective outcome measures are necessary because subjective judgements of therapy outcomes may be controlled by a number of variables
d. The MMPI is not a valid instrument
e. MMPI K scores can be used to measure therapeutic progress
2. What was Muma's estimate of Type I and II statistical errors in published studies?
a. as many as 1 in 2
b. as many as 1 in 3
c. as many as 1 in 4
d. as many as 1 in 5
e. as many as 1 in 6
3. Which of these studies is a replication of an earlier one?
a. Conture and Guitar (1993)
b. Druce et al. (1997)
c. Eichstadt et al. (1998)
d. Martin et al. (1984)
e. Grunwell (1981)
4. What is the minimum period advocated for following up clients after treatment?
a. 6 months
b. 1 year
c. 2 years
d. 3 years
e. 4 years
5. Which of the following treatments was not investigated by Ryan and Van Kirk Ryan (1983)?
a. Programmed traditional
b. Delayed auditory feedback
c. Controlled breathing
d. Time-out contingency
e. GILCU