NKF KDOQI GUIDELINES
APPENDIX 2: METHODS FOR EVALUATING EVIDENCE
The overall aim of the project was to develop CPGs and CPRs for management of the coexisting conditions of diabetes and CKD.
The Work Group developed the guidelines and recommendations using an evidence-based approach. Evidence regarding the guideline topics was derived primarily from a systematic summary of the available scientific literature. When sufficient evidence was lacking, recommendations were developed that reflect expert opinion. When appropriate, available guidelines or systematic reviews were used to support the current guidelines and recommendations.
OVERVIEW OF PROCESS
Development of the guidelines and recommendations required many concurrent steps.
An Evidence Review Team, composed of experts in systematic review and guideline development, guided the Work Group in all methods and aspects of guideline development. The Work Group and the Evidence Review Team met in four 2-day meetings over 18 months.
Creation of Groups
The Chair and Co-Chair of the KDOQI™ Advisory Board selected the Co-Chairs of the Work Group and the Director of the Evidence Review Team, who then assembled groups to be responsible for the development of the guidelines. The Work Group and the Evidence Review Team collaborated closely throughout the project.
The Work Groups consisted of domain experts, including individuals with expertise in adult and pediatric nephrology, adult and pediatric diabetology and endocrinology, cardiology, pharmacology, social work, nursing, and nutrition. The first task of the Work Group members was to define the overall topics and goals of the guidelines. They then further developed and refined each topic, literature search strategies, and data extraction forms (described below). The Work Group members were the principal reviewers of the literature; from their reviews and detailed data extractions, they summarized the available evidence and took the primary roles of writing the guidelines and rationale statements. Completed data extractions were shared among Work Group members.
The Evidence Review Team consisted of nephrologists, physician-methodologists, and research assistants from Tufts-New England Medical Center with expertise in systematic review of the medical literature. They supported the Work Groups in refining the topics and clinical questions so that literature searches could be undertaken. They also instructed the Work Group members in all steps of systematic review and critical literature appraisal. The Evidence Review Team coordinated the methodological and analytical process of the report and defined and standardized the methodology for performing literature searches, extracting data, and summarizing the evidence in summary tables. They performed literature searches, organized abstract and article screening, created forms to extract relevant data from articles, organized Work Group member data extraction, and tabulated results. Throughout the project, the Evidence Review Team led discussions on systematic review, literature searches, data extraction, assessment of quality and applicability of articles, evidence synthesis, and grading of the quality of the body of evidence and the strength of guideline recommendations.
Refinement of Guideline Topics and Development of Materials
The goals of the Work Group spanned a diverse group of topics, which would have been too large for a comprehensive review of the literature. Based on their expertise, members of the Work Group focused on specific questions deemed clinically relevant and amenable to systematic review. Other sources of data included previously published guidelines and systematic reviews.
The Work Groups and Evidence Review Team developed: (1) draft guideline statements, (2) draft rationale statements that summarized the expected pertinent evidence, and (3) data extraction forms requesting the data elements to be retrieved from the primary articles. The topic refinement process began before literature retrieval and continued through the process of reviewing individual articles.
The Work Group members developed specific questions with regard to predictors and interventions related to specific outcomes. Search strategies were developed according to specific study topics, study design, and years of publication. Studies for the literature review were identified through MEDLINE searches of English-language literature of human studies from January 1990 to December 2003. Selective updates were performed through May 2005. Broad MeSH (medical subject heading) terms and text words were used so that searches were both general in scope for high sensitivity in identification of pertinent literature and specific to preliminary topics selected by the Work Groups. The searches were also supplemented by articles identified by Work Group members through August 2005.
The principal kidney-related search terms used included: kidney, renal, kidney disease, albuminuria, proteinuria, hematuria, and hyperfiltration. Principal diabetes-related terms included: diabetes mellitus, hyperglycemia, retinopathy, and pregnancy in diabetes.
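As a rough illustration of how the two term lists above could be combined into a single Boolean search string, the sketch below joins terms with OR within each list and AND between the lists. The function names `or_group` and `build_query` and the generic quoting of multiword terms are assumptions made for illustration; the actual MEDLINE strategies, field tags, and date limits used by the Evidence Review Team are not reproduced here.

```python
# Principal search terms as listed in the text.
KIDNEY_TERMS = [
    "kidney", "renal", "kidney disease", "albuminuria",
    "proteinuria", "hematuria", "hyperfiltration",
]
DIABETES_TERMS = [
    "diabetes mellitus", "hyperglycemia", "retinopathy",
    "pregnancy in diabetes",
]


def or_group(terms):
    """Join terms with OR, quoting multiword terms (hypothetical helper)."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(kidney_terms, diabetes_terms):
    """Require at least one kidney term AND at least one diabetes term."""
    return or_group(kidney_terms) + " AND " + or_group(diabetes_terms)


query = build_query(KIDNEY_TERMS, DIABETES_TERMS)
print(query)
```

Using OR within each concept keeps the search sensitive, while AND between the two concepts restricts it to literature that touches both kidney disease and diabetes.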
Only full journal articles of original data were included. Editorials, letters, abstracts, and unpublished reports were not included. Selected review articles, however, were included for background material. A separate search for systematic reviews of health education in diabetes was conducted for the behavioral management recommendation.
MEDLINE search results were screened by members of the Evidence Review Team for relevance using predefined eligibility criteria, described below. Retrieved articles were screened by the Evidence Review Team. Potentially relevant studies were sent to Work Group members for rescreening and data extraction. Domain experts made the final decision for inclusion or exclusion of all articles.
Generation of Data Extraction Forms
Data extraction forms were designed to capture information on various aspects of the primary articles. Forms for all topics included study setting and demographics, eligibility criteria, severity of kidney disease, type of diabetes, numbers of subjects, study design, study funding source, comorbid conditions, descriptions of relevant risk factors or interventions, description of outcomes, statistical methods, results, study quality based on criteria appropriate for each study design (see below), study applicability (see below), and sections for comments and assessment of biases. Training of the Work Group members to extract data from primary articles occurred at face-to-face meetings, supplemented by e-mails and teleconferences.
Generation of Evidence Tables
The Evidence Review Team condensed the information from the data extraction forms into evidence tables, which summarized individual studies. These tables were created for the Work Group members to assist them with review of the evidence and are not included in the guidelines. All Work Group members (within each topic) received copies of all extracted articles and all evidence tables. During the development of the evidence tables, the Evidence Review Team checked the data extraction for accuracy and rescreened the accepted articles to verify that each of them met the initial screening criteria determined by the Work Group.
Table 59. Topics for Which Systematic Reviews of Primary Studies Were Performed
Format for Summary Tables
Summary tables describe the studies according to 4 dimensions: study size and follow-up duration, applicability or generalizability, results, and methodological quality. Within each table, the studies are first grouped by outcome type.
Data entered into summary tables by the Evidence Review Team were derived from the data extraction forms, evidence tables, and/or the articles. All summary tables were reviewed by the Work Group members.
Within each outcome section of each table, studies are ordered first by methodological quality (best to worst), then by applicability (most to least), and then by study size (largest to smallest). Results are presented by using the appropriate metric or summary symbols, as defined in the table footnotes.
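The ordering rule described above can be sketched as a multi-key sort. This is a minimal illustration only: the `Study` class, the rank labels ("good", "fair", "poor"; "wide", "intermediate", "narrow"), and the example studies are hypothetical stand-ins, since the guideline tables use symbols rather than these words, and grouping outcomes alphabetically is an assumption.

```python
from dataclasses import dataclass

# Hypothetical labels for the 3-level quality and applicability scales.
QUALITY_RANK = {"good": 0, "fair": 1, "poor": 2}
APPLICABILITY_RANK = {"wide": 0, "intermediate": 1, "narrow": 2}


@dataclass
class Study:
    name: str
    outcome: str
    quality: str
    applicability: str
    n: int  # study (sample) size


def summary_table_order(studies):
    """Group by outcome, then order by quality (best first),
    applicability (most first), and study size (largest first)."""
    return sorted(
        studies,
        key=lambda s: (
            s.outcome,
            QUALITY_RANK[s.quality],
            APPLICABILITY_RANK[s.applicability],
            -s.n,
        ),
    )


# Made-up example rows, not data from the guideline.
studies = [
    Study("Study A", "CKD progression", "fair", "wide", 500),
    Study("Study B", "CKD progression", "good", "narrow", 120),
    Study("Study C", "CKD progression", "good", "wide", 80),
    Study("Study D", "CKD progression", "good", "wide", 300),
]
ordered_names = [s.name for s in summary_table_order(studies)]
print(ordered_names)
```

Note that study size only breaks ties: a small, high-quality study is listed ahead of a large study of lower quality.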
Systematic Review Topics, Study Eligibility Criteria, and Literature Yield
The topics listed in Table 59 were systematically reviewed. Predefined eligibility criteria are included. These were based on the study designs of the available literature (eg, whether there was an “adequate” number of randomized trials) and the volume of the literature (eg, whether there were “so many” studies that restriction based on such factors as study size or duration was deemed appropriate).
For the primary literature topics, the literature searches yielded 11,378 citations. Of these, 765 articles were retrieved in full. An additional 57 studies were added by Work Group members. From all 822 articles, 250 were extracted and included. Of these, 142 studies are included in Summary Tables. A supplemental search for systematic reviews of diabetes and health education yielded 901 citations, of which 10 systematic reviews were summarized.
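The counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the percentages it prints are derived here for orientation and are not figures reported by the Work Group. The variable and function names are illustrative.

```python
# Counts as reported in the text.
citations_screened = 11_378
retrieved_in_full = 765
added_by_work_group = 57
extracted_and_included = 250
in_summary_tables = 142

# Articles considered = retrieved from the search + added by members.
articles_considered = retrieved_in_full + added_by_work_group


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


print("Articles considered:", articles_considered)
print("Retrieved in full, % of citations:",
      pct(retrieved_in_full, citations_screened))
print("Extracted, % of articles considered:",
      pct(extracted_and_included, articles_considered))
print("In summary tables, % of extracted:",
      pct(in_summary_tables, extracted_and_included))
```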
Grading of Individual Studies
Study Size and Duration
The study (sample) size is used as a measure of the weight of the evidence. In general, large studies provide more precise estimates of effects and associations. In addition, large studies are more likely to be generalizable; however, large size alone does not guarantee applicability. A study that enrolled a large number of selected patients may be less generalizable than several smaller studies that included a broad spectrum of patient populations. Similarly, longer duration studies may be of better quality and more applicable, depending on other factors.
Applicability (also known as generalizability or external validity) addresses the issue of whether the study population is sufficiently broad so that the results can be generalized to the population of interest. The study population typically is defined primarily by the inclusion and exclusion criteria. The target population varied somewhat from topic to topic, but generally was defined to include patients with both CKD and diabetes (ideally DKD, CKD caused directly by diabetes mellitus). More specific criteria were sometimes appropriate, for example, subjects with retinopathy or pregnant women. A designation for applicability was assigned to each article, according to a 3-level scale. In making this assessment, sociodemographic characteristics were considered, as well as comorbid conditions and prior treatments. Applicability is graded in reference to the population of interest for each topic.
Sample is representative of the target population, or results are definitely applicable to the target population irrespective of study sample.
Sample is representative of a relevant subgroup of the target population. For example, sample is only representative of people with macroalbuminuria, or all elderly individuals.
Sample is representative of a narrow subgroup of patients only, and not well generalizable to other subgroups. For example, the study includes only a small number of patients or older patients with new-onset diabetes. Studies of such narrow subgroups may be extremely valuable for demonstrating exceptions to the rule.
In general, the result is summarized by both the direction and strength of the association. Depending on the study type, the results may refer either to dichotomous outcomes, such as the presence of retinopathy or a laboratory test above or below a threshold value, or to the association of continuous variables with outcomes, such as serum laboratory tests. We accounted for the magnitude of the association and both the clinical and statistical significance of the associations. Criteria for indicating the presence of an association varied among predictors depending on their clinical significance. Both univariate and multivariate associations are presented, when appropriate. The following metrics were used: prevalence, relative effects (relative risk [RR], odds ratio [OR], hazard ratio [HR], or net change, defined as the change from baseline in the intervention group minus the change in the control group), correlation (r or r²), and test accuracy (sensitivity, specificity, and positive and negative predictive value). The choice of metric often was limited by the reported data. For some studies, only the statistical significance was reported.
Methodological quality (or internal validity) refers to the design, conduct, and reporting of the clinical study. Because studies with a variety of types of design were evaluated, a 3-level classification of study quality was devised:
Least bias; results are valid. A study that mostly adheres to the commonly held concepts of high quality, including the following: a formal study; clear description of the population and setting; clear description of an appropriate reference standard; proper measurement techniques; appropriate statistical and analytical methods; no reporting errors; and no obvious bias. Not retrospective studies or case series.
Susceptible to some bias, but not sufficient to invalidate the results. A study that does not meet all the criteria in the category above. It has some deficiencies but none likely to cause major bias.
Significant bias that may invalidate the results. A study with serious errors in design or reporting. These studies may have large amounts of missing information or discrepancies in reporting.
Summarizing Reviews and Selected Original Articles
Work Group members had wide latitude in summarizing reviews and selected original articles for topics that were determined not to require a systematic review of the literature. However, a thorough review and summary of systematic reviews of diabetes and health education was performed.
Format of Guidelines and Clinical Practice Recommendations
The format for each CPG and CPR chapter is outlined in Table 60. Each CPG or CPR contains one or more specific “statements,” which are presented as “bullets” that represent recommendations to the target audience. Each CPG or CPR contains background information, which is generally sufficient for interpretation. A discussion of the broad concepts that frame the CPGs and CPRs is provided in the preceding section of this report. The rationale for each CPG contains a series of specific “rationale statements,” each supported by evidence. The CPG or CPR concludes with a discussion of limitations of the evidence and a brief discussion of clinical applications and implementation issues regarding the topic. Research recommendations for topics related to all CPGs and CPRs are compiled in a separate chapter.
Table 60. Format for Guidelines
Rating the Strength of Guidelines and Rationale Statements
Grading the Strength of Evidence
The overall strength of each guideline or clinical practice recommendation statement was rated by assigning either “A”, “B”, or “C (CPR)” as described in Table 61.
The strength of evidence was graded using a rating system that primarily takes into account: (1) methodological quality of the studies; (2) whether the study was carried out in the target population, ie, patients with CKD and diabetes, or in other populations; and (3) whether the studies examined health outcomes directly or examined surrogate measures for those outcomes, eg, reducing death or improving albuminuria (Table 62). These 3 separate study characteristics were combined to provide a preliminary strength of evidence provided by pertinent studies. In addition, aspects of the GRADE recommendations for grading the quality of evidence and the strength of recommendations were incorporated to determine a final strength of evidence (598).
Thus, specific criteria for assessing the quality of the body of evidence (including an initial categorization of evidence quality based on study designs of the available studies) were discussed with the Work Group. For questions of interventions, quality was High, if randomized controlled trials; Low, if observational studies; Very Low, if other types of evidence. The quality rating was then decreased if there were serious limitations to individual study quality, if there were important inconsistent results across studies, if the applicability of the studies to the population of interest was limited, if the data were imprecise or sparse, or if there was thought to be a high likelihood of bias. The quality rating for observational studies was increased if there was strong evidence of an association (ie, significant RR or OR of about >2 [or <0.5] based on consistent evidence from 2 or more observational studies, with no plausible confounders), if there was evidence of a dose-response gradient, or if plausible confounders would have reduced the effect. Four final quality categories were used: High, Moderate, Low, and Very Low.
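The rating logic described above can be sketched as a small function. This is a simplified illustration, not the Work Group's actual procedure: the text does not say how many levels each concern or strength moves the rating, so the sketch assumes one level per factor, and the factor names and the function `grade_quality` are hypothetical.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High"]

# Initial category by study design, as stated in the text.
INITIAL = {"rct": "High", "observational": "Low", "other": "Very Low"}

# Hypothetical names for the factors listed in the text.
DOWNGRADE_FACTORS = {
    "study_limitations", "inconsistency", "limited_applicability",
    "imprecise_or_sparse", "high_likelihood_of_bias",
}
UPGRADE_FACTORS = {
    "strong_association", "dose_response", "confounders_reduce_effect",
}


def grade_quality(design, concerns=(), strengths=()):
    """Return the final quality category for a body of evidence.

    Assumption: each concern lowers and each strength raises the
    rating by exactly one level; strengths apply only to
    observational studies, as in the text.
    """
    concerns, strengths = set(concerns), set(strengths)
    unknown = (concerns - DOWNGRADE_FACTORS) | (strengths - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    level = LEVELS.index(INITIAL[design]) - len(concerns)
    if design == "observational":
        level += len(strengths)
    # Clamp to the four categories used in the guideline.
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


print(grade_quality("rct", concerns={"inconsistency"}))
print(grade_quality("observational", strengths={"dose_response"}))
```

Clamping after netting concerns against strengths is a design choice of this sketch; applying downgrades and upgrades in sequence with clamping at each step could give a different answer at the extremes.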
Table 61. Rating the Strength of Guideline and CPR Statements
The Work Group and Evidence Review Team also discussed how the strength of the evidence would be determined based on the quality of evidence across all outcomes of interest, taking into account the relative importance of each of the outcomes (eg, death and CKD progression having greater weight than albuminuria or glucose levels) and a balance between net benefits and additional considerations, such as costs (resource utilization), feasibility, availability, likely differences in patient values, and likely differences among populations and regions.
Each major item of evidence discussed in the Rationale sections for each CPG and CPR was given a strength rating. Upon consideration of the strength of evidence for the various sections of the body of evidence for a given set of recommendation statements, a determination was made whether the set of statements rises to the level of a CPG or whether the body of evidence is sufficiently weak to warrant only a CPR. Sets of statements that were graded as being Strong or Moderately Strong were designated as Guidelines. In the absence of strong or moderately strong quality evidence, or when additional considerations did not support strong or moderately strong evidence-based recommendations, the Work Group could elect to issue expert opinion-based recommendations termed CPRs. These recommendations are based on the consensus of the Work Group that the practice might improve health outcomes. As such, the Work Group recommends that clinicians consider following the recommendation for eligible patients. These recommendations are based either on weak evidence or on the opinions of the Work Group.
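The designation rule in this paragraph can be summarized in a few lines. The sketch below is an interpretation for illustration only: the function name `designate`, the lowercase strength labels, and the boolean `considerations_support` flag are assumptions, and the text notes that the Work Group "could elect" to issue a CPR, so the real decision involved judgment rather than a fixed rule.

```python
def designate(strength, considerations_support=True):
    """Suggest whether a set of statements is a guideline (CPG)
    or a clinical practice recommendation (CPR).

    strength: overall strength of the body of evidence, eg
        "strong", "moderately strong", or "weak" (labels assumed).
    considerations_support: whether additional considerations
        (costs, feasibility, patient values, and so on) support an
        evidence-based recommendation.
    """
    guideline_strengths = {"strong", "moderately strong"}
    if strength.lower() in guideline_strengths and considerations_support:
        return "CPG"
    return "CPR"


print(designate("Strong"))
print(designate("weak"))
print(designate("moderately strong", considerations_support=False))
```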
In addition, the Work Group adopted a convention for using existing expert guidelines issued for populations other than the target population. Grades for the strength of evidence assigned by the professional societies that issued the guidelines were adopted. When the guideline or the evidence was not graded, this Work Group assumed that the guideline would be based on at least moderately strong evidence. The extrapolation of these guideline recommendations from the general populations to the target population was considered to support grade B recommendations.
Table 62. Rating the Quality of Evidence
Limitations of Approach
While the literature searches were intended to be comprehensive, they were not exhaustive. MEDLINE was the only database searched, and searches were limited to English language publications. Hand searches of journals were not performed, and review articles and textbook chapters were not systematically searched. However, important studies known to the domain experts that were missed by the literature search were included in the review. No meta-analyses were performed.