Osteoarthritis Initiative

Steering Group

A public-private partnership to develop and evaluate clinical biomarkers.

The Osteoarthritis Initiative
A Public/Private Research Collaboration

Monday, February 28, 2000
Tuesday, February 29, 2000

Lister Hill Auditorium
National Institutes of Health
Bethesda, Maryland

Meeting Summary

Organized by:

The National Institute of Arthritis and Musculoskeletal
and Skin Diseases, National Institutes of Health

The Foundation for the National Institutes of Health, Inc.

Sponsored by:

The American Academy of Orthopaedic Surgeons
The Arthritis Foundation
Aventis Pharma
Bristol-Myers Squibb Pharmaceutical Research Institute
Chiron Corporation
DuPont Pharmaceuticals Company
F. Hoffmann-La Roche, Ltd.
Genetics Institute
Glaxo Wellcome, Inc.
Merck Research Laboratories
Monsanto/Searle
Novartis Pharmaceuticals Corporation
Parke-Davis
Pfizer, Inc.
Procter & Gamble Pharmaceuticals
The R.W. Johnson Pharmaceutical Research Institute

Osteoarthritis Initiative Steering Group Subcommittee Chairs:

Epidemiology
Stefan L. Lohmander, M.D., Ph.D.
University of Lund
Lund, Sweden

Biochemical Markers
Thasia G. Woodworth, M.D.
Pfizer Central Research
Groton, Connecticut, U.S.A.

Imaging
Charles G. Peterfy, M.D., Ph.D.
Synarc, Inc.
San Francisco, California, U.S.A.

Administration
Stephen A. Stimpson, Ph.D.
Glaxo Wellcome, Inc.
Research Triangle Park, North Carolina, U.S.A.

Welcome

Stephen Katz, M.D., Ph.D., Director, National Institute of Arthritis and Musculoskeletal and Skin Diseases [...]

[...] markers identified by currently available specimen archives and proposed longitudinal cohort studies. The subcommittee's ultimate goal is to validate surrogate biochemical markers and contribute to the regulatory guidelines for new drug development in OA.

Accordingly, the subcommittee will seek to accomplish the following:

Characterize the research questions and generate hypotheses regarding potential surrogates.
Critically examine available archives.
Describe the state of the art for available assays.
Establish assay selection criteria.
Establish standards for biospecimen collection, management, and quality.
Identify key database characteristics.
Identify the prospective studies that are still needed for surrogate marker validation.

Dr. Woodworth noted that the important issues this effort needs to address relate to:

The conditions of access.
Confidentiality, informed consent, and anonymous databases, particularly with respect to genetic samples.
Proprietary and intellectual property concerns.
The nature and objective of the measurement, including whether it is diagnostic, characterizes status, or assesses progression.
The kind of markers that should be used: biochemical, molecular, or genetic markers, or some combination thereof; combinations of bone and cartilage markers; or perhaps synovial tissue markers.

Discussion (Panel Members and Meeting Participants)

Question: While there are meaningful parallels between the use of biomarkers in the study of OP and OA, there are also important differences. In general, most osteoporotic conditions are systemic, generalized processes affecting essentially the entire skeletal system. In OA, although the articular process may be somewhat generalizable, the disease within a given joint is often driven by the impact of a local biomechanical, post-traumatic, or instability-based factor. To the extent that this is true, the use of serum or urinary biochemical markers as a means to track disease may be relatively insensitive. Is joint aspiration a preferred technique for this subset of patients?

Dr. Poole responded that this is a very important point, one that is made in the Biomarkers White Paper, and that he is a strong proponent of looking at joint fluid in this type of situation, believing it to be the most accurate way to assess pathobiology and the influence of therapy. A standardized procedure to obtain joint fluid by washout has been developed, along with marker ratios that address the issue of variable dilution of joint effusion. Dr. Poole also noted the importance of considering joint fluid analysis in the routine assessment of how best to study biomarkers.
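The reasoning behind using marker ratios can be illustrated with a minimal sketch. It assumes that both markers in a washout sample are diluted by the same unknown factor, so that the factor cancels in the ratio. The concentrations, dilution factors, and function names below are hypothetical and are not taken from the Biomarkers White Paper.

```python
# Hypothetical illustration: a ratio of two markers is unchanged by a
# dilution factor that applies equally to both. All values are invented.

def dilute(concentration, dilution_factor):
    """Concentration observed after diluting a sample by dilution_factor."""
    return concentration / dilution_factor

def marker_ratio(marker_a, marker_b):
    """Ratio of two marker concentrations measured in the same sample."""
    return marker_a / marker_b

# Undiluted concentrations in arbitrary units (assumed, not measured data).
true_a, true_b = 120.0, 40.0

ratios = []
for factor in (1.0, 2.5, 10.0):
    observed_a = dilute(true_a, factor)
    observed_b = dilute(true_b, factor)
    ratios.append(marker_ratio(observed_a, observed_b))
    print(f"dilution x{factor}: A={observed_a:.1f}, B={observed_b:.1f}, "
          f"A/B={ratios[-1]:.2f}")
```

The individual concentrations fall as the dilution increases, while the ratio stays constant. This holds only under the stated assumption that dilution affects both markers equally.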

Dr. Heinegard pointed out that although there are fewer variables when one looks at the fluid of an individual joint, in terms of therapy, when more than one joint is involved, there may be some joints at stage one and others at stages two and three. If the therapy affects stage one only, different markers (a combination of general circulation markers and synovial fluid) are needed to assess what is going on in both the stage one and the stage two and three joints.

It was noted that a significant number of patients who have total joint replacements have mechanical malalignment of either the hip or the knee, which points to the need to consider biomechanical markers. Sophisticated techniques for defining the kinematics of the knee ex vivo have also demonstrated that many people with very early grade one disease with minimal or no malalignment have well-defined, quantifiable abnormalities of gait.

Within OA, end points that can be evaluated will likely be pain, functionality, and, potentially, hip or knee replacements. Consideration should be given to existing good and objective functional (and sometimes biochemical) markers of biomechanics, function, and pain.

There are limitations in attempting to identify biomarkers that predict therapeutic effects from the characterization of a historical or natural process database. The pathophysiologic and immunologic mechanisms involved are highly complicated, and surrogate markers are frequently not well characterized. Accordingly, consideration should be given to applying additional emphasis to the characterization of interventional studies and interventional databases in which the discernible effects of a therapeutic agent on a marker are measured and then validated against a clinical outcome measure.

It is important to obtain x-rays of many joints to ensure that the controls selected do not have an unidentified, unforeseen form of systemic cartilage degradation.

It was pointed out that even if a substance does not change with disease, markers that alter something linked to the disease process can play an important role. In addition, the biomarkers being investigated should be measured in studies that assess the clinical outcomes of a variety of interventions and demonstrate, not assume, the link between the changes in outcome and the changes in the biomarker.

Considerable data exist on the relationship between markers measured in body fluid and joint pathology. For example, osteocalcin measured in synovial fluid relates very closely to scintigraphy scan abnormalities of the knee joint and can be found in serum. The COMP measured in serum also relates quite positively to scan abnormalities of the knee.

In response to a query about how close we are to identifying usable markers, the members of the panel indicated that:

Screening can currently be conducted, e.g., by analyzing two markers that measure the same thing and, using general patient cohorts or sample cohorts, assessing marker levels and correlations.
Various kinds of interventions could start to be monitored now.
Some of the most promising studies have resulted from efforts by groups of researchers to compare, side by side in a clinical trial, currently available biomarkers.
Consideration should be given to linking work on OA biomarkers to the wealth of information on disease-modifying drugs and well-defined biomarkers that exist for rheumatoid arthritis.
It would be particularly helpful to the overall process if the OA initiative facilitated the conduct of collaborative animal studies.

The point was made that virtually all of the published studies that have linked a biomarker to OA progression have done so on the basis of standing AP knee radiographs, for which real problems have been identified regarding the reproducibility of the technique. Relationships between biomarkers that have been established based on standing AP radiographs need to be reexamined in light of these concerns.

Dr. Poole noted that such concern exemplifies the great importance of functional studies to assess disease progression and further reinforces the need to integrate all applicable areas of research. Studies incorporating the imaging of a knee joint provide an opportunity to look, in a carefully integrated way, at biomarker changes within a specific joint, the functional aspects of the joint, pain, and imaging.

There is a great need to coordinate all available forms of data from the beginning of any study. No particular tissue or technique for determining the processes involved in OA should be excluded because, years down the road, it may turn out that a particular aspect (e.g., white blood cells) will become important in efforts to define subsets of this disease.

Imaging

Introduction

Randall Stevens, M.D., Associate Professor of Medicine, Robert Wood Johnson Medical School; Director, Clinical Science, F. Hoffmann-La Roche; and session moderator, welcomed the participants to the Imaging Session and introduced Drs. Hall, Buckland-Wright, and Peterfy.

Perspectives and Considerations on Imaging Techniques

Laurance Hall, Ph.D.
Professor
Herchel Smith Laboratory for Medicinal Chemistry
University of Cambridge School for Clinical Medicine

Dr. Hall indicated that his presentation would address magnetic resonance imaging (MRI) measurement of the articular joint as an intact organ in order to detect, assess the etiology of, and quantitate OA and determine the efficacy of treatment.

MRI offers a safe and noninvasive way to look at all of the soft tissues within the articular capsule, the surrounding musculature and vasculature, and bony tissue, allowing for the assessment and comparison of these different joint elements in subjects undergoing all stages of OA and normal controls. The addition of measurement science to this technique allows for the quantitation of the relative rate of progression of the different elements of OA and, in principle, the efficacy of treatment.

Dr. Hall described a study of guinea pig knees carried out by his laboratory five years ago that serves as an example of what MRI can do for the study of OA in humans. The study, which assessed with high measurement precision a small number of guinea pigs on a very regular basis, integrated the use of MRI with x-rays and histology. The technique allowed for the study of each joint as a single intact organ, the identification of the natural progression of disease etiology, and the assessment of treatment response. The images produced allowed for the detailed measurement of subchondral sclerosis; the cystic disruption of the trabecular architecture; osteophytosis; and cartilage thickness, swelling, and loss.

For the OA Initiative, perhaps the greatest value of MRI lies in its capacity to aid in the selection of patients to be admitted into clinical trials, evaluate the extent and trajectory of damage to individual joints, help choose which subgroups of patients are appropriate for specific therapeutic windows of treatment, and assess the impact of treatment. The technique utilized by Dr. Hall can efficiently generate a huge amount of data that, with computer analysis, can be further refined to produce such resources as cartilage thickness maps. An even more powerful way of integrating these data into the context of biomarkers is to create a "virtual plug biopsy" by defining and making repeated assessments of a specific region of the joint. In addition, the process of measuring the magnetic resonance parameters of water in articular cartilage holds great promise as a technique that can identify the intrinsic quality of the cartilage.

Dr. Hall noted that the type of radiological assessment and scoring of the articular joint he has described has been available for a decade. Given the importance of moving forward with the OA Initiative, he recommends that this technique be used to produce scans and place them in a databank that can then be mined for quantitative measurements.

As Dr. Charles Peterfy will discuss in the upcoming straw proposal on imaging techniques, current MRI technology is capable of supporting a global level one study that places imaging data into a database archive that is fully integrated with biomarker data. Existing methods also allow for a level two study that applies computer-based measurement science to those same data, while level three refers to a proposed mechanism that leaves the door open for the incorporation of future innovations.

Perspectives and Considerations on Imaging Techniques (continued)

Chris Buckland-Wright, Ph.D., D.Sc.
Professor, Department of Applied Clinical Anatomy
School of Biomedical Sciences
King's College London

Dr. Buckland-Wright explained that his presentation reviews the radiographic aspects of examining OA joints.

The first outcome measure of interest is joint-space width and whether or not it is a reliable representation of the articular cartilage within a joint. Dr. Buckland-Wright reviewed the results of a study that found a very tight correlation between radiographic images of joint-space width and the sum of cartilage thickness in the medial diseased compartment. In the lateral compartment, however, radiographic images of joint-space width were not found to be a reliable measure of cartilage thickness. The study also found that, within the loaded area, the width of the joint space was significantly narrower than the combined cartilage thickness of the femur and tibia, indicating that weight-bearing radiographs can be used to measure both the thickness and biomechanical status of articular cartilage.

A critical factor in the use of radiographic images to assess OA joints is the need to clearly define the radioanatomical plane of measurement, which is determined by the position of the joint during radiography and the direction of the x-ray beam as it passes between the joint's bony margins. This plane of measurement should be perpendicular to the central ray of the beam, perpendicular to the joint margins, and parallel to the film. Another important aspect is whether or not there is any degree of joint rotation. The size and appearance of osteophytes can be affected by the angle of the x-ray beam, the degree of joint rotation, the degree of joint flexion, and radiographic magnification.

Accurate radioanatomical positioning of a joint under load and in a normal functional position allows for the accurate assessment of the cartilage that a patient is actually living on. Dr. Buckland-Wright described a study that looked at three types of films, each of which produced a single view of the knee and did not use fluoroscopy for positioning. The study found that the most reliable radiographic technique was the one that used a posteroanterior (PA), semiflexed knee position and a horizontal x-ray beam in which the front of the film is lined up with the first metatarsophalangeal (MTP) joint. This position provided accurate radioanatomical positioning and minimal distortion of the x-ray features, allowing for reproducible measurements of joint-space width, osteophytes, cortical thickness, and cancellous bone organization. The next best technique was the PA tunnel or Schuss knee position with the x-ray beam directed 5 degrees downwards. An advantage of this view is that it can be used to measure rapid loss of articular cartilage at the posterior aspect of the tibial plateau and popliteal surface of the femoral condyles. The study found that the least reliable view was that obtained in the anteroposterior (AP) fully extended knee position with the x-ray beam horizontal.

The more expensive fluoroscopy positioning procedure ensures joint positioning that is much more reliable. Because it reduces the degree of variability, it is likely that this technique can be used when there is a need to assess changes in a smaller number of patients. Dr. Buckland-Wright expects to have data comparing the advantages of fluoroscopy positioning with those of nonfluoroscopy positioning by approximately September 2000.

With respect to the hip, a radiograph that centers the beam on the single hip joint has been shown to result in better reproducibility than the standard view of both hip joints. Other work suggests that a lateral oblique view of the hip provides an opportunity to assess both the superior and posterior aspects of the femoral head. The only limitation to this method is that it requires fluoroscopy for reliable positioning. Hand radiography is reasonably well established; a reliable method appears to be imaging the fingers held together and aligned with the axis of the forearm, with the beam centered at the third metacarpophalangeal (MCP) joint. Further work to develop standardized radiographic procedures for the spine, the temporomandibular joint, and the shoulder is needed to allow for a more global assessment of the body. Dr. Buckland-Wright noted that, wherever possible, computerized methods of joint-space width measurement should be used.

In relation to the measurement of different aspects of the joint, standardized radiographic protocols are needed to quantify:

Changes in articular cartilage thickness measured as joint-space width within a 2-year period using standard radiography or within 1 year using microfocal radiography.
Changes in joint-space width due to the effect of therapeutic intervention.
The status of osteophytes on both the tibial spines and at the tibial and femoral margins, using devices such as digitizing tablets and computers to measure their change in number and size over time.
Erosions, which Dr. Buckland-Wright calls juxtarticular areas of radiolucency, that tend to fluctuate in number and size over time and may be associated with inflammatory episodes.
Subchondral cancellous bone organization that, with the onset of OA, takes on a ladder-like appearance in the tibia. Fractal signature analysis techniques have been developed that allow for the quantification of these changes.
Changes in cortical plate thickness. It is quite possible that, in the hand, changes in the vascularization of this zone lead to the advance of calcified cartilage into the joint space.
The mechanical alignment of a joint.
The remodeling and reshaping of bone in patients with advanced OA.

Dr. Buckland-Wright concluded that both standard and microfocal radiography currently have the capacity to quantify joint-space width, joint-space narrowing, osteophyte number and area, the number and size of erosion sites, cortical plate thickness, trabecular bone organization, and the mechanical axis of the joint.

Discussion

Many current population studies and European clinical studies use standardized, fully extended, weightbearing radiographs of the knee. It has been suggested that the way to diminish the coefficients of variation (CVs) of these images is not necessarily to change the extension or the flexion of the knee, but rather to take the radiograph in a highly standardized way. Can these data be used?

Dr. Buckland-Wright responded that Kenneth Brandt, M.D., Director of the Multipurpose Arthritis Diseases Center at the Indiana University School of Medicine, had just shared data with him that show that, historically, all standing extended knee films are invalid. With respect to the argument that it does not matter what position the knee is in so long as the radiograph is taken carefully, Dr. Buckland-Wright pointed out that research he has been involved in has found that there is variation in the status of the articular cartilage in different parts of the knee. One study showed that approximately 20 percent of the population assessed tended to lose their articular cartilage very rapidly over the posterior part of the tibia and popliteal surface of the femur, more rapidly than the area that they actually moved and walked on, e.g., the middle zone of the tibial plateau. In addition, a major limitation of the standing extended knee view is that it shows a part of the cartilage that is only rarely used; the only time a person's knee is at full extension is when they are standing at attention or goose-stepping.

The point was made that, although the current focus is on measuring interbone distance or joint-space width and cartilage, epidemiologic studies have shown that the bony changes may be a much more important reflection of patient symptoms. Accordingly, Dr. Buckland-Wright was asked to comment on the bone changes in the osteophytes in the extended standing knee.

Dr. Buckland-Wright indicated that, if the radiographs are taken in a standardized, reproducible way, standing extended knee films are a very good way to assess osteophytes. This view may cause an enlargement in the perceived size of the osteophytes, but that is not of concern as long as the effect is accounted for and the image is reproducible. However, to assess cancellous bone, Dr. Buckland-Wright encourages viewing the horizontal tibial plateau.

Asked to comment on weightbearing x-rays of the hip and on x-rays of nonweightbearing joints such as those in the hand, and on the resultant impact on the assessment of joint-space width, Dr. Buckland-Wright noted that, in the hip, the weightbearing view has been found to be more reliable than the nonweightbearing view. In the hand, good stabilization can be achieved without weightbearing by applying muscle contraction across the joint in the position described earlier (fingers together and aligned with the axis of the forearm).

A participant asked whether the information Dr. Buckland-Wright presented was applicable to MRI scanners dedicated to the lower limbs, which are less expensive and easier to use. Would it be possible for longitudinal epidemiological studies to use such scanners? Dr. Buckland-Wright responded that the new generation of peripheral limb scanners used to scan an ankle, knee, or hand cost roughly one fifth as much as a whole body scanner. These scanners operate at much lower field strengths than whole body scanners; hence the signal-to-noise ratio and spatial resolution of each scan are lower, decreasing the accuracy of the scans produced. However, the potential of these peripheral limb scanners has yet to be objectively explored, and this is an issue that will be addressed at the end of Dr. Peterfy's presentation.

Presentation of Straw Proposal on Imaging Techniques

Charles Peterfy, M.D., Ph.D.
Chief Scientific Officer
Executive Vice President
Synarc, Inc.

Dr. Peterfy noted that his presentation focuses on two fundamental questions:

Are structural features of the joint (hip, knee, or hand) and the associated tissue changes (such as joint-space narrowing and osteophyte development) reliable markers of disease and disease progression? To answer this, it is important to apply the most reliable imaging techniques and markers currently available.
What is the relative validity and performance of alternative imaging markers of joint structure? The capacity to characterize markers according to predetermined criteria is crucial to the rational selection and prioritization of markers for different clinical and scientific applications.

The OA Initiative Imaging Subcommittee addressed these issues at a meeting on January 11, 1999. Attendees included representatives from academia, NIH, industry, FDA, and central laboratory service providers. The participants sought to:

Develop a mechanism that would both answer the first question and allow for the inclusion of more experimental and specialized techniques over the course of the study.
Come to an agreement on performance criteria with which to compare different imaging markers.
Achieve consensus on a list of candidate markers and techniques to include in each tier of the study (that will support the whole organ model of OA).
Build an imaging protocol for the OA Initiative that will support these choices.

Before this meeting, Dr. Peterfy circulated a preprint of a manuscript (Peterfy C. Scratching the surface: articular cartilage disorders in the knee. Magnetic Resonance Imaging Clinics of North America 8:2; May 2000) that addressed the subject of imaging marker performance as it applied narrowly to MRI of articular cartilage.

In his review of what was accomplished at the January 11 meeting, a comprehensive summary of which can be found at https://www.nih.gov/niams/news/oisg/imaging.htm, Dr. Peterfy emphasized the following points:

Tier one of the recommended three-tiered protocol framework comprises techniques (to be applied to all subjects and sites in the core study) that are the most established in terms of their validity and performance and suitable for multicenter image acquisition. In this tier, both knees should first be imaged with a nonfluoroscopic, flexed, standing radiograph to assess joint-space width, joint-space and osteophyte scoring, and subarticular bone texture. All participants would also receive a conventional MRI using two techniques: 1) a fat-suppressed, T1-weighted, 3-D gradient echo (to visualize articular cartilage; determine cartilage score, thickness, and volume; and assess osteophytes, subarticular cysts, and bone attrition) and 2) a T2-weighted dual echo, fast spin echo, or turbo spin echo technique (to allow for whole organ scoring and the evaluation of synovial fluid volume).
Tier two would consist of a subset of patients and/or sites that possess special facilities, special interests, and sufficient statistical power. Tier two studies would assess less well established but highly promising markers, as well as techniques applicable to multicenter acquisition that are less available, more difficult, or less tolerable to patients. The techniques for this tier would include, in addition to the knee, imaging of the hands and hips. The radiography of the knee would be aimed at optimizing the multicenter radiographic technique, comparing different nonfluoroscopic and fluoroscopic methods, and evaluating the joints in terms of joint-space width, osteophyte and joint-space score, and, potentially, alignment. Hip imaging would use standardized, standing, joint-centered radiography and evaluate joint-space width and the morphologic score. Imaging of the hand would be conducted with standardized MCP3-centered radiography of each hand and scored with the OARSI technique.
Tier three of the framework would be composed of experimental substudies for which the demands for statistical power would be less stringent. These substudies would investigate: 1) markers that are promising but for which there is only scant data regarding their validity or performance and 2) techniques that are not widely available, difficult to perform, and/or demanding on patients. Tier three techniques would include all of the joints assessed in tier two plus the shoulder, spine, and temporomandibular joint, with the intent of capturing all of the alternative techniques and sites that may have relevance to this study.
In each of the three tiers, the markers would be characterized uniformly in terms of their surrogate validity, longitudinal sensitivity, and convenience and cost.
The performance criteria for comparing different imaging markers should be linked to the context of their application. Clinical practice, clinical research, and clinical trials each have slightly different priorities. However, a number of performance metrics that are common to these different applications can be derived and used to compare different imaging (and biochemical) markers.
Critical performance metrics for markers are surrogate validity, the magnitude of the clinically relevant component of the outcome captured by the change in the surrogate outcome (dynamic range), responsiveness to disease and/or therapy, how precisely the change can be measured, convenience, and cost.
Candidate imaging markers for the OA Initiative are:
- Markers of cartilage morphology (radiographic assessment of joint-space width; MRI assessment of cartilage score, thickness, and volume; optical coherence tomography (OCT) assessment of cartilage thickness and surface fraying).
- Compositional markers in cartilage (MRI and polarized OCT assessment of collagen organization in cartilage, MRI assessment of proteoglycan content).
- Bone markers (radiographic and MRI assessment of osteophyte score and size, technetium-MDP scintigraphy assessment of bone synthesis, radiographic and MRI assessment of trabecular measures, MRI and technetium-MDP scintigraphy assessment of marrow edema or inflammation, assessment by a variety of techniques of subarticular cysts and bone attrition).
- Markers of effusion and synovitis (MRI assessment of effusion score, effusion volume, and synovial perfusion; Doppler ultrasound assessment of synovial perfusion).
- Markers of other joint structures (MRI assessment of the meniscus, the cruciate and collateral ligaments, bursitis, etc.).
In the area of subject recruitment, the limitations of imaging relate principally to competition from the clinical cases for MRI time.
The total imaging cost for tier one will be $8,600 to $14,400 per subject (six visits over 4 years). For 5,000 subjects, the total cost of tier one would be $43 to $72 million. For tier two, there are a smaller number of subjects but more (nine) visits. The cost for 500 subjects over 4 years would be $12,600 to $21,600 per subject for a total of $6.3 to $10.8 million. Funding mechanisms for tier three have yet to be discussed.
An alternative that could help contain the imaging costs for all three tiers, and improve standardization, would be to use a small number of mobile, rather than clinical, MRI units at a cost of $1.1 million per year for each unit. To service the 15 or so centers that would be involved would take 3 mobile units. Accordingly, the total imaging costs for the entire study would be $11 million.
Remaining questions that need to be discussed include the determination of the type of cohort to be used; basic design, statistics, and analysis issues (e.g., whether to image one or both knees); the structure and content of the database; and issues related to imaging and analysis standardization and quality assurance (site qualification and training, image data collection and management, image analysis, and reporting).
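The tier one and tier two cost figures quoted above follow from multiplying the per-subject cost range by the number of subjects. The short sketch below reproduces that arithmetic; the function and variable names are illustrative only. The annual cost of the mobile-unit alternative is also computed, but the summary does not break down how the $11 million total was derived, so no total is calculated for it here.

```python
# Reproduces the cost arithmetic from the figures quoted in the summary.
# Names are illustrative; dollar amounts and counts come from the text.

def total_cost_range(subjects, low_per_subject, high_per_subject):
    """Return the (low, high) total imaging cost in dollars for a cohort."""
    return subjects * low_per_subject, subjects * high_per_subject

# Tier one: 5,000 subjects at $8,600 to $14,400 each (six visits over 4 years).
tier_one = total_cost_range(5_000, 8_600, 14_400)

# Tier two: 500 subjects at $12,600 to $21,600 each (nine visits over 4 years).
tier_two = total_cost_range(500, 12_600, 21_600)

# Mobile alternative: 3 units at $1.1 million per unit per year.
mobile_annual = 3 * 1_100_000

for name, (low, high) in (("Tier one", tier_one), ("Tier two", tier_two)):
    print(f"{name}: ${low / 1e6:.1f} to ${high / 1e6:.1f} million")
print(f"Mobile units: ${mobile_annual / 1e6:.1f} million per year")
```

Running the sketch gives $43.0 to $72.0 million for tier one and $6.3 to $10.8 million for tier two, matching the totals stated in the summary.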

Discussion

In response to a query on whether tier three would be supported through a Request for Applications (RFA), Dr. Peterfy responded that such a determination falls outside the role of the Imaging Subcommittee and was not discussed by the members of the group.

A participant pointed out that the verified, validated way of measuring bone blood flow noninvasively is with fluoride ion via positron emission tomography (PET). Although there are not many people representing the emission field on the Imaging Subcommittee, the OA Initiative should work to ensure that PET, ultrasound, and new contrast techniques are seriously considered for inclusion in tier three. Dr. Peterfy noted that this would be appropriate for tier three because its design is intended to include all viable imaging techniques.

Asked what basic anatomical examination for tier one everyone agrees on, Dr. Peterfy replied that the protocols for tier one were determined on the basis of methods for evaluating articular cartilage, bone, and other structures inside the knee that are the most established and lend themselves to a multicenter acquisition process. Radiographically, a nonfluoroscopic technique is considered the most applicable. With respect to which MRI technique to use, there are not many options if the existing clinical infrastructure is to be used in multiple sites, and those chosen by the members of the Subcommittee are felt to be very effective methods for identifying the morphological characteristics of the articular cartilage and quantifying its volume.

Following an attendee's query regarding where gadolinium ranks in this scenario, Dr. Peterfy noted that gadolinium would be an extra injection that would add both cost and trouble. It could be considered as an issue to study in a tier three or, potentially, tier two context. Questions related to synovial enhancement and synovitis in osteoarthritis could probably be answered outside of this study or in a tier three context.

Dr. Stevens announced that, due to time constraints, further discussion with the members of the Imaging Panel would be deferred until the evening session.

Biostatistics

Introduction

Yetunde Taiwo, section head, Late-Stage Arthritis Research, Procter & Gamble Pharmaceuticals, and session moderator, reviewed the concepts to be discussed during the biostatistics session and introduced Dr. Helms.

Presentation of Biostatistics White Paper

Ronald Helms, Ph.D.
Vice President, Statistics
Rho, Inc.

In his review of the Biostatistics White Paper (Statistical Issues for Establishing Relationships Among Biological Measures and Clinical End Points of Disease), the full text of which will be posted on the OA Initiative Web site (https://www.nih.gov/niams/news/oisg/index.htm), Dr. Helms emphasized the following points:

The role of biostatisticians and epidemiologists in the OA Initiative is to identify appropriate statistical methods for evaluating and comparing biomarkers, select clinical trial design strategies, and perform sample size computations and cost calculations.
A surrogate serves as a substitute for an end point in many cases, and a study to validate surrogates must also evaluate corresponding end points. Ideally, the end points and their surrogates should be ethical; feasible; relevant to the clinical setting of the study and the stage of clinical experimentation; valid (although there is controversy regarding whether a surrogate must be valid); precise, reliable, and repeatable; sensitive and responsive to change; and comprehensible and credible to the relevant research, review, and application communities.
Appropriate statistical methods for evaluating surrogate markers are currently available. Mixed (random effects) general linear models can be used to assess continuous end points and surrogates; mixed-effects logistic models can be applied to dichotomous end points and surrogates. Although there are some unsolved analytic problems associated with certain types of surrogate assessments, there are ways to work around them.
OA has multidimensional outcome end points that cannot be simultaneously summarized in one number or variable. OA outcomes can be based on clinical symptoms, pathology, histology, biomechanical factors, or biochemical changes. Accordingly, OA end points and surrogates must be restricted to one specific dimension (e.g., handicap, pain, or performance) or be represented simultaneously by multiple variables.
The validation of a surrogate end point is a scientific process that requires the combined effort of medical researchers, clinicians, and biostatisticians. No one discipline should dominate this process.
The use of surrogate end points can be dangerous; a number of studies have found that they can produce misleading or incorrect results.
Some statisticians are very conservative with respect to the use of surrogate variables in Phase III studies, tending to prefer that surrogate markers be used to measure biological activity in Phase II or screening trials. Although other statisticians are less conservative, those planning the OA Initiative should be aware that study review panels, including FDA panels, will include some statisticians who are very conservative with respect to the use of surrogates and will hold to a very high standard research done to validate OA surrogate end points.
Because so much about OA is unknown, Dr. Helms suggested that the study design have two cohorts that can be crossed with the other kinds of cohorts that have been discussed elsewhere: 1) a research development/hypothesis-generating cohort (consisting of a subset of the subjects in the study), which can be used to develop the tools that will help identify optimal biomarkers or potential surrogates, explore data, and assess possible combinations of biomarkers to serve as surrogates for specific end points, and 2) a hypothesis-testing/evaluation cohort (made up of a different subset of subjects), which can be used to prospectively evaluate proposed surrogates, test hypotheses about proposed surrogates, and compare the surrogates to true end points.
The OA Initiative should plan to conduct a pilot study that tests the statistical cohorts. Prior studies and clinical trials can be used as partial pilot studies, and a run-in period pilot study should be conducted to ensure that measurements and assessments are performed consistently over multiple sites.

Discussion

A participant asked whether the two cohorts described would be designed to be equal in terms of patients and numbers. Dr. Helms indicated that the measurements applied to the cohorts would be the same and that he believed that the number of patients in each cohort would be approximately equal.
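The two-cohort design discussed above can be illustrated with a minimal sketch. It assumes, purely for illustration, that subjects are assigned at random and in roughly equal numbers to a hypothesis-generating cohort and a hypothesis-testing cohort. The summary does not say how assignment would be done, and the function name, subject identifiers, cohort size, and seed below are hypothetical.

```python
import random


def split_cohorts(subject_ids, seed=2000):
    """Randomly split subjects into two disjoint, near-equal cohorts.

    Random assignment is an assumption made for illustration; the
    meeting summary does not specify an allocation method.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)         # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    generating = sorted(ids[:half])   # used to explore data and propose surrogates
    testing = sorted(ids[half:])      # held out to evaluate proposed surrogates
    return generating, testing


# Hypothetical subject identifiers, for illustration only.
subjects = [f"S{n:04d}" for n in range(1, 1001)]
generating, testing = split_cohorts(subjects)
print(len(generating), len(testing))  # 500 500
```

Keeping the two subsets disjoint is what allows surrogates proposed in the first cohort to be evaluated prospectively in the second without reusing the same subjects.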

In response to a query regarding how he felt about Dr. Peterfy's ballpark estimate of 5,000 patients for 4 years, Dr. Helms noted that the biostatistics group had not worked on sample size calculations for the OA Initiative because of the need to address other issues first.

Asked to discuss how the term "clinically meaningful" should be validated, Dr. Helms responded that he felt the most important issues concerning validity related to the establishment of face validity and construct validity. Outcomes and end points that are generally accepted as measures of pain, discomfort, function, and so on are currently available. With respect to radiographic and MRI outcomes, validity has been determined subjectively by groups of experts who indicate whether a result is representative of the underlying phenomenon.

An attendee asked for an FDA representative to explain whether the FDA considers joint-space width to be a primary outcome for disease modification or a surrogate marker. If it is not considered a surrogate marker, what does FDA consider a primary outcome measure for disease modification? Kent Johnson, FDA, explained that the short answer is that the FDA has not identified a primary outcome measure for disease modification. The provisional guidance used by the agency essentially adopts the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) measures of pain, function, and patient global assessment, with consideration given to the effect of standard confounders such as analgesic use. For OA, FDA is currently entertaining the notion of utilizing an accelerated or conditional approval process such as that used for rheumatoid arthritis. In this process, conditional approval will be given to a treatment if a dramatic effect on structure is identified that, subsequent to approval, is expected to be documented as being clinically relevant. Short of that, the FDA is looking at structure as something that, instead of standing alone, should be accompanied by a clinical claim demonstrated by the results of a clinical trial.

Another participant pointed out that the existing FDA guidance for OA, which is much shorter than that for rheumatoid arthritis, contains a footnote that reflects FDA's willingness to be flexible. In this guideline, following a description of how the product might improve the symptoms of, delay the structural progression of, or prevent OA, the footnote indicates that "an alternative approach may be used if such an approach satisfies the requirements of the applicable statutes, regulations or both."

The plea was made that the OA Initiative not be considered a project that has to be perfect. For example, in rheumatoid arthritis, agents that stop erosions from progressing are considered beneficial, yet there is no proof that stopping such erosion is unequivocally linked to long-term functional outcome. Similarly, a treatment that stops the progressive loss of cartilage in OA would be a good thing despite the fact that it would take a long time to prove the treatment's relationship to long-term functional outcome. The desirability of a therapy that stops the progression of osteophytes is less certain because individuals who grow large osteophytes are reported to stabilize their joints better than those who develop smaller osteophytes. These are examples of two clear-cut and unequivocal x-ray features which can be measured with precision and which may actually be completely different indicators of the long-term progression of disease.

Epidemiology

Introduction

Kenneth Brandt, M.D., Professor of Medicine, Head, Rheumatology Division, Director, Multipurpose Arthritis Diseases Center, Indiana University School of Medicine, and session moderator, welcomed the participants and indicated his interest in ensuring that over the course of the meeting the group discuss:

The differentiation between risk factors for categorical and painful OA.
Distinctions between risk factors for the initiation of OA in terms of pathology and disease progression.

Presentation of Epidemiology White Paper

Marc Hochberg, M.D., M.P.H.
Professor of Medicine and Epidemiology
University of Maryland School of Medicine

Dr. Hochberg explained that the role of the Epidemiology/Biostatistics/Genetics Subcommittee was to: 1) examine available clinical and laboratory data and specimens from existing OA natural history and epidemiology studies to determine their applicability to biomarker validation studies and 2) look at existing databases and determine their usefulness in meeting the needs of the OA Initiative. The information gathered on the cohorts was then included in the Epidemiology White Paper, the full text of which can be found at https://www.niams.nih.gov/Funding/Funded_Research/Osteoarthritis_Initiative/oaepip.asp

In his review of the Epidemiology White Paper, Dr. Hochberg highlighted some of the studies that are described in detail in the paper's appendices (https://www.niams.nih.gov/Funding/Funded_Research/Osteoarthritis_Initiative/oaepipappen_a.asp and https://www.niams.nih.gov/Funding/Funded_Research/Osteoarthritis_Initiative/oaepipappen_b.asp). He explained that since Dr. Nevitt would be covering hip OA later in the day, his presentation would focus on knee OA.

Population-Based Cohort Studies on the Development of Knee OA

Dr. Hochberg summarized the key characteristics of several population-based cohort studies that have assessed both the development and the progression of knee OA: the Baltimore Longitudinal Study of Aging, the Framingham Osteoarthritis Study, the Johnston County Osteoarthritis Project, the Michigan Bone Health Study, the Study of Women's Health Across the Nation (MI site), the Chingford Study, and the Rotterdam Study. These studies, some of which comprised representative samples and some of which did not, cover the age span from the end of adolescence through the ninth and tenth decades of life. They tend to focus primarily on North American and European Caucasians, although two of the studies do offer data on African Americans.

The strengths of these studies in assessing both the development and progression of knee OA are that they are useful for either nested case-control or case-cohort designs and most offer data on covariates. The studies' major limitations in assessing the development of knee OA stem from the fact that the incidence rates of OA vary by age (causing much lower incidence rates in studies with a lower mean age), there was variability in the processing and storage of samples across studies, and no synovial fluid specimens were collected. With respect to knee OA progression, the studies are limited by the small number of cases that are available in several of the cohorts, the fact that most x-rays were taken in AP weightbearing views rather than in the Schuss or semiflexed position, and the variability that occurred in the processing and storage of samples across studies.

The risk factors associated with an increased incidence of knee OA are older age, female gender, being overweight, having hand OA, contralateral knee involvement (in persons with OA in one knee), a history of joint injury, higher levels of physical activity, quadriceps muscle weakness, greater bone mass, and lack of estrogen replacement therapy in postmenopausal women. These risk factors are considered a useful means for selecting high-risk groups in which to test OA biomarkers and interventions.

Many of the risk factors associated with the incidence of knee OA have also been found to be risk factors for progressive knee OA (increasing age, female gender, being overweight, having hand OA, having weak quadriceps muscles, and greater levels of physical activity). Other factors that have been linked to OA progression are micronutrient intake (vitamin C and D), lower bone mass, and inflammation.

Patient Cohort Studies on the Progression of Knee OA

Dr. Hochberg summarized the key characteristics of the following patient cohort studies that have assessed the progression of knee OA: the Boston OA of the Knee Study, the Indiana University Knee OA Progression Cohort, the Mechanical Factors in OA of the Knee Study, the Bristol OA 500 Cohort and more recent Bristol OA of the Knee Study, the Lund Postmeniscectomy Knee OA Cohort, the Nottingham Study, the Spenshult Knee Pain Cohort, and the Ulm OA Study.

The strengths of these studies are that data on covariates are available in most (although the degree of detail is lower than that found in population-based studies), MRI data are available in some of the cohorts, serial specimens allow for the measurement of change in markers, and several of the studies have synovial fluid specimens. They are limited in that most of the x-rays are taken in an AP weightbearing view (not in the Schuss or semiflexed position), processing and storage techniques vary across studies, and synovial fluid specimens are not available in some of the studies.

Summary and Conclusion

Dr. Hochberg concluded that existing population- and patient-based cohorts have data and specimens that are relevant to the goals of the OA Initiative. The data they can provide should be used to validate potential biomarkers for OA development and progression in the initial hypothesis-generating phase of the OA Initiative.

Discussion

Dr. Brandt asked Dr. Hochberg to comment on pain and risk factors for OA progression. Dr. Hochberg responded that several studies have looked at knee pain as a risk factor for the progression of knee OA. In patient cohort and population studies in which the progression outcome was defined as having a total joint replacement, pain was clearly related to this outcome. Other studies using x-ray markers that assessed pain in the knee also found pain to be a risk factor for OA progression. Dr. Hochberg noted that pain in OA is related to changes in bone blood flow, most likely caused by inflammation, and in knee OA positive scintigraphy has been found to be a risk factor for radiographic progression. In addition, followup studies of OA patients have indicated that inflammation is probably a risk factor for OA progression in the knee.

Perspectives on the Status of Current Cohort Studies

David Felson, M.D., M.P.H.
Professor of Medicine and Public Health
Director, NIH Multipurpose and Musculoskeletal Diseases Center
Boston University School of Medicine

Dr. Felson noted that his talk would address the question of whether currently available OA population and clinical studies can provide the OA Initiative with sufficient epidemiologic data. Dr. Felson's presentation is based on studies of knee osteoarthritis and will be followed by Dr. Nevitt's presentation on studies of hip osteoarthritis.

Five general types of surrogates or biomarkers may be of interest to the OA Initiative: 1) diagnostic surrogates (a stand-in for disease) that can be obtained from a cross-sectional study, 2) surrogates for disease severity, also obtainable from cross-sectional study (neither of these first two types is of primary interest to drug development efforts), 3) predictors of response to treatment, which generally require trial data, 4) surrogates to monitor outcomes, which also usually require trial data, and 5) prognostic biomarkers that predict longitudinal outcomes.

Dr. Felson contended that for drug development and therapeutic development purposes, prognostic markers are very important. Cross-sectional or baseline prognostic markers might identify persons at high risk of OA progression who would be good candidates for studies. Longitudinal studies could help determine whether changes in disease status correlate with a biomarker or surrogate in a way that could be used to evaluate the likelihood that a drug is effective. The outcome measured can be either incidence or progression of disease, and such studies might test whether a biomarker, or a change in the biomarker, correlates with a change in disease status and how strong this correlation is.

An observational study of a biomarker's validity requires a longitudinal design wherein disease assessment by a gold standard method is obtained at baseline and at followup and the biomarker is measured at baseline and followup. The outcomes of interest are cartilage loss (perhaps characterized by joint-space narrowing) and pain/disability. Dr. Felson noted that the Dougados studies showed a poor correlation between pain/disability and joint-space narrowing, underscoring that these are two very different outcomes. In addition, results emerging from the MRI field indicate that factors other than cartilage loss can narrow joints, an example being that meniscal subluxation can cause joint-space narrowing regardless of the presence of abnormal cartilage. Accordingly, structural changes in OA and OA symptoms are independent and important factors. Biomarkers may be correlated with x-ray change, worsening pain, or both, and some biomarkers may be correlated with specific structural alterations but not with global change. Biomarker studies need to include both symptom and structural data.

How many cases are needed to test a biomarker as it might relate to incident (new onset) OA? Dr. Felson's power analyses factor in errors in outcome measures (e.g., noise in joint-space narrowing measurements) and errors in biomarker measurement. To detect an odds ratio of approximately 1.75, Dr. Felson believes that about 150 incident cases are needed. More cases will be needed if the biomarker is measured with more error or if the odds ratio to be detected is lowered.

With respect to how many population study cases are available, Dr. Felson reported that of 11 U.S. population studies, only 5 have longitudinal radiographs obtained in the field (the original cohort of the Framingham Study, the Baltimore Longitudinal Study of Aging [BLSA], the Michigan Bone Health Study [MBHS], the Johnston County Study, and the Indiana University [IU] Study). Only three of these studies have serum or urine specimens (BLSA, MBHS, Johnston County); among these studies, the numbers of incident cases that have been published recently are BLSA: 29, MBHS: 9, Johnston County: not yet known (the study is currently in the field but this is expected to be a large number). Problems are caused by the studies' lack of consistent assessment of disease status, absence of longitudinal semiflexed films, inconsistent evaluation of pain and disability, and lack of other imaging data. The studies are also not representative of the U.S. population. There are two longitudinal European studies that would complement these studies, both of which have biomarker data from serum held for more than 10 years at -20 °C; however, it is not clear that they will produce usable data.

Because inefficiencies are caused by the absence or slow development of disease or progression in those at low risk, a population-based study of persons either at high risk of or who already have OA might be more efficient and offer a higher level of representativeness. Four clinical studies of knee OA (by Hernborg, Spector, Dieppe, and Ledingham) show some grade of progression. Although each study grades this progression differently, in general, two of the studies have higher rates of progression than those found in the Framingham Study and two do not.

Dr. Felson concluded that current population studies are not optimal for a study of prognostic biomarkers. If the results of current studies are pooled, the numbers will be driven by one large study, the one from Johnston County. A solution could be to enroll multiple current sites with followup and/or conduct a large prospective study of persons who are at high risk of or already have OA.

Within OA natural history studies, 900 subjects in 3 U.S. studies are currently enrolled and are being repeatedly followed with semiflexed films and other state-of-the-art tools. European studies are following approximately 540 subjects. Most studies are utilizing repeated standardized x-ray imaging and WOMAC assessments and have serum and urine specimen banks. In addition, other imaging data are available in some of the studies. An advantage of clinic-based natural history studies is their power to apply repeated measurements and continuously measured outcomes to the evaluation of biomarkers. Clinic-based studies also recruit persons at high risk of progression.

How many clinic-based subjects are needed to test the correlation of a change in biomarkers with a change in disease status? The scenario involves measuring both disease and biomarker at baseline and followup, testing to determine whether the biomarker changes when disease changes, and evaluating the correlation. The calculations below assume 80 percent power. If the true correlation between the biomarker change and the actual disease measurement change is 0.6 and the noise of measurement (the error as a proportion of the population standard deviation [SD]) is 0.1, and the aim is to ensure that the lower bound of the 95 percent confidence interval for the r between biomarker change and disease change is no less than 0.3, 197 subjects followed over 24 to 30 months would be needed. If the true r = 0.5, then doing a study in which the lower bound of the confidence interval for r is no less than 0.3 would require a sample of 820. If the measurement is noisier, more patients are needed. Dr. Felson recommended that a range of 800 to 1,760 patients be considered.

The existence of multiple studies with standardized assessments of knee OA makes a study of prognostic biomarkers feasible and valuable. However, such a study would be skewed to the types of patients found in clinical settings and is probably not generalizable to a larger pool of OA patients within the community, including those with earlier-stage disease. Another issue is the current and increasing belief that risk factors and possibly biomarkers may not be the same at different stages of disease; incident, early progressive, and late progressive OA may each be characterized by a different biology. If this is so, biomarker data from a clinical study would only provide an end-stage view of the processes involved.

Dr. Felson concluded that prognostic biomarkers have the most to offer the OA Initiative and recommended that information on these biomarkers be pooled from current clinical natural history studies. For population-based studies of a broader spectrum of disease, new, larger population studies of those with disease or at high risk are needed.

Perspectives on the Status of Current Cohort Studies (continued)

Michael Nevitt, Ph.D., M.P.H.
Department of Epidemiology and Biostatistics
University of California at San Francisco

Dr. Nevitt indicated that his presentation would look at some of the current longitudinal cohort data that are available for the hip in order to assess whether these studies can meet the needs of the OA Initiative. Existing studies may be most valuable for the identification of potential prognostic biomarkers, including predictors of progression and incidence, but this may also lead to potential surrogate biomarkers for use as outcomes in trials.

At least three current studies have sufficient information and numbers of cases to be potentially useful for studies of prognostic biomarkers of hip OA progression.

The ECHODIAH Study

The ECHODIAH study is a clinical study of 508 men and women with a clinical diagnosis of hip OA who were enrolled in a trial of diacerhein. The average age of this study population is 60 years. Serum and urine were collected at baseline and at years 1, 2, 3, and 5 and stored at -20 °C. Weightbearing hip x-ray films were obtained at baseline and at years 1, 2, 3, and 5. In terms of patient outcomes, function was measured by means of the Lequesne pain index, and joint-space width was assessed on each of the periodic radiographs. The occurrence of total hip replacement was also recorded.

Johnston County Study

The Johnston County Study is a population-based study in which nonweightbearing pelvis x-rays were obtained at baseline; from this assessment, approximately 675 individuals were identified as having radiographic hip OA, 50 percent of whom also had joint symptoms indicative of symptomatic OA. Serum was collected at baseline, and serum and urine are being collected in the ongoing year 5 exam; both types of specimen are stored at -80 °C. Repeat x-rays of the pelvis are being obtained in the ongoing year 5 exam. Patient outcomes were assessed through the radiographic measurement of joint-space width at baseline and at year 5, the evaluation of function by means of a health assessment questionnaire, and the measurement of pain.

Study of Osteoporotic Fractures

The Study of Osteoporotic Fractures (SOF) is a population-based cohort study of elderly white women assessed with baseline nonweightbearing pelvis radiographs, the review of which identified 675 women, with a mean age of 71, who could be classified as having x-ray OA of the hip. About 40 percent of these women could also be classified as having symptomatic OA. Serum was collected at baseline and at years 2, 5, and 8; urine was collected at year 4; and both types of specimen were stored at -190 °C. Repeat pelvis radiographs were obtained at year 8. Patient outcomes were assessed through the measurement of joint-space width; the evaluation of function by means of a health assessment questionnaire; the measurement of pain at baseline and years 2, 5, and 8; and by recording the occurrence of total hip replacement.

Dr. Nevitt pointed out that it might be possible to pool the cases from these three studies to look at the progression of symptomatic hip OA. A simple power calculation can be based on the assumption that, of the individuals found to have hip pain and x-ray evidence of disease (about 1,100 individuals), 33 percent would show progression (clinical or radiographic) in anywhere from 3 to 8 years. For a continuous baseline biomarker, a pooled progression study could detect a difference between progressors and nonprogressors equal to a standardized effect size of 0.2 (marker difference between groups/SD of the marker) with a 90 percent power (alpha=0.05). In a worst-case scenario with a dichotomized biomarker having a prevalence of only 20 percent, a pooled study could detect an odds ratio of 1.7 for progression with a 90 percent power (alpha=0.05).
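The power figures in the preceding paragraph can be roughly checked with a minimal sketch. It is not the calculation Dr. Nevitt performed, which the summary does not describe. It assumes a two-sided two-sample z-test for the continuous marker and a Wald test on the log odds ratio for the dichotomized marker, with the overall progression rate held at 33 percent. All function and parameter names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

NORM = NormalDist()


def detectable_effect_size(n_total, frac_progressors, alpha=0.05, power=0.90):
    """Smallest standardized mean difference detectable by a two-sided z-test."""
    n_prog = n_total * frac_progressors
    n_non = n_total - n_prog
    z_total = NORM.inv_cdf(1 - alpha / 2) + NORM.inv_cdf(power)
    return z_total * sqrt(1 / n_prog + 1 / n_non)


def power_for_odds_ratio(n_total, frac_progressors, prevalence, odds_ratio,
                         alpha=0.05):
    """Approximate power of a Wald test on the log odds ratio (assumed method)."""
    n_pos = n_total * prevalence          # marker-positive subjects
    n_neg = n_total - n_pos               # marker-negative subjects
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(200):                  # bisection on risk in marker-negatives
        p0 = (lo + hi) / 2
        odds1 = odds_ratio * p0 / (1 - p0)
        p1 = odds1 / (1 + odds1)
        overall = prevalence * p1 + (1 - prevalence) * p0
        if overall > frac_progressors:
            hi = p0
        else:
            lo = p0
    # Standard error of the log odds ratio from the expected 2x2 cell counts.
    se = sqrt(1 / (n_pos * p1) + 1 / (n_pos * (1 - p1))
              + 1 / (n_neg * p0) + 1 / (n_neg * (1 - p0)))
    return NORM.cdf(abs(log(odds_ratio)) / se - NORM.inv_cdf(1 - alpha / 2))


effect = detectable_effect_size(1100, 0.33)
power = power_for_odds_ratio(1100, 0.33, prevalence=0.20, odds_ratio=1.7)
print(f"detectable effect size: {effect:.2f}; power for OR 1.7: {power:.2f}")
```

Under these assumptions the sketch gives a detectable effect size of about 0.21 and a power of about 0.93 for an odds ratio of 1.7, which is in line with the figures quoted above.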

Although these are fairly good numbers, should a population-based sample of people from the community with radiographic OA be pooled with a clinical sample? One problem is that these two types of samples may have considerably different rates of progression. A review of the literature indicates that reductions in joint-space width of 0.2 to 0.5 millimeters per year are seen in clinical studies, whereas the annual change in the loss of joint-space width in the symptomatic hip OA cases of the population-based SOF was less than 0.1 millimeter. Additionally, 8 to 15 percent of the clinic-based samples underwent total hip replacement each year; in the SOF cohort, the annual rate was 2 to 4 percent. From a power perspective, it might make sense to combine clinic- and population-based studies of symptomatic hip OA progression; however, this raises concerns about validity. What is the appropriate target group: people in the community with OA, OA patients, or some combination of the two?

Dr. Nevitt indicated that it may be possible to use the two population-based studies (Johnston County and SOF) to look at prognostic biomarkers of hip OA incidence. Their combined samples would result in a potential incidence cohort of approximately 7,000 individuals who lack radiographic findings of hip OA at baseline and are eligible to develop incident OA. The main end point would probably be radiographic incident hip OA, although it may also be possible to look at symptomatic OA. Although the followup radiographs are not complete for the Johnston County Study, Dr. Nevitt estimates that the 2 cohorts combined will produce 290 to 300 incident cases of radiographic hip OA. For a continuous baseline biomarker, a pooled incidence study could also detect a difference between progressors and nonprogressors equal to a standardized effect size of 0.2 (marker difference between groups/SD of the marker) with a 90 percent power (alpha=0.05). For a dichotomized biomarker having a prevalence of only 20 percent, a pooled study could also detect an odds ratio of 1.7 for progression with a 90 percent power (alpha=0.05). Power will be somewhat less in a case-cohort study design that measures a biomarker at baseline in all incident cases and in a sample of the controls (e.g., five controls per incident case) instead of the entire cohort.
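The remark that power is somewhat lower when only a sample of controls is assayed can be illustrated with a minimal sketch. It assumes a two-sided two-sample z-test on a continuous marker and treats the sampled design as a simple comparison of cases with five controls per case, which simplifies a true case-cohort analysis. The case count of 295 is the midpoint of the 290 to 300 estimate, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n_cases, n_controls, alpha=0.05, power=0.90):
    """Smallest standardized mean difference detectable by a two-sided z-test."""
    norm = NormalDist()
    z_total = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return z_total * sqrt(1 / n_cases + 1 / n_controls)


n_cohort = 7000   # approximate size of the pooled incidence cohort
n_cases = 295     # assumed midpoint of the 290 to 300 incident cases

# Biomarker measured in the entire cohort.
full = detectable_effect_size(n_cases, n_cohort - n_cases)
# Biomarker measured in all cases and five sampled controls per case.
sampled = detectable_effect_size(n_cases, 5 * n_cases)
print(f"full cohort: {full:.2f}; five controls per case: {sampled:.2f}")
```

Under these assumptions the detectable effect size rises only slightly, from about 0.19 to about 0.21, while the number of specimens to assay falls from roughly 7,000 to fewer than 1,800.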

In summary, Dr. Nevitt noted that it is clear that the individual studies themselves could identify potential prognostic biomarkers for both progression and x-ray incidence of disease. Pooling could increase the power of these analyses, although there are as yet unanswered questions regarding the appropriateness of this approach. If the value of these existing cohorts is to be realized, support will be needed for the additional analysis of stored specimens, pooled analyses, and meta-analyses. Additional funds would also be required if x-rays were to be read again to ensure standardization of reading between studies.

Although prognostic biomarkers are candidates for surrogate biomarkers, Dr. Nevitt indicated that the potential for using existing studies to validate the relationship between surrogate biomarkers and patient outcomes is very limited. The ideal study would measure multiple surrogates at various time points during the disease process, measure patient outcome data at multiple time points during the disease process, and assess biomarkers and specimens particular to a given stage or severity of disease at appropriate time points. The existing studies do not provide this information. Although the ECHODIAH study comes close to this design, the sample size is low (n=508).

Another consideration to address is whether change in radiographic joint-space narrowing is in fact a valid surrogate end point for patient outcomes, or whether it is best viewed as simply a separate outcome. This is not known at present, and changes in joint-space width over time need to be validated in terms of their relationship to patient function. One of the strengths of the ECHODIAH Study, Johnston County Study, and SOF is their potential to look at this issue in some detail.

Additional limitations of these existing cohort studies include the following:

The patient outcomes are not standardized, although the various instruments could potentially be cross-calibrated in retrospect.

The progression rate may differ in the clinical and population studies, limiting their capacity to be pooled.

In the two population-based studies, the outcomes are measured over a 5- or 8-year period, which may be too long a timeframe in which to assess certain disease process end points. For example, an 8-year study period could make it difficult to differentiate between risk factors for OA incidence and progression.

The SOF is a large study composed entirely of white women. This would tend to cause its contribution of cases to dominate the data obtained from the other studies.

No imaging modalities other than standard x-rays were used.

The measurement of markers is infrequent and variable.

The variety and quantity of the samples are limited.

Data from randomized controlled trials, particularly the more recent ones, may also aid biomarker development efforts. Dr. Nevitt recommended that these be considered as components of the cohort studies used to generate hypotheses, and asked whether there are ongoing studies, not yet closed, in which adding measurements could yield significant information. For example, the Health ABC Study holds the potential for serial MRIs, although additional funding would be required.

Dr. Nevitt concluded that answers to the OA Initiative's key questions on prognostic and surrogate OA biomarkers will probably require a new longitudinal study or studies. A parallel effort should seek to mine existing data for information from which to generate hypotheses about biomarkers. In designing a potential new cohort study, important issues to consider include the following:

Should the study target progression alone or should it include an incidence cohort? The identification of those at high risk of developing OA and the investigation of issues related to prevention will require an incidence cohort against which to evaluate markers.

Should the study cohort consist of a clinic-based sample, a population-based sample, or a combination of the two?

What definitions should be used for OA? This will depend, in large part, on the decision regarding whether to study OA in a clinic- and/or population-based sample.

Presentation of Straw Proposal on Epidemiology/Cohort Selection

Leena Sharma, M.D.
Assistant Professor of Medicine
Division of Rheumatology
Northwestern University

Dr. Sharma reported that her presentation was developed by the members of the Epidemiology Subcommittee to respond to the question of whether, in order to address needed areas of biomarker research in OA, the OA Initiative should sponsor a single study or a composite of several studies.

Several specific hypotheses fall under the umbrella of the goals of the Initiative, and a key question is what the primary hypotheses will be. Is the principal goal of the OA Initiative to identify, test, and validate: 1) a biomarker for the initial development, presence/absence, progression, or stage of disease; 2) a prognostic marker, i.e., a predictor of ultimate outcomes; 3) a biomarker that can identify homogeneous groups that may differ in terms of rate of progression and responsiveness to treatment; or 4) a biomarker of response to therapy, specific catabolic or anabolic processes, or side effects? The study design is driven by the choice of primary hypotheses.

As in any study, validity is both an internal and external issue. Internal validity, the degree to which results are correct for the patients being studied, is determined by how well the study is carried out. External validity or generalizability is the degree to which the results of an observation hold true in other settings. A study with high internal validity may or may not be generalizable, and sampling bias occurs when conclusions based on a sample are generalized to dissimilar groups. In typical OA studies, participants are selected from a clinical setting on the basis of inclusion and exclusion criteria. Dr. Sharma noted that the results of these trials tend to be most applicable to the clinic patient group from which the subjects were recruited; they are less likely to be applicable to those from the population with symptomatic or asymptomatic OA. In addition, the emergence of new classes of disease-modifying treatment for OA will draw into the clinical trial population new sets of people with OA, e.g., persons who may not meet the usual definitions of symptomatic OA or had previously not pursued pharmacologic intervention.

Accordingly, the second key question that needs to be answered is, who is the target for emerging disease-modifying drugs: current clinical trial subjects, clinic patients, people with symptomatic OA, or all persons with OA? The cohort chosen influences the generalizability of results, and the OA Initiative needs to match the cohort choice to the ultimate target of disease-modifying OA drugs. Persons not represented in current clinical trial populations may constitute the majority of those with OA and be candidates for disease-modifying therapy. Patients not represented by current trial populations include certain ethnic groups, persons with asymptomatic OA, and those who have not sought medical attention for OA.

In this straw proposal, consideration of the following three cohorts is recommended:

Cohort A, a multicenter population-based cohort that would include both cases and noncases within a population. The people involved would be recruited from a population-based list and would not have to have a health care connection to be enrolled.
- Pros. This is the most all-inclusive cohort and would include subjects both with and without OA. It would also be possible to include people at very high risk of disease. The suggested cohort would be representative of subjects in and outside of the current trial population, maximizing generalizability. The design provides opportunities to address questions beyond the primary goals of the OA Initiative, includes disease-free controls, allows examination of alternative definitions of OA, and provides an opportunity to look at both disease incidence and progression. It also provides an opportunity to look at the natural history of OA from preradiographic stages and, if large enough, would allow for the examination of disease subsets. The greatest advantage offered by Cohort A is generalizability.
- Cons. A large number of subjects would need to be evaluated to generate a sufficient number of progressors. To ensure power, it may be necessary to perform the same evaluations in all diseased subjects. There is interest in being able to perform different sets of evaluations in different subgroups of OA, and this may be less feasible with this design. A greater cost is associated with this cohort, and the feasibility of performing long or complex x-ray and MRI protocols may be reduced. Potential ways to address these limitations include: 1) recruiting and focusing on subjects at high risk for disease, 2) focusing solely on progression and including only subjects with disease at baseline, and 3) performing a more extensive evaluation of subjects with disease at baseline. Dr. Sharma noted that the key concern with Cohort A is whether the study can be designed to ensure power.
Cohort B, a multicenter clinic/community cohort that, although not truly population based, would recruit patients with mainstream OA from clinics and by advertising in the community.
- Pros. This cohort would be representative of the usual trial subjects. (Depending on point of view, this could be considered a pro or a con.) The subjects may have a more rapid rate of progression than individuals with OA in the population. This protocol would provide a more efficient means of generating sufficient numbers of progressors at lower cost, and there would be opportunities to look at disease subsets. Power is less of a worry than in Cohort A, and unique evaluations could be performed in particular groups. Dr. Sharma pointed out that the greatest advantage offered by Cohort B is power.
- Cons. At best, this approach represents one subset and probably the minority of human OA cases: those who tend to see physicians and volunteer for studies. There are no built-in controls, and the design would provide no opportunity to examine incident disease. The focus of this approach would be narrower, with reduced potential for future applications, and there would be no opportunity to examine preradiographic OA. One way to address these disadvantages would be to recruit matched controls and follow them forward in time. A Cohort B study could be improved by being coupled with a Cohort C study. The key concern with the Cohort B design is the lack of generalizability.
Cohort C, a select high-risk cohort made up of patients who are almost certain to develop OA because of a specific insult, genetic predisposition, or other factor such as meniscectomy.
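
The power trade-off between Cohort A and Cohort B described above can be illustrated with a rough planning calculation. The following Python sketch estimates how many subjects would need to be enrolled to yield a target number of progressors. The function name, the baseline prevalence, and the progression rates are illustrative assumptions only; none of these figures were presented at the meeting.

```python
import math


def subjects_needed(target_progressors, oa_prevalence, progression_rate):
    """Smallest cohort size whose expected number of progressors meets the target.

    All inputs are hypothetical planning values, not figures from the meeting.
    oa_prevalence:    fraction of enrolled subjects with OA at baseline
    progression_rate: fraction of subjects with OA who progress during follow-up
    """
    if not (0 < oa_prevalence <= 1 and 0 < progression_rate <= 1):
        raise ValueError("rates must be in the interval (0, 1]")
    expected_per_subject = oa_prevalence * progression_rate
    return math.ceil(target_progressors / expected_per_subject)


# Assumed population-based cohort (Cohort A style): 20% have OA at baseline,
# and 25% of those progress during follow-up.
cohort_a = subjects_needed(200, oa_prevalence=0.20, progression_rate=0.25)

# Assumed clinic/community cohort (Cohort B style): all enrolled subjects
# have OA, and 40% progress during follow-up.
cohort_b = subjects_needed(200, oa_prevalence=1.0, progression_rate=0.40)

print(cohort_a, cohort_b)
```

Under these assumed rates, the population-based design requires about 4,000 subjects and the clinic-based design about 500 to yield 200 expected progressors, which is the sense in which Cohort B is the more efficient source of progressors and Cohort A the more costly one.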

In selecting the cohort, consideration should be given to the fact that the scale of a larger study is not suitable for investigating:

Issues that require a long or complex evaluation.
Questions that require evaluations that are not at a stage of development suitable for inclusion in a large initiative.
Subgroups that may not be present, or present in sufficient numbers, in a larger study.

These components could be addressed through a subset of the larger prospective study or through cohorts established in currently existing studies.

The OA Initiative could use the cohorts of currently existing studies as future centers for the multicenter initiative. Although no current study was designed to achieve the goals of the Initiative, many existing studies offer a solid infrastructure that could be applied to a prospective study. The cohorts of currently existing studies could also be used as a means for examining questions not appropriate for inclusion in a larger study.

Dr. Sharma linked the OA Initiative goals to the above-mentioned cohorts on the basis of the recommendations of the members of the Epidemiology Subcommittee:

Potential OA Initiative Goal	Optimal Cohort
Biomarkers for the initial development of disease	A or C*
— for disease presence vs. absence	A or B (with controls)
Biomarkers for disease progression	A or B
— from preradiographic stage	A or C*
Biomarkers for disease stage	A or B
— including preradiographic	A or C*
Prognostic markers, i.e., predictors of ultimate outcomes	A or B
Biomarkers to identify homogeneous groups that may differ in progression rate and responsiveness to treatment	A or B
— including preradiographic	A or C*
Biomarkers of specific catabolic or anabolic processes	A or B
— including preradiographic	A or C*

*For these options, there is concern regarding the applicability of the results to mainstream OA and the unpredictability of the time it takes to develop OA.

The potential goals of identifying biomarkers to quantify response to therapy or side effects of therapy are not included in the above list because of the lack of a treatment structure in the Initiative.

With respect to what definition of OA to use, Dr. Sharma indicated that none of the usual approaches (e.g., the presence of osteophytes) can identify the entire complement of individuals with OA. Consideration also needs to be given to the influence psychological factors can have on symptom and function self-reports. Accordingly, a useful byproduct of the OA Initiative could be a better definition of OA.

In conclusion, Dr. Sharma reported that the Epidemiology/Cohort Selection Subcommittee recommends the pursuit of answers to the two key questions that drive study design:

What are the primary hypotheses of the Initiative?
What groups are the targets for emerging disease-modifying OA drugs?

The answers to these questions will facilitate the selection of an optimal study design. At this point, it appears most likely that the design would involve a composite of studies. After the hypothesis is defined, one approach could be to perform studies involving Cohort B and Cohort C, generate data, refine the hypothesis, and proceed to Cohort A, with studies not appropriate for large-scale assessment conducted on a parallel track.

Administration

Introduction and Background

Gregory Downing, D.O., Ph.D.
Health Science Policy Advisor
Office of Science Policy
Office of the Director
National Institutes of Health

Dr. Downing noted that the public-private partnership model being established by the OA Initiative would create a template for the collaborative development of research resources that could be emulated by efforts targeting other disease areas. In the first half of this century, a great deal of scientific discovery was catalyzed by the collaborative efforts of the defense industry and the Federal Government. Since 1986, in the biomedical research arena, the Government has used a number of mechanisms to conduct collaborative research with nongovernment organizations such as industry. The OA Initiative represents a new iteration of this process.

Key administrative questions that need to be addressed in conjunction with the OA Initiative include the following:

How will the consortium be formed?
How will the project be managed?
What interactions will take place between the NIH, private sponsors, and project scientists?
How will access to repository samples be handled?
What is the true value of the OA Initiative?
How will intellectual property (IP) issues be handled?

To date, the management group's discussions of the Initiative's administrative aspects have considered the OA Initiative a joint international enterprise involving a variety of private- and public-sector organizations. The bridge between these sectors is the OA Initiative Public-Private Consortium. This consortium, which represents the groups underwriting the financial aspects of the Initiative, currently consists of several components of the NIH, other Federal agencies, and private sponsors. In the future, additional agencies and organizations may have liaison or ad-hoc relationships with the consortium.

The model recommended for the OA Initiative involves the NIH Foundation, a tax-exempt, nonprofit organization that has worked with numerous private sponsors to support training and other health-related research activities. OA Initiative funds would flow through the NIH Foundation to NIH institutes and centers, which, by means of the longstanding peer review system, would use these and their own resources to support the activities of a data coordinating center/repository and a series of contractors responsible for participant accrual and assessment.

Dr. Downing pointed out that a unique aspect of the OA Initiative is that the information and resources developed will be accessible to the public. The data and specimens generated by the Initiative would flow from the contractors to the data coordinating center/repository, through the NIH institutes and centers, and to the sponsors and the public through a variety of databases and mechanisms designed to appropriately control access.

Two important issues need to be worked out:

Who will have access to the finite specimens developed through the Initiative? Because in some cases there will not be enough specimens to satisfy the needs of everyone interested in using them, a system of prioritization needs to be developed.
How should the OA Initiative handle IP concerns? Resources brought in for validation may have IP attached to them, which raises a number of complex considerations. The Government is required by Bayh-Dole to give its contractors the rights to any new IP they develop using Federal funding. Although the primary OA Initiative projects (patient recruitment, core clinical and imaging data, public domain marker data, and markers that already have a filed IP) would be bound by the terms of Bayh-Dole, key independent projects, such as certain marker studies with complex IP issues, could be developed outside of the consortium. Although these independent projects would be able to apply for core resources, they would take place outside of the scope of the OA Initiative, which would allow private sponsors and collaborators to pursue licensing arrangements with the organizations involved.

Since the OA Initiative Steering Group was assembled in June 1999, the management group has interacted with the chairs of the various subcommittees to assess the state of the art in their respective fields, define key research questions, identify a broad administrative structure, and develop an overall research plan. Over the next few months, a plan for the OA Initiative Public-Private Consortium will be developed, and potential sponsors will be asked to decide whether they plan to participate. Scientific input will be coalesced, requests for proposals (RFPs) will be developed, and draft RFPs will be posted on the NIH Web site for public input from all corners of the scientific community. NIH Project Officers will be appointed to assist with the preparation and administration of the RFPs, which it is hoped will be released early in the summer, reviewed and scored in the fall, and awarded early in 2001.

Presentation of Straw Proposal on Administrative Model/Structure

Stephen A. Stimpson, Ph.D.
Exploratory Discovery Head
Musculoskeletal Diseases
Glaxo Wellcome, Inc.

Dr. Stimpson indicated that his presentation picks up at the point at which the awards for the OA Initiative have been made and project management begins in earnest. Following the appointment of a dedicated Project Officer, priority should be given to the development of the structure for the day-to-day operations of the Initiative. Key management and organizational elements include:

The OA Initiative Public-Private Consortium.
The OA Initiative Steering Committee, to be composed of the contractors and NIH staff representing the consortium (although other consortium members could attend meetings and provide input).
Additional committee structures charged with addressing important issues, such as resource allocation, publications and presentations, cohort recruitment, the distribution and use of samples, and IP.
The OA Initiative Coordinating Center, the role of which could include the development and updating of the protocol and other study materials; training clinical site staff; collecting and storing clinical data, imaging data, and biological specimens; analyzing and reporting data; establishing and maintaining central laboratories as required; and implementing quality assurance and quality control procedures.

Dr. Stimpson pointed out that the private sponsors of the OA Initiative would benefit from:

The creation of a valuable research resource that will facilitate future research and development activities targeting the diagnosis and treatment of OA and the utilization of new technologies.
The opportunity to participate in a unique collaboration involving OA experts from industry, academia, national organizations, the NIH, and the FDA.
The fact that funds provided to the NIH Foundation may be tax deductible.
Having the opportunity to provide input that could influence the direction of the research conducted through the Initiative.
Having potentially early access to OA Initiative data, radiographic images, and biological samples, information that could facilitate sponsors' plans for more efficient clinical programs and FDA presentations.
Having the potential opportunity to provide input into decisions on how the limited biological samples established by the Initiative will be used for future research.
Having the opportunity to gain access to core resources to pursue certain ancillary projects (such as marker studies with complex IP issues) with cross-licensing opportunities with other private sponsors.

Professional research and voluntary health organizations sponsoring the Initiative would gain from:

Being able to advance knowledge about OA disease pathogenesis and burden-of-illness measures.
Being better able to identify scientific opportunities for future research and strengthen the platforms on which clinical research and training are conducted.
Having potentially early access to OA Initiative data, radiographic images, and biological samples.
Having the potential opportunity to provide input into decisions on how the limited biological samples established by the Initiative will be used for future research.

Universities and research organizations participating as contractors would benefit from:

The receipt of funding for OA research.
Being able to develop new research cohorts.
The establishment of a multicenter network for OA studies.
Having access to data and radiographic studies and, through a defined evaluative process, the biological materials collected.
Being provided platforms on which to develop assays and measurement technologies.

Nonsponsoring regulatory agencies and health quality and reimbursement organizations would gain from:

Having access to information that facilitates science-based regulatory decision-making, including the assessment of biomarkers as surrogate end points in clinical trials, standardization of reporting, and the development of regulatory guidelines.
The development of an improved scientific base on which to evaluate outcomes.
Having access to information that facilitates health resources utilization planning.

In conclusion, Dr. Stimpson noted how impressed he has been with the collegial spirit he has observed in the broad spectrum of players involved in planning the OA Initiative.

Discussion

In response to a question about public funding of the Initiative, Dr. Katz noted that the NIH would be an equal sponsor in every respect. Because NIH cannot provide money to the NIH Foundation, Federal funds will be provided directly to the NIH components involved.

A participant pointed out that once the size of the trial has been determined, it will be important to decide what major question is to be answered and, therefore, how the trial will be powered. Will the trial be powered for clinical outcome, imaging outcome, or biochemical outcome? Vastly different implications are associated with each of these choices. Dr. Stimpson stated that a reasonable place to begin would be to consider aspects of proof of concept (proof that a hypothesis is worthwhile) in the early, Phase II components of the trial before moving on to more expensive late-stage trials.

The attendee also stressed the importance of ensuring that all OA Initiative clinical trial sites adhere to good laboratory practices (GLP), and Dr. Stimpson indicated that appropriate points of this nature could be included in the RFP description.

An attendee noted that important unresolved issues are associated with the question of IP and how it will impact panels of markers, markers coming together because of preexisting IP, or perhaps genetic markers that may also provide new disease linkages. For some of the sponsors, it will be important to know whether they will be in a position to help develop the markers toward commercial entities. Will the validation trials be conducted in a way that allows diagnostic markers to be licensed or registered? Dr. Stimpson stated that these questions reflect the need to keep the structure of the OA Initiative flexible enough to incorporate input that addresses these types of issues as the Initiative moves forward.

A participant pointed out the value of having the private sponsors of the OA Initiative incorporate marker studies in controlled clinical trials of agents they are currently testing or expect to test in the near future. This would add the double power of being able to relate markers to disease progression and to disease responsiveness to treatment, and could make very large numbers of samples available to the Initiative. Consideration should be given to how this could be facilitated without threatening companies' internal interests in terms of the agents they are testing. For example, companies might be able to immediately release access to samples from placebo groups and wait until their drugs have been registered before releasing test group samples. Pharmaceutical companies are much better at organizing clinical studies than academic institutions, and the OA Initiative should seek to utilize their expertise.

An attendee indicated that before making a commitment to sponsor the OA Initiative, his company would need to review a detailed scientific proposal and organizational structure. It would be helpful if a timeline were laid out that contained a description of the activities that need to be accomplished to reach a stage of quantitative, not just qualitative, commitment. Another participant agreed that a concrete commitment from any organization would require clarification regarding the study design, what all of the parties will get out of the study, what the cost will be, and what the timelines are. Dr. McGowan responded that the draft RFP will contain the details of what the Initiative is soliciting. Accordingly, between now and May 2000, the goal is to incorporate input from this meeting into a plan that can be presented to the management of the respective organizations. The draft RFP will solicit public comment via posting on the NIH Web site, the responses to which will be used to fine-tune its contents.

A participant noted that it was her understanding that OA Initiative funding from sponsoring private-sector organizations would not necessarily be derived from a given organization's clinical budget. Rather, as a potentially tax-deductible contribution, it could come from elsewhere in the company. Dr. Downing observed that a number of company leaders had previously indicated that this mechanism offers certain strategies that do not compromise current product development lines, e.g., by treating the funding as a charitable contribution.

In response to an attendee's query about whether sponsors would contribute equally, Dr. Downing noted that decisions about threshold financial commitments, payment levels, and utilization of resources had not been made and were still open for debate. Such decisions should be made collaboratively by the members of the consortium.

A participant pointed out the importance of an interface with experts doing OA-related basic research outside the Initiative. Consider the existing, retrospective databases of radiographs that are now deemed to be of little use in a new prospective study. As Dr. Poole indicated earlier, the specialists developing and extending these databases do not have the resources to test them appropriately. It is important that the members of the consortium continue to assess what is happening in the field and make every effort to keep people engaged in related research "in the loop." The strength of the consortium should be used to support the development and evaluation of new techniques and, as appropriate, promptly incorporate them into the protocol. The way the study design is currently laid out, the relevant emphasis appears to be on tiers 1 and 2, to the detriment of the important work that could come out of tier 3. In the long run, keeping the work of these other researchers right up front may help the Initiative avoid important problems and save considerable money. Perhaps a ratio could be established that would determine how the funding should be split amongst the three tiers. With respect to the difference between studies targeting optimal methods for following disease course and therapy and studies associated with the discovery of mechanisms of disease, it was noted that although the Initiative could prove very useful to studies into mechanisms, the primary purpose of the OA Initiative is to enable more efficient clinical trials through the development and validation of new markers. These will also provide new tools for basic science research.

Dr. Peterfy noted that the proposed three-tiered structure is intended to both capture the most reliable techniques known today and provide a mechanism that incorporates new knowledge as it develops. Unfortunately, the three tiers come across as being ranked, although that was not the intent. Another attendee pointed out that this perception is influenced by the fact that dollar amounts could be roughly computed for tiers 1 and 2, whereas tier 3 is so exploratory and contains so many unknown dimensions that it is difficult to attach a dollar amount to it. She pointed out that the three tiers appear to be staged not in terms of time or priority but in terms of sample size.

Dr. Stevens indicated that a primary issue for tier 1 is that it has to be robust enough for each of the sites to be able to do the methodology. Tier 2 may have a subgroup of clinical sites. Although statistical power is needed, the sites involved in these studies will have the expertise required to carry out more difficult or demanding requirements. Tier 3 studies are expected to take place in sites that are excellent in a very specific area of interest and have the research methodology and equipment required to perform the work involved.

An attendee pointed out that the Initiative represents a blend of study and repository resources that is unique in that it will probably have genetic material as well as other biological specimens and images. In some organizations, this will raise issues with institutional review control boards, particularly when one has to go back and re-consent individuals for the materials provided and address the issues associated with anonymous versus nonanonymous approvals. The members of the consortium need to think about what techniques the Initiative seeks to achieve and how to ensure it obtains the information needed in a way that will avoid future problems with confidentiality, accessibility, and so forth.

A participant commented that the consortium can make a reasonable attempt to deal with current biomarkers that show promise without necessarily engaging the whole program of research proposed. The work could be accomplished in an incremental, cost-effective fashion designed to build confidence in systems. Organizational backup and support systems could also be developed through an incremental confidence-building approach that gradually tackles the issues mentioned by the previous speaker, issues that it would be disastrous to try to address now. This participant noted that the Medical Research Council, which is the equivalent of the NIH for the United Kingdom, has just called for proposals for genetic storage for disease process assessment. However, international boundaries pose a greater ethical dilemma, and at this stage it would not appear that a consortium like the OA Initiative should seek to store genetic material. Dr. McGowan pointed out that many of the people involved in the consortium have been involved in studies that store and use genetic material and know how to write a proper consent form and ensure that all appropriate checks and balances have been addressed. Another attendee noted that the genetic materials issue does have an impact on the Initiative's ability to use data from existing clinical trials and ongoing clinical trials sponsored by the private sector.

Adjournment

The first day of the meeting was adjourned. Open forums were held during the evening to obtain feedback and invite discussion on the proposals on biochemical markers, imaging, epidemiology, and administration.

February 29, 2000

Clinical End Points

Virginia Kraus, M.D., Assistant Professor, Division of Rheumatology, Department of Medicine, Duke University, and session moderator, welcomed the participants to the session and introduced Drs. Freund and Bellamy.

Perspectives on Clinical End Points for Osteoarthritis

Deborah Freund, Ph.D., M.P.H.
Vice Chancellor and Provost
Office of Academic Affairs
Professor of Public Administration
Syracuse University
(Representing the Arthritis Foundation)

Dr. Freund pointed out that although she was nominated to participate in the meeting by the Arthritis Foundation, the content of her presentation is based on her experience as principal investigator of the patient outcome research team (PORT) on knee arthritis and knee replacement at Indiana University (see Heck, D.A., R. Robinson, C.M. Partridge, R. Lubitz, and D.A. Freund, "Patient Outcomes after Knee Replacement", Clinical Orthopaedics and Related Research, Vol 356, pp. 1-18, 1998).

Her presentation's major take-home messages are that:

There are relatively inexpensive clinical outcome measures that can be used to obtain outcome data for whatever type of study is utilized by the OA Initiative, e.g., a single cohort study, many cohort studies, or a clinical trial with cohort studies.
There are methods that can be used to retrospectively attach outcome data to currently ongoing open cohort studies, providing the opportunity to answer, at least in a crude way, outstanding questions on the prognostic value of physiologic and other types of biomarkers and outcomes, particularly in clinical settings.

The goal of the PORT initiative was to determine the role and effectiveness of knee replacement in the treatment of knee OA, what the clinical indications for knee replacement were, and patient outcomes. The data was obtained primarily from clinical settings and the number of cases at any one time ranged from 40,000 to 350,000 individuals.

The population selected was all beneficiaries of Medicare, a national insurance program for persons in the United States who are 65 years of age or older. The team targeted all Medicare reimbursed knee replacements from 1985 to 1990 and endeavored to identify a comparison or control group of nonoperated patients and claims. Dr. Freund noted that this is difficult to do but can be facilitated by treating claims data obtained from billable or nonbillable services as patient encounters. Inclusion and exclusion criteria can be identified for claims data in the same way they would be in a clinical trial, enabling the population of interest to be narrowed down, albeit crudely, on the basis of clinical parameters. Claims were used as a sampling frame and permission was requested to access medical records; the positive response rate to the request for medical records was 80 percent.

General health status information was collected with the SF-36 and clinical function, pain, and other data was collected with the WOMAC. For the OA Initiative, the instrument selected should be based on the ultimate goals of the studies and the hypotheses to be tested; however, it is generally preferable to use both. The study design should also consider the inclusion of short form 8, a mechanism for tracking function over time that can be completed in approximately 30 seconds.

Dr. Freund reported that the methodology utilized by this PORT initiative can be used to:

- Follow a cohort of patients by identifying how much they are being treated, for what, and what the burden of disease is. For a clinical trial, patient records need to be attached.
- Use clinical outcome surveys to relate the biomarker information collected for a sample of people in an existing cohort to selected health outcomes.
- Obtain face validity by estimating the incidence of a given procedure and comparing it to the incidence of another. For example, the PORT study compared the incidence of osteotomy to incidence data on knee replacements within a population of claims.
- Estimate at a national level the incidence of a variety of outcomes of interest, e.g., gastrointestinal bleeds.
- Track general health and mental health.
- Obtain mortality data.
- Support adjunct studies.

On the basis of the PORT's research findings, Dr. Freund indicated that it is critical that the OA Initiative adequately incorporate the outcome measures of pain and function into its study design. The data for these measures can be obtained relatively easily and attached to virtually any type of study, allowing physiologic or other types of biomarkers to be correlated with final outcomes and other intermediate outcomes such as joint-space narrowing, measurements derived from MRI or PET scans, and so forth. Once a high correlation is identified, the Initiative will have a biomarker to compare with the most important factor, the prevention of disability.

Discussion

A participant noted that his institution had conducted a small, but similar, study of the hip that found that, while many persons initially said that their primary reason for seeking the intervention was pain, post-intervention they indicated that their primary reason was to engage in a particular activity such as playing golf. Dr. Freund agreed that people's expectations can be a moving target and change over time. However, patient expectations are a very important factor that should be assessed at baseline and over the period of a study, which allows changes in expectations to be assessed, anticipated, and adjusted for.

In response to a query on whether the study data had been compared to an age-matched non-OA population, Dr. Freund noted that no such comparison had been made because the study population consisted of people with OA who had progressed to a certain point.

An attendee asked what percentage of patients had normal joint-space width at the time of the intervention, and Dr. Freund responded that it was 20 to 30 percent of the patients. The attendee also asked if the microscopic appearance of the cartilage in the patients had been assessed at intervention. Dr. Freund indicated that very detailed intraoperative records had been obtained for each patient but the data collected has yet to be fully analyzed. Detailed information obtained from patient charts is not being analyzed because it was found to be almost worthless. For example, information on the type of prosthesis used, which is legally required, was absent from 50 percent of the charts.

A participant noted that because the films obtained in the PORT study were standing extended knee films, the assessment of joint-space width cannot be correlated with the cartilage the patients were actually living on. Dr. Freund pointed out that the importance of flexed knee films was not known when the study was planned, and it has yet to be determined whether joint-space width as measured by flexed knee radiographs is a good biomarker. The most important thing is to ensure that there is a strong correlation between the biomarker selected and the final outcome of interest.

Perspectives on Clinical End Points for Osteoarthritis (continued)

Nicholas Bellamy, M.D., M.Sc., F.A.C.P., F.R.C.P. (Glas, Edin), F.R.C.P.(C), F.A.F.R.M., F.R.A.C.P.
Director
Centre of National Research on Disability and Rehabilitation Medicine
Faculty of Health Sciences
The University of Queensland

Dr. Bellamy indicated that his presentation would focus on the utilization of the WOMAC and other instruments as techniques for measuring clinical end points within the OA Initiative.

Over the past 15 years, a number of OA measurement guidelines have been developed:

Guideline	Year
EULAR	1985
FDA	1988
SYSADOA	1993
OMERACT III	1996
OARSI	1996
FDA (draft)	1999/2000
RCI (Response Criteria Initiative)	2000

At the OMERACT III meeting, consensus was reached on a core set of clinical measures for OA clinical trials that consisted of pain, physical function, and patient global assessment, with imaging as an additional measure for studies lasting 1 year or longer. The meeting participants also strongly recommended that future studies include generic measures of health-related quality of life (HRQOL). This consensus was subsequently ratified by the Osteoarthritis Research Society.

Most recently, a task force of the Osteoarthritis Research Society International, in collaboration with representatives from industry, academia, and government, has developed an algorithm for adjudicating response versus nonresponse based on the core clinical variables of pain, physical function, and patient global assessment. The criteria are based on both percentage change and absolute change and appear to be class specific. The results of this initiative have been submitted to Osteoarthritis and Cartilage.

Efforts to measure clinical end points are complicated by different patterns of joint involvement; weak associations between nonclinical and clinical indicators; diverse biochemical events occurring in different joints that result in detectable markers in various body fluids; issues with the feasibility of multijoint imaging; the availability of multiple clinical indices, only some of which are joint specific; the poor predictability of structural progression; the occurrence of new joint involvement alongside progression of structural damage in already involved joints; and the effects of comorbidities and cotherapies.

Dr. Bellamy noted that measurement procedures could be categorized as follows:

Aggregated joint-specific multidimensional indices that use aggregation techniques to compress the results of a number of questions on different dimensions into a single score (e.g., the clinical severity indices). The method of weighting in orthopaedic indices can have a profound impact on the interpretation of the score. Dr. Bellamy favors explicit weighting techniques such as the PARIS Sectogram over implicit techniques based on index composition.
Segregated joint-specific multidimensional indices (e.g., WOMAC, AUSCAN, HAQ, AIMS, AIMS 2). Dr. Bellamy expressed a personal preference for this approach given the complexity of OA and the multidimensional nature of the symptomatology.
Generic HRQOL indices and utility-based measures (e.g., SF-36, EUROQOL, NHP, Health Utilities Index). These measures provide insight into the more general health implications of monoarticular and multiarticular forms of OA.

Conceptually, Dr. Bellamy sees OA as a multidimensional problem that requires a combination of joint-specific and generic HRQOL measurement approaches. Accordingly, joint-specific symptoms could be assessed with the WOMAC Index (hip and knee) or the AUSCAN Index (hand). To look at the overall impact of arthritis, measures such as HAQ, AIMS, or AIMS 2 could be included. For HRQOL assessment, measures such as the SF-36 and/or Health Utilities Index could be used. Before the appropriate battery of measures to be utilized can be identified, agreement regarding domains, dimensions, and instruments is needed. Consideration should be given to the nature of the concept being explored and the sample sizes required to detect change or associations for each given measure.

Dr. Bellamy gave an overview of the status of the WOMAC Index, which was first developed as his M.Sc. thesis in 1982 and has since undergone several revisions. The early versions (WOMAC and WOMAC 3.0) targeted hips and/or knees; WOMAC 3.1, the most recent version, targets individual joints. Comparative studies of some of the unidimensional measures (e.g., ROM, ICD, IMS) have shown the WOMAC to be more sensitive; studies comparing WOMAC with multidimensional, disease-specific measures (HAQ, FSI, and indices of clinical severity) have shown overall comparability in statistical efficiency; and three different studies comparing the WOMAC with the generic measure SF-36 have found that: 1) the two indices measure two distinct and important aspects of health, 2) the WOMAC is more sensitive in OA than SF-36, and 3) the WOMAC is more efficient than SF-36. Dr. Bellamy noted that, in general, disease-specific measures require lower sample sizes and are more restrictive in their dimensionality, whereas generic measures are less restrictive but require higher sample sizes. He noted that a combination of both types of measures provides considerable advantage in dissecting the dimensionality of the disease and its response to treatment.

In recent years, Dr. Bellamy has become concerned with index content modification occurring ad hoc during person-to-person transmission between private users. This has resulted in wide variations in the number of questions and the response scales employed. He has recommended that users of instruments contact the originators directly before using their instruments in clinical studies and clinical practice in order to obtain authentic versions of the instruments. To help inform users about the WOMAC Index and the many language forms available (currently 55 translations for WOMAC 3.1), a Web site (www.womac.org) is currently under development. In addition, WOMAC User Guide IV, containing more than 250 references (most of which relate to WOMAC-based studies), has recently been completed. Other current initiatives include the development of parallel versions in VA and Likert for all WOMAC translations, the use of touch-screen technology and telephone administration of the WOMAC, the creation of a short form of the WOMAC, and the definition of minimum perceptible clinical improvements (MPCIs) and minimum clinically important differences (MCIDs) for the WOMAC Index. Dr. Bellamy also reported that self-administered and telephone-administered forms of the WOMAC agree within about 2.6 percent.

The WOMAC continues to be a popular instrument for health status measurement in knee and hip OA. Additional recently developed affiliated instruments include the WOMBAT 3.0 Index (a modification of the WOMAC containing a patient global question), the AUSCAN 3.0 Index (a self-administered OA hand index measuring pain, stiffness, and function and available in Likert format), OGI 8.0 (a battery of eight patient global assessment questions), and a developmental index termed O2MS 10.0 (a new attempt to deal with changing patterns of involvement over time, including the assessment of nontarget joint involvement and comorbidities in OA patients). The O2MS incorporates elements of the WOMAC, WOMBAT, OGI, and AUSCAN indices.

In conclusion, Dr. Bellamy recommended that the OA Initiative utilize measures that are valid, reliable, responsive, and relatively simple, brief, and easy to score. Flexible multimedia administration and the possibility of capturing information by telephone should be pursued. Depending on whether the OA Initiative is based in the United States or internationally, a number of alternative language forms will be needed. Dr. Bellamy suggested that thought be given to target-joint selection for imaging and synovial fluid analysis and instrument selection.

Dr. Bellamy's recommendations regarding instruments to include in an OA study were separated into three categories: joint-specific measures, general OA measures, and individual health-related quality-of-life measures. Specifically, he recommended that the WOMAC Index (knee and hip) and the AUSCAN Index (hand) be considered for joint-specific disease, that the HAQ or AIMS 2 be utilized to measure general OA, and that the SF-36 or HUI be used to determine patients' health-related quality of life or utility. For patient global assessments, the OA Initiative might consider using the OGI 8.0. Measures of comorbidity will likely also be necessary.

Discussion (Panel Members and Meeting Participants)

Dr. Kraus inquired whether actual objective physical function and testing were in any way additive relative to verbal questionnaires directed at function. Dr. Bellamy responded that the measurement of performance can be quite important but tends to occur in a somewhat artificial environment. There are some issues of interaction between the assessor and the patient, and depending on how the measure is taken, observer reliability demands consideration. Accordingly, he prefers, and believes the OA Initiative would be better served by, measures that seek to determine how patients feel about their ability to perform a task rather than tests that specifically measure how fast a given task is completed.

A meeting participant asked about the tool that Dr. Bellamy is working on to evaluate the symptoms coming from nontarget joints. It is likely that the goal of future studies will be to evaluate biological markers that can indicate the total amount of OA in the body of a patient. For example, a study has investigated the correlation between hyaluronic acid and the total amount of OA in a patient (as measured by the Lansbury Index). What does Dr. Bellamy think about the possibility of looking at the presence and amount of OA rather than the symptoms of OA? The participant felt that the evaluation of a systemic biological marker should incorporate a clinical tool that permits the measurement of the amount of OA, and it is unlikely that it will be possible to obtain an x-ray of all the joints at baseline. Dr. Bellamy pointed out that some aspects of this measurement are contained within the O2MS, but it is at a very early conceptual stage, and the O2MS has yet to be validated. The Lansbury approach may, in fact, be more relevant to OA than RA because there may be a higher correlation between the Lansbury weights and the surface area of the cartilage than between the Lansbury weights and the surface area of the synovial membrane. Although this approach could be used to count the number of involved joints, Dr. Bellamy does not know whether the capacity to measure the totality of OA at a biological level is adequate. The use of MRI should give considerable insight into structural change but still may not detect the earliest superficial changes in cartilage.

Asked what the minimum clinically meaningful difference would be compared with placebo in a controlled study, Dr. Bellamy indicated that 10 millimeters on a 0-100 scale has been identified as the MPCI. He pointed out that the MPCI is not necessarily the same value as the minimum clinically important difference (MCID). The MCID, which should be determined on the basis of the trial setting, the intervention, and other considerations, may need to be discussed between regulatory authorities and manufacturers in the case of interventional studies, with input from instrument originators as needed. Dr. Bellamy also noted the recent development of response criteria by the OARSI and that MCIDs in OA are on the agenda for the OMERACT V conference in May 2000. A Delphi exercise based on WOMAC data will be used to explore the MCID issue in hip and knee OA.

A participant noted that since the change from baseline is really a function of baseline, it seems that the higher the baseline, the greater the decrease that can be achieved. Accordingly, how would the baseline issue be addressed and OA be defined on the basis of the WOMAC score? Dr. Bellamy indicated that based on the pain literature, there is a correlation between initial pain rating and subsequent pain relief. This is probably also true for function, although he has not looked at that specifically. When setting threshold criteria, OARSI guidelines give some guidance about some extreme values, and OARSI response criteria can provide a benchmark for minimum pain, minimum function, or minimum patient global assessment values at entry, likely on the order of 30 to 40 normalized units. Dr. Freund pointed out that the need to set threshold values depends on the primary hypothesis being tested. The PORT group found that the patients whose baseline pain and function were the worst responded the most. However, although the disease-specific and general health status measures used resulted in qualitatively similar information, variations in response were generated by the differences in what they measure.
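The normalized-unit thresholds mentioned in this discussion can be illustrated with a minimal sketch. It assumes the 5-point Likert form of the WOMAC, in which items are scored 0 to 4 and summed per subscale; the subscale maxima, the function names, and the choice of 30 as an entry threshold are illustrative assumptions, not values specified at the meeting. Only the 10-unit MPCI on a 0-100 scale and the 30 to 40 unit entry range come from the discussion.

```python
# Hypothetical sketch: subscale maxima assume the 5-point Likert WOMAC
# (items scored 0-4); they are not taken from the meeting summary.
SUBSCALE_MAX = {"pain": 20, "stiffness": 8, "function": 68}

MPCI_UNITS = 10.0        # from the discussion: 10 mm on a 0-100 scale
ENTRY_THRESHOLD = 30.0   # illustrative lower bound of the 30-40 unit range


def normalize(raw_score, subscale):
    """Rescale a raw subscale sum to 0-100 normalized units."""
    return 100.0 * raw_score / SUBSCALE_MAX[subscale]


def meets_entry_threshold(raw_score, subscale, threshold=ENTRY_THRESHOLD):
    """True if the normalized baseline score reaches the entry threshold."""
    return normalize(raw_score, subscale) >= threshold


def perceptible_improvement(raw_baseline, raw_followup, subscale):
    """True if the drop in normalized score is at least the MPCI."""
    change = normalize(raw_baseline, subscale) - normalize(raw_followup, subscale)
    return change >= MPCI_UNITS
```

Under these assumptions, a raw pain score of 12 falling to 9 corresponds to a change of 15 normalized units, which would exceed the MPCI; whether it would also reach an MCID depends on the trial setting, as Dr. Bellamy noted.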

Responding to a query about recall bias and changes in patients' perceptions of the importance of pain versus function, Dr. Freund noted that after the passage of time, the pure preference or utility measures used in her study detected a small amount of recall bias compared with baseline measurements. Although patients could remember whether they got significantly better in a qualitative way, they tended to underestimate their original pain. Dr. Bellamy indicated that he had not looked at the recall factor beyond a 2-week window but that the OA Initiative should be conducted prospectively using instruments whose timeframes are clearly defined. Dr. Freund agreed, noting that her point was intended to demonstrate the ability to reanalyze or retrospectively add data collection to an open-cohort study.

Summary of Open Forum on Biochemical Markers

Thasia Woodworth, M.D.
Senior Associate Director
Pfizer Central Research
Pfizer, Inc.

Dr. Woodworth reported that the previous evening's open forum on biochemical/molecular markers generated the following recommendations:

Detailed criteria need to be identified for prospective study assay characterization, selection, and review, including the identification of collection and storage requirements, technical validation techniques, mechanisms for reviewing preliminary clinical data, and methods for reviewing each candidate assay before accessing the archives.
Because markers may perform differently in different populations, the OA Initiative should ensure that study cohorts include persons with well-characterized disease status and arthritis-negative controls. Well-characterized, age-matched, control populations are required to ensure that changes in a marker are reflective of disease status, not other factors such as aging.
Currently available biospecimen archives do not appear to meet the needs of the OA Initiative. They are retrospective, and the members of the Biochemical/Molecular Markers Subcommittee are not convinced that the specimens involved have been collected or stored in a way that ensures that they can be used to reliably assess particular markers. In addition, the frequency with which the specimens were collected relative to available progression data and data characterizing disease status does not appear to satisfy the Initiative's needs. However, these biospecimen archives should be thoroughly characterized for possible use in conducting initial marker validation.
Additional information should be sought on the utility of using hot bone scans versus MRI bone marrow edema to identify cohorts of patients at high risk of progression.
The OA Initiative should seek to determine whether ongoing NIH-sponsored intervention studies could provide retrospective and prospective material for assay assessment.
The subcommittee recommends that the Initiative sponsor several small prospective studies of well-characterized patient and control populations as opposed to one large study. The studies should include rapid progressors, early/moderate/late OA patients, and arthritis-free controls, all well characterized with radiographs and perhaps MRI data. Depending on the nature of the marker, samples should be collected on a monthly or quarterly basis. The assays utilized should meet set technical and preliminary clinical criteria and be conducted in a GLP manner to ensure that the data produced is reliable and auditable. The collection of specimens should also be backed by compliance with proper IRB and ethical committee informed consent procedures, and genetic specimens should be made anonymous relative to the database.
Technical assay criteria should ensure that analytes are well characterized. Because a limited amount of each biospecimen will be available, the analyte should be analyzable in a specimen of 200 microliters or less. Reagents should have established manufacturing specifications. The analyte should have a linear dilution curve with dilution consistency, be reproducible (with diurnal and intrapatient variation characterized), be assessed in terms of spike recovery and matrix effects, and be stable and stored under known conditions. Depending on the nature of the marker and fluid being assessed, the patient standard deviation should not exceed 15 to 20 percent. The assays should be characterized in OA patients, subjects with other joint diseases, and normal controls. The specimens recommended are plasma (both EDTA and heparin plasma), serum, urine, and synovial fluid. If possible, specimens should be stored in aliquots at -70 °C.
Because it is important to conduct preliminary test validation to determine the acceptability of a validated assay for accessing prospective biospecimens, the Initiative should establish a well-characterized group of biospecimens that can be used to verify that an assay is performing in the manner developers say it does. Consideration should also be given to establishing a pooled serum or plasma bank, the specimens in which could be used as high and low standards for the various assays.
The OA Initiative should characterize a study to test which biomarkers correlate with clinical outcomes.
The Initiative should develop methods and procedures for obtaining access to biospecimens and clinical data.
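Several of the technical assay criteria in the recommendations above (reproducibility, spike recovery, and dilution linearity) reduce to simple calculations. The following is a minimal sketch only: the function names are hypothetical, and it interprets the "standard deviation should not exceed 15 to 20 percent" criterion as a coefficient of variation, which is an assumption rather than something the subcommittee stated.

```python
from statistics import mean, stdev

# Assumption: the 15-20 percent limit is read as a coefficient of variation.
MAX_CV_PERCENT = 20.0


def cv_percent(replicates):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(replicates) / mean(replicates)


def spike_recovery_percent(unspiked, spiked, amount_added):
    """Percentage of a known added amount of analyte that the assay recovers."""
    return 100.0 * (spiked - unspiked) / amount_added


def dilution_recoveries(neat_value, diluted):
    """For each (dilution_factor, measured) pair, express measured * factor
    as a percentage of the undiluted value; values near 100 suggest linearity."""
    return [100.0 * measured * factor / neat_value for factor, measured in diluted]


def passes_cv_limit(replicates, limit=MAX_CV_PERCENT):
    """True if replicate variability is within the chosen CV limit."""
    return cv_percent(replicates) <= limit
```

For example, `dilution_recoveries(100, [(2, 50), (4, 24)])` compares twofold and fourfold dilutions against an undiluted reading of 100; acceptance ranges for recovery would still need to be set by the Initiative.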

Discussion

A participant pointed out that the open forum's assessment of the data available through current studies did not result in a review of all possible studies. What was described in Dr. Woodworth's presentation was the group's general impression of the studies as a whole; however, it is likely that a small minority of current studies would meet the criteria required. Dr. Woodworth agreed that this was a good point and noted that the consensus of the group was that they were concerned about collection conditions and storage procedures. She noted that the group strongly recommended several small studies of approximately 500 patients each to ensure the proper collection and archiving of the biospecimens and well-characterized linkages with the clinical data, components that could be lost in a much larger study.

An attendee noted that he did not agree with the need for studies of 500 patients. Some of the most powerful studies conducted to date are those that involved very well-defined clinical populations, were well characterized, and compared patients showing rapid progression with those with slow progression. These types of studies exemplify how extremely meaningful data can be obtained from well-defined cohorts of as few as 10 individuals. The subcommittee requested specific examples in order to allow the concepts involved to be applied to future planning.

A participant asked whether the group had discussed biomarkers relative to the types of assays that should be run and their robustness. Will the criteria established preclude the assessment of samples that may already be available? Dr. Woodworth responded that for particular markers that have been shown to be robust, a well-characterized description of specimens collected in retrospective studies might allow hypothesis generation. However, efforts to test the way markers reflect clinical end points need to be designed and conducted prospectively. Another participant asked the marker specialists attending whether they were aware of existing assays that are robust regardless of what time of day the samples were collected, whether the samples were immediately frozen, whether the samples were kept at -20 °C or less, or how long the samples were stored. An attendee responded that extremely robust assays stored at -20 °C are being used constantly in investigative work. Very well-defined patient populations exist that could, through the conduct of small, well-defined studies, provide the Initiative with valuable information.

A participant agreed that the Initiative might address some questions through relatively small and carefully designed cohorts. Perfectly acceptable scientific approaches can be applied to the calculation of sample size on the basis of the question being asked, what is known, and what is being measured. Accordingly, sample size could be calculated on the basis of the question to be answered, existing information about the variability of assays among patients, and published data on the patient cohort. The Initiative could then require that persons proposing a study defend the size of the cohort involved and demonstrate how their study will measure the range of biomarkers in the different joint tissues of interest, facilitating the Initiative's ability to pool this data with that generated by other studies. Although this information is not sufficient to provide the Initiative with the overall answers needed, it could provide pointers on how to design the next set of studies. A representative of the open forum on epidemiology noted that that group had dealt with the issue of sample size and what would be an acceptable biomarker from their point of view.
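As an illustration of the kind of sample-size calculation described above, the following sketch uses the standard normal-approximation formula for comparing a mean marker level between two equal-sized groups. The function name, the two-sided 5 percent significance level, the 80 percent power, and the example numbers are assumptions chosen for illustration; none of them were proposed at the meeting.

```python
import math
from statistics import NormalDist


def n_per_group_for_mean_difference(sd, difference, alpha=0.05, power=0.80):
    """Approximate patients per group needed to detect a given difference
    in mean marker level, assuming equal group sizes, a common standard
    deviation, and a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)
```

With a between-patient standard deviation of 20 units and a difference of interest of 10 units, this approximation gives 63 patients per group. Halving the detectable difference roughly quadruples the required cohort, which is why proposers would need to justify both inputs.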

Dr. Hochberg stated that the proposal for multiple small studies conflicts with the conclusions reached by the epidemiology group. There are problems with variability in small studies of therapeutic agents or biomarkers because, even if they measure the same thing, each study is likely to characterize and measure disease in a different way. On the other hand, it would be very valuable if the Initiative evaluated the relative predictive value of multiple biomarkers. Such a task would be difficult to power and would require large numbers to distinguish between r's of 0.5 and 0.4 and r's of 0.4 and 0.3. He suggested that the best way to answer some of these important questions is to do a well-designed, very large study of well-characterized patients at perhaps a limited number of high-quality controlled sites. Another epidemiology session attendee noted that the markers that come into consideration in a larger study need to have fairly well-developed pedigrees, and smaller studies could serve an important role in their development. The elements of such pedigrees include reproducibility and an appreciation of age-related versus disease-related changes. In considering technical assay criteria, it would be helpful to think about what a minimum and maximum pedigree might be. Such pedigrees could also help ensure the robustness of the relationship of the marker to clinically important outcomes, not just biological processes.
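Dr. Hochberg's point about the numbers needed to distinguish correlations of 0.5 from 0.4 can be made concrete with a rough sketch based on Fisher's z-transformation. It treats the two correlations as coming from independent samples of equal size, with a two-sided 5 percent significance level and 80 percent power; these are simplifying assumptions for illustration. Biomarkers measured in the same patients would give dependent correlations, which call for a different test, so the figures are indicative only.

```python
import math
from statistics import NormalDist


def n_per_sample_to_separate_correlations(r1, r2, alpha=0.05, power=0.80):
    """Approximate size of each of two independent samples needed to detect
    a difference between correlations r1 and r2 (Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Fisher transform: atanh(r) is approximately normal with variance 1/(n - 3)
    delta = abs(math.atanh(r1) - math.atanh(r2))
    n = 3 + 2 * ((z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)
```

Under these assumptions, separating r = 0.5 from r = 0.4 requires on the order of a thousand patients per sample, and separating 0.4 from 0.3 requires somewhat more, which is consistent with the case made for a single large, well-characterized study.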

An attendee asked whether studies to establish pedigrees would be a part of the consortium or whether the responsibility for bringing a pedigree forward would fall to the principal investigator with the marker. Dr. McGowan replied that this has yet to be determined. Although the possession of a great deal of preexisting work could be a rationale for giving samples to a certain group, how that preexisting work is supported is not an issue that is currently on the table for the OA Initiative. The assumption has been that the process of bringing a marker up to the point of using a large panel of samples is already going on in the private sector and at universities supported by other Federal agencies.

A participant pointed out that another biomarker can be used to follow up on the concepts of robustness and clinical validity. Measures of permeability and indentation can be used to assess the function of the diarthrodial joint and can likely be correlated to mechanical outcome. With microtechnology, these measures can be obtained with minimum invasiveness, probably by simply needling a joint. A great deal of information could be obtained in this way from a very small group of patients. Such a mechanical parameter offers a way to obtain an aggregate number that relates to all of the categories discussed (collagen, proteoglycan or aggrecan efficiency, and so forth) by assessing the function of the articular surface.

In response to a query on whether each marker would be tested in the laboratory in which it originated or by a central laboratory, Dr. Woodworth replied that although there were some dissenting opinions, the group felt that it would be most reliable if the tests were conducted in the laboratories where the samples were evaluated under GLP or GLP-like conditions.

A related question dealt with the proprietary value of existing assays. Has thought been given to developing a mechanism for ensuring that there is no profit making on the use of these assays? Dr. McGowan responded that this is an issue that the administrative group has been working on. The biochemical markers that would arise to be tested have had considerable preliminary work done on them. The problem lies not with the protection of intellectual property but with sharing information with the consortium. The decision to share the results of an existing assay will be at the discretion of the organization possessing the intellectual property, and this could present a difficult stumbling block. Another attendee pointed out that a practical solution to this problem could be to allow tax deductions for the donation of proprietary information.

A participant suggested that initial testing consist of a short test conducted on a set of samples that represent different types of diseases but are not necessarily well characterized. The performance data obtained through this test could serve as a form of qualifying round. The sample volume should not exceed 200 microliters; if that is not possible, the assay will need to be reworked. The suggested assay variability rate of 10 percent would result in a very shaky assay; it would be preferable to use a figure of 5 percent or lower, with 10 percent serving as the far upper limit. Another issue is that the markers are likely to be very sensitive and pick up very early changes before they can be detected by imaging or biomechanical measures; these early stages could be very interesting and should not be overlooked. To avoid excluding persons who are at a very early stage of disease from the cohort, patients could be recruited on the basis of pain or some other criterion and then monitored. It would be very interesting to then correlate the data obtained with mechanical measures.

Dr. Stevens noted that the issue of one central laboratory versus many laboratories is influenced by the finite and extremely valuable nature of the specimens. Dispersing small aliquots across multiple laboratories decreases the efficiency of the use of specimens, whereas having one or two central laboratories conduct all of the testing increases the efficiency of aliquot use. Accordingly, he believes that the OA Initiative should focus on using a very small number of central laboratories.

An attendee pointed out that the recommendation for several small studies versus one large study, which came out of the open forum on biochemical markers, was based on the differences seen in the kinds of surrogate markers discussed and the purposes to which they would be applied, including the validation of potential interventions. It was felt that it was unreasonable to expect a single large study to cover all of the bases involved, from both a historical and an interventional point of view.

The comment was made that in the bone field, extremely small studies can show that bisphosphonate reduces the biochemical markers of bone resorption. However, very large studies have yet to demonstrate that reduction in the biochemical marker of bone resorption has anything to do with the clinical outcome, fracture reduction. Hence the demonstration of process effect in very small studies can likely validate OA markers, but large studies will still be needed to assess their relationship to clinical outcomes.

Summary of Open Forum on Imaging

Charles Peterfy, M.D., Ph.D.
Chief Scientific Officer
Executive Vice President
Synarc, Inc.

Dr. Peterfy indicated that his summary would focus on changes to the imaging straw proposal generated by the previous evening's open forum.

The areas of the proposal that remain unchanged are the key questions the imaging component would focus on, the characteristics by which the markers would be compared, and the proposed three-tiered framework. There was considerable discussion about the type of cohort that would be applied to the study. Although the final decision will be directed principally by the input received from the epidemiology group, the imaging group gravitated toward Cohort B, the multicenter clinic or community cohort of OA-diagnosed patients and a control group, with the numbers and the specific imaging conducted to be determined by the questions being posed.

The core technique to be applied in tier 1 was revised to include radiographs of not only both knees but also of both hips (by means of a single pelvic view) and the right hand of all patients. The markers to be assessed would be joint-space width in the knees and hips, scoring of the joint space, and scoring of the osteophyte. All patients would also be imaged with conventional MRI, 1.5 tesla, with a fat-suppressed technique that allows for the quantification of articular cartilage thickness and volume and a second technique that allows for the whole-organ evaluation of all structures and the measurement of synovial fluid volume. This would be accomplished in 1 hour as originally proposed. The intervals for conducting the imaging were changed to baseline, 6 months, 1 year, 3 years, and 5 years.

The techniques for tier 2 did not change drastically, and its focus in radiography is still directed toward optimizing multicenter radiographic techniques by comparing nonfluoroscopic and fluoroscopic mechanisms for assessing joint-space width measurements and scoring. Alignment could potentially be evaluated in this tier as well. MRI of the knee would evaluate the merit of cartilage T2 relaxation measurements at 1.5 and 3 tesla and use high-resolution 3-D MRI to look at trabecular measures and detailed elements of cartilage morphology. Radiographic and MRI imaging of the knee would be conducted every 6 months throughout the 5-year period of the study. The hip MRI would assess total cartilage volume, provide whole-organ scoring, and be conducted at baseline, 6 months, 1 year, 3 years, and 5 years. Total imaging time would not exceed 1 hour.
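
The imaging intervals described for tier 1 and for the tier 2 knee studies can be laid out side by side. The short Python sketch below enumerates both schedules in months from baseline. It assumes that "every 6 months throughout the 5-year period" includes both the baseline and the 5-year visit, which the summary does not state explicitly; all names are illustrative.

```python
STUDY_MONTHS = 60  # 5-year study period

# Tier 1 intervals as revised: baseline, 6 months, 1 year, 3 years, 5 years.
TIER1_MONTHS = [0, 6, 12, 36, 60]


def every_n_months(interval, duration=STUDY_MONTHS):
    """Visit months from baseline through the end of the study, inclusive."""
    return list(range(0, duration + 1, interval))


def label(month):
    """Human-readable label for a visit month."""
    if month == 0:
        return "baseline"
    if month % 12 == 0:
        years = month // 12
        return f"{years} year" + ("" if years == 1 else "s")
    return f"{month} months"


# Tier 2 knee radiography and MRI: every 6 months (assumed inclusive of both ends).
tier2_knee_months = every_n_months(6)

# Visits that tier 2 knee imaging adds beyond the tier 1 schedule.
extra_visits = sorted(set(tier2_knee_months) - set(TIER1_MONTHS))

print("Tier 1:", [label(m) for m in TIER1_MONTHS])
print("Tier 2 knee:", [label(m) for m in tier2_knee_months])
print("Additional tier 2 visits:", [label(m) for m in extra_visits])
```

Under these assumptions, the tier 2 knee schedule comprises 11 visits, 6 of which fall outside the tier 1 intervals.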

Dr. Peterfy indicated that there was much discussion about which techniques to use in tier 3. The group agreed to keep imaging of the shoulder, spine, and TMJ in this tier. Radiography with microfocal views should be used to look more closely at subarticular bone texture in the knee, and routine views (AP and oblique shoulder and AP and lateral lumbosacral spine) should be taken of the shoulder and spine. No x-rays would be taken of the TMJ. MRI would be used to assess gadolinium-DTPA diffusion into articular cartilage, as an ionic marker of glycosaminoglycan content in cartilage, and synovial gadolinium-DTPA enhancement rate as a marker of synovitis in OA. MRI would also be used for magnetization-transfer imaging, another marker of cartilage collagen content; sodium imaging and water diffusion, two different markers of proteoglycan integrity; and novel pulse sequences and different field strengths, including low field strength and dedicated extremity MRI magnets. The techniques originally proposed for imaging of the hip, shoulder, hand, spine, and TMJ in this tier remained unchanged. Scintigraphy, arthroscopy, optical coherence tomography, and other promising but less established OA imaging techniques would fall within this category, and imaging intervals would be derived from the question being asked. Dr. Peterfy noted that extensive discussion of scintigraphy had taken place during the open forum, and additional materials and reprints on this technique were distributed by Dr. Thomas Budinger.

Several unresolved questions remain. The most important is what the final cohorts will be, an issue that will guide the Initiative's final protocol. How the database will be structured has yet to be resolved; consequently, the original proposal stands in terms of image-based content, imaging modality, technical specifications, the findings and the measures of imaging, and the performance metrics for the marker involved. Additionally, the standardization and quality assurance of the imaging and analysis methodologies utilized depend on the scale and scope of the study, which will be driven by the epidemiology

Discussion

A participant asked for the imaging group's perspective on the relative value of twice-a-year measurements versus once-a-year measurements over the course of the study. Although he understood the need for multiple early measurements to establish a solid baseline and frequent measurements in a population with an anticipated high rate of change, he requested that Dr. Peterfy provide a quantitative or semiquantitative rationale for taking measurements twice a year in a population-based study or a population of early OA patients. Dr. Peterfy reported that this sampling strategy, proposed for the knee in the original version of tier 2, targets cartilage T2 relaxation measurements, which are expected to change faster than actual cartilage loss. This rate of change was the rationale behind the suggested twice-a-year sampling interval. Additionally, in the tier 1 core technique there are also other markers captured by routine MRI (e.g., bone marrow edema-like changes and joint effusion volume) that change very rapidly. It would be valuable to learn how rapidly they change, what their clinical correlates are, and how they predict changes in other structural components of the joint, which supports the proposal for conducting the first three imaging measurements at baseline, 6 months, and 1 year, although the 6-month time interval does not necessarily have to be included at the beginning of the study. An alternative approach may be to delay it to the second or third year of the study, when other work conducted outside the Initiative may have yielded additional knowledge about the rate of change of these putatively rapid markers.

In response to a question from Dr. Helms about why the imaging intervals were unequally spaced in tier 1, Dr. Peterfy indicated that the group had gone back and forth on this issue and the current recommendation is based on an effort to obtain measurements over a 5-year period while containing costs.

An attendee asked whether the group had given attention to linking the imaging parameters with the biochemical markers, a concept that appears to be limited to tier 3 studies. Dr. Peterfy noted that the group had not discussed this issue, partly because of the lack of information available at this stage on how such linkages could be accomplished. It is possible that by the time the study actually begins, new information will be available that could be used to modify the structure of the protocol to ensure that it captures such information.

A participant pointed out that an issue that the imaging group did not discuss was the standardization of various types of analyses, e.g., MRI. Regional factors and regional analyses make a great deal of difference. We will have a large body of statistical data, but the biochemical marker data will be either systemic or from the fluid region, and their correlations have not been approached. The standardization of analysis of all of the parameters must be conducted in a GLP laboratory or standardized central laboratory. On the other hand, because these markers are relatively new, there is no gold standard to follow. In addition, the data needs to be available to a large number of people using different analysis schemes, further complicating the correlations. Dr. Peterfy noted that this issue is captured to some extent in the outstanding question category. In addressing it, considerations related to the shared database; database content, access, and structure; and standardized quality assurance and reading and analysis techniques must be taken into account. Dr. Peterfy also indicated that standardization of multicenter image acquisition and centralized data management and image analysis have been treated as a given in this study and were taken into consideration throughout the planning of the protocol.

In response to a query regarding whether he considers image-based and biochemical data to be independent, Dr. Peterfy responded that one distinction between the two types of data is that biochemical specimens can be used up whereas image data can be analyzed by an infinite number of different measurement algorithms. The acquisition of image data is driven to some extent by what the plans for analysis are, perhaps to a greater degree than the collection of biochemical marker specimens. Nevertheless, with imaging data, numerous different scoring methods can be applied and compared on an ongoing basis without having to worry about using up the raw data.

Dr. Poole emphasized the importance of integrating imaging work with biomarker studies, reinforcing the importance of joint-fluid analyses. At the same time, data can be compared with serum and urine analysis to see whether they truly reflect what is being imaged in a specific joint. He also asked that the Initiative seriously consider the important scintigraphy work pioneered by Paul Dieppe that has demonstrated striking relationships between disease progression and scintigraphy-reflected bone turnover. Dr. Poole would like to have scintigraphy considered on a par with other kinds of imaging because of the potential demonstrated by the work already done in this area. Scintigraphy is also considerably less expensive than MRI.

An attendee pointed out that the imaging work conducted, particularly in tier 2 and tier 3, must be integrated with the biochemical marker work at a higher project plan level. This is because a goal of a second- or third-generation surrogate end point for OA would be something that has widespread acceptance, is inexpensive, and is facile and common. If considerable cost is tied up in tier 3 imaging techniques, the selection process implies some prioritization of biochemical approaches, especially when the early imaging work from tier 1 and tier 2 can continue to be accessed. Accordingly, instead of going in costly new imaging directions, it is important to consider the use of scintigraphy and some of the other imaging modalities in tier 3.

A participant indicated that two studies have looked at the correlation of scintigraphy positivity with MRI bone marrow lesions and found that the correlation was very high. He felt it would be useful to conduct a substudy to test that correlation again, although it is likely that this information is already being obtained by means of MRI. Dr. Peterfy noted that this may be captured to some extent in the marker characterization in terms of its validity and responsiveness to change. Either imaging technique may prove more or less responsive and more or less precise, but both validity and responsiveness to change influence how a technique would be used in a particular clinical or scientific situation. Dr. Brandt indicated that an appendage to his group's doxycycline clinical trial utilized baseline scintigrams with internal standards and semiquantitation on approximately 220 subjects. These measures will be correlated with progression as defined radiographically with fluoroscopic positioning techniques. An attendee indicated that scintigraphy is not as reliable an indicator as has been suggested and puts an additional radiation load on the patient. Another participant disagreed and pointed out the value of whole-body scintigraphy, particularly to tier 1 studies.

Dr. Stevens noted that one of the problems encountered by the consortium is that to date each subcommittee has worked in relative isolation. All subcommittee members need to sit down together, negotiate the primary purpose of the trial and what its primary outcomes should be, and integrate all of the necessary components in a way that ensures the trial will be cost effective, robust, and reproducible.

A participant commented that the members of the consortium have not really established what they want to do. The Initiative provides a wonderful opportunity for determining the natural history of OA from the very first molecular events to clinical end points. The consortium needs to identify modes of data collection that will enable researchers to continue to analyze the data in various ways in the future. Sensitive assays should be used in different ways, and scintigraphy, mechanical testing, and so forth should be included if possible.

Summary of Open Forum on Epidemiology

MaryFran Sowers, Ph.D.
Professor
Department of Epidemiology
School of Public Health
University of Michigan

Dr. Sowers reported that participants in the open forum on epidemiology had engaged in two hours of lively discussion on the recommendations contained in the initial straw proposal. The members of the group agreed that the key question to address was as follows:

Should the Initiative be a single study or a combination of several studies that will address the topic of biomarker research in OA, with biomarker defined in its most global sense and not limited to biochemistry or to imaging?

The group also acknowledged that the primary goal of the Initiative should be to design, develop, and support a research project to identify, test, and validate biomarkers as surrogate endpoints for clinical trials in OA. Accordingly, although it is important to understand mechanisms and conduct natural history studies, the ultimate objective is to elucidate disease-modifying approaches. It is within that context that the Initiative should seek to capture information about either mechanisms or natural history.

After considering several potential primary hypotheses outlined in the original straw proposal, the group felt that the Initiative should focus on one, the examination of prognostic markers of outcome. The next step was to identify the measures that should be considered as outcomes. The participants determined that it would be important to develop prognostic markers for three major types of outcome measures: 1) structure, 2) pain, and 3) function. Of the several measures of structure discussed, those considered to be the most suitable were measures of joint-space narrowing, recognizing the strengths and limitations of these measures and how they are imaged. Although they did not specifically address the question of which pain measure should be used or whether there should be a biochemical marker of pain, the members of the group felt that the ability to describe measures of function and disability was important.

From the several types of studies noted in the initial straw proposal, the participants worked to refine the cohorts perceived as being most likely to be able to identify important information in a relatively efficient fashion.

Proposal 1 is for a clinical cohort. In pursuing this, the Initiative should examine more closely the potential of existing cohorts in helping to define important questions. The group suggested that the existing cohorts that should be deemed eligible to enter this collaboration should be those that target clinically defined knee OA, possess uniform high-quality baseline data that optimally includes MRI scans, and have biologic specimens that are available to the consortium. Where MRI data are available, background information should be gathered on the collection techniques used and other relevant considerations. These existing cohorts should be followed for approximately 4 years, with an interim analysis conducted at 2 years. The cohorts should be powered for size based on joint-space narrowing and, for statistical purposes, a site variable.

The group proposed that a series of inclusionary criteria be established for the evolving biochemical markers and other markers and discussed the need for the following criteria for markers: 1) known coefficients of variation, 2) an understanding of what reproducibility is like under different conditions, and 3) an understanding of the sources of variation that might ultimately be defined as measurement errors, for example, diurnal variation, fasting, and so forth. These criteria are intended to help the consortium select the most appropriate samples for inclusion.

Proposal 2 stems from the group's belief that there is a great need for a population- or community-based study that would potentially be more representative of the general population. The three subpopulations considered important from a prognostic point of view were those that are disease free, those that are at high risk of developing OA, and those who have clinically defined OA. There was a consensus that the study should not deal with end-stage OA in this population. It is important to include the three groups described because they will provide a sense of how to interpret the data on a wide variety of populations.

The forum agreed that it would be more efficient to over-sample high-risk and clinical OA populations. There are a number of data sources that would allow this and enable the Initiative to set up criteria for the sampling schema, giving the consortium the ability to look at prognostic indicators more efficiently and create a better understanding of such issues as whether there is such a thing as "disease-free" and what the background noise is across the population being considered. The study cohort should be followed for 5 years with the understanding that, for some of these characteristics, there might be appropriate time extensions. The participants did not discuss how such a study would be followed over time and whether there would be minimal examinations. The group was acutely aware of the need to potentially power on the ability to examine outcomes related to the knee while including other joints in order to characterize the burden.
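
When some subpopulations are deliberately over-sampled, estimates for the general population are usually recovered by weighting each stratum. The Python sketch below shows one common approach, in which the design weight for a stratum is its population share divided by its sample share. The forum did not specify any weighting method, and every proportion shown is invented purely for illustration.

```python
def design_weights(population_share, sample_share):
    """Return per-stratum weights: population share divided by sample share."""
    weights = {}
    for stratum, pop in population_share.items():
        samp = sample_share[stratum]
        if samp <= 0:
            raise ValueError(f"stratum {stratum!r} has no sampled participants")
        weights[stratum] = pop / samp
    return weights


# Hypothetical shares of each subpopulation in the community (illustrative only).
population_share = {"disease_free": 0.70, "high_risk": 0.20, "clinical_oa": 0.10}

# Hypothetical shares in a cohort that over-samples the high-risk and clinical OA groups.
sample_share = {"disease_free": 0.30, "high_risk": 0.35, "clinical_oa": 0.35}

weights = design_weights(population_share, sample_share)
for stratum, weight in weights.items():
    print(f"{stratum}: weight {weight:.2f}")
```

Over-sampled strata receive weights below 1 and under-sampled strata receive weights above 1, so that weighted results again reflect the population mix.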

The suggested plan for proposal 2 considers that there is a role for simultaneously implementing the clinical and the population- or community-based studies. It also reflects the belief that the study of already-recruited clinical populations for whom there is at least minimal baseline data would make an important contribution to understanding how the consortium should determine the measures for the newly formed cohort. It is important to look at the stage of disease because certain markers may perform differently, depending on the stage involved. Exactly how stage will be defined, particularly with respect to outcome measures, is somewhat nebulous; however, the group recommended focusing on mild to moderate OA. The participants also felt that an alternative definition of OA might arise from this effort. The plan for proposal 2 would provide the Initiative with an opportunity to look at state-of-the-art imaging techniques.

A number of issues were not resolved or fully developed by the members of the group. There was concern that the proposals focus primarily on knee OA and therefore give the impression that hip OA is not important. The group as a whole would like to seek a way to capture information about hip OA efficiently, but the mechanisms and approaches for doing so are not yet well developed. The question also arose of whether the contributors to the study should be limited to those located in North America. No one wanted to throw away a data resource or access to information, but there are logistical and potentially IRB issues that the group felt should be handled at an administrative level. Members of the group urged that the Initiative consider looking beyond the studies listed in the white paper. For example, are there additional clinical studies that could be accessed efficiently that would help contribute to proposal 1, control groups in clinical trials, and so forth? There was particular interest in determining if there were groups whose measures have included MRI and the construct and context under which the scans were included. The participants also did not discuss DNA and genetic studies, issues that the Initiative should consider as it moves forward.

Discussion

One participant supported the concept of pursuing opportunities to assess current large-scale hip OA studies, noting that they were done quite well and would provide valuable data. He added that he himself and others involved in the open forum suspect that there are many other studies from industry that could provide longitudinal imaging and specimen data on well-characterized patients that would be very helpful if the organizations involved were willing to offer them to the Initiative.

Another participant agreed with the previous speaker, saying that he had seen a number of such studies and that even the negative control data would be helpful to the consortium. He then noted that there are two dimensions to the issue of generalizability in the Initiative's database. One is generalizability across patients (from asymptomatic to end-stage patients), and the second is generalizability to other joints of the body. One possibility would be to have investigators conduct careful examinations and make clinical judgments as to where OA appears to exist in other sites of the body, following the patients who do have OA in other sites more intensively in order to assess those additional sites. He also remarked that the function outcome encompasses a very long spectrum that should be followed all the way to whatever the ultimate outcome is determined to be.

Dr. McGowan pointed out that NIH cannot require that NIH-funded organizations provide the Initiative with information, samples, and images from their cohorts. However, NIH can facilitate the work of a public-private coalition of investigators interested in contributing to the Initiative. She was interested in knowing whether attendees who were the principal investigators of such cohorts saw this effort as being one that should include private partners, many of which have indicated their interest in making much more than a financial contribution to the Initiative.

Dr. Sowers reported that the consensus of the open forum was that there is no single entity that holds all of the cards, and great efficiencies and more rapid progress could be made if the Initiative used the information generated by existing clinical trial cohorts, particularly control groups. The consortium could potentially set guidelines and criteria that the people involved could examine prior to determining what their contribution might be. Issues regarding the sensitivity of the contribution require further discussion.

With respect to the population- or community-based study, an attendee asked if the Initiative was ready to decide how many patients would be needed to fill the three subgroups described in the presentation. If not, what additional information needs to be gathered before that decision can be made? Dr. Sowers indicated that although the Initiative was not yet ready to make this decision, she believes that information on powering the study is available. What is lacking is an appreciation of how broad the domain is and how far the net should be cast, and it would be helpful if members of the consortium would provide input on this issue.

A participant noted that representatives of managed care organizations, which are conducting some of the largest clinical trials ever accumulated, were not in attendance. The opportunity to properly manage their OA patients over a 10- to 15-year period could provide these organizations with an incentive for sharing their data with the consortium. In addition, managed care organizations, health cooperatives, unions, and similar organizations would probably be very interested in collaborating on a prospective study in which they would standardize some of their standard care practices over the next 4 to 5 years. Dr. Katz responded that NIH has had extensive interaction with managed care organizations on such issues as including patients from their clinical studies in NIH efforts, and the subject has turned out to be a very complicated one. Although many managed care organizations do these types of studies, access to their information is not always easy to get; in his view, few of them were interested in participating in this way. The attendee indicated that he felt that asking managed care organizations, unions, or collaboratives to contribute clinical data that is not necessarily public, perhaps just their normal cohorts, is no different than asking pharmaceutical companies to do so.

Dr. Sowers noted that there are a number of open questions that need to be resolved, and part of the response might depend on the type of data sought by the Initiative and whether the types of data and the collection techniques could be defined in a way that these organizations would find acceptable. A participant pointed out that the consortium was already quite complicated and that efforts to bring in additional organizations would greatly increase the amount of time required to make fundamental decisions. He suggested that the consortium move forward with the Initiative and, after it has been put into a form that is acceptable to the majority, seek to include other groups that could contribute to the research resource.

An attendee noted that the main weakness of clinical trials conducted by drug companies is their relatively short duration. He suggested that a follow-up study take advantage of the information collected by the Initiative over 1 or 2 years and then continue to follow these patients at 4, 5, and 10 years.

Would the population-based study be conducted solely in the United States or in other countries as well? Dr. McGowan responded that there was no impediment to the OA Initiative being an international study. Responses to the request for proposals can be submitted by any organization that can meet the requirements for access to participants, show it can conduct the necessary protocols, and ensure quality control.

Summary of Open Forum on Administrative Structure and Consensus

Gregory Downing, D.O., Ph.D.
Health Science Policy Advisor
Office of Science Policy
Office of the Director
National Institutes of Health

Dr. Downing pointed out that in considering issues related to administrative structure, it is important to focus on the overarching goal of the OA Initiative: to establish the research resources needed to examine the progressive development of OA in a longitudinal, prospective study. In addition, the administrative structure should be responsive to the specific aims of the Initiative, which are:

To identify markers of OA disease that can be used to monitor disease progression and response to therapy and will become acceptable as registrable endpoints in clinical studies evaluating disease-modifying agents;
To facilitate more efficient and effective clinical trials and a better understanding of the pathological mechanisms involved in the development and progression of OA;
To initiate a new paradigm in which registrable clinical endpoints are established in noninterventional studies; and
To establish a management framework for similar public/private partnerships targeting other diseases.

The Administration Subcommittee will study all of the information it has gathered over the past year and will draft and post a report on the Web site that incorporates these and other key issues identified by the subcommittee chairs. This document will be the mechanism for obtaining initial feedback from potential sponsors and the research community. What is fundamental to this effort is the development and dissemination of the scope of the project. The request for proposal and the project plan will also be posted and made available for public comment.

Although the OA Initiative Steering Group will be dissolved in the near future and cease to participate in the specifics of the planning process, representatives of Initiative sponsors and ad-hoc members from interested communities will have an opportunity to provide input through the OA Initiative Public-Private Consortium. Dr. Downing envisions the formation of a variety of consortium subcommittees to deal with the specifics of several complicated issues that are expected to arise, such as the stratification of samples and publication rights. It is technically impractical to clarify all such questions at the outset, and the consortium and its subcommittees will serve as an ongoing forum through which to address these types of issues.

Dr. Downing concluded by pointing out that all of those involved in the consortium should keep the big picture, and the potential it holds, in mind, consider the perspectives of the other parties involved, and remember that the consortium is an inclusive entity to be viewed as being in everyone's interest.

Stephen Stimpson, Ph.D.
Exploratory Discovery Head
Musculoskeletal Diseases
Glaxo Wellcome, Inc.

Dr. Stimpson indicated that potential sponsors of the OA Initiative will need information on: 1) the Initiative's overall goal, timeframe, cost, and organization; 2) the key elements of the research plan, including access to data; 3) next steps (such as RFPs and further contracts); and 4) how and when major unresolved issues, such as those related to intellectual property rights, will be addressed.

The following diagram, which was developed collaboratively by several meeting presenters, subcommittee chairs, and NIH staff members, illustrates the proposed project structure of the OA Initiative:

The principles behind each of these efforts are widespread acceptance by relevant audiences (including regulatory agencies), availability, and access.

Discussion

An attendee asked for clarification on what will happen between now and when the responses to the RFP are received. What will the RFP contain? Should the proposals simply consist of executional information about how parts of the project will be implemented and at what cost or will they be expected to feed variations on the study protocol? Dr. McGowan replied that NIH cannot issue an RFP until funding has been secured. While NIH staff complete the tasks that precede the issuance of an RFP, the summary of events at this meeting will be posted on the Web site, allowing all parties with an interest in the Initiative to provide feedback. The RFP will contain a clear explanation of the evaluation criteria on which the applications will be judged. The participant also asked if, upon the dissolution of the Steering Group, NIH staff members would be the ones who would receive the feedback and modify the project. Dr. Stimpson responded that the solicitation would be drafted by NIH staff and made available for comment. Dr. McGowan pointed out that although RFPs by Federal agencies cannot be written by someone outside the Government, the preparation of NIH RFPs is usually informed by outside expert opinion. Activities related to the development and planning of the RFP will continue to seek out and incorporate such expert input prior to the release of the formal solicitation.

In response to a question about how responses to the RFP will be reviewed, Dr. McGowan explained that during the NIH peer review process every effort is made to be fair and unbiased. The process requires full and open competition, and potential applicants cannot be given information on what will be put in a specific solicitation ahead of time. Proposals are assessed and scored by Federal and private sector reviewers who are knowledgeable about, but have no conflict of interest with, the proposed project. A proposal submitted to the NIH is confidential, but there are precedents for getting an investigator's permission to share it with another organization, for example, one interested in potentially funding an unawarded proposal. Although the NIH cannot transfer the information, an investigator can choose to submit the proposal to the other interested group. Dr. Katz added that NIH has well-established systems for obtaining input from outside experts on issues related to contracts that do not preclude these experts from competing for associated contracts.

A participant pointed out that the measurement of genetic markers should be considered as part of the study design. For proposal 1 (existing cohorts), sibling pair cohorts exist that could be studied to help answer the questions being pursued. With respect to the new cohorts in proposal 2, DNA could be obtained not only from those in the cohort but also from first-degree relatives. This would not be particularly expensive and would not necessarily involve other phenotypes; the DNA alone should prove quite helpful to the Initiative's effort to test markers. Dr. Katz noted that he and others at the NIH have already been involved in many discussions regarding the importance of the use of genetic markers and will continue to consider this issue as the Initiative moves forward.

In response to a question about including sponsors of the Initiative in the group that will review submitted proposals, Dr. McGowan indicated that the NIH must adhere to its principle of having no one with a potential conflict of interest involved in the review. Even the NIH program person who will be responsible for the funded work is precluded from influencing the award process. The review group will not be an existing standing NIH review group; it will be an ad hoc group whose members will be selected after proposals have been submitted to ensure that no one in the group has any connection to the proposals involved. Reviewers will be selected who have expertise in cohort development, biomarkers, imaging, biostatistics, coordinating center management, and data management.

An attendee asked how dissemination of the data compiled by the Initiative would be handled. Will those who made contributions to the Initiative have access to raw data? Who owns the data? Will those with funded proposals be limited in their ability to analyze the data? Dr. Stimpson responded that the Web-based database would be available to everyone. Another participant noted that it was his understanding that the sponsors would have the advantage of quick access to the full spectrum of data acquired and the ability to manipulate the entire database. Dr. McGowan indicated that the study investigators will also be critically involved in analyzing the data. The NIH can be expected to form a subcommittee composed of representatives of the Initiative's public- and private-sector partners that will look into questions related to publications, presentations, and requests for data. The NIH is committed to making the database publicly available as soon as possible. However, the data will first have to be cleaned up, and probably synthesized to some extent, before they are made public.

A participant asked how the consortium will handle the issue of access and rights to assays generated by the Initiative. Ms. Barbara McGarey of the NIH Office of Science Policy and Technology Transfer noted that the biologic materials, radiologic images, and so forth obtained through the contracts funded by the consortium will be in the public domain and will be made available to all interested parties. However, the possibility of allowing Initiative sponsors earlier access to the information within the database does exist. With respect to other studies, such as validation studies on markers that may have intellectual property protections, the private sector members of the consortium could either fund such studies or enter into third-party agreements with those owning the intellectual property. The private sector organizations could then cross-license each other or develop some other intellectual property arrangement. The NIH cannot broker these arrangements because of the limitations imposed by the Bayh-Dole Act and the fact that the property issue is separate from the development of the resource itself.

Next to be discussed was the question of what would happen if, after the Initiative collects the biological samples, a new assay comes along. Dr. Downing indicated that the assay could be brought to the consortium and the individual organizations would be able to decide whether or not to fund a validation of that assay. The funding organizations would then collaboratively determine whether the assay would be made available to others and by what means. A participant pointed out that what the Initiative is funding is the initial collection of data, the development of the database, and the collection of biological samples. The questions of who will fund new assays and who will have access to them will have to be negotiated separately.

This participant noted that another concern is that the samples will be a limited resource. Those contracted to study a current or future biomarker who use consortium materials to quantitate, validate, and explore what the biomarker does should be contractually obligated to allow access to their laboratory and methodology. A real problem would arise if a biomarker was found to be of great utility but was not available to the public or private sector because the sample had already been used. Another attendee expressed concern about the potential for greed, for example, a drug company purchasing an exclusive license that cuts other organizations out. Ms. McGarey indicated that the decision of who will be able to use these finite public resources would be made on the basis of scientific merit through a mechanism that has yet to be determined. However, she did not anticipate that this would require anyone to relinquish their intellectual property rights. Dr. Stimpson noted that it would be helpful if a graphic representation of how the NIH plans to handle potential scenarios of this type could be developed and shared with the members of the Initiative.

Ms. Annette Levy of the NIH Office of the General Counsel pointed out that her office has discussed the general conditions for accessing the repository and the samples collected by the Initiative. What has not been addressed, but will be, is the question of fixed fees or brokering access to what are essentially licenses, which is new territory for the NIH.

An attendee commented that interactions between academia and industry have been successfully promoted in Canada over the last 15 years by a program in which grant applications from academic researchers are jointly funded by industry and the Government. Although all of the information generated is eventually put into the public domain, the advantage to industry is earlier access to the information and the ability to build collaborative relationships with academia that can be applied to other projects. The Canadian Arthritis Network, for example, funds research contract agreements between industry, academia, and the Government that partner project groups of up to eight principal investigators with one company. Intellectual property created under the contracts is appropriately protected and has never posed a problem. Dr. McGowan responded that in the United States many academic institutions have sponsored research agreements with industry partners, and the NIH has Cooperative Research and Development Agreements (CRADAs). However, the Initiative seeks the participation of all three entities. Although the NIH would like to see industry sponsors collaborate with the academic side of the consortium, it is not in a position to play a directive role in this process.

Summary of Straw Proposals and Discussion: Clarification of Research Questions

Stefan Lohmander, M.D., Ph.D.
Professor
Department of Orthopedics
Lund University Hospital
University of Lund

Dr. Lohmander noted that the studies conducted in support of the OA Initiative should be hypothesis driven. The Initiative's primary hypothesis or goal is to test and validate biomarkers as surrogate end points for structure- or disease-modifying clinical trials in OA in order to make such trials faster, less expensive, and easier to perform, with the end result being the expeditious development of new and more effective OA therapies. While designing studies and cohorts, focus must be maintained on this central hypothesis.

With respect to study design, the following represents Dr. Lohmander's synthesis of the general consensus of the meeting participants.

Epidemiology Proposal 1 utilizes existing cohorts of clinically defined knee OA that:

Have uniform, high-quality baseline data.
Have biologic specimens that are available to the consortium.
Optimally have baseline data that include MRI scans.

These cohorts, which may be of variable size and recruited in different centers, should be followed for 4 years with an interim analysis at 2 years. A purpose of these cohorts would be to serve as 'test-beds' for the initial testing of the utility and power of proposed imaging protocols, prioritizing among molecular markers, etc. Experience gathered from these studies should be used to optimize the use of the finite and valuable resource created by the population-based cohorts in proposal 2 (below).

Epidemiology Proposal 2 consists of population-based cohorts that include disease-free individuals and an oversampling of persons in high-risk and clinical OA groups. The duration of the core study would be 5 years and optimally extended to 10 years. Although the data analysis would be powered on knee joint-space narrowing, data on other joints should also be collected to characterize the overall burden of OA and its impact on systemic markers.

The outcomes that need to be considered at baseline and at followup are structure, pain, and function. The patient-related outcome measures to be collected would include specific joint outcomes, more general OA outcomes, and health-related quality-of-life outcomes. This would be consistent with the OARSI guidelines for clinical trial data collection.

Dr. Lohmander indicated that for the purposes of this discussion, the term biomarker is meant in the widest generic sense and incorporates molecular biomarkers and most imaging. The types of biomarkers that have been discussed over the course of the meeting as being of interest are those that are diagnostic, severity, or prognostic surrogates; predictors of response to treatment; or surrogates to monitor outcomes. As the last two biomarkers listed require access to an as-yet-unavailable disease-modifying intervention, the Initiative is not in a position to design a study to validate them.

Most important to Dr. Lohmander are the prognostic biomarkers, the validation of which would form the hypothesis base for designing the epidemiology cohorts and for planning data collection efforts. For example, the validation of a prognostic biomarker would allow for the selection of patients for a Phase II study seeking proof of concept for new compounds being developed to protect joints from further destruction by OA. Validation of prognostic biomarkers would also serve to highlight biomarkers that could serve as outcomes in future trials that target the particular process involved. If and when disease-modifying treatment becomes available, prognostic biomarkers could be used to select patients who would most benefit from treatment, e.g., because they are at high risk for disease progression.

The objectives of a study targeting prognostic biomarkers would be to establish the structural and molecular determinants of the clinical outcomes of OA and integrate imaging and molecular markers for OA. The core techniques for tier one should be x-rays of both knees, both hips, and one hand and conventional MRI at 1.5 tesla of both knees. Dr. Lohmander pointed out that although a number of options have been proposed and discussed, there is currently no clear consensus on what the imaging intervals should be. The study should include tier one and tier two techniques, allowing for the introduction of new markers as the study progresses. The study should also identify criteria for imaging and molecular marker evaluation that consider all current data; entertain new proposals as appropriate; and ensure that the process is inclusive, progressive, and iterative, allowing for the later introduction of new technologies as appropriate.

The next steps of the Initiative process are to:

Finalize study designs.
- Determine cohort definition and numbers.
- Specify the initial imaging protocols and schedule.
- Specify what biological fluids will be collected, at what times, and in what amounts.
- Establish performance criteria and define the criteria or mechanisms by which new biomarkers can be considered.
- Resolve intellectual property issues.
Draft the RFP.
Obtain commitment from partners.
Release the RFP.
Finalize the Initiative's administrative structure.
Award contracts.
Do the work.

Dr. Lohmander concluded by noting how much progress has been made over the past 9 months of work on the Initiative, how remarkable the level of interaction has been between the members of the subcommittees and all of the meeting participants, and that the Initiative is an effort that can and should be pursued.

Next Steps

Stephen Katz, M.D., Ph.D.
Director, National Institute of Arthritis and Musculoskeletal and Skin Diseases
National Institutes of Health

Speaking for himself and Dr. Hodes, Dr. Katz thanked everyone attending the meeting for their hard work. He also thanked the NIH Foundation for its support in facilitating this effort.

Dr. Katz noted that the meeting has given the participants a good chance to openly discuss, from many perspectives, all of the opportunities and issues associated with the OA Initiative. Although there are still many knowledge gaps, the challenge is to move forward and develop an Initiative that collects information in the most stringent way possible, drives the continued scientific development of this area, and is implemented within a framework that is capable of adapting to future changes.

It is now the NIH's responsibility to put together a draft OA Initiative prospectus and post it on the Web site for comment. The NIAMS and NIA are clearly committed to contributing, financially and otherwise, to this endeavor, and the Initiative is approaching the point at which budgets will be estimated and commitments from other organizations sought. Following the finalization of the prospectus, a draft RFP will be developed and posted on the Web site, the input received will be assessed, and the final RFP will be released by fall 2000.

Dr. Katz then opened the floor for questions.

General Discussion

The following key points were raised during the final discussion:

Prognostic Biomarkers

The differences between prognostic markers should be considered in the study design, and they will affect how the sample size is calculated. There are essentially two types of prognostic biomarkers. One is a marker measured at baseline that predicts a change in a person over time, which is valuable from a drug development perspective because it offers the opportunity to identify people at high risk of progression. The second might better be called a surrogate correlate of progression. In this case, disease is measured at baseline and followup and the marker is measured at baseline and followup. A strong correlation between the change in the marker and the change in the chosen gold standard measure of disease allows a potential therapeutic option to be tested against the marker to determine if the treatment affects disease. An additional strategy to consider is to check the correlation between the changes in the marker and the outcome after 3 or 4 years but, in the interim, to look at the changes that occur in the study marker over a shorter period of time, say 6 months or 1 year.
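The distinction between the two marker types can be illustrated with a minimal Python sketch. It is not an analysis plan from the meeting: the function names, the use of a simple Pearson correlation, and the four-subject data in the usage note are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def baseline_predictor(marker_base, outcome_base, outcome_follow):
    """Type 1: does the BASELINE marker level track later change in disease?"""
    outcome_change = [f - b for b, f in zip(outcome_base, outcome_follow)]
    return pearson(marker_base, outcome_change)


def change_correlate(marker_base, marker_follow, outcome_base, outcome_follow):
    """Type 2: does CHANGE in the marker track change in disease?"""
    marker_change = [f - b for b, f in zip(marker_base, marker_follow)]
    outcome_change = [f - b for b, f in zip(outcome_base, outcome_follow)]
    return pearson(marker_change, outcome_change)
```

As a usage illustration with invented values, calling `baseline_predictor([1, 2, 3, 4], [5, 5, 5, 5], [4.8, 4.6, 4.4, 4.2])` relates each subject's baseline marker level to the subsequent loss in a hypothetical joint-space measure. The first analysis needs the marker only once, whereas the second needs it at every visit, which is one reason the two types lead to different designs and sample sizes.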

Retention of Study Subjects

Subject retention over the period of a long trial is an important issue that should be addressed early in the planning process. There are lessons to be learned from long-term projects, such as the Framingham study and the Women's Health Initiative, and from the clinical trial experiences of the Initiative's private-sector partners, that should be applied to the Initiative's subject recruitment and retention strategy. The public will need to be educated about the importance of the Initiative, why it is critical that those who enroll stay in the trial, and the benefits to be gained from doing so. Serious consideration also has to be given to how frequently the subjects are to be seen and what level of burden can be expected of them at each visit, particularly when it can be expected that, now and in the future, researchers will want to add an infinite number of additional procedures to the study. For example, two measures that have yet to be discussed that would contribute valuable information in a relatively cost-effective way are economic measures and the ongoing collection of seminal comorbidity data.

It was noted that the imaging group had discussed the issue of subject retention at some length, and one of the benefits of the structured tier approach is that the core group will not have to go through all of the procedures. Effective training of study coordinators and physicians at the individual study sites is also crucial to subject retention, and the demonstrated ability to retain participants should be factored into the selection of study sites.

Disease-Modifying Drugs

There is a "Catch-22" dimension to the Initiative in that even if a marker can be validated as being prognostic it cannot be validated with any certainty as a surrogate outcome measure in a disease-modifying clinical trial until a disease-modifying drug becomes available.

Study Cohorts

For the population-based sample that includes patients with OA, there are a number of ways to identify the OA patient cohort, e.g., by physically characterizing or screening the population, by taking individuals who indicate that their health care provider has told them that they have OA, or by ensuring that a goodly number of people in the cohort report chronic knee pain. The last category should generate many people who may not yet have OA on radiograph but are at very high risk of developing OA. However, efforts should be made to screen out persons with knee pain caused by periarticular factors. The ultimate goal is to ensure that early and moderate OA are well represented in the cohort.

The ability to draw generalizable conclusions from a population-based study requires the inclusion and monitoring of individuals without disease. One option is to oversample groups at high risk, sample persons who are not at high risk, and then compute what a generalizable sample would be. This would result in an enriched sample that contains all the information that is really needed. Regardless of what technique is used, it is important to identify the noise in the system and answer the natural history question. One reason to start with cohort A (patients) is that people already enrolled in ongoing longitudinal studies would presumably provide information sooner than the people who will be recruited for cohort B (nonpatients). If the results of the Initiative are to be generalizable to people with OA who access the health care system, such people must be included in the study.
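The idea of oversampling high-risk groups and then computing what a generalizable sample would be can be sketched as a stratum-weighted estimate. This is a minimal illustration only: the function name, the two strata, and every number in the usage note are hypothetical, and a real study would use proper survey-weighting methods.

```python
def weighted_rate(strata):
    """Population-level event rate from a stratified, oversampled cohort.

    Each stratum is (population_share, events, sampled), where
    population_share is the stratum's fraction of the target population
    (assumed known from outside the study) and events/sampled is the
    rate observed in the study sample for that stratum.
    """
    total_share = sum(share for share, _, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    for _, events, sampled in strata:
        if sampled <= 0 or not 0 <= events <= sampled:
            raise ValueError("each stratum needs 0 <= events <= sampled, sampled > 0")
    # Weight each stratum's observed rate by its share of the population,
    # not by its share of the (deliberately enriched) sample.
    return sum(share * events / sampled for share, events, sampled in strata)


# Hypothetical cohort: high-risk persons are 20% of the population
# but 75% of the sample (300 of 400 subjects).
hypothetical_strata = [
    (0.20, 60, 300),  # high risk: 60 progressors among 300 sampled
    (0.80, 5, 100),   # not high risk: 5 progressors among 100 sampled
]
crude = (60 + 5) / (300 + 100)
adjusted = weighted_rate(hypothetical_strata)
```

With these invented numbers the crude rate in the enriched sample is about twice the weighted rate, which shows why the disease-free and low-risk strata must be sampled and followed if results are to be generalized.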

Data Collection Intervals

There is a difference of opinion among the three groups with respect to how often the patients should be seen. The epidemiology group would like patients to be seen at 5 and 10 years, the biomarkers group talked about seeing them every quarter, and the imaging group could not decide, perhaps because of the mixing of A- and B-type studies.

Studies targeting markers of long-term prognosis require long-term followup, and over the course of followup there could be large variations in the markers that are irrelevant to their ability to serve as a prognostic indicator. It therefore makes sense to not see patients very frequently but ensure that, when patients are seen, good outcome assessments are made. If the goal is to use the marker as an indicator of disease pathology, the pathology may change on a much briefer time scale than overall disease progression, the pathology may not be uniform, or various pathologies may be undergoing different changes within the joint. In order to capture these factors, the biomarkers must be measured at least as frequently as the variations being sought.

The ability to pick likely winners from potential candidate drugs requires a biomarker that indicates disease pathology, the investigation of which represents a third group of studies that fall within an intermediate area between prognostic indicators and endpoint surrogates. The frequency with which patients are seen and the degree of detail the investigation pursues need to be fitted to this group. For example, the group may undergo more frequent biomarker measurements but less frequent outcome measurement because the focus is on identifying the shorter-term biomarker variation. Accordingly, it is important that study proposals be clear with respect to which of these areas they are targeting and which types of assessment should be applied.

If the goal is to integrate an imaging marker with a biochemical marker, it is desirable to assess them at the same time. There should be at least three measures and, to fit slopes for rates of change, assessments should be set at certain intervals.
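Fitting a slope to serial measurements can be sketched as an ordinary least-squares line through each subject's assessments. The sketch below is illustrative only: the function name, the choice of ordinary least squares, and the example values are assumptions, not a method specified at the meeting. With only two assessments a line passes through both points exactly, so a third is the minimum that gives any indication of how well a straight line describes the change.

```python
def fit_slope(times, values):
    """Least-squares rate of change of `values` per unit of `times`.

    Requires at least three assessments, in line with the minimum
    suggested in the discussion.
    """
    n = len(times)
    if n != len(values):
        raise ValueError("times and values must have the same length")
    if n < 3:
        raise ValueError("at least three assessments are required")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("assessments must occur at different times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical subject: a joint-space measure at years 0, 2, and 4.
slope_per_year = fit_slope([0, 2, 4], [4.0, 3.6, 3.2])
```

Taking the imaging and biochemical assessments at the same visits means that the two slopes for a subject are estimated over the same time points and can be compared directly.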

OA Progression

The issues that have been discussed suggest that OA progression is not a linear phenomenon but a phasic process. If so, this would have an impact on the scheduling of future treatments (if and when there is a treatment), and specific ancillary studies or substudies should be conducted to explore the contribution of biomarkers to the phasic process.

Sponsorship and Funding

The people who are considering sponsoring the Initiative need to know, from a utility perspective, exactly what they will have access to. It would not be useful to receive information on a new marker and its potential applicability to clinical trials if in the end there is no public access to the marker. Companies participating in the Initiative need to know that a mechanism is in place that will allow access to the required expertise. As consideration continues to be given to the Initiative's legal framework, NIH would like the companies to provide additional feedback regarding their needs.

The OA Initiative will fund initial data collection, imaging data, sample collection, x-ray data, and clinical data on the selected cohort. Any further work would have to be done outside of the consortium because of IP issues.

Intellectual Property (IP) Issues

Although NIH and private sector firms have been involved in discussions of consortium-related IP concerns, there is much that would appear to fall outside what could be perceived as obvious IP issues. Although there are no IP concerns associated with basic clinical information and core imaging data, it may take some time for other IP issues to come to light. It would be desirable to structure the Initiative in a way that allows for an initial commitment that is contingent on the resolution of key IP issues.

NIH does not hold any IP, but expects that IP will be developed as a consequence of the Initiative. Although dialogue on the issues surrounding this area should continue, the emphasis should be on developing a resource that will favor the development of new IP. The Initiative cannot develop mechanisms that specifically limit the way IP is currently licensed and delivered.

Administrative Structure

It was suggested that a working group be assembled to resolve some of the issues identified during the meeting regarding the administrative structure of the OA Initiative. The administration group will work with the NIH to prepare a document that outlines the opportunities involved and the proposed structure of the Initiative that will serve as a basis for further discussion.

Adjournment

Gayle Lester, Osteoarthritis Initiative Project Coordinator, NIAMS, thanked all of the participants for an excellent meeting, noting that the discussions and interactions that had occurred over the course of the past 2 days had been both enlightening and challenging.