The Prostate Cancer Dilemma: Selecting Patients...
Current prostate cancer prognostic models are based on pre-treatment prostate specific antigen (PSA) levels, biopsy Gleason score, and clinical staging but in practice are inadequate to accurately predict disease progression. Hence, we sought to develop a molecular panel for prostate cancer progression by reasoning that molecular profiles might further improve current clinical models.
We analyzed a Swedish Watchful Waiting cohort with up to 30 years of clinical follow up using a novel method for gene expression profiling. This cDNA-mediated annealing, selection, ligation, and extension (DASL) method enabled the use of formalin-fixed paraffin-embedded transurethral resection of prostate (TURP) samples taken at the time of the initial diagnosis. We determined the expression profiles of 6100 genes for 281 men divided into two extreme groups: men who died of prostate cancer (lethals) and men who survived more than 10 years without metastases (indolents). Several statistical and machine learning models using clinical and molecular features were evaluated for their ability to distinguish lethal from indolent cases.
Surprisingly, none of the predictive models using molecular profiles significantly improved over models using clinical variables only. Additional computational analysis confirmed that molecular heterogeneity within both the lethal and indolent classes is widespread in prostate cancer as compared to other types of tumors.
Determination of the molecularly dominant tumor nodule may be limited by sampling at the time of initial diagnosis; alternatively, the dominant nodule may not be present at initial diagnosis or may emerge only as the disease progresses. Any of these possibilities makes the development of molecular biomarkers for prostate cancer progression challenging.
The paramount clinical dilemma in prostate cancer management is how to treat the man with clinically localized disease because the natural history is favorable overall and the benefit from radical treatment modest. Numerous studies have attempted to address this issue but the lack of data with long-term clinical outcomes precludes a definitive assessment. This problem is real and mounting. In 2008, it was estimated that 186,320 new cases of prostate cancer were diagnosed in the United States with the vast majority being clinically localized. The majority of these men are predicted to survive despite prostate cancer for 5 or 10 years regardless of the type of treatment they initially receive. This would suggest that expectant management for localized prostate cancer might be an important modality to deal with this common malignancy. This approach would potentially gain more widespread acceptance if we could sort out those men that were at the greatest risk of disease progression at time of initial diagnosis.
We reasoned that by performing high-throughput expression profiling of transurethral resection of the prostate (TURP) samples from a large cohort of men managed by Watchful Waiting, we would identify a molecular profile predictive of prostate cancer disease progression. We further reasoned that employing a combination of novel technology and a well-defined clinical cohort should yield a strong lethal prostate cancer signature.
Limitations of prior prostate cancer expression profiling studies have included small sample size, restriction of populations to surgical cohorts, short follow up time, and the use of surrogate endpoints such as PSA biochemical recurrence to define disease progression. To overcome these limitations, we designed a study using prostate cancer samples prospectively registered as part of a Watchful Waiting protocol from two regions in Sweden. Up to 30 years of clinical follow up information was available on these men. All of the cases were detected incidentally in a pre-Prostate Specific Antigen (PSA) screening era.
The present study is nested in a cohort of men with localized prostate cancer diagnosed in the Örebro (1977 to 1994) and South East (1987 to 1999) Health Care Regions of Sweden. Eligible patients were identified through population-based prostate cancer quality databases maintained in these regions (described in Johansson et al., Aus et al., and Andren et al. [1, 8, 9]) and included men who were diagnosed with incidental prostate cancer through transurethral resection of the prostate (TURP) or adenoma enucleation, i.e. stage T1a-b tumors. In accordance with standard treatment protocols at the time, patients with early stage/localized prostate cancer were followed expectantly ("watchful waiting"). No PSA screening programs were in place at the time.
The study cohort was followed for cancer-specific and all cause mortality until March 1, 2006 through record linkages to the essentially complete Swedish Death Register, which provided date of death or migration. Information on causes of death was obtained through a complete review of medical records by a study end-point committee. Deaths were classified as cancer-specific when prostate cancer was the primary cause of death.
Since our overarching aim was to identify signatures predicting a lethal or an indolent course of prostate cancer, we maximized efficiency by devising a study design that included men who either died from prostate cancer during follow up (lethal prostate cancer cases) or who survived at least 10 years after their diagnosis (men with indolent prostate cancer). We thus excluded men with non-informative outcomes, namely those who died from other causes within 10 years of their prostate cancer diagnosis or had been followed for less than 10 years with no disease progression (n = 595). All men with samples in which high-density tumor regions (defined as more than 90% tumor cells) could be identified were included (n = 381). We excluded from the indolent group men who had received any type of androgen deprivation treatment during follow up (n = 79), since some of these had potentially lethal disease that was deferred by therapy. Twenty-one men were further excluded due to poor sample quality. In total, 281 men (116 with indolent disease and 165 with lethal prostate cancer) were included in the analyses (see Figure 1). The study design was approved by the Ethical Review Boards in Örebro and Linköping. The clinical and pathologic demographics of these 281 men with prostate cancer are presented in Additional File 1, Table S1.
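The exclusion counts reported above can be tallied as a quick consistency check. The short sketch below uses only the numbers quoted in the text; the variable names are illustrative and are not taken from the study.

```python
# Counts quoted in the text; the variable names are illustrative only.
with_high_density_tumor = 381        # men with identifiable high-density tumor regions
excluded_androgen_deprivation = 79   # removed from the indolent group
excluded_poor_quality = 21           # removed for poor sample quality

analyzed = (with_high_density_tumor
            - excluded_androgen_deprivation
            - excluded_poor_quality)

indolent, lethal = 116, 165

# The final cohort should equal the sum of the two outcome groups.
assert analyzed == indolent + lethal
print(analyzed)  # prints 281
```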
Figure 1. Study design. From 1256 men in a Watchful Waiting cohort, we selected the "extreme" cases: men who died of prostate cancer or men who lived more than 10 years without signs of progression. We also filtered out some patients based on tumor tissue availability, sample quality, or because they were treated. Finally, we randomly divided the patients into Learning and Validation sets, ensuring that similar proportions of lethals and indolents were present in the two groups.
An array of 6100 genes (6K DASL) was designed for the discovery of molecular signatures relevant to prostate cancer by using four complementary DNA (cDNA)-mediated annealing, selection, ligation, and extension (DASL) assay panels (DAPs) [11, 12]. Details of this procedure can be found in Additional File 1 and also at Gene Expression Omnibus (GEO) with platform accession number GPL5474. This data set is also available at GEO with accession number GSE16560.
In order to identify and evaluate a predictive molecular signature, six supervised classification models were implemented: k-Nearest Neighbor (kNN), Nearest Template Prediction (NTP), Diagonal Linear Discriminant Analysis (DLDA), Support Vector Machine (SVM), Neural Network (NN), and Logistic Regression (LR). Their performances were evaluated and compared through a split-sample validation procedure. Specifically, the entire data set was randomly split into a Learning set and a Validation set, with approximately equal proportions of men with lethal and indolent prostate cancer in each (Figure 1). The Learning set was used to create the models and select the best classifier, whose performance was then evaluated on the Validation set by means of the Area Under the Receiver Operating Characteristic Curve (AUC). This procedure enables an unbiased estimate of classifier performance, since the evaluation is performed on an independent data set.
To optimize the classifiers and select the best model, we adopted an iterative cross-validation procedure within the Learning set. The results of this procedure identify the best model, which is then used to build a classifier on the whole Learning set; that classifier is finally evaluated on the Validation set. Specifically, a stratified 10-fold cross-validation split the Learning set into 10 disjoint partitions, test_i (i = 1, ..., 10), each with approximately equal proportions of lethal and indolent cases. For a given partition test_i, classifiers were created using the cases not in that partition, i.e. training_i, and evaluated on test_i. This was done for each of the 10 partitions, and the results were averaged across them. Moreover, to avoid potential biases in the selection of the 10 partitions, the entire procedure was repeated 100 times, resulting in 1000 different partitions. The best model was then identified by comparing the results obtained over the 100 iterations.
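The validation scheme described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, there is a single made-up feature, and a toy nearest-centroid scorer (centroid_scores, a hypothetical helper) stands in for the six classifiers named in the text. The fold construction and AUC calculation follow the standard definitions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split indices into k disjoint folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def auc(scores, labels):
    """Probability that a lethal case (1) outscores an indolent case (0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_x, train_y, test_x):
    """Toy stand-in classifier (assumption): higher score = closer to the lethal mean."""
    def mean(values):
        return sum(values) / len(values)
    m1 = mean([x for x, y in zip(train_x, train_y) if y == 1])
    m0 = mean([x for x, y in zip(train_x, train_y) if y == 0])
    return [abs(x - m0) - abs(x - m1) for x in test_x]


def repeated_cv_auc(x, y, k=10, repeats=100, seed=0):
    """Repeat stratified k-fold CV; return the mean AUC and the number of test folds scored."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            scores = centroid_scores([x[i] for i in train_idx],
                                     [y[i] for i in train_idx],
                                     [x[i] for i in test_idx])
            aucs.append(auc(scores, [y[i] for i in test_idx]))
    return sum(aucs) / len(aucs), len(aucs)


# Synthetic stand-in data (NOT the study data): one feature, 1 = lethal, 0 = indolent.
rng = random.Random(42)
y = [1] * 160 + [0] * 120
x = [rng.gauss(1.0 if label == 1 else 0.0, 1.0) for label in y]

# Split-sample step: two stratified halves play the Learning and Validation roles.
learning_idx, validation_idx = stratified_folds(y, 2, rng)
learn_x = [x[i] for i in learning_idx]
learn_y = [y[i] for i in learning_idx]

# Model selection would compare this figure across candidate models.
mean_cv_auc, n_evaluations = repeated_cv_auc(learn_x, learn_y)

# Final step: fit on the whole Learning set, evaluate once on the Validation set.
val_scores = centroid_scores(learn_x, learn_y, [x[i] for i in validation_idx])
validation_auc = auc(val_scores, [y[i] for i in validation_idx])
```

With 10 folds and 100 repetitions, n_evaluations is 1000, matching the 1000 partitions mentioned in the text. The Validation set is touched only once, after model selection, which is what keeps the final AUC estimate independent of the selection step.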
We explored biological heterogeneity (and its converse, homogeneity) in this prostate cancer data set and compared our findings with other tumor tissues. We defined heterogeneity in terms of the molecular signature by comparing the "distance" between patients belonging to the same group (e.g. two lethal cases) with the distance between patients belonging to different groups (e.g. a lethal and an indolent case). In homogeneous tissues, biopsy sampling is not an issue, and patients belonging to the same group should be molecularly "closer" to each other than to those belonging to different groups. Heterogeneous tissues, on the other hand, should not show a clear separation, as the molecular profiles of samples in both groups intermingle (Figure 2b - left panel).
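One simple way to make this within-group versus between-group comparison concrete is sketched below. The text does not specify the distance measure, so Euclidean distance is an assumption here, as are the function names and the four toy two-gene profiles; none of these come from the study.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_within_between(profiles, groups):
    """Return (mean within-group distance, mean between-group distance) over all pairs."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = euclidean(profiles[i], profiles[j])
        if groups[i] == groups[j]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


groups = ["lethal", "lethal", "indolent", "indolent"]

# Toy "homogeneous" case: each group forms its own tight cluster.
separated = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
# Toy "heterogeneous" case: same points, but group members are intermingled.
intermingled = [[0.0, 0.0], [10.0, 1.0], [0.0, 1.0], [10.0, 0.0]]

w_sep, b_sep = mean_within_between(separated, groups)
w_mix, b_mix = mean_within_between(intermingled, groups)
```

In the separated example the within-group mean is smaller than the between-group mean, whereas in the intermingled example the ordering reverses. That reversal is the pattern the text associates with heterogeneous tissue.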