Section II
Methods
Hybridization of microarrays
Hybridization solutions from our previous U95A study had been
stored at -80°C since their initial use. These solutions were thawed
at 45°C, then microcentrifuged for 2 minutes to remove any insoluble
material from the mixture. The hybridization solutions were added to
U133A chips and allowed to hybridize for 16 hours at 45°C. At the end
of the incubation period, the hybridization solution was removed from
each U133A chip and refrozen. Subsequently, the hybridization solutions were
thawed again and hybridized to the U133B chips.
A non-stringent wash buffer (6X SSPE, 0.01% Tween 20) was added to each
chip cassette after the hybridization solution was removed, and the cassette
was allowed to equilibrate to room temperature. The microarray cassettes
were then placed on the fluidics station and the antibody amplification
protocol was performed. The arrays were washed at 25°C with the non-stringent
buffer, followed by a more stringent wash at 50°C with 100 mM MES, 0.1 M
NaCl, 0.01% Tween 20. The arrays were then stained with Streptavidin
Phycoerythrin (SAPE, Molecular Probes, Eugene, OR) for 10 minutes at
25°C. Following another non-stringent wash, the arrays were incubated
for 10 minutes at 25°C with an antibody solution (100 mM MES, 1 M [Na+],
0.05% Tween 20, 2 mg/ml BSA, 0.1 mg/ml goat IgG, and 3 mg/ml biotinylated
antibody). This solution was removed and the cassettes restained with
the SAPE solution.
Arrays were scanned on a laser confocal scanner (Agilent, Palo Alto,
CA) and then analyzed with Affymetrix Microarray Suite 5.0 (MAS 5.0).
Detection values (present, marginal, or absent) were determined using default
parameters, and signal values were scaled by global methods to a target
value of 500. After completing the scans, the arrays were visually inspected
for defects and Affymetrix internal controls were utilized to monitor
the success of hybridization, washing, and staining procedures.
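The global scaling step can be illustrated with a minimal Python sketch. The
text above states only that signal values were scaled by global methods to a
target value of 500; the use of a trimmed mean, the 2% trim fraction, and the
function name scale_to_target are assumptions made for illustration and are
not taken from this study.

```python
def scale_to_target(signals, target=500.0, trim=0.02):
    """Scale one array's signals so their trimmed mean equals `target`.

    Assumption: the trimmed mean and the 2% trim fraction are illustrative
    choices; the source only says "global methods" and a target of 500.
    """
    values = sorted(float(s) for s in signals)
    # Number of values to drop from each end before averaging.
    k = int(len(values) * trim)
    kept = values[k:len(values) - k] if k > 0 else values
    # One scaling factor is applied to every probe set on the array.
    factor = target / (sum(kept) / len(kept))
    return [float(s) * factor for s in signals], factor


if __name__ == "__main__":
    scaled, factor = scale_to_target([100, 200, 300, 400], trim=0.0)
    print(factor, scaled)
```

Because a single factor is applied per array, the relative ordering of signal
values within an array is unchanged; only the overall intensity level is made
comparable between arrays.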
Statistical methods
The chi-square metric and the k-NN and ANN supervised learning algorithms
have been previously described. For more information see http://www.stjuderesearch.org/data/ALL1/.
The SVM supervised learning algorithm that was used in this study is
available as part of the software package R v1.6.0.
To determine the performance of each model using ANN, a confidence threshold
was built for each diagnostic subtype utilizing a modification of the
method described by Khan et al.2 Models were built based on a decision
tree format where each level of the decision tree contains only two possible
distinctions – class and non-class (for example, T versus non-T).
At each level, using only samples in the training set, 3 ANN models were
built by 3-fold cross validation. The training set samples were then
shuffled and 3 additional ANN models were built. This model building
process was repeated a total of 100 times at each step of the decision
tree. An empirical probability distribution for the ANN output node
value was then built only for the subtype under study, for example, T-ALL at
the first step of the decision tree. Only nodal values greater than 0.5 for
each subtype were included. For each individual sample in the training
set, the 100 validation subtype node values were averaged and compared
to the threshold. An individual sample was assigned to the subtype under study
only when its average subtype nodal value was greater than the 95% confidence
threshold. For samples in the test set, subtype nodal values were averaged
across all models generated in the 3-fold cross validation. A test sample was
assigned to the class under study when its average subtype nodal value
was greater than the 95% confidence level defined on the training set.
A sample not assigned to the subtype progressed to the next level
of the decision tree, where the entire process was repeated.
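The procedure above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the code used in the study: the
ANN itself is omitted, all function names are hypothetical, and the 95%
confidence threshold is assumed here to be the 5th percentile (nearest rank)
of the retained output-node values, which the text does not specify.

```python
import math
import random
from statistics import mean


def repeated_three_fold(sample_ids, repeats=100, seed=0):
    """Yield (train_ids, validation_ids) for each fold of each shuffled repeat."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    for _ in range(repeats):
        rng.shuffle(ids)
        # Three disjoint folds; each sample is validated once per repeat.
        folds = [ids[i::3] for i in range(3)]
        for i in range(3):
            train = [s for j, fold in enumerate(folds) if j != i for s in fold]
            yield train, list(folds[i])


def empirical_threshold(node_values, floor=0.5, percentile=5.0):
    """Cut-off from the empirical distribution of node values above `floor`.

    Assumption: the 95% confidence threshold is taken as the 5th percentile
    of the retained values, so 95% of them lie at or above the cut-off.
    """
    kept = sorted(v for v in node_values if v > floor)
    if not kept:
        raise ValueError("no node values above the floor")
    rank = max(1, math.ceil(percentile / 100.0 * len(kept)))
    return kept[rank - 1]


def classify(sample_outputs, levels):
    """Walk the decision tree: assign the first subtype whose mean node
    value exceeds its threshold; return None if no level accepts the sample.

    sample_outputs: dict mapping subtype -> list of node values over models.
    levels: ordered list of (subtype, threshold) pairs.
    """
    for subtype, threshold in levels:
        if mean(sample_outputs[subtype]) > threshold:
            return subtype
    return None


if __name__ == "__main__":
    levels = [("T-ALL", 0.6), ("subtype_B", 0.7)]  # hypothetical thresholds
    outputs = {"T-ALL": [0.2, 0.3], "subtype_B": [0.9, 0.95]}
    print(classify(outputs, levels))
```

With 100 repeats of 3-fold cross validation, 300 models are built per level
and each training sample falls in a validation fold exactly 100 times, which
matches the 100 validation node values averaged per sample.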