
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART C: APPLICATIONS AND REVIEWS

Therapeutic Drug Monitoring of Kidney Transplant Recipients using Profiled Support Vector Machines

Gustavo Camps-Valls, Member, IEEE, Emilio Soria-Olivas, Juan J. Pérez-Ruixo, Fernando Pérez-Cruz, Member, IEEE, Antonio Artés-Rodríguez, Member, IEEE, N. Víctor Jiménez-Torres

Abstract: This work proposes a twofold approach for therapeutic drug monitoring (TDM) of kidney recipients using Support Vector Machines (SVM), for both predicting and detecting Cyclosporine A (CyA) blood concentrations. The final goal is to build useful, robust, and ultimately understandable models for individualising the dosage of CyA. We compare SVM with several neural network models, such as the multilayer perceptron (MLP), the Elman recurrent network, FIR/IIR networks, and Neural Network ARMAX approaches. In addition, we present a profile-dependent SVM (PD-SVM), which incorporates a priori knowledge in both tasks. Models are compared numerically, statistically, and in the presence of additive noise. Data from fifty-seven renal allograft recipients were used to develop the models. Patients followed a standard triple therapy and CyA trough concentration was the dependent variable. The best results for the CyA blood concentration prediction were obtained using the PD-SVM (mean error of 0.36 ng/mL and root-mean-square error of 52.01 ng/mL in the validation set), which also appeared to be more robust in the presence of additive noise. The proposed PD-SVM improved on the results of the standard SVM and MLP, and the improvement was especially significant (both numerically and statistically) in the one-against-all scheme. Finally, some clinical conclusions were obtained from sensitivity rankings of the models and the distribution of support vectors. We conclude that the PD-SVM approach produces more accurate and robust models than neural networks. Finally, a software tool for aiding medical decision-making including the prediction models is presented.

Index Terms: Cyclosporine, therapeutic drug monitoring, neural networks, support vector machines, sensitivity analysis.

Manuscript received December 2004.
G. Camps-Valls and E. Soria-Olivas are with Grup de Processament Digital de Senyals, GPDS. Dept. Enginyeria Electrònica. Escola Tècnica Superior d'Enginyeria. Universitat de València. C/ Dr. Moliner, 50. 46100 Burjassot (València) Spain. E-mail: [email protected].
F. Pérez-Cruz and A. Artés-Rodríguez are with Departamento de Teoría de la Señal y Comunicaciones, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain.
J. J. Pérez-Ruixo was with the Pharmacy Service at the Dr. Peset University Hospital, València (Spain), when this paper was prepared. Currently, he is with the Advanced PK/PD Modeling & Simulation. Global Clinical Pharmacokinetics and Clinical Pharmacodynamics Division. Johnson & Johnson Pharmaceutical Research & Development, a Division of Janssen Pharmaceutica N.V. Beerse (Belgium). E-mail: [email protected].
Prof. N. V. Jiménez-Torres is with the Pharmacy and Pharmaceutical Technology Department, Universitat de València (Spain) and with the Pharmacy Service at the Dr. Peset University Hospital, València (Spain). E-mail:

I. INTRODUCTION

Cyclosporine A (CyA) is still the cornerstone of immunosuppression in renal transplant recipients. This immunosuppressive drug shortens average hospital stays after kidney transplantation. At present, despite the appearance of new formulations, 90% of therapeutic guidelines are based on CyA and, consequently, costs continue to rise year after year (see footnote 1). Recently, important advances in dose formulation, therapeutic drug monitoring (TDM) and guidelines, and the emerging role of CyA-based combined therapies have resulted in a substantial improvement in clinical outcomes in renal transplant recipients [1]. Nevertheless, CyA is generally considered to be a critical dose drug. Underdosing may result in graft loss, whereas overdosing causes kidney damage and increases opportunistic infections, systolic and diastolic pressure, and cholesterol. Moreover, the pharmacokinetic behaviour of CyA presents a substantial inter- and intra-individual variability, which appears to be particularly evident in the earlier post-transplantation period, when the risk and clinical consequences of acute rejection are higher than in stable renal patients [2]. Several factors such as clinical drug interactions and patient compliance can also significantly alter blood CyA concentrations and, thus, intensive TDM of CyA becomes necessary; however, it influences the patient's quality of life and the cost of the care.

Footnote 1: The total cost in 1997 of CyA in Spain was 52 million and 12 billion [...]

Since the trough blood concentration has traditionally been used to monitor CyA therapy, mathematical models that are capable of predicting the future concentration of CyA and adjusting the optimal dosage become necessary. Population pharmacokinetic models and Bayesian forecasting have been used to predict CyA blood concentrations, but their performance was not optimal [3], [4]. These models predict plasma drug concentrations based on theoretical models of drug distribution and elimination but they often fail when the underlying principles are not sufficiently understood or known to be encoded into a set of relationships [5]. In fact, despite convincing results in many areas, few attempts have been made to use neural networks to predict drug behaviour [6]–[8]. A different approach to TDM of kidney recipients has recently been presented [9], in which the goal is the detection of subtherapeutic and toxic levels. This could aid physicians by providing "alarm signals" for risks that threaten patient evolution. Nevertheless, poor results were achieved regarding the detection of subtherapeutic levels, which could lead to dramatic kidney rejection processes.

All these limitations, in both prediction and classification methods, have led us to try to solve the problem of TDM using modern neural networks and Support Vector Machines (SVM), which have shown good results in other fields. Therefore, two main objectives can be distinguished in this work:

• Prediction of CyA trough levels. This approach tries to predict values of CyA blood concentration from the previous values by following a time-series methodology. This same approach was followed in previous communications [10]–[12] but only short time series and a reduced population were available. In the present paper, we extend these works by developing more neural and kernel predictors using a larger database. We compare SVM with neural networks that are extensively used in other fields, such as the multilayer perceptron (MLP), Elman recurrent network, FIR/IIR networks, and Neural Network ARMAX (NNARMAX) approaches. Comparison is carried out in terms of accuracy by evaluating classical bias-variance trade-off measurements, and in terms of robustness by analysing performance when noise is introduced at input.

• Prediction of CyA levels class. Due to the high inter- and intra-subject variability, non-uniform sampling, and non-stationarity of the time series, the difficulty of the prediction task is well known. In fact, intensive TDM tries to keep CyA blood levels in the therapeutic range (usually established in the range 150-400 ng/mL) by making adjustments in the patient's drug regimen. Therefore, an alternative approach to the time-series prediction methodology consists of identifying future toxic (>400 ng/mL) and subtherapeutic levels (<150 ng/mL). Two different schemes are used for that purpose: one-against-all classifiers and multi-classifiers. We compare performance of SVM and MLP in both schemes with recognition rates and receiver operating characteristic (ROC) curves.

In addition, we present the so-called Profile-Dependent SVM (PD-SVM) to incorporate a priori knowledge in both the prediction and the classification models. The PD-SVM gives different confidence weights to different parts of the training data to focus the training on the a priori most important regions: the most recent samples for time series prediction, and samples near the decision thresholds for levels classification.

Finally, we perform sensitivity analyses on the MLP and inspect the distribution of the support vectors in order to gain knowledge about the problem. This paper constitutes the natural extension of the works [12], [13], in which only a limited number of prediction approaches was carried out and no attention was paid to the structure of the "black-box" models. In this paper, a common methodology consisting of three basic steps is followed to develop a Decision Support System (DSS) for TDM based on modern techniques. A general learning scheme for our proposal is illustrated in Fig. 1. The basic characteristics of a general DSS are ensured in our case study as follows: (a) the system is robust, as it incorporates models that show good performance in noisy environments; (b) the system is accurate and unbiased, as models show good bias-variance trade-offs; and (c) the system incorporates a priori knowledge (profiled model) and allows the user to extract rules for dosage adjustment (sensitivity analysis). Rather than providing the clinician with dosage prediction, the system gives an estimation of future CyA blood concentration, alarm signals for subtherapeutic or toxic levels, feature ranking, and follow-up statistical information.

Fig. 1. Learning scheme considered in this paper. Three main blocks are developed to obtain a validated decision support system (DSS). In the pre-processing stage, data are previously "whitened" through Principal Component Analysis (PCA). Afterwards, two approaches are considered for modeling: time series prediction of the CyA blood concentration and identification of subtherapeutic and toxic levels. After developing the models, we follow a methodology for extracting valid, novel and potentially useful knowledge from the model by using sensitivity analysis and by inspecting support vectors. In addition, we compare models through accuracy and robustness tests.

The rest of the paper is organized as follows. Data collection is introduced in Section II. Section III presents a review of the
predictive techniques used with special emphasis on SVM. The results are presented in Section IV. A discussion is provided in Section V along with some concluding remarks and a proposal for further work.

II. PATIENTS AND DATA COLLECTION

Sixty-seven renal allograft recipients treated in the Nephrology Service of the Dr. Peset University Hospital in the city of València (Spain) were initially included in this study. The exclusion criteria took into account patients with severe disease of other vital organs, active neoplasia or metastasis risk, active infections, HIV infection, presence of urinary or vascular abnormalities, and patients older than 70. In addition, patients who did not fulfil the prescribed posology or who received metabolic inducers or inhibitors were excluded because they modify the pharmacokinetic profile of CyA.

Patients received a standard immunosuppressive regimen (triple therapy basis) with a microemulsion formulation of CyA (Sandimmun Neoral), mycophenolate mofetil (2 g/d), and prednisone (0.5-1 mg/kg/d). The initial oral dose of CyA (5 mg/kg b.i.d) was reduced according to the measured CyA blood concentration and the desired target range (150-300 ng/mL) [14]. Steady state blood samples were withdrawn 12-14 hours after dose administration. CyA blood levels were measured by a specific monoclonal fluorescence polarisation immunoassay (Abbott, TDx), with inter- and intra-assay variation coefficients of less than 7.5% [15].

The study collected many potentially relevant variables of all monitored patients:
• Anthropometric factors: weight (Kg), age (years), and [...]
• Biochemical factors: urea (mg/dL), creatinine (mg/dL), creatinine clearance (mL/min), total protein and albumin (g/dL), bilirubin (mg/dL), cholesterol and triglycerides [...]
• Hepatic enzymes: aspartate aminotransferase (IU/L), [...] transpeptidase (IU/L), and alkaline phosphatase (IU/L).
• Hematological factors: hematocrit (%), haemoglobin (g/dL), and leukocytes (U/mm3).
• Clinical factors: systolic and diastolic arterial pressure [...]

At this point, two analyses were performed:

1) We detected 10 patients who contained more than 10% of statistical outliers (more than two standard deviations from the mean) in the original data distribution of many variables. In fact, when considering all available descriptors, 23% of samples were outliers. These patients were withdrawn for developing the models and hence, a final cohort of fifty-seven patients was considered.

2) At the same time, we analyzed the available data using classical statistical analyses (correlation analysis, normality plots, higher-order statistics), Principal Component Analysis (PCA), Self-Organising Maps (SOM) [16], Classification and Regression Trees (CART) [17], and Multivariate Adaptive Regression Splines (MARS) [18] in order to get a preliminary subset of relevant features [19]. The obtained results showed a high degree of concordance with NONMEM and linear modeling [20], [21]. From this study, we obtained a reduced subset of eleven relevant patient factors that, along with dosage, CyA blood concentration, and post-transplantation days, were used to build the models. Some basic population statistics of these factors are shown in Table I. It is worth noting that almost all variables have significant non-normal distributions (z-values for skewness and kurtosis greater than ±3.08, p < 0.001). This problem was addressed by applying suitable transformations to raw data.

TABLE I
CHARACTERISTICS OF PATIENTS IN THE STUDY FOR THE TRAINING AND THE VALIDATION (IN BRACKETS) SETS. RESULTS ARE PRESENTED AS MEAN ± STANDARD DEVIATION (SD) AND THE RANGE, EXCEPT FOR THE GENDER WHICH IS GIVEN AS THE NUMBER OF SUBJECTS. THE SHAPE DESCRIPTORS (KURTOSIS AND SKEWNESS) ARE CALCULATED OVER THE WHOLE POPULATION.

Variable                        Mean ± SD   Range                              Skewness (z)    Kurtosis (z)
Gender                          21 (14) male and 18 (4) female
[...]                           [...]       22.79 - 68.21 (28.82 - 69.43)      -0.26 (-3.49)   2.29 (15.42)
[...]                           [...]       1 - 434 (2 - 426)                  1.08 (14.58)    3.07 (20.74)
CyA concentration (ng/mL)       [...]       55.69 - 615.98 (45.79 - 664.17)    0.95 (12.80)    4.33 (29.21)
Daily dosage/Weight (mg/Kg/d)   [...]       0.71 - 12.04 (1.88 - 9.45)         0.75 (10.09)    2.90 (19.54)
Urea (mg/dL)                    [...]       17 - 332 (22 - 279)                2.25 (30.42)    8.44 (56.95)
[...]                           [...]       0.70 - 10.60 (1 - 10.20)           3.27 (44.14)    15.38 (103.74)
Creatinine clearance (ml/min)   [...]       7.77 - 117.42 (7.77 - 107.03)      2.56 (17.28)    [...]
Alkaline phosphatase (IU/L)     [...]       62 - 683 (65 - 724)                1.96 (26.49)    8.25 (55.65)
[...]                           [...]       0.22 - 6.10 (0.30 - 2)             3.66 (49.37)    24.33 (164.12)
Hematocrit, HTO (%)             [...]       22 - 55 (22 - 52)                  0.32 (4.26)     2.61 (17.58)

The z values are derived by dividing the statistics by the corresponding standard errors of $\sqrt{6/N}$ (skewness) and $\sqrt{24/N}$ (kurtosis).
Significant at the 0.001 level.

Two-thirds of the patients were used to train the models and the rest were used for their validation using the hold-out method. The population was randomly assigned to two groups: 39 patients (665 patterns) were used for training the models and 18 patients (427 patterns) for their validation. This process was repeated until three basic conditions were met: (1) variations in the mean and variance of each variable between training and validation sets should be lower than 15%; (2) each subset should contain an approximately similar proportion of male and female patients; and (3) patients who were monitored for a longer period of time were assigned to the validation set. When the three conditions could not be fulfilled simultaneously, condition (3) was adopted and (2) relaxed. This three-step randomising methodology ensures balanced datasets to avoid population differences that could bias the models [23].

[Figure: CyA concentration (ng/mL) versus post-transplantation days (d). Annotations: early post-transplantation days, high inter-subject variability; stationary state after the third post-operative month; therapeutic target range 150-300 ng/mL.]
Fig. 2. Distribution of blood concentration (ng/mL) of CyA. The solid line represents a specific patient profile. Dashed lines indicate the desired (therapeutic) target range (150–300 ng/mL).

The health care team in the hospital decides daily the next dose to administer by assessing the patients' factors and their evolution. This protocol, nevertheless, produces three undesired features in the time series:

1) High variability. High inter-subject variability is found (coefficient of variation, CV = 31%), especially remarkable in the early post-transplantation days (PTD) (CV = 43%). In this period, it becomes necessary to raise or lower dosage while closely monitoring the patient's concentration, as shown in Fig. 2.

2) Non-stationarity. Therapy tries to keep CyA levels in the target range in order to avoid nephrotoxicity or transplant rejection. This provokes the presence of non-stationary processes in the time series.

3) Non-uniform sampling. The individualisation procedure directly affects the sampling of the time series.

To deal with these problems, some issues should be taken into account. For example, non-stationarity was treated efficiently with dynamic neural networks. The problem of non-uniform sampling was initially addressed using the classical strategy of interpolation and resampling, but this produced overoptimistic results. Therefore, we decided to alleviate the problem by incorporating post-transplantation days into the model globally, i.e. no time-series differences were thus undertaken. Finally, some modifications on the training algorithms of the models were carried out, as will be shown in the next section.

III. METHODS AND EXPERIMENTAL SETUP

A. Static and dynamic neural networks

A common approach to time series prediction is the AutoRegressive Moving Average (ARMA) model. However, ARMA models are not suitable for our problem due to the non-linearity, non-uniform sampling, and non-stationarity of the time series. Hence, many researchers have turned to the use of non-linear models, such as neural networks, in which few assumptions must be made. The multilayer perceptron (MLP) is the most commonly used neural network, which is composed of a layered arrangement of artificial neurons in which each neuron of a given layer feeds all the neurons of the next layer. This model forms a complex mapping from the n-dimensional input to the binary output, $\psi : \mathbb{R}^n \to \{0, 1\}$. For regression purposes, the MLP mapping has the form $\psi : \mathbb{R}^n \to \mathbb{R}$. However, it is a static mapping; there is no internal dynamics [24], [25]. This problem can be easily addressed by including an array of unit-delay elements, called a tapped-delay line model, to make use of spatially-converted temporal patterns. This approximation was previously followed in [13], [26].

Neural Network ARMAX modeling is intimately related to the latter approach. In a NNARMAX model [27], given a pair of input-output discrete time series, a multilayer perceptron (MLP) is used to perform a mapping between them, in which past inputs, past outputs and past residuals can be fed into the input layer. Selecting a model structure is much more difficult in the nonlinear case than in the linear case (classical ARMA modeling). Not only is it necessary to choose a set of regressors but also a network architecture is required. Therefore, several regression structures are available. In this work, we use the ones that best fit our interests: NNARMAX2 (the regression vector is formed by past inputs, past outputs and past residuals), NNSSIF (the regressor is in the form of state space innovations), and NNOE (the regression vector is formed by past inputs and estimations). All these models are extensively described in [27]. The use of NNARMAX models in control applications and nonlinear system identification has expanded in the past years. Its main advantage is the use of a non-linear regressor (usually an MLP) working on a fully tailored "state" vector [28]. This makes the model especially well suited to our problem because we can design the endowed input state vector to accommodate our non-stationary dynamics. This can be done, for example, by adding more "memory" in the form of error terms if we observe that prediction error is not completely exploited.

In addition, there are two more approaches for introducing dynamic capabilities into a static neural network:

1) Synapses as digital filters. To substitute the static synaptic weights for dynamic connections, which are usually linear filters. The FIR neural network models each synapse as a Finite Impulse Response (FIR) filter [29]. There are striking similarities between this model and
the MLP. Notationally, scalars are replaced by vectors and multiplications by vector products. These simple analogies carry through when comparing standard backpropagation for static networks with temporal backpropagation for FIR networks [24]. FIR neural networks are appropriate to work in non-stationary environments or when non-linear dynamics are observed in the time series because time is treated naturally in the synapse itself, that is, the networks have internal dynamics. In fact, they have demonstrated good results in problems with those characteristics, such as speech enhancement [30], and time series prediction [31].

In addition to the FIR network, we have used the gamma network [32], a class of Infinite Impulse Response (IIR) filter-based neural network which includes a local feedback parameter. In this structure, the FIR synapse that uses the standard z-Transform delay operator $z^{-1}$ is replaced by the so-called gamma operator

\[ G(z) = \frac{\mu}{z - (1 - \mu)}, \tag{1} \]

where $\mu$ is a real parameter which controls the memory depth of the filter. As pointed out in [33], gamma filters are theoretically superior to standard FIR filters in terms of number of parameters required to model a given dynamics. The filter is stable if $0 < \mu < 2$, and $G(z)$ reduces to the usual delay operator for $\mu = 1$. This filter also provides an additional advantage: the number of degrees of freedom (order $K$) and the memory depth remain decoupled, as shown in [33]. A proposed measurement of the memory depth of a model, which allows us to quantify the past information retained, is given by $K/\mu$ and has units of time. Hence, values of $\mu$ lower than one increase the memory depth of the filter. The use of the gamma structure in a neural network can be two-fold: we could substitute each scalar weight in an MLP with a gamma filter, or we could use a gamma unit delay line as the first layer of a classical MLP, which yields the so-called focused gamma network. The latter is the adopted approach in our work since it is simpler and allows us to scrutinize the needed memory depth for the problem by analysing the weights of this layer. This network has attracted attention for de-noising, communications, and time series prediction [34]. In general, the gamma network can deal efficiently with complex dynamics and a lower number of network parameters, which could make it a priori well-suited to our problem since patient dynamics are local in the early [...]

2) Recurrent networks. To construct loops in the connections between neurons or layers of the network. The Elman recurrent network is a simple recurrent model with feedback connections around the hidden layer. In this architecture, in addition to the input, hidden and output units, there are also context units, which are only used to memorize the previous activations of the hidden units [35]. The application of recurrent neural networks has traditionally been linked to applications such as speech and language processing. Additionally, Elman networks can result in efficient models to both detect or generate time-varying patterns [35], hence, its suitability to our problem.

Fig. 3. The Optimal Decision Hyperplane (ODH) in a linearly separable problem. Maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|$. Only support vectors (stars) are necessary to define the ODH.

Despite the fact that dynamic neural networks have been extensively used in areas such as signal processing and control, their use in biomedical engineering and medicine in general, and in TDM in particular, has received little attention. These networks are introduced here to deal more efficiently with time-varying patterns. However, some serious difficulties have been found in their application to our problem. The main limitations to the use of these networks are the need for long time series and for an unconstrained number of filter parameters through the networks. Also, the use of these methods to predict CyA blood concentration becomes more complicated since we develop individual models rather than population ones. This forces us to change the adaptation rules of the network when data coming from a new patient is presented to the networks in each iteration (epoch). In these situations, we updated the corresponding internal states (contextual neurons or filter coefficients in FIR/IIR synapses) of the network to the same parameters the patient had in the previous epoch and then we applied the usual updating rules. This procedure could be interpreted as a patient-based batch learning. This process produces oscillation of the training error, which was alleviated with a correct choice of initialisation parameters, and, in some cases, by using high values of the momentum term.

B. Support Vector Machines

Support vector machines (SVMs) are state-of-the-art tools for linear and nonlinear input-output knowledge discovery [36], [37]. The SVM was first proposed to solve nonlinear binary classification in [38]. Since then it has been extended to regression [39] or to multiclass problems [36], among others. In the following, we review the implementations used in this work.

1) Support Vector Classifiers: Given a labeled training data set $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$, where $\mathbf{x}_i \in \mathbb{R}^N$ and $y_i \in \{-1, +1\}$, and a nonlinear mapping $\phi(\cdot)$, usually to a higher (possibly infinite) dimensional (Hilbert) space, $\phi : \mathbb{R}^N \to \mathcal{H}$, the SVM method solves:

\[ \min_{\mathbf{w}, \xi_i, b} \left\{ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \right\} \tag{2} \]

constrained to:

\[ y_i\left(\phi^T(\mathbf{x}_i)\mathbf{w} + b\right) \geq 1 - \xi_i, \quad \forall i = 1, \ldots, n \tag{3} \]
\[ \xi_i \geq 0, \quad \forall i = 1, \ldots, n \tag{4} \]

where $\mathbf{w}$ and $b$ define a linear classifier in the feature space. The non-linear mapping function $\phi$ is performed in accordance with Cover's theorem, which guarantees that the transformed samples are more likely to be linearly separable in the resulting feature space (see Fig. 3). The regularization parameter $C$ controls the generalization capabilities of the classifier and it must be selected by the user, and $\xi_i$ are positive slack variables that allow permitted errors to be dealt with.

Due to the high dimensionality of vector variable $\mathbf{w}$, primal function (2) is usually solved through its Lagrangian dual problem, which consists of solving

\[ \max_{\alpha_i} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j) \right\} \tag{5} \]

constrained to $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$, $i = 1, \ldots, n$, where auxiliary variables $\alpha_i$ are Lagrange multipliers corresponding to constraints in (3). It is worth noting that all $\phi$ mappings used in the SVM learning occur in the form of inner products. This allows us to define a kernel function $K$:

\[ K(\mathbf{x}_i, \mathbf{x}_j) = \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j), \tag{6} \]

and then a non-linear SVM can be constructed using only the kernel function, without having to consider the mapping $\phi$ explicitly. Then, by introducing (6) into (5), the dual problem is obtained. After solving this dual problem, $\mathbf{w} = \sum_{i=1}^{n} y_i \alpha_i \phi(\mathbf{x}_i)$, and the decision function implemented by the classifier for any test vector $\mathbf{x}$ is given by

\[ f(\mathbf{x}) = \mathrm{sgn}\left( \sum_{i=1}^{n} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b \right), \tag{7} \]

where $b$ can be easily computed from the $\alpha_i$ that are neither 0 nor $C$, as explained in [40].

The SVM extension for multi-class problems is far from being unique and none of them seems to be superior to the others [41]. In this paper, we have used the one-against-all implementation and one-against-one approach. In the one-against-all approach, each class is compared with all the others together [42]. In the one-against-one scheme, the problem can be cast directly as a generalisation of the binary classification scheme [36], [43], [44].

2) Support Vector Regressor (SVR): The Support Vector Regressor (SVR) is the support vector implementation for regression and function approximation. Following the previous notation, SVR methods find the minimum of:

\[ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \tag{8} \]

with respect to $\mathbf{w}$, $\xi_i$, $\xi_i^*$ and $b$, subject to:

\[ y_i - \phi^T(\mathbf{x}_i)\mathbf{w} - b \leq \varepsilon + \xi_i, \quad \forall i = 1, \ldots, n \tag{9} \]
\[ \phi^T(\mathbf{x}_i)\mathbf{w} + b - y_i \leq \varepsilon + \xi_i^*, \quad \forall i = 1, \ldots, n \tag{10} \]
\[ \xi_i, \xi_i^* \geq 0, \quad \forall i = 1, \ldots, n \tag{11} \]

where now $y_i \in \mathbb{R}$, and $\xi_i$, $\xi_i^*$ and $C$ are, respectively, positive slack variables to deal with training samples with a prediction error that is larger than $\varepsilon$ ($\varepsilon > 0$) and the penalisation applied to them. The usual procedure for solving SVRs introduces the linear restrictions using Lagrange multipliers, computes the Karush-Kuhn-Tucker conditions and solves Wolfe's dual problem using quadratic programming procedures [45], [46].

3) The profile-dependent SVM (PD-SVM): It is a common practice in classification problems with unbalanced classes to set different penalisation factors for each class, which are usually proportional to the class priors [47], according to fuzzy-rules [48], or posteriors [49]. This approach combats the presence of false positives, which obviously produces "unbalanced" models. This problem is especially relevant in bioengineering applications and could prevent models from being used in a real clinical environment. This way, the SVM learns to classify patterns independently from the class they belong to (we have control over its learning); something that is not possible when using an overall penalisation $C$. In the field of time series prediction, this approach can be extended by considering that the most recent samples contain, in principle, more information. Therefore, problems with non-stationary processes can be alleviated using a different penalisation factor (or insensitivity zone) for each training sample $t$ according to a certain confidence function $C_t$ on the samples. This allows the regression machine to follow, in principle, the probability density function variations over time [50], [51]. In this paper, we tailor specific profiles for each problem:

• Prediction of CyA trough levels. This approach was previously presented in [51] for CyA level prediction. In [13], profiles were defined in terms of clusters rather than fixed a priori for the same problem and, in [52], we designed profiles for another complex pharmacokinetic problem. These experiences suggested tailoring profiles based on exponential memory decay functions, which has also been observed by other authors [53], [54]. Therefore, a good practice is to consider an exponential memory decay based on the confidence of past samples:

\[ C_t = \lambda^{t_n - t_i}, \quad \lambda \in [0, 1] \tag{12} \]

where $t_n$ is the current time sample and $t_i$ is the time instant for sample $i$. This profile reduces the penalisation parameter and enlarges the $\varepsilon$-insensitive region of previous samples as new samples are obtained.

• Prediction of CyA levels class. The same approach can be used in a classification task by increasing penalisation near the decision borders (150 ng/mL and 400 ng/mL) to avoid false detections. With regard to the formulation, one has only to substitute the standard penalisation parameter $C$ with a time-dependent penalisation $C_t$, as follows:

\[ C_{150,t+1} = [\ldots 150 \ldots]\,\left[k_2 C_t + k_1 C_{t-1} + k_0\right] \tag{13} \]
\[ C_{400,t+1} = [\ldots 400 \ldots]\,\left[k_2 C_t + k_1 C_{t-1} + k_0\right] \tag{14} \]

With this approach, we intuitively increase the penalisation of errors as we approach the decision border. The additional penalisation factors $k_i$ can be fixed a priori or
computed in an adaptive way by taking advantage of the Iterated Re-Weighted Least Squares (IRWLS) procedure [50]. In our application, we only considered a heuristic approach in which several combinations were tested. Further work will consider refined updating rules.

The inclusion of a temporal confidence function in the SVM formulation offers some advantages. Essentially, the overall number of SVs remains constant through time and better results are obtained when abrupt changes appear in the time series, as demonstrated in [50], [54]. In addition, note that the computational complexity of the proposed method is the same as that of the standard SVM formulations, since the functional and the number of constraints are the same. The only shortcoming of the PD-SVM is the design of the confidence function (Eq. (12) for regression or Eqs. (13)-(14) for classification). In our previous experience, the inclusion of a priori meaningful profile functions alleviated this restriction, resulting in more elegant solutions to the problem [52], [55], [56].

This section is organized as follows. First, we include a detailed description of pattern building, which serves equally well for prediction and classification. Then, we show and discuss results for the prediction of CyA levels. Afterwards, we compare MLP and SVM for the identification of subtherapeutic and toxic levels. Finally, we perform a sensitivity analysis of the MLP and analyze the SV distribution.

A. Building the input-output data

In order to develop a model, the input patterns must first be built from the available data for each patient. For both time series prediction and classification, we built the input pattern following a time series methodology. For this purpose, we tested several sizes of the time window (from one to five post-operatory days) [19]; however, given that prediction in the early post-transplantation period is strictly necessary, time series were lagged by only one sample. Therefore, an input pattern for a given patient contains two samples from each variable in Table I (except for the gender) at times t and t − 1, resulting in a total input dimension of 21 (10×2 + 1). This scheme allows a prediction within the first four post-operatory days on average and, as a result, yields a fixed unique model.

The prediction task consisted of estimating the value of CyA blood concentration at time t + 1 from the input pattern. The classification task used the same input pattern to identify the range of concentration at t + 1. More details on this scheme are given in Section IV-C.

B. Prediction of CyA levels

1) Model development: With regard to the MLP and NN-ARMAX models, we varied the number of hidden neurons (< 20 to avoid overfitting), the initialisation of weights and the learning rate (between 0.01 and 1) in order to determine the best topology. We also penalized committed errors larger than 50 ng/mL by a penalization factor P, which was varied from 1 to 3. Additionally, we left weights unaltered if the committed error for a pattern was below an error threshold (ε < 25 ng/mL). If not, we applied the typical equations of the on-line back-propagation (BP) learning algorithm. In terms of statistical learning, the latter approach can be referred to as the quadratic ε-insensitive cost function. These modifications produced higher recognition rates than the MLP trained with the standard BP and, in turn, reduced the computational burden.

The training process for the FIR network was difficult because of its complexity, since the number of free parameters increases geometrically with the number of inputs, as shown in [29]. In order to obtain accurate models, a great many sweeps were performed, varying the number of hidden neurons (from 2 to 25), the number of taps per synaptic connection in a layer (from 1 to 2) and the learning rate (typically between 0.0001 and 0.01). Initialisation of the weights depending on the structure of the net was also used, as proposed by many authors [25], [29]. Models with few taps (<4) and long training (number of epochs >10000) were necessary to attain satisfactory results. The training of the Elman network depends to a large extent on the learning rate and the number of context neurons. In fact, the network must be trained using high values of the momentum term (α > 0.8) because this prevents weight oscillations in the training process.

In the case of SVMs, we tested linear, polynomial and Radial Basis Function (RBF) kernels to obtain the SVR solutions. There are many reasons to select the RBF kernel a priori: it has fewer numerical difficulties than linear, polynomial or sigmoid kernels, and only the Gaussian width has to be tuned. In addition, sigmoid kernels are not positive definite for all parameter settings, which precludes their practical application [40], [57], [58]. Note that one or more free parameters must be set beforehand in the nonlinear kernels. We used exponentially increasing sequences of C ($C = 10^{-2}, 10^{-1}, \ldots, 10^{6}$) and σ ($\sigma = 1, \ldots, 50$). The tube size ε was tuned linearly ($\varepsilon = 0, \ldots, 0.75$). The polynomial order d was varied in the range 1 to 8, as suggested in the literature [59]. An additional parameter for the PD-SVR was λ for the exponential memory decay, which was varied from 0.70 to 1 in steps of 0.05. During the development of the models, the data were pre-processed to give zero mean and unit variance. All models were developed in the MATLAB environment (Mathworks, Inc). Since the computational burden was high, m-files were translated to MEX-files and the programs were run on a Pentium III (1.8GHz) with 256MB RAM.

The criterion used to select a candidate model for the final system was based on the model predictive performance in the validation data set. Bias was measured using the mean error (ME). The root-mean-square error (RMSE) was used as a measure of precision. In addition, we measured the percentage of blood levels accurately predicted (%BLAP) when an error margin of 20% is fixed, as proposed in [3], [4]. We used the mean of the absolute prediction error to compare the precision of the models by means of the one-way analysis of variance (ANOVA) method. The results were also assessed by inspecting the correlation coefficient (r) as a measure of goodness-of-fit.
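The selection criteria described above (ME for bias, RMSE for precision, the correlation coefficient r, and %BLAP with a 20% margin) can be illustrated with a minimal Python sketch. The function name, the toy concentrations, and the choice of measuring the 20% margin relative to the observed value are assumptions made for illustration; the sign convention (predicted minus observed) follows the text's use of ME>0 for over-estimation.

```python
import math


def prediction_metrics(observed, predicted, margin=0.20):
    """Return ME, RMSE, Pearson r and %BLAP for paired concentrations (ng/mL)."""
    n = len(observed)
    # Error sign: predicted - observed, so ME > 0 means over-estimation.
    errors = [p - o for o, p in zip(observed, predicted)]
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    # Assumption: a level counts as "accurately predicted" when the absolute
    # error is within `margin` of the observed concentration.
    within = sum(1 for o, e in zip(observed, errors) if abs(e) <= margin * abs(o))
    blap = 100.0 * within / n
    return {"ME": me, "RMSE": rmse, "r": r, "BLAP": blap}


# Hypothetical toy data, not taken from the study.
observed = [200.0, 300.0, 150.0, 400.0]
predicted = [210.0, 230.0, 160.0, 390.0]
metrics = prediction_metrics(observed, predicted)
```

In this toy case three of the four predictions fall within the 20% margin, while the second one (a 70 ng/mL under-prediction of a 300 ng/mL level) does not.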
TABLE II
BEST RESULTS FOR THE CYA BLOOD CONCENTRATION PREDICTION OF ALL THE MODELS. THE ROOT-MEAN-SQUARE ERROR (RMSE), THE MEAN ERROR (ME) AND THE CORRELATION COEFFICIENT (r) ARE GIVEN. BLOOD LEVELS ACCURATELY PREDICTED (20%BLAP) ARE ALSO SHOWN. 95% CI ARE GIVEN IN BRACKETS, WHICH WERE CALCULATED USING BOOTSTRAP METHODS FOR THE CASE OF RMSE.
2) Numerical analysis: Table II shows results in the prediction task for all models in the validation set. Results do not yield significant numerical differences between models in terms of accuracy (RMSE) or goodness-of-fit (r) (see Table II). A better performance is achieved with PD-SVR, NNOE, NNSSIF, and FIR models regarding accuracy and success rates. The PD-SVR method performs better than MLP, FIR, and the standard formulation of the SVR. The least biased models are the Elman recurrent network (0.30 ng/mL), SVR (0.38 ng/mL), and PD-SVR (0.36 ng/mL) methods. In addition, the size of the difference between the CI95% bounds is similar for all models, but SVR and PD-SVR keep them symmetrically distributed. An ANOVA test shows that no statistical differences in bias (F=0.6292, p=0.7540) or accuracy (F=0.3912, p=0.9259) are observed.

Figure 4 shows predicted-versus-observed and predicted-versus-residuals plots in the validation set for the PD-SVR, which yields the best compromise between accuracy and bias of the estimations. Good determination coefficients ($r^2 = 0.78$) and negatively biased estimations (linear regression; slope ± CI95%: 1.112 ± 0.097, intercept ± CI95%: −27.524 ± 24.755) are observed. Residuals do not indicate any trend. As the figure shows, all the models capture abrupt changes in the time series of CyA blood concentration in the patients and perform similarly in the early post-transplantation period. Nevertheless, three patients have very poor predictions (RMSE > 60 ng/mL), which could be due to errors in drug dosage administration, to the inter- and intrasubject variability in the drug absorption process, to the recording of blood sampling times, or to abrupt changes in each patient's clinical condition. A linear dependency on interindividual variability could explain those results. In fact, patients with highly biased estimations are precisely those with high values of the coefficient of variation (CV>30% on the whole series and CV>45% in the early post-transplantation month). When these patients are not considered, results improve drastically (r=0.78, ME=0.27 ng/mL, RMSE=49 ng/mL, BLAP20%=75%).

As an example of these situations, we show the evolution (CyA dosage and blood concentrations) of three patients with good (Fig. 5a), acceptable (Fig. 5b) and poor (Fig. 5c) predictive performances. In the same figure, we show CyA concentration predictions of MLP, FIR network, and the SVR method. We usually obtain good predictions (RMSE<40 ng/mL) in patients under 50 years old, with total body weight higher than 50 Kg and with a low number of subtherapeutic or toxic CyA blood levels (<10%). We have observed average over-estimations (ME>0) in patients with accurate predictions even in the early post-transplantation days or when receiving moderate doses. Abrupt changes in the time series produce an appreciable loss in the prediction, as shown in Fig. 5b, where a toxic level is reached within two weeks with a moderate dosage. Even with these difficulties, models efficiently capture the dynamics in the first post-operatory month. The poor results obtained in Fig. 5c can be mainly due to the presence of abrupt changes in the first post-transplantation month. These poor predictions, even with moderate initial doses (<8 mg/Kg), could be due to the higher variability in the absorption and disposition process of CyA during the first four weeks of post-transplantation. In fact, under-prediction in this period is a common characteristic of the models for almost all the patients, and the prediction of CyA blood concentration presents serious difficulties. This problem is lessened as blood levels become more stable, which is when slight over-predictions are obtained (see patients in Fig. 5a and 5b).

Fig. 4. PD-SVR performance in the validation set for the CyA blood concentration prediction. The solid line represents the line of identity, and the dotted one is the regression line. (a) Predicted versus observed dosages and (b) predicted versus residuals, in which the slope of the linear regression is a measure of the expected systematic concentration-related deviation.

Fig. 6. Evaluation of the (a) (absolute) mean error and the (b) RMSE measurement when additive Gaussian noise with zero mean and standard deviation σ is introduced in the predictive models. Results refer to the validation set and were repeated 100 times, which represents a reasonable confidence margin for the measurements.

[Figure] The distribution was divided into three monitoring regions (period 1: 0–60 days, period 2: 60–160 days, period 3: 160–400 days) and a different PD-SVR was applied in each of them. Both ε and C were tuned using an exponential memory decay: $\lambda_1 = 0.98$, $\lambda_2 = 0.995$, $\lambda_3 = 0.998$.
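The exponential memory decay used by the PD-SVR, with one λ per monitoring period as quoted in the figure caption (0.98, 0.995 and 0.998), can be sketched as follows. This is only an illustrative Python sketch: the function names, the base values of C and ε, the assignment of boundary days to the earlier period, and the choice of dividing ε by the confidence (so that the insensitive tube widens for older samples) are all assumptions, since the text only states that the profile reduces the penalisation and enlarges the ε-insensitive region of past samples.

```python
# Upper bound (in post-transplantation days) and decay rate of each period,
# as quoted in the figure caption. Boundary days go to the earlier period
# (an assumption; the caption's ranges overlap at 60 and 160).
PERIOD_RATES = ((60, 0.98), (160, 0.995), (400, 0.998))


def decay_rate(day):
    """Return the decay rate lambda for the monitoring period containing `day`."""
    for upper_day, lam in PERIOD_RATES:
        if day <= upper_day:
            return lam
    return PERIOD_RATES[-1][1]


def confidence(t_now, t_sample, lam):
    """Exponential memory decay: lambda ** (t_n - t_i)."""
    return lam ** (t_now - t_sample)


def profiled_parameters(sample_days, t_now, base_c=100.0, base_eps=0.1):
    """Return one (C_i, eps_i) pair per past sample.

    base_c and base_eps are hypothetical values. C_i shrinks and eps_i grows
    as the sample gets older; the eps_i = eps / confidence rule is an assumption.
    """
    lam = decay_rate(t_now)
    profiled = []
    for t_i in sample_days:
        conf = confidence(t_now, t_i, lam)
        profiled.append((base_c * conf, base_eps / conf))
    return profiled


# Samples taken on days 1, 5 and 10, evaluated at day 10 (period 1).
params = profiled_parameters([1, 5, 10], t_now=10)
```

The most recent sample keeps the base parameters unchanged, while older samples are penalised less and tolerated within a wider tube.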
Since no numerical or statistical differences were observed between the neural and kernel models, we decided to test their robustness by introducing additive noise at the model inputs. This can simulate situations such as blood sampling errors, patient compliance and the sensitivity of the model to exact input values. This process was tested with the most precise and unbiased models (ELMAN, FIR, PD-SVR) and the classical MLP network. In Fig. 6, we show the performance (bias and accuracy) of the models when different levels of noise standard deviation (σ) are introduced. Both measurements increase as the noise level is increased. However, as σ is increased, PD-SVR shows an excellent behaviour regarding bias and accuracy. We conclude that PD-SVR offers excellent robustness capabilities when low noise levels are introduced (σ < 0.05), which indicates less sensitivity to exact input values in normal situations. Certainly, regularisation not only provides smoother solutions but also improves stability of predictions. This issue has been extensively demonstrated in the literature in general, and in our problem in particular.
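The robustness test described above (zero-mean Gaussian noise of standard deviation σ added to the inputs, with results averaged over 100 repetitions) could be organised along the following lines. This is a hedged Python sketch: `noise_robustness`, `toy_model` and the toy inputs are hypothetical stand-ins for the trained models and standardised patterns, which are not available here.

```python
import math
import random


def noise_robustness(model, inputs, targets, sigma, repeats=100, seed=0):
    """Average |ME| and RMSE of `model` when Gaussian noise is added to inputs.

    `model` is any callable mapping one input pattern (list of floats) to a
    prediction; it stands in for a trained MLP, FIR, Elman or PD-SVR model.
    """
    rng = random.Random(seed)
    abs_me_runs, rmse_runs = [], []
    for _ in range(repeats):
        errors = []
        for pattern, target in zip(inputs, targets):
            noisy = [value + rng.gauss(0.0, sigma) for value in pattern]
            errors.append(model(noisy) - target)
        n = len(errors)
        abs_me_runs.append(abs(sum(errors) / n))
        rmse_runs.append(math.sqrt(sum(e * e for e in errors) / n))
    return sum(abs_me_runs) / repeats, sum(rmse_runs) / repeats


def toy_model(pattern):
    """Hypothetical linear predictor used only to exercise the procedure."""
    return 2.0 * pattern[0] + pattern[1]


inputs = [[0.1, -0.3], [1.2, 0.4], [-0.7, 0.9]]
targets = [toy_model(p) for p in inputs]

clean = noise_robustness(toy_model, inputs, targets, sigma=0.0)
low = noise_robustness(toy_model, inputs, targets, sigma=0.05)
high = noise_robustness(toy_model, inputs, targets, sigma=0.5)
```

Fixing the seed makes runs at different σ share the same noise draws, so differences between models or noise levels are not blurred by sampling variation.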
Fig. 5. Plots of evolution of three individual patients showing (a) good (RMSE=34.8 ng/mL), (b) acceptable (RMSE=42.9 ng/mL), and (c) unsatisfactory (RMSE=73.1 ng/mL) predictive performance. For each patient, the upper panel shows observed (thick solid line) and predicted CyA blood concentration (ng/mL) using PD-SVR (thin solid line), SVR (thinner solid line), FIR (thin dashed line), and MLP (thin dotted line). In each bottom panel, the oral dose (mg/Kg/d) versus post-operatory day is represented for proper analysis.

C. Levels identification

Even though the previous forecasting models are accurate, they do not capture abnormal CyA levels (only 5% of toxic levels and 4% of subtherapeutic levels are correctly predicted), and thus they would not aid in preventing nephrotoxicity or transplant rejection. An alternative approach consists in predicting whether the next CyA blood level increases (decreases) to a toxic (subtherapeutic) level. With such an approach, the prediction task becomes a classification problem with three classes (CyA levels <150, [150, 400] or >400 ng/mL) and thus $N_c = 3$. For this purpose, we developed two schemes: (1) the one-against-all classification scheme, in which each of the three binary classifiers is trained to distinguish the samples in a given class from the samples in the two remaining classes; and (2) the one-against-one scheme, in which $N_c(N_c - 1)/2$ binary classifiers are developed, each distinguishing a pair of classes.
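As a small illustration of the two schemes, the following Python sketch maps concentrations to the three classes and builds the label sets each scheme would train on. The class names and helper functions are hypothetical; only the thresholds (150 and 400 ng/mL) and the classifier counts come from the text.

```python
from itertools import combinations

CLASSES = ("subtherapeutic", "therapeutic", "toxic")


def cya_class(level_ng_ml, low=150.0, high=400.0):
    """Map a CyA blood level to one of the three classes (<150, [150, 400], >400)."""
    if level_ng_ml < low:
        return "subtherapeutic"
    if level_ng_ml > high:
        return "toxic"
    return "therapeutic"


def one_against_all(labels):
    """One binary target vector per class: +1 for the class, -1 for the rest."""
    return {c: [1 if label == c else -1 for label in labels] for c in CLASSES}


def one_against_one_pairs():
    """All unordered class pairs, i.e. Nc * (Nc - 1) / 2 binary problems."""
    return list(combinations(CLASSES, 2))


levels = [120.0, 150.0, 275.0, 400.0, 450.0]
labels = [cya_class(v) for v in levels]
ova_targets = one_against_all(labels)
ovo_pairs = one_against_one_pairs()
```

With three classes both schemes need three binary classifiers, but each one-against-one classifier is trained only on the samples of its two classes.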
1) Model development and comparison: In this task, two approaches were considered: one-against-all and one-against-one classifiers. With regard to the MLP models, we varied the number of hidden neurons (< 20 to avoid overfitting), the weight initialisation range, and the learning rate (between 0.01 and 3) in order to determine the best topology. We tested linear, polynomial, and Radial Basis Function (RBF) kernels to obtain the SVM solutions. The same ranges as the ones presented in Section IV-B.1 were used. Additional parameters for the PD-SVM were the factors $k_i$, which were heuristically tuned.

In a multiclass problem, one usually optimizes a global measurement of model performance, such as the overall success rate (SR[%]). However, in unbalanced datasets, it is more useful to pay attention to the sensitivity/specificity in order to avoid skewed results. In our case, models were selected by evaluating the average of the sensitivity (SE) and specificity (SP) factors obtained in the classes of interest (toxic and subtherapeutic levels), since the final goal is to obtain a highly sensitive, robust classifier capable of providing 'alarm signals' for patient monitoring. The sensitivity determines the percentage of true results that are correctly classified, e.g. CyA levels greater than 400 ng/mL correctly classified, and specificity determines the percentage of false results that are correctly classified, e.g. CyA levels greater than 150 ng/mL correctly classified. Since the distribution of classes is highly unbalanced (4.95% of the cases are subtherapeutic levels and 5.77% of the cases are toxic levels), committed errors were systematically penalized according to the number of cases in each class [47]. In the case of the PD-SVM, the former priors-based penalisation multiplies the penalisation provided by the specific profile (Eqs. (13) and (14)).

2) Numerical comparison: Table III[top] shows the performance of the one-against-all models. High rates of the (SE+SP) criterion (> 81%) were obtained for all classifiers. This result enables their use as sensitive models in real clinical practice. Specifically, the SVM classifier yields a higher sensitivity ratio than MLP (size of the difference is 11.54%) but it yields a lower recognition rate (size of the difference is 5.88%) when predicting subtherapeutic levels. In any case, sensitivity is much better than in a previous work by Hirankarn et al. [9], in which accuracy in subtherapeutic ranges was about 62%. This issue has been addressed by using the PD-SVM, which improves the results of the standard SVM and MLP models, an improvement that is especially significant for subtherapeutic level detection.

In order to assess the models' performance, we also calculated the area under the receiver operating characteristic (ROC) curves for the dedicated classifiers of toxic and subtherapeutic blood levels (Fig. 7). The plot of false alarms versus hits provides a useful way to compare models for a wide range of 'levels of discrimination', and thus it has become a traditional method for model assessment. In our models, the area under the ROC curve (AUC) is higher with the PD-SVM method in detecting both subtherapeutic (MLP150: 76.47%, SVM150: 82.77%, PD-SVM150: 87.43%) and toxic levels (MLP400: 89.83%, SVM400: 95.11%, PD-SVM400: 97.45%). In addition, the PD-SVM reduces the number of false positives for both models (see Table III[top]) and the number of SVs (28% against 35% of the whole training set for the standard formulation). An additional comment refers to the distance between discrimination levels. When large jumps from one level of discrimination to another are found in these curves, a lack of knowledge of the classifier's behaviour in that area is present. In this sense, it is worth noting that very similar curves are observed for toxic level models (Fig. 7b), but (slightly) lower confidence can be obtained by using an MLP for predicting subtherapeutic levels (Fig. 7a). Similar results are observed between the SVM and its profiled version. The latter conclusions can also be observed by inspecting the false alarm rate in Table III[top]. In general, the ROC curves relative to the PD-SVM are above and on the left compared with the ROC curves relative to the other models.

TABLE III
CONFUSION MATRICES IN THE VALIDATION SET OF THE MLP (IN BRACKETS), SVM (ITALICS) AND PD-SVM (BOLD FACE) MODELS FOR PREDICTING TOXIC AND SUBTHERAPEUTIC LEVELS USING ONE-AGAINST-ALL (TOP) AND ONE-AGAINST-ONE SCHEMES (BOTTOM).

ONE-AGAINST-ALL CLASSIFICATION SCHEME
Actual CyA levels [ng/mL]
24 24 (21)
78 123 (92)
1 1 (4)
264 194 (244)
2 2 (2)
1 1 (1)
42 67 (48)
13 13 (14)

ONE-AGAINST-ONE CLASSIFICATION SCHEME
Actual CyA levels [ng/mL]
150 ng/mL
25 24 (24)
143 150 (144)
1 2 (2)
184 170 (173)
1 1 (1)
0 0 (0)
57 64 (67)
16 16 (15)

The best results for the PD-SVM were obtained using $k_2 = 1$, $k_1 = 0.3$, $k_o = 0$ for identifying toxic levels, $k_2 = -1$, $k_1 = -0.4$, $k_o = 600$ for subtherapeutic levels, and $k_2 = 1$, $k_1 = 0.12$, $k_o = 0.05$ in the one-against-one scheme.

Fig. 7. Receiver operating characteristic (ROC) curve of the MLP and the SVM methods in the validation set when used as dedicated prediction models of (a) subtherapeutic levels and (b) toxic levels. Axes: % False Alarms (100-Specificity). The decision limit γ was varied throughout the output range [–1,+1] to obtain this curve. Circles (MLP), crosses (SVM) and stars (PD-SVM) represent the origin of the decision limit.

Table III[bottom] shows the one-against-one confusion matrix. Several conclusions can be drawn: (1) The PD-SVM method yields better results than the rest, specifically the MLP, with a rise of 2.23% in terms of SR[%] and 6.25%
in terms of SE+SP; (2) a dramatic error in classifying toxic patterns is committed by the MLP; (3) the PD-SVM improves the sensitivity of detection of subtherapeutic levels and drastically reduces the misclassification rate of therapeutic levels; and (4) once again model complexity is reduced with the profile-dependent technique, by which we obtain a mean reduction of 4% of SVs per class.

3) Statistical comparison: As we did in the prediction approach, we have analyzed not only the numerical but also the statistical differences among classifiers and schemes. For this purpose, we have computed the statistical pairwise comparison of two classifiers through Z-scores [60]. In general, SVM-based models give better performance in terms of sensitivity, but MLP is (slightly) better in specificity. In particular, PD-SVM yields better (SE+SP) scores, so it is a more sensitive, specific and balanced classifier in all schemes. Statistical tests yielded Z scores higher than 1.96 for all classifiers, and thus results are significant and classifications are better than random choice. An interesting result is that the PD-SVM and MLP are preferred statistically when working in one-against-all schemes ($Z_{PD-SVM} = 5.38$, $Z_{SVM} = 3.00$, $Z_{MLP} = 4.12$) rather than in one-against-one schemes ($Z_{PD-SVM} = 3.00$, $Z_{SVM} = 2.60$, $Z_{MLP} = 2.62$), in which no appreciable differences appear. Performing pairwise statistical comparisons, one can conclude that only the PD-SVM is significantly different from the other classifiers in the one-against-all scheme, and no statistical differences appear in the one-against-one schemes. These results match the ones shown in [61], in which the authors pointed out that, on some occasions, a one-against-all scheme can be as accurate as any other approach.

D. Models analysis

Knowledge discovery is defined as "the process of identifying valid, novel, potentially useful, and ultimately understandable structure in data" [62]. The scientific community is not only searching for methods that provide accurate estimations of the underlying system function, but also for methods that explain those complex, and often non-linear, relationships from the input-output mapping performed by the models. In this paper, sensitivity analyses for the MLP and insight on the SV distribution for the PD-SVR have been used in order to gain knowledge about the problem.

1) Sensitivity analysis: Sensitivity analysis is used to study the influence of input variables on the dependent variable and consists of evaluating the changes in training error that would result if an input were removed from the model. This measure, commonly known as delta error in the literature, produces a valuable ranking of the relevance of the variables. Two additional sensitivity measures, which are based on perturbing an input and monitoring network outputs, can be computed: the Average Gradient (AG) and the Average Absolute Gradient (AAG). All these measurements are extensively described elsewhere.

TABLE IV
RANKING OF INPUT VARIABLES ACCORDING TO THE DELTA ERROR (DE), AVERAGE GRADIENT (AG) AND AVERAGE ABSOLUTE GRADIENT (AAG) MEASUREMENTS FOR THE BEST MLP. THE MOST RELEVANT INPUT VARIABLES ARE DAILY DOSAGE (DD), CYA BLOOD CONCENTRATION (C), CREATININE (CR), POST-TRANSPLANTATION DAYS (PTD), AND HEMATOCRIT (HTO) FOR POST-TRANSPLANTATION DAYS t AND t − 1.

In Table IV, different rankings in accordance with these measurements are shown for the MLP. Only the top seven relevant inputs are shown. Several conclusions can be drawn. The most informative variables considered by the model are the past values of dosage, CyA blood concentration and creatinine level. In a second level of relevance, we find the post-transplantation days along with past hematocrit and creatinine clearance levels. By analysing the sign of DE and AAG we can conclude that, on average, an increase in past dosage produces an increase in future CyA blood concentration, which indicates that the model captures this relationship correctly. On average, lower creatinine levels are associated with an increase in CyA blood concentration. These results agree with those obtained with other machine learning approaches [19] and with NONMEM modeling (ANOVA and univariate analysis methods) [20], [21].

2) Distribution of the Support Vectors: SVMs have been shown to be well-suited techniques for classification and regression tasks. An additional advantage also arises from their use: the solution is expressed as a linear combination of some instances and, thus, their analysis offers some knowledge gain about the problem. Indeed, the final model is a good compromise between accuracy (almost 80% of predictions with errors under 20%) and simplicity (24% of samples become support vectors). Support vectors are mainly scattered around CyA blood levels of 320 ng/mL (±100, p > 0.05) and in patients who weigh more than 45 Kg and who are over 50 years old.

We also compared the distributions of the entire training set and the obtained support vectors using Principal Component Analysis (PCA). After their respective diagonalisation and standardisation, we evaluated the scatter degree in every subset as a measure of the distance, $d(i, j)$, among the eigenvectors, $v_i$ and $v_j$, weighted by their eigenvalues, $\lambda_i$ and $\lambda_j$:

$$d(i, j) = \|\lambda_i v_i - \lambda_j v_j\|$$

This distance was averaged over all possible pairs $(i, j)$ of different eigenvectors. No geometrical differences were found between the original ($\bar{D}_T \pm \sigma_{D_T}$: 6.22±3.14) and the SV data set (6.44±3.64), which suggests that SVs scatter in a way similar to that for the whole data set and, consequently, reveals that a robust solution has been achieved.

V. DISCUSSION AND CONCLUSIONS

In this paper, we have presented time series prediction and classification approaches for a complex TDM problem.



We have compared state-of-the-art support vector machines and neural networks. Model comparison has been carried out in terms of accuracy and robustness. A novel kernel-based approach has been presented, which allows the incorporation of a priori knowledge and improves results in both approaches. Finally, we have analyzed model structure by performing sensitivity analyses on the best MLP model and inspecting the distribution of the support vectors on the best SVM model. These methods not only provide a ranking of relevant variables, but also constitute a methodology for model assessment.

The prediction of immunosuppressive blood concentrations is a challenging issue and leads to difficulties in selecting the optimum drug dose to avoid graft rejection and minimize adverse effects. Intensive drug monitoring is necessary in order to keep blood concentrations within the proper range. In this context, we have presented the formulation of state-of-the-art models that could help to individualize the CyA posology. Blood concentration models have been built to achieve accuracy and robustness. By predicting concentration instead of dosage we achieve objectivity and usefulness, since the latter is based on a certain protocol of dosage administration that could disturb the final goal of TDM. In fact, this approach attempts to assist the health care team in dosage individualisation, since physicians could take the blood concentration estimation as a helpful guide for dose administration. In [10], [12], we presented a scheme of two chained models where the CyA blood concentration predicted by the concentration prediction model constituted an input to the dosage prediction model. This system could serve as a dosage guide to the clinician, but it presented two basic problems. First, dosing follows a therapeutic guideline, which makes predicting dosage an indirect way of predicting the doctor's protocol. Second, from our particular point of view and experience, a direct translation of concentration to dosage could influence, or even replace, the doctor's own decision. A decision-support system can be defined as "a computer-based algorithm that assists a clinician with one or more component steps of the diagnostic process" [64], and thus, it should only aid doctors, rather than influence them or substitute them. In contrast, the scheme presented in this paper provides the clinician with different signals, rankings, and follow-up information.

Despite the fact that the results obtained for the blood concentration prediction are acceptable, they are inferior to those for classification purposes. Nevertheless, their joint consideration could be a valuable help in TDM. The best results for the CyA blood concentration prediction were obtained using the PD-SVR, where an ME of 0.36 ng/mL and an RMSE of 52.01 ng/mL were observed in the validation set. Our results clearly improve on a previous work that followed the time series methodology [26], in which an MLP with lagged inputs was used to predict CyA levels in renal allograft recipients and the results were not optimal (bias: 25 ng/mL, precision: 74 ng/mL in the test set). From a statistical point of view, there are no significant differences between the neural and kernel models developed in our work. However, from our analysis of model robustness, we can conclude that although dynamic neural models give good results, PD-SVM yields a more robust solution, which is a direct consequence of using a regularized and tailored model.

With regard to the classification problem, both one-against-all and one-against-one schemes have been attempted. Models could be useful for clinicians in designing dosage regimens to avoid toxic and subtherapeutic cyclosporine ranges and, in turn, to reduce costs to the Health Care System. The SVM method performs slightly better than the MLP in both schemes regarding sensitivity rates and AUC. Once again, designing specific penalisation profiles has produced better results.

Based on these outcomes, the application of SVM in the context of TDM can become a clinically useful tool. In this sense, we implemented the best model in an easy-to-use computer program in order to aid in the individualisation of dosage and pharmacotherapeutical attention (Fig. 8), which in turn brings state-of-the-art models closer to clinicians [65]. The main limitation encountered in this work is due to the group's location. Since patients were all from the same nephrology units, they had a series of characteristics in common. For example, the treatment guidelines and protocol administration for the patients were similar, which means that extrapolations to other centres should be treated with caution. Furthermore, a strict test should be performed before using the application in new situations. This, however, should not prevent the use of SVM methodology in other nephrology units, where the models should be implemented taking into account the local population characteristics and dosing protocols.

Further studies are necessary in order to explore statistical differences between methods, the influence of clinical covariates, and the expansion of the predictive performance up to long-term follow-up. However, there is no doubt that the appearance of new protocols based on two-hour post-dosing monitoring ($C_{2h}$) constitutes the new cornerstone in CyA TDM. At present, there are only limited $C_{2h}$ data in our hospital, but in a few years, we expect a substantial amount to be collected. The poor preliminary results obtained with neural networks and ARMA modeling [66] encourage the use of SVM in this new application.

Fig. 8. Windows of the application for predicting the CyA blood concentration.

The authors want to express their gratitude to Prof. Antonio J. Serrano-López (Universitat de València, Spain) for his useful discussions and references on Decision-Support Systems, to Sergio Sánz (TISSAT S.A., Spain) for his valuable help in software development, to Prof. Ángel Navia-Vázquez
(Universidad Carlos III, Spain) for his useful comments on recurrent networks for uneven sampling problems, and to Dr. Begoña Porta-Oltra (Pharmacy Service of the Dr. Peset University Hospital of València, Spain) for the stimulating clinical discussions and careful data collection.

[1] P. Belitsky, "Neoral used in the renal transplant recipient," Transplantation Proceedings, vol. 32, no. 3A Suppl. Review., pp. S10–S19, May
[2] L. A., "Factors influencing the pharmacokinetics of cyclosporine in man," Therapeutic Drug Monitoring, vol. 13, no. 6, pp. 465–477, Nov
[3] J. Parke and B. G. Charles, "NONMEM population pharmacokinetic modeling of orally administered cyclosporine from routine drug monitoring data after heart transplantation," Therapeutic Drug Monitoring, vol. 20, no. 3, pp. 284–293, Jun 1998.
[4] B. Charpiat, I. Falconi, V. Bréant, R. W. Jellife, J. M. Sab, C. Ducerf, N. Fourcade, A. Thomasson, and J. Baulieux, "A population pharmacokinetic model of cyclosporine in the early postoperative phase in patients with liver transplants, and its predictive performance with Bayesian
[19] G. Camps-Valls, "Redes neuronales y máquinas de vectores soporte para la predicción y modelización de la concentración valle de ciclosporina A (CyA) en pacientes con trasplante renal," Ph.D. dissertation, Departament d'Enginyeria Electrònica. Universitat de València, July 2002,
[20] B. Porta, J. J. Pérez-Ruixo, N. V. Jiménez, A. Sancho, and L. M. Pallardó, "Individualización posológica de ciclosporina en pacientes con trasplante renal: propuesta de un modelo farmacocinético en predicción." Farmacia Hospitalaria, vol. 22, no. 4, pp. 181–187, 1998.
[21] B. Porta, "Modelado farmacocinético de ciclosporina en pacientes con trasplante renal," Ph.D. dissertation, Departament de Farmàcia Hospitalaria i Galènica. Universitat de València., 2002.
[22] J. F. Hair, R. E. Anderson, R. L. Tatham, and W. C. Black, Multivariate Data Analysis, 5th ed. New Jersey, U.S.A.: Prentice-Hall International,
[23] B. D. Kahan, W. G. Kramer, C. A. Wideman, S. M. Flechner, M. Lorber, and C. T. van Buren, "Demographics factors affecting the pharmacokinetics of cyclosporine estimated by radioinmunoassay," Transplantation, vol. 41, pp. 459–464, 1986.
[24] A. S. Weigend and N. A. Gershenfeld, Time Series Prediction. Forecasting the Future and Understanding the Past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis held in Santa Fe, New Mexico, May 14–17, 1992. Proceedings Volume XV. Addison–Wesley, 1994, vol. XV.
fitting," Therapeutic Drug Monitoring, vol. 20, pp. 158–164, 1998.
[25] S. Haykin, Neural Networks: A Comprehensive Foundation, 3rd ed.
[5] M. E. Brier, J. M. Zurada, and G. R. Aronoff, "Neural network predicted New Jersey, U.S.A.: Prentice Hall, 1999.
peak and trough gentamicin concentrations," Pharmaceutical Research, [26] M. E. Brier, "Empirical pharmacokinetic predictions for cyclosporine vol. 12, no. 3, pp. 406–412, 1995.
using a time series neural network," Pharmaceutical Research, vol. 12, [6] A. S. Hussain, R. D. Johnson, N. N. Vachharajani, and R. W. A., no. Suppl. S363, 1995.
"Feasibility of developing a neural network for prediction of human [27] M. Nørgaard, O. Ravn, and N. Poulsen, "NNSYSID & NNCTRL – pharmacokinetic parameters from animal data," Pharmaceutical Re- tools for system identification and control with neural networks," IEE search, vol. 10, no. 3, pp. 466–469, Mar 1993.
Computing & Control Engineering Journal, vol. 12, no. 1, pp. 29–36, [7] P. Veng-Pedersen and N. Modi, "Application of neural networks to pharmacodynamics," Journal of Pharmaceutical Sciences, vol. 82, pp.
[28] L. Ljung, System Identification. Theory for the user, 2nd ed.
918–926, 1993.
Jersey, U.S.A.: Prentice-Hall International, Inc., 1999.
[8] A. E. Gaweda, A. A. Jacobs, M. E. Brier, and J. M. Zurada, "Pharma- [29] E. A. Wan, "Finite Impulse Response neural networks with appli- codynamic population analysis in chronic renal failure using artificial cations in time series prediction," Ph.D. dissertation, Department of neural networks––a comparative study," Neural Networks, vol. 16, no.
Electrical Engineering. Stanford University, November 1993, available 5-6, pp. 841–845, 2003.
[9] S. Hirankarn, C. Downs, W. Street, and R. A. Herman, "Prediction of [30] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme two ranges of cyclosporine level (subtherapeutic and toxic) using feature recognition using time-delay neural networks," IEEE Transactions on subset selection and artificial neural networks," in AAPS Annual Meeting, Acoustics, Speech, and Signal Processing, vol. 37, pp. 328–339, 1989.
vol. 2 (2), Orlando, U.S.A., 2000, abstract 1274.
[31] E. A. Wan, "Modeling nonlinear dynamics with neural networks: Exam- [10] G. Camps-Valls, E. Soria-Olivas, B. Porta-Oltra, J. J. P´erez-Ruizo, J. D.
ples in time series prediction," in Proceedings of the Fifth Workshop on Mart´ın-Guerrero, A. J. Serrano-L ´opez, and N. V. Jim´enez-Torres, "A Neural Networks: Academic/Industrial/NASA/Defense, WNN93/FNN93., neural approach to cyclosporine dose prediction," World Congress on San Francisco, U.S.A., November 1993, pp. 327–332, available at Medical Physics and Biomedical Engineering, July 2000.
[11] G. Camps-Valls, E. Soria-Olivas, J. D. Mart´ın-Guerrero, J. J. P´erez- [32] B. de Vries and J. C. Principe, "The Gamma model – a new neural Ruixo, and N. V. Jim´enez-Torres, "Neural networks ensemble for model for temporal processing," Neural Networks, vol. 5, no. 4, pp.
cyclosporine concentration monitoring," in International Conference on 565–576, 1992.
Artificial Neural Networks, vol. 2130.
Vienna, Austria: Lecture Notes [33] J. C. Principe, B. deVries, and P. G. deOliveira, "The gamma filter in Computer Science. Springer–Verlag., Aug 2001, pp. 706–711.
– A new class of adaptive IIR filters with restricted feedback," IEEE [12] G. Camps-Valls, B. Porta-Oltra, E. Soria-Olivas, J. D. Mart´ın-Guerrero, Transactions on Signal Processing, vol. 41, no. 2, pp. 649–656, Feb A. J. Serrano-L ´opez, J. J. P´erez-Ruixo, and N. V. Jim´enez-Torres, "Pre- diction of cyclosporine dosage in patients after kidney transplantation [34] J. C. Principe, B. de Vries, J. Kuo, and P. Guedes-de Olivera, "Mod- using neural networks," IEEE Transactions on Biomedical Engineering, eling applications with the focused Gamma net," in Neural Informa- vol. 50, no. 4, pp. 442–448, April 2003.
tion Processing Systems, NIPS, 1991, pp. 143–150, available from [13] G. Camps-Valls, E. Soria-Olivas, J. P´erez-Ruixo, A. Art´es-Rodr´ıguez, F. P´erez-Cruz, and A. Figueiras-Vidal, "Cyclosporine concentration [35] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, pp.
prediction using clustering and Support Vector Regression methods," 179–211, 1988.
IEE Electronics Letters, vol. 38, no. 6, pp. 568–570, June 2002.
[14] M. Oellerich, V. W. Armstrong, B. Kahan, L. Shaw, D. W. Holt, [36] V. N. Vapnik, Statistical Learning Theory.
Wiley, 1998.
R. Yatscoff, A. Lindholm, P. Halloran, K. Gallicano, and K. Wonigeit, [37] B. Sch ¨olkopf and A. Smola, Learning with kernels.
MIT Press, 2002.
"Lake Louise consensus conference on cyclosporin monitoring in or- [38] B. Boser, I. Guyon, and V. N. Vapnik, "A training algorithm for optimal gan transplantation: report of the consensus panel," Therapeutic Drug margin classifiers," in Proc. 5th Ann. Workshop on Computational Monitoring, vol. 17, pp. 642–654, Dec 1995.
Learning Theory, D. Haussler, Ed.
ACM Press, 1992, pp. 144–152.
[15] T. A. S. Assays, "Manual analitique," Rundix Cedex, France: Laborato- [39] V. N. Vapnik, S. Golowich, and A. Smola, "Support vector method for ries ABBOTT, Division Diagnostic, XII-CYCLO-MONO-13.
function approximation, regression estimation, and signal processing," [16] T. Kohonen, Self-Organizing Maps, 3rd ed.
Springer Series in Infor- in Neural Information Processing Systems, M. Mozer, M. Jordan, and mation Sciences, Vol. 30, 2001.
T. Petsche, Eds.
Cambridge, MA: M.I.T. Press, 1997, pp. 169–184.
[17] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and [40] B. Sch ¨olkopf and A. Smola, Learning with Kernels – Support Vector Monterey, CA: Wadsworth and Brooks, 1984.
Machines, Regularization, Optimization and Beyond. MIT Press Series, [18] A. Abraham and D. Steinberg, "Is neural network a reliable forecaster on earth? a MARS query!" in Connectionist Models of Neurons, Learning [41] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass sup- Processes, and Artificial Intelligence, J. M. A. P. (Eds.), Ed.
port vector machines," IEEE Transaction on Neural Networks, vol. 13, Notes on Computer Science. LNCS2084. Springer-Verlag, 2001.
no. 2, 3 2002.
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART C: APPLICATIONS AND REVIEWS [42] B. Sch ¨olkopf, K.-K. Sung, C. J. Burges, F. Girosi, P. Niyogi, T. Poggio, de Ingenier´ıa Biom´edica, CASEIB2000, Cartagena, Spain, Sep 2000, pp.
and V. N. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Transaction on Signal [66] G. Camps-Valls, A. J. Serrano-L ´opez, B. Porta-Oltra, J. D. Mart´ın- Processing, vol. 45, no. 11, pp. 2758–2765, Nov. 1997.
Guerrero, E. Soria-Olivas, and N. V. Jim´enez-Torres, "Neural networks [43] J. Weston and C. Watkins, "Multi-class support vector machines," in for C2h cyclosporine concentration modelling," in 32nd European ESANN, 1999.
Symposium on Clinical Pharmacy, ESCP 2003., Val encia, Spain, Sep [44] U. H. G. KreBel, Pairwise classification and support vector machines, In: Advances in Kernel Methods: Support Vector Learning. Cambridge,MA, U.S.A.: The MIT Press, Cambridge, MA, 1999.
[45] V. N. Vapnik, Statistical Learning Theory.
New York: John Wiley & [46] B. Sch ¨olkopf, P. L. Bartlett, A. Smola, and R. Williamson, "Shrinking the tube: a new support vector regression algorithm," in Advances inNeural Information Processing Systems 11, M. S. Kearns, S. A. Solla,and D. A. Cohn, Eds.
Cambridge, MA: MIT Press, 1999, pp. 330 – [47] Y. Lin, Y. Lee, and G. Wahba, "Support Vector Machines for classi- fication in nonstandard situations," University of Wisconsin-Madison,Department of Statistics TR 1016, 2000.
[48] C.-F. Lin and S.-D. Wang, "Fuzzy support vector machines," IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 464–471, 2002.
[49] Q. Tao, G.-W. Wu, F.-Y. Wang, and J. Wang, "Posterior probability support vector machines for unbalanced data," IEEE Transactions onNeural Networks, vol. 16, no. 6, pp. 1561–1573, 2005.
[50] F. P´erez-Cruz, "M´aquina de vectores soporte adaptativa y compacta," Ph.D. dissertation, Dpto. Teor´ıa de la Se ˜nal y Comunicaciones. Univer-sidad Carlos III de Madrid., Dec. 2000.
[51] G. Camps-Valls, E. Soria-Olivas, J. P´erez-Ruixo, A. Art´es-Rodr´ıguez, F. P´erez-Cruz, and A. Figueiras-Vidal, "A profile-dependent kernel-basedregression for cyclosporine concentration prediction," in Neural Infor-mation Processing Systems, NIPS 2001. Workshop on New Directionsin Kernel-Based Learning Methods, December 2001.
[52] J. D. Mart´ın-Guerrero, G. Camps-Valls, E. Soria-Olivas, A. J. Serrano- L ´opez, J. J. P´erez-Ruixo, and N. V. Jim´enez-Torres, "Dosage individ-ualization of erythropoietin using a profile-dependent support vectorregression," IEEE Transactions on Biomedical Engineering, vol. 50,no. 10, pp. 1136–1142, June 2003.
[53] A. N. Refenes, Y. Bentz, D. W. Bunn, A. N. Burgess, and A. D. Zapranis, "Financial time series modeling with discounted least squares back-propagation," Neurocomputing, vol. 14, p. 123 –138, 1997.
[54] F. E. H. Tay and L. J. Cao, "Modified support vector machines in financial time series forecasting," Neurocomputing, vol. 48, pp. 847–861, 2002.
[55] G. Camps-Valls, A. Chalk, A. Serrano-Lopez, J. D. Martin-Guerrero, and E. Sonnhammer, "Profiled support vector machines for antisenseoligonucleotide efficacy prediction," BMC Bioinformatics, no. 5, p.
135, available in OpenAccess: http://www.biomedcentral.com/1471-2105/5/135.
[56] G. G ´omez-P´erez, G. Camps-Valls, J. Guti´errez, and J. Malo, "Perceptual adaptive insensitivity for support vector machine image coding," IEEETransactions on Neural Networks, vol. 16, no. 6, pp. 1574–1581, 2005.
[57] S. S. Keerthi and C.-J. Lin, "Asymptotic behaviors of support vector machines with gaussian kernel," Neural Computation, vol. 15, no. 7,pp. 1667–1689, 2003.
SVM and the training of non-PSD kernels by SMO-type meth-ods," National Taiwan University, Department of Computer Sci-ence and Information Engineering, Tech. Rep., 2003, available athttp://www.csie.ntu.edu.tw/cjlin/papers/tanh.pdf.
[59] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273 – 297, 1995.
[60] R. Congalton and K. Green, Assessing the Accuracy of Remotely Sensed Data. Principles and Practices, 1st ed.
U.S.A.: CRC Press, 1999.
[61] R. Rifkin and A. Klautau, "In defense of one-vs-all classification," Journal of Machine Learning Research, vol. 5, no. 1, pp. 101–141, 2004.
[62] P. S. Bradley, U. M. Fayyad, and O. L. Mangasarian, "Mathematical pro- gramming for data mining: formulations and challenges," MathematicalProgramming Technical Report 98-01, Computer Sciences Department,University of Winsconsin, WI, Tech. Rep. MSR-TR-98-04, Jan 1998.
[63] G. B. Orr and K.-R. M ¨uller, Neural Networks: Tricks of the Trade.
Springer-Verlag, Berlin, Heidenberg, 1998.
[64] E. S. Berner, Clinical Decision Support Systems. Theory and Practice, New-York: Springer–Verlag, 1999.
[65] S. S´aez, E. Soria-Olivas, G. Camps-Valls, J. D. Mart´ın-Guerrero, A. J. Serrano-L ´opez, and N. V. Jim´enez-Torres, "Aplicaci ´on inform´aticabasada en redes neuronales temporales para problemas de farma-cocin´etica cl´ınica." in XVIII Congreso Anual de la Sociedad Espa˜nola

Source: http://www.tsc.uc3m.es/~fernando/IEEESMC.pdf
