## Tsc.uc3m.es

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART C: APPLICATIONS AND REVIEWS
Therapeutic Drug Monitoring of Kidney Transplant Recipients using Profiled Support Vector Machines

Gustavo Camps-Valls, *Member, IEEE*, Emilio Soria-Olivas, Juan J. Pérez-Ruixo, Fernando Pérez-Cruz, *Member, IEEE*, Antonio Artés-Rodríguez, *Member, IEEE*, N. Víctor Jiménez-Torres

**Abstract:** This work proposes a twofold approach for therapeutic drug monitoring (TDM) of kidney recipients using Support Vector Machines (SVM), for both predicting and detecting Cyclosporine A (CyA) blood concentrations. The final goal is to build useful, robust, and ultimately understandable models for individualising the dosage of CyA. We compare SVM with several neural network models, such as the multilayer perceptron (MLP), the Elman recurrent network, FIR/IIR networks, and Neural Network ARMAX approaches. In addition, we present a profile-dependent SVM (PD-SVM), which incorporates *a priori* knowledge in both tasks. Models are compared numerically, statistically, and in the presence of additive noise. Data from fifty-seven renal allograft recipients were used to develop the models. Patients followed a standard triple therapy and CyA trough concentration was the dependent variable. The best results for the CyA blood concentration prediction were obtained using the PD-SVM (mean error of 0.36 ng/mL and root-mean-square error of 52.01 ng/mL in the validation set), which also appeared to be more robust in the presence of additive noise. The proposed PD-SVM improved on the results of the standard SVM and MLP, and the improvement was especially significant (both numerically and statistically) in the one-against-all scheme. Finally, some clinical conclusions were obtained from sensitivity rankings of the models and the distribution of support vectors. We conclude that the PD-SVM approach produces more accurate and robust models than neural networks. Finally, a software tool for aiding medical decision-making, including the prediction models, is presented.

**Index Terms:** Cyclosporine, therapeutic drug monitoring, neural networks, support vector machines, sensitivity analysis.

Manuscript received December 2004.

G. Camps-Valls and E. Soria-Olivas are with Grup de Processament Digital de Senyals (GPDS), Dept. Enginyeria Electrònica, Escola Tècnica Superior d'Enginyeria, Universitat de València, C/ Dr. Moliner, 50, 46100 Burjassot (València), Spain. E-mail: [email protected].

F. Pérez-Cruz and A. Artés-Rodríguez are with Departamento de Teoría de la Señal y Comunicaciones, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain.

J. J. Pérez-Ruixo was with the Pharmacy Service at the Dr. Peset University Hospital, València (Spain), when this paper was prepared. Currently, he is with Advanced PK/PD Modeling & Simulation, Global Clinical Pharmacokinetics and Clinical Pharmacodynamics Division, Johnson & Johnson Pharmaceutical Research & Development, a Division of Janssen Pharmaceutica N.V., Beerse (Belgium). E-mail: [email protected].

Prof. N. V. Jiménez-Torres is with the Pharmacy and Pharmaceutical Technology Department, Universitat de València (Spain) and with the Pharmacy Service at the Dr. Peset University Hospital, València (Spain).

¹ The total cost in 1997 of CyA in Spain was 52 million and 12 billion […]

I. INTRODUCTION

Cyclosporine A (CyA) is still the cornerstone of immunosuppression in renal transplant recipients. This immunosuppressive drug shortens average hospital stays after kidney transplantation. At present, despite the appearance of new formulations, 90% of therapeutic guidelines are based on CyA and, consequently, costs continue to rise year after year¹. Recently, important advances in dose formulation, therapeutic drug monitoring (TDM) and guidelines, and the emerging role of CyA-based combined therapies have resulted in a substantial improvement in clinical outcomes in renal transplant recipients [1]. Nevertheless, CyA is generally considered to be a critical dose drug. Underdosing may result in graft loss, and overdosing causes kidney damage and increases opportunistic infections, systolic and diastolic pressure, and cholesterol. Moreover, the pharmacokinetic behaviour of CyA presents a substantial inter- and intra-individual variability, which appears to be particularly evident in the earlier post-transplantation period, when the risk and clinical consequences of acute rejection are higher than in stable renal patients [2]. Several factors such as clinical drug interactions and patient compliance can also significantly alter blood CyA concentrations and, thus, intensive TDM of CyA becomes necessary; however, it influences the patient's quality of life and the cost of the care.

Since the trough blood concentration has traditionally been used to monitor CyA therapy, mathematical models that are capable of predicting the future concentration of CyA and adjusting the optimal dosage become necessary. Population pharmacokinetic models and Bayesian forecasting have been used to predict CyA blood concentrations, but their performance was not optimal [3], [4]. These models predict plasma drug concentrations based on theoretical models of drug distribution and elimination, but they often fail when the underlying principles are not sufficiently understood or known to be encoded into a set of relationships [5]. In fact, despite convincing results in many areas, few attempts have been made to use neural networks to predict drug behaviour [6]–[8]. A different approach to TDM of kidney recipients has recently been presented [9], in which the goal is the detection of subtherapeutic and toxic levels. This could aid physicians by providing "alarm signals" for risks that threaten patient evolution. Nevertheless, poor results were achieved in the detection of subtherapeutic levels, which could lead to dramatic kidney rejection processes.

All these limitations, in both prediction and classification methods, have led us to try to solve the problem of TDM using modern neural networks and Support Vector Machines (SVM),
which have shown good results in other fields. Therefore, two main objectives can be distinguished in this work:

*• Prediction of CyA trough levels.* This approach tries to predict values of CyA blood concentration from the previous values by following a time-series methodology. This same approach was followed in previous communications [10]–[12], but only short time series and a reduced population were available. In the present paper, we extend these works by developing more neural and kernel predictors using a larger database. We compare SVM with neural networks extensively used in other fields, such as the multilayer perceptron (MLP), Elman recurrent network, FIR/IIR networks, and Neural Network ARMAX (NNARMAX) approaches. Comparison is carried out in terms of accuracy by evaluating classical bias-variance trade-off measurements, and in terms of robustness by analysing performance when noise is introduced at the input.

*• Prediction of CyA levels class.* Due to the high inter- and intra-subject variability, non-uniform sampling, and non-stationarity of the time series, the difficulty of the prediction task is well known. In fact, intensive TDM tries to keep CyA blood levels in the therapeutic range (usually established as 150-400 ng/mL) by making adjustments in the patient's drug regimen. Therefore, an alternative approach to the time-series prediction methodology consists of identifying future toxic (>400 ng/mL) and subtherapeutic (≤150 ng/mL) levels. Two different schemes are used for that purpose: one-against-all classifiers and multi-classifiers. We compare the performance of SVM and MLP in both schemes with recognition rates and receiver-operating curves (ROC).

In addition, we present the so-called *Profile-Dependent* SVM (PD-SVM) to incorporate *a priori* knowledge in both the prediction and the classification models. The PD-SVM gives different confidence weights to different parts of the training data to focus the training on the *a priori* most important regions: the most recent samples for time series prediction, and samples near the decision thresholds for levels classification. Finally, we perform sensitivity analyses on the MLP and inspect the distribution of the support vectors in order to gain knowledge about the problem. This paper constitutes the natural extension of the works [12], [13], in which only a limited number of prediction approaches was carried out and no attention was paid to the structure of the "black-box" models. In this paper, a common methodology consisting of three basic steps is followed to develop a Decision Support System (DSS) for TDM based on modern techniques. A general learning scheme for our proposal is illustrated in Fig. 1. The basic characteristics of a general DSS are ensured in our case study as follows: (a) the system is robust, as it incorporates models that show good performance in noisy environments; (b) the system is accurate and unbiased, as models show good bias-variance trade-offs; and (c) the system incorporates *a priori* knowledge (profiled model) and allows the user to extract rules for dosage adjustment (sensitivity analysis). Rather than providing the clinician with dosage prediction, the system gives an estimation of future CyA blood concentration, alarm signals for subtherapeutic or toxic levels, feature ranking, and follow-up statistical information.

The rest of the paper is organized as follows. Data collection is introduced in Section II. Section III presents a review of the

Fig. 1. *Learning scheme considered in this paper. Three main blocks are developed to obtain a validated decision support system (DSS). In the pre-processing stage, data are previously "whitened" through Principal Component Analysis (PCA). Afterwards, two approaches are considered for modeling: time series prediction of the CyA blood concentration and identification of subtherapeutic and toxic levels. After developing the models, we follow a methodology for extracting valid, novel and potentially useful knowledge from the model by using sensitivity analysis and by inspecting support vectors. In addition, we compare models through accuracy and robustness tests.*
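The level classes used in the second objective (toxic above 400 ng/mL, subtherapeutic at or below 150 ng/mL) can be illustrated with a minimal labelling sketch. The function names and the +1/-1 target encoding for the one-against-all scheme are assumptions made for illustration; the paper does not specify how labels were encoded.

```python
from typing import List

# Thresholds quoted in the text (ng/mL).
SUBTHERAPEUTIC_MAX = 150.0  # levels <= 150 are subtherapeutic
TOXIC_MIN = 400.0           # levels > 400 are toxic


def cya_level_class(concentration_ng_ml: float) -> str:
    """Map a CyA trough concentration to one of the three level classes."""
    if concentration_ng_ml <= SUBTHERAPEUTIC_MAX:
        return "subtherapeutic"
    if concentration_ng_ml > TOXIC_MIN:
        return "toxic"
    return "therapeutic"


def one_against_all_targets(levels: List[float], positive_class: str) -> List[int]:
    """Hypothetical helper: +1 for the chosen class, -1 for all the others."""
    return [1 if cya_level_class(v) == positive_class else -1 for v in levels]
```

A one-against-all detector for toxic levels would then be trained on `one_against_all_targets(levels, "toxic")`, and a second one on the `"subtherapeutic"` targets.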
predictive techniques used with special emphasis on SVM. The results are presented in Section IV. A discussion is provided in Section V along with some concluding remarks and a proposal for further work.

II. PATIENTS AND DATA COLLECTION

Sixty-seven renal allograft recipients treated in the Nephrology Service of the Dr. Peset University Hospital in the city of València (Spain) were initially included in this study. The exclusion criteria covered patients with severe conditions affecting other vital organs, active neoplasia or metastasis risk, active infections, presence of HIV virus, presence of urinary or vascular abnormalities, and patients older than 70. In addition, patients who did not follow the prescribed dosage regimen or who received metabolic inducers or inhibitors were excluded because these modify the pharmacokinetic profile of CyA.

Patients received a standard immunosuppressive regimen (triple therapy basis) with a microemulsion formulation of CyA (Sandimmun Neoral), mycophenolate mofetil (2 g/d), and prednisone (0.5-1 mg/kg/d). The initial oral dose of CyA (5 mg/kg b.i.d) was reduced according to the measured CyA blood concentration and the desired target range (150-300 ng/mL) [14]. Steady state blood samples were withdrawn 12-14 hours after dose administration. CyA blood levels were measured by a specific monoclonal fluorescence polarisation immunoassay (Abbott, TDx), with inter- and intra-assay variation coefficients of less than 7.5% [15].

The study collected many potentially relevant variables of all monitored patients:

*• Anthropometric factors:* weight (Kg), age (years), and gender.

*• Biochemical factors:* urea (mg/dL), creatinine (mg/dL), creatinine clearance (mL/min), total protein and albumin (g/dL), bilirubin (mg/dL), cholesterol and triglycerides […]

*• Hepatic enzymes:* aspartate aminotransferase (IU/L), […] transpeptidase (IU/L), and alkaline phosphatase (IU/L).

*• Haematological factors:* hematocrit (%), haemoglobin (g/dL), and leukocytes (U/mm3).

*• Clinical factors:* systolic and diastolic arterial pressure […]

At this point, two analyses were performed:

1) We detected 10 patients whose records contained more than 10% statistical outliers (more than two standard deviations from the mean) in the original data distribution of many variables. In fact, when considering all available descriptors, 23% of samples were outliers. These patients were excluded from model development and hence, a final cohort of fifty-seven patients was considered.

2) At the same time, we analyzed the available data using classical statistical analyses (correlation analysis, normality plots, higher-order statistics), Principal Component Analysis (PCA), Self-Organising Maps (SOM) [16], Classification and Regression Trees (CART) [17], and Multivariate Adaptive Regression Splines (MARS) [18] in order to get a preliminary subset of relevant features [19]. The obtained results showed a high degree of concordance with NONMEM and linear modeling [20], [21]. From this study, we obtained a reduced subset of eleven relevant patient factors that, along with dosage, CyA blood concentration, and post-transplantation days, were used to build the models. Some basic population statistics of these factors are shown in Table I. It is worth noting that almost all variables have significantly non-normal distributions (*z*-values for skewness and kurtosis greater than ±3.08, *p* < 0.001). This problem was addressed by applying suitable transformations to the raw data.

Two-thirds of the patients were used to train the models and the rest were used for their validation using the hold-out method. The population was randomly assigned to two groups: 39 patients (665 patterns) were used for training the models and 18 patients (427 patterns) for their validation. This process was repeated until three basic conditions were met: (1) variations in the mean and variance of each variable between training and validation sets should be lower than 15%; (2) each subset should contain an approximately similar proportion of male and female patients; and (3) patients who were monitored for a longer period of time were assigned to the validation set. When the three conditions could not be fulfilled simultaneously, condition (3) was adopted and (2) relaxed. This three-step randomising methodology ensures balanced datasets to avoid population differences that could bias the models [23].

The health care team in the hospital decides daily the next dose to administer by assessing the patients' factors and their evolution. This protocol, nevertheless, produces three undesired features in the time series:

1) *High variability.* High inter-subject variability is found (coefficient of variation, CV = 31%), especially remarkable in the early post-transplantation days (PTD) (CV = 43%). In this period, it becomes necessary to raise or lower dosage while closely monitoring the patient's concentration, as shown in Fig. 2.

2) *Non-stationarity.* Therapy tries to keep CyA levels in the target range in order to avoid nephrotoxicity or

Fig. 2. Distribution of blood concentration (ng/mL) of CyA. The solid line represents a specific patient profile. Dashed lines indicate the desired (therapeutic) target range (150–300 ng/mL). Axes: post-transplantation days (d) versus concentration (ng/mL). Plot annotations: early post-transplantation days, with high inter-subject variability; stationary state after the third post-operative month.
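The outlier screening described in this section (a sample is an outlier when it lies more than two standard deviations from the mean, and a patient is excluded when more than 10% of their samples are outliers) can be sketched as follows. This is a minimal single-variable illustration: the paper screened many variables at once, and the use of pooled population statistics, the population (not sample) standard deviation, and all function names are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def outlier_fraction(values: List[float], pop_mean: float, pop_sd: float,
                     n_sd: float = 2.0) -> float:
    """Fraction of values lying more than n_sd standard deviations from the mean."""
    if pop_sd == 0 or not values:
        return 0.0
    flagged = sum(1 for v in values if abs(v - pop_mean) > n_sd * pop_sd)
    return flagged / len(values)


def patients_to_exclude(samples_by_patient: Dict[str, List[float]],
                        max_fraction: float = 0.10) -> List[str]:
    """Patients whose share of outlying samples exceeds max_fraction."""
    # Assumption: mean and SD are computed over all samples pooled together.
    pooled = [v for values in samples_by_patient.values() for v in values]
    mu, sd = mean(pooled), pstdev(pooled)
    return sorted(
        patient
        for patient, values in samples_by_patient.items()
        if outlier_fraction(values, mu, sd) > max_fraction
    )
```

In a multi-variable setting, the same check would be repeated per variable and the per-patient counts combined; how the original study combined them is not stated.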
transplant rejection. This provokes the presence of non-stationary processes in the time series.

3) *Non-uniform sampling.* The individualisation procedure directly affects the sampling of the time series.

To deal with these problems, some issues should be taken into account. For example, non-stationarity was treated efficiently with dynamic neural networks. The problem of non-uniform sampling was initially addressed using the classical strategy of interpolation and resampling, but this produced overoptimistic results. Therefore, we decided to alleviate the problem by incorporating post-transplantation days into the model *globally*, i.e. no time-series differencing was undertaken. Finally, some modifications on the training algorithms of the models were carried out, as will be shown in the next section.

III. METHODS AND EXPERIMENTAL SETUP

*A. Static and dynamic neural networks*

A common approach to time series prediction is the AutoRegressive Moving Average (ARMA) model. However, ARMA models are not suitable for our problem due to the non-linearity, non-uniform sampling, and non-stationarity of the time series. Hence, many researchers have turned to the use of non-linear models, such as neural networks, in which few assumptions must be made. The multilayer perceptron (MLP) is the most commonly used neural network. It is composed of a layered arrangement of artificial neurons in which each neuron of a given layer feeds all the neurons of the next layer. This model forms a complex mapping from the $n$-dimensional input to the binary output, $\psi : \mathbb{R}^n \to \{0, 1\}$. For regression purposes, the MLP mapping has the form $\psi : \mathbb{R}^n \to \mathbb{R}$. However, it is a static mapping; there are no internal dynamics [24], [25]. This problem can be easily addressed by including an array of unit-delay elements, called a tapped-delay line model, to make use of spatially-converted temporal patterns. This approximation was previously followed in [13], [26].

Neural Network ARMAX modeling is intimately related to the latter approach. In a NNARMAX model [27], given a pair of input-output discrete time series, a multilayer perceptron (MLP) is used to perform a mapping between them, in which past inputs, past outputs and past residuals can be fed into the input layer. Selecting a model structure is much more difficult in the nonlinear case than in the linear case (classical ARMA modeling). Not only is it necessary to choose a set of regressors, but a network architecture is also required. Therefore, several regression structures are available. In this work, we use the ones that best fit our needs: NNARMAX2 (the regression vector is formed by past inputs, past outputs and past residuals), NNSSIF (the regressor is in the form of state space innovations), and NNOE (the regression vector is formed by past inputs and estimations). All these models are extensively described in [27]. The use of NNARMAX models in control applications and nonlinear system identification has expanded in the past years. Its main advantage is the use of a non-linear regressor (usually an MLP) working on a fully tailored "state" vector [28]. This makes the model especially well suited to our problem because we can design the endowed input state vector to accommodate our non-stationary dynamics. This can be done, for example, by adding more "memory" in the form of error terms if we observe that prediction error is not completely exploited.

In addition, there are two more approaches for introducing dynamic capabilities into a static neural network:

1) *Synapses as digital filters.* The static synaptic weights are replaced by dynamic connections, which are usually linear filters. The FIR neural network models each synapse as a Finite Impulse Response (FIR) filter [29]. There are striking similarities between this model and

TABLE I. Characteristics of patients in the study for the training and the validation (in brackets) sets. Results are presented as mean ± standard deviation (SD) and the range, except for the gender, which is given as the number of subjects. The shape descriptors (kurtosis and skewness) are calculated over the whole population.

| Variable | Mean ± SD | Range, training (validation) | Skewness (*z*) | Kurtosis (*z*) |
|---|---|---|---|---|
| Gender | 21 (14) male and 18 (4) female | | | |
| […] | […] | 22.79 - 68.21 (28.82 - 69.43) | -0.26 (-3.49∗) | 2.29 (15.42∗) |
| […] | […] | 1 - 434 (2 - 426) | 1.08 (14.58∗) | 3.07 (20.74∗) |
| **CyA concentration (ng/mL)** | […] | 55.69 - 615.98 (45.79 - 664.17) | 0.95 (12.80∗) | 4.33 (29.21∗) |
| **Daily dosage/Weight (mg/Kg/d)** | […] | 0.71 - 12.04 (1.88 - 9.45) | 0.75 (10.09∗) | 2.90 (19.54∗) |
| **Urea (mg/dL)** | […] | 17 - 332 (22 - 279) | 2.25 (30.42∗) | 8.44 (56.95∗) |
| […] | […] | 0.70 - 10.60 (1 - 10.20) | 3.27 (44.14∗) | 15.38 (103.74∗) |
| **Creatinine clearance (ml/min)** | […] | 7.77 - 117.42 (7.77 - 107.03) | […] | 2.56 (17.28∗) |
| **Alkaline phosphatase (IU/L)** | […] | 62 - 683 (65 - 724) | 1.96 (26.49∗) | 8.25 (55.65∗) |
| […] | […] | 0.22 - 6.10 (0.30 - 2) | 3.66 (49.37∗) | 24.33 (164.12∗) |
| **Hematocrit, HTO (%)** | […] | 22 - 55 (22 - 52) | 0.32 (4.26∗) | 2.61 (17.58∗) |

The *z* values are derived by dividing the statistics by the corresponding standard errors of $\sqrt{6/N}$ (skewness) and $\sqrt{24/N}$ (kurtosis). ∗ Significant at the 0.001 level.
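The *z*-values reported in Table I can be checked with a short calculation. The sketch below assumes that the kurtosis standard error is the usual $\sqrt{24/N}$ and that $N$ is the total number of patterns (665 + 427 = 1092); the table does not state $N$ explicitly, so both are assumptions. Function names are illustrative.

```python
import math


def skewness_z(skewness: float, n: int) -> float:
    """z-value for skewness, using the standard error sqrt(6/N)."""
    return skewness / math.sqrt(6.0 / n)


def kurtosis_z(kurtosis: float, n: int) -> float:
    """z-value for kurtosis, using the standard error sqrt(24/N) (assumed)."""
    return kurtosis / math.sqrt(24.0 / n)


# Assumption: N is the total number of training plus validation patterns.
N_PATTERNS = 665 + 427

# CyA concentration row of Table I: skewness 0.95, kurtosis 4.33.
z_skew = skewness_z(0.95, N_PATTERNS)
z_kurt = kurtosis_z(4.33, N_PATTERNS)
print(f"skewness z = {z_skew:.2f}, kurtosis z = {z_kurt:.2f}")
```

Under these assumptions the computed values agree with the tabulated 12.80 and 29.21 to within the rounding of the published skewness and kurtosis.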
the MLP. Notationally, scalars are replaced by vectors and multiplications by vector products. These simple analogies carry through when comparing standard backpropagation for static networks with *temporal backpropagation* for FIR networks [24]. FIR neural networks are appropriate for non-stationary environments or when non-linear dynamics are observed in the time series because *time* is treated naturally in the synapse itself, that is, the networks have internal dynamics. In fact, they have demonstrated good results in problems with those characteristics, such as speech enhancement [30] and time series prediction [31].

In addition to the FIR network, we have used the gamma network [32], a class of Infinite Impulse Response (IIR) filter-based neural network which includes a local feedback parameter. In this structure, the FIR synapse that uses the standard $z$-transform delay operator $z^{-1}$ is replaced by the so-called gamma operator

$$G(z) = \frac{\mu}{z - (1 - \mu)},$$

where $\mu$ is a real parameter which controls the memory depth of the filter. As pointed out in [33], gamma filters are theoretically superior to standard FIR filters in terms of the number of parameters required to model a given dynamics. The filter is stable if $0 < \mu < 2$, and $G(z)$ reduces to the usual delay operator for $\mu = 1$. This filter also provides an additional advantage: the number of degrees of freedom (order $K$) and the memory depth remain decoupled, as shown in [33]. A proposed measurement of the memory depth of a model, which allows us to quantify the past information retained, is given by $K/\mu$ and has units of time. Hence, values of $\mu$ lower than one increase the memory depth of the filter. The use of the gamma structure in a neural network can be two-fold: we could substitute each scalar weight in an MLP with a gamma filter, or we could use a gamma unit delay line as the first layer of a classical MLP, which yields the so-called *focused gamma network*. The latter is the adopted approach in our work since it is simpler and allows us to scrutinize the needed memory depth for the problem by analysing the weights of this layer. This network has attracted attention for de-noising, communications, and time series prediction [34]. In general, the gamma network can deal efficiently with complex dynamics using a lower number of network parameters, which could make it *a priori* well-suited to our problem since patient dynamics are local in the early […]

2) *Recurrent networks.* Loops are constructed in the connections between neurons or layers of the network. The Elman recurrent network is a simple recurrent model with feedback connections around the hidden layer. In this architecture, in addition to the input, hidden and output units, there are also context units, which are only used to memorize the previous activations of the hidden units [35]. The application of recurrent neural networks has traditionally been linked to applications such as speech and language processing. Additionally, Elman networks can result in efficient models to either detect or generate time-varying patterns [35], hence their suitability to our problem.

Despite the fact that dynamic neural networks have been extensively used in areas such as signal processing and control, their use in biomedical engineering and medicine in general, and in TDM in particular, has received little attention. These networks are introduced here to deal more efficiently with time-varying patterns. However, some serious difficulties have been found in their application to our problem. The main limitations to the use of these networks are the need for long time series and for an unconstrained number of filter parameters through the networks. Also, the use of these methods to predict CyA blood concentration becomes more complicated since we develop individual models rather than population ones. This forces us to change the adaptation rules of the network when data coming from a new patient is presented to the networks in each iteration (epoch). In these situations, we updated the corresponding internal states (contextual neurons or filter coefficients in FIR/IIR synapses) of the network to the same parameters the patient had in the previous epoch and then we applied the usual updating rules. This procedure could be interpreted as a patient-based *batch* learning. This process produces oscillation of the training error, which was alleviated with a correct choice of initialisation parameters and, in some cases, by using high values of the momentum.

*B. Support Vector Machines*

Support vector machines (SVMs) are state-of-the-art tools for linear and nonlinear input-output knowledge discovery [36], [37]. The SVM was first proposed to solve nonlinear binary classification in [38]. Since then it has been extended to regression [39] or to multiclass problems [36], among others. In the following, we review the implementations used in this work.

Fig. 3. The Optimal Decision Hyperplane (ODH) in a linearly separable problem. Maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|$. Only support vectors (stars) are necessary to define the ODH.

*1) Support Vector Classifiers:* Given a labeled training data set $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$, where $\mathbf{x}_i \in \mathbb{R}^N$ and $y_i \in \{-1, +1\}$, and a nonlinear mapping $\boldsymbol{\phi}(\cdot)$, usually to a higher (possibly infinite) dimensional (Hilbert) space, $\boldsymbol{\phi} : \mathbb{R}^N \to \mathcal{H}$, the SVM method solves:
$$\min_{\mathbf{w}, \xi_i, b} \left\{ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \right\} \qquad (2)$$

subject to:

$$y_i\left(\boldsymbol{\phi}^T(\mathbf{x}_i)\mathbf{w} + b\right) \geq 1 - \xi_i, \quad \forall i = 1, \ldots, n \qquad (3)$$

$$\xi_i \geq 0, \quad \forall i = 1, \ldots, n \qquad (4)$$

where $\mathbf{w}$ and $b$ define a linear classifier in the feature space. The non-linear mapping function $\boldsymbol{\phi}$ is performed in accordance with Cover's theorem, which guarantees that the transformed samples are more likely to be linearly separable in the resulting feature space (see Fig. 3). The regularization parameter $C$ controls the generalization capabilities of the classifier and must be selected by the user, and $\xi_i$ are positive slack variables that allow permitted errors to be dealt with.

Due to the high dimensionality of vector variable $\mathbf{w}$, primal function (2) is usually solved through its Lagrangian dual problem, which consists of solving

$$\max_{\alpha_i} \left\{ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j\, \boldsymbol{\phi}^T(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_j) \right\} \qquad (5)$$

constrained to $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$, $i = 1, \ldots, n$, where auxiliary variables $\alpha_i$ are Lagrange multipliers corresponding to constraints in (3). It is worth noting that all $\boldsymbol{\phi}$ mappings used in the SVM learning occur in the form of inner products. This allows us to define a kernel function $K$:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\phi}^T(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_j) \qquad (6)$$

and then a non-linear SVM can be constructed using only the kernel function, without having to consider the mapping $\boldsymbol{\phi}$ explicitly. Then, by introducing (6) into (5), the dual problem is obtained. After solving this dual problem, $\mathbf{w} = \sum_{i=1}^{n} y_i \alpha_i \boldsymbol{\phi}(\mathbf{x}_i)$, and the decision function implemented by the classifier for any test vector $\mathbf{x}$ is given by

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{n} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b \right)$$

where $b$ can be easily computed from the $\alpha_i$ that are neither 0 nor $C$, as explained in [40].

The SVM extension for multi-class problems is far from being unique and none of the extensions seems to be superior to the others [41]. In this paper, we have used the one-against-all implementation and the one-against-one approach. In the one-against-all approach, each class is compared with all the others together [42]. In the one-against-one scheme, the problem can be cast directly as a generalisation of the binary classification scheme [36], [43], [44].

*2) Support Vector Regressor (SVR):* The Support Vector Regressor (SVR) is the support vector implementation for regression and function approximation. Following the previous notation, SVR methods find the minimum of:

$$\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

with respect to $\mathbf{w}$, $\xi_i$, $\xi_i^*$ and $b$, subject to:

$$y_i - \boldsymbol{\phi}^T(\mathbf{x}_i)\mathbf{w} - b \leq \varepsilon + \xi_i, \quad \forall i = 1, \ldots, n$$

$$\boldsymbol{\phi}^T(\mathbf{x}_i)\mathbf{w} + b - y_i \leq \varepsilon + \xi_i^*, \quad \forall i = 1, \ldots, n$$

$$\xi_i, \xi_i^* \geq 0, \quad \forall i = 1, \ldots, n$$

where now $y_i \in \mathbb{R}$, and $\xi_i$, $\xi_i^*$ and $C$ are, respectively, positive slack variables to deal with training samples with a prediction error that is larger than $\varepsilon$ ($\varepsilon > 0$) and the penalisation applied to them. The usual procedure for solving SVRs introduces the linear restrictions using Lagrange multipliers, computes the Karush-Kuhn-Tucker conditions and solves Wolfe's dual problem using quadratic programming procedures [45], [46].

*3) The profile-dependent SVM (PD-SVM):* It is a common practice in classification problems with unbalanced classes to set different penalisation factors for each class, which are usually proportional to the class priors [47], according to fuzzy-rules [48], or posteriors [49]. This approach combats the presence of false positives, which obviously produces "unbalanced" models. This problem is especially relevant in bioengineering applications and could prevent models from being used in a real clinical environment. This way, the SVM learns to classify patterns independently of the class they belong to (we have control over its learning), something that is not possible when using an overall penalisation $C$. In the field of time series prediction, this approach can be extended by considering that the most recent samples contain, in principle, more information. Therefore, problems with non-stationary processes can be alleviated using a different penalisation factor (or insensitivity zone) for each training sample $t$ according to a certain *confidence function* $C_t$ on the samples. This allows the regression machine to follow, in principle, the probability density function variations over time [50], [51]. In this paper, we tailor specific profiles for each problem:

*• Prediction of CyA trough levels.* This approach was previously presented in [51] for CyA level prediction. In [13], profiles were defined in terms of clusters rather than fixed *a priori* for the same problem and, in [52], we designed profiles for another complex pharmacokinetic problem. These experiences suggested tailoring profiles based on exponential memory decay functions, which has also been observed by other authors [53], [54]. Therefore, a good practice is to consider an exponential memory decay based on the confidence of past samples:

$$C_t = \lambda^{t_n - t_i}, \quad \lambda \in [0, 1]$$

where $t_n$ is the current time sample and $t_i$ is the time instant for sample $i$. This profile reduces the penalisation parameter and enlarges the $\varepsilon$-insensitive region of previous samples as new samples are obtained.

*• Prediction of CyA levels class.* The same approach can be used in a classification task by increasing penalisation near the decision borders (150 ng/mL and 400 ng/mL) to avoid false detections. With regard to the formulation, one has only to substitute the standard penalisation parameter $C$ with a time-dependent penalisation $C_t$, as follows:

$$C_{150,t+1} = (\ldots)\left[k_2 C_t + k_1 C_{t-1} + k_0\right]$$

$$C_{400,t+1} = (\ldots)\left[k_2 C_t + k_1 C_{t-1} + k_0\right]$$

With this approach, we intuitively increase the penalisation of errors as we approach the decision border. The additional penalisation factors $k_i$ can be fixed *a priori* or
computed in an adaptive way by taking advantage of the Iterated Re-Weighted Least Squares (IRWLS) procedure [50]. In our application, we only considered a heuristic approach in which several combinations were tested. Further work will consider refined updating rules.

The inclusion of a temporal confidence function in the SVM formulation offers some advantages. Essentially, the overall number of SVs remains constant through time and better results are obtained when abrupt changes appear in the time series, as demonstrated in [50], [54]. In addition, note that the computational complexity of the proposed method is the same as for the standard SVM formulations, since the functional and the number of constraints are the same. The only shortcoming of the PD-SVM is the design of the confidence function (Eq. (12) for regression or Eqs. (13)-(14) for classification). In our previous experience, the inclusion of *a priori* meaningful profile functions alleviated this restriction, resulting in more elegant solutions to the problem [52], [55], [56].

This section is organized as follows. First, we include a detailed description of pattern building, which serves equally well for both prediction and classification. Then, we show and discuss results for the prediction of CyA levels. Afterwards, we compare MLP and SVM for the identification of subtherapeutic and toxic levels. Finally, we perform sensitivity analysis of the MLP and analyze the SV distribution.

*A. Building the input-output data*

In order to develop a model, the input patterns must first be built from the available data for each patient. For both time series prediction and classification, we built the input pattern following a time series methodology. For this purpose, we tested several sizes of the time window (from one to five post-operatory days) [19]; however, given that prediction in the early post-transplantation period is strictly necessary, time series were lagged by only one sample. Therefore, an input pattern for a given patient contains two samples from each variable in Table I (except for the gender) at times $t$ and $t-1$, resulting in a total input dimension of 21 ($10 \times 2 + 1$). This scheme allows a prediction in the first four post-operatory days as a mean and, as a result, yields a fixed unique model.

The prediction task consisted of estimating the value of CyA blood concentration at time $t+1$ from the input pattern. The classification task used the same input pattern to identify the range of concentration at $t+1$. More details on this scheme are given in Section IV-C.

*B. Prediction of CyA levels*

*1) Model development:* With regard to the MLP and NN-ARMAX models, we varied the number of hidden neurons ($<20$ to avoid overfitting), the initialisation of weights and the learning rate (between 0.01 and 1) in order to determine the best topology. We also penalized committed errors larger than 50 ng/mL by a penalization factor $P$, which was varied from 1 to 3. Additionally, we left weights unaltered if the committed error for a pattern was below an error threshold ($\varepsilon < 25$ ng/mL). If not, we proceeded to apply the typical equations of the on-line back-propagation (BP) learning algorithm. In terms of statistical learning, the latter approach can be referred to as the quadratic $\varepsilon$-insensitive cost function. These modifications produced higher recognition rates than the MLP trained with the standard BP and, in turn, reduced the computational burden.

The training process for the FIR network was difficult because of its complexity, since the number of free parameters increases geometrically with the number of inputs, as shown in [29]. In order to obtain accurate models, a great many sweeps were performed, varying the number of hidden neurons (from 2 to 25), the number of taps per synaptic connection in a layer (from 1 to 2) and the learning rate (typically between 0.0001 and 0.01). Initialisation of the weights depending on the structure of the net was also used, as proposed by many authors [25], [29]. Models with few taps ($<4$) and long training (number of epochs $>10000$) were necessary to attain satisfactory results. The training of the Elman network depends to a large extent on the learning rate and the number of context neurons. In fact, the network must be trained using high values of the momentum term ($\alpha > 0.8$) because this prevents weight oscillations in the training process.

In the case of SVMs, we tested linear, polynomial and Radial Basis Function (RBF) kernels to obtain the SVR solutions. There are many reasons to select the RBF kernel *a priori*: it has fewer numerical difficulties than linear, polynomial or sigmoid kernels, and only the Gaussian width has to be tuned. In addition, sigmoid kernels are not positive definite in all situations, which precludes their practical application [40], [57], [58]. Note that one or more free parameters must be set beforehand in the nonlinear kernels. We used exponentially increasing sequences of $C$ ($C = 10^{-2}, 10^{-1}, \ldots, 10^{6}$), and $\sigma$ ($\sigma = 1, \ldots, 50$). The tube size $\varepsilon$ was tuned linearly ($\varepsilon = 0, \ldots, 0.75$). The polynomial order $d$ was varied in the range 1 to 8, as suggested in the literature [59]. An additional parameter for the PD-SVR was $\lambda$ for the exponential memory decay, which was varied from 0.70 to 1 in steps of 0.05. During the development of the models, the data were pre-processed to give zero mean and unit variance. All models were developed in the MATLAB environment (Mathworks, Inc). Since the computational burden was high, *m-files* were translated to MEX-files and the programs were run on a Pentium III (1.8GHz) with 256MB RAM.

The criterion used to select a candidate model for the final system was based on the model predictive performance in the validation data set. Bias was measured using the mean error (ME). The root-mean-square error (RMSE) was used as a measure of precision. In addition, we measured blood levels accurately predicted (%BLAP) if an error margin of 20% is fixed, as proposed in [3], [4]. We used the mean of the absolute prediction error to compare the precision of the models using the one-way ANOVA method. The results were also assessed by inspecting the correlation coefficient ($r$) as a measure of goodness-of-fit. The model accuracy was tested by using the one-way analysis of variance (ANOVA) method.
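The selection criteria described above (ME for bias, RMSE for precision, and %BLAP with a 20% margin) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prediction_metrics` is hypothetical, errors are taken as predicted minus observed (so that ME > 0 means over-estimation), and the 20% margin is assumed to be relative to the observed concentration.

```python
import math


def prediction_metrics(observed, predicted, margin=0.20):
    """Return (ME, RMSE, %BLAP) for paired concentrations in ng/mL.

    Assumptions (not taken from the paper): error = predicted - observed,
    and a level counts as accurately predicted when the absolute error is
    within `margin` times the observed value.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mean_error = sum(errors) / n                      # bias
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # precision
    hits = sum(1 for o, e in zip(observed, errors) if abs(e) <= margin * abs(o))
    blap = 100.0 * hits / n                           # % blood levels accurately predicted
    return mean_error, rmse, blap
```

For instance, `prediction_metrics([100, 200, 300], [110, 150, 300])` reports a negative ME (net under-estimation) and counts two of the three levels as accurately predicted.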

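The exponential memory decay of Eq. (12), whose parameter $\lambda$ was swept from 0.70 to 1 in steps of 0.05, can be illustrated with a short Python sketch. The function names are hypothetical. The paper states only that the profile reduces the penalisation and enlarges the $\varepsilon$-insensitive region of older samples; scaling $C$ by $C_t$ and dividing $\varepsilon$ by $C_t$ is an assumption made here for illustration.

```python
def decay_profile(sample_times, lam):
    """Confidence C_t = lam ** (t_n - t_i), with t_n the most recent sample time."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    t_n = max(sample_times)
    return [lam ** (t_n - t_i) for t_i in sample_times]


def profiled_parameters(sample_times, lam, C, epsilon):
    """Per-sample penalisation and tube width under the decay profile.

    Assumption (hypothetical): C_i = C * C_t and eps_i = epsilon / C_t, so
    older samples are penalised less and tolerate larger errors.
    """
    confidences = decay_profile(sample_times, lam)
    C_i = [C * c for c in confidences]
    eps_i = [epsilon / c if c > 0.0 else float("inf") for c in confidences]
    return C_i, eps_i


# Candidate decay values matching the sweep described in the text.
lambda_grid = [round(0.70 + 0.05 * k, 2) for k in range(7)]
```

With `lam = 1` every sample keeps full confidence and the standard SVR is recovered, which is one way to check an implementation of the profile.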
TABLE II. BEST RESULTS FOR THE CYA BLOOD CONCENTRATION PREDICTION OF ALL THE MODELS. THE ROOT-MEAN-SQUARE ERROR (RMSE), THE MEAN ERROR (ME) AND THE CORRELATION COEFFICIENT (*r*) ARE GIVEN. BLOOD LEVELS ACCURATELY PREDICTED (20%BLAP) ARE ALSO SHOWN. 95% CI ARE GIVEN IN BRACKETS, WHICH WERE CALCULATED USING BOOTSTRAP METHODS FOR THE CASE OF RMSE.

Fig. 4. PD-SVR performance in the validation set for the CyA blood concentration prediction. The solid line represents the line of identity, and the dotted one is the regression line. (a) Predicted versus observed dosages (axes: predicted concentration and desired concentration, ng/mL) and (b) predicted versus residuals (axes: predicted concentration and residuals, ng/mL), in which the slope of the linear regression is a measure of the expected systematic concentration-related deviation.
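The regression lines drawn in plots of this kind can be obtained with ordinary least squares. The sketch below is illustrative only: the function names are hypothetical, and residuals are assumed to be predicted minus observed concentrations, a sign convention the caption does not state.

```python
def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_trend(observed, predicted):
    """Slope and intercept of residuals (predicted - observed) against predictions.

    A slope close to zero suggests no concentration-related systematic deviation.
    """
    residuals = [p - o for o, p in zip(observed, predicted)]
    return least_squares_line(predicted, residuals)
```

A constant offset between predictions and observations shows up in the intercept of `residual_trend`, while a concentration-dependent deviation shows up in its slope.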
*†* The distribution was divided into three monitoring regions (period 1: 0–60 days, period 2: 60–160 days, period 3: 160–400 days) and a different PD-SVR was applied in each of them. Both $\varepsilon$ and $C$ were tuned using an exponential memory decay: $\lambda_1 = 0.98$, $\lambda_2 = 0.995$, $\lambda_3 = 0.998$.

*2) Numerical analysis:* Table II shows results in the prediction task for all models in the validation set. Results do not yield significant numerical differences between models in terms of accuracy (RMSE) or goodness-of-fit ($r$) (see Table II). A better performance is achieved with PD-SVR, NNOE, NNSSIF, and FIR models regarding accuracy and success rates. The PD-SVR method performs better than MLP, FIR, and the standard formulation of the SVR. The least biased models are the Elman recurrent network (0.30 ng/mL), SVR (0.38 ng/mL), and PD-SVR (0.36 ng/mL) methods. In addition, the size of the difference between CI95% is similar for all models, but SVR and PD-SVR keep them symmetrically distributed. An ANOVA test shows that no statistical differences in bias (F=0.6292, $p$=0.7540) or accuracy (F=0.3912, $p$=0.9259) are observed.

Figure 4 shows predicted-versus-observed and predicted-versus-residuals plots in the validation set for the PD-SVR, which yields the best compromise between accuracy and bias of the estimations. Good determination coefficients ($r^2 = 0.78$) and negatively biased estimations (linear regression; slope ± CI95%: 1.112±0.097, intercept ± CI95%: -27.524±24.755) are observed. Residuals do not indicate any trend. As the figure shows, all the models capture abrupt changes in the time series of CyA blood concentration in the patients and perform similarly in the early post-transplantation period. Nevertheless, three patients have very poor predictions (RMSE > 60 ng/mL), which could be due to errors in drug dosage administration, to the inter- and intrasubject variability in the drug absorption process, to the recording of blood sampling times, or to abrupt changes in each patient's clinical condition. A linear dependency on interindividual variability could explain those results. In fact, patients with highly biased estimations are precisely those with high values of the coefficient of variation (CV > 30% on the whole series and CV > 45% in the early post-transplantation month). When these patients are not considered, results improve drastically ($r$=0.78, ME=0.27 ng/mL, RMSE=49 ng/mL, BLAP20%=75%).

As an example of these situations, we show the evolution (CyA dosage and blood concentrations) of three patients with good (Fig. 5a), acceptable (Fig. 5b) and poor (Fig. 5c) predictive performances. In the same figure, we show CyA concentration predictions of MLP, FIR network, and the SVR method. We usually obtain good predictions (RMSE < 40 ng/mL) in patients under 50 years old, with total body weight higher than 50 Kg and with a low number of subtherapeutic or toxic CyA blood levels (< 10%). We have observed average over-estimations (ME > 0) in patients with accurate predictions even in the early post-transplantation days or when receiving moderate doses. Abrupt changes in the time series produce appreciable loss in the prediction, as shown in Fig. 5b, where in two weeks, a toxic level is reached with a moderate dosage. Even with these difficulties, models efficiently capture the dynamics in the first post-operatory month. The poor results obtained in Fig. 5c can be mainly due to the presence of abrupt changes in the first post-transplantation month. These poor predictions, even with moderate initial doses (< 8 mg/Kg), could be due to the higher variability in the absorption and disposition process of CyA during the first four weeks of post-transplantation. In fact, under-prediction in this period is a common characteristic of the models for almost all the patients, and the prediction of CyA blood concentration presents serious difficulties. This problem is lessened as long as blood levels become more stable, which is when slight over-predictions are obtained (see
patients in Fig. 5a and 5b).

Fig. 6. Evaluation of the (a) (absolute) mean error and the (b) RMSE measurement when additive Gaussian noise with zero mean and standard deviation $\sigma$ is introduced in the predictive models. Results refer to the validation set and were repeated 100 times, which represents a reasonable confidence margin for the measurements. (Horizontal axis in both panels: standard deviation of the Gaussian noise, $\sigma$.)

Since no numerical or statistical differences were observed between the neural and kernel models, we decided to test their robustness by introducing additive noise at the model inputs. This can simulate situations such as blood sampling errors, patient compliance and the sensitivity of the model to exact input values. This process was tested with the most precise and unbiased models (ELMAN, FIR, PD-SVR) and the classical MLP network. In Fig. 6, we show the performance (bias and accuracy) of the models when different levels of noise standard deviation ($\sigma$) are introduced. Both measurements increase as the noise level is increased. However, as $\sigma$ is increased, PD-SVR shows an excellent behaviour regarding bias and accuracy. We conclude that PD-SVR offers excellent robustness capabilities when low noise levels are introduced ($\sigma < 0.05$), which indicates less sensitivity to exact input values in normal situations. Certainly, regularisation not only provides smoother solutions but also improves stability of predictions. This issue has been extensively demonstrated in the literature in general, and in our problem in particular.
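The robustness experiment described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: `model` stands for any hypothetical callable that maps one (standardised) input pattern to a predicted concentration, and the function name and the seeding are assumptions.

```python
import math
import random


def noise_robustness(model, inputs, targets, sigma, repeats=100, seed=0):
    """Average |ME| and RMSE over `repeats` runs with Gaussian noise on the inputs.

    Zero-mean noise with standard deviation `sigma` is added independently to
    every input component before the (hypothetical) model is evaluated.
    """
    rng = random.Random(seed)
    abs_me_runs, rmse_runs = [], []
    for _ in range(repeats):
        errors = []
        for x, y in zip(inputs, targets):
            noisy = [v + rng.gauss(0.0, sigma) for v in x]
            errors.append(model(noisy) - y)
        abs_me_runs.append(abs(sum(errors) / len(errors)))
        rmse_runs.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return sum(abs_me_runs) / repeats, sum(rmse_runs) / repeats
```

Calling this function over a range of `sigma` values and plotting the two returned quantities gives curves of the kind summarised in Fig. 6.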

Fig. 5. Plots of evolution of three individual patients showing (a) good (RMSE=34.8 ng/mL), (b) acceptable (RMSE=42.9 ng/mL), and (c) unsatisfactory (RMSE=73.1 ng/mL) predictive performance. For each patient, the upper panel shows observed (thick solid line) and predicted CyA blood concentration (ng/mL) using PD-SVR (thin solid line), SVR (thinner solid line), FIR (thin dashed line), and MLP (thin dotted line). In each bottom panel, the oral dose (mg/Kg/d) versus post-operatory day is represented for proper analysis.

*C. Levels identification*

Even though the previous forecasting models are accurate, they do not capture abnormal CyA levels (only 5% of toxic levels and 4% of subtherapeutic levels are correctly predicted), and thus they would not aid in preventing nephrotoxicity or transplant rejection. An alternative approach consists in predicting whether the next CyA blood level increases (decreases) to a toxic (subtherapeutic) level. With such an approach, the prediction task becomes a classification problem with three classes (CyA levels <150, [150, 400] or >400 ng/mL) and thus $N_c = 3$. For this purpose, we developed two schemes: (1) the one-against-all classification scheme, in which each of the three binary classifiers is trained to distinguish the samples in a given class from the samples in the two remaining classes; and (2) the one-against-one scheme, in which $N_c(N_c-1)/2$ binary classifiers are developed to distinguish a pair of classes.
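The class definition and the two decomposition schemes can be illustrated with a short Python sketch. The function and label names are hypothetical; only the thresholds (150 and 400 ng/mL) and the pair count $N_c(N_c-1)/2$ come from the text.

```python
from itertools import combinations

CLASSES = ("subtherapeutic", "therapeutic", "toxic")


def cya_class(level_ng_ml, low=150.0, high=400.0):
    """Map a CyA blood level to one of the three classes (<150, [150, 400], >400)."""
    if level_ng_ml < low:
        return "subtherapeutic"
    if level_ng_ml > high:
        return "toxic"
    return "therapeutic"


def one_against_all_targets(labels, positive):
    """Binary targets (+1 / -1) for the classifier dedicated to class `positive`."""
    return [1 if label == positive else -1 for label in labels]


def one_against_one_pairs(classes=CLASSES):
    """All unordered class pairs; there are Nc * (Nc - 1) / 2 of them."""
    return list(combinations(classes, 2))
```

With three classes both schemes happen to require three binary classifiers, but they are trained on different subsets: one-against-all uses every sample for each classifier, whereas one-against-one uses only the samples of the two classes involved.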

*1) Model development and comparison: *In this task, two
approaches were considered; one-against-all and one-against-
one classifiers. With regard to the MLP models, we varied the number of hidden neurons ($<20$ to avoid overfitting), the weight initialisation range, and the learning rate (between 0.01 and 3) in order to determine the best topology. We tested linear, polynomial, and Radial Basis Function (RBF) kernels to obtain the SVM solutions. The same ranges as the ones presented in Section IV-B.1 were used. Additional parameters for the PD-SVM were the $k_i$, which were heuristically tuned.

In a multiclass problem, one usually optimizes a *global* measurement of model performance, such as the overall success rate (SR[%]). However, in unbalanced datasets, it is more useful to pay attention to the sensitivity/specificity in order to avoid skewed results. In our case, models were selected by evaluating the average of the sensitivity (SE) and specificity (SP) factors obtained in the *classes of interest* (toxic and subtherapeutic levels), since the final goal is to obtain a highly sensitive, robust classifier capable of providing 'alarm signals' for patient monitoring. The sensitivity determines the percentage of positive cases that are correctly classified, e.g. CyA levels greater than 400 ng/mL correctly classified, and specificity determines the percentage of negative cases that are correctly classified, e.g. greater than 150 ng/mL correctly classified. Since the distribution of classes is highly unbalanced (4.95% of the cases are subtherapeutic levels and 5.77% of the cases are toxic levels), committed errors were systematically penalized according to the number of cases in each class [47]. In the case of the PD-SVM, the former priors-based penalisation multiplies the penalisation provided by the specific profile (Eqs. (13) and (14)).

TABLE III. CONFUSION MATRICES IN THE VALIDATION SET OF THE MLP (IN BRACKETS), SVM (ITALICS) AND PD-SVM (BOLD FACE) MODELS FOR PREDICTING TOXIC AND SUBTHERAPEUTIC LEVELS USING ONE-AGAINST-ALL (TOP) AND ONE-AGAINST-ONE SCHEMES (BOTTOM) *†*.

**ONE-AGAINST-ALL CLASSIFICATION SCHEME**, **Actual CyA levels [ng/mL]**: **24** *24* (21) | **78** *123* (92) | **1** *1* (4) | **264** *194* (244) | **2** *2* (2) | **1** *1* (1) | **42** *67* (48) | **13** *13* (14)

**ONE-AGAINST-ONE CLASSIFICATION SCHEME**, **Actual CyA levels [ng/mL]**, **150 ng/mL**: **25** *24* (24) | **143** *150* (144) | **1** *2* (2) | **184** *170* (173) | **1** *1* (1) | **0** *0* (0) | **57** *64* (67) | **16** *16* (15)

*†* The best results for the PD-SVM were obtained using $k_2$=1, $k_1$=0.3, $k_o$=0 for identifying toxic levels, $k_2$=-1, $k_1$=-0.4, $k_o$=600 for subtherapeutic levels, and $k_2$=1, $k_1$=0.12, $k_o$=0.05 in the one-against-one scheme.

*2) Numerical comparison:* Table III[top] shows the performance of the one-against-all models. High rates of the (SE+SP) criterion ($>81\%$) were obtained for all classifiers. This result enables their use as sensitive models in real clinical practice. Specifically, the SVM classifier yields a higher sensitivity ratio than MLP (size of the difference is 11.54%) but it yields a lower recognition rate (size of the difference is 5.88%) when predicting subtherapeutic levels. In any case, sensitivity is much better than in a previous work by Hirankarn et al. [9], in which accuracy in subtherapeutic ranges was about 62%. This issue has been addressed by using the PD-SVM, which improves results of the standard SVM and MLP models, especially significant for subtherapeutic level detection.

Fig. 7. Receiver operating characteristic (ROC) curve of the MLP and the SVM methods in the validation set when used as dedicated prediction models of (a) subtherapeutic levels and (b) toxic levels. The decision limit $\gamma$ was varied throughout the output range [-1, +1] to obtain this curve. Circles (MLP), crosses (SVM) and stars (PD-SVM) represent the origin of the decision limit. (Horizontal axis in both panels: % False Alarms (100-Specificity).)

In order to assess the models' performance, we also calculated the area under the receiver operating characteristic (ROC) curves for the dedicated classifiers of toxic and subtherapeutic blood levels (Fig. 7). The plot of false alarms versus hits provides a useful way to compare models for a wide range of 'levels of discrimination', and thus it has become a traditional method for model assessment. In our models, the area under the ROC curve (AUC) is higher with the PD-SVM method in detecting both subtherapeutic (MLP$_{150}$: 76.47%, SVM$_{150}$: 82.77%, PD-SVM$_{150}$: 87.43%) and toxic levels (MLP$_{400}$: 89.83%, SVM$_{400}$: 95.11%, PD-SVM$_{400}$: 97.45%). In addition, the PD-SVM reduces the number of false positives for both models (see Table III[top]) and the number of SVs (28% against 35% of the whole training set for the standard formulation). An additional comment refers to the distance between discrimination levels. When large jumps from one level of discrimination to another are found in these curves, a lack of knowledge of the classifier's behaviour in that area is present. In this sense, it is worth noting that very similar curves are observed for toxic level models (Fig. 7b), but (slightly) lower confidence can be obtained by using an MLP for predicting subtherapeutic levels (Fig. 7a). Similar results are observed between the SVM and its profiled version. The latter conclusions can also be observed by inspecting the false alarm rate in Table III[top]. In general, the ROC curves relative to the PD-SVM are above and on the left compared with the ROC curves relative to the other models.

Table III[bottom] shows the one-against-one confusion matrix. Several conclusions can be drawn: (1) The PD-SVM method yields better results than the rest, specifically the MLP, with a rise of 2.23% in terms of SR[%] and 6.25%
in terms of SE+SP; (2) a dramatic error in classifying toxic patterns is committed by the MLP; (3) the PD-SVM improves sensitivity of detection of subtherapeutic levels and drastically reduces the misclassification rate of therapeutic levels; and (4) once again model complexity is reduced with the profile-dependent technique, by which we obtain a mean reduction of 4% of SVs *per* class.

*3) Statistical comparison:* As we did in the prediction approach, we have analyzed not only the numerical but also the statistical differences among classifiers and schemes. For this purpose, we have computed the statistical pairwise comparison of two classifiers through $Z$-scores [60]. In general, SVM-based models give better performance in terms of sensitivity, but MLP is (slightly) better in specificity. In particular, PD-SVM yields better (SE+SP) scores, so it is a more sensitive, specific and balanced classifier in all schemes. Statistical tests yielded $Z$ scores higher than 1.96 for all classifiers, and thus results are significant and classifications are better than random choice. An interesting result is that the PD-SVM and MLP are preferred statistically when working in one-against-all schemes ($Z_{\text{PD-SVM}} = 5.38$, $Z_{\text{SVM}} = 3.00$, $Z_{\text{MLP}} = 4.12$) than in one-against-one schemes ($Z_{\text{PD-SVM}} = 3.00$, $Z_{\text{SVM}} = 2.60$, $Z_{\text{MLP}} = 2.62$), in which no appreciable differences appear. Performing pairwise statistical comparisons, one can conclude that only the PD-SVM is significantly different from the other classifiers in the one-against-all scheme, and no statistical differences appear in the one-against-one schemes. These results match the ones shown in [61], in which the authors pointed out that, on some occasions, a one-against-all scheme can be as accurate as any other approach.

*D. Models analysis*

Knowledge discovery is defined as "the process of identifying valid, novel, potentially useful, and ultimately understandable structure in data" [62]. The scientific community is not only searching for methods that provide accurate estimations of the underlying system function, but also for methods that explain those complex, and often non-linear, relationships from the input-output mapping performed by the models. In this paper, sensitivity analyses for the MLP and insight on SV distribution for the PD-SVR have been used in order to gain knowledge about the problem.

*1) Sensitivity analysis:* Sensitivity analysis is used to study the influence of input variables on the dependent variable and consists of evaluating the changes in training error that would result if an input were removed from the model. This measure, commonly known as *delta error* in the literature, produces a valuable ranking of the relevance of the variables. Two additional sensitivity measures, which are based on perturbing an input and monitoring network outputs, can be computed: the *Average Gradient (AG)* and the *Average Absolute Gradient (AAG)*. All these measurements are extensively described in the literature.

TABLE IV. RANKING OF INPUT VARIABLES ACCORDING TO THE DELTA ERROR (DE), AVERAGE GRADIENT (AG) AND AVERAGE ABSOLUTE GRADIENT (AAG) MEASUREMENTS FOR THE BEST MLP. THE MOST RELEVANT INPUT VARIABLES ARE DAILY DOSAGE (DD), CYA BLOOD CONCENTRATION (C), CREATININE (CR), POST-TRANSPLANTATION DAYS (PTD), AND HEMATOCRIT (HTO) FOR POST-TRANSPLANTATION DAYS $t$ AND $t-1$.

In Table IV, different rankings in accordance with these measurements are shown for the MLP. Only the top seven relevant inputs are shown. Several conclusions can be drawn. The most informative variables considered by the model are the past values of dosage, CyA blood concentration and creatinine level. In a second level of relevance, we find the post-transplantation days along with past hematocrit and creatinine clearance levels. By analysing the sign of DE and AAG we can conclude that, on average, an increase in past dosage produces an increase in future CyA blood concentration, which indicates that the model captures this relationship correctly. On average, lower creatinine levels are associated with an increase in CyA blood concentration. These results agree with those obtained when other machine learning approaches [19] and NONMEM modeling (ANOVA and univariate analysis methods) [20], [21] were used.

*2) Distribution of the Support Vectors:* SVMs have proven to be well-suited techniques in classification and regression tasks. An additional advantage also arises from their use: the solution is expressed as a linear combination of some instances and, thus, their analysis offers some knowledge gain about the problem. Indeed, the final model is a good compromise between accuracy (almost 80% of predictions with errors under 20%) and simplicity (24% of samples become support vectors). Support vectors are mainly scattered around CyA blood levels of 320 ng/mL ($\pm 100$, $p > 0.05$) and in patients who weigh more than 45 Kg and who are over 50 years old.

We also compared distributions of the entire training set and the obtained support vectors using Principal Component Analysis (PCA). After their respective diagonalisation and standardising, we evaluated the scatter degree in every subset as a measure of the distance, $d(i, j)$, among the eigenvectors, $\mathbf{v}_i$ and $\mathbf{v}_j$, weighted by their eigenvalues, $\lambda_i$ and $\lambda_j$:

$$d(i, j) = \|\lambda_i \mathbf{v}_i - \lambda_j \mathbf{v}_j\|$$

This distance was averaged over all possible pairs $(i, j)$ of different eigenvectors. No geometrical differences were found between the original ($\bar{D}_T \pm \sigma_{D_T}$: 6.22±3.14) and the SV data set (6.44±3.64), which suggests that SVs scatter in a way similar to that for the whole data set and, consequently, reveals that a robust solution has been achieved.

V. DISCUSSION AND CONCLUSIONS

In this paper, we have presented time series prediction and classification approaches for a complex TDM problem.

We have compared state-of-the-art support vector machines and neural networks. Model comparison has been carried out in terms of accuracy and robustness. A novel kernel-based approach has been presented, which allows the incorporation of *a priori* knowledge and improves results in both approaches. Finally, we have analyzed model structure by performing sensitivity analyses on the best MLP model and inspecting the distribution of the support vectors on the best SVM model. These methods not only provide a ranking of relevant variables, but also constitute a methodology for model assessment.

The prediction of immunosuppressive blood concentrations is a challenging issue and leads to difficulties in selecting the optimum drug dose to avoid graft rejection and minimize adverse effects. Intensive drug monitoring is necessary in order to keep blood concentrations within the proper range. In this context, we have presented the formulation of state-of-the-art models that could help to individualize the CyA posology.

Blood concentration models have been built to achieve accuracy and robustness. By predicting concentration instead of dosage we achieve objectivity and usefulness, since the latter is based on a certain protocol of dosage administration that could disturb the final goal of TDM. In fact, this approach attempts to assist the health care team in dosage individualisation, since physicians could take the blood concentration estimation as a helpful guide for dose administration. In [10], [12], we presented a scheme of two chained models where the CyA blood concentration predicted by the concentration prediction model constituted an input to the dosage prediction model. This system could serve as a dosage guide to the clinician, but it presented two basic problems. First, dosing follows a therapeutic guideline, which makes predicting dosage an indirect way of predicting the doctor's protocol. Second, from our particular point of view and experience, a direct translation of concentration to dosage could influence, or even replace, the doctor's own decision. A decision-support system can be defined as "a computer-based algorithm that assists a clinician with one or more component steps of the diagnostic process" [64], and thus, it should only aid doctors, rather than influence them or substitute them. In contrast, the scheme presented in this paper provides the clinician with different signals, rankings, and follow-up information.

Despite the fact that the results obtained for the blood concentration prediction are acceptable, they are inferior to those for classification purposes. Nevertheless, their joint consideration could be a valuable help in TDM. The best results for the CyA blood concentration prediction were obtained using the PD-SVR, where an ME of 0.36 ng/mL and an RMSE of 52.01 ng/mL were observed in the validation set. Our results clearly improve on a previous work that followed the time series methodology [26], in which an MLP with lagged inputs was used to predict CyA levels in renal allograft recipients and the results were not optimal (bias: 25 ng/mL, precision: 74 ng/mL in the test set). From a statistical point of view, there are no significant differences between the neural and kernel models developed in our work. However, from our analysis of model robustness, we can conclude that although dynamic neural models give good results, PD-SVM yields a more robust solution, which is a direct consequence of using a regularized and tailored model.

With regard to the classification problem, both one-against-all and one-against-one schemes have been attempted. Models could be useful for clinicians in designing dosage regimens to avoid toxic and subtherapeutic cyclosporine ranges and, in turn, to reduce costs to the Health Care System. The SVM method performs slightly better than the MLP in both schemes regarding sensitivity rates and AUC. Once again, designing specific penalisation profiles has produced better results.

Based on these outcomes, the application of SVM in the context of TDM can become a clinically useful tool. In this sense, we implemented the best model in an easy-to-use computer program in order to aid in the individualisation of dosage and pharmacotherapeutical attention (Fig. 8), which in turn brings state-of-the-art models closer to clinicians [65]. The main limitation encountered in this work is due to the group's location. Since patients were all from the same nephrology units, they had a series of characteristics in common. For example, the treatment guidelines and protocol administration for the patients were similar, which means that extrapolations to other centres should be treated with caution. Furthermore, a strict test should be performed before using the application in new situations. This, however, should not prevent the use of SVM methodology in other nephrology units, where the models should be implemented taking into account the local population characteristics and dosing protocols.

Fig. 8. Windows of the application for predicting the CyA blood concentration.

Further studies are necessary in order to explore statistical differences between methods, the influence of clinical covariates, and the expansion of the predictive performance up to long-term follow-up. However, there is no doubt that the appearance of new protocols based on two-hour post-dosing monitoring (C$_{2h}$) constitutes the new cornerstone in CyA TDM. At present, there are only limited C$_{2h}$ data in our hospital, but in a few years, we expect a substantial amount to be collected. The poor preliminary results obtained with neural networks and ARMA modeling [66] encourage the use of SVM in this new application.

The authors want to express their gratitude to Prof. Antonio J. Serrano-López (Universitat de València, Spain) for his useful discussions and references on Decision-Support Systems, to Sergio Sánz (TISSAT S.A., Spain) for his valuable help in software development, to Prof. Ángel Navia-Vázquez
(Universidad Carlos III, Spain) for his useful comments on recurrent networks for uneven sampling problems, and to Dr. Begoña Porta-Oltra (Pharmacy Service of the Dr. Peset University Hospital of València, Spain) for the stimulating clinical discussions and careful data collection.

[1] P. Belitsky, "Neoral used in the renal transplant recipient," *Transplantation Proceedings*, vol. 32, no. 3A Suppl. Review., pp. S10–S19, May

[2] L. A., "Factors influencing the pharmacokinetics of cyclosporine in man," *Therapeutic Drug Monitoring*, vol. 13, no. 6, pp. 465–477, Nov

[3] J. Parke and B. G. Charles, "NONMEM population pharmacokinetic modeling of orally administered cyclosporine from routine drug monitoring data after heart transplantation," *Therapeutic Drug Monitoring*, vol. 20, no. 3, pp. 284–293, Jun 1998.

[4] B. Charpiat, I. Falconi, V. Bréant, R. W. Jellife, J. M. Sab, C. Ducerf, N. Fourcade, A. Thomasson, and J. Baulieux, "A population pharmacokinetic model of cyclosporine in the early postoperative phase in patients with liver transplants, and its predictive performance with Bayesian fitting," *Therapeutic Drug Monitoring*, vol. 20, pp. 158–164, 1998.

[5] M. E. Brier, J. M. Zurada, and G. R. Aronoff, "Neural network predicted peak and trough gentamicin concentrations," *Pharmaceutical Research*, vol. 12, no. 3, pp. 406–412, 1995.

[6] A. S. Hussain, R. D. Johnson, N. N. Vachharajani, and R. W. A., "Feasibility of developing a neural network for prediction of human pharmacokinetic parameters from animal data," *Pharmaceutical Research*, vol. 10, no. 3, pp. 466–469, Mar 1993.

[7] P. Veng-Pedersen and N. Modi, "Application of neural networks to pharmacodynamics," *Journal of Pharmaceutical Sciences*, vol. 82, pp. 918–926, 1993.

[8] A. E. Gaweda, A. A. Jacobs, M. E. Brier, and J. M. Zurada, "Pharmacodynamic population analysis in chronic renal failure using artificial neural networks – a comparative study," *Neural Networks*, vol. 16, no. 5-6, pp. 841–845, 2003.

[9] S. Hirankarn, C. Downs, W. Street, and R. A. Herman, "Prediction of two ranges of cyclosporine level (subtherapeutic and toxic) using feature subset selection and artificial neural networks," in *AAPS Annual Meeting*, vol. 2 (2), Orlando, U.S.A., 2000, abstract 1274.

[10] G. Camps-Valls, E. Soria-Olivas, B. Porta-Oltra, J. J. Pérez-Ruixo, J. D. Martín-Guerrero, A. J. Serrano-López, and N. V. Jiménez-Torres, "A neural approach to cyclosporine dose prediction," *World Congress on Medical Physics and Biomedical Engineering*, July 2000.

[11] G. Camps-Valls, E. Soria-Olivas, J. D. Martín-Guerrero, J. J. Pérez-Ruixo, and N. V. Jiménez-Torres, "Neural networks ensemble for cyclosporine concentration monitoring," in *International Conference on Artificial Neural Networks*, vol. 2130. Vienna, Austria: Lecture Notes in Computer Science. Springer–Verlag., Aug 2001, pp. 706–711.

[12] G. Camps-Valls, B. Porta-Oltra, E. Soria-Olivas, J. D. Martín-Guerrero, A. J. Serrano-López, J. J. Pérez-Ruixo, and N. V. Jiménez-Torres, "Prediction of cyclosporine dosage in patients after kidney transplantation using neural networks," *IEEE Transactions on Biomedical Engineering*, vol. 50, no. 4, pp. 442–448, April 2003.

[13] G. Camps-Valls, E. Soria-Olivas, J. Pérez-Ruixo, A. Artés-Rodríguez, F. Pérez-Cruz, and A. Figueiras-Vidal, "Cyclosporine concentration prediction using clustering and Support Vector Regression methods," *IEE Electronics Letters*, vol. 38, no. 6, pp. 568–570, June 2002.

[14] M. Oellerich, V. W. Armstrong, B. Kahan, L. Shaw, D. W. Holt, R. Yatscoff, A. Lindholm, P. Halloran, K. Gallicano, and K. Wonigeit, "Lake Louise consensus conference on cyclosporin monitoring in organ transplantation: report of the consensus panel," *Therapeutic Drug Monitoring*, vol. 17, pp. 642–654, Dec 1995.

[15] T. A. S. Assays, "Manual analitique," Rundix Cedex, France: Laboratories ABBOTT, Division Diagnostic, XII-CYCLO-MONO-13.

[16] T. Kohonen, *Self-Organizing Maps*, 3rd ed. Springer Series in Information Sciences, Vol. 30, 2001.

[17] L. Breiman, J. Friedman, R. Olshen, and C. Stone, *Classification and* Monterey, CA: Wadsworth and Brooks, 1984.

[18] A. Abraham and D. Steinberg, "Is neural network a reliable forecaster on earth? a MARS query!" in *Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence*, J. M. A. P. (Eds.), Ed. Notes on Computer Science. LNCS2084. Springer-Verlag, 2001.

[19] G. Camps-Valls, "Redes neuronales y máquinas de vectores soporte para la predicción y modelización de la concentración valle de ciclosporina A (CyA) en pacientes con trasplante renal," Ph.D. dissertation, Departament d'Enginyeria Electrònica. Universitat de València, July 2002,

[20] B. Porta, J. J. Pérez-Ruixo, N. V. Jiménez, A. Sancho, and L. M. Pallardó, "Individualización posológica de ciclosporina en pacientes con trasplante renal: propuesta de un modelo farmacocinético en predicción." *Farmacia Hospitalaria*, vol. 22, no. 4, pp. 181–187, 1998.

[21] B. Porta, "Modelado farmacocinético de ciclosporina en pacientes con trasplante renal," Ph.D. dissertation, Departament de Farmàcia Hospitalaria i Galènica. Universitat de València., 2002.

[22] J. F. Hair, R. E. Anderson, R. L. Tatham, and W. C. Black, *Multivariate Data Analysis*, 5th ed. New Jersey, U.S.A.: Prentice-Hall International,

[23] B. D. Kahan, W. G. Kramer, C. A. Wideman, S. M. Flechner, M. Lorber, and C. T. van Buren, "Demographics factors affecting the pharmacokinetics of cyclosporine estimated by radioimmunoassay," *Transplantation*, vol. 41, pp. 459–464, 1986.

[24] A. S. Weigend and N. A. Gershenfeld, *Time Series Prediction. Forecasting the Future and Understanding the Past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis held in Santa Fe, New Mexico, May 14–17, 1992. Proceedings Volume XV*. Addison–Wesley, 1994, vol. XV.

[25] S. Haykin, *Neural Networks: A Comprehensive Foundation*, 3rd ed. New Jersey, U.S.A.: Prentice Hall, 1999.

[26] M. E. Brier, "Empirical pharmacokinetic predictions for cyclosporine using a time series neural network," *Pharmaceutical Research*, vol. 12, no. Suppl. S363, 1995.

[27] M. Nørgaard, O. Ravn, and N. Poulsen, "NNSYSID & NNCTRL – tools for system identification and control with neural networks," *IEE Computing & Control Engineering Journal*, vol. 12, no. 1, pp. 29–36,

[28] L. Ljung, *System Identification. Theory for the user*, 2nd ed. New Jersey, U.S.A.: Prentice-Hall International, Inc., 1999.

[29] E. A. Wan, "Finite Impulse Response neural networks with applications in time series prediction," Ph.D. dissertation, Department of Electrical Engineering. Stanford University, November 1993, available

[30] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme recognition using time-delay neural networks," *IEEE Transactions on Acoustics, Speech, and Signal Processing*, vol. 37, pp. 328–339, 1989.

[31] E. A. Wan, "Modeling nonlinear dynamics with neural networks: Examples in time series prediction," in *Proceedings of the Fifth Workshop on Neural Networks: Academic/Industrial/NASA/Defense, WNN93/FNN93.*, San Francisco, U.S.A., November 1993, pp. 327–332, available at

[32] B. de Vries and J. C. Principe, "The Gamma model – a new neural model for temporal processing," *Neural Networks*, vol. 5, no. 4, pp. 565–576, 1992.

[33] J. C. Principe, B. deVries, and P. G. deOliveira, "The gamma filter – A new class of adaptive IIR filters with restricted feedback," *IEEE Transactions on Signal Processing*, vol. 41, no. 2, pp. 649–656, Feb

[34] J. C. Principe, B. de Vries, J. Kuo, and P. Guedes-de Olivera, "Modeling applications with the focused Gamma net," in *Neural Information Processing Systems, NIPS*, 1991, pp. 143–150, available from

[35] J. L. Elman, "Finding structure in time," *Cognitive Science*, vol. 14, pp. 179–211, 1988.

[36] V. N. Vapnik, *Statistical Learning Theory*. Wiley, 1998.

[37] B. Schölkopf and A. Smola, *Learning with kernels*. MIT Press, 2002.

[38] B. Boser, I. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in *Proc. 5th Ann. Workshop on Computational Learning Theory*, D. Haussler, Ed. ACM Press, 1992, pp. 144–152.

[39] V. N. Vapnik, S. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," in *Neural Information Processing Systems*, M. Mozer, M. Jordan, and T. Petsche, Eds. Cambridge, MA: M.I.T. Press, 1997, pp. 169–184.

[40] B. Schölkopf and A. Smola, *Learning with Kernels – Support Vector Machines, Regularization, Optimization and Beyond*. MIT Press Series,

[41] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," *IEEE Transactions on Neural Networks*, vol. 13, no. 2, Mar 2002.

[42] B. Schölkopf, K.-K. Sung, C. J. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. N. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," *IEEE Transactions on Signal Processing*, vol. 45, no. 11, pp. 2758–2765, Nov. 1997.

[43] J. Weston and C. Watkins, "Multi-class support vector machines," in *ESANN*, 1999.

[44] U. H. G. Kreßel, "Pairwise classification and support vector machines," in *Advances in Kernel Methods: Support Vector Learning*. Cambridge, MA, U.S.A.: The MIT Press, 1999.

[45] V. N. Vapnik, *Statistical Learning Theory*. New York: John Wiley &

[46] B. Schölkopf, P. L. Bartlett, A. Smola, and R. Williamson, "Shrinking the tube: a new support vector regression algorithm," in *Advances in Neural Information Processing Systems 11*, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds. Cambridge, MA: MIT Press, 1999, pp. 330 –

[47] Y. Lin, Y. Lee, and G. Wahba, "Support Vector Machines for classification in nonstandard situations," University of Wisconsin-Madison, Department of Statistics TR 1016, 2000.

[48] C.-F. Lin and S.-D. Wang, "Fuzzy support vector machines," *IEEE Transactions on Neural Networks*, vol. 13, no. 2, pp. 464–471, 2002.

[49] Q. Tao, G.-W. Wu, F.-Y. Wang, and J. Wang, "Posterior probability support vector machines for unbalanced data," *IEEE Transactions on Neural Networks*, vol. 16, no. 6, pp. 1561–1573, 2005.

[50] F. Pérez-Cruz, "Máquina de vectores soporte adaptativa y compacta," Ph.D. dissertation, Dpto. Teoría de la Señal y Comunicaciones. Universidad Carlos III de Madrid., Dec. 2000.

[51] G. Camps-Valls, E. Soria-Olivas, J. Pérez-Ruixo, A. Artés-Rodríguez, F. Pérez-Cruz, and A. Figueiras-Vidal, "A profile-dependent kernel-based regression for cyclosporine concentration prediction," in *Neural Information Processing Systems, NIPS 2001. Workshop on New Directions in Kernel-Based Learning Methods*, December 2001.

[52] J. D. Martín-Guerrero, G. Camps-Valls, E. Soria-Olivas, A. J. Serrano-López, J. J. Pérez-Ruixo, and N. V. Jiménez-Torres, "Dosage individualization of erythropoietin using a profile-dependent support vector regression," *IEEE Transactions on Biomedical Engineering*, vol. 50, no. 10, pp. 1136–1142, June 2003.

[53] A. N. Refenes, Y. Bentz, D. W. Bunn, A. N. Burgess, and A. D. Zapranis, "Financial time series modeling with discounted least squares back-propagation," *Neurocomputing*, vol. 14, pp. 123–138, 1997.

[54] F. E. H. Tay and L. J. Cao, "Modified support vector machines in financial time series forecasting," *Neurocomputing*, vol. 48, pp. 847–861, 2002.

[55] G. Camps-Valls, A. Chalk, A. Serrano-Lopez, J. D. Martin-Guerrero, and E. Sonnhammer, "Profiled support vector machines for antisense oligonucleotide efficacy prediction," *BMC Bioinformatics*, no. 5, p. 135, available in OpenAccess: http://www.biomedcentral.com/1471-2105/5/135.

[56] G. Gómez-Pérez, G. Camps-Valls, J. Gutiérrez, and J. Malo, "Perceptual adaptive insensitivity for support vector machine image coding," *IEEE Transactions on Neural Networks*, vol. 16, no. 6, pp. 1574–1581, 2005.

[57] S. S. Keerthi and C.-J. Lin, "Asymptotic behaviors of support vector machines with gaussian kernel," *Neural Computation*, vol. 15, no. 7, pp. 1667–1689, 2003.

SVM and the training of non-PSD kernels by SMO-type methods," National Taiwan University, Department of Computer Science and Information Engineering, Tech. Rep., 2003, available at http://www.csie.ntu.edu.tw/~cjlin/papers/tanh.pdf.

[59] C. Cortes and V. Vapnik, "Support vector networks," *Machine Learning*, vol. 20, pp. 273–297, 1995.

[60] R. Congalton and K. Green, *Assessing the Accuracy of Remotely Sensed Data. Principles and Practices*, 1st ed. U.S.A.: CRC Press, 1999.

[61] R. Rifkin and A. Klautau, "In defense of one-vs-all classification," *Journal of Machine Learning Research*, vol. 5, no. 1, pp. 101–141, 2004.

[62] P. S. Bradley, U. M. Fayyad, and O. L. Mangasarian, "Mathematical programming for data mining: formulations and challenges," Mathematical Programming Technical Report 98-01, Computer Sciences Department, University of Wisconsin, WI, Tech. Rep. MSR-TR-98-04, Jan 1998.

[63] G. B. Orr and K.-R. Müller, *Neural Networks: Tricks of the Trade*. Springer-Verlag, Berlin, Heidelberg, 1998.

[64] E. S. Berner, *Clinical Decision Support Systems. Theory and Practice*, New York: Springer–Verlag, 1999.

[65] S. Sáez, E. Soria-Olivas, G. Camps-Valls, J. D. Martín-Guerrero, A. J. Serrano-López, and N. V. Jiménez-Torres, "Aplicación informática basada en redes neuronales temporales para problemas de farmacocinética clínica." in *XVIII Congreso Anual de la Sociedad Española de Ingeniería Biomédica, CASEIB2000*, Cartagena, Spain, Sep 2000, pp.

[66] G. Camps-Valls, A. J. Serrano-López, B. Porta-Oltra, J. D. Martín-Guerrero, E. Soria-Olivas, and N. V. Jiménez-Torres, "Neural networks for *C*2*h* cyclosporine concentration modelling," in *32nd European Symposium on Clinical Pharmacy, ESCP 2003.*, València, Spain, Sep

Source: http://www.tsc.uc3m.es/~fernando/IEEESMC.pdf
