
Two sample applications of MDPs and first policy iteration (FPI)

Markov decision processes (MDPs) and policy iteration are powerful tools for solving dynamic decision problems. Here we give two application examples:

1. Dynamic RWA problem in Wavelength Routed Optical Networks

  • In the Routing and Wavelength Assignment (RWA) problem, in the absence of wavelength converters, the task is to find a route and a wavelength for each end-to-end lightpath so that no link carries two lightpaths on the same wavelength.
  • In dynamic RWA problems, lightpath requests arrive according to a (Poisson) process
  • The currently established lightpaths constitute the state of the system
  • The decision on how a customer/request is handled triggers a state change in the system
  • This constitutes an MDP, where the objective is to minimize the blocking probability
  • The enormous state space defies the computation of the value function, and thus also policy iteration
  • The first policy iteration step can nevertheless be carried out by estimating the relative values with on-the-fly simulations [1-3]
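As a minimal sketch of the wavelength-continuity constraint described above (the toy network state and the first-fit assignment rule are illustrative assumptions, not the FPI method of [1-3]):

```python
def first_fit_wavelength(route, occupied, num_wavelengths):
    """Return the lowest wavelength that is free on every link of the route,
    or None if the request must be blocked (no wavelength converters)."""
    for w in range(num_wavelengths):
        if all(w not in occupied.get(link, set()) for link in route):
            return w
    return None

# Toy state: wavelengths already in use on each link.
occupied = {"A-B": {0}, "B-C": {0, 1}}
print(first_fit_wavelength(["A-B", "B-C"], occupied, 3))  # -> 2
print(first_fit_wavelength(["A-B", "B-C"], occupied, 2))  # -> None (blocked)
```

Each accepted lightpath updates the occupied sets, i.e., the state of the MDP; a blocked request is the event whose probability the control policy tries to minimize.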
Related publications:
[1]  E. Hyytiä and J. Virtamo, Dynamic Routing and Wavelength Assignment Using First Policy Iteration, in the Fifth IEEE Symposium on Computers and Communications, ISCC'2000, pp. 146-151, 2000, Antibes, Juan les Pins, France. (pdf)
[2]  E. Hyytiä and J. Virtamo, Dynamic Routing and Wavelength Assignment Using First Policy Iteration, Inhomogeneous Traffic Case, in the International Conference on Performance and QoS of Next Generation Networking, P&QNet2000, pp. 301-316, 2000, Nagoya, Japan. (pdf)
[3]  E. Hyytiä and J. Virtamo, Adaptive Importance Sampling in Routing and Wavelength Assignment, European Transactions on Telecommunications (ETT), vol. 13, no. 4, pp. 331-339, 2002, Special Issue on Rare Event Simulation. (pdf)
Figure: A sample RWA problem with 8 lightpaths.



2. Dispatching (Routing) Problem to Parallel Servers

  • Arriving customers are processed by a set of parallel servers
  • Each customer is assigned to a server upon arrival (the dispatching decision)
  • This constitutes an MDP, where the objective is to minimize:
    • the mean delay (latency, sojourn time)
    • the slowdown (the ratio of delay to job size)
    • some other weighted sum of the waiting and/or sojourn times
Some optimality results exist for the basic setting, where all customers are flexible and subject to the dispatching decision. For identical servers, exponentially distributed service times and a Poisson arrival process, Winston has shown that join-the-shortest-queue (JSQ) minimizes the mean delay when the number of jobs in each queue is available. The optimality of Round-Robin has been shown by Ephremides et al. for the case when the current state of the queues is not available. FPI-based approaches offer a general methodology for devising state-dependent policies, see, e.g., Krishnan.
More details can be found on the dispatching page.
Figure: A sample dispatching system.


  1. W. Winston, Optimality of the shortest line discipline, Journal of Applied Probability, 1977.
  2. A. Ephremides, P. Varaiya and J. Walrand, A simple dynamic routing problem, IEEE Trans. on Automatic Control, 1980.
  3. K. R. Krishnan, Joining the right queue: a state-dependent decision rule, IEEE Trans. on Automatic Control, 1990.