Game theoretic analysis for MIMO radars with multiple targets

This paper considers a distributed beamforming and resource allocation technique for a radar system in the presence of multiple targets. The primary objective of each radar is to minimize its transmission power while attaining an optimal beamforming strategy and satisfying a certain detection criterion for each of the targets. To this end, we use convex optimization methods together with noncooperative and partially cooperative game theoretic approaches. Initially, we consider a strategic noncooperative game (SNG), in which there is no communication between the radars of the system; hence, each radar selfishly determines its optimal beamforming and power allocation. Subsequently, we adopt a more coordinated game theoretic approach that incorporates a pricing mechanism. Introducing a price in the utility function of each radar/player forces the beamformers to reduce the interference induced on other radars and increases the social fairness of the system. Furthermore, we formulate a Stackelberg game by adding a surveillance radar to the system model; the surveillance radar plays the role of the leader, and the remaining radars are the followers. The leader applies a pricing policy on the interference charged to the followers, aiming to maximize its profit while keeping the incoming interference below a certain threshold. We also prove the existence and uniqueness of the Nash equilibrium (NE) in both the partially cooperative and the noncooperative games. Finally, simulation results confirm the convergence of the algorithm in all three cases.


I. INTRODUCTION
MULTIPLE-input multiple-output (MIMO) radar is an innovative technology that, over the last decade, has raised expectations of substantial improvements over currently used radar systems. The main characteristic that allows MIMO radar to offer superior capabilities compared with other radar regimes is its waveform diversity: MIMO radar can use multiple antennas to simultaneously transmit several orthogonal waveforms and multiple antennas to receive the signals reflected from the targets [1]. Two principal MIMO radar schemes are considered in the literature: systems with colocated antennas and systems with widely separated antennas (bistatic, multistatic) [2], [3]. The leading research fields within MIMO radar technology are beamformer and waveform design, detection optimization and radar imaging [4]-[6]. Building on advances in these fields, the main advantages offered by MIMO radar are higher angular resolution, direct applicability of adaptive array techniques, multiple-target detection and the ability to obtain spatial diversity in the target's radar cross section (RCS). Nevertheless, one substantial drawback of a multiple-target, distributed radar system that has not yet been completely resolved is the interference from multiple sources at the receivers of each radar. More specifically, inter-radar, intra-radar and clutter interference reduce the efficiency and degrade the performance of the radar system. Hence, an optimal beamforming and power allocation strategy is crucial, as it minimizes the interference between radars of the same organization while preserving a detection criterion. Game theory is a natural and effective tool for modeling this kind of interaction, as it offers a mathematical framework for conflict and cooperation between intelligent, self-interested and rational players.
The increasing need for independent, autonomous and decentralized communication systems has sparked much interest in game theoretic techniques in the communication literature [7]. More specifically, the distributed, multistatic beamforming and resource allocation problem in radar systems described above is comparable to similar issues raised in multicell wireless communication systems [8]-[16]. In [8], the authors introduced the idea of joint beamforming and power control, proposing an iterative algorithm that simultaneously obtains the optimal beamforming and power vectors. The incorporation of game theory in this context then rapidly became a focal point of communications research [9]-[14]. Most of this literature considers strategic noncooperative games (SNG), in which each player selfishly maximizes its payoff function given the strategies of the other players. The authors of [9] exploited an iterative water-filling algorithm to reach the Nash equilibrium in a noncooperative, distributed, multiuser power control problem. Since each player greedily optimizes its own utility function, the equilibrium need not be Pareto optimal. Introducing pricing policies on the system resources leads to a more Pareto-efficient solution and increases the social welfare of the system. A pricing regime that is a linear function of the transmit power was studied in [10]. Another example of pricing the transmit power of each player is considered in [11], whereas in [12] and [13] the pricing policy is applied to the intercell interference among the players. In [14], the authors optimize a set of precoding matrices at each node of a multichannel, multiuser cognitive radio MIMO network in order to minimize the total transmit power of the network, while applying a pricing scheme based on global information.
Cooperative game theoretic techniques combined with a two-level Stackelberg game were utilized in [15] to address the problem of relay selection and power allocation without the knowledge of channel state information (CSI). Finally, the authors in [16] formulated a Stackelberg Bayesian game to obtain the optimal power allocation for a two-tier network, while applying an interference constraint at the leader and considering channel gain uncertainty.
Game theory is also an efficient tool for overcoming various problems that arise in radar systems. In particular, the authors in [17] approached the problem of polarimetric waveform design by considering a zero-sum game between an opponent and the radar system engineer. A zero-sum game was also used in [18] to investigate the interaction between a MIMO radar and an intelligent target that applies jamming techniques. Potential game theory was exploited in [19] with the main objectives of optimal waveform design and maximization of the signal-to-interference-plus-noise ratio (SINR). A noncooperative game theoretic per-antenna power optimization based on signal-to-disturbance ratio (SDR) estimation with a desired SINR constraint was investigated in [20]. Noncooperative game theory was also employed in [21] to address the power control problem in a radar network. To address the power allocation problem, the authors of [22] used a cooperative game approach and exploited the Shapley value solution concept.
In this paper, inspired by the aforementioned game theoretic methods applied in communications [10]-[14], which we reinvestigate and adapt to the radar case, we develop a broad game theoretic analysis of the optimal beamforming and resource allocation problem in a MIMO tracking radar system with multiple targets. Initially, we consider an SNG in which each radar/player greedily optimizes its beamforming and power allocation vectors in two stages. In the first stage, the optimal transmit and receive beampatterns are designed using convex optimization techniques in a power minimization problem, while attaining a certain detection criterion. Once the optimal beampatterns are designed, the joint beamforming and resource allocation problem reduces to a power-only minimization game. Thus, in the second stage we obtain the best response strategy of a radar in the SNG setup and show that it is a standard function [32], which proves the uniqueness of the Nash equilibrium, similar to the work in [12] for wireless communication applications.
Because each radar acts selfishly and ignores the damage it may inflict on other radars through inter-radar interference, the resulting solution may not be optimal from a social welfare point of view. Since we assume that the radars belong to the same organization, it is reasonable to consider some form of cooperation and to introduce a pricing policy for all players in order to minimize the interference induced on other radars. More specifically, the radars are encouraged to steer their beams in directions that cause less damage to other players, which results in a more Pareto-efficient solution.
To complete our radar model, we incorporate a surveillance radar into the previously studied MIMO tracking radar system. The main task of the surveillance radar is to continuously search the operating area for new incoming targets. With the surveillance radar, our hybrid radar system can both acquire new targets and track every target in the operating field. However, all radars operate simultaneously, so the tracking radars interfere with the surveillance radar and increase its probability of false alarm. To secure the smooth operation of the system, we set a maximum limit on the interference induced at the surveillance radar. To achieve both the target SINRs and the interference limit at the surveillance radar, we use a Stackelberg game approach in which the surveillance radar is the leader and the MIMO tracking radars are the followers. We next introduce the system model.

Fig. 1: A multistatic MIMO radar network with two radars and two targets.

II. SYSTEM MODEL
We consider a multistatic radar network consisting of $K$ separate radars, each with $M$ transmit/receive antennas. The set of radars is denoted by $\mathcal C = \{1,\dots,K\}$. To complete the model, $L$ targets are assumed to lie in the far field of the radars, and the main objective of each radar is to attain a specific detection performance for every target using the minimum possible transmission power. In the noncooperative design of the multistatic radar network, the radars minimize their transmission power independently, with full knowledge of the uplink and downlink channels of their own radar but no knowledge of the inter-radar channel gains. Since the radars belong to the same organization, the design is not competitive, as there is no deliberate interference between the radars. However, since we do not assume communication between the radars, a noncooperative game is appropriate. An example of a multistatic radar network with two radars, two targets and clutter in the far field is illustrated in Fig. 1.
To detect the $l$th target, the transmit array of the $k$th radar emits the $l$th element of the independent, predesigned $L\times 1$ waveform vector $\boldsymbol\psi_k(t) = [\psi_{k1}(t),\dots,\psi_{kL}(t)]^T$, which satisfies the orthogonality condition $\int_{T_0}\boldsymbol\psi_k(t)\boldsymbol\psi_k^H(t)\,dt = \mathbf I_L$, where $(\cdot)^T$ denotes the transpose operator, $t$ is the time index within the radar pulse, $T_0$ is the radar pulse width, $\mathbf I_L$ is the $L\times L$ identity matrix, and $(\cdot)^H$ denotes the Hermitian transpose operator. Thus, the waveforms corresponding to different targets are uncorrelated, i.e., $\int_{T_0}\psi_{kl}(t)\psi_{kl'}^*(t)\,dt = 0$ for $l\neq l'$. We assume that the waveform vector maintains this orthogonality for a set of acceptable time delays $\tau_a, \tau_{a'}$ and Doppler frequency shifts $f_{Da}, f_{Da'}$ [23]. However, if the waveforms arrive with considerable delays and Doppler shifts, nonzero correlation between the waveforms may be expected. This correlation factor is denoted by $\epsilon_{k,l,l'}(\tau_{l,l'})$, where $\tau_{l,l'}$ is the delay of the waveform returned from the $l'$th target relative to that of the waveform returned from the $l$th target, and the relative Doppler difference is $\Delta f = f_{Dl} - f_{Dl'}$. This introduces interference between the signals returning from different targets, as discussed later. The $M\times 1$ signal vector transmitted from the $k$th radar and intended for the $l$th target is $\mathbf w_{t(k,l)}\psi_{kl}(t)$, where $\mathbf w_{t(k,l)}$ is the $M\times 1$ transmit beamforming vector from radar-$k$ to target-$l$. Hence, the overall signal transmitted from radar-$k$ is $\sum_{l=1}^{L}\mathbf w_{t(k,l)}\psi_{kl}(t)$. As depicted in Fig. 1, $\mathbf h_{kl}$ is the channel gain vector from target-$l$ to radar-$k$, and $\mathbf c_{kl}$ denotes the interfering clutter returns when the $k$th radar tags target-$l$. The cross-channel gain between radar-$k$ and radar-$i$ is denoted by $\boldsymbol\mu_{ki}$, and $\boldsymbol\lambda_{kij}$ represents the inter-radar interfering channel at the $k$th radar for the signal emitted from the $i$th radar and echoed from the $j$th target.
The uplink and downlink parts of the path gains are obtained with respect to the transmit and the receive beamforming vectors, respectively, where $\mathbf w_{r(k,l)}$ is the $M\times 1$ receive weight vector of radar-$k$ when aimed at target-$l$, $\beta_l$ is the complex amplitude proportional to the radar cross section (RCS) of target-$l$, $\beta_{cl}$ denotes the RCS amplitude of the clutter, and $\mathbf a(\theta_{kl})$ and $\mathbf b(\theta_{kl})$ are the $M\times 1$ transmit and receive steering vectors of radar-$k$, respectively. Here $d$ is the spacing between adjacent antennas, assumed to be the same for all radars; $\theta_{kl}$ is the azimuth of target-$l$ with radar-$k$ as reference; $\theta_{cl(k)}$ is the direction of the clutter as seen from the $k$th radar; $\theta_{rad(k,i)}$ is the direction of radar-$i$ as observed from radar-$k$; and $\lambda$ is the wavelength of the transmitted signal. From the definition, the transmit and receive steering vectors are equal, as the uplink and downlink channels remain constant over the duration of a full game. By matched filtering each of the orthogonal waveforms $\psi_{kl}(t-\tau_l)e^{j2\pi f_{Dl}t}$, $l = 1,\dots,L$, at the receiver of radar-$k$, the desired received signal for the detection of target-$l$ is obtained as in (1). In a distributed, multistatic and multitarget radar scheme, the detection of a target is degraded by direct and collateral inter-radar interference, by the signals that the same radar intends for other targets, by the clutter and by the noise. The resulting interference signal is modeled as in (2), where $\epsilon_{k,l,m,j}(\tau_{l,j})$ denotes the correlation factor between the waveform emitted from the $k$th radar and echoed by the $l$th target and the waveform emitted from the $m$th radar and echoed by the $j$th target.
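As a concrete illustration of the array model, the sketch below builds the steering vector of an $M$-element uniform linear array and evaluates the transmit beampattern $|\mathbf w^H\mathbf a(\theta)|^2$. Since the explicit definitions of $\mathbf a(\cdot)$ and $\mathbf b(\cdot)$ are not reproduced here, the phase convention $e^{j2\pi m d\sin\theta/\lambda}$ is an assumption, and the helper names (`steering_vector`, `beampattern`) are hypothetical. The geometry follows the simulation setup of Section VI (10 antennas, half-wavelength spacing, a target at 37° and the other radar at 72°).

```python
import cmath
import math


def steering_vector(theta_deg, num_antennas, d_over_lambda=0.5):
    """ULA steering vector a(theta); element m has phase 2*pi*m*(d/lambda)*sin(theta).

    The sign convention of the phase is an assumption for illustration.
    """
    phase = 2 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * phase) for m in range(num_antennas)]


def inner(x, y):
    """Hermitian inner product x^H y of two complex vectors."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def beampattern(w, theta_deg, d_over_lambda=0.5):
    """Transmit gain |w^H a(theta)|^2 of beamformer w in direction theta."""
    a = steering_vector(theta_deg, len(w), d_over_lambda)
    return abs(inner(w, a)) ** 2


# Unit-norm beamformer matched to a target at 37 degrees (10 antennas).
M = 10
w = [x / math.sqrt(M) for x in steering_vector(37.0, M)]
gain_target = beampattern(w, 37.0)   # matched beam: gain equals M
gain_radar = beampattern(w, 72.0)    # leakage toward the other radar
print(f"target gain {gain_target:.2f}, leakage toward radar-2 {gain_radar:.3f}")
```

A matched, unit-norm beam delivers the full array gain $M$ toward its target; the leakage toward other directions is what the pricing terms of Section IV penalize.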
Since the desired and interfering signals of radar-$k$ for target-$l$ are defined in (1) and (2), the corresponding SINR follows directly as (3), where $\|\cdot\|$ denotes the Euclidean norm. Using this system model, the next section describes the game theoretic formulation of the proposed scheme.

A. Game Theoretic Formulation
To determine the optimal transmit/receive beamformers and the power allocation among the radars, we formulate an SNG. The radars are the players, so the player set is $\mathcal C = \{1,\dots,K\}$. The transmit beamforming matrix $\mathbf W_{t(k)} = [\mathbf w_{t(k,1)},\dots,\mathbf w_{t(k,L)}]$ is the strategy of player-$k$, and $\mathbf W_{t(-k)}$ denotes the strategies chosen by the other players. Hence, the acceptable strategy set of radar-$k$ is
$$\mathcal P_k(\mathbf W_{t(-k)}) = \{\mathbf W_{t(k)} \mid \mathrm{SINR}_{kl}\ge\gamma_{kl},\ \forall l\},$$
where $\gamma_{kl}$ is the desired SINR for target-$l$ when illuminated by the antennas of radar-$k$. The choice of the desired SINR depends on the probabilities of misdetection $P_{md}$ and false alarm $P_{fa}$, which are derived in [24], [25] as functions of $\xi_{kl}$, the threshold of the generalized likelihood ratio test (GLRT) used to decide whether a target is absent or present [25], and of $N$, the number of samples used for the GLRT. A design parameter $\varepsilon_{kl}$ sets an upper bound on the tolerated $P_{md}$ and $P_{fa}$, from which the optimum $\mathrm{SINR}_{kl}$ for each radar and each target is determined as in [20], [21]. It is evident from (3) that $\mathrm{SINR}_{kl}$ of player-$k$ is a function of the beamforming weight vectors of all players (which also determine the transmit powers). Hence, the set of admissible strategies $\mathcal P_k(\mathbf W_{t(-k)})$ of radar-$k$ depends on the beamforming matrices $\mathbf W_{t(-k)}$ of all other players (radars).
The last component of the game is the utility function of each player, defined as $u_k(\mathbf W_{t(k)}) = \|\mathbf W_{t(k)}\|_F^2$, the transmit power of player-$k$, where $\|\cdot\|_F$ denotes the Frobenius norm. The game is summarized as $\mathcal G = \langle\mathcal C, \{\mathcal P_k(\mathbf W_{t(-k)})\}_{k\in\mathcal C}, \{u_k\}_{k\in\mathcal C}\rangle$. In this SNG, given the beamforming strategies of the other players, each player selfishly minimizes its transmit power subject to a predefined detection criterion. As a result, the best response of player-$k$ is the solution of optimization problem (4), in which $\mathbf r_{-k} = [r_{-k1},\dots,r_{-kL}]^T$ is the vector of total interference induced by all radars other than radar-$k$ plus the additive white Gaussian noise (AWGN) from the environment, with entry $r_{-kl}$ corresponding to target-$l$. One of the main objectives of this work is to investigate whether the game $\mathcal G$ converges to a stable point at which no player can profit by unilaterally changing its beamforming strategy, since any such change would require more power to achieve the same SINR for every target. Such a point is a Nash equilibrium (NE); for the game considered, it is the strategy profile $\{\mathbf W^*_{t(k)}\}_{k\in\mathcal C}$ satisfying
$$u_k(\mathbf W^*_{t(k)}) \le u_k(\mathbf W_{t(k)}),\quad \forall\,\mathbf W_{t(k)}\in\mathcal P_k(\mathbf W^*_{t(-k)}),\ \forall k\in\mathcal C.$$
In the next section we determine the optimal beampatterns and investigate the best response strategy. We also prove the existence and uniqueness of the NE of the game $\mathcal G$.

B. Convex Optimization Beamforming and the Best Response Strategy
Convex optimization has been widely used in the radar beamforming literature. Most of this work concentrates on designing beamforming vectors that approximate a desired beampattern determined by the target positions [26]-[29]. In the first stage of this analysis, we determine the optimal beampattern of every radar for each of the targets using convex optimization techniques. Once the optimal beampatterns are obtained, each player only needs to allocate the minimum transmission power that achieves a certain detection performance while limiting the inter-radar interference.
The optimal transmit beampatterns of each radar are designed by solving optimization problem (5), which can be converted to a semidefinite program (SDP) using the rank relaxation method and solved as in [30] and [31]. The optimal receive weight vectors can be found using generalized eigenvector techniques.
Claim 1: The optimal transmit and receive beampatterns are independent of the inter-radar interference $\mathbf r_{-k}$.
Proof: The proof can be found in Appendix A.

When the radars reallocate their transmission power, the inter-radar interference plus noise vector $\mathbf r_{-k}$ changes. By Claim 1, radar-$k$ retains the optimal beampatterns derived from (5) and only reallocates its transmission power for each target in order to satisfy the detection criterion. This observation is similar to that made for wireless communication applications [12], despite the additional clutter term in the denominator of the SINR in (5). As a result, after obtaining the optimal transmit/receive beamforming vectors, we can reformulate the initial optimization problem (4) as the power minimization problem (6), where $\hat{\mathbf w}_{t(k,l)}$ is the normalized optimal transmit weight vector and $p_{kl}$ is the power radar-$k$ allocates to the beam directed at target-$l$. By redefining the acceptable strategy set as $\mathcal P_k(\mathbf p_{-k}) = \{\mathbf p_k\in\mathbb R^L_+ \mid \mathrm{SINR}_{kl}\ge\gamma_{kl},\ \forall l\}$ and the utility function as $u_k(\mathbf p_k) = \sum_{l=1}^{L}p_{kl}$, game $\mathcal G$ becomes a power allocation SNG, $\mathcal G = \langle\mathcal C, \{\mathcal P_k(\mathbf p_{-k})\}_{k\in\mathcal C}, \{u_k\}_{k\in\mathcal C}\rangle$.

To prove the existence and uniqueness of the NE of game $\mathcal G$, we need to show that the best response of every player is a standard function. All constraints must be active at the optimal power allocation, so the inequalities in the constraints of (6) can be replaced by equalities and written as
$$\mathbf G_k\mathbf p_k = \mathbf r_{-k}, \tag{7}$$
where the elements of $\mathbf G_k\in\mathbb R^{L\times L}$ are defined in (8). The solution of (7) provides the optimal power allocation for (6). Following Claim 2 in [12], problem (6) is always feasible for every $\mathbf r_{-k} > \mathbf 0$ (elementwise). As a result, $\mathbf G_k$ must be invertible, and the best response of the $k$th radar is
$$\mathrm{BR}_k(\mathbf p_{-k}) = \mathbf G_k^{-1}\mathbf r_{-k}.$$
The existence of the solution is guaranteed by the Arrow-Debreu theorem [33]. Since the NE exists, its uniqueness is proved by establishing that the best response function is standard [12].
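Because the SINR constraints are active at the optimum, computing the best response amounts to solving the $L\times L$ linear system (7). The sketch below does this with Gauss-Jordan elimination for a hypothetical two-target radar. The entries of `G_k` and `r_minus_k` are invented for illustration, with a positive diagonal and nonpositive off-diagonal entries; that sign pattern is an assumption made here because the element-wise definition (8) is not reproduced, and the numbers are not taken from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def best_response(G_k, r_minus_k):
    """Best response p_k = G_k^{-1} r_{-k}, i.e. the solution of G_k p_k = r_{-k}."""
    return solve(G_k, r_minus_k)


# Hypothetical two-target example (values are illustrative only).
G_k = [[2.0, -0.3],
       [-0.25, 1.8]]
r_minus_k = [0.5, 0.4]       # interference-plus-noise seen for each target
p_k = best_response(G_k, r_minus_k)
print("best-response powers:", p_k)
```

Solving the system directly rather than forming $\mathbf G_k^{-1}$ explicitly is the usual numerical choice; the result is the same power vector.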
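We define the inter-radar interference matrix from the $m$th radar to the $k$th radar as $\mathbf G_{mk}\in\mathbb R^{L\times L}$, with $[\mathbf G_{mk}]_{i,j} = |\hat{\mathbf w}^H_{t(k,i)}\boldsymbol\lambda_{r(kmj)}|^2 + |\hat{\mathbf w}^H_{t(k,i)}\boldsymbol\mu_{r(km)}|^2$. Substituting the interference vector $\mathbf r_{-k}$, the best response can be restated as
$$\mathrm{BR}_k(\mathbf p_{-k}) = \mathbf G_k^{-1}\Big(\sum_{m=1,\,m\neq k}^{K}\mathbf G_{mk}\mathbf p_m + \sigma_n^2\mathbf 1_L\Big), \tag{9}$$
where $\mathbf 1_L$ denotes the $L\times 1$ all-ones vector.

Lemma 1: The best response function (9) is a standard function.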
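Proof: For all $\mathbf p\ge\mathbf 0$, the best response (9) satisfies the properties that define a standard function:
a) Positivity: $\mathrm{BR}_k(\mathbf p) > \mathbf 0$, since $\mathbf G_k^{-1}$ is a positive matrix, as follows directly from (8), and $\mathbf G_{mk}$ is a positive matrix by definition.
b) Monotonicity: if $\mathbf p\ge\mathbf p'$, then $\mathrm{BR}_k(\mathbf p)\ge\mathrm{BR}_k(\mathbf p')$, because $\mathbf G_k^{-1}\mathbf G_{mk}$ has nonnegative entries.
c) Scalability: for all $\alpha > 1$, $\alpha\,\mathrm{BR}_k(\mathbf p) - \mathrm{BR}_k(\alpha\mathbf p) = (\alpha-1)\sigma_n^2\mathbf G_k^{-1}\mathbf 1_L > \mathbf 0$.

By applying a pricing policy to each player, we introduce some cooperation among them, which leads to a more Pareto-efficient solution, as described in the next section.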
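Lemma 1 is what makes the simple distributed update, in which every radar repeatedly plays its best response (9), converge to the unique NE from any starting point. The sketch below runs this iteration for two hypothetical radars with two targets each, starting from two different power vectors, and also checks the scalability property numerically. All matrices and the noise variance are made-up illustrative values, chosen so that each $\mathbf G_k^{-1}$ is positive; they do not come from the paper's channel model.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mat_vec(A, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def best_response(G_k, G_mk, p_m, sigma2):
    """BR_k = G_k^{-1} (G_mk p_m + sigma^2 1): eq. (9) with a single interferer m."""
    r = [v + sigma2 for v in mat_vec(G_mk, p_m)]
    return solve(G_k, r)


# Hypothetical two-radar, two-target instance.
G1 = [[2.0, -0.3], [-0.25, 1.8]]
G2 = [[1.9, -0.2], [-0.3, 2.1]]
G21 = [[0.10, 0.05], [0.04, 0.12]]   # interference from radar 2 into radar 1
G12 = [[0.08, 0.03], [0.06, 0.10]]   # interference from radar 1 into radar 2
SIGMA2 = 0.4


def play(p1, p2, n_iter=200):
    """Simultaneous best-response updates: both radars react to the last iterate."""
    for _ in range(n_iter):
        p1, p2 = best_response(G1, G21, p2, SIGMA2), best_response(G2, G12, p1, SIGMA2)
    return p1, p2


ne_from_zero = play([0.0, 0.0], [0.0, 0.0])
ne_from_high = play([10.0, 10.0], [10.0, 10.0])

# Scalability check: alpha * BR(p) > BR(alpha * p) elementwise for alpha > 1.
alpha, p = 2.0, [1.0, 1.0]
lhs = [alpha * v for v in best_response(G1, G21, p, SIGMA2)]
rhs = best_response(G1, G21, [alpha * v for v in p], SIGMA2)
```

Both starting points reach the same power vectors, which is the uniqueness guaranteed by the standard-function property; the scalability gap equals $(\alpha-1)\sigma_n^2\mathbf G_1^{-1}\mathbf 1_L$.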

IV. BEAMFORMER DESIGN AND POWER ALLOCATION GAME WITH PRICING

A. Game Theoretic Formulation
Since each radar optimizes its beamformers and power allocation greedily, the equilibrium point is not necessarily the best solution from a social fairness point of view, because each player ignores the direct-path interference it induces on the other players. To obtain a more Pareto-efficient solution and to increase the social welfare of the SNG, we introduce a pricing term in each radar's utility function. As a result, the players are encouraged to allocate their available resources more efficiently by reducing the direct-path interference induced on the other radars.
To achieve these advantages, each radar/player needs information about its channels to the other radars in the system. Since the radars belong to the same organization, knowledge of the inter-radar channels is justified, as each radar knows the exact positions of the others. Hence, each radar solves optimization problem (10), where $\kappa_{kmi}$ is the price charged to radar-$k$ for the interference it induces on radar-$m$ when aiming at target-$i$, and $|\mathbf w^H_{t(k,i)}\boldsymbol\mu_{r(km)}|^2$ is the corresponding interference. This optimization encourages each player to adopt a more socially efficient power allocation by steering its beampattern toward the desired target while keeping the sidelobes in the directions of the other players low, thereby causing less interference to other radars. As a result, the efficiency of the system as a whole improves, while the distributed nature of the game is preserved.
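To see why the price steers beams away from other radars, the sketch below evaluates the priced objective $\sum_i\|\mathbf w_{t(k,i)}\|^2 + \sum_m\sum_i\kappa_{kmi}|\mathbf w^H_{t(k,i)}\boldsymbol\mu_{r(km)}|^2$ for two beams that deliver the same gain toward a target at 37°: a matched beam and a beam with a null toward another radar at 72°. Modeling $\boldsymbol\mu_{r(km)}$ as the half-wavelength ULA steering vector toward radar-$m$ is an assumption made only for this illustration, as are the helper names and the price values.

```python
import cmath
import math


def steer(theta_deg, M=10):
    """Half-wavelength ULA steering vector (phase convention assumed)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * phase) for m in range(M)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def priced_utility(beams, mus, prices):
    """sum_i ||w_i||^2 + sum_m sum_i prices[m][i] * |w_i^H mu_m|^2."""
    power = sum(inner(w, w).real for w in beams)
    penalty = sum(prices[m][i] * abs(inner(w, mu)) ** 2
                  for m, mu in enumerate(mus)
                  for i, w in enumerate(beams))
    return power + penalty


a_t = steer(37.0)   # target direction
mu = steer(72.0)    # assumed channel toward the other radar
M = len(a_t)

# Matched beam, scaled so that |w^H a_t| = 1.
w_matched = [x / M for x in a_t]

# Null-steered beam: project a_t onto the orthogonal complement of mu,
# then rescale so that it delivers the same target gain |w^H a_t| = 1.
coef = inner(mu, a_t) / inner(mu, mu)
proj = [x - coef * y for x, y in zip(a_t, mu)]
scale = inner(proj, a_t).real   # equals ||proj||^2 > 0
w_null = [x / scale for x in proj]

for kappa in (0.0, 1.0):
    u_m = priced_utility([w_matched], [mu], [[kappa]])
    u_n = priced_utility([w_null], [mu], [[kappa]])
    print(f"kappa={kappa}: matched {u_m:.4f}, null-steered {u_n:.4f}")
```

Without a price, the matched beam is cheaper because it needs the least power for the required target gain; once interference toward the other radar is charged, the null-steered beam becomes the cheaper option, which is the behavior the pricing game is designed to induce.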
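To turn the SNG $\mathcal G$ into a more cooperative game with pricing, we only need to redefine the utility function of radar-$k$ as
$$u_k(\mathbf W_{t(k)}) = \|\mathbf W_{t(k)}\|_F^2 + \sum_{m=1,\,m\neq k}^{K}\sum_{i=1}^{L}\kappa_{kmi}|\mathbf w^H_{t(k,i)}\boldsymbol\mu_{r(km)}|^2.$$
The pricing game is then $\mathcal G_{pr} = \langle\mathcal C, \{\mathcal P_k(\mathbf W_{t(-k)})\}_{k\in\mathcal C}, \{u_k\}_{k\in\mathcal C}\rangle$.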

B. Optimal Beamforming and the Best Response Strategy
In this section, we design the optimal transmit and receive beamformers and the best response strategy of each player. To this end, we exploit the fact that optimization problem (10) can be reformulated as a convex optimization problem with second-order cone (SOC) constraints [30], which allows us to obtain the optimal solution via duality. The Lagrangian associated with (10) is written in terms of $\boldsymbol\lambda_k = [\lambda_{k1},\dots,\lambda_{kL}]^T$, the $L\times 1$ vector of Lagrange multipliers associated with the SINR inequality constraints of (10), and can be reorganized using
$$\boldsymbol\Omega_k(\kappa_{kmi}) = \sum_{m=1,\,m\neq k}^{K}\sum_{i=1}^{L}\kappa_{kmi}\boldsymbol\mu_{r(km)}\boldsymbol\mu^H_{r(km)} + \mathbf I.$$
The Lagrange dual function is the minimum of the Lagrangian over $\mathbf W_{t(k)}$. If, for some target, the matrix multiplying $\mathbf w_{t(k,l)}$ in the reorganized Lagrangian is not positive semidefinite, the Lagrangian is unbounded below in $\mathbf W_{t(k)}$ and the dual function takes the value $-\infty$. Hence, the dual problem associated with (10) is (11). As noted in [34] and [12], where the authors investigate the downlink beamforming problem for communication applications, the dual problem (11) is analogous to the receive beamforming optimization problem (12).
Since the constraints are satisfied with equality at optimality, the optimal Lagrange multipliers can be obtained by the fixed-point iteration (13) [34]. As proved in [34], this fixed-point iteration is a standard function and is guaranteed to converge to a unique solution if optimization problem (11) is feasible.
$$\min_{\substack{\lambda_{k1},\dots,\lambda_{kL}\\ \mathbf w_{r(k,1)},\dots,\mathbf w_{r(k,L)}}}\ \sum_{l=1}^{L}\lambda_{kl}\,r_{-kl}\quad\text{s.t.}\ \dots \tag{12}$$
The optimal receive weight vector is then the minimum mean-square error (MMSE) receiver. Following [35], the optimal transmit beamformer is a scaled version of the receive weight vector, $\mathbf w_{t(k,l)} = \delta_{k,l}\mathbf w_{r(k,l)}$, where $\delta_{k,l}$ is a scalar factor. The scaling factors can be found by exploiting the fact that the SINR constraints in (10) are met with equality at optimality: substituting $\mathbf w_{t(k,l)} = \delta_{k,l}\mathbf w_{r(k,l)}$ into the SINR constraints yields an equation for $\boldsymbol\delta_k = [\delta_{k1},\delta_{k2},\dots,\delta_{kL}]^T$ that involves a matrix $\mathbf F\in\mathbb R^{L\times L}$. With the optimal transmit and receive beamformers determined, problem (10) is solved. As in the game without pricing, we can reformulate (10) as a power minimization problem. Following the same analysis as in Section III and denoting the power vector of radar-$k$ by $\boldsymbol\pi_k\in\mathbb R^L_+$, the best response of the $k$th radar is $\boldsymbol\pi_k^* = \boldsymbol\Delta_k^{-1}\mathbf r_{-k}$, where $\boldsymbol\Delta_k\in\mathbb R^{L\times L}$ is the pricing-game counterpart of $\mathbf G_k$. Moreover, we denote the inter-radar interference matrix from the $m$th radar to the $k$th radar by $\boldsymbol\Delta_{mk}\in\mathbb R^{L\times L}$, with $[\boldsymbol\Delta_{mk}]_{i,j} = |\mathbf w^H_{t(k,i)}\boldsymbol\lambda_{r(kmj)}|^2 + |\mathbf w^H_{t(k,i)}\boldsymbol\mu_{r(km)}|^2$. Substituting $\mathbf r_{-k} = \sum_{m=1,\,m\neq k}^{K}\boldsymbol\Delta_{mk}\boldsymbol\pi_m^* + \mathbf 1_L\sigma_n^2$, the best response can be rewritten as
$$\mathrm{BR}_k(\boldsymbol\pi_{-k}) = \boldsymbol\Delta_k^{-1}\Big(\sum_{m=1,\,m\neq k}^{K}\boldsymbol\Delta_{mk}\boldsymbol\pi_m + \sigma_n^2\mathbf 1_L\Big). \tag{16}$$

Lemma 2: The best response function (16) of the game with pricing is a standard function.
Proof: The proof is identical to that of Lemma 1.
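The duality-based beamformer design of the pricing game can be sketched numerically. Because the fixed-point equation (13) and the exact radar SINR expressions are not reproduced here, the sketch below assumes a standard downlink-beamforming model from the communications literature: SINR $|\mathbf h_l^H\mathbf w_l|^2/(\sum_{j\neq l}|\mathbf h_l^H\mathbf w_j|^2 + r_l)$ and cost $\sum_l\mathbf w_l^H\boldsymbol\Omega\mathbf w_l$. That model omits the clutter and waveform-correlation terms of the radar SINR, and the channels, price and SINR targets are hypothetical. The sketch runs the fixed-point iteration $\lambda_l \leftarrow 1/\big((1+1/\gamma_l)\,\mathbf h_l^H(\boldsymbol\Omega + \sum_j\lambda_j\mathbf h_j\mathbf h_j^H)^{-1}\mathbf h_l\big)$, takes the MMSE direction $(\boldsymbol\Omega + \sum_j\lambda_j\mathbf h_j\mathbf h_j^H)^{-1}\mathbf h_l$ as the common receive/transmit direction, and then sets the transmit powers so that every SINR constraint holds with equality.

```python
import cmath
import math


def steer(theta_deg, M):
    """Half-wavelength ULA steering vector (phase convention assumed)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * phase) for m in range(M)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def solve(A, b):
    """Solve A x = b (complex entries allowed) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def weighted_cov(omega, lam, H):
    """Omega + sum_j lam_j h_j h_j^H."""
    M = len(omega)
    return [[omega[r][c] + sum(l * h[r] * h[c].conjugate() for l, h in zip(lam, H))
             for c in range(M)] for r in range(M)]


def dual_multipliers(H, gammas, omega, n_iter=300):
    """Fixed-point iteration for the Lagrange multipliers, started at zero."""
    lam = [0.0] * len(H)
    for _ in range(n_iter):
        A = weighted_cov(omega, lam, H)
        lam = [1.0 / ((1.0 + 1.0 / g) * inner(h, solve(A, h)).real)
               for h, g in zip(H, gammas)]
    return lam


def design(H, gammas, omega, r_minus):
    """MMSE directions plus powers that meet every SINR target with equality."""
    lam = dual_multipliers(H, gammas, omega)
    A = weighted_cov(omega, lam, H)
    dirs = []
    for h in H:
        u = solve(A, h)                              # MMSE direction
        norm = math.sqrt(inner(u, u).real)
        dirs.append([x / norm for x in u])
    L = len(H)
    # p_l |h_l^H u_l|^2 / gamma_l - sum_{j != l} p_j |h_l^H u_j|^2 = r_l
    G = [[abs(inner(H[l], dirs[j])) ** 2 / gammas[l] if j == l
          else -abs(inner(H[l], dirs[j])) ** 2 for j in range(L)] for l in range(L)]
    p = solve(G, r_minus)
    beams = [[math.sqrt(pl) * x for x in u] for pl, u in zip(p, dirs)]
    return lam, beams


def sinr(H, W, r_minus, l):
    """SINR of target l under the assumed downlink model."""
    interf = sum(abs(inner(H[l], W[j])) ** 2 for j in range(len(W)) if j != l)
    return abs(inner(H[l], W[l])) ** 2 / (interf + r_minus[l])


# Hypothetical instance: 8 antennas, targets at 37 and 22 degrees, another
# radar at 72 degrees priced with kappa = 0.5, i.e. Omega = kappa mu mu^H + I.
M, kappa = 8, 0.5
H = [steer(37.0, M), steer(22.0, M)]
mu = steer(72.0, M)
omega = [[kappa * mu[r] * mu[c].conjugate() + (1.0 if r == c else 0.0)
          for c in range(M)] for r in range(M)]
gammas, r_minus = [7.0, 6.5], [1.0, 1.0]
lam, W = design(H, gammas, omega, r_minus)
```

Solving the small power system after fixing the directions mirrors the role of the scaling factors $\delta_{k,l}$ in the text: the directions come from the dual solution, and the magnitudes are chosen so that each SINR constraint is met with equality.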
In the next section we present a hierarchical strategic game, known as a Stackelberg game.
V. STACKELBERG GAME SYSTEM MODEL

Fig. 2: A hybrid distributed MIMO radar network with a surveillance radar, two tracking radars and two targets.
In this section, we consider a hybrid MIMO radar network. More specifically, in addition to the multistatic tracking radar network of Section II, we incorporate a surveillance radar into the network, as shown in Fig. 2. We assume that all radars belong to the same organization and operate in the same field. As a result, the tracking radars may interfere with the surveillance radar and degrade its performance (by increasing its probability of false alarm). To guarantee the unimpeded operation of the system, the interference observed at the surveillance radar must not exceed a specific value:
$$\sum_{k=1}^{K}\sum_{l=1}^{L}|q^H_{r(sur)}g_{kl}|^2 \le I_{\max}, \tag{18}$$
where $g_{kl} = \mathbf w^H_{t(k,l)}\mathbf a(\theta_{sur(k)})$ is the interfering signal in the direction of the surveillance radar when the $k$th tracking radar tags target-$l$, $\theta_{sur(k)}$ is the direction of the surveillance radar as observed from the $k$th tracking radar, and $I_{\max}$ is the maximum allowed interference. Since the surveillance radar has no transmit or receive beamformer, its receive filter $q_{r(sur)}$ is a complex scalar.
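Constraint (18) is straightforward to evaluate once the tracking beams are fixed. The sketch below computes the aggregate interference $\sum_k\sum_l|q_{r(sur)}|^2|g_{kl}|^2$ with $g_{kl} = \mathbf w^H_{t(k,l)}\mathbf a(\theta_{sur(k)})$ and checks it against $I_{\max}$. The half-wavelength ULA steering vector, the directions of the surveillance radar, the receive filter value and $I_{\max}$ are all hypothetical values chosen for illustration; only the target directions follow the simulation setup of Section VI.

```python
import cmath
import math


def steer(theta_deg, M=10):
    """Half-wavelength ULA steering vector (phase convention assumed)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * phase) for m in range(M)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def surveillance_interference(beams, theta_sur, q):
    """sum_k sum_l |q|^2 |g_kl|^2 with g_kl = w_{t(k,l)}^H a(theta_sur(k)).

    beams[k] lists radar k's transmit beamformers; theta_sur[k] is the
    direction of the surveillance radar seen from tracking radar k.
    """
    total = 0.0
    for beams_k, theta in zip(beams, theta_sur):
        a_sur = steer(theta, len(beams_k[0]))
        for w in beams_k:
            g = inner(w, a_sur)          # scalar interfering signal g_kl
            total += abs(q) ** 2 * abs(g) ** 2
    return total


def unit_beam(theta_deg, M=10):
    """Unit-norm beam matched to direction theta."""
    return [x / math.sqrt(M) for x in steer(theta_deg, M)]


# Target directions from Section VI; surveillance directions are hypothetical.
beams = [[unit_beam(37.0), unit_beam(22.0)],
         [unit_beam(-38.0), unit_beam(-12.0)]]
theta_sur = [-20.0, 30.0]
q, I_max = 1.0 + 0.0j, 2.0
interference = surveillance_interference(beams, theta_sur, q)
print(f"interference {interference:.4f}, constraint met: {interference <= I_max}")
```

Because the interference is quadratic in the beamformer amplitudes, scaling all tracking beams by 1/2 reduces it by a factor of four.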
To guarantee constraint (18), an interference cost can be imposed on every tracking radar in order to limit its effect on the surveillance radar. Thus, a pricing mechanism similar to that of the previous section can be applied to every radar, with the main objective of minimizing the direct-path interference to the surveillance radar. Since all radars belong to the same organization, we can safely assume that the inter-radar channel information is available. As in the previous section, each tracking radar solves optimization problem (19), where $\kappa_{sur}$ is the interference price, imposed equally by the surveillance radar on all tracking radars. This interaction between the radars can be modeled as a power allocation Stackelberg game, in which the surveillance radar is the leader and the tracking radars are the followers. The leader's strategy is the interference price charged to the followers, and the leader's utility function is its profit. Given the price imposed by the leader, the followers determine their best response strategies by solving (19).

A. Followers' Game
Since the followers know the interference price announced by the leader, they determine their optimal beamformers and resource allocation by solving optimization problem (19). The followers' game is similar to the game $\mathcal G_{pr}$ once the utility function $s_k(\mathbf W_{t(k)})$ of player-$k$ is redefined as the objective of (19); the followers' game is thus defined by the player set $\mathcal C$, the strategy sets $\mathcal P_k$ and the utilities $s_k$. Following the same analysis as for $\mathcal G_{pr}$, the optimal beamforming vectors can be derived by exploiting the duality properties of the convex optimization problem (19). Hence, analogously to the receive weight vector optimization problem (12), we address the optimization problem (21).
$$\min_{\substack{\tilde\lambda_{k1},\dots,\tilde\lambda_{kL}\\ \tilde{\mathbf w}_{r(k,1)},\dots,\tilde{\mathbf w}_{r(k,L)}}}\ \sum_{l=1}^{L}\tilde\lambda_{kl}\,r_{-kl}\quad\text{s.t.}\ \dots \tag{21}$$
We denote by $\tilde{\boldsymbol\lambda}_k = [\tilde\lambda_{k1},\dots,\tilde\lambda_{kL}]^T$ the $L\times 1$ vector of Lagrange multipliers associated with the SINR inequality constraints of problem (19), define $\boldsymbol\Omega_k(\kappa_{sur}) = \sum_{l=1}^{L}\kappa_{sur}\,g_{kl}g_{kl}^H + \mathbf I$, and let $\tilde{\mathbf w}_{r(k,l)}$ be the $M\times 1$ receive weight vector of radar-$k$ for target-$l$ in the Stackelberg game. Similarly to (13), we obtain the optimal Lagrange multipliers from the fixed-point iteration (22), which is a standard function and admits a unique solution [34], and the optimal receive beamformers from the MMSE receiver (23). The optimal transmit beamformers are scaled versions of the optimal receive weight vectors:
$$\mathbf w_{t(k,l)} = \delta_{k,l}\tilde{\mathbf w}_{r(k,l)}. \tag{24}$$
Analogously to $\mathcal G_{pr}$ and denoting the power vector of radar-$k$ by $\boldsymbol\rho_k\in\mathbb R^L_+$, the best response of the $k$th radar is $\boldsymbol\rho_k^* = \boldsymbol\Xi_k^{-1}\mathbf r_{-k}$, where $\boldsymbol\Xi_k\in\mathbb R^{L\times L}$ is the Stackelberg counterpart of $\mathbf G_k$. Furthermore, we denote the inter-radar interference matrix from the $m$th radar to the $k$th radar by $\boldsymbol\Xi_{mk}\in\mathbb R^{L\times L}$, with $[\boldsymbol\Xi_{mk}]_{i,j} = |\mathbf w^H_{t(k,i)}\boldsymbol\lambda_{r(kmj)}|^2 + |\mathbf w^H_{t(k,i)}\boldsymbol\mu_{r(km)}|^2$. Substituting $\mathbf r_{-k} = \sum_{m=1,\,m\neq k}^{K}\boldsymbol\Xi_{mk}\boldsymbol\rho_m^* + \mathbf 1_L\sigma_n^2$, the best response can be rewritten as
$$\mathrm{BR}_k(\boldsymbol\rho_{-k}) = \boldsymbol\Xi_k^{-1}\Big(\sum_{m=1,\,m\neq k}^{K}\boldsymbol\Xi_{mk}\boldsymbol\rho_m + \sigma_n^2\mathbf 1_L\Big).$$
The existence and uniqueness of the solution are established as in Section III.

B. Leader's Game
By definition of the Stackelberg game, the leader knows the best response strategies of the followers. Likewise, in our model the surveillance radar is aware of the tracking radars, as they belong to the same organization, and can determine the followers' best response strategies. Hence, the leader's optimal strategy is obtained from optimization problem (27), in which the leader's profit is maximized while the interference is kept below a maximum value to guarantee the efficient operation of the surveillance radar.
To determine the optimal price imposed by the leader on the tracking radars and solve optimization problem (27), we adopt the learning algorithm for the leader proposed in [16]. First, we determine the price $\kappa^*_{sur}$ at which the constraint of (27) is met with equality. Since the interference is a decreasing function of the price imposed by the leader, the constraint is guaranteed whenever the price charged to the followers is not less than $\kappa^*_{sur}$, i.e., $\kappa_{sur}\ge\kappa^*_{sur}$. In Algorithm 1, $\alpha > 0$ is the learning rate and $\kappa^t_{sur}$ is the price imposed by the leader at iteration $t$.
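The leader's price-learning procedure can be sketched as follows. The learning equation of Algorithm 1 is not reproduced here, so the update below is an assumption: a finite-difference gradient ascent on the leader's profit, $\kappa^{t+1}_{sur} = \max\{\kappa^*_{sur},\ \kappa^t_{sur} + \alpha\,(s_{lead}(\kappa^t_{sur}) - s_{lead}(\kappa^{t-1}_{sur}))/(\kappa^t_{sur} - \kappa^{t-1}_{sur})\}$, where the projection onto $\kappa_{sur}\ge\kappa^*_{sur}$ enforces the interference constraint. The followers' game is replaced by a hypothetical closed-form interference response $I(\kappa) = c/(1+\kappa)^2$, which decreases with the price as assumed above, and the profit is assumed to be $s_{lead}(\kappa) = \kappa I(\kappa)$. In a full implementation, step 5 of Algorithm 1 would instead solve the followers' game at each price.

```python
import math

C, I_MAX = 4.0, 2.0   # hypothetical interference scale and limit


def follower_interference(kappa):
    """Hypothetical aggregate interference of the followers at price kappa."""
    return C / (1.0 + kappa) ** 2


def leader_profit(kappa):
    """Assumed leader profit: price times the interference it charges for."""
    return kappa * follower_interference(kappa)


def learn_price(alpha=0.5, delta=0.1, tol=1e-7, max_iter=1000):
    """Projected finite-difference ascent on the leader's profit (assumed rule)."""
    # Price at which the constraint I(kappa) = I_MAX holds with equality.
    kappa_star = math.sqrt(C / I_MAX) - 1.0
    prev, curr = kappa_star, kappa_star + delta
    for _ in range(max_iter):
        step = curr - prev
        if abs(step) < tol:
            break
        slope = (leader_profit(curr) - leader_profit(prev)) / step
        prev, curr = curr, max(kappa_star, curr + alpha * slope)
    return kappa_star, curr


kappa_star, kappa_opt = learn_price()
print(f"kappa* = {kappa_star:.4f}, learned price = {kappa_opt:.4f}")
```

With these toy numbers the unconstrained profit maximum lies at $\kappa = 1$, above $\kappa^* = \sqrt 2 - 1$, so the interference constraint is slack at the learned price; had the maximum fallen below $\kappa^*$, the projection would keep the price at $\kappa^*$.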

VI. SIMULATION RESULTS
In this section, we present simulation results that illustrate the performance of the beamformers and the convergence of the resource allocation methods for all three games: the beamformer design and power allocation SNG, the beamformer design and power allocation game with pricing policy, and the Stackelberg game. We consider a bistatic network of two tracking MIMO radars, each consisting of 10 transmit/receive antennas with half-wavelength spacing between adjacent antennas. The reference direction of the second radar as seen from the first radar is $\theta_{rad(1,2)} = 72^\circ$, and conversely $\theta_{rad(2,1)} = -75^\circ$. Moreover, we assume two targets located at $\theta_{11} = 37^\circ$, $\theta_{12} = 22^\circ$ as observed from the first radar and at $\theta_{21} = -38^\circ$, $\theta_{22} = -12^\circ$ with the second radar as reference. Furthermore, we assume a strong point clutter source located at $\theta_{cl(1)} = 52^\circ$ from radar 1 and $\theta_{cl(2)} = -54^\circ$ from radar 2. The complex amplitudes of the target and clutter radar cross sections are $\beta_1 = \beta_2 = \beta_{cl} = 1$. The background noise is AWGN with variance 0.4, and the correlation factors between the waveforms for different targets ($l \neq l'$) are all fixed to 0.1.
Algorithm 1: Learning algorithm for optimization problem (27)
1: Set the initial price $\kappa_{sur}^1 = \kappa_{sur}^*$, determined at the equality of the constraint of optimization problem (27);
2: Determine an increment $\Delta\kappa_{sur}$ and set the second price value as $\kappa_{sur}^2 = \kappa_{sur}^* + \Delta\kappa_{sur}$;
3: Set $t = 1$;
4: while convergence is not reached do
5: Obtain the best response strategies of the tracking radars by playing the followers' game at price $\kappa_{sur}^t$;
6: Calculate the profit of the leader, $s_{lead}$, at price $\kappa_{sur}^t$;
7: Determine the new price from the following learning equation:
9: Set $t = t + 1$;
10: end while
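The loop structure of Algorithm 1 can be sketched in Python as follows. Since this sketch does not reproduce the learning equation of [16] used in step 7, the update rule below is an assumption: a finite-difference gradient-ascent step on the leader's profit with learning rate $\alpha$, projected onto $\kappa_{sur} \geq \kappa_{sur}^*$ so that the interference constraint stays satisfied. The callable `profit(kappa)` is a hypothetical stand-in for steps 5-6 (playing the followers' game and evaluating $s_{lead}$).

```python
from typing import Callable


def leader_price_learning(profit: Callable[[float], float], kappa_star: float,
                          delta: float, alpha: float, max_iter: int = 100,
                          tol: float = 1e-8) -> float:
    """Sketch of Algorithm 1 with an ASSUMED update rule for step 7.

    profit(kappa) is a hypothetical stand-in for steps 5-6: play the
    followers' game at price kappa and return the leader's profit s_lead.
    """
    k_prev, k_curr = kappa_star, kappa_star + delta      # steps 1-2 (delta > 0)
    s_prev = profit(k_prev)
    for _ in range(max_iter):                             # step 4
        s_curr = profit(k_curr)                           # steps 5-6
        # Step 7 (assumed form): finite-difference gradient ascent on the
        # profit, projected so the price never drops below kappa_star.
        grad = (s_curr - s_prev) / (k_curr - k_prev)
        k_next = max(kappa_star, k_curr + alpha * grad)
        if abs(k_next - k_curr) < tol:                    # convergence reached
            return k_next
        k_prev, s_prev, k_curr = k_curr, s_curr, k_next   # step 9: t <- t + 1
    return k_curr
```

With a hypothetical profit that decreases beyond $\kappa_{sur}^*$, for example `leader_price_learning(lambda k: 10.0 - k, kappa_star=9.0, delta=0.5, alpha=0.2)`, the projection stops the price at $\kappa_{sur}^* = 9$, which is qualitatively the behaviour described in Section VI-B; with a profit that peaks above $\kappa_{sur}^*$, the price instead settles near the peak.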
A. Comparison of the SNG and the coordinated game with pricing consideration
The first stage of the algorithm concerns the design of the optimal transmit and receive beamformers. For the SNG, we obtain these beamformers by solving the optimization problem (5) with convex semidefinite programming methods, whereas for the coordinated game with pricing policy we exploit the duality properties of the optimization problem (10) and obtain the transmit and receive weight vectors from the solution of the dual problem (14). In both games, the beampatterns are concentrated on the desired target while maintaining very low sidelobe levels in other directions. Figs. 3-6 clearly show the tendency of the game with pricing towards social welfare: the beampatterns of the first player place deep nulls in the direction of the other player, which minimizes the interference leakage.
Fig. 6: Comparison of the transmit beampatterns for player 2 aiming at target 2 (dB). (a) Without pricing consideration. (b) With pricing consideration.
The resource allocation optimization is performed at the second stage of both algorithms. Before the games start, we set the detection criterion of each player through the SINR targets: 7 for radar 1 ($\gamma_{11} = \gamma_{12} = 7$) and 6.5 for radar 2 ($\gamma_{21} = \gamma_{22} = 6.5$), in both games. We also set the maximum number of game iterations to $T = 40$ to study the convergence of the algorithms. Figs. 8-9 depict the resource allocation update of each radar for each target. Power allocation under both methods converges to a unique solution. Comparing Fig. 8 with Fig. 9, the advantage of the coordinated design with pricing is evident: the transmit power of each radar is lower than in the SNG without pricing. Because the coordinated design reduces the interference among the radars, as shown in Fig. 7, each player needs less power to attain its SINR target, and hence the resource allocation in this game is more efficient.

B. Stackelberg Game
The surveillance radar is placed at direction $\theta_{sur(1)} = 65^\circ$ as observed from the first tracking radar and $\theta_{sur(2)} = -67^\circ$ with the second radar as reference. Based on the price announced by the leader, the followers decide their optimal beamformers and power allocation by playing the game $G_{fol}$. The transmit weight vectors and the power allocation of the followers for the leader's price $\kappa_{sur} = 7.4$ are depicted in Figs. 10-11 and Fig. 12, respectively. The beampatterns of both followers are steered away from the direction of the leader, and hence the interference leakage to the surveillance radar is minimized.
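The beampattern and leakage values discussed here can be evaluated with a few lines of code. The sketch below assumes the standard uniform-linear-array model with half-wavelength spacing, $a_n(\theta) = e^{j\pi n \sin\theta}$ for $n = 0, \ldots, M-1$, with $M = 10$ as in the simulation setup, and uses $37^\circ$ (target 1 of radar 1) and $65^\circ$ (the surveillance radar as seen from radar 1). The weights are hypothetical: a matched beamformer toward the target, and the same vector with its component along the leader's steering vector projected out. This projection is only a simple illustration of steering a null toward the leader, not the pricing-based design of this paper.

```python
import cmath
import math
from typing import List


def steering_vector(theta_deg: float, m: int) -> List[complex]:
    """Assumed ULA model, half-wavelength spacing: a_n = exp(j*pi*n*sin(theta))."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(m)]


def beampattern_db(w: List[complex], theta_deg: float) -> float:
    """Beampattern |w^H a(theta)|^2 in dB (floored to avoid log10(0))."""
    a = steering_vector(theta_deg, len(w))
    gain = abs(sum(w_n.conjugate() * a_n for w_n, a_n in zip(w, a))) ** 2
    return 10.0 * math.log10(max(gain, 1e-300))


def project_out(w: List[complex], a: List[complex]) -> List[complex]:
    """Remove the component of w along a, which places a null in a's direction."""
    c = (sum(a_n.conjugate() * w_n for a_n, w_n in zip(a, w))
         / sum(abs(a_n) ** 2 for a_n in a))
    return [w_n - c * a_n for w_n, a_n in zip(w, a)]


M = 10                                                        # antennas per radar
w_matched = [x / M for x in steering_vector(37.0, M)]         # hypothetical weights
w_nulled = project_out(w_matched, steering_vector(65.0, M))   # null toward the leader
leak_matched = beampattern_db(w_matched, 65.0)
leak_nulled = beampattern_db(w_nulled, 65.0)
```

The matched beamformer has unit (0 dB) gain at $37^\circ$ but non-negligible leakage at $65^\circ$, whereas the projected vector places a null at $65^\circ$ at the cost of a small loss of gain toward the target.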
To find the optimal price set by the leader, we solve the optimization problem (27) using the learning algorithm of Section V. We set the maximum interference allowed at the surveillance radar to $I_{max} = 0.0103$ and the learning rate to $\alpha = 0.2$. For this interference threshold, the corresponding price is $\kappa_{sur}^* = 7.4$, which we use as the initial price of the leader's game. The convergence of the price set by the leader is shown in Fig. 13. As expected, the algorithm rapidly converges to the starting price $\kappa_{sur} = 7.4$, which is the minimum price that satisfies the leader's interference constraint.

Fig. 13: Convergence of the price imposed by the leader.

VII. CONCLUSION
We have investigated a game theoretic approach to the problem of joint beamforming and power allocation in a distributed radar network. First, we studied an SNG without any coordination among the radars/players, in which each player greedily decides its optimal beamformers and power allocation. We then incorporated a pricing mechanism to minimize the inter-radar interference and to improve the social welfare of the network. The simulation results confirm that this partially coordinated game yields a more Pareto-efficient Nash equilibrium. Additionally, we formulated a Stackelberg game by introducing a surveillance radar into the network and studied the convergence of both the followers' and the leader's games. Finally, we presented proofs of the existence and uniqueness of the Nash equilibrium for both the partially coordinated and the noncooperative games.

APPENDIX A PROOF OF CLAIM 1
To prove that the optimal beampatterns are independent of the inter-radar interference, we investigate the dual of the optimization problem (5). The Lagrangian associated with this problem is given as:
where $\boldsymbol{\lambda}_k = [\lambda_{k1}, \ldots, \lambda_{kL}]^T$ is the $L \times 1$ vector of Lagrangian multipliers associated with the SINR inequality constraints of the problem in (10). The Lagrangian can be reformulated as:
Subsequently, we write the Lagrange dual function as:
For the dual problem to be feasible, the matrix involving the terms $\lambda_{kl}\,\mathbf{c}_{r(ki)}\mathbf{c}_{r(ki)}^H$ must be positive semidefinite. Hence, the dual problem associated with (5) can be formulated as:
Following [34] and [12], the dual problem (30) can be solved through the receive beamforming optimization problem in (31).
Since the constraints are satisfied with equality at optimality, the optimal Lagrangian multipliers can be derived by applying the fixed point iteration method as in (32) [36]. It is also shown in [36] that the fixed point iteration function in (32) belongs to the framework of standard functions. Thus, the aforementioned iteration process is guaranteed to converge to a unique solution, if the respective optimization problem is feasible.
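The fixed-point argument can be illustrated with a generic standard function. The sketch below iterates $\mathbf{x} \leftarrow f(\mathbf{x})$ and stops when successive iterates agree; for a standard function (positive, monotone and scalable) this iteration converges to the unique fixed point whenever one exists, which is the property invoked for (32). The example does not implement (32): it uses a hypothetical affine function $f_l(\mathbf{x}) = \gamma_l\,(\sum_{j \neq l} F_{lj} x_j + \sigma)/g_l$ with nonnegative $F_{lj}$ and $\sigma > 0$, which is standard, and all names and numbers are illustrative.

```python
from typing import Callable, List

Vector = List[float]


def fixed_point(f: Callable[[Vector], Vector], x0: Vector,
                tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Iterate x <- f(x) until successive iterates differ by less than tol."""
    x = list(x0)
    for _ in range(max_iter):
        x_new = f(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence: the underlying problem may be infeasible")


def affine_standard(gamma: Vector, gain: Vector, coupling: List[Vector],
                    noise: float) -> Callable[[Vector], Vector]:
    """Hypothetical standard function f_l(x) = gamma_l (sum_{j!=l} F_lj x_j + noise) / g_l.

    With F_lj >= 0 and noise > 0 it is positive, monotone and scalable.
    """
    def f(x: Vector) -> Vector:
        n = len(x)
        return [gamma[l] * (sum(coupling[l][j] * x[j] for j in range(n) if j != l)
                            + noise) / gain[l]
                for l in range(n)]
    return f


f = affine_standard(gamma=[2.0, 2.0], gain=[10.0, 10.0],
                    coupling=[[0.0, 0.1], [0.1, 0.0]], noise=0.4)
x_star = fixed_point(f, [0.0, 0.0])
```

For these parameters the fixed point is $x_1 = x_2 = 0.08/0.98 \approx 0.0816$, reached from the all-zero starting point within a handful of iterations.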
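The optimal receive weight vector is the minimum mean-square error (MMSE) receiver, obtained from the following equation:
$$\cdots\ \mathbf{h}_{t(kl)} \qquad (33)$$
Following [35], the optimal transmit beamformer can be obtained as a scaled version of the receive weight vector $\mathbf{w}_{r(k,l)}$. Since $\mathbf{r}_{-k}$ appears only in the objective of (31), and not in the constraints that determine the multipliers through (32) and the receive vectors through (33), the optimal transmit and receive beampatterns are independent of the inter-radar interference plus noise vector $\mathbf{r}_{-k}$.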
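As an illustration of the MMSE step, the SINR in the constraint of (31) has the form $\lambda_{kl}|\mathbf{w}^H\mathbf{h}_{t(kl)}|^2/(\mathbf{w}^H\mathbf{R}_{kl}\mathbf{w})$ with $\mathbf{R}_{kl} = \sum_{j\neq l}\lambda_{kj}\mathbf{h}_{t(kj)}\mathbf{h}_{t(kj)}^H + \sum_{i}\mathbf{c}_{t(ki)}\mathbf{c}_{t(ki)}^H + \mathbf{I}$, and this ratio is maximized by $\mathbf{w} \propto \mathbf{R}_{kl}^{-1}\mathbf{h}_{t(kl)}$, which coincides with the MMSE receiver up to a scalar. The pure-Python sketch below forms $\mathbf{R}_{kl}$ (clutter vectors enter with unit weight) and solves the linear system by Gaussian elimination; the function names and the two-antenna numbers are hypothetical, and a transmit beamformer would then be a scaled copy $\delta\,\mathbf{w}$ with $\delta$ left unspecified here.

```python
from typing import List

CVec = List[complex]
CMat = List[List[complex]]


def solve(A: CMat, b: CVec) -> CVec:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b_i] for row, b_i in zip(A, b)]   # augmented [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x: CVec = [0j] * n
    for r in range(n - 1, -1, -1):                         # back substitution
        s = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def max_sinr_receiver(h: CVec, interferers: List[CVec], weights: List[float]) -> CVec:
    """Return w = R^{-1} h with R = sum_q weight_q * q q^H + I.

    Any nonzero scaling of w gives the same SINR and beampattern shape.
    """
    n = len(h)
    R = [[complex(i == j) for j in range(n)] for i in range(n)]   # identity (noise)
    for q, weight in zip(interferers, weights):
        for i in range(n):
            for j in range(n):
                R[i][j] += weight * q[i] * q[j].conjugate()       # rank-one term
    return solve(R, h)


# Hypothetical 2-antenna example: one interferer with weight 2.
w_r = max_sinr_receiver([1, 1j], [[1, -1]], [2.0])
```

In this example $\mathbf{R} = \begin{bmatrix} 3 & -2 \\ -2 & 3 \end{bmatrix}$, so $\mathbf{w}_r = \mathbf{R}^{-1}[1, j]^T = [(3+2j)/5,\ (2+3j)/5]^T$.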
The receive beamforming optimization problem (31) is given by:
$$\min_{\lambda_{k1},\ldots,\lambda_{kL},\ \mathbf{w}_{r(k,1)},\ldots,\mathbf{w}_{r(k,L)}} \ \sum_{l=1}^{L} \lambda_{kl}\, r_{-kl} \qquad (31)$$
$$\text{s.t.}\quad \frac{\lambda_{kl}\,|\mathbf{w}_{r(k,l)}^H \mathbf{h}_{t(kl)}|^2}{\sum_{j \neq l}^{L} \lambda_{kj}\,|\mathbf{w}_{r(k,l)}^H \mathbf{h}_{t(kj)}|^2 + \sum_{i=1}^{L} |\mathbf{w}_{r(k,l)}^H \mathbf{c}_{t(ki)}|^2 + \mathbf{w}_{r(k,l)}^H \mathbf{w}_{r(k,l)}}$$
ACKNOWLEDGMENT
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/K014307/1 and the MOD University Defence Research Collaboration (UDRC) in Signal Processing. We also wish to thank the Associate Editor and all the reviewers for their very valuable and insightful comments during the revision of this work.
He is also a Guest Professor at Harbin Engineering University. He has served as advisor to more than 70 PhD graduates and has published more than 500 outputs. Dr Chambers is a Fellow of the Royal Academy of Engineering, the Institution of Electrical Engineers, and the Institute of Mathematics and its Applications in the UK. He has served as an Associate Editor for IEEE TRANSACTIONS ON SIGNAL PROCESSING for three terms over the periods 1997-1999 and 2004-2007, and as a Senior Area Editor from 2011 to 2015.