Content Placement in Cache-Enabled Sub-6 GHz and Millimeter-Wave Multi-antenna Dense Small Cell Networks

This paper studies the performance of cache-enabled dense small cell networks consisting of multi-antenna sub-6 GHz and millimeter-wave (mmWave) base stations. Existing works consider only a single antenna at each base station, and the optimal content placement is unknown when the base stations have multiple antennas. We first derive the successful content delivery probability by accounting for the key channel features at sub-6 GHz and mmWave frequencies. Maximizing the successful content delivery probability is a challenging problem. To tackle it, we first propose a constrained cross-entropy algorithm which achieves a near-optimal solution with moderate complexity. We then develop a simple yet effective heuristic probabilistic content placement scheme, termed the two-stair algorithm, which strikes a balance between caching the most popular contents and achieving content diversity. Numerical results demonstrate the superior performance of the constrained cross-entropy method and show that the two-stair algorithm performs significantly better than caching only the most popular contents. The comparison between the sub-6 GHz and mmWave systems reveals an interesting tradeoff between caching capacity and density for the mmWave system to achieve performance similar to that of the sub-6 GHz system.


I. INTRODUCTION
Global mobile data traffic continues to grow at an unprecedented pace and will reach 49 exabytes monthly by 2021, of which 78 percent will be video content [1]. To meet the high capacity requirement of future mobile networks, one promising solution is network densification, i.e., deploying dense small cell base stations (SBSs) in the existing macrocell cellular networks. Although large numbers of small cells shorten the communication distance, the major challenge is to transfer the huge amount of mobile data from the core networks to the small cells, which imposes stringent demands on backhaul links. To address this problem, caching popular contents at small cells has been proposed as one of the most effective solutions, considering that most mobile data are contents such as video, weather forecasts, news and maps, which are repeatedly requested and cacheable [2]. The combination of small cells and caching brings content closer to users, decreases backhaul traffic and reduces transmission delays, thus alleviating many bottleneck problems in wireless content delivery networks. This paper focuses on the caching design at both sub-6 GHz (µWave) and millimeter-wave (mmWave) SBSs in dense small cell networks.

A. Related Works
1) Caching in µWave and mmWave networks: MmWave communication has received much interest for providing high capacity because a vast amount of inexpensive spectrum is available in the 30 GHz-300 GHz range. However, compared to µWave frequencies, the mmWave channel experiences excessive attenuation due to rainfall and atmospheric or gaseous absorption, and is susceptible to blockage. To compensate for these drawbacks, mmWave small cells need to adopt narrow beamforming and be densely deployed in an attempt to provide seamless coverage [3][4][5]. The study of content caching in mmWave networks is of great importance, because mmWave will be a key component of future wireless access and content caching at the network edge is one of the 5G service requirements [5]. Cache assignment with video streaming in mmWave SBSs on the highway is discussed in [6], where it is shown to significantly reduce the connection and retrieval delays. Combining the advantages of µWave and mmWave technologies will bring further benefits [7]. Caching in dual-mode SBSs that integrate both µWave and mmWave frequencies is studied in [8], where a dynamic matching game-theoretic approach is applied to maximize the handovers to SBSs in mobility management scenarios. The proposed methods can minimize handover failures and reduce energy consumption in highly mobile heterogeneous networks. Dynamic traffic in cache-enabled networks was studied in [9].
Recent contributions also address caching in MIMO networks [10][11][12]. In [10,11], a cache-enabled cooperative MIMO framework for wireless video streaming is investigated. In [12], coded caching for the downlink MIMO channel is discussed.
2) Optimization of content placement: Content placement with finite cache size is the key issue in caching design, since unplanned caching in nearby SBSs results in more interference. The traditional method of caching the most popular contents (MPC) in wired networks is no longer optimal when wireless transmission is considered. A strategy that combines MPC and the largest content diversity caching is proposed in [13], together with cooperative transmission in cluster-centric small cell networks. This strategy is extended to distributed relay networks with relay clustering in [14] to combat the half-duplex constraint, and it significantly improves the outage performance. A multi-threshold caching scheme that allows BSs to store different numbers of copies of contents according to their popularity is proposed in [15]. It allows a finer partitioning of the cache space than a binary threshold, but its complexity is exponential in the number of thresholds.
Probabilistic content placement under random network topologies has also been investigated. In [16], the optimal content caching probability that maximizes the hit probability is derived. The results are extended to heterogeneous cellular networks in [17] which shows that caching the most popular contents in the macro BSs is almost optimal while it is in general not optimal for SBSs.
3) Caching in Heterogeneous Networks: Extensive work has been carried out to understand the performance gain of caching for heterogeneous networks (HetNets), and stochastic geometry is the commonly used approach. In [18], the optimal probabilistic caching to maximize the successful delivery probability is considered in a multi-tier HetNet. Cache-enabled heterogeneous single-antenna cellular networks are investigated in [19]. The optimal probabilistic content placement for the interference-limited case is derived, and the result shows that the optimal placement probability is linearly proportional to the square root of the content popularity with an offset depending on BS caching capabilities. Caching policies that maximize the success probability and the area spectral efficiency of cache-enabled HetNets are studied in [20]. The results show that the optimal caching probability is less skewed when maximizing the success probability but more skewed when maximizing the area spectral efficiency. The work of [21] proposes joint BS caching and cooperation for maximizing the successful transmission probability in a multi-tier HetNet. A local optimum is obtained in the general case and globally optimal solutions are achieved in some special cases. Cache-based channel selection diversity and network interference are studied in [22] in stochastic wireless caching helper networks, and solutions for noise-limited networks and interference-limited networks are derived, respectively.

B. Contributions and Organization
Existing caching designs for SBSs are restricted to the single-antenna case and mainly to the µWave band. Little is known about the impact of multiple antennas at densely deployed SBSs, and of the adoption of the mmWave band, on the successful content delivery and the optimal content placement. Analyzing multi-antenna networks using stochastic geometry is a known difficulty, as acknowledged in [23]. In contrast to existing works, in this paper we analyze the performance of caching in multi-antenna SBSs in µWave and mmWave networks, and propose probabilistic content placement schemes to maximize the performance of content delivery. The main contributions of this paper are summarized as follows:
• Derivation of the successful content delivery probability (SCDP) of multi-antenna SBSs. We use stochastic geometry to model wireless caching in multi-antenna dense small cell networks in both the µWave and mmWave bands. The SCDPs for both types of cache-enabled SBSs are derived. The results characterize the dependence of the SCDPs on parameters such as channel effects, caching placement probability, SBS density, transmission power and number of antennas.
• Development of a near-optimal cross-entropy optimization (CEO) method for a general distribution of content requests. The derived SCDPs do not admit a closed form and are highly complex to optimize. To tackle this difficulty, we first propose a constrained CEO (CCEO) based algorithm that optimizes the SCDPs. The original unconstrained CEO algorithm is a stochastic optimization method based on adaptive importance sampling that can achieve a near-optimal solution with moderate complexity and guaranteed convergence [24]. We adapt this method to deal with the caching capacity constraint and the probability constraints in our problem.
• Design of a simple heuristic content placement algorithm. To further reduce the complexity, we propose a heuristic two-stair algorithm to maximize the SCDP via probabilistic content placement when the content request probability follows the Zipf distribution [25]. The algorithm is designed by combining the MPC and caching diversity (CD) schemes while taking into account the content popularity. The solution demonstrates near-optimal performance in single-antenna systems, and various advantages in multi-antenna scenarios.
• Numerical results show that, in contrast to the traditional way of deploying SBSs at a much higher density or installing many more antennas, increasing the caching capacity at mmWave SBSs provides a low-cost solution to achieve SCDP performance comparable to that of µWave systems.
The rest of this paper is organized as follows. The system model is presented in Section II. The analysis of the SCDPs for µWave and mmWave systems is provided in Section III. Two probabilistic content placement schemes are described in Section IV. Simulation and numerical results as well as discussions are given in Section V, followed by concluding remarks in Section VI.

II. SYSTEM MODEL
We consider a cache-enabled dense small cell network consisting of µWave and mmWave SBS tiers. In such a network, each user equipment (UE) in a tier is associated with the nearest SBS that has cached the desired content. The optimal design of content placement under this association assumption addresses the concern that operators are required to place content caches close to UEs [26]. We assume a finite content library $\mathcal{F} := \{f_1, \ldots, f_j, \ldots, f_J\}$, where $f_j$ is the $j$-th most popular content and $J$ is the number of contents. Each content has a normalized size of 1 and each BS can store up to $M$ contents [15,19,22]. The analysis and optimization can also be applied to the case of unequal content sizes. It is assumed that $M \ll J$. The request probability for the $j$-th content is $a_j$, with $\sum_{j=1}^{J} a_j = 1$. Without loss of generality, the contents are sorted in descending order of $a_j$.

A. Probabilistic Content Placement
We consider a probabilistic caching model where each content is independently stored with the same probability in all SBSs of the same tier (either µWave or mmWave) [16]. Let $b_j$ denote the probability that the $j$-th content is cached at an SBS. Fig. 1 shows an example of probabilistic caching with $J = 7$ and $M = 4$, where the contents $\{f_2, f_3, f_5, f_7\}$ are cached at an SBS by drawing a uniform random number, which is 0.9 in this example. In the probabilistic caching strategy, the caching probability vector $\mathbf{b} = \{b_1, \ldots, b_j, \ldots, b_J\}$ needs to satisfy the following conditions:
$$\sum_{j=1}^{J} b_j \le M, \qquad 0 \le b_j \le 1, \ \forall j. \tag{1}$$
Note that although the probabilistic caching strategy is used, its implementation allows each SBS to always cache the maximum amount of content up to its caching capacity $M$.
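The single-random-number draw described above can be sketched as follows. This is a minimal illustration, not necessarily the construction behind Fig. 1: it assumes the commonly used strip-stacking procedure, in which the probabilities are laid end to end on a strip of length $M$, the strip is cut into $M$ unit-length rows, and one uniform draw selects a vertical cut through all rows. The function name `draw_cache` and the example probabilities are hypothetical.

```python
import math
import random

def draw_cache(b, M, u=None, rng=random):
    """Return the 1-based indices of the contents cached at one SBS.

    b : caching probabilities b_1..b_J, each in [0, 1], with sum(b) == M
        (assumption of this sketch, so that exactly M contents are cached)
    M : cache capacity (integer)
    u : optional uniform draw in [0, 1); sampled from rng if omitted
    """
    if abs(sum(b) - M) > 1e-9:
        raise ValueError("this sketch assumes sum(b) == M")
    if u is None:
        u = rng.random()
    cached = []
    start = 0.0
    for j, bj in enumerate(b, start=1):
        end = start + bj
        # Content j occupies [start, end) on the strip. It is cached if the
        # cut at offset u hits its segment in some row, i.e. if an integer
        # k exists with start <= k + u < end. Since bj <= 1, only the
        # smallest k with k + u >= start needs to be checked.
        k = math.ceil(start - u)
        if k + u < end:
            cached.append(j)
        start = end
    return cached

# Hypothetical example with J = 7 and M = 4 (not the values behind Fig. 1).
b_example = [1.0, 0.75, 0.75, 0.5, 0.5, 0.25, 0.25]
print(draw_cache(b_example, 4, u=0.9))
```

Under this construction, content $j$ is selected with probability $b_j$ and every draw fills the cache completely, which is consistent with the remark that each SBS always caches up to its capacity $M$.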

B. Downlink Transmission
In the considered downlink networks, each µWave SBS is equipped with $N_\mu$ antennas, and each mmWave SBS has directional mmWave antennas. All UEs are single-antenna nodes. In both the µWave and mmWave tiers, only one single-antenna user communicates with an SBS in each time slot (see footnote 2). The positions of the µWave SBSs are modeled by a homogeneous Poisson point process (HPPP) $\Phi^{\mu}$ with density $\lambda_\mu$, and the positions of the mmWave SBSs are modeled by an independent HPPP $\Phi^{\mathrm{mm}}$ with density $\lambda_{\mathrm{mm}}$. Define $\Phi_j^{\mu}$ and $\Phi_j^{\mathrm{mm}}$ as the point processes corresponding to all SBSs that cache content $j$ in the µWave tier and the mmWave tier, with densities $b_j \lambda_\mu$ and $b_j \lambda_{\mathrm{mm}}$, respectively.
Footnote 2: In dense small cell networks, we assume that the density of users is much higher than the density of µWave or mmWave SBSs, which can be handled by using multiple access techniques [27].
1) µWave Tier: In the µWave tier, maximum-ratio transmission beamforming is adopted at each SBS. All channels undergo independent and identically distributed (i.i.d.) quasi-static Rayleigh block fading. Without loss of generality, when a typical µWave UE located at the origin $o$ requests content $j$ from the associated µWave SBS $X_o$ that has cached this content, its received signal-to-interference-plus-noise ratio (SINR) is given by
$$\mathrm{SINR}_j^{\mu} = \frac{P_\mu h_j^{\mu} L\left(|X_j^{\mu}|\right)}{I_j^{\mu} + \bar{I}_j^{\mu} + \sigma_\mu^2}, \tag{2}$$
where $P_\mu$ is the transmit power and $h_j^{\mu} \sim \Gamma(N_\mu, 1)$ is the equivalent small-scale fading channel power gain between the typical µWave UE and its serving µWave SBS. Here $\Gamma(k_1, k_2)$ denotes the Gamma distribution with shape parameter $k_1$ and scale parameter $k_2$. The path loss is $L\left(|X_j^{\mu}|\right) = \beta_\mu |X_j^{\mu}|^{-\alpha_\mu}$, where $\beta_\mu$ is the frequency-dependent constant parameter and $\alpha_\mu$ is the path loss exponent. $\sigma_\mu^2$ is the noise power at a µWave UE.
The inter-cell interference terms $I_j^{\mu}$ and $\bar{I}_j^{\mu}$ are given by
$$I_j^{\mu} = \sum_{i \in \Phi_j^{\mu} \setminus X_o} P_\mu h_{i,o} L\left(|X_{i,o}|\right), \qquad \bar{I}_j^{\mu} = \sum_{k \in \bar{\Phi}_j^{\mu}} P_\mu h_{k,o} L\left(|X_{k,o}|\right). \tag{3}$$
In (3), $\Phi_j^{\mu} \setminus X_o$ is the point process with density $b_j \lambda_\mu$ corresponding to the interfering SBSs that cache content $j$, and $\bar{\Phi}_j^{\mu} = \Phi^{\mu} - \Phi_j^{\mu}$ with density $(1 - b_j) \lambda_\mu$ is the point process corresponding to the interfering SBSs that do not store content $j$. The interfering channel power gains $h_{i,o}, h_{k,o} \sim \exp(1)$ follow the exponential distribution, and $|X_{i,o}|, |X_{k,o}|$ denote the distances between the interfering SBSs and the typical UE.
2) mmWave Tier: In the mmWave tier, we assume that directional beamforming is adopted at each mmWave SBS and that small-scale fading is neglected, since small-scale fading causes little change in the received power, as verified by the practical mmWave channel measurements in [28]. Note that the traditional small-scale fading distributions are invalid for mmWave modeling due to the sparse scattering environment at mmWave frequencies [29]. Unlike their conventional µWave counterpart, mmWave transmissions are highly sensitive to blockage. According to the average line-of-sight (LOS) model in [30,31], we consider the mmWave link to be LOS if the communication distance is less than $D_L$, and non-line-of-sight (NLOS) otherwise. Moreover, the existing literature has confirmed that mmWave transmissions tend to be noise-limited and that interference is weak [30,32]. Therefore, when a typical mmWave UE requests content $j$ from the associated mmWave SBS that has cached this content, its received SINR is given by
$$\mathrm{SINR}_j^{\mathrm{mm}} = \frac{P_{\mathrm{mm}} G_{\mathrm{mm}} L\left(|Y_j^{\mathrm{mm}}|\right)}{\sigma_{\mathrm{mm}}^2}, \tag{4}$$
where $P_{\mathrm{mm}}$ is the transmit power of the mmWave SBS, and $G_{\mathrm{mm}}$ is the main-lobe gain of directional beamforming, which equals the number of antenna elements [33]. The path loss is expressed as $L\left(|Y_j^{\mathrm{mm}}|\right) = \beta_{\mathrm{mm}} \left(|Y_j^{\mathrm{mm}}|\right)^{-\alpha}$ with distance $|Y_j^{\mathrm{mm}}|$ and frequency-dependent parameter $\beta_{\mathrm{mm}}$. The path loss exponent is $\alpha = \alpha_L$ for a LOS link and $\alpha = \alpha_N$ for an NLOS link. $\sigma_{\mathrm{mm}}^2$ is the combined power of noise and weak interference.
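The distance-dependent LOS/NLOS switch in the mmWave link model can be sketched in a few lines. This is a minimal illustration that assumes the noise-limited form in which the received power is divided by $\sigma_{\mathrm{mm}}^2$, with all quantities in linear (not dB) units. The function name `mmwave_snr` and the numerical values in the usage lines are hypothetical.

```python
def mmwave_snr(distance, p_tx, gain, beta, alpha_los, alpha_nlos, d_los, noise):
    """Noise-limited SINR of a mmWave link under the average LOS model.

    distance   : link distance |Y| (same length unit as d_los)
    p_tx       : transmit power P_mm (linear scale)
    gain       : main-lobe beamforming gain G_mm (linear scale)
    beta       : frequency-dependent path loss constant beta_mm
    alpha_los  : path loss exponent alpha_L for LOS links
    alpha_nlos : path loss exponent alpha_N for NLOS links
    d_los      : LOS radius D_L; links shorter than this are LOS
    noise      : combined noise and weak interference power sigma_mm^2
    """
    alpha = alpha_los if distance < d_los else alpha_nlos
    path_loss = beta * distance ** (-alpha)
    return p_tx * gain * path_loss / noise

# Hypothetical values: the same link parameters inside and outside D_L.
print(mmwave_snr(2.0, 1.0, 4.0, 1.0, 2.0, 4.0, 10.0, 1.0))
print(mmwave_snr(20.0, 1.0, 4.0, 1.0, 2.0, 4.0, 10.0, 1.0))
```

The jump in path loss exponent at $D_L$ is what later produces the step-like behavior of the mmWave delivery probability as the required bit rate grows.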

III. SUCCESSFUL CONTENT DELIVERY PROBABILITY
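In this paper, the SCDP is used as the performance indicator. It represents the probability that a content requested by a typical UE is both cached in the network and successfully transmitted to the UE. We assume that each content has $\eta$ bits and that the delivery time needs to be less than $T$. By the law of total probability, the SCDP in the µWave tier is calculated as
$$P_{\mathrm{SCD}}^{\mu} = \sum_{j=1}^{J} a_j \Pr\left(W_\mu \log_2\left(1 + \mathrm{SINR}_j^{\mu}\right) \ge \frac{\eta}{T}\right) = \sum_{j=1}^{J} a_j \Pr\left(\mathrm{SINR}_j^{\mu} \ge \varphi_\mu\right), \tag{5}$$
where $W_\mu$ is the µWave bandwidth allocated to a typical user (frequency-division multiple access (FDMA) is employed when multiple users are served by an SBS in this paper), and $\varphi_\mu = 2^{\frac{\eta}{W_\mu T}} - 1$. Likewise, in the mmWave tier, the SCDP is calculated as
$$P_{\mathrm{SCD}}^{\mathrm{mm}} = \sum_{j=1}^{J} a_j \Pr\left(W_{\mathrm{mm}} \log_2\left(1 + \mathrm{SINR}_j^{\mathrm{mm}}\right) \ge \frac{\eta}{T}\right) = \sum_{j=1}^{J} a_j \Pr\left(\mathrm{SINR}_j^{\mathrm{mm}} \ge \varphi_{\mathrm{mm}}\right), \tag{6}$$
where $W_{\mathrm{mm}}$ is the mmWave bandwidth allocated to a typical user, and $\varphi_{\mathrm{mm}} = 2^{\frac{\eta}{W_{\mathrm{mm}} T}} - 1$. The rest of this section is devoted to deriving the SCDPs in (5) and (6).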
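The mapping from the delivery deadline to an SINR threshold can be checked numerically. The sketch below assumes the Shannon-rate relation implied by the threshold definition, i.e. a content of $\eta$ bits is delivered in time when $W \log_2(1 + \mathrm{SINR}) \cdot T \ge \eta$. The function names and the numbers in the usage lines are hypothetical.

```python
import math

def sinr_threshold(eta_bits, bandwidth_hz, deadline_s):
    """SINR threshold phi = 2^(eta / (W T)) - 1 for delivering eta bits in T."""
    return 2.0 ** (eta_bits / (bandwidth_hz * deadline_s)) - 1.0

def delivered_in_time(sinr, eta_bits, bandwidth_hz, deadline_s):
    """True if the Shannon rate at this SINR carries eta bits within T."""
    rate = bandwidth_hz * math.log2(1.0 + sinr)  # bits per second
    return rate * deadline_s >= eta_bits

# Hypothetical numbers: 1 Mbit over 1 MHz within 1 s needs SINR >= 1 (0 dB).
phi = sinr_threshold(1e6, 1e6, 1.0)
print(phi, delivered_in_time(phi, 1e6, 1e6, 1.0))
```

Because the threshold grows exponentially in $\eta / (W T)$, a tier with a much larger bandwidth keeps the threshold low even for large contents, which is the mechanism behind the µWave versus mmWave comparison in Section V.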

A. µWave Tier
Based on (2) and (5), the SCDP in the µWave tier can be derived and summarized below.
Theorem 1: In the cache-enabled µWave tier, the SCDP is given by
$$P_{\mathrm{SCD}}^{\mu} = \sum_{j=1}^{J} a_j P_{j,\mathrm{SCD}}^{\mu}(b_j), \tag{7}$$
where $P_{j,\mathrm{SCD}}^{\mu}(b_j)$ denotes the probability that the $j$-th requested content is successfully delivered to the µWave UE by its serving SBS, and is expressed as
$$P_{j,\mathrm{SCD}}^{\mu}(b_j) = \int_{0}^{\infty} P_{\mathrm{cov}}^{\mu}(x, b_j) f_{|X_j^{\mu}|}(x)\, dx, \tag{8}$$
where $P_{\mathrm{cov}}^{\mu}(x, b_j)$, given by (10), is the conditional coverage probability that the received SINR is larger than $\varphi_\mu$ given a typical communication distance $x$. $f_{|X_j^{\mu}|}(x)$ is the probability density function (PDF) of the distance $|X_j^{\mu}|$ between a typical µWave UE and its nearest serving SBS that stores content $j$, and is given by [35]
$$f_{|X_j^{\mu}|}(x) = 2 \pi b_j \lambda_\mu x e^{-\pi b_j \lambda_\mu x^2}. \tag{9}$$
Proof 1: Please see Appendix A. Note that $P_{j,\mathrm{SCD}}^{\mu}(b_j)$ becomes the probability of successful transmission from the serving SBS to the typical user when $b_j = 1$, as in traditional µWave networks without caching. The SCDP expression for multi-antenna systems is much more complicated than the closed-form expression for single-antenna systems in [19].
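The nearest-serving-SBS distance PDF in (9) is a Rayleigh-type density whose CDF, $1 - e^{-\pi b_j \lambda_\mu x^2}$, can be inverted in closed form. The following sketch samples from it and compares the sample mean with the analytical mean $1 / (2\sqrt{b_j \lambda_\mu})$. The function names and parameter values are hypothetical; the density is taken per square kilometer, so distances come out in kilometers.

```python
import math
import random

def nearest_distance_pdf(x, b_j, lam):
    """PDF (9) of the distance to the nearest SBS caching content j."""
    return 2.0 * math.pi * b_j * lam * x * math.exp(-math.pi * b_j * lam * x * x)

def sample_nearest_distance(b_j, lam, rng=random):
    """Inverse-CDF sample from (9), using F(x) = 1 - exp(-pi b_j lam x^2)."""
    u = rng.random()  # u in [0, 1), so 1 - u lies in (0, 1]
    return math.sqrt(-math.log(1.0 - u) / (math.pi * b_j * lam))

# Hypothetical setting: 600 SBSs per km^2 and caching probability 0.5.
rng = random.Random(0)
samples = [sample_nearest_distance(0.5, 600.0, rng) for _ in range(50000)]
print(sum(samples) / len(samples), 1.0 / (2.0 * math.sqrt(0.5 * 600.0)))
```

The mean serving distance scales as $1/\sqrt{b_j \lambda_\mu}$, so halving the caching probability of a content has the same effect on the serving distance as halving the SBS density.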

B. mmWave Tier
Based on (4) and (6), the SCDP in the mmWave tier can be derived and summarized below.
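Theorem 2: In the cache-enabled mmWave tier, the SCDP is given by
$$P_{\mathrm{SCD}}^{\mathrm{mm}} = \sum_{j=1}^{J} a_j \left( P_{j,\mathrm{SCD}}^{\mathrm{mm},L}(b_j) + P_{j,\mathrm{SCD}}^{\mathrm{mm},N}(b_j) \right), \tag{13}$$
where $P_{j,\mathrm{SCD}}^{\mathrm{mm},L}(b_j)$ and $P_{j,\mathrm{SCD}}^{\mathrm{mm},N}(b_j)$ denote the probabilities that content $j$ is successfully delivered when the mmWave UE is connected to its serving mmWave SBS via a LOS link and an NLOS link, and are given by (14) and (15), respectively.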

IV. OPTIMIZATION OF PROBABILISTIC CONTENT PLACEMENT
In this section, we aim to maximize the SCDP by optimizing the probabilistic content placement $\{b_j\}$. The main difficulty is that the SCDP expressions (7) and (13) do not have a closed form for the multi-antenna case, and whether they are concave with regard to $\{b_j\}$ is unknown. This makes the problem much more challenging than the single-antenna SBS case studied in [19]. Therefore, the optimal content placement problem for the multi-antenna case is distinct. To tackle this new problem, we propose two algorithms. The first is developed based on the CEO method and can achieve near-optimal performance. The second, the two-stair scheme, is based on a combination of the MPC and CD content placement schemes and has reduced complexity.
In (10), $\Theta \triangleq \left\{ \{t_q\}_{q=1}^{n} \,\middle|\, \sum_{q=1}^{n} q \cdot t_q = n,\ t_q \text{ is an integer},\ \forall n \right\}$ and $\csc(\cdot)$ is the cosecant function.

A. The Near-Optimal CCEO Algorithm
The optimal caching placement probability in the multi-antenna case is hard to obtain, so we introduce CEO to resolve the difficulty of maximizing the SCDP by optimizing the probabilistic content placement. CEO is an adaptive variance algorithm for estimating probabilities of rare events. The rationale of the CEO algorithm is to first associate each optimization problem with a rare event estimation problem, and then to tackle this estimation problem efficiently by an adaptive algorithm. The outcome of this algorithm is a random sequence of solutions which converges probabilistically to the optimal or a near-optimal solution [24,36]. The CEO method involves two iterative steps. The first step generates samples of random data according to a specified random (normally Gaussian) distribution. The second step updates the parameters of the random distribution based on the sample data, in order to produce better samples in the next iteration. The CEO algorithm has been successfully applied to a wide range of difficult optimization tasks such as the traveling salesman problem and the antenna selection problem in multi-antenna communications [37]. It has shown superior performance in solving complex optimization problems compared to the commonly used simulated annealing (SA) and genetic algorithm (GA) [38], which are based on random search.
The CEO algorithm was originally proposed for unconstrained optimization. To deal with the constraints on the probabilities $\{b_j\}$ and the caching capacity constraint, we propose a CCEO algorithm as shown in Algorithm 1. In the proposed CCEO algorithm, we force the randomly generated samples to lie within the feasible set $\{b_j \mid 0 \le b_j \le 1, \forall j\}$ in the Projection step. To satisfy the constraint $\sum_{j=1}^{J} b_j \le M$, we introduce a penalty function whose parameter $H$ is a large positive number. The dynamic Smoothing step prevents the result from converging to a sub-optimal solution. At each iteration, the main computation is to evaluate the objective function $N_s$ times, and no gradient needs to be calculated, so the complexity is moderate and can be further controlled to achieve a complexity-convergence tradeoff.

Algorithm 1 Constrained Cross-Entropy Optimization (CCEO) Algorithm
Initialization: Randomly initialize the parameters of the Gaussian distribution $\mathcal{N}(\mu_{j,t=0}, \sigma_{j,t=0}^2)$, where $t = 0$ is the iteration index. Set the sample number $N_s$, the number of selected samples $N_{\mathrm{elite}} \ll N_s$, the stopping threshold, and a large positive number $H$ as the parameter for the penalty function.
repeat
Sampling: Generate $N_s$ random samples $\{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_{N_s}\}$ from the $\mathcal{N}(\mu_t, \sigma_t^2)$ distribution.
Projection: Project the samples onto the feasible set $\{b_j \mid 0 \le b_j \le 1, \forall j\}$, i.e., $\mathbf{b} = \min(\max(\mathbf{b}, 0), 1)$.
Modification: Modify the objective function to a penalized objective $\hat{P}_{\mathrm{SCD}}(\mathbf{b})$, which penalizes violations of the capacity constraint with parameter $H$, where $P_{\mathrm{SCD}}(\mathbf{b})$ is the original objective function in (7) and (13) for µWave and mmWave, respectively.
Selection: Evaluate $\hat{P}_{\mathrm{SCD}}(\mathbf{b})$ for the $N_s$ samples. Let $I$ be the indices of the $N_{\mathrm{elite}}$ best performing samples in terms of $\hat{P}_{\mathrm{SCD}}(\mathbf{b})$.
Updating: For all $j \in \mathcal{F}$, calculate the sample mean and variance over the selected samples.
Smoothing: Update the Gaussian distribution parameters by smoothing the new sample statistics with those of the previous iteration. Here $\alpha$ is a fixed smoothing parameter ($0.5 \le \alpha \le 0.9$), while $\beta_t$ is a dynamic smoothing parameter determined by a fixed smoothing parameter $\beta$ ($0.8 \le \beta \le 0.99$) and an integer $q$ with a typical value between 5 and 10.
Increment: $t = t + 1$.
until a convergence criterion is satisfied, e.g., $\max_{j \in \mathcal{F}} \sigma_{j,t}^2$ falls below the stopping threshold.
Output: The optimal caching probability is $\mathbf{b}^* = \mu_t$.

In Fig. 2, we provide an example of the iterative results of the content placement probabilities at iteration indices $t = 1$, $t = 5$, $t = 20$, and $t = 70$. In this example, the algorithm converges at $t = 70$. Each sub-figure presents the resulting mean value $\mu_t$ at the end of iteration $t$, which is used to generate the random samples in the next iteration. We observe that at $t = 20$ the caching placement probability is already quite close to the converged solution, so stopping early could significantly reduce the complexity. Overall, the CEO algorithm converges fast and is an efficient method to find the near-optimal SCDP result, and the complexity of the CEO algorithm is $O(n^3)$ [39]. It is also noted that the top-ranked contents are cached with probability $b_j = 1$, while caching diversity is more important for making effective use of the remaining caching space.
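The steps of Algorithm 1 can be sketched on a toy problem as follows. This is an illustrative sketch, not the authors' implementation. It assumes a penalty of the form $H \cdot \max(\sum_j b_j - M, 0)$ subtracted from the objective, and it assumes the smoothing rules commonly used in the cross-entropy literature, $\mu_t = \alpha \tilde{\mu}_t + (1 - \alpha)\mu_{t-1}$, $\sigma_t = \beta_t \tilde{\sigma}_t + (1 - \beta_t)\sigma_{t-1}$ and $\beta_t = \beta - \beta (1 - 1/t)^q$. The function name `cceo`, its default parameters, and the toy objective in the usage lines are hypothetical, and the toy objective is not the SCDP of Theorem 1 or Theorem 2.

```python
import numpy as np

def cceo(objective, J, M, n_samples=200, n_elite=20, alpha=0.7, beta=0.9,
         q=5, penalty=1e3, tol=1e-6, max_iter=200, seed=0):
    """Constrained cross-entropy search for b in [0, 1]^J with sum(b) <= M."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(0.0, 1.0, size=J)   # random initial means
    sigma = np.ones(J)                   # initial standard deviations
    for t in range(1, max_iter + 1):
        # Sampling, then projection onto the box 0 <= b_j <= 1.
        samples = rng.normal(mu, sigma, size=(n_samples, J))
        samples = np.clip(samples, 0.0, 1.0)
        # Modification: penalized objective (assumed penalty form).
        excess = np.maximum(samples.sum(axis=1) - M, 0.0)
        scores = np.array([objective(b) for b in samples]) - penalty * excess
        # Selection: keep the n_elite best samples.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Updating and smoothing (assumed standard cross-entropy rules).
        beta_t = beta - beta * (1.0 - 1.0 / t) ** q
        mu = alpha * elite.mean(axis=0) + (1.0 - alpha) * mu
        sigma = beta_t * elite.std(axis=0) + (1.0 - beta_t) * sigma
        if np.max(sigma ** 2) < tol:
            break
    return np.clip(mu, 0.0, 1.0)

# Toy usage: Zipf-like popularity and a concave per-content success curve.
J, M = 10, 3
weights = np.arange(1, J + 1, dtype=float) ** -1.2
a = weights / weights.sum()
toy_scdp = lambda b: float(np.sum(a * b / (b + 0.2)))
b_star = cceo(toy_scdp, J, M)
print(np.round(b_star, 2), round(float(b_star.sum()), 2))
```

Only objective evaluations are needed, which matches the remark that no gradient is computed. The sample size, elite size and iteration budget are the knobs for the complexity-convergence tradeoff.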
Based on this observation, we design a low-complexity heuristic scheme in the next subsections.

B. Two-Stair Scheme for the µWave Tier
To further reduce the complexity of the optimization, we devise a simple two-stair (TS) scheme for the case where the content popularity is modeled by the Zipf distribution [13,16,25], which is based on empirical studies and is given by
$$a_j = \frac{j^{-\gamma}}{\sum_{m=1}^{J} m^{-\gamma}},$$
where $\gamma$ is the Zipf exponent that represents the popularity skewness.
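For reference, the Zipf request probabilities can be generated in a few lines. The function name `zipf_popularity` and the values in the usage line are hypothetical.

```python
def zipf_popularity(J, gamma):
    """Request probabilities a_1..a_J with a_j proportional to j^(-gamma)."""
    weights = [j ** (-gamma) for j in range(1, J + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: a larger gamma concentrates requests on few contents.
for gamma in (0.5, 1.5):
    a = zipf_popularity(100, gamma)
    print(gamma, round(sum(a[:10]), 3))
```

The printed value is the fraction of requests that fall on the ten most popular contents, which shows how the skewness parameter controls the benefit of caching only the head of the library.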
In the TS scheme, a fraction $\varepsilon M$ ($0 \le \varepsilon \le 1$) of the caching space at an SBS is allocated to store the most popular contents, and is called the MPC region. The remaining cache space is allocated to randomly store contents with certain probabilities, and is called the CD region. As illustrated in Fig. 3, in the two-stair caching scheme the contents in the CD region are all cached with a common probability. The rest of the contents are not cached and must be fetched through the backhaul links. These content placement schemes are studied in detail in the rest of this section.
In this scheme, the content placement probabilities $\{b_j\}$ are constrained by conditions that are characterized by two variables: $\varepsilon$ and the common probability with which a content $j$ in the CD region is stored at an SBS.
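The two-stair placement vector can be sketched as follows. This is an illustration under stated assumptions: the name `rho` for the common CD-region probability is hypothetical, and the number of contents in the CD region is assumed to be $\lfloor (M - \lfloor \varepsilon M \rfloor) / \rho \rfloor$, i.e. as many contents as the remaining cache space supports at probability `rho`, capped by the library size.

```python
import math

def two_stair_placement(J, M, eps, rho):
    """Caching probabilities b_1..b_J of a two-stair placement.

    J   : library size
    M   : cache capacity
    eps : fraction of the cache reserved for the MPC region (0 <= eps <= 1)
    rho : common caching probability in the CD region (hypothetical name)
    """
    n_mpc = math.floor(eps * M)        # contents cached with probability 1
    if rho <= 0.0:
        n_cd = 0                       # pure MPC placement
    else:
        n_cd = min(J - n_mpc, math.floor((M - n_mpc) / rho))
    n_rest = J - n_mpc - n_cd          # contents fetched over the backhaul
    return [1.0] * n_mpc + [rho] * n_cd + [0.0] * n_rest

# Hypothetical example: half of a 4-content cache goes to the MPC region.
print(two_stair_placement(10, 4, 0.5, 0.5))
```

By construction the probabilities sum to at most $M$, so the placement satisfies the capacity constraint, and setting `eps = 1` with `rho = 0` recovers pure MPC caching.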
As such, the µWave SCDP (7) can be expressed as (24). It is seen in (24) that the contents $\{1, \cdots, \lfloor \varepsilon M \rfloor\}$ in the MPC region have the same SCDP $P_{j,\mathrm{SCD}}^{\mu}(1)$, and that the contents in the CD region, whose indices start at $\lfloor \varepsilon M \rfloor + 1$, have the same SCDP, namely $P_{j,\mathrm{SCD}}^{\mu}(\cdot)$ evaluated at the common probability. Our aim is to maximize the overall SCDP, and the problem is formulated as (25), where $\mathbb{1}(A)$ is the indicator function that returns one if the condition $A$ is satisfied.
The convexity of problem (25) is unknown, and finding its globally optimal solution is challenging. To obtain an efficient caching placement solution, we first use approximations from [40], based on the fact that for Zipf popularity with $\gamma > 0$, $\gamma \ne 1$ and $M \ll J$, we have
$$\frac{\sum_{j=1}^{M} j^{-\gamma}}{\sum_{m=1}^{J} m^{-\gamma}} \approx \frac{M^{1-\gamma} - 1}{J^{1-\gamma} - 1}.$$
The objective function of (24) can then be approximated accordingly. Note that for the special case of MPC caching, i.e., $\varepsilon = 1$ and a common probability of 0, the approximation reduces to $\tilde{P}_{\mathrm{SCD}}^{\mu} \approx P_{j,\mathrm{SCD}}^{\mu}(1) \frac{M^{1-\gamma} - 1}{J^{1-\gamma} - 1}$. Problem (25) can then be approximated as (29).
Because $\varepsilon$ and the common probability are coupled in the objective function of (29), we use a decomposition approach to solve this problem. Since $M^{1-\gamma}$ is always positive, for a given common probability the optimal $\varepsilon$ is obtained by solving the equivalent sub-problem (30), where $\mu_o \ge 1$ is the ratio of $P_{j,\mathrm{SCD}}^{\mu}(1)$ to the SCDP at the common probability and is independent of $\varepsilon$. Thus, we have the following theorem:
Theorem 3: The optimal solution of problem (30) is given by (31).
Since problem (32) is non-convex, we propose to use Newton's method to solve it, as shown in Appendix D. Note that Newton's method converges faster than the Karush-Kuhn-Tucker (KKT) method and the gradient-based method [41]. The solution obtained by Newton's method is projected onto $[0, 1]$ by $\min(\max(\cdot, 0), 1)$ to give the optimal common probability, and the optimal $\varepsilon^*$ is then obtained from (31).
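The quality of the Zipf partial-sum approximation can be inspected numerically. The sketch below compares the exact cumulative popularity of the $M$ most popular contents with the closed-form ratio $(M^{1-\gamma} - 1)/(J^{1-\gamma} - 1)$. The function names and the parameter values are hypothetical.

```python
def zipf_head_mass(M, J, gamma):
    """Exact total request probability of the M most popular contents."""
    head = sum(j ** (-gamma) for j in range(1, M + 1))
    total = sum(m ** (-gamma) for m in range(1, J + 1))
    return head / total

def zipf_head_mass_approx(M, J, gamma):
    """Closed-form approximation, valid for gamma > 0 and gamma != 1."""
    return (M ** (1.0 - gamma) - 1.0) / (J ** (1.0 - gamma) - 1.0)

# Hypothetical setting: cache size 10, library size 1000, Zipf exponent 0.8.
print(zipf_head_mass(10, 1000, 0.8), zipf_head_mass_approx(10, 1000, 0.8))
```

The closed form replaces the sums by integrals of $x^{-\gamma}$, so it is smooth in $M$, which is what makes the decomposition into sub-problems tractable. The price is an approximation gap that should be checked for the parameter range of interest.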

C. Two-Stair Scheme for the mmWave Tier
Similar to the µWave case, the SCDP of the mmWave tier can be approximated as in (33). The optimal two-stair content caching is then obtained by solving problem (34), which can be efficiently solved by following the decomposition approach. For a given common probability, the optimal $\varepsilon$ is obtained by solving an equivalent sub-problem, in which $\mathrm{mm}_o$ is the ratio of $P_{j,\mathrm{SCD}}^{\mathrm{mm},L}(1) + P_{j,\mathrm{SCD}}^{\mathrm{mm},N}(1)$ to the same sum evaluated at the common probability. The remaining procedure follows the same approach as in Section IV-B, except for the derivation of the search direction used to solve for the optimal common probability, which is provided in Appendix E.

V. RESULTS AND DISCUSSIONS
In this section, the performance of the proposed caching schemes are evaluated by presenting numerical results. Performance comparison between cache-enabled µWave and mmWave systems is also highlighted. The system parameters are shown in Table I, unless otherwise specified. 1 GHz and 60 GHz are chosen for the µWave and mmWave frequency bands, respectively. Fig. 4 verifies the SCDPs for content j derived in Theorem 1 and Theorem 2 against the content placement probability. The analytical results are obtained from (8), (14) and (15). The SCDP for an arbitrary content j is observed to be a monotonically increasing and concave function of the caching placement probability for both µWave and mmWave systems. Notice that all our derived analytical results match very well with those ones via Monte Carlo simulations averaged over 2,000 random user drops and marked by '¨'.
In Fig. 5, we examine the comparison of successful transmission probabilities of µWave from (8) and mmWave from (14) and (15) as bit rate of each content varies, which Bit rate of each content, η/T corresponds to the case with caching placement probability b j " 1. It is seen that when content size is small, the µWave system shows better performance than mmWave, but as the content size increases, the mmWave system outperforms the µWave system for its ability to provide high capacity. The successful mmWave transmission probability shows a 'ladder drop' effect, and this is because the mmWave system combines LOS part and NLOS part. The LOS effect is limited to the region within the distance D L q while NLOS has a much wider coverage, so when the required content size is small, the performance is dominated by the NLOS part. However, the NLOS part cannot provide high capacity due to the much larger path loss exponent α N , so its performance drops steeply as the bit rate of each content increases. Next, in Figs. 6-7, we compare the performance of the two proposed content placement schemes with the close-form optimal solution [19] and the intuitive MPC scheme [20] in the µWave single-antenna case. Note that in the general multiantenna setting, the close-form optimal content placement is still unknown. The SCDP with different caching capacity M is shown in Fig. 6. It is observed that the CCEO algorithm achieves exactly the same performance as the known optimal solution in [19], and the proposed TS scheme provides closeto-optimal and significantly better performance than the MPC solution, especially when γ is large and the caching capacity M is small. The MPC solution is the worst caching scheme because it ignores the content diversity which is particularly important when the content popularity is more uniform. Fig. 7 shows the SCDP with different content sizes η. It is found that the SCDP of the TS scheme is closer to the optimum when the η{T is large. 
However, as the bit rate of each content η{T increases, both TS and MPC schemes become very close to the optimal solution. Fig. 8 shows the SCDP comparison of various systems with different caching capacities M . It shows that both of the proposed content placement schemes perform consistently better than MPC, especially for the 60GHz mmWave, the SCDP of the TS scheme is close to that of the CCEO algorithm. The results also indicate that µWave always has a superior performance than the 60GHz mmWave with the same SBS density of 600{km 2 . Fig. 9 shows the SCDP comparison of various systems versus the caching capacities M with different content sizes. We generate a random set of of content size S " ts 1 , ..., s j , ...s J u, where s j denotes the content size of f j . For simplicity, s j is chosen to be 1 or 2 with equal probability of 0.5 in our simulation. The caching probability satisfies ř J j"1 b jˆsj ď M . It is shown that in the unequal-size content case, CEO still greatly outperforms MPC, following a similar trend as the equal-size content case.    Fig. 10 studies the impact of content library size on SCDPs of different systems. It is seen that as the library size J increases, the SCDP drops rapidly. The gap between the proposed content placement schemes and the MPC scheme remain stabilized when the library size increases. Fig. 11 compares the SCDPs for the two proposed content placement schemes against Zipf exponent γ. It can be seen that the SCDP increases with γ because caching is more effective when the content reuse is high. In the high-γ regime of both µWave and mmWave systems, the content request probabilities for the first few most popular content are large, and SCDPs of both proposed placement schemes almost coincide. It is noteworthy that the proposed TS placement scheme achieves performance close to the CCEO algorithm,  especially in the µWave system and at low and high γ regimes. 
Finally, we investigate the cache-density tradeoff and its implication for the comparison of µWave and mmWave systems. The CCEO placement scheme is used. Fig. 12 shows the SCDPs for different caching capacities $M$ and SBS densities $\lambda_{\mu}$ and $\lambda_{\mathrm{mm}}$. It is observed that the µWave channel is usually better than the mmWave channel when $\lambda = 600/\mathrm{km}^2$, so with the same SBS density, µWave achieves a higher SCDP. To achieve performance comparable to that of the µWave system with an SBS density of $600/\mathrm{km}^2$, the mmWave system needs to deploy SBSs with a much higher density of $1000/\mathrm{km}^2$, but the extra density of $\lambda_{\mathrm{mm}} = 400/\mathrm{km}^2$ is too costly to afford. Fortunately, by increasing the caching capacity from 10 to 20, the mmWave system can achieve the same SCDP of 91% as the µWave system while keeping the same density of $600/\mathrm{km}^2$. This result shows the great promise of cache-enabled small cell systems, because relatively cheap storage can be traded for a reduction in expensive infrastructure.

VI. CONCLUSION
In this paper, we have investigated the performance of caching in µWave and mmWave multi-antenna dense networks to improve the efficiency of content delivery. Using stochastic geometry, we have analyzed the successful content delivery probabilities and demonstrated the impact of various system parameters. We have designed two novel caching schemes to maximize the successful content delivery probability with moderate to low complexity. The proposed CCEO algorithm can achieve near-optimal performance, while the proposed TS scheme demonstrates performance close to CCEO with further reduced complexity. An important implication of this work is that, to reduce the performance gap between the µWave and mmWave systems, increasing the caching capacity is a low-cost and effective solution compared to traditional measures such as using more antennas or increasing the SBS density. As a promising future direction, studying cooperative caching in a multi-band µWave and mmWave system could further reap the benefits of both systems.
APPENDIX A: PROOF OF THEOREM 1
Based on (5), $P^{\mu}_{\mathrm{SCD}}$ is calculated as
where $P^{\mu}_{\mathrm{cov}}(x, b_j)$ is the conditional coverage probability, and $f_{|X^{\mu}_j|}(x)$ is the PDF of the distance $|X^{\mu}_j|$. Then, we derive $P^{\mu}_{\mathrm{cov}}(x, b_j)$ as
where $\mathcal{L}_{I^{\mu}_j}(\cdot)$ is the Laplace transform of the PDF of $I^{\mu}_j$, and $\mathcal{L}_{I^{\mu}_{\bar{j}}}(\cdot)$ is the Laplace transform of the PDF of $I^{\mu}_{\bar{j}}$. Then $\mathcal{L}_{I^{\mu}_j}(s)$ is given by
Likewise, $\mathcal{L}_{I^{\mu}_{\bar{j}}}(s)$ is given by
Based on (4) and (6), the SCDP for a LOS mmWave link can be derived as
$\alpha_L$, $f_{|Y^{\mathrm{mm}}_j|}(y)$ is the PDF of the distance $|Y^{\mathrm{mm}}_j|$ between a typical user and its serving mmWave SBS, which is given by [35]
$$f_{|Y^{\mathrm{mm}}_j|}(y) = 2\pi b_j \lambda_{\mathrm{mm}}\, y\, e^{-\pi b_j \lambda_{\mathrm{mm}} y^2}, \quad y \ge 0.$$
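As an illustration of the serving-distance PDF above, the following sketch evaluates it, its CDF $1 - e^{-\pi b_j \lambda_{\mathrm{mm}} y^2}$ (obtained by integrating the PDF), and draws samples by inverting the CDF. The function names, the choice of kilometres as the distance unit, and the example values $b_j = 0.5$, $\lambda_{\mathrm{mm}} = 600/\mathrm{km}^2$ are assumptions for illustration.

```python
import math
import random


def serving_distance_pdf(y, b, lam):
    """PDF 2*pi*b*lam*y*exp(-pi*b*lam*y^2) for y >= 0, zero otherwise."""
    if y < 0:
        return 0.0
    return 2.0 * math.pi * b * lam * y * math.exp(-math.pi * b * lam * y * y)


def serving_distance_cdf(y, b, lam):
    """CDF obtained by integrating the PDF: 1 - exp(-pi*b*lam*y^2)."""
    if y < 0:
        return 0.0
    return 1.0 - math.exp(-math.pi * b * lam * y * y)


def sample_serving_distance(b, lam, rng):
    """Inverse-transform sampling: solve u = CDF(y) for y."""
    u = rng.random()  # u lies in [0, 1), so 1 - u lies in (0, 1]
    return math.sqrt(-math.log(1.0 - u) / (math.pi * b * lam))


if __name__ == "__main__":
    rng = random.Random(0)
    b, lam = 0.5, 600.0  # caching probability, SBSs per km^2 (example values)
    samples = [sample_serving_distance(b, lam, rng) for _ in range(20000)]
    y0 = 0.03  # 30 m, expressed in km
    empirical = sum(s <= y0 for s in samples) / len(samples)
    print(empirical, serving_distance_cdf(y0, b, lam))
```

Note that $b_j$ and $\lambda_{\mathrm{mm}}$ enter this distribution only through their product, so the pairs $(0.5, 600)$ and $(1, 300)$ give the same serving-distance statistics.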
Similarly, the SCDP for an NLOS mmWave link can be derived as
$\le 0$, which means that $f_1(\varepsilon)$ is a concave function w.r.t. $\varepsilon$. By setting $\frac{\partial f_1(\varepsilon)}{\partial \varepsilon}$ to zero, we obtain the stationary point as
Note that 0 ≤ ≤ 1, and µ o´1 ´1´1 ≥ 0, we have $\varepsilon_o \ge 0$. To obtain the optimal $\varepsilon^{*}$, we need to consider the following cases:
• Case 1: $0 \le \varepsilon_o < 1$. In this case, the optimal solution of the problem (30) is $\varepsilon^{*} = \varepsilon_o$.
• Case 2: $\varepsilon_o \ge 1$. In this case, $\frac{\partial f_1(\varepsilon)}{\partial \varepsilon} \ge 0$ for $\varepsilon \in [0, 1]$, and thus the optimal solution of the problem (30) is $\varepsilon^{*} = 1$.
Based on the above cases, we obtain (31) and complete the proof.
APPENDIX D: NEWTON'S METHOD TO OPTIMIZE IN (32)
We propose Newton's method to solve the non-convex problem (32) with fast convergence. Based on (28), the first-order derivative of $\tilde{P}^{\mu}_{\mathrm{SCD}}$ is given by
respectively. In (D.3), to simplify the computation, we let $P_{\mathrm{cov}}(x) \approx P^{\mu}_{\mathrm{cov}}(x, 0)$, based on the fact that the interference $I^{\mu}_j + I^{\mu}_{\bar{j}}$ can be approximated as $\sum_{k \in \Phi_{\mu}} P_{\mu} h_{k,o} L(|X_{k,o}|)$, particularly in dense small cell scenarios [43]. Similarly, the second-order derivative of $\tilde{P}^{\mu}_{\mathrm{SCD}}(\eta, T)$ is given by
and $\frac{\partial^2 P^{\mu}_{j,\mathrm{SCD}}(\ )}{\partial\ ^2} =$
According to [41], the search direction in Newton's method can be defined as
Then, is iteratively updated according to ( +1) = [ ( ) + $\delta_2$( ) ∆ ], where denotes the iteration index, $\mu_o = \frac{P^{\mu}_{j,\mathrm{SCD}}(1)}{P^{\mu}_{j,\mathrm{SCD}}(\ )}$ is already defined in (30), and $\delta_2$( ) is the step size, which can be determined by backtracking line search [44]. Thus, the optimal * can be obtained upon convergence.
(32). Here we only derive the search direction that involves the first- and second-order derivatives, and the rest is similar to Appendix D.
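The Newton update with backtracking line search described in Appendix D can be sketched generically as follows. This is a minimal illustration on a toy concave objective, not the SCDP expression of (32): the objective `toy_objective`, the clipping of each iterate to $[0, 1]$, and the gradient fallback used when the curvature is not negative are all assumptions introduced here.

```python
import math


def newton_maximize(f, grad, hess, x0, lo=0.0, hi=1.0,
                    tol=1e-10, max_iter=100, shrink=0.5, armijo=1e-4):
    """Maximize a scalar function on [lo, hi] with Newton steps.

    The step size is found by backtracking line search, and every
    candidate is clipped to [lo, hi] (an assumed feasibility rule).
    """
    x = min(max(x0, lo), hi)
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        # Newton direction -g/h is an ascent direction when h < 0;
        # otherwise fall back to the plain gradient direction.
        direction = -g / h if h < 0 else g
        step, candidate = 1.0, x
        while step > 1e-12:
            trial = min(max(x + step * direction, lo), hi)
            # Sufficient-increase (Armijo) test for a maximization problem.
            if f(trial) >= f(x) + armijo * g * (trial - x):
                candidate = trial
                break
            step *= shrink
        if abs(candidate - x) < tol:
            return candidate
        x = candidate
    return x


def toy_objective(x):
    """Toy concave stand-in for the objective; maximized at ln(6)/3."""
    return 1.0 - math.exp(-3.0 * x) - 0.5 * x


def toy_gradient(x):
    return 3.0 * math.exp(-3.0 * x) - 0.5


def toy_hessian(x):
    return -9.0 * math.exp(-3.0 * x)


if __name__ == "__main__":
    x_star = newton_maximize(toy_objective, toy_gradient, toy_hessian, x0=0.1)
    print(x_star, math.log(6.0) / 3.0)
```

Replacing the three toy functions with the objective and its derivatives from the appendices would give the corresponding iteration, provided those derivatives are available in closed form.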
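We first derive the first-order derivative. Similar to (D.1), we change $\frac{\partial P^{\mu}_{j,\mathrm{SCD}}(\ )}{\partial\ }$ to $\frac{\partial P^{\mathrm{mm}}_{j,\mathrm{SCD}}(\ )}{\partial\ }$ and get the result below:
$$\frac{\partial P^{\mathrm{mm}}_{j,\mathrm{SCD}}(\ )}{\partial\ } = -D_L^2 \pi \lambda_{\mathrm{mm}} e^{-D_L^2 \pi\ \lambda_{\mathrm{mm}}} + (\min(D_L, d_L))^2 \pi \lambda_{\mathrm{mm}} e^{-(\min(D_L, d_L))^2 \pi\ \lambda_{\mathrm{mm}}} + (\max(D_L, d_N))^2 \pi \lambda_{\mathrm{mm}} e^{-(\max(D_L, d_N))^2 \pi\ \lambda_{\mathrm{mm}}}. \tag{E.1}$$
Next we focus on the second-order derivative. Changing in (D.4) leads to the following result:
$$\frac{\partial^2 P^{\mathrm{mm}}_{j,\mathrm{SCD}}(\ )}{\partial\ ^2} = D_L^4 \pi^2 \lambda_{\mathrm{mm}}^2 e^{-D_L^2 \pi\ \lambda_{\mathrm{mm}}} - (\min(D_L, d_L))^4 \pi^2 \lambda_{\mathrm{mm}}^2 e^{-(\min(D_L, d_L))^2 \pi\ \lambda_{\mathrm{mm}}} - (\max(D_L, d_N))^4 \pi^2 \lambda_{\mathrm{mm}}^2 e^{-(\max(D_L, d_N))^2 \pi\ \lambda_{\mathrm{mm}}}. \tag{E.2}$$
Therefore, the search direction in Newton's method can be expressed as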