IEEE Signal Processing Society 1998 Workshop on Multimedia Signal Processing
December 7-9, 1998, Los Angeles, California, USA
Electronic Proceedings
© 1998 IEEE

Implementation of Optimized Cache Replenishment Algorithms in a Soft Caching System

Jussi Kangasharju
Integrated Media Systems Center
University of Southern California
Los Angeles, CA
Email: kangasha@sipi.usc.edu
http://biron.usc.edu/~kangasha/

Young Gap Kwon
Integrated Media Systems Center
University of Southern California
Los Angeles, CA
Email: younggap@sipi.usc.edu
http://biron.usc.edu/~younggap/

Antonio Ortega
Integrated Media Systems Center
University of Southern California
Los Angeles, CA
Email: ortega@sipi.usc.edu
http://sipi.usc.edu/~ortega/

Xuguang Yang
HP Labs
Palo Alto, CA
Email: xyang@hpl.hp.com

Kannan Ramchandran
University of Illinois
Urbana-Champaign, IL
Email: kannan@ifp.uiuc.edu

Abstract

In this paper we address practical issues which arise in the implementation of optimized cache replenishment algorithms within a ``soft'' caching framework.

We study the algorithms that have been proposed for optimized soft caching and simulate them using actual proxy traces. Our objective is to determine what compromises have to be made in order to approximate the desired optimal performance while maintaining a complexity level sufficiently low to enable a real-time implementation.


Introduction

With the popularity of the Internet and various web browsing applications, an increasing amount of image data is transmitted over the network. Proxy caches, or intermediate network nodes in which frequently used data are stored, emerge as an efficient solution to reduce network traffic. In a typical server-proxy-client scenario, the client makes requests through a proxy which retrieves the image from the server, delivers it to the client and caches a copy.

A question naturally arises as to how to allocate the proxy cache memory efficiently so as to minimize the expected delay. This so-called web caching problem has recently attracted considerable research attention [4], [8].

The key difference between image-specific caching and conventional caching is as follows. Conventional cache management is ``hard'': objects are either present in the cache or not, but cannot be partially available. Soft caching, however, is more flexible, in that a variable amount of cache memory can be assigned to each image. Therefore, assuming that a progressive image format is used, a low resolution version of the original image can be kept in the cache. The main advantage of this approach is that, in many cases, the user will not need to download the full resolution image from the server.

In this work we study the implementation issues that arise in a soft caching scenario [1], [2], [3], [7], [9]. In [7], two soft caching scenarios are distinguished, namely, (i) hard access, in which the user always requires the entire image, and (ii) soft access, in which she can possibly choose to view the image at a certain resolution level. Obviously, soft-access-soft-caching represents the most realistic scenario, and is the focus of the current research.

In [9] efficient low-complexity algorithms to optimize the management of a soft cache are proposed. It is proved that the soft-access problem can be converted into a hard-access problem by breaking image resolution levels into separate subimages. Such a conversion is shown to maintain resolution-consistency in cache loading, i.e., the lower resolution version will always be loaded prior to the enhancement layers for higher resolutions.

In this paper we review the optimization algorithm of [9] and study the issues involved in implementing it in a real time system.


Main Derivation

Hard Access

Let us start with the simpler case of hard access. We consider a set of N images, and assume a known access probability ${\{P_i\}}_{i=1,\ldots,N}$ for them. Image i has size $R_i$ and is initially stored at a server with bandwidth $B_{S_i}$. A proxy cache of size C and transmission rate $B_C$ is used as intermediate storage; usually $B_C \gg B_{S_i}$. The problem is how to assign cache memory to the images so that the average access delay is minimized. The following reviews the algorithm of [9]; the reader is referred to [9] for details.

Suppose image i is assigned $r_{C_i}$ bits, where $0 \leq r_{C_i} \leq R_i$. Hard access will always request the entire image of $R_i$ bits. The total delay for accessing image i will then be

\begin{displaymath}
\delta_i(r_{C_i}) = B_C^{-1} \cdot r_{C_i} + B_{S_i}^{-1} \cdot (R_i - r_{C_i})
\end{displaymath} (1)

The expected delay time to retrieve all the images is therefore $\sum_{i=1}^N P_i \delta_i({r_{C_i}})$.

To minimize this, we imagine that the cache is filled in infinitesimal steps. If we allocate a very small amount of $\epsilon$ bits to image i, we achieve a delay reduction of $P_i(B_{S_i}^{-1}-B_C^{-1})\,\epsilon$. Obviously, to maximize this reduction we should pick the image with the largest $P_i(B_{S_i}^{-1}-B_C^{-1})$. Therefore, the solution to the hard access problem is simply to compute a priority index $\beta_i = P_i(B_{S_i}^{-1}-B_C^{-1})$ for each image and sort the images in decreasing order of $\beta_i$. Then, when it is necessary to remove images from the cache, this can be done following the order determined by the priority index.
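
As an illustration, the following minimal sketch (in Python, with hypothetical variable names, not the actual proxy code) computes the priority indices and determines the order in which images would be removed from the cache:

# Hypothetical sketch of the hard-access priority rule of [9].
# P[i] is the access probability of image i, B_S[i] the estimated bandwidth
# to its server, and B_C the (much larger) cache-to-client bandwidth.

def eviction_order(P, B_S, B_C):
    """Return image ids sorted from lowest to highest priority index beta_i."""
    beta = {i: P[i] * (1.0 / B_S[i] - 1.0 / B_C) for i in P}
    # Images with the smallest beta_i are the first candidates for removal.
    return sorted(beta, key=beta.get)

# Example: equal-size images with different probabilities and server bandwidths.
P = {1: 0.3, 2: 0.3, 3: 0.4}
B_S = {1: 2.0, 2: 8.0, 3: 4.0}          # bytes per millisecond
print(eviction_order(P, B_S, B_C=50.0)) # lowest-priority image first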


Soft Access

Now, each image has a finite number of resolution levels. Resolution j of image i has $r_{ij}$ bits ($0 < r_{ij} < r_{i,j+1} \leq R_i$) and access probability $P_{ij}$, with $\sum_{i,j} P_{ij} = 1$. If the cache holds $r_{C_i}$ bits of image i, its total delay in this case is

\begin{displaymath}
\delta_i(r_{C_i}) = \sum_{j:\, r_{ij} \leq r_{C_i}} P_{ij}\, r_{ij} B_C^{-1}
+ \sum_{j:\, r_{ij} > r_{C_i}} P_{ij} \left( r_{C_i} B_C^{-1} + (r_{ij}-r_{C_i}) B_{S_i}^{-1} \right)
\end{displaymath} (2)

$\delta_i(r_{C_i})$ is piecewise linear and convex [7].
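
For concreteness, the expected delay (2) for a single image could be evaluated as in the following sketch (a minimal Python illustration with assumed variable names, consistent with the Squid units of bytes and milliseconds):

# Illustrative evaluation of the soft-access delay (2) for one image.
# r[j] is the cumulative size (bytes) of resolution level j, P[j] its access
# probability, r_C the number of cached bytes, B_S and B_C in bytes/ms.

def soft_delay(r, P, r_C, B_S, B_C):
    delay = 0.0
    for j in range(len(r)):
        if r[j] <= r_C:
            # Resolution j is served entirely from the cache.
            delay += P[j] * r[j] / B_C
        else:
            # The cached prefix comes at B_C, the remainder from the server.
            delay += P[j] * (r_C / B_C + (r[j] - r_C) / B_S)
    return delay

# Example: three resolution levels of 10 kB, 40 kB and 100 kB.
print(soft_delay(r=[10e3, 40e3, 100e3], P=[0.5, 0.3, 0.2],
                 r_C=40e3, B_S=4.5, B_C=50.0))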

Consider the situation where each image has already been assigned $r_i$ bits, in the range of resolution j, i.e., $r_{ij} \leq r_i < r_{i,j+1}$, and the cache has some vacancy. If we allocate $\epsilon$ bits to image i, the reduction in average delay is:


\begin{displaymath}
\delta_i(r_i) - \delta_i(r_i+\epsilon) = \Big( \sum_{l \geq j} P_{il} \Big) (B_{S_i}^{-1}-B_C^{-1})\, \epsilon
\end{displaymath} (3)

The first factor in (3) is the probability that these $\epsilon$ bits of image i will be accessed. Obviously, we should give these $\epsilon$ bits to the image with the largest $(\sum_{l \geq j} P_{il}) (B_{S_i}^{-1}-B_C^{-1})$.

The above analysis reveals a close tie between soft access and hard access: both are solved using priority indices. In fact, if we conceptually break image i into several subimages, let subimage $I_{ij}$ represent the difference between resolution levels j and j-1 of image i, and update the access probabilities accordingly, then the soft access problem is converted into a hard access problem.

In [9], it is shown that resolution consistency is maintained in cache allocation. In other words, subimage $I_{ij}$ is always loaded into the cache prior to $I_{i,j+1}$.
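
As a sketch of this conversion (with assumed data layout, not the code of [9]), each image is split into incremental layers whose probabilities account for all accesses at that resolution or above:

# Hypothetical conversion of a soft-access image into hard-access subimages.
# r[j] is the cumulative size of resolution j and P[j] its access probability.
# Subimage I_ij holds the bytes between resolutions j-1 and j; its probability
# is that of requesting resolution j or any higher one, as in (3).

def to_subimages(r, P):
    subimages = []
    prev_size = 0.0
    for j in range(len(r)):
        size = r[j] - prev_size          # incremental bytes of level j
        prob = sum(P[j:])                # chance this layer is ever needed
        subimages.append((size, prob))
        prev_size = r[j]
    return subimages

print(to_subimages(r=[10e3, 40e3, 100e3], P=[0.5, 0.3, 0.2]))
# e.g. [(10000.0, 1.0), (30000.0, 0.5), (60000.0, 0.2)]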


Design and Implementation

Under the above conversion rule, every soft access problem is in fact a hard access problem with a larger set of images. In this section we explain how the access mechanism in our testbed can be formalized and a specific priority index assigned to each image.

Access type

Even though the soft access scenario is more appealing, all modern browsers implement only hard access. In order to have an access mechanism closer to soft access, the semantics of hard access need to be relaxed.

In our experiments we use the following access mechanism: the client requests an image within a page and receives the version available in the cache (which could be at a lower resolution), thus requiring time $B_C^{-1}\cdot r_{C_i}$. If the user is not satisfied with the image resolution, a full reload is requested so that the image is downloaded in its entirety from the server. The time required when such a ``miss'' occurs is therefore $B_{S_i}^{-1}\cdot R_i$, where $R_i$ is the size of the i-th image.
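
A minimal sketch of this access model (hypothetical helper, not the actual testbed code) is:

# Illustrative model of the testbed access mechanism under relaxed hard access.
def access_time(r_cached, R_full, B_S, B_C, user_reloads):
    t = r_cached / B_C        # serve whatever version is in the cache
    if user_reloads:
        t += R_full / B_S     # full download from the server on a "miss"
    return t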

The optimized algorithm requires the access probabilities for each of the subimages. Under hard access these are not available and they will need to be approximated. Our approximation technique will be detailed later on.


Simple LRU-based algorithm

Cache replacement strategies have been extensively studied in the context of conventional (i.e., hard) web caching [4], [8], with LRU [8] being one of the most popular approaches. In an LRU environment the basic idea is to remove the image that was accessed least recently, regardless of its size, server bandwidth or other considerations.

While LRU is an attractive strategy given its simplicity, it is not always easy to implement in a soft cache. In a soft cache, the image with the lowest priority is recoded instead of being removed. Now, consider the image with the oldest access time. This image will then be recoded, but if the same algorithm (i.e., finding the LRU image) is used again, it is likely that this image will remain the LRU image. Thus, if the recoding level is not considered as part of the removal decision, the result is that images are recoded repeatedly until the lowest resolution is reached and they are removed.

Therefore the access time of a recoded image needs to be adjusted as a function of its recoding level. For this we use the following formula for the new access time:

\begin{displaymath}
new = old + {remaining\_recoding\_levels \over max\_recoding\_level} \cdot (current\_time - old)
\end{displaymath} (4)

The new access time for an image that has been recoded only a few times ( $remaining\_recoding\_levels$ is large) is close to the current time, which increases its priority in the LRU algorithm. A heavily recoded image is probably not very useful and its new access time is set closer to its old access time. This does not increase its priority very much and therefore it will be removed (or recoded again) sooner.
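
A minimal sketch of this adjustment (illustrative names; the actual testbed is a patched Squid proxy, and the maximum of 9 recoding levels is the value used in our system) is:

# Sketch of the LRU access-time adjustment (4); names are illustrative.
MAX_RECODING_LEVEL = 9       # maximum number of recodings in our system

def adjusted_access_time(old, current_time, recodings_done):
    remaining = MAX_RECODING_LEVEL - recodings_done
    # A lightly recoded image gets a new access time close to "now" (high
    # LRU priority); a heavily recoded one stays close to its old access time.
    return old + (remaining / MAX_RECODING_LEVEL) * (current_time - old)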


Optimized sorting based algorithm

Let us now consider a practical implementation based on the optimal replacement criterion. The priority indices are defined as


 \begin{displaymath}\beta_{ij}= P_{ij} (B_{S_{i}}^{-1}-B_{C}^{-1})
\end{displaymath} (5)

Because exact values for the three parameters $P_{ij}$, $B_{S_i}$, and $B_C$ are not available, they will be estimated in the following ways.

All the clients are assumed to be connected over similar links. Therefore the client bandwidth, $B_C$, is constant and does not need to be estimated.

The bandwidth to server i, $B_{S_i}$, is estimated from the size of image i and the time it took to retrieve it from the server, averaged over subsequent retrievals of image i. This is an imperfect estimator, since using all images from server i would yield a better estimate, albeit still not necessarily a correct one. This, however, would require a dedicated data structure for each server and would be far costlier in terms of memory. The Squid proxy used as the basis for our testbed [6] expresses the image size in bytes and the duration of the request in milliseconds, so the natural unit of bandwidth is bytes per millisecond.
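
A per-image running estimate could be kept as in the following sketch (hypothetical structure; Squid itself only reports sizes in bytes and request durations in milliseconds):

# Hypothetical per-image bandwidth estimator, in bytes per millisecond.
class BandwidthEstimate:
    def __init__(self):
        self.total_bytes = 0
        self.total_ms = 0

    def update(self, size_bytes, duration_ms):
        # Accumulate over subsequent retrievals of this particular image.
        self.total_bytes += size_bytes
        self.total_ms += duration_ms

    def value(self):
        # Estimated bandwidth to the server of this image (None if never seen).
        return self.total_bytes / self.total_ms if self.total_ms else None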

The main problem is estimating the values of $P_{ij}$. These can be estimated from the number of times the object has been referenced, but because of hard access, accesses are only observed for the full image, i.e., the highest resolution level. Therefore all the lower resolutions share this same reference count, which would reduce the replacement algorithm to a hard replacement algorithm. Moreover, because the reference count is reset to 0 every time the object is released from the cache, the estimate is somewhat inaccurate. For a better estimate, it would be necessary to keep the reference counts of every object ever seen by the cache, but this is completely infeasible in practice.

When an image is sent to the recoder, we simulate soft accesses by increasing the reference count of the next highest level. The average bandwidths in the experiments were around 4.5 bytes per millisecond, and the soft accesses were simulated by adding 0.1 to the reference count. The effect of this is that the lower resolutions have a higher priority index and do not get recoded right away. It should be noted that for normal objects only a single priority index is calculated; such objects are not recoded but are simply removed from the cache.

Using this kind of simulated soft access has the following effects on the reference count of the image. Assume that image I has reference count N, i.e. it has been accessed N times.

Then, the first time image I is recoded, its reference count is set to N + 0.1. After the second recoding, the reference count is again increased by 0.1, giving a new reference count of N + 0.2. In our system, the maximum number of times an object can be recoded is 9. Therefore the maximum adjustment due to recoding is 0.9.

This maximum adjustment is slightly less than the increase of 1 in the reference count generated by a normal access. Therefore, real accesses get priority over the simulated accesses generated in the recoding process.

When a recoded image with an adjusted reference count gets a real access, this is counted as a full access, and the adjustments are kept as they were. In other words, if an image having a reference count of N + 0.5 (N accesses and 5 recodings) gets accessed for real, its new reference count will be N + 1 + 0.5, or N + 1.5.
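
The bookkeeping described above could be sketched as follows (hypothetical fields; the actual implementation is part of the Squid patch [5]):

# Sketch of the reference-count bookkeeping with simulated soft accesses.
RECODE_CREDIT = 0.1     # added per recoding; at most 9 recodings, i.e. 0.9 total

class CachedImage:
    def __init__(self, server_bw, cache_bw):
        self.refcount = 0.0           # stands in for the access probability
        self.recodings = 0
        self.server_bw = server_bw    # estimated B_Si, bytes/ms
        self.cache_bw = cache_bw      # assumed constant B_C, bytes/ms

    def on_access(self):
        # A real access always counts as a full reference.
        self.refcount += 1.0

    def on_recode(self):
        # Simulated soft access: credit the remaining lower resolution.
        self.recodings += 1
        self.refcount += RECODE_CREDIT

    def priority_index(self):
        # beta_ij = P_ij (B_Si^-1 - B_C^-1), with refcount replacing P_ij.
        return self.refcount * (1.0 / self.server_bw - 1.0 / self.cache_bw)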


Simulations

We conducted simulations using a trace from the Squid proxy at our lab. This trace contained almost 150000 URLs and 1.4 gigabytes of data. Recoding was performed only on JPEG images, which represented 16 % of the bytes. The different replacement algorithms were run with several cache sizes.

A cache with an infinite size would get a 48 % hit-rate and a 21 % byte hit-rate. Hit-rate is defined as the ratio of requests satisfied with cached objects to the total number of requests. Byte hit-rate is the ratio of the bytes in the requests satisfied from the cache to the total number of bytes transmitted.
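
Both metrics can be computed directly from a request log, e.g. (illustrative sketch with assumed log format):

# Illustrative computation of hit-rate and byte hit-rate from a request log.
# Each request is a (was_hit, size_in_bytes) pair; the format is hypothetical.
def hit_rates(requests):
    hits = sum(1 for hit, _ in requests if hit)
    hit_bytes = sum(size for hit, size in requests if hit)
    total_bytes = sum(size for _, size in requests)
    return hits / len(requests), hit_bytes / total_bytes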

The cache sizes in the plots below are expressed as percentages of the size of the trace, i.e. 1.4 gigabytes. Given the maximum possible byte hit-rate of 21 %, a cache of about 300 megabytes could ideally hold all of the cacheable traffic. Nevertheless, cache sizes of up to 400 megabytes were used in the simulations. This was done because practical replacement algorithms sometimes make incorrect decisions and therefore do not achieve optimal performance for a given cache size.

Figure 1 shows the average download time of an object under four different replacement policies: LRU is standard LRU; LRU-SOFT is LRU with the access time modification (4); OPT-HARD is the optimized replacement policy without simulated soft access, i.e., working as a hard replacement policy; and OPT-SOFT is the same policy with simulated soft access, i.e., recoding images and increasing their reference counts.


  
Figure 1: Relative gain in download time

The results show that even with the sub-optimal estimators, the optimal replacement policy has the shortest average download time. The simple modification to LRU also yields a shorter download time than stock LRU. As cache size increases, any cache can hold all of the cacheable traffic, and the gains disappear.

However, the differences between the replacement policies in Figure 1 are quite small. This is due to the way the average download time was measured. When an object is referenced for the first time, it must always be fetched from the origin server, regardless of the replacement policy in the cache. Because the measured average download time includes the download times of these first references as well, little room is left for the replacement algorithm to make a difference. In our trace, almost 80 % of the bytes were transmitted only once, thereby reducing the potential gains in average download time.

Two other popular cache performance metrics, hit-rate and byte hit-rate, are not affected by this phenomenon. Figure 2 shows the hit-rates attained by the different replacement policies for different cache sizes, relative to the maximum possible hit-rate of 48 %. The results for byte hit-rate are plotted in Figure 3.

  
Figure 2: Hit-rate


  
Figure 3: Byte Hit-rate

Although the optimized replacement policy has been designed to minimize average download time, it can be seen from Figures 2 and 3 that it also performs quite well overall. In fact, it almost always has the best performance of the four replacement policies studied.

It can also be seen that recoding helps to counter the suboptimal estimators used in the optimized replacement policy. This is most evident in Figure 3 where the optimized policy with approximated soft accesses outperforms its hard variant by 100 % in the smallest cache.

However, the low performance of the hard variant of the optimized policy is explained in part by the small cache size. Because of this, the access probability estimates for the objects are relatively inaccurate, and therefore the replacement criterion used to select the objects to be removed is off the mark.

Because recoding an object also prolongs its lifetime in the cache, we have a better chance of observing more accesses to the object, which increases the accuracy of the probability estimate.

Recoding also makes objects smaller and therefore a soft cache can hold more objects than the equivalent hard cache could. Figure 4 shows the number of objects in all the four caches over the whole experiment.

  
Figure 4: Cache Contents over Time

From Figure 4 we see that the optimized policies hold more objects than the LRU variants. This is due to the different replacement criteria and to the estimators used for the parameters of the optimized policy.

Because the bandwidth estimate also includes the TCP connection setup latency, small objects have relatively longer download times than large objects. Because bandwidths are estimated per object, the smaller objects appear to come from servers with poor connections. Therefore the replacement criterion assigns them a higher priority even though this might not be fully justified.
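
As a hypothetical illustration (numbers chosen for this example, not taken from the trace): with 100 ms of connection setup and a true transfer rate of 10 bytes per millisecond, a 1000-byte object is measured at $1000/(100+100) = 5$ bytes/ms, whereas a 100000-byte object is measured at $100000/(100+10000) \approx 9.9$ bytes/ms, so the small object appears to come from a connection only half as fast.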

The optimized replacement policy with approximated soft accesses recodes objects that are not often accessed and/or come from servers with a good connection to the proxy. Because of the above mentioned inaccuracies in the bandwidth estimation, this could also include large objects.

Recoding a large object possibly frees up quite a lot of space in the cache, space which can be used to store many smaller objects. This is the reason for the large difference in the number of objects between the soft optimized policy and the others.

More results are available on our web site [5] and more will be made available as they are generated. This web site also contains a patch for the Squid proxy to make it operate as a soft cache.


Bibliography

[1]
J. Kangasharju, Y. Kwon, and A. Ortega, "Design and implementation of a soft caching proxy," in Proc. 3rd WWW Caching Workshop, Manchester, UK, June 1998.
[2]
A. Ortega, F. Carignano, S. Ayer, and M. Vetterli, "Soft caching: Web cache management for images," in IEEE Signal Processing Society Workshop on Multimedia Signal Processing, Princeton, NJ, June 1997.
[3]
A. Ortega, Z. Zhang, and M. Vetterli, "A framework for the optimization of a multiresolution remote image retrieval system," in Proc. of Infocom, Toronto, June 1994.
[4]
J. Shim, P. Scheuermann, and R. Vingralek, "A case for delay-conscious caching of web documents," in Proc. Intl. WWW Conf., Santa Clara, CA, Apr. 1997.
[5]
Soft Caching Project Page. http://sipi.usc.edu/~ortega/SoftCaching/.
[6]
Squid Home Page. http://squid.nlanr.net/.
[7]
C. Weidmann, M. Vetterli, A. Ortega, and F. Carignano, "Soft caching: Image caching in a rate-distortion framework," in Proc. of ICIP, Santa Barbara, CA, Oct. 1997.
[8]
R. P. Wooster and M. Abrams, "Proxy caching that estimates page load delays," in Proc. Intl. WWW Conf., Santa Clara, CA, Apr. 1997.
[9]
X. Yang and K. Ramchandran, "An optimal and efficient soft caching algorithm for network image retrieval," in Proc. of ICIP, Chicago, IL, Oct. 1998.