Saturday, February 14, 2009

Intro to Caching, Caching algorithms and caching frameworks part 3

Introduction:

In part 1 we talked about the basics and terminology of caching and surveyed replacement policies, and in part 2 we implemented some of these famous replacement policies. In this part we continue with the implementation of two famous algorithms: LFU and LRU. As before, the implementation in this article is for the sake of demonstration (we concentrate only on the replacement algorithm and skip other concerns such as loading data), so to use it in practice you will have to do some extra work, but you can base your own implementation on it.

Meet LFU Cache Implementation:

public synchronized Object getElement(Object key) {

    Object obj = table.get(key);

    if (obj != null) {
        CacheElement element = (CacheElement) obj;
        element.setHitCount(element.getHitCount() + 1);
        return element.getObjectValue();
    }
    return null;
}

public final synchronized void addElement(Object key, Object value) {

    int index;
    Object obj = table.get(key);

    if (obj != null) {
        // Just replace the value.
        CacheElement element = (CacheElement) obj;
        element.setObjectValue(value);
        element.setObjectKey(key);
        return;
    }

    if (!isFull()) {
        index = numEntries;
        ++numEntries;
    } else {
        CacheElement element = removeLfuElement();
        index = element.getIndex();
        table.remove(element.getObjectKey());
    }

    cache[index].setObjectValue(value);
    cache[index].setObjectKey(key);
    cache[index].setIndex(index);
    table.put(key, cache[index]);
}

public CacheElement removeLfuElement() {

    CacheElement[] elements = getElementsFromTable();
    CacheElement leastElement = leastHit(elements);
    return leastElement;
}

public static CacheElement leastHit(CacheElement[] elements) {

    CacheElement lowestElement = null;
    for (int i = 0; i < elements.length; i++) {
        CacheElement element = elements[i];
        if (lowestElement == null) {
            lowestElement = element;
        } else {
            if (element.getHitCount() < lowestElement.getHitCount()) {
                lowestElement = element;
            }
        }
    }
    return lowestElement;
}
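The methods above depend on helpers that are not shown here (table, cache, isFull, getElementsFromTable). As a rough, self-contained sketch of the same idea, here is a minimal LFU cache built on a plain HashMap; the class and method names are illustrative assumptions, not part of the original code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal LFU sketch: evicts the entry with the lowest hit count when full.
class LfuSketch {
    private static class Entry {
        Object value;
        int hitCount;
        Entry(Object value) { this.value = value; }
    }

    private final Map<Object, Entry> table = new HashMap<>();
    private final int capacity;

    LfuSketch(int capacity) { this.capacity = capacity; }

    public synchronized Object getElement(Object key) {
        Entry e = table.get(key);
        if (e == null) return null;
        e.hitCount++;              // every cache hit bumps the frequency
        return e.value;
    }

    public synchronized void addElement(Object key, Object value) {
        Entry e = table.get(key);
        if (e != null) {           // key already cached: just replace the value
            e.value = value;
            return;
        }
        if (table.size() >= capacity) {
            // O(n) scan for the least frequently used key, as in the article
            Object lfuKey = null;
            int lowest = Integer.MAX_VALUE;
            for (Map.Entry<Object, Entry> me : table.entrySet()) {
                if (me.getValue().hitCount < lowest) {
                    lowest = me.getValue().hitCount;
                    lfuKey = me.getKey();
                }
            }
            table.remove(lfuKey);
        }
        table.put(key, new Entry(value));
    }

    public synchronized boolean contains(Object key) {
        return table.containsKey(key);
    }
}
```

As in the article's version, eviction is an O(n) scan over all entries, which is fine for a demonstration but worth replacing with a frequency-ordered structure in a production cache.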
Analyzing LFU Cache Code (Talk Show):


Presenter: It is getting hotter and hotter now; our next contestant is the LFU cache. Please make some noise for it!

The audience began to scream for LFU, which made LFU hesitate.

Hello, I am LFU. When the cache client wants to add a new element and the cache is full (no room for the new entry), I have to kick out the least frequently used entry. The removeLfuElement method gives me the least frequently used element; once I get it, I remove that entry and place the new entry in its slot:


else {
    CacheElement element = removeLfuElement();
    index = element.getIndex();
    table.remove(element.getObjectKey());
}


If we dive into this method…, I am saying, if we dive into this method (still nothing happened).
LFU tried pressing the next button on the presentation remote control (to get the next slide) but it didn't work.

Ahh, now we are talking. OK, if we dive into this method we will see that it simply fetches all the elements in the cache by calling the getElementsFromTable method and then returns the element with the fewest hits.


public CacheElement removeLfuElement() {

    CacheElement[] elements = getElementsFromTable();
    CacheElement leastElement = leastHit(elements);
    return leastElement;
}


This is done by calling the leastHit method, which loops over the cache elements and checks whether the current element has the fewest hits; if so, it becomes the lowestElement, which is the entry I will replace with the new one.

public static CacheElement leastHit(CacheElement[] elements) {

    CacheElement lowestElement = null;
    for (int i = 0; i < elements.length; i++) {
        CacheElement element = elements[i];
        if (lowestElement == null) {
            lowestElement = element;
        } else {
            if (element.getHitCount() < lowestElement.getHitCount()) {
                lowestElement = element;
            }
        }
    }
    return lowestElement;
}

LFU stopped talking and waited for a reaction from the audience, but the only reaction it got was the scratching of heads (the audience didn't get some of the details).

One of the production team whispered to the LFU cache: you didn't mention how the lowest element is distinguished from the others.

Then the LFU cache started talking again: by default, when you add an element to the cache its hitCount is the same as any other newly added element's, so how do we handle the hit count?

Every time I encounter a cache hit I increment the hit count of the entry and then return the entry the cache client asked for, which looks something like this:


public synchronized Object getElement(Object key) {

    Object obj = table.get(key);

    if (obj != null) {
        CacheElement element = (CacheElement) obj;
        element.setHitCount(element.getHitCount() + 1);
        return element.getObjectValue();
    }
    return null;
}


Magnifying the Code:

Did anyone say magnification?

[Image: magnified view of the LFU cache code]

Meet LRU Cache Implementation:

private void moveToFront(int index) {
    int nextIndex, prevIndex;

    if (head != index) {
        nextIndex = next[index];
        prevIndex = prev[index];

        // Only the head has a prev entry that is an invalid index so
        // we don't check.
        next[prevIndex] = nextIndex;

        // Make sure index is valid. If it isn't, we're at the tail
        // and don't set prev[next].
        if (nextIndex >= 0)
            prev[nextIndex] = prevIndex;
        else
            tail = prevIndex;

        prev[index] = -1;
        next[index] = head;
        prev[head] = index;
        head = index;
    }
}

public final synchronized void addElement(Object key, Object value) {
    int index;
    Object obj = table.get(key);

    if (obj != null) {
        // Just replace the value, but move it to the front.
        CacheElement entry = (CacheElement) obj;
        entry.setObjectValue(value);
        entry.setObjectKey(key);

        moveToFront(entry.getIndex());

        return;
    }

    // If we haven't filled the cache yet, place in next available spot
    // and move to front.
    if (!isFull()) {
        if (numEntries > 0) {
            prev[numEntries] = tail;
            next[numEntries] = -1;
            moveToFront(numEntries);
        }
        ++numEntries;
    } else {
        // We replace the tail of the list.
        table.remove(cache[tail].getObjectKey());
        moveToFront(tail);
    }

    cache[head].setObjectValue(value);
    cache[head].setObjectKey(key);
    table.put(key, cache[head]);
}
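As with the LFU code, the fields used here (table, cache, prev, next, head, tail, isFull) are declared elsewhere in the class. A minimal, self-contained sketch of the same array-backed LRU design might look like the following; the class name and the parallel keys/values arrays are my assumptions for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch of the array-backed LRU: head/tail plus prev[]/next[]
// arrays form a doubly linked list over the fixed cache slots.
class LruArraySketch {
    private final Object[] keys;
    private final Object[] values;
    private final int[] prev;
    private final int[] next;
    private final Map<Object, Integer> table = new HashMap<>();
    private int head = 0, tail = 0, numEntries = 0;

    LruArraySketch(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
        prev = new int[capacity];
        next = new int[capacity];
    }

    private boolean isFull() { return numEntries == keys.length; }

    private void moveToFront(int index) {
        if (head == index) return;
        int nextIndex = next[index], prevIndex = prev[index];
        next[prevIndex] = nextIndex;        // unlink from the middle/tail
        if (nextIndex >= 0) prev[nextIndex] = prevIndex;
        else tail = prevIndex;              // we unlinked the tail
        prev[index] = -1;                   // relink at the head
        next[index] = head;
        prev[head] = index;
        head = index;
    }

    public synchronized void addElement(Object key, Object value) {
        Integer idx = table.get(key);
        if (idx != null) {                  // replace value, mark most recent
            values[idx] = value;
            moveToFront(idx);
            return;
        }
        int index;
        if (!isFull()) {
            index = numEntries;
            if (numEntries > 0) {           // link the new slot past the tail,
                prev[index] = tail;         // then promote it to the head
                next[index] = -1;
                moveToFront(index);
            }
            ++numEntries;
        } else {                            // evict the least recently used
            index = tail;
            table.remove(keys[tail]);
            moveToFront(tail);
        }
        keys[index] = key;
        values[index] = value;
        table.put(key, index);
    }

    public synchronized Object getElement(Object key) {
        Integer idx = table.get(key);
        if (idx == null) return null;
        moveToFront(idx);                   // a hit makes the entry most recent
        return values[idx];
    }
}
```

The prev/next arrays play the role of a doubly linked list over the fixed cache slots, so a hit only rewires a few indices instead of shifting array elements around.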

Analyzing LRU Cache Code (Talk show):

After LFU finished talking there was not much screaming; the audience didn't like the presentation, and LFU was hesitant while talking. This gave a big push to LRU, which started by saying:

This time I will also consider the case when the cache is not full. I am a little more complex than the other algorithms: when the cache isn't full and this is the first entry, I just increment numEntries, which represents the number of entries in the cache.

When adding a second entry I need to move it to the front by calling the moveToFront method (we will talk about it soon); I don't do this for the first entry because it is, for sure, the first element.

So let’s see some action.

As you can see, I state that the prev of the new entry will hold the tail value and its next will be -1 (undefined, in other words); these are just initial values.

After adding the new entry (which isn't the first entry) I move it to the front.

if (!isFull()) {
    if (numEntries > 0) {
        prev[numEntries] = tail;
        next[numEntries] = -1;
        moveToFront(numEntries);
    }
    ++numEntries;
}

The moveToFront method moves an entry to the head of the list so that the least recently used elements end up at the tail; note that the array slots themselves never move, only the prev/next links change.

Before doing any move I check that the head is not the current index (this check fails when we have only one entry). If the move is needed, I copy the next pointer of the current entry (a pointer to the next entry, as in a linked list) into nextIndex, and its prev pointer (a pointer to the previous entry) into prevIndex.

int nextIndex, prevIndex;

if (head != index) {
    nextIndex = next[index];
    prevIndex = prev[index];

Then I assign nextIndex to the next pointer of the previous entry, unlinking the current entry from its old position.

// Only the head has a prev entry that is an invalid index so
// we don't check.
next[prevIndex] = nextIndex;

After that I check nextIndex: if it is greater than or equal to 0, the prev pointer of the next entry gets the value of prevIndex; otherwise the current entry was the tail, so the tail becomes prevIndex.

// Make sure index is valid. If it isn't, we're at the tail
// and don't set prev[next].
if (nextIndex >= 0)
    prev[nextIndex] = prevIndex;
else
    tail = prevIndex;

And because I am moving this entry to the front, there won't be any previous entry for it, so I assign -1 to its prev pointer; its next pointer gets the old head, the prev of the old head gets the index of the current entry, and finally head is assigned the current index.

prev[index] = -1;
next[index] = head;
prev[head] = index;
head = index;


Magnifying the Code:

It is magnifying time! Get your magnifying glass; we are going to see some interesting stuff here.

[Image: magnified view of the LRU cache code]
It is Confession Time! :

LRU didn't mention that it is possible to implement the LRU algorithm in a simpler way. Our previous implementation is based on arrays; the other implementation, which the LRU cache didn't mention, uses LinkedHashMap, which was introduced in JDK 1.4.

public class LRUCache2 extends LinkedHashMap {

    private static final int MAX_ENTRIES = 3;

    public LRUCache2() {
        super(MAX_ENTRIES + 1, .75F, true);
    }

    // This method is invoked by put and putAll after inserting a new entry
    // into the map. It allows the map to have up to 3 entries and then
    // deletes the oldest entry each time a new entry is added.
    protected boolean removeEldestEntry(Map.Entry eldest) {
        return this.size() > MAX_ENTRIES;
    }
}

For sure, the LinkedHashMap solution is less time-consuming than the array solution, and it is more efficient because you leave the handling of deletion and so on to the JDK itself, so you don't have to implement such things yourself.
OSCache uses such an implementation in its LRU caching implementation.
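A quick usage sketch of the LinkedHashMap approach; this generic variant (the class name is mine) relies on the three-argument constructor, where accessOrder = true makes iteration order follow access recency:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Same idea as LRUCache2 above, written with generics for clarity.
class LruMapSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruMapSketch(int maxEntries) {
        // accessOrder = true: iteration order is least- to most-recently accessed
        super(maxEntries + 1, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // evict the LRU entry once we exceed the cap
    }
}
```

After putting a, b and c, touching a, and then putting d, the eldest entry is b, so b is the one removeEldestEntry causes to be dropped.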

Conclusion:

We have seen how to implement the LFU and LRU algorithms, and two ways to implement LRU. It is up to you to choose which way to use, arrays or LinkedHashMap; personally, I would recommend arrays for a small number of entries and LinkedHashMap for a large number of entries.

In the next part we will talk about caching frameworks, compare them, and see which caching algorithm is employed by which framework. Stay tuned till then ;)

14 comments:

Xuân Tính said...

I really enjoy your blog series about caching, keep it up!

Artur said...

What I think would be useful is to show implementation of LFU cache which doesn't have O(n) performance on adds.

Ahmed Ali said...

Xuân,

Thanks for your nice comment, hope you will enjoy the coming parts

Thanks

Ahmed Ali said...

Artur,

Thanks for your comment. Actually, I am thinking of making a whole part to discuss the performance of common algorithms, but after I finish the part related to the frameworks :)

Thanks

mindas said...

Hi,

Thanks for the interesting series. Just to add to the comment about O(n) complexity, I think it would also make sense to emphasize scalability.

This especially applies to synchronized keyword applied on methods, and ways to avoid them safely.

A good criterion would be how each of these algorithms can be implemented using optimistic/non-blocking behaviour when adding/removing/querying items in the cache.

Ahmed Ali said...

Mindas,

Thanks for your comment, I really appreciate it. As for the synchronized keyword, I could have just removed it and used the synchronized collections that ship with Java; this would be better than using the synchronized keyword.

I will try to include your recommendations in coming articles and implement it in a non-blocking way :)

Thanks

Xander said...

Great Read!!! Quality!

Ahmed Ali said...

Xander,

Thanks for your comment, glad you liked it and hope you will like the next part.

Thanks,

Vincent Van said...

good job~
hope you can write more, I've subscribed to your RSS~~~

Ahmed Ali said...

Vincent,

Thanks for your comment, I am glad you liked it and hope you will like the next part ;) stay tuned

Thanks,

Anonymous said...

Thanks for the articles.
I have a question about the use of these cache frameworks: most of the time I think we use them to accelerate data access, and the disk cache is just used to limit the cache memory footprint ... Right?

I'm looking for a way to use them as an "object memory manager" in a low-memory context, I mean like the operating system swapping pages out of physical memory... For instance, in an application we cannot have more than 256M for the JVM and we have to manage more than 1.5 GB of data, but not all at the same time. I think a cache framework is a good tool to manage that, by re-using eviction algorithms and memory-to-disk (and back) object migration ...
Any ideas ? Any useful URLs ?
Thanks for your help.

Gilles

Ahmed Ali said...

Gilles,

Thanks for your comment. As for your first question about the caching frameworks: yes, we use disk caching when we want to limit the maximum in-memory size. When the cached objects in memory reach the maximum size, they are evicted to disk based on the replacement algorithm we are using. This is a good thing because we still have access to the cached object: if the cache doesn't find the object in memory it consults the disk, and if it finds the object there it loads it into memory again, which is faster than loading it from the data source again.
About your second question, I don't think I got it, so could you please ask it another way? :)

Thanks,

Anonymous said...

About your second question I don’t think I got it

yes, sorry ... in fact, I think I'm looking for a way to bound the cache not by the number of objects in it but by the memory size of the cache. For example, in my (confusing) example I could set the maximum cache size to 80% of 256M; in theory, if my application uses only a small part of the 1.5 GB of data, the cache is not used to accelerate the app but to manage which objects are in memory.

if the cache didn’t find the object in the memory it will consult the disk and if it found the object ,it will load it into memory again which will be faster than loading it from the data source again

yes, that resolves exactly my first problem ... and, as explained just before, my second problem is to limit the cache size based on memory-use criteria.


gilles

Ahmed Ali said...

Gilles,

Thanks for clarifying the question. As I understand it, I haven't come across any cache framework that does this. In my view, it wouldn't benefit the performance of the cache: I would have to check the object size every time before adding it to memory, and what if the object size is larger than the empty slot in memory? Say there are only 500 KB free while the object size is 1 MB (it has a lot of relations): in that case, will you invalidate (evict) entries, or not cache the item at all? If you have to evict some objects, you will need to loop over them to check their sizes, which would be inefficient. And what if there is still no room in memory? Which object will you evict: the largest one, or the least used entry? And will the largest one's size be enough for the new entry? There are many combinations that will affect the performance of the cache. Plus, there is no way in Java to know the size of an object in memory; you would have to implement your own (there are some samples online).
But as I said, this will affect the performance. Hope I answered your question :)
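To make the discussion concrete, here is a rough sketch of a cache bounded by an approximate byte budget rather than an entry count; everything in it (the class name, and caller-supplied byte[] values standing in for "known sizes") is an illustrative assumption, since plain Java offers no portable way to measure an object's footprint:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch: an LRU cache bounded by an approximate byte budget instead of
// an entry count. Sizes are supplied by the caller as byte[] lengths, since
// plain Java has no portable way to measure an object's memory footprint.
class SizeBoundedCache<K> {
    // accessOrder = true, so iteration visits least recently used entries first
    private final LinkedHashMap<K, byte[]> map = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes;
    private long usedBytes = 0;

    SizeBoundedCache(long maxBytes) { this.maxBytes = maxBytes; }

    public synchronized void put(K key, byte[] value) {
        byte[] old = map.remove(key);
        if (old != null) usedBytes -= old.length;
        map.put(key, value);
        usedBytes += value.length;
        // Evict least recently used entries until we fit the budget again.
        // (If the new value alone exceeds the budget, it too gets evicted.)
        Iterator<Map.Entry<K, byte[]>> it = map.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    public synchronized byte[] get(K key) { return map.get(key); }

    public synchronized long usedBytes() { return usedBytes; }
}
```

Note that sizing is pushed onto the caller; as discussed above, computing real object sizes in Java is itself costly and approximate, which is one reason frameworks prefer entry-count limits.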

Thanks,