Saturday, February 14, 2009

Intro to Caching,Caching algorithms and caching frameworks part 3

Introduction:

In part 1 we talked about the basics and terminologies of cache and we have also shown replacement policies , in part 2 we implemented some of these famous replacement polices and now in this part we will continue talking about the implementation of two famous algorithms which are LFU and LRU. Again, the implementation in this article is for sake of demonstration and in order to use it (we just concentrate over the replacement algorithm and we will skip other things like loading data and so on), you will have to do some extra work but you can base your implementation over it.

Meet LFU Cache Implementation:

public synchronized Object getElement(Object key) {

Object obj;

obj = table.get(key);

if (obj != null) {
CacheElement element = (CacheElement) obj;
element.setHitCount(element.getHitCount() + 1);
return element.getObjectValue();
}
return null;

}

public final synchronized void addElement(Object key, Object value) {

Object obj;

obj = table.get(key);

if (obj != null) {
CacheElement element;

// Just replace the value.
element = (CacheElement) obj;
element.setObjectValue(value);
element.setObjectKey(key);

return;
}

if (!isFull()) {

index = numEntries;
++numEntries;
} else {
CacheElement element = removeLfuElement();
index = element.getIndex();
table.remove(element.getObjectKey());

}

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
cache[index].setIndex(index);
table.put(key, cache[index]);
}

public CacheElement removeLfuElement() {

CacheElement[] elements = getElementsFromTable();
CacheElement leastElement = leastHit(elements);
return leastElement;
}

public static CacheElement leastHit(CacheElement[] elements) {

CacheElement lowestElement = null;
for (int i = 0; i < elements.length; i++) {
CacheElement element = elements[i];
if (lowestElement == null) {
lowestElement = element;

} else {
if (element.getHitCount() < lowestElement.getHitCount()) {
lowestElement = element;
}
}
}
return lowestElement;
}
Analyzing LFU Cache Code (Talk Show):


Presenter: it is getting hotter and hotter now, our next contestant is LFU cache, please make some noise for it.

Audience began to scream for LFU which made LFU hesitated.

Hello, I am LFU, when the cache client want to add a new element and cache is full (no enough room for the new entry) I will have to kick out the least frequently used entry, by using the help of the removelfuElement method which will allow me to get the least frequently used element, after I get it, I will remove this entry and place the new entry


else {
CacheElement element = removeLfuElement();
index = element.getIndex();
table.remove(element.getObjectKey());

}


If we dived into this method…, I am saying if we dived into this method (still nothing happened)
LFU tried pressing the next button on the presentation remote control (to get the next presentation slide) but I didn’t work.

Ahh now we are talking, ok if we dived into this method we will see that the method is just getting the whole elements in cache by calling getElementsFromTable method and then returns the element with the least hit.


public CacheElement removeLfuElement() {

CacheElement[] elements = getElementsFromTable();
CacheElement leastElement = leastHit(elements);
return leastElement;
}
}


By calling leastHit method which loops over the cache elements and check if the current element has the least hit, if so, I will make it my lowestElement which I am going replace the new entry with.

public static CacheElement leastHit(CacheElement[] elements) {

CacheElement lowestElement = null;
for (int i = 0; i <>
CacheElement element = elements[i];
if (lowestElement == null)
{ lowestElement = element; }
else {

if (element.getHitCount() <>
{ lowestElement = element; }
}
}
return lowestElement;
}

LFU stopped talking and waited for any action from the audience and the only action it get was scratching heads (audience didn’t get some stuff).

One of the production team whispered to LFU cache and said: you didn’t mention how the lowest element will be distinguished from another element?

Then LFU cache started talking gain and said: By default when you add the element to the cache its hitCoint will be the same as the previous element so how do we handle the hit count thing?

Every time I encounter a cache hit I will increment the hit count of the entry and then return the entry the cache client asked for which would be something like that


public synchronized Object getElement(Object key) {

Object obj;

obj = table.get(key);

if (obj != null) {
CacheElement element = (CacheElement) obj;
element.setHitCount(element.getHitCount() + 1);
return element.getObjectValue();
}
return null;

}


Magnifying the Code:

Did anyone say magnification?




Meet LRU Cache Implementation:

private void moveToFront(int index) {
int nextIndex, prevIndex;

if(head != index) {
nextIndex = next[index];
prevIndex = prev[index];

// Only the head has a prev entry that is an invalid index so
// we don't check.
next[prevIndex] = nextIndex;

// Make sure index is valid. If it isn't, we're at the tail
// and don't set prev[next].
if(nextIndex >= 0)
prev[nextIndex] = prevIndex;
else
tail = prevIndex;

prev[index] = -1;
next[index] = head;
prev[head] = index;
head = index;
}
}

public final synchronized void addElement(Object key, Object value) {
int index;
Object obj;

obj = table.get(key);

if(obj != null) {
CacheElement entry;

// Just replace the value, but move it to the front.
entry = (CacheElement)obj;
entry.setObjectValue(value);
entry.setObjectKey(key);

moveToFront(entry.getIndex());

return;
}

// If we haven't filled the cache yet, place in next available spot
// and move to front.
if(!isFull()) {
if(_numEntries > 0) {
prev[_numEntries] = tail;
next[_numEntries] = -1;
moveToFront(numEntries);
}
++numEntries;
} else {
// We replace the tail of the list.
table.remove(cache[tail].getObjectKey());
moveToFront(tail);
}

cache[head].setObjectValue(value);
cache[head].setObjectKey(key);
table.put(key, cache[head]);
}

Analyzing LRU Cache Code (Talk show):

After LFU finished talking, there were not much screaming, they didn’t like the presentation and LFU was hesitating while talking, this gave a big push to LRU which started by saying:

This time I will consider the case also when the cache is not full, I am little more complex than those other algorithms, when the cache isn’t full and it is the first entry I will just increment the numEntries which represents the number of entries in the cache.

After adding a second entry I will need to move it to the front by calling moveToFront method (we will talk about it soon), I didn’t do this for the first entry because it is for sure the first element.

So let’s see some action.

As you can see I am stating that the previous of the current entry will have the tail value and the next entry will be -1 (undefined in other words) these are just initial data.

After adding the new entry (which isn’t the first entry) I will move it to front.

if(!isFull()) {
if(_numEntries > 0) {
prev[_numEntries] = tail;
next[_numEntries] = -1;
moveToFront(numEntries);
}
++numEntries;
}

The moveToFront method moves an entry to the head of the array so that the least recently used elements reside at the bottom of the array.

Before I do any move I check if the head is not equal to current index (this will be false in case we only have 1 entry) if yes, then assign the value of the next of the current entry (which is a pointer to next entry as in linked list) to nextIndex and the value of the previous of the current entry (which is a pointer to the previous entry as in linked list) to prevIndex

int nextIndex, prevIndex;

if(head != index) {
nextIndex = next[index];
prevIndex = prev[index];

Then I assign the value of the nextIndex to the value of next of the previous entry

// Only the head has a prev entry that is an invalid index so
// we don't check.
next[prevIndex] = nextIndex;

After that I am going to check for the nextIndex if it is greater that or equal 0 then the previous the next entry will have the value of prevIndex , else the tail will be equal to the prevIndex

// Make sure index is valid. If it isn't, we're at the tail
// and don't set prev[next].
if(nextIndex >= 0)
prev[nextIndex] = prevIndex;
else tail = prevIndex;

And because I moved this entry to the front so there won’t be any previous entry for it so am assigning -1 to it and the next entry of the current entry (top one) will be the head (previous old head) and the prev of head (the old head) will have the index of the current entry and then the new head is assigned the new index (current index)

prev[index] = -1;
next[index] = head;
prev[head] = index;
head = index;


Magnifying the Code:

It is magnifying time! Get your magnifying glass we are going to see some interesting stuff here



It is Confession Time! :

LRU didn’t mention that it is possible to implement the LRU algorithm in a simple way , our previous implementation is based on Arrays , the other implementation that LRU cache didn’t mention is through LinkedHashMap which was introduced in JDK 1.4

public class LRUCache2 extends LinkedHashMap
{
private
static final int MAX_ENTRIES = 3;


public LRUCache2()

{
super(MAX_ENTRIES+1, .75F,
true);
}

// This method is
invoked by put and putAll after inserting a new entry into

// the map. It allows the map to have up to 3 entries and then
// delete the oldest entry each time a new entry is
added. protected boolean removeEldestEntry(Map.Entry eldest)
{
return
this.size() > MAX_ENTRIES;
}

}

For sure, the LinkedHashMap solution is less time consuming that the array solution and it is more efficient coz you will leave the handling of the deletion and so on to the JDK itself, so you won’t bother yourself implementing such stuff.
OSCache use such implementation in its LRU caching implementation.

Conclusion:

We have seen how to implement LFU and LRU algorithms and the two ways to implement the LRU, it is based on you to choose which way to use, Arrays or LinkedHashMap for me I would recommend Arrays for small size entries and LinkedHashMap for big size entries.

In next part we will be talking about the Caching framework and a comparison between them and what caching algorithm is employed by which caching framework, stay tuned till then ;)


Read more!

Tuesday, January 13, 2009

Intro to Caching,Caching algorithms and caching frameworks part 2

Introduction:

In this part we are going to show how to implement some of the famous replacement algorithms as we mentioned in part 1, the code in this article is just for demonstration purpose which means you will have to do some extra effort if you want to make use of it in your application (if you are going to build your own implementation and wont use any caching frameworks)

The Leftover policy:

After programmer 1 read the article he proceeded to review the comments on this article, one of these comments were talking about leftover policy, which is named “Random Cache”

Random Cache:

I am random cache, I replace any cache entry I want, I just do that and no one can complain about that, you can say the unlucky entry, by doing this I remove any overhead of tracking references or so, am better than FIFO policy, in some cases I perform even better than LRU but in general LRU is better than me.

It is comment time:

While programmer 1 was reading the rest of the comments, he found very interesting comment about implementation of some of the famous replacement policies, actually it was a link to the commenter site which has the actual implementation so programmer 1 clicked the link and here what he got:

Meet the Cache Element:

public class CacheElement {

private Object objectValue;

private Object objectKey;

private int index;

private int
hitCount;
.
. // getters and setters
.
}

This is the cache entry which will use to hold the key and the value; this will be used in all the cache algorithms implementation

Common Code for All Caches:

public final synchronized void addElement(Object key,Object value) {

int index;
Object obj;

// get the entry from the table
obj = table.get(key);

// If we have the entry already in our table
then get it and replace only its value.
if (obj != null) {
CacheElement
element;

element = (CacheElement) obj;
element.setObjectValue(value);
element.setObjectKey(key);

return;
}
}

The above code will be common for all our implementation; it is about checking if the cacheElemnet already exists in our cache, if so then we just need to place its value and we don’t need to make anything else but what if we didn’t find it ? Then we will have to dig deeper and see what will happen below.

The Talk Show:

Today’s episode is a special episode , we have special guests , they are in fact compotators we are going to hear what everyone has to say but first lets introduce our guests:

Random Cache, FIFO Cache

Let’s start with the Random Cache.

Meet Random Cache implementation:

public final synchronized void addElement(Object key,Object value) {

int index;
Object obj;

obj = table.get(key);

if (obj
!= null) {
CacheElement element;

// Just replace the value.
element = (CacheElement) obj;
element.setObjectValue(value);
element.setObjectKey(key);

return;
}

// If we haven't
filled the cache yet, put it at the end.
if (!isFull()) {
index =
numEntries;
++numEntries;
} else {
// Otherwise, replace a random
entry.
index = (int) (cache.length * random.nextFloat());
table.remove(cache[index].getObjectKey());
}

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
table.put(key, cache[index]);
}

Analyzing Random Cache Code (Talk show):

In today’s show the Random Cache is going to explain the code line by line and here we go.
I will go straight to the main point; if I am not full then I will place the new entry that the client requested at the end of the cache (in case there is a cache miss).

I do this by getting the number of entries that resides in the cache and assign it to index (which will be the index of the current entry the client is adding) after that I increment the number of entries.

if (!isFull()) {
index = numEntries;
++numEntries;
}

If I don’t have enough room for the current entry, I will have to kick out a random entry (totally random, bribing isn’t allowed).

In order to get the random entry, I will use the random util. shipped with java to generate a random index and ask the cache to remove the entry that its index equal to the generated index.

else {
// Otherwise, replace a random entry.
index = (int) (cache.length * random.nextFloat());
table.remove(cache[index].getObjectKey());
}

At the end I just place the entry -either the cache was full or no- in the cache.

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
table.put(key, cache[index]);

Magnifying the Code:

It is said that when you look at stuff from a near view it is better to understand it, so that’s why we have a magnifying glass and we are going to magnify the code to get more near to it (and maybe understand it more).

Cache entries in the same voice: hi ho, hi ho, into cache we go.

New cache entry: excuse me; I have a question! (Asking a singing old cache entry near to him)

Old cache entry: go ahead.

New cache entry: I am new here and I don’t understand my role exactly, how will the algorithm handle us?

Old cache entry: cache! (Instead of man!), you remind me of myself when I was new (1st time I was added to the cache), I used to ask questions like that, let me show you what will happen.






Meet FIFO Cache Implementation:

public final synchronized void addElement(Object
key,Object value) {
int index;
Object obj;

obj = table.get(key);

if (obj != null) {
CacheElement element;

// Just replace the
value.
element = (CacheElement) obj;
element.setObjectValue(value);
element.setObjectKey(key);

return;
}

// If we haven't
filled the cache yet, put it at the end.
if (!isFull()) {
index =
numEntries;
++numEntries;
} else {
// Otherwise, replace the current
pointer, entry with the new one
index = current;
// in order to make
Circular FIFO
if (++current >= cache.length)
current = 0;

table.remove(cache[index].getObjectKey());
}

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
table.put(key, cache[index]);
}

Analyzing FIFO Cache Code (Talk show):

After Random Cache, audience went crazy for random cache, which made FIFO a little bit jealous so FIFO started talking and said:

When there is no more rooms for the new cache entry , I will have to kick out the entry at the front (the one came first) as I work in a circular queue like manner, by default the current position is at the beginning of the queue(points to the beginning of the queue).

I assign current value to index (index of the current entry) and then check to see if the incremented current greater than or equals to the cache length(coz I want to reset current –pointer- position to the beginning of the queue) ,if so then I will set current to zero again ,after that I just kick the entry at the index position (Which is the first entry in the queue now) and place the new entry.

else {
// Otherwise, replace the current pointer,
which takes care of
// FIFO in a circular fashion.
index = current;

if (++current >= cache.length)
current = 0;

table.remove(cache[index].getObjectKey());
}

cache[index].setObjectValue(value);
cache[index].setObjectKey(key);
table.put(key, cache[index]);

Magnifying the Code:

Back to our magnifying glass we can observe the following actions happening to our entries


Conclusion:

As we have seen in this article how to implement the FIFO replacement policy and also Random replacement policy, in the upcoming articles we will try to take our magnifying glass and magnify LFU, LRU replacement policy, till then stay tuned ;)


Read more!

Friday, January 2, 2009

Intro to Caching,Caching algorithms and caching frameworks part 1

Introduction:

A lot of us heard the word cache and when you ask them about caching they give you a perfect answer but they don’t know how it is built, or on which criteria I should favor this caching framework over that one and so on, in this article we are going to talk about Caching, Caching Algorithms and caching frameworks and which is better than the other.

The Interview:

"Caching is a temp location where I store data in (data that I need it frequently) as the original data is expensive to be fetched, so I can retrieve it faster. "

That what programmer 1 answered in the interview (one month ago he submitted his resume to a company who wanted a java programmer with a strong experience in caching and caching frameworks and extensive data manipulation)

Programmer 1 did make his own cache implementation using hashtable and that what he only knows about caching and his hashtable contains about 150 entry which he consider an extensive data(caching = hashtable, load the lookups in hashtable and everything will be fine nothing else) so lets see how will the interview goes.

Interviewer: Nice and based on what criteria do you choose your caching solution?

Programmer 1 :huh, (thinking for 5 minutes) , mmm based on, on , on the data (coughing…)

Interviewer: excuse me! Could you repeat what you just said again?

Programmer 1: data?!

Interviewer: oh I see, ok list some caching algorithms and tell me which is used for what

Programmer 1: (staring at the interviewer and making strange expressions with his face, expressions that no one knew that a human face can do :D )

Interviewer: ok, let me ask it in another way, how will a caching behave if it reached its capacity?

Programmer 1: capacity? Mmm (thinking… hashtable is not limited to capacity I can add what I want and it will extend its capacity) (that was in programmer 1 mind he didn’t say it)

The Interviewer thanked programmer 1 (the interview only lasted for 10minutes) after that a woman came and said: oh thanks for you time we will call you back have a nice day
This was the worst interview programmer 1 had (he didn’t read that there was a part in the job description which stated that the candidate should have strong caching background ,in fact he only saw the line talking about excellent package ;) )

Talk the talk and then walk the walk

After programmer 1 left he wanted to know what were the interviewer talking about and what are the answers to his questions so he started to surf the net, Programmer 1 didn’t know anything else about caching except: when I need cache I will use hashtable
After using his favorite search engine he was able to find a nice caching article and started to read.

Why do we need cache?

Long time ago before caching age user used to request an object and this object was fetched from a storage place and as the object grow bigger and bigger, the user had spend more time to fulfill his request, it really made the storage place suffer a lot coz it had to be working for the whole time this caused both the user and the db to be angry and there were one of 2 possibilities

1- The user will get upset and complain and even wont use this application again(that was the case always)

2- The storage place will pack up its bags and leave your application , and that made a big problems(no place to store data) (happened in rare situations )

Caching is a god sent:

After few years researchers at IBM (in 60s) introduced a new concept and named it “Cache”

What is Cache?

Caching is a temp location where I store data in (data that I need it frequently) as the original data is expensive to be fetched, so I can retrieve it faster.

Caching is made of pool of entries and these entries are a copy of real data which are in storage (database for example) and it is tagged with a tag (key identifier) value for retrieval.
Great so programmer 1 already knows this but what he doesn’t know is caching terminologies which are as follow:



Cache Hit:

When the client invokes a request (let’s say he want to view product information) and our application gets the request it will need to access the product data in our storage (database), it first checks the cache.

If an entry can be found with a tag matching that of the desired data (say product Id), the entry is used instead. This is known as a cache hit (cache hit is the primary measurement for the caching effectiveness we will discuss that later on).
And the percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.

Cache Miss:

On the contrary when the tag isn’t found in the cache (no match were found) this is known as cache miss , a hit to the back storage is made and the data is fetched back and it is placed in the cache so in future hits it will be found and will make a cache hit.

If we encountered a cache miss there can be either a scenarios from two scenarios:

First scenario: there is free space in the cache (the cache didn’t reach its limit and there is free space) so in this case the object that cause the cache miss will be retrieved from our storage and get inserted in to the cache.

Second Scenario: there is no free space in the cache (cache reached its capacity) so the object that cause cache miss will be fetched from the storage and then we will have to decide which object in the cache we need to move in order to place our newly created object (the one we just retrieved) this is done by replacement policy (caching algorithms) that decide which entry will be remove to make more room which will be discussed below.

Storage Cost:

When a cache miss occurs, data will be fetch it from the back storage, load it and place it in the cache but how much space the data we just fetched takes in the cache memory? This is known as Storage cost

Retrieval Cost:

And when we need to load the data we need to know how much does it take to load the data. This is known as Retrieval cost

Invalidation:

When the object that resides in the cache need is updated in the back storage for example it needs to be updated, so keeping the cache up to date is known as Invalidation.
Entry will be invalidate from cache and fetched again from the back storage to get an updated version.

Replacement Policy:

When cache miss happens, the cache ejects some other entry in order to make room for the previously uncached data (in case we don’t have enough room). The heuristic used to select the entry to eject is known as the replacement policy.

Optimal Replacement Policy:

The theoretically optimal page replacement algorithm (also known as OPT or Belady’s optimal page replacement policy) is an algorithm that tries to achieve the following: when a cached object need to be placed in the cache, the cache algorithm should replace the entry which will not be used for the longest period of time.

For example, a cache entry that is not going to be used for the next 10 seconds will be replaced by an entry that is going to be used within the next 2 seconds.

Thinking of the optimal replacement policy we can say it is impossible to achieve but some algorithms do near optimal replacement policy based on heuristics.
So everything is based on heuristics so what makes algorithm better than another one? And what do they use for their heuristics?

Nightmare at Java Street:

While reading the article programmer 1 fall a sleep and had nightmare (the scariest nightmare one can ever have)

Programmer 1: nihahha I will invalidate you. (Talking in a mad way)

Cached Object: no no please let me live, they still need me, I have children.

Programmer 1: all cached entries say that before they are invalidated and since when do you have children? Never mind now vanish for ever.

Buhaaahaha , laughed programmer 1 in a scary way, ,silence took over the place for few minutes and then a police serine broke this silence, police caught programmer 1 and he was accused of invalidating an entry that was still needed by a cache client, and he was sent to jail.

Programmer 1 work up and he was really scared, he started to look around and realized that it was just a dream then he continued reading about caching and tried to get rid of his fears.

Caching Algorithms:

No one can talk about caching algorithms better than the caching algorithms themselves

Least Frequently Used (LFU):

I am Least Frequently used; I count how often an entry is needed by incrementing a counter associated with each entry.

I remove the entry with least frequently used counter first am not that fast and I am not that good in adaptive actions (which means that it keeps the entries which is really needed and discard the ones that aren’t needed for the longest period based on the access pattern or in other words the request pattern)

Least Recently Used (LRU):

I am Least Recently Used cache algorithm; I remove the least recently used items first. The one that wasn’t used for a longest time.

I require keeping track of what was used when, which is expensive if one wants to make sure that I always discards the least recently used item.
Web browsers use me for caching. New items are placed into the top of the cache. When the cache exceeds its size limit, I will discard items from the bottom. The trick is that whenever an item is accessed, I place at the top.

So items which are frequently accessed tend to stay in the cache. There are two ways to implement me either an array or a linked list (which will have the least recently used entry at the back and the recently used at the front).

I am fast and I am adaptive in other words I can adopt to data access pattern, I have a large family which completes me and they are even better than me (I do feel jealous some times but it is ok) some of my family member are (LRU2 and 2Q) (they were implemented in order to improve LRU caching

Least Recently Used 2(LRU2):

I am Least recently used 2, some people calls me least recently used twice which I like it more, I add entries to the cache the second time they are accessed (it requires two times in order to place an entry in the cache); when the cache is full, I remove the entry that has a second most recent access. Because of the need to track the two most recent accesses, access overhead increases with cache size, If I am applied to a big cache size, that would be a problem, which can be a disadvantage. In addition, I have to keep track of some items not yet in the cache (they aren’t requested two times yet).I am better that LRU and I am also adoptive to access patterns.

-Two Queues:

I am Two Queues; I add entries to an LRU cache as they are accessed. If an entry is accessed again, I move them to second, larger, LRU cache.

I remove entries a so as to keep the first cache at about 1/3 the size of the second. I provide the advantages of LRU2 while keeping cache access overhead constant, rather than having it increase with cache size. Which makes me better than LRU2 and I am also like my family, am adaptive to access patterns.

Adaptive Replacement Cache (ARC):

I am Adaptive Replacement Cache; some people say that I balance between LRU and LFU, to improve combined result, well that’s not 100% true actually I am made from 2 LRU lists, One list, say L1, contains entries that have been seen only once “recently”, while the other list, say L2, contains entries that have been seen at least twice “recently”.

The items that have been seen twice within a short time have a low inter-arrival rate, and, hence, are thought of as “high-frequency”. Hence, we think of L1as capturing “recency” while L2 as capturing “frequency” so most of people think I am a balance between LRU and LFU but that is ok I am not angry form that.

I am considered one of the best performance replacement algorithms, Self tuning algorithm and low overhead replacement cache I also keep history of entries equal to the size of the cache location; this is to remember the entries that were removed and it allows me to see if a removed entry should have stayed and we should have chosen another one to remove.(I really have bad memory)And yes I am fast and adaptive.

Most Recently Used (MRU):

I am most recently used, in contrast to LRU; I remove the most recently used items first. You will ask me why for sure, well let me tell you something when access is unpredictable, and determining the least most recently used entry in the cache system is a high time complexity operation, I am the best choice that’s why.

I am so common in the database memory caches, whenever a cached record is used; I replace it to the top of stack. And when there is no room the entry on the top of the stack, guess what? I will replace the top most entry with the new entry.

First in First out (FIFO):

I am first in first out; I am a low-overhead algorithm I require little effort for managing the cache entries. The idea is that I keep track of all the cache entries in a queue, with the most recent entry at the back, and the earliest entry in the front. When there e is no place and an entry needs to be replaced, I will remove the entry at the front of the queue (the oldest entry) and replaced with the current fetched entry. I am fast but I am not adaptive

-Second Chance:

Hello I am second change I am a modified form of the FIFO replacement algorithm, known as the Second chance replacement algorithm, I am better than FIFO at little cost for the improvement. I work by looking at the front of the queue as FIFO does, but instead of immediately replacing the cache entry (the oldest one), i check to see if its referenced bit is set(I use a bit that is used to tell me if this entry is being used or requested before or no). If it is not set, I will replace this entry. Otherwise, I will clear the referenced bit, and then insert this entry at the back of the queue (as if it were a new entry) I keep repeating this process. You can think of this as a circular queue. Second time I encounter the same entry I cleared its bit before, I will replace it as it now has its referenced bit cleared. am better than FIFO in speed

-Clock:

I am Clock and I am a more efficient version of FIFO than Second chance because I don’t push the cached entries to the back of the list like Second change do, but I perform the same general function as Second-Chance.

I keep a circular list of the cached entries in memory, with the "hand" (something like iterator) pointing to the oldest entry in the list. When cache miss occurs and no empty place exists, then I consult the R (referenced) bit at the hand's location to know what I should do. If R is 0, then I will place the new entry at the "hand" position, otherwise I will clear the R bit. Then, I will increment the hand (iterator) and repeat the process until an entry is replaced.I am faster even than second chance.

Simple time-based:

I am simple time-based caching; I invalidate entries in the cache based on absolute time periods. I add Items to the cache, and they remain in the cache for a specific amount of time. I am fast but not adaptive for access patterns

Extended time-based expiration:

I am extended time based expiration cache, I invalidate the items in the cache is based on relative time periods. I add Items the cache and they remain in the cache until I invalidate them at certain points in time, such as every five minutes, each day at 12.00.

Sliding time-based expiration:

I am Sliding time-base expiration, I invalidate entries a in the cache by specifying the amount of time the item is allowed to be idle in the cache after last access time. after that time I will invalidate it . I am fast but not adaptive for access patterns

Ok after we listened to some replacement algorithms (famous ones) talking about themselves, some other replacement algorithms take into consideration some other criteria like:

Cost: if items have different costs, keep those items that are expensive to obtain, e.g. those that take a long time to get.

Size: If items have different sizes, the cache may want to discard a large item to store several smaller ones.

Time: Some caches keep information that expires (e.g. a news cache, a DNS cache, or a web browser cache). The computer may discard items because they are expired. Depending on the size of the cache no further caching algorithm to discard items may be necessary.

The E-mail!

After programmer 1 did read the article he thought for a while and decided to send a mail to the author of this caching article, he felt like he heard the author name before but he couldn’t remember who this person was but anyway he sent him mail asking him about what if he has a distributed environment? How will the cache behave?

The author of the caching article got his mail and ironically it was the man who interviewed programmer 1 :D, The author replied and said :

Distributed caching:

*Caching Data can be stored in separate memory area from the caching directory itself (who handle the caching entries and so on) can be across network or disk for example.

*Distrusting the cache allows increase in the cache size.

*In this case the retrieval cost will increase also due to network request time.

*This will also lead to hit ratio increase due to the large size of the cache

But how will this work?

Let’s assume that we have 3 Servers, 2 of them will handle the distributed caching (have the caching entries), and the 3rd one will handle all the requests that are coming (Which asks about cached entries):

Step 1: the application requests keys entry1, entry2 and entry3, after resolving the hash values for these entries, and based on the hashing value it will be decided to forward the request to the proper server.

Step 2: the main node sends parallel requests to all relevant servers (which has the cache entry we are looking for).

Step 3: the servers send responses to the main node (which sent the request in the 1st place asking to the cached entry).

Step 4: the main node sends the responses to the application (cache client).

*And in case the cache entry were not found (the hashing value for the entry will be still computed and will redirect either to server 1 or server 2 for example and in this case our entry wont be found in server 1 so it will fetched from the DB and added to server 1 caching list.


Measuring Cache:

Most caches can be evaluated based on measuring the hit ratio and comparing to the theoretical optimum, this is usually done by generation a list of cache keys with no real data, but here the hit ratio measurement assumes that all entries have the same retrieval cost which is not true for example in web caching the number of bytes the cache can server is more important than the number of hit ration (for example I can replace the big entry will 10 small entries which is more effective in web)

Conclusion:

We have seen some of popular algorithms that are used in caching, some of them are based on time, cache object size and some are based on frequency of usage, next part we are going to talk about the caching framework and how do they make use of these caching algorithms, so stay tuned ;)

Related Articles:

Part 2 (Algorithm Implementation)

Part 3 (Algorithm Implementation)

Part 4 (Frameworks Comparison)

Part 5 (Frameworks Comparison)


Read more!

Tuesday, December 23, 2008

Scripting in JDK6 (JSR 223) Part 2

Its Lunch Time!

After programmer 2 shown programmer 1 how to deal with JSR 223 or JDK6 scripting ( part 1) programmer 2 went for lunch as he didn’t eat anything since yesterday and left programmer 1 to meet his fate.

Programmer 1 began to apply what he learned from programmer 2 and stuff went so bad, after 30 minutes programmer 2 came back.

Programmer 2: Man, how is everything going? (While he was just about to finish his last bite from the sandwich)

Programmer 1: man the performance sucks, it is really really slow and that won’t be good thing, it is bottleneck now

Programmer 2: ok let me see what did you do. (Swallowed his last bite hardly)

The Crime!:

What programmer 1 wrote was the following:

ScriptEngineManager manager = new ScriptEngineManager();

ScriptEngine engine = manager.getEngineByName("javascript");
engine.put("x",1);
for(int i=0;i<100;i++){

engine.eval("function xReturner(){" +" x=x*10; return x;};xReturner();");

}

As we can see programmer 1 did nothing wrong , he just used what he learned but this code is a disaster, it will make performance issue , in our case we are using JavaScript which is interpreted with each request to the eval method ,this mean that every hit to this method and then calling the eval will do the following : read the file(if we are reading from file) , evaluate and then execute , oh man that so bad if we have a very complicated logic resides in a very big file or even a big function this will cause performance issue, lucky enough that there is something to over come this.

The Solution! :

There is an interface shipped with the script package named Compilable, from its name we can conclude that it might be used to compile our script, and yup that’s right it compiles our script so that we don’t need to evaluate and read it before executing it every time we need to invoke the script, this can be achieved as shown in the following code:

ScriptEngine engine = manager.getEngineByName("javascript");
if(engine instanceof Compilable)

{

Compilable compiledEng=(Compilable) engine;

CompiledScript script= compiledEng.compile("function xReturner(){" +" x=x*10; return x;};xReturner();");
for(int i=0;i<100;i++){

engine.put("x",i);

Object result =script.eval();}

}

else{

engine.eval("function xReturner(){" +" x=x*10; return x;};xReturner();");

}

First we check to see if this engine is implementing Compilable interface or no (this is an optional interface so some engines might not implement it but in our case as we are using JS engine shipped with JDK which implements this interface) so if our engine implements this interface then everything will be smooth else we don’t have any other way except the normal eval method.

Ok so we will just check if it implements the Compilable interface so we will invoke the compile method of the engine which will return CompiledScript which we are going to use and invoke our eval method on it and get the result.

By invoking these two scripts on programmers 1 machine we got these results:

Case 1 (no compiling): 931 ms

Case 2(with compiling) 661 ms

*Oh yea the machine that was used is a little bit slow (very slow, in fact used for performance testing for sure :p)

After solving the disaster programmer 1 was very happy by this performance tip.

Programmer 1: oh man thanks a lot for this tip, but I have another thing in mind, does JSR 223 enable me to invoke a method of my choice from a list of methods?

Programmer 2: oh that’s nice questions, let me show you how.

The Invocable!

Yup as you figured out it is the same thing as Compliable an interface that is optionally implemented by the engine you are using.

It enables you to invoke a method by name and also pass parameters to this method (we won’t need any binding here as all the stuff is handled by concrete implementation of this interface).

Before we dig deep in it and see sample code we need to mention that there are two methods that are used for this purpose and they are “invokeMethod”,”invokeFunction” lol it has the same meaning but if you give it another thought you would figure out that the names is correctly used (Function is usually used to address functions in non Object oriented or in other words flat structure and Method are used to address methods in object oriented language) so what are the difference between them ?

The difference between them is as follow:

invokeFunction: used to invoke methods on the top level which means methods that doesn’t reside inside a class (as in Ruby for example)

invokeMethod: used to invoke methods inside a class or module

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");

engine.eval("function getWelcomeMessage(name){return 'hello '+name;};");
Object[] params = {"Wolrd"};
Invocable invEngine=(Invocable)engine;
System.out.println(invEngine.invokeFunction("getWelcomeMessage",params[0]));

As we can see in the previous code we just got instance from our engine and cast it to invocable and then call the invokeFunction method which will take the function we want to invoke and also the parameters we want to pass, nothing else we can say about the invokeFunction on the other side invokeMethod as we said it can be applied for JRuby, well as people say nothing is better than a nice simple example ;)

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("jruby");
Object obj=engine.eval(new FileReader("c:\\JRubySample.rb"));
Invocable invEngine=(Invocable)engine;
Object[] params = {"Wolrd"};

System.out.println(invEngine.invokeFunction("show",params[0]));;

System.out.println(invEngine.invokeMethod(obj,"show",params[0]));

As we can see above the code, it is the same as we did before but here we changed our scripting engine to be JRuby and not JavaScript as we used to do but here we read from a file named JRubySample which has the following contents:

# Class Scripting sample
class SampleClass

def show(message)
puts "hello "+message
end
end

def show(message)
puts "hello "+message +" outside"
end
sample = SampleClass.new

return sample;

we can see that we have two methods with the name show, one is a top level (flat structure) and the other reside in a class so as we mentioned before the invoke function will work on top level functions so after invoking the invokeFunction we will get the value “hello world outside” on the other side we said that invokeMethod works on methods that resides in a class.

How can we decide which class we want to invoke its method?

This is determined in the first parameter of the invokeMethod as we can see in our JRuby sample we are returning an object from the class SampleClass and then this object is returned by the eval method and then we pass it to the invokeMethod which will take this object, look for show method in it and then invoke it so we will get this message “hello world”.

Nice, but is that the only thing Invocable interface can do? Nope it can do more than this.

Its Creation Time!

The title sounds strange (all titles in fact), well it is not that strange when you know what is it all about.

The other nice feature about Invocable interface is : with the usage of getInterface method you can create interface implementation from the methods in the script you are evaluating ,this mean that getInterface method returns an implementation of a specific java interface and the methods in this interface is implemented by the script being evaluated(the script has functions that have the same signature as the one in the java interface), so when I get the object returned by getInterface and invoke a method on it , it will just run the logic that were written in the script method, in the code below we can see that happens :

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");

engine.eval("function getAccountName(){var name='Account 1'; return name;};");
Invocable invEngine=(Invocable)engine;
AccountOperations operation=invEngine.getInterface(AccountOperations.class);
System.out.println(operation.getAccountName());

As we can see in the previous code sample we are going to evaluate the script which has one method named getAccountName (returns a string that represents the account name)
The new thing here is that we are using the getInterface method which would return an interface implementation to us, as we can see it takes the Interface that we want to get implementation for (it will return an implementation of that interface) after that we will call getAccountName method (the only method inside this simple interface) on the instance returned by the getInterface method and that’s it!

One last thing:

Ok I feel guilty as I didn’t mention something which is:” we can use java objects in JavaScript engine that is shipped with JDK6”

Oh yea you are talking about binding right? nope we are not talking about the bindings here I mean it literally, what I mean is that I can use java objects inside the script I am writing lets take a look at the following example :

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine egine = manager.getEngineByExtension("js");egine.eval("importPackage(javax.swing);var optionPane =JOptionPane.showMessageDialog(null, 'Hello!');");

As we can see we used this time getEngineByExtension instead of getEngineByName (both will do the same thing , no reason why I changed it here but as we said before I can use getEngineByExtension if we are parsing scripts for example so we don’t know what script we have now so we will just get engine instance based on its extension ) after that we actually imported java objects and used it in our script engine and when we call the eval method we will get a message box with Hello message

Back To Our Programmers:

Programmer 1: oh man that’s awesome thanks a lot. I owe you one, I want to do anything for you, tell me what can i do?
Programmer 2: it is ok, no need for that
Programmer 1: no no I insist
Programmer 2: well if you are, invite me on dinner.
Programmer 1: ok consider it done.


And in the restaurant programmer 1 forgot that he doesn’t have enough money and programmer 2 had to pay for it.

Despite this sad ending for programmer 2, programmer 1 were able to finish the task on time, thanks to JSR 223 which enabled him to do so and lets not forget programmer 2 as well

Conclusion:

JSR 223 is really nice feature shipped with JDK 6

*We saw that JSR 223 enables us to pass data from java to JavaScript and vice versa.

*Also enable us to invoke scripts not only that but also enables us to return implementation of a specific interface when the script we are invoking have methods as the interface we want to get implementation for.

*it also enable us to use java objects in the script as in the JavaScript engine “rhino“ shipped with the JDK 6





Read more!

Friday, December 12, 2008

Scripting in JDK6 (JSR 223) Part 1

Introduction:

For sure most of us (mm guess so) have heard about the Scripting provided in Java 6 (Mustang) or JSR 223, 1st time I saw that (just saw the title) I thought that the Java guys will enable us to compile and run JavaScript scripts and that’s the end, well nope that wasn’t what Scripting in java 6 about but actually it is about enabling scripting languages to access the java platform and get the advantages of using java and the java programmers to use such scripting languages engines and get advantages from that (as we will see ).after I got it I was thinking : oh man if the team who implemented the Scripting engine were living in the dark age and they came up with such idea they will be accused of The practice of witchcraft but thanks God we are open-minded now and they will live and the good thing is that they will continue to practice witchcraft :D
So for now java guys will stay alive and continue to do their voodoo stuff and we will be looking at this nice scripting engine.

A sad story (life before Scripting engine):

Programmer 1 is reading documents of an approved change request (while listening to Bon Jovi’s Have a nice Day) which was as follow:

“We need to change our Java Scripts in the client side and move it to server side and also our JRuby scripts to be converted into java classes and we need this by tomorrow”

Programmer 1:” Man, I’m going to have a bad day, how will I do this (develop, test and deploy) in one day this is insane, And that task is assigned to only me :S I don’t know JRuby and I am not that good at JavaScript, well better to move on and get some tutorials”

And in order to stick to the deadline programmer 1 had to get some advanced JavaScript, JRuby tutorials and didn’t have his lunch break L and he had to work till late hours in the next day morning in order to meet the deadline but as expected from programmer 1(that’s why they hired him) he completed his job on time and didn’t complain but he experienced a very bad day. TA DA end of story and it is one sad story, now lets rewind the tape and see how will the story end if programmer 1 had someone else to help him and to show him what is JSR 223.

A happy story (life with Scripting engine):

Same thing as previous (programmer 1 is reading CR document) and he got shocked

Programmer 1: man I’m gonna have a bad day, how will I do this (develop, test and deploy) in one day, this is insane.

A voice came from nowhere and said: what is going on? (That was programmer 2)

Programmer 1: man, check this change request we don’t have such time, we will have to work so hard to finish it in this time

Programmer 2: let me see… mmm well we could make a workaround and give them a build that everything is converted into java classes in no time.

Programmer 1: ok tell me more.

Programmer 2: we will give them what they want, which is all scripts are called and manipulated by java classes and called from server side (in JavaScript case for example) and by using the script engine we wont need to rewrite JRuby script or JavaScript in java we wont even need to know the logic or get any JRuby or JavaScript developer we will just call use the same logic and return the results and TA DA everything is going fine

Programmer 1: huh, seems to me that something hit you on the head, ok how we are going to do this?

Programmer 2: by using the scripting engine shipped with java 6

Programmer 1: huh !

Programmer 2: ok seems to me that you don’t understand a word, I will show you, just give me a paper

The Details:

What we need to know that the main class that we will need to interact with is ScriptEngineManager which will give us access to ScriptEngine which is the engine to any scripting language we want (in our case JavaScript and JRuby) by default JavaScript engine is shipped with JDK6 (Mozilla Rhino) but if we want other script engine we should download its engine implementation (there are some engines for, JRuby, PHP, Python, Groovy,…) ok lets see some JavaScript in action

In Action:

try {

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");
engine.eval("var out='hi';var date=new Date();");
}

catch (ScriptException scrpEx)
{
scrpEx.printStackTrace();
}

By analyzing the previous code we can see:

Line 3 we make an instance from the ScriptEngineManager which is the entry point or in other words we use it to gain access to our script engine like line 4 shows , we note here that we got the scriptEngine by specifying the name of the engine ,we can also get our engine by specifying MimeType or Extension and that’s by calling getEngineByMimeType or getEngineByExtension instead of calling getEngineByName ,this can be useful if we want to get a scriptEgnine instance based on a file extension or something like that.

ScriptEngine instance we got is an instance of JavaScript engine which will apply all JavaScript rules to the script written or evaluated on this engine.

After getting the script engine in line 5 we just invoked a method called eval on this engine, this method evaluates the script giving for this method here as we can see we have supplied the script as a normal string we can supply it from a file for example but for the sake of simplicity we will just invoke the eval method on a string.

After invoking the eval method on the previous script, the rules for JavaScript will be applied while parsing this script, if anything went wrong an exception will be thrown and the type of the exception that will be thrown is ScriptException as line 7 shows.

When Stuff Goes Wrong:

What if we changed the parsed line demonstrated in previous code to be something that JavaScript wont recognize, what will show up? A parsing exception will be thrown indicating that this is not correct, something like the exception below (after changing var to vear)

javax.script.ScriptException: sun.org.mozilla.javascript.internal.EvaluatorException: missing ; before statement (#1) in at line number 1


As we can see the exception will give you the line number that caused this exception with the column number and also the file name (in our case we are not using any file so we got Unknown Source)

Evaluating…:


Great now that we have known the basics of the scripting what else can I do with that? It doesn’t give me enough control yet, for example I can’t pass or get variable from java to JavaScript and vice versa, who said so? JSR 223 enables use to pass inputs and receive outputs, in the previous sample we’ve just shown how to acquire a ScriptEngine which is our entry to our selected script language but as we all know that this won’t be enough , we want to deal with the script itself in other words we need to pass variables to the script and get some output from that script to be manipulated by java. Fortunately JSR223 enables us to do this in an easy smooth way; the listing below shows us how we can get the return value from JS:

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");
System.out.println(engine.eval("var date=new Date(); date.getMonth();"));

The previous code just gets engine instance and after that it evaluates the expression and here eval will evaluate the expression against JS rules and then invoke this script which gives us back the value of the month but we are just printing the value, what if we wanted to take this value and use it somewhere else?

That’s easy we will just need to modify the last line to be like this:

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");
Double month=(Double)engine.eval("var date=new Date(); date.getMonth();");


Cross Worlds (the direct way):

And now we can do whatever we want on the resulted month value.

Ok now we have seen that we can get back the date from the script by evaluating it but what if I wanted to get a specific variable value? Or assigning a specific value to a variable?

What we didn’t mention is the eval method has more than overload (the one we talked about is either pass a string or read the script form a file)

There is something called ScriptContext which enables us to define the scope of Biding Objects (we will see what does that mean shortly).

Bindings, this is what we have been looking for, bindings enables you to pass java objects to the script and also get script variable to java world.

The ScriptContext can either be GLOBAL_SCOPE or ENGINE_SCOPE which means that every object that we are going to bind in case of GLOBAL_SCOPE will be shared for all engine script and ENGINE_SCOPE will mean that it will be only visible for this engine.

Let’s look for some code now:

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");
Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
bindings.put("a",1);
bindings.put("b",5);
Object a = bindings.get("a");
Object b = bindings.get("b");
System.out.println("a = " + a);
System.out.println("b = " + b);
Object result = engine.eval("c = a + b;");
System.out.println("a + b = " + result);

As we can see in the previous code we got the bindings of the engine and we’ve set its scope to be engine scope (only this engine).

After that we are calling bindings put method (which will allow us to pass java object to the script) bind the variable name “a” a value of 1 and variable “b” a value of 5so when evaluating the script, a will have value of 1 and b will have value of 5.

After that we are just getting the value of “a” and “b” back to the java world (we are just printing it but we could do anything with it and this is how we can get variable value from the script) and evaluating the script which will get us a result of 6

Cross Worlds (the indirect way):

We could have achieved the same functionality by using the indirect way as below:

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("javascript");
engine.put("a", 1);
engine.put("b", 5);
Object result = engine.eval("c=a + b;");
System.out.println("a + b = " + result);
System.out.println("a + b = " + engine.get("c"));

We just called put method on the engine which indirectly does the bindings for us same as the previous code and the get method also will get us the variable named “c” to our java world.

Conclusion:

For now we have seen how to make a simple application and call a script, evaluate it and get its return and even pass any input to what ever we want in part two we will look into more interesting stuff, so stay tuned ;)


Read more!

Friday, October 10, 2008

Another Late Post

sorry sorry sorry sorry sorry sorry sorry sorry

believe me i am in the middle of something and once i am done (either did it or no) i will tell you

everything about what i am doing (or was doing in the future :D )

just wait till november ;) and i will tell you everything sorry for not posting for a long time :(
Read more!