Monday, June 30, 2014

Building json.org as a Java library

If you've done any pure Java development in recent years you've probably found the need to work with JSON objects. Lots of SDKs include their own JSON parser or bundle the json.org parser. But what if you're writing plain old Java object (POJO) libraries?

Here's a real quick and simple way to get the json.org code and compile it into a reusable library.

First you need to download the source code.

Next you need to extract the source code:
$ unzip json.org

$ mkdir src

$ mv org src

After you've extracted the source you need to compile the json.org source:

$ mkdir -p bin/classes

$ javac -d bin/classes src/org/json/*.java

The final step is to create the org.json.jar:

$ cd bin/classes

$ jar cvf org.json.jar .

You now have a library you can use for JSON parsing with a simple reference in your project.
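
To sanity check the new library, here's a minimal example using the org.json API (the file name JsonTest.java and the sample JSON string are just placeholders):

import org.json.JSONObject;

public class JsonTest {
    public static void main(String[] args) {
        // Parse a JSON string and read a value back out of it.
        JSONObject json = new JSONObject("{\"name\": \"value\"}");
        System.out.println(json.getString("name"));
    }
}

Compile it with javac -cp bin/classes/org.json.jar JsonTest.java and run it with java -cp bin/classes/org.json.jar:. JsonTest.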

Monday, June 23, 2014

Chef on Windows: Web Deploy LWRP

This past April I had the pleasure of speaking at #ChefConf 2014 on the topic of Windows Web Server Management With ASP.NET. In that presentation I briefly touched on the topic of creating a Lightweight Resource and Provider (LWRP) for Web Deploy.

Today I thought I would walk you through creating that LWRP. The first thing you need to do when creating an LWRP is create the resource definition. The resource definition defines both the actions of the LWRP as well as its attributes. Our web deploy LWRP needs to support Web Deploy's sync verb. I've chosen :sync for the name of my action because it corresponds to the domain specific language used by msdeploy.

There are three pieces of information that we want the consumers of our LWRP to be able to change in their recipe. The first is the name of the package being deployed. The second is the destination where the package should be deployed to. And last is any additional parameters that should be passed along to msdeploy.

We're also going to implement the initialize method of the resource in order to set :sync as the default action. This isn't strictly necessary, but it means consumers don't have to set the action explicitly in their recipes.

Here's what our resource definition looks like.

your_cookbook/resources/web_deploy.rb

actions :sync

attribute :package, :kind_of => String, :name_attribute => true
attribute :destination, :kind_of => String, :default => "auto"
attribute :parameters, :kind_of => Hash

def initialize(name, run_context=nil)
  super
  @action = :sync
end

Now that we've got our resource defined, our next step in creating our Web Deploy LWRP is to create an implementation of our resource. We do this by defining a provider with the same name as the resource we previously defined. One of the really nice things about LWRPs is that Chef takes care of most of the heavy lifting in defining your provider. All we really have to do is implement the actions we declared in our resource.

Our :sync action implementation does three main things. First it builds the msdeploy command. This command is exactly what you'd type if you were running it from a command prompt. Notice as we generate the command that we're pulling the package and destination from the attributes with the same names on our new resource.

The second thing our implementation does is append any parameters set on the new resource to the msdeploy command.

The last thing our implementation does is execute the command using the built-in execute resource. It is VERY important to note that this IS NOT idempotent, meaning that this command will get run every time this recipe is run. This is really important to think about when creating your LWRP, as you will need to build in the idempotency yourself for implementations that run dynamically generated scripts.

your_cookbook/providers/web_deploy.rb

action :sync do
  msdeploy_cmd = "\"%programfiles%\\IIS\\Microsoft Web Deploy V2\\msdeploy.exe\" "
  msdeploy_cmd << "-verb:sync "
  msdeploy_cmd << "-source:package=\"#{@new_resource.package}\" " unless @new_resource.package.nil?
  msdeploy_cmd << "-dest:\"#{@new_resource.destination}\" " unless @new_resource.destination.nil?

  @new_resource.parameters.each do |name, value|
    msdeploy_cmd << "-setParam:name=\"#{name}\",value=\"#{value}\" "
  end unless @new_resource.parameters.nil?

  Chef::Log.info("MSDeploy Command: #{msdeploy_cmd}")

  execute "webdeploy" do
   command msdeploy_cmd
  end
end

You can build in that idempotency fairly easily in Chef. The first thing you need to do is define an attr_accessor on your resource, something like attr_accessor :has_run. Then in your provider you need to override the load_current_resource method and perform whatever check will tell you whether your action needs to run. For example:

def load_current_resource
  # Perform whatever check tells you the action has already been performed.
  # This marker-file check is just an illustrative assumption; use whatever
  # evidence of a completed deploy makes sense for your environment.
  @new_resource.has_run = ::File.exist?("#{@new_resource.package}.deployed")
end

After you have that plumbing in place you can update your action :sync implementation to wrap the command execution in a check to see if the action has already been run. For example:

action :sync do
  if @new_resource.has_run
    Chef::Log.info("Web Deploy has already been run")
  else
    # ... put the msdeploy command execution logic here
  end
end

Now you have an LWRP which will allow you to idempotently deploy a web deploy package. Using your new LWRP within a recipe is very simple. Here's an example where we get the source package and additional parameters from an attribute. Notice that we don't set the destination parameter because our resource definition includes a default.

your_cookbook_web_deploy "#{node['webapp']['local_directory']}/#{node['webapp']['package']}" do
  parameters node['webapp']['parameters']
end


Monday, June 16, 2014

Back To The Basics: The Hash Table CRUD Operations

In my previous post Back To The Basics: The Hash Table I described the Hash Table data structure, provided a basic template for the class in Java, and walked you through the hashing algorithm. In this post we'll take a look at the CRUD operations. Actually, we'll really only talk about addition, retrieval, and deletion because adding and updating in a Hash Table are the same operation.

Let's take a look at the add, get, and remove operations first in their simplest form.

public void add(String key, Object value) {
    int index = this.getKeyIndex(key);
    values[index] = value;
}

public Object get(String key) {
    int index = this.getKeyIndex(key);
    return values[index];
}

public Object remove(String key) {
    int index = this.getKeyIndex(key);
    Object value = values[index];
    values[index] = null;
    return value;
}

Looking at these three operations we can now see why a Hash Table in practice provides near constant time addition and retrieval operations. The only non-constant time operation performed is the generation of the hash code. But this operation can be optimized and its resulting value cached (if necessary), effectively making the add and get operations approach O(1).
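
As an illustration of that caching idea, java.lang.String memoizes its own hash code after the first call. A similar (hypothetical) key wrapper might look like this sketch:

final class CachedKey {
    private final String key;
    private int hash; // 0 means "not computed yet", mirroring java.lang.String

    CachedKey(String key) {
        this.key = key;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && !key.isEmpty()) {
            for (int i = 0; i < key.length(); i++)
                h = 31 * h + key.charAt(i);
            hash = h; // cache the result for subsequent lookups
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        // Keep equals consistent with hashCode so the wrapper is safe to use as a key.
        return other instanceof CachedKey && key.equals(((CachedKey) other).key);
    }
}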

What you may notice by looking at the add, get, and remove operations above is that they don't handle collisions. The add operation simply overwrites the existing value associated with the hash regardless of whether it's actually associated with the key. The get operation returns whatever value is stored at the index generated by the getKeyIndex method. And, like the add operation, the remove operation removes whatever is associated with the hash.

We can try to optimize our hashing algorithms as much as we want, but we'll never truly be able to avoid collisions 100% of the time. The reason is that we'd have to know all the possible items that can be hashed in order to create a truly perfect hashing algorithm. Since this is not really possible for a general purpose Hash Table, it's important that we add some code that can handle collisions when hashing the key.

The most straightforward way to handle hash collisions is to store the values at each index in a Linked List. Instead of setting the value associated with a particular key as the value in the backing array, the value will be set to a Linked List which contains all the key/value pairs that map to a particular hash. This adds a little complexity to our Hash Table but allows us to avoid putting it into an unexpected state when a collision occurs.

The first thing we need to do to handle collisions is define an object that can represent a key/value pair stored in the Linked List for each bucket. Since we're using Java we're going to create a generic class that implements Map.Entry.

import java.util.Map;

final class HashEntry<K,V> implements Map.Entry<K,V> {
    private final K key;
    private V value;

    public HashEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public K getKey() { return this.key; }

    @Override
    public V getValue() { return this.value; }

    @Override
    public V setValue(V newValue) {
        V oldValue = this.value;
        this.value = newValue;
        return oldValue;
    }
}

The next thing we need to do is update our Hash Table class definition to reflect the fact that our backing array contains a Linked List of our Hash Entry key/value pairs instead of an array of objects. Our new class definition looks like:

import java.util.LinkedList;

public class HashTable {
    private LinkedList<HashEntry<String,Object>>[] values;
    private final int tableSize = 1543;

    @SuppressWarnings("unchecked") // generic array creation requires a raw cast
    public HashTable() {
        this.values = (LinkedList<HashEntry<String,Object>>[]) new LinkedList[tableSize];
    }
}

Now we can update our add, get, and remove methods to work with the Linked List buckets in our backing array. We're going to start with the get method because we'll define a helper method that the addition operation can also take advantage of.

Our new get method is fundamentally the same as our previous get method, with the only exception being that we've pushed the actual retrieval logic into its own method named findEntry. First it tries to find the item. If it finds one it returns its value, otherwise it returns null. The new get method looks like:

public Object get(String key) {
    int index = this.getKeyIndex(key);
    HashEntry<String, Object> entry = this.findEntry(values[index], key);
    if(entry != null)
        return entry.getValue();
    return null;
}

The findEntry method simply iterates through the linked list attempting to find an existing item with the given key. If it finds one it returns it, otherwise it returns null:

private HashEntry<String, Object> findEntry(LinkedList<HashEntry<String, Object>> list, String key) {
    if(list == null)
        return null;

    for(HashEntry<String, Object> entry : list) {
        if(entry.getKey().equals(key))
            return entry;
    }
    return null;
}

Our new add method is also the same as our previous add method with the exception of two things. First, we've added logic to create the Linked List if it doesn't already exist. Second, we've pushed the actual addition logic into its own addValueToList method. This allows the add method to maintain a single responsibility. The new add method looks like:

public void add(String key, Object value) {
    int index = this.getKeyIndex(key);
    LinkedList<HashEntry<String,Object>> list = this.values[index];
    if(list == null) {
        list = new LinkedList<HashEntry<String, Object>>();
        this.values[index] = list;
    }
    this.addValueToList(list, key, value);
}

The addValueToList method is not very complicated either. It tries to get a handle on an existing entry using the findEntry method we've already defined. If it does not find an entry it adds the new key/value pair to the list; otherwise it updates the existing entry's value. Here's what the addValueToList method looks like:

private void addValueToList(LinkedList<HashEntry<String, Object>> list, String key, Object value) {
    HashEntry<String, Object> entry = this.findEntry(list, key);
    if(entry == null) {
        list.add(new HashEntry<String, Object>(key, value));
    } else {
        entry.setValue(value);
    }
}

Finally, our new remove method is also able to benefit from our findEntry method. It first tries to find the existing value. If it finds it then it removes it from the list and returns the value of the entry. Otherwise it returns null. The new remove method looks like:

public Object remove(String key) {
    int index = this.getKeyIndex(key);
    LinkedList<HashEntry<String, Object>> list = values[index];
    HashEntry<String, Object> entry = this.findEntry(list, key);

    if(list != null && entry != null) {
        list.remove(entry);
        return entry.getValue();
    }

    return null;
}

Wrap Up

There are a few additional things to consider when creating or using Hash Tables. Concurrency can become an issue when the hash table is used in a multi-threaded environment. If you use a hash table in a multi-threaded environment you'll need to make sure that the implementation you use is thread safe.
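
In Java, for example, you might reach for the built-in ConcurrentHashMap rather than adding your own locking (a usage sketch, not part of the HashTable class above):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeExample {
    public static void main(String[] args) {
        // ConcurrentHashMap manages its own internal locking, so multiple
        // threads can read and write safely without external synchronization.
        Map<String, Object> table = new ConcurrentHashMap<>();
        table.put("key", "value");
        System.out.println(table.get("key"));
    }
}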

Another thing to consider when creating or using Hash Tables is that, because the Hash Table relies on a fixed size array as its backing store, it may eventually need to grow as more items are added. The cost of performing a resize operation is typically O(N) because each item in the existing array needs to be copied to a new, larger array. This can be done all at once or incrementally.
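
To make that cost concrete, here's a minimal sketch of an all-at-once resize for the chained HashTable we built above (the resize method itself is an assumption; the original class never grows):

@SuppressWarnings("unchecked")
private void resize(int newSize) {
    LinkedList<HashEntry<String, Object>>[] oldValues = this.values;
    this.values = (LinkedList<HashEntry<String, Object>>[]) new LinkedList[newSize];

    // Every entry has to be re-added because getKeyIndex depends on the
    // length of the backing array, which has just changed.
    for (LinkedList<HashEntry<String, Object>> bucket : oldValues) {
        if (bucket == null)
            continue;
        for (HashEntry<String, Object> entry : bucket) {
            this.add(entry.getKey(), entry.getValue());
        }
    }
}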

Lastly, you'll also need to consider how you are going to handle null keys and/or null values. Some implementations don't allow null keys and/or null values.

Monday, June 9, 2014

Back To The Basics: The Hash Table

A few weeks ago I began a series entitled Back To The Basics. This series is intended to go over various software data structures that are helpful to understand as a software engineer. I started the series with the Binary Tree. The next data structure I'd like to take a look at in this series is the Hash Table.

Hash Tables are very important data structures to understand. At their most basic level they allow for associative array functionality with near constant time addition, deletion, and retrieval. The ability of a Hash Table to provide near constant time operations depends on how good the hashing algorithm is. The goal of the hashing algorithm is to avoid collisions, where two different keys produce the same hash. When a collision is encountered the Hash Table strays farther from being able to provide constant time operations.

The ability to provide near constant time operations makes the Hash Table ideal for use as a cache or an index as items can be inserted and retrieved with very little overhead.

While most modern programming languages and frameworks provide several optimized implementations of the Hash Table, let's take a look at what it would take to build a Hash Table from scratch which can handle adding, removing, and retrieving values.

Let's first start by defining our HashTable class. The most basic definition of a Hash Table will define the underlying array which will store our values and our constructor will initialize the store. The important thing to pay attention to here is the size of our initial array. We're going to initialize it to a size using a prime number because it will help us evenly distribute values within our backing store. The reason for this will become more apparent when we create our hashing function.

public class HashTable {
    private Object[] values;
    private final int tableSize = 1543;

    public HashTable() {
        this.values = new Object[tableSize];
    }
}

Now that we have our initial class definition let's start by creating a method which returns the index we should use to store the value for a given key.

private int getKeyIndex(String key) {
    // Mask off the sign bit so a negative hashCode can't produce a negative index.
    return (key.hashCode() & 0x7fffffff) % this.values.length;
}

The getKeyIndex method is pretty simple. It first gets a hash code for the key and then returns an index based on modding the hash code with the length of the array that holds our values. The mod ensures that the returned index is within the bounds of our array (masking off the sign bit first keeps the index non-negative, since hashCode can return negative values). And because we've used a prime number for our array length we get a better distribution of hashes in our array so that they're not clustered together.

Right now the getKeyIndex method is relatively boring and hides the magic of the actual hashing algorithm because our keys are strings and we're able to rely on the String class's hashCode method. But what if there were no hashCode method on String? How would we go about writing our own?

A good hashing algorithm will try to provide a uniform distribution of values in order to avoid collisions, which increase the cost of the operation. The cost is increased because every collision has to be resolved somehow, and the work it takes is relative to the number of collisions for that key and the algorithm used to resolve them. One way to resolve collisions is to store the values in a Linked List and iterate through the list until we find the item with our key. This gives our resolution algorithm an O(n) cost, where n is the number of collisions for that key.

Now let's build our string hash algorithm. Because strings are made up of individual characters, and those characters have an integer value, we could create a hashing function that just sums the character values. For example:

private int hash(String key) {
    int hash = 0;
    for(int index = 0; index < key.length(); index++) {
        hash += key.charAt(index);
    }
    return hash;
}

This has a problem in that it's likely to cause collisions. For example let's say we wanted to hash the string "dog". We'd start our hash with an initial value of 0. Then we'd iterate through the characters in the string summing them as we go. For dog this would produce a hash of 314 ((d)100 + (o)111 + (g)103). Now let's say we wanted to create a hash of the string "cav". This would also create a hash of 314 ((c)99 + (a)97 + (v)118). This is a collision, which is something we want to avoid.
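
You can verify that collision yourself by wrapping the additive hash in a small throwaway class (NaiveHashDemo is just a name for this example):

public class NaiveHashDemo {
    // The additive hash from above: it simply sums the character values.
    private static int hash(String key) {
        int hash = 0;
        for (int index = 0; index < key.length(); index++) {
            hash += key.charAt(index);
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hash("dog")); // 314
        System.out.println(hash("cav")); // 314 -- same hash, a collision
    }
}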

The good thing is that there's a pretty simple way to help avoid these collisions. If we change our algorithm slightly to multiply the running hash by a constant at each step, we can make collisions like this far less likely. Notice that I multiply the constant by the current hash value during each iteration. This helps make the hash unique because the value contributed by a character differs depending on its position within the string, which decreases our likelihood of a collision.
  
private int hashCode(String key) {
    int constant = 31;
    int hash = 0;
    for(int index = 0; index < key.length(); index++) {
        // Multiplying the running hash by the constant makes each character's
        // contribution depend on its position in the string.
        hash = (constant * hash) + key.charAt(index);
    }
    return hash;
}

With our new algorithm dog will hash to 99644 and cav will hash to 98264. Because we've added a constant that is multiplied against the current hash value at each iteration we've increased the likelihood that the resulting hash code is unique.

Note: The above algorithm is loosely based on the hashCode method in OpenJDK's String class. If you're curious as to why the constant value of 31 was chosen, check out Effective Java by Joshua Bloch. This Java Bytes blog post goes into even more depth for the ultra-curious.


Monday, June 2, 2014

Back To The Basics: The Binary Tree Deletion

In my previous post Back To The Basics: The Binary Tree Traversing I discussed several common algorithms for traversing a Binary Search Tree. In this post I’d like to turn our attention to one last common algorithm, delete.

The delete algorithm consists of three main scenarios: deleting a leaf node, deleting a node with only one child, and deleting a node with two children.

The first scenario is the easiest to account for. Remember that a leaf node is a node which has no children, so deleting a leaf node simply requires us to null out the parent's reference to that node.

The second scenario, deleting a node with only one child, is also pretty straightforward. You can simply change the pointer that points to the node being deleted to point to the deleted node's child.

The third scenario is the more complicated one. When deleting a node with two children you replace the node being deleted with either the minimum value node from its right subtree or the maximum value node from its left subtree, and then delete that replacement node from the subtree it came from.

Here is what that algorithm looks like.

public Tree<T> delete(T data) {
        return this.delete(this, data);
}

private Tree<T> delete(Tree<T> parent, T data) {
        if(parent == null)
                return parent;

        int comparison = data.compareTo(parent.data);
        if(comparison < 0)
                parent.left = this.delete(parent.left, data);
        else if(comparison > 0)
                parent.right = this.delete(parent.right, data);
        else {
                // Scenarios one and two: a missing child means we can splice
                // the other child (possibly null) into the deleted node's place.
                if(parent.right == null)
                        return parent.left;
                if(parent.left == null)
                        return parent.right;

                // Scenario three: two children. Replace the deleted node with
                // the minimum node from its right subtree, then remove that
                // minimum node from the right subtree.
                Tree<T> temp = parent;
                parent = temp.right.minTree();
                parent.right = this.deleteMin(temp.right);
                parent.left = temp.left;
        }

        return parent;
}

private Tree<T> deleteMin(Tree<T> parent) {
        if(parent.left == null)
                return parent.right;
        // Keep this node, but remove the minimum from its left subtree.
        // (Without re-assigning parent.left here, the nodes above the
        // minimum would be lost.)
        parent.left = this.deleteMin(parent.left);
        return parent;
}
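
One last note: the delete implementation relies on a minTree helper defined earlier in this series. If you don't have it handy, a minimal sketch (an assumption based on how it's used above) would be:

private Tree<T> minTree() {
        // The minimum node in a binary search tree is the left-most node.
        if(this.left == null)
                return this;
        return this.left.minTree();
}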