Wednesday, December 5, 2012

What is Hashfunction?

A hash function H is a transformation that takes an input m and returns a fixed-size string, which is called the hash value h (that is, h = H(m)). Hash functions with just this property have a variety of general computational uses, but when employed in cryptography, the hash functions are usually chosen to have some additional properties.
The basic requirements for a cryptographic hash function are as follows.
  • The input can be of any length.
  • The output has a fixed length.
  • H(x) is relatively easy to compute for any given x.
  • H(x) is one-way.
  • H(x) is collision-free. 

 
A one-way hash function maps an arbitrary-length input message M to a fixed-length output hash H(M) such that the following properties hold:
  • One-way: Given a hash H(M), it is difficult to find the message M.
     
  • Second preimage resistant: Given a message M1, it is difficult to find another message M2 such that H(M1) = H(M2).
     
  • Collision resistant: It is difficult to find two messages M1 and M2 such that H(M1) = H(M2). 
Hash Algorithm         Output Hash Length (bits)
Message Digest (MD4) -- insecure         128
MD5         128
Secure Hash Algorithm 1 (SHA-1)         160
SHA-256         256
SHA-384         384
SHA-512         512

In detail
"A hash function H is said to be one-way if it is hard to invert, where ``hard to invert'' means that given a hash value h, it is computationally infeasible to find some input x such that H(x) = h. If, given a message x, it is computationally infeasible to find a message y not equal to x such that H(x) = H(y), then H is said to be a weakly collision-free hash function. A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y)."

 Perhaps the main role of a cryptographic hash function is in the provision of message integrity checks and digital signatures. Since hash functions are generally faster than encryption or digital signature algorithms, it is typical to compute the digital signature or integrity check to some document by applying cryptographic processing to the document's hash value, which is small compared to the document itself. Additionally, a digest can be made public without revealing the contents of the document from which it is derived. This is important in digital timestamping (see Question 7.11) where, using hash functions, one can get a document timestamped without revealing its contents to the timestamping service.

for more info
http://www.rsa.com/rsalabs/node.asp?id=2176
http://www.cs.rit.edu/~ark/lectures/onewayhash/onewayhash.shtml

Saturday, November 24, 2012

Trie data structure implementation in java


The tree data structure is one of the most important data storage mechanism in programming. It's a natural way to represent essential utilities on a computer like the directory structure in a file system.

Many other objects can be stored in a tree data structure resulting in space and/or time efficiency. For example, when we have a huge number of dictionary (and/or non-dictionary) words or strings that we want to store in memory we can use a tree structure to efficiently store the words instead of using a plain Array or Vector type that simply stores each word individually in memory. The space needed to store the words in an Array or Vector is simply the number of words times the average length of the words we need to store.

A Trie, also called a Prefix Tree, is a tree structure that stores words with a common prefix under the same sequence of edges in the tree eliminating the need for storing the same prefix each time for each word. From Wikipedia:
A trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string.
Please read the Wikipedia entry to learn the advantages of a Trie over other data structures in terms of searching and sorting. We will focus on building a Trie in this post.

Today we will build a simple Trie structure that does only two operations - Insert and Search/Find. We will also add a print function for convenience of seeing the tree on screen. For simplicity, let's assume all the words/strings we want to store consist of lowercase letters only.

First, we need a way to represent each Node in the tree. The following simple class represents a Trie node in our case:

class TrieNode
{
    char letter;
    TrieNode[] links;
    boolean fullWord;
  
    TrieNode(char letter, boolean fullWord)
    {
        this.letter = letter;
        links = new TrieNode[26];
        this.fullWord = fullWord;
    }
}

The "letter" member field stores the leading letter in the word starting from the root node the the current node. The "links" field contain references to child nodes that contain the next letter of the child words that share a common prefix (consisting of all the letters beginning from the root node the current node). The "fullWord" field indicates whether the current letter marks the end of a valid word in the dictionary and also at the same time is a valid prefix of one or more other words. For example, the letter "n" in the words "pan" and "pant" both is the ending letter of "pan" and marks a prefix of the word "pant". For simplicity we are statically allocating 26 references to child nodes as that is the maximum number of letters in the English alphabet (lowercase only). It can be optimized by only allocating nodes that have real edge in the tree. But let's not complicate things for optimization now.


Image courtesy: a blogspot blog

In the image above the Trie contains the words an, ant, all, allot, alloy, aloe, are, ate, be. The "grounded" sign indicates a full word ending at this node.

Algorithms for inserting a a word into a Trie:

1. Set current node to root node. The root node does not contain any letter (initialized to the null character for convenience).
2. Set the current letter to the first letter in the word.
3. If the current node already has an existing reference to the current letter (through one of the elements in the "links" field) then set current node to that referenced node; else create a new node, set the letter to current letter, and set current node to this new node.
4. Repeat step 3 until all letters in the current word has been processed.

Algorithm for searching for a word in the Trie:

1. Set current node to root node. Set the current letter to the first letter in the word.
2. If the current node is null then the word does not exist in the Trie.
3. If the current node reference to a valid node containing the current letter then set current node to that referenced node and set current letter to the letter in the new current node.
4. Repeat steps 2 and 3 until all letters in the word has been processed.
5. Now there are two possibilities that may indicate the letter is not there in the tree: a) the current letter is the last letter and there is no valid node containing this letter, and b) there is a valid node containing the last letter but the node does not indicate it contains a full word (marked with the "fullWord" boolean field.
6. If step the conditions in step 5 are not met, then we have a match for the word in the Trie.

At this point you should be able to implement a Trie yourself with the above functionalities. If you still need help then look at the code below (but you better write the code yourself :) ).

Java Code:

public class PrefixTree
{
    static TrieNode createTree()
    {
        return(new TrieNode('\0', false));
    }
   
    static void insertWord(TrieNode root, String word)
    {
        int offset = 97;
        int l = word.length();
        char[] letters = word.toCharArray();
        TrieNode curNode = root;
       
        for (int i = 0; i < l; i++)
        {
            if (curNode.links[letters[i]-offset] == null)
                curNode.links[letters[i]-offset] = new TrieNode(letters[i], i == l-1 ? true : false);
            curNode = curNode.links[letters[i]-offset];
        }
    }

    static boolean find(TrieNode root, String word)
    {
        char[] letters = word.toCharArray();
        int l = letters.length;
        int offset = 97;
        TrieNode curNode = root;
       
        int i;
        for (i = 0; i < l; i++)
        {
            if (curNode == null)
                return false;
            curNode = curNode.links[letters[i]-offset];
        }
       
        if (i == l && curNode == null)
            return false;
       
        if (curNode != null && !curNode.fullWord)
            return false;
       
        return true;
    }
   
    static void printTree(TrieNode root, int level, char[] branch)
    {
        if (root == null)
            return;
       
        for (int i = 0; i < root.links.length; i++)
        {
            branch[level] = root.letter;
            printTree(root.links[i], level+1, branch);   
        }
       
        if (root.fullWord)
        {
            for (int j = 1; j <= level; j++)
                System.out.print(branch[j]);
            System.out.println();
        }
    }
   
    public static void main(String[] args)
    {
        TrieNode tree = createTree();
       
        String[] words = {"an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be".};
        for (int i = 0; i < words.length; i++)
            insertWord(tree, words[i]);
       
        char[] branch = new char[50];
        printTree(tree, 0, branch);
       
        String searchWord = "all";
        if (find(tree, searchWord))
        {
            System.out.println("The word was found");
        }
        else
        {
            System.out.println("The word was NOT found");
        }
    }
}

class TrieNode
{
    char letter;
    TrieNode[] links;
    boolean fullWord;
   
    TrieNode(char letter, boolean fullWord)
    {
        this.letter = letter;
        links = new TrieNode[26];
        this.fullWord = fullWord;
    }
}