Saturday, December 28, 2024

SWMM5 hash.c and hash.h Summary

 Below is a step-by-step explanation of hash.c and hash.h, which together implement a simple case-insensitive hash table for storing and retrieving strings mapped to integer values. The key points are how the hash function is computed, how collisions are handled, and how strings are compared in a case‐insensitive way.


1. Overview of the Hash Table

1.1 Purpose

These files implement a hash table that:

  • Stores string keys (case-insensitive) and their associated integer data.
  • Retrieves data given a string key.
  • Handles collisions with a simple linked-list approach (a “chaining” method).

1.2 Data Structures

hash.h defines:

#define HTMAXSIZE 1999
#define NOTFOUND  -1

struct HTentry {
    char* key;            // pointer to the string key
    int   data;           // integer data associated with this key
    struct HTentry* next; // next item in linked list for collisions
};

typedef struct HTentry* HTtable;
  • HTMAXSIZE: The size of the hash table array (1,999 “buckets”).
  • HTentry: A singly-linked list node storing:
    • key: the string pointer,
    • data: the integer value,
    • next: pointer to the next entry in this bucket.
  • HTtable: A pointer to an array of pointers (struct HTentry*). Each pointer is the head of a linked list for that hash bucket.

2. Hash Table API

hash.h declares the following functions:

  1. HTtable* HTcreate(void);
    Creates a new hash table array of size HTMAXSIZE.
  2. int HTinsert(HTtable*, char*, int);
    Inserts (key, data) into the table.
  3. int HTfind(HTtable*, const char*);
    Finds the integer data for a given key. Returns NOTFOUND if absent.
  4. char* HTfindKey(HTtable*, const char*);
    Finds the actual stored key pointer if the key is found (useful if you want the exact stored string). Returns NULL if absent.
  5. void HTfree(HTtable*);
    Frees memory used by the table.

3. Implementation Details in hash.c

3.1 Case‐Insensitive String Comparison: samestr(...)

int samestr(const char *s1, const char *s2)
{
   int i;
   for (i=0; UCHAR(s1[i]) == UCHAR(s2[i]); i++)
     if (!s1[i+1] && !s2[i+1]) return(1);
   return(0);
}
  • Uses UCHAR(x) macro, which:
    #define UCHAR(x) (((x) >= 'a' && (x) <= 'z') ? ((x)&~32) : (x))
    
    ensures letters a-z are converted to uppercase by turning off the 32-bit (lowercase) bit, thus making the comparison case-insensitive.
  • The loop checks if each character is the same in a case-insensitive manner; it stops if they differ or if both strings end simultaneously.

3.2 The Hash Function: hash(const char *str)

unsigned int hash(const char *str)
{
    unsigned int sum1= 0, check1;
    unsigned long sum2= 0L;
    while( '\0' != *str  )
    {
        sum1 += UCHAR(*str);
        str++;
        if (255 <= sum1) sum1 -= 255;
        sum2 += sum1;
    }
    check1 = sum2;
    check1 %= 255;
    check1 = 255 - (sum1+check1) % 255;
    sum1   = 255 - (sum1+check1) % 255;
    return( ( ( check1 << 8 )  |  sum1  ) % HTMAXSIZE );
}
  • This function applies a Fletcher’s checksum approach (or variation of it) to produce a 2-byte hash from the string.
  • Then it takes the result mod HTMAXSIZE (1999) so the final hash index is in range [0..1998].

3.3 HTcreate()

HTtable *HTcreate()
{
    HTtable *ht = (HTtable *) calloc(HTMAXSIZE, sizeof(HTtable));
    ...
    return(ht);
}
  • Allocates an array of size HTMAXSIZE (1999) of HTtable entries (actually pointers to struct HTentry).
  • Initializes each entry to NULL.

3.4 HTinsert(...)

int HTinsert(HTtable *ht, char *key, int data)
{
    unsigned int i = hash(key);
    struct HTentry *entry;
    entry = (struct HTentry *) malloc(sizeof(struct HTentry));
    ...
    entry->key = key;
    entry->data = data;
    entry->next = ht[i];
    ht[i] = entry;
    return(1);
}
  • Computes hash index i = hash(key).
  • Allocates a new HTentry.
  • Stores key, data, and inserts it at the head of the linked list at ht[i].
  • Returns 1 for success.

Note that it does no check for duplicates. If the same key is inserted twice, the new entry is pushed on front of the list.

3.5 HTfind(...)

int HTfind(HTtable *ht, const char *key)
{
    unsigned int i = hash(key);
    struct HTentry *entry = ht[i];
    while (entry != NULL)
    {
        if ( samestr(entry->key, key) ) return(entry->data);
        entry = entry->next;
    }
    return(NOTFOUND);
}
  • Hashes the key to index i.
  • Walks the linked list at ht[i].
  • Uses samestr(...) (case-insensitive) to check if the stored key matches.
  • If found, returns the stored data.
  • If it reaches the end without finding, returns NOTFOUND (-1).

3.6 HTfindKey(...)

char *HTfindKey(HTtable *ht, const char *key)
{
    ...
    while (entry != NULL)
    {
        if ( samestr(entry->key, key) ) return(entry->key);
        entry = entry->next;
    }
    return(NULL);
}
  • Very similar to HTfind, but returns a pointer to the actual stored key string (rather than the data).

3.7 HTfree(...)

void HTfree(HTtable *ht)
{
    int i;
    struct HTentry *entry, *nextentry;
    for (i=0; i<HTMAXSIZE; i++)
    {
        entry = ht[i];
        while (entry != NULL)
        {
            nextentry = entry->next;
            free(entry);
            entry = nextentry;
        }
    }
    free(ht);
}
  • Walks through each bucket’s linked list, freeing each HTentry.
  • Then frees the entire hash table array.
  • Note: this doesn’t free the strings themselves (entry->key), because it expects that those might be managed elsewhere.

4. Collision Handling

  1. The table is a fixed array of size 1999.
  2. If two different keys hash to the same index, we store the new entry by:
    entry->next = ht[i];
    ht[i] = entry;
    
  3. This is a chaining technique with a singly‐linked list at each array slot.

5. Case-Insensitivity

  • The function samestr(...) compares characters by converting them to uppercase with the macro:
    #define UCHAR(x) (((x) >= 'a' && (x) <= 'z') ? ((x)&~32) : (x))
    
  • The hash function also calls UCHAR(*str) so that the same string in different case forms yields the same hash index.

Thus, "Rain" and "rain" or "RAIN" produce the same hash index, and samestr(...) deems them equal.


6. Typical Use Example

HTtable* myTable = HTcreate();

// Insert some strings
HTinsert(myTable, "Example", 100);
HTinsert(myTable, "example", 200);  // same key ignoring case

int val = HTfind(myTable, "EXAMPLE"); // should return 200 (the newest entry)
if (val != NOTFOUND) printf("Found data = %d\n", val);

// Cleanup
HTfree(myTable);

7. Key Takeaways

  1. Case-Insensitive: Both the hash calculation and the key comparison ignore case.
  2. Fletcher’s Checksum: The hash function uses a two‐byte sum, modded by HTMAXSIZE.
  3. Linked-List Collision: Each bucket is a singly‐linked list of HTentry nodes.
  4. No Duplicate Check: Inserting the same key multiple times just adds a new entry in front.
  5. Memory: Only the HTentry structures are freed in HTfree; the string data is not freed, so the calling code must manage that.

This design is straightforward, but you must keep in mind that it lacks rehashing, duplicate detection, or string memory management. It’s still quite effective and simple for moderate use cases needing case‐insensitive hashing of strings.

No comments:

A comprehensive explanation of how minimum travel distance relates to link length in InfoSewer

In hydraulic modeling of sewer networks, the minimum travel distance is a fundamental parameter that affects how accurately the model can si...