Below is a step-by-step explanation of hash.c
and hash.h
, which together implement a simple case-insensitive hash table for storing and retrieving strings mapped to integer values. The key points are how the hash function is computed, how collisions are handled, and how strings are compared in a case‐insensitive way.
1. Overview of the Hash Table
1.1 Purpose
These files implement a hash table that:
- Stores string keys (case-insensitive) and their associated integer data.
- Retrieves data given a string key.
- Handles collisions with a simple linked-list approach (a “chaining” method).
1.2 Data Structures
hash.h
defines:
#define HTMAXSIZE 1999
#define NOTFOUND -1
struct HTentry {
char* key; // pointer to the string key
int data; // integer data associated with this key
struct HTentry* next; // next item in linked list for collisions
};
typedef struct HTentry* HTtable;
HTMAXSIZE
: The size of the hash table array (1,999 “buckets”).HTentry
: A singly-linked list node storing:key
: the string pointer,data
: the integer value,next
: pointer to the next entry in this bucket.
HTtable
: A pointer to an array of pointers (struct HTentry*
). Each pointer is the head of a linked list for that hash bucket.
2. Hash Table API
hash.h
declares the following functions:
HTtable* HTcreate(void);
Creates a new hash table array of sizeHTMAXSIZE
.int HTinsert(HTtable*, char*, int);
Inserts(key, data)
into the table.int HTfind(HTtable*, const char*);
Finds the integerdata
for a given key. ReturnsNOTFOUND
if absent.char* HTfindKey(HTtable*, const char*);
Finds the actual stored key pointer if the key is found (useful if you want the exact stored string). ReturnsNULL
if absent.void HTfree(HTtable*);
Frees memory used by the table.
3. Implementation Details in hash.c
3.1 Case‐Insensitive String Comparison: samestr(...)
int samestr(const char *s1, const char *s2)
{
int i;
for (i=0; UCHAR(s1[i]) == UCHAR(s2[i]); i++)
if (!s1[i+1] && !s2[i+1]) return(1);
return(0);
}
- Uses
UCHAR(x)
macro, which:
ensures letters#define UCHAR(x) (((x) >= 'a' && (x) <= 'z') ? ((x)&~32) : (x))
a-z
are converted to uppercase by turning off the 32-bit (lowercase) bit, thus making the comparison case-insensitive. - The loop checks if each character is the same in a case-insensitive manner; it stops if they differ or if both strings end simultaneously.
3.2 The Hash Function: hash(const char *str)
unsigned int hash(const char *str)
{
unsigned int sum1= 0, check1;
unsigned long sum2= 0L;
while( '\0' != *str )
{
sum1 += UCHAR(*str);
str++;
if (255 <= sum1) sum1 -= 255;
sum2 += sum1;
}
check1 = sum2;
check1 %= 255;
check1 = 255 - (sum1+check1) % 255;
sum1 = 255 - (sum1+check1) % 255;
return( ( ( check1 << 8 ) | sum1 ) % HTMAXSIZE );
}
- This function applies a Fletcher’s checksum approach (or variation of it) to produce a 2-byte hash from the string.
- Then it takes the result mod
HTMAXSIZE
(1999) so the final hash index is in range[0..1998]
.
3.3 HTcreate()
HTtable *HTcreate()
{
HTtable *ht = (HTtable *) calloc(HTMAXSIZE, sizeof(HTtable));
...
return(ht);
}
- Allocates an array of size
HTMAXSIZE
(1999) ofHTtable
entries (actually pointers tostruct HTentry
). - Initializes each entry to
NULL
.
3.4 HTinsert(...)
int HTinsert(HTtable *ht, char *key, int data)
{
unsigned int i = hash(key);
struct HTentry *entry;
entry = (struct HTentry *) malloc(sizeof(struct HTentry));
...
entry->key = key;
entry->data = data;
entry->next = ht[i];
ht[i] = entry;
return(1);
}
- Computes hash index
i = hash(key)
. - Allocates a new
HTentry
. - Stores
key
,data
, and inserts it at the head of the linked list atht[i]
. - Returns
1
for success.
Note that it does no check for duplicates. If the same key is inserted twice, the new entry is pushed on front of the list.
3.5 HTfind(...)
int HTfind(HTtable *ht, const char *key)
{
unsigned int i = hash(key);
struct HTentry *entry = ht[i];
while (entry != NULL)
{
if ( samestr(entry->key, key) ) return(entry->data);
entry = entry->next;
}
return(NOTFOUND);
}
- Hashes the key to index
i
. - Walks the linked list at
ht[i]
. - Uses
samestr(...)
(case-insensitive) to check if the stored key matches. - If found, returns the stored
data
. - If it reaches the end without finding, returns
NOTFOUND (-1)
.
3.6 HTfindKey(...)
char *HTfindKey(HTtable *ht, const char *key)
{
...
while (entry != NULL)
{
if ( samestr(entry->key, key) ) return(entry->key);
entry = entry->next;
}
return(NULL);
}
- Very similar to
HTfind
, but returns a pointer to the actual stored key string (rather than the data).
3.7 HTfree(...)
void HTfree(HTtable *ht)
{
int i;
struct HTentry *entry, *nextentry;
for (i=0; i<HTMAXSIZE; i++)
{
entry = ht[i];
while (entry != NULL)
{
nextentry = entry->next;
free(entry);
entry = nextentry;
}
}
free(ht);
}
- Walks through each bucket’s linked list, freeing each
HTentry
. - Then frees the entire hash table array.
- Note: this doesn’t free the strings themselves (
entry->key
), because it expects that those might be managed elsewhere.
4. Collision Handling
- The table is a fixed array of size 1999.
- If two different keys hash to the same index, we store the new entry by:
entry->next = ht[i]; ht[i] = entry;
- This is a chaining technique with a singly‐linked list at each array slot.
5. Case-Insensitivity
- The function
samestr(...)
compares characters by converting them to uppercase with the macro:#define UCHAR(x) (((x) >= 'a' && (x) <= 'z') ? ((x)&~32) : (x))
- The hash function also calls
UCHAR(*str)
so that the same string in different case forms yields the same hash index.
Thus, "Rain" and "rain" or "RAIN" produce the same hash index, and samestr(...) deems them equal.
6. Typical Use Example
HTtable* myTable = HTcreate();
// Insert some strings
HTinsert(myTable, "Example", 100);
HTinsert(myTable, "example", 200); // same key ignoring case
int val = HTfind(myTable, "EXAMPLE"); // should return 200 (the newest entry)
if (val != NOTFOUND) printf("Found data = %d\n", val);
// Cleanup
HTfree(myTable);
7. Key Takeaways
- Case-Insensitive: Both the hash calculation and the key comparison ignore case.
- Fletcher’s Checksum: The hash function uses a two‐byte sum, modded by
HTMAXSIZE
. - Linked-List Collision: Each bucket is a singly‐linked list of
HTentry
nodes. - No Duplicate Check: Inserting the same key multiple times just adds a new entry in front.
- Memory: Only the HTentry structures are freed in
HTfree
; the string data is not freed, so the calling code must manage that.
This design is straightforward, but you must keep in mind that it lacks rehashing, duplicate detection, or string memory management. It’s still quite effective and simple for moderate use cases needing case‐insensitive hashing of strings.
No comments:
Post a Comment