LeetCode (1): Hash Tables - Patterns, Pitfalls & Three Classic Problems Deep Dive
Chen Kai

Hash tables are one of the highest ROI data structures in both interviews and real systems: by trading a small amount of memory overhead for fast membership queries and lookups, you can transform "scan all elements" solutions into near-linear time workflows. This guide builds reusable hash table thinking patterns through three LeetCode classics (Two Sum, Longest Consecutive Sequence, and Group Anagrams) to teach you what to store, how to design keys, and how to avoid common boundary-case bugs. We'll also cover advanced patterns (complement search, frequency counting, sliding window), performance comparisons, interview tips, and a debugging checklist.

Series Navigation

📚 LeetCode Algorithm Masterclass Series (10 Parts):

  1. Hash Tables (Two Sum, Longest Consecutive, Group Anagrams) ← You are here
  2. Two Pointers Techniques (Collision pointers, fast-slow, sliding window)
  3. Linked List Operations (Reverse, cycle detection, merge)
  4. Binary Tree Traversal & Recursion (Inorder/Preorder/Postorder, LCA)
  5. Dynamic Programming Intro (1D/2D DP, state transition)
  6. Backtracking Algorithms (Permutations, combinations, pruning)
  7. Binary Search Advanced (Integer/Real binary search, answer binary search)
  8. Stack & Queue (Monotonic stack, priority queue, deque)
  9. Graph Algorithms (BFS/DFS, topological sort, union-find)
  10. Greedy & Bit Manipulation (Greedy strategies, bitwise tricks)


Quick Refresher: What Hash Tables Provide

For a good hash function and typical workload, hash tables offer:

  • Insertion: O(1) average
  • Lookup: O(1) average
  • Deletion: O(1) average

Worst-case can degrade to O(n) (e.g., adversarial collisions), but for interview problems and most real use cases, the average case is what matters.

In Python:

  • dict is a hash map (key → value)
  • set is a hash set (membership only)

In Java:

  • HashMap<K,V> for key-value pairs
  • HashSet<E> for unique elements

In C++:

  • unordered_map<K,V> for key-value
  • unordered_set<T> for membership
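As a quick Python sanity check, here's a minimal sketch exercising all three operations on both containers (the keys and values are purely illustrative):

```python
# Hash map (dict): key → value
index_of = {}
index_of["apple"] = 0        # insertion: O(1) average
index_of["banana"] = 1
found = "apple" in index_of  # lookup: O(1) average
del index_of["banana"]       # deletion: O(1) average

# Hash set: membership only, no associated values
seen = set()
seen.add(42)
has_42 = 42 in seen
seen.discard(42)             # no-op if absent, never raises

print(found, has_42, index_of, seen)
```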

LeetCode 1: Two Sum

Problem Analysis

Core Challenge: The key difficulty lies in finding two numbers that sum to the target value in a single pass. A brute-force approach requires O(n²) time to check all pairs, which is inefficient for large datasets.

Solution Approach: Use a hash table to store "visited value → index" mappings. As we traverse the array, for each current element num, the partner we need is target - num (the complement). If this complement is already in the hash table, we've found the answer; otherwise, store the current value for future lookups. This "space-for-time" trade-off reduces time complexity from O(n²) to O(n).

Complexity Analysis:

  • Time Complexity: O(n). Single pass through the array; hash table lookup and insertion are O(1) on average.
  • Space Complexity: O(n). Worst case requires storing n elements in the hash table.

Key Insight: The essence of the problem is "finding pairs", and hash tables provide O(1) average lookup time, perfectly suited for this need. The key is understanding the "complement search" pattern: we're not looking for num itself, but for target - num.

Problem Statement

Given an array of integers nums and an integer target, return the indices of the two numbers such that they add up to target. Each input has exactly one solution, and you may not use the same element twice.

Example:

  • Input: nums = [2,7,11,15], target = 9
  • Output: [0,1] (because nums[0] + nums[1] = 9)

Constraints:

  • 2 <= nums.length <= 10^4
  • -10^9 <= nums[i] <= 10^9
  • -10^9 <= target <= 10^9
  • Only one valid answer exists

Brute Force Approach (For Comparison)

Naive solution: Check all pairs

def twoSum_bruteforce(nums, target):
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []

Complexity: O(n²) time, O(1) space

Why it's bad: For nums.length = 10,000, that's 50 million operations!

Hash Table Solution: The Complement Pattern

Core insight: When scanning left to right, maintain a mapping:

value → index

When you see num, the partner you need is target - num (the complement). If it's already in the map, you're done; otherwise, store the current value and index.

Python Implementation

from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # Hash table: stores value → index mapping
        # This enables O(1) lookup to check if a value has appeared
        seen = {}  # value → index

        # Traverse the array once to find the answer
        for i, num in enumerate(nums):
            # Calculate the complement needed for current element
            # If num + complement = target, then complement = target - num
            complement = target - num

            # Check if complement is already in hash table
            # If yes, we found the pair, return immediately
            if complement in seen:
                # Return [earlier index, current index]
                # seen[complement] is the first occurrence index of complement
                return [seen[complement], i]

            # If complement not in hash table, store current value
            # Note: We store AFTER checking to avoid using same element twice
            seen[num] = i

        # According to constraints, there should always be a solution
        # Return empty list as defensive programming
        return []  # No solution (shouldn't happen per constraints)


Java Implementation

import java.util.HashMap;
import java.util.Map;

class Solution {
    public int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> seen = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            int complement = target - nums[i];
            if (seen.containsKey(complement)) {
                return new int[]{seen.get(complement), i};
            }
            seen.put(nums[i], i);
        }
        return new int[]{};
    }
}

Why This Works

One-pass logic:

  1. At index i, we check if target - nums[i] was seen before
  2. If yes, we found the pair immediately
  3. If no, store nums[i] → i for future lookups

Time evolution (for nums = [2, 7, 11, 15], target = 9):

Step i num complement seen Action
1 0 2 7 {} Not found, store {2: 0}
2 1 7 2 {2: 0} Found! Return [0, 1]

Complexity Analysis

  • Time: O(n) (single pass)
  • Space: O(n) (hash map stores up to n entries)

Space-time tradeoff: We sacrifice O(n) memory to reduce time from O(n²) to O(n).

Deep Dive

Algorithm Principle: Why Does This Algorithm Work?

The core of the hash table solution lies in the complement search pattern. When we see element nums[i], we're not looking for it itself, but for its "other half" — target - nums[i]. If this complement has appeared before (stored in the hash table), we've found the complete pair.

Mathematical Proof: Suppose a solution (i, j) exists such that nums[i] + nums[j] = target, with i < j. When the algorithm reaches index j:

  • nums[i] has already been processed and stored as seen[nums[i]] = i
  • The current element is nums[j], so complement = target - nums[j]
  • Since nums[i] + nums[j] = target, we have complement = nums[i]
  • nums[i] is in seen, so the algorithm returns [i, j]

Time Complexity Proof: Why O(n)?

  • Outer loop: Traverse array once, O(n)
  • Hash table operations: Each in check and = assignment is O(1) on average
  • Total time: O(n)

Worst case: If all elements hash to the same bucket (adversarial input), hash table operations degrade to O(n), making total time O(n²). However, this is extremely rare in practice, and the average case is what matters.

Space Optimization: Is There a More Space-Efficient Method?

For this problem, no. Reasons: 1. Need to return indices: Cannot sort (would destroy indices), must store index information 2. Unsorted array: Cannot use two pointers (requires sorting) 3. Single pass requirement: Must store information about visited elements

Variant Extension: If the problem asked to "return the two values" (no indices needed), we could: - Sort first, then use two pointers: O(n log n) time, O(1) extra space - Or still use a hash table: O(n) time, O(n) space

Common Mistakes Analysis

Mistake Type Wrong Code Problem Correct Approach
Check before insert seen[num] = i; if complement in seen May use same element twice Check first, then insert
Return order return [i, seen[complement]] Wrong index order return [seen[complement], i]
Sort destroys indices Sort then two pointers Lose original indices Use hash table or store (value, index) tuples

Variant Extensions

  1. Three Sum: Fix first number, use two sum on remaining part
  2. Four Sum: Fix first two numbers, use two sum on remaining part
  3. Two Sum II (Sorted Array): Can use two pointers, O(1) space
  4. Two Sum (BST): Use inorder traversal + two pointers
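As a hedged sketch of variant 1: after sorting, fix the first number and run a two-sum (here with two pointers) on the rest, skipping duplicates. The function name and dedup details are one common arrangement, not the only one:

```python
def threeSum(nums):
    # Sort so duplicates are adjacent and two pointers apply
    nums = sorted(nums)
    res = []
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue  # skip duplicate first numbers
        # Two-sum on nums[i+1:] looking for -nums[i]
        lo, hi = i + 1, len(nums) - 1
        while lo < hi:
            s = nums[i] + nums[lo] + nums[hi]
            if s < 0:
                lo += 1
            elif s > 0:
                hi -= 1
            else:
                res.append([nums[i], nums[lo], nums[hi]])
                lo += 1
                while lo < hi and nums[lo] == nums[lo - 1]:
                    lo += 1  # skip duplicate second numbers
    return res
```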

Common Pitfalls & Edge Cases

Pitfall 1: Duplicate Values

Case: nums = [3, 3], target = 6

Question: Does storing value → index work if values repeat?

Answer: Yes, because we check before overwriting:

# First iteration: i=0, num=3
complement = 6 - 3 = 3 # Not in seen yet
seen[3] = 0 # Store {3: 0}

# Second iteration: i=1, num=3
complement = 6 - 3 = 3 # Found in seen!
return [seen[3], 1] # [0, 1]

Pitfall 2: "Same Element Twice" Misunderstanding

Constraint says: "You may not use the same element twice"

Confusion: Does this mean the same value or the same index?

Clarification: It means the same index. So [3, 3] with target = 6 is valid (different indices).

Pitfall 3: Don't Sort!

Temptation: "Let me sort first, then use two pointers"

Problem: Sorting destroys original indices, but the problem asks for original indices!

Solution: If you must use two-pointer approach, store (value, original_index) tuples and sort by value.
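A minimal sketch of that workaround, sorting (value, index) pairs and then squeezing with two pointers (the function name is mine):

```python
def twoSum_sorted_pairs(nums, target):
    # Keep original indices alongside values before sorting
    pairs = sorted((v, i) for i, v in enumerate(nums))
    lo, hi = 0, len(pairs) - 1
    while lo < hi:
        s = pairs[lo][0] + pairs[hi][0]
        if s == target:
            # Return original indices, smaller first
            return sorted([pairs[lo][1], pairs[hi][1]])
        elif s < target:
            lo += 1
        else:
            hi -= 1
    return []
```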

Pitfall 4: Off-by-One in Return Order

Mistake:

return [i, seen[complement]]  # Wrong order!

Correct:

return [seen[complement], i]  # Earlier index first

Real-World Analogy

  • E-commerce: "I have $100. Can these two items fit my budget?"
  • Trading: "Do any two orders in this stream cancel out to the target exposure?"
  • Chemistry: "Which two reactants combine to this molecular weight?"

LeetCode 128: Longest Consecutive Sequence

Problem Analysis

Core Challenge: The key difficulty is finding the longest consecutive sequence in O(n) time without using sorting (which requires O(n log n)). Sequence elements don't need to be consecutive in the original array, suggesting we need fast existence checks.

Solution Approach: Use a hash set to store all numbers, enabling O(1) membership queries. The key insight is: only count from sequence starts. A number x is a sequence start if and only if x - 1 is not in the set. For each sequence start, we expand rightward counting consecutive numbers. This ensures each number is visited at most twice (once as a start candidate, once as part of a sequence), guaranteeing O(n) time complexity.

Complexity Analysis:

  • Time Complexity: O(n). Although there's an inner while loop, each number is visited at most twice.
  • Space Complexity: O(n). The hash set stores all numbers.

Key Insight: Avoiding duplicate counting is the core optimization. If we count from every number, we'd count the same sequence multiple times. By counting only from starts, each sequence is counted exactly once.

Problem Statement

Given an unsorted array of integers, find the length of the longest consecutive elements sequence. Your algorithm must run in O(n) time.

Example:

  • Input: nums = [100, 4, 200, 1, 3, 2]
  • Output: 4 (sequence is [1, 2, 3, 4])

Constraints:

  • 0 <= nums.length <= 10^5
  • -10^9 <= nums[i] <= 10^9

Why Not Just Sort?

Naive approach: Sort then scan for runs

def longestConsecutive_sort(nums):
    if not nums:
        return 0
    nums.sort()
    longest = 1
    current = 1
    for i in range(1, len(nums)):
        if nums[i] == nums[i-1]:
            continue
        elif nums[i] == nums[i-1] + 1:
            current += 1
        else:
            longest = max(longest, current)
            current = 1
    return max(longest, current)

Complexity: O(n log n) (sorting dominates)

Problem: Violates the O(n) constraint!

Hash Set Solution: Start from Sequence Beginnings

Key insight: A number x is a sequence start if and only if x - 1 is not in the set. Only count from sequence starts; otherwise, you'd count the same sequence multiple times.

Algorithm

  1. Put all numbers in a set for O(1) membership queries

  2. For each number x:

    • If x - 1 is in the set, skip (x is not a start)
    • If x - 1 is not in the set, count consecutive numbers starting from x

Python Implementation

def longestConsecutive(nums):
    # Edge case: empty array returns 0
    if not nums:
        return 0

    # Convert array to set for O(1) membership queries
    # Set automatically removes duplicates, but this doesn't affect the algorithm
    num_set = set(nums)
    longest = 0  # Track longest sequence length

    # Iterate through each number in the set
    for num in num_set:
        # Key optimization: only count from sequence starts
        # A number is a sequence start if and only if num-1 is not in the set
        # This avoids counting the same sequence multiple times
        if num - 1 not in num_set:
            current_num = num   # Starting number of current sequence
            current_length = 1  # Current sequence length (at least num itself)

            # Expand rightward, counting consecutive numbers
            # Continue as long as current_num + 1 is in the set
            while current_num + 1 in num_set:
                current_num += 1     # Move to next number
                current_length += 1  # Increment sequence length

            # Update longest sequence length
            longest = max(longest, current_length)

    return longest

Why Is This O(n)?

Misconception: "The inner while loop makes it O(n²)"

Reality: Each number is visited at most twice: 1. Once in the outer loop 2. Once as part of a sequence (inner loop)

Proof: The inner while only runs when num - 1 is not in the set. So each sequence [a, a+1, ..., b] is scanned exactly once (starting from a).

Total operations: O(n) (outer) + O(n) (inner across all sequences) = O(n)


Complexity Analysis

  • Time: O(n) (see proof above)
  • Space: O(n) (hash set)

Deep Dive

Algorithm Principle: Why Count Only from Starts?

Key Insight: If we count from every number, we'd count the same sequence multiple times. For example, for sequence [1, 2, 3, 4]: - Counting from 1: count 1, 2, 3, 4 → length 4 - Counting from 2: count 2, 3, 4 → length 3 (duplicate) - Counting from 3: count 3, 4 → length 2 (duplicate) - Counting from 4: count 4 → length 1 (duplicate)

By counting only from starts (num-1 not in set), each sequence is counted exactly once.

Time Complexity Proof: Why O(n)?

Misconception: "The inner while loop makes it O(n²)"

Reality: Each number is visited at most twice: 1. Once in outer loop: checked as num whether it's a start 2. Once in inner loop: counted as part of some sequence

Proof:

  • The outer loop visits all n numbers: O(n)
  • The inner while only runs when num - 1 is not in the set
  • Each sequence [a, a+1, ..., b] is scanned exactly once (starting from a)
  • Total length of all sequences ≤ n (each number belongs to at most one sequence)
  • Total inner loop operations: O(n)

Total complexity: O(n)

Space Optimization: Is There a More Space-Efficient Method?

Method 1: Use Array as Hash Table (Only for Small Range) If number range is known and small (e.g., [0, 1000]), use boolean array:

# Only works for small number ranges
max_num = max(nums)
min_num = min(nums)
present = [False] * (max_num - min_num + 1)
# Space: O(range size), may be larger or smaller than O(n)

Method 2: In-place Marking (Requires Modifying Input) If input array can be modified, use sign marking:

# Mark visited numbers as negative (requires extra handling)
# Space: O(1) extra space, but destroys input

Trade-off: For general cases, the hash set's O(n) space is optimal because: - Number range can be huge (-10^9 to 10^9) - Cannot modify input array - Need fast O(1) queries

Common Mistakes Analysis

Mistake Type Wrong Code Problem Correct Approach
Count from every number for num in nums: while num+1 in set Duplicate counting, O(n²) Count only from starts
Use list instead of set num_list = list(nums) Lookup is O(n) Use set()
Forget deduplication Iterate nums directly Duplicate elements cause duplicate counting Iterate set(nums)

Variant Extensions

  1. Longest Consecutive Increasing Sequence (must be consecutive in array): Use dynamic programming
  2. Longest Consecutive Sequence (Allow Duplicates): Need to count frequencies
  3. Longest Consecutive Sequence (2D): Extend to 2D arrays
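As a minimal sketch of variant 1 (a contiguous increasing run, in the style of LeetCode 674), the "DP" collapses to a running count of the increasing run ending at each index; the function name is illustrative:

```python
def longest_increasing_run(nums):
    # DP-style running count: length of the increasing run ending at i
    best = cur = 1 if nums else 0
    for i in range(1, len(nums)):
        # Extend the run if strictly increasing, else restart at 1
        cur = cur + 1 if nums[i] > nums[i - 1] else 1
        best = max(best, cur)
    return best
```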

Example Walkthrough

Input: nums = [100, 4, 200, 1, 3, 2]

Step 1: Build set {100, 4, 200, 1, 3, 2}

Step 2: Iterate through set

num num-1 in set? Action current_length
100 No (99 not in set) Start sequence Count 100 → length 1
4 Yes (3 in set) Skip -
200 No (199 not in set) Start sequence Count 200 → length 1
1 No (0 not in set) Start sequence Count 1,2,3,4 → length 4
3 Yes (2 in set) Skip -
2 Yes (1 in set) Skip -

Output: 4

Common Edge Cases

Edge Case 1: Empty Array

nums = []
# Output: 0

Edge Case 2: All Duplicates

nums = [1, 1, 1, 1]
# Set becomes {1}, output: 1

Edge Case 3: No Consecutive Numbers

nums = [1, 3, 5, 7, 9]
# Each number is its own sequence, output: 1

Edge Case 4: Entire Array is Consecutive

nums = [5, 1, 3, 4, 2]
# Sequence [1,2,3,4,5], output: 5

Edge Case 5: Negative Numbers

nums = [-1, -2, 0, 1, 2]
# Sequence [-2,-1,0,1,2], output: 5

Real-World Applications

  • Version control: Find longest sequence of consecutive commits
  • Gaming: Detect longest win streak
  • Time series: Find longest period without missing data

LeetCode 49: Group Anagrams

Problem Analysis

Core Challenge: The key difficulty lies in designing a hash key such that all anagrams map to the same key, while non-anagrams map to different keys. Anagrams have the same characters but different orders, so we need a canonical form to represent them uniformly.

Solution Approach: There are three main methods to design hash keys: 1. Sorted String: Use the sorted string as key ("eat" → "aet") 2. Character Count Tuple: Count the frequency of each character, convert to a tuple as key 3. Fixed-Size Array: For lowercase English letters (26), use a length-26 count array

Complexity Analysis: - Method 1 (Sort): O(n·k log k) time, O(n·k) space - Method 2 (Count + Sort): O(n·k) time, O(n·k) space - Method 3 (Fixed Array): O(n·k) time, O(n·k) space (fastest)

Key Insight: Choosing the hash key is the core of grouping problems. Sorting is simple and general, but character counting is faster because counting is O(k) while sorting is O(k log k).

Problem Statement

Given an array of strings, group the anagrams together. The order of groups doesn't matter.

Example:

  • Input: strs = ["eat","tea","tan","ate","nat","bat"]
  • Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Constraints:

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters

Key Insight: What Makes a Good Hash Key?

Anagrams have the same characters but different orders. We need a canonical form that's identical for all anagrams.

Option 1: Sorted String as Key

from collections import defaultdict

def groupAnagrams(strs):
    # Use defaultdict to avoid manual empty list initialization
    groups = defaultdict(list)

    # Iterate through each string
    for s in strs:
        # Sort string and use as hash key
        # Anagrams sort to the same string
        # Example: "eat" → "aet", "tea" → "aet", "ate" → "aet"
        key = ''.join(sorted(s))  # Canonical form

        # Add current string to corresponding group
        groups[key].append(s)

    # Return all groups (list of lists)
    return list(groups.values())

Example:

  • "eat" → sorted → "aet"
  • "tea" → sorted → "aet"
  • "ate" → sorted → "aet"

All map to the same key!

Complexity:

  • Time: O(n·k log k), where n = number of strings, k = maximum string length
  • Space: O(n·k)

Option 2: Character Count as Key (Faster!)

Instead of sorting, count character frequencies:

from collections import defaultdict, Counter

def groupAnagrams_optimized(strs):
    groups = defaultdict(list)
    for s in strs:
        # Count characters: {'e':1, 'a':1, 't':1}
        count = Counter(s)
        # Convert to tuple (hashable): (('a',1), ('e',1), ('t',1))
        key = tuple(sorted(count.items()))
        groups[key].append(s)
    return list(groups.values())

Complexity:

  • Time: O(n·k) (counting is O(k); sorting the at most 26 distinct letters is O(1))
  • Space: O(n·k)

Faster because: counting k characters is O(k), but sorting them is O(k log k).

Option 3: Fixed-Size Array as Key (Best for Lowercase English)

For lowercase English letters (26 total), use a fixed-size count array:

from collections import defaultdict

def groupAnagrams_fastest(strs):
    groups = defaultdict(list)

    for s in strs:
        # Create count array of length 26
        # Index 0 corresponds to 'a', index 25 to 'z'
        count = [0] * 26

        # Count frequency of each character in string
        for c in s:
            # ord(c) - ord('a') converts character to 0-25 index
            # Example: 'a' → 0, 'b' → 1, 'z' → 25
            count[ord(c) - ord('a')] += 1

        # Convert list to tuple as hash key (lists are not hashable)
        # Anagrams will have the same count array
        key = tuple(count)

        # Add string to corresponding group
        groups[key].append(s)

    return list(groups.values())

Complexity:

  • Time: O(n·k)
  • Space: O(n·k)

Why fastest: No sorting at all, just one pass through each string.

Deep Dive

Algorithm Principle: Why Do These Methods Work?

Core Idea: The essence of anagrams is that they have the same character frequencies but different orders. Therefore, we need an order-independent representation as the hash key.

Method 1 (Sort): - After sorting, all anagrams become the same string - "eat", "tea", "ate" all sort to "aet" - Time complexity: sorting each string requires O(k log k)

Method 2 (Character Count): - Count frequency of each character; anagrams have the same frequency distribution - "eat" → {'e':1, 'a':1, 't':1} → (('a',1), ('e',1), ('t',1)) - Time complexity: counting is O(k); sorting the at most 26 entries is O(1) (constant)

Method 3 (Fixed Array): - Use a fixed-size array where the index corresponds to the character position - "eat" → [1,0,0,0,1,...,1,...] (a=1, e=1, t=1) - Time complexity: only one pass needed, O(k)

Time Complexity Proof: Why Is Method 3 Fastest?

Method 1: - Sort each string: O(k log k) - n strings: O(n·k log k)

Method 2: - Count: O(k) - Sort count items (at most 26): O(1) - n strings: O(n·k)

Method 3: - Count: O(k) (one pass) - No sorting - n strings: O(n·k)

Conclusion: Method 3 is fastest because it avoids sorting overhead entirely.

Space Optimization: Is There a More Space-Efficient Method?

Optimization 1: Use String as Key (Method 1) - Space: O(n·k) (store sorted strings) - Pros: keys are readable, easy to debug

Optimization 2: Use Tuple as Key (Methods 2, 3) - Space: O(n·k) overall (Method 3 keys are fixed length-26 tuples, so keys alone are O(n)) - Pros: more compact, but less readable

Trade-off: For lowercase English letters, Method 3 is optimal: - Fixed 26 integers, small space overhead - No sorting, fastest time - Clean code

Common Mistakes Analysis

Mistake Type Wrong Code Problem Correct Approach
Use list as key groups[count].append(s) Lists are not hashable Use tuple(count)
Forget to sort count items key = tuple(count.items()) Order may differ tuple(sorted(count.items()))
Index calculation error count[ord(c)] Index out of range count[ord(c) - ord('a')]

Variant Extensions

  1. Anagram Detection: Check if two strings are anagrams

    # Method: Compare sorted strings or character counts
    return sorted(s1) == sorted(s2)

  2. Find All Anagrams: Find all anagrams of a pattern in a string

    • Use sliding window + character counting
    • Time complexity: O(n)
  3. Group Anagrams (Case Sensitive):

    • Need to distinguish case, extend character set to 52
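Variant 2 (Find All Anagrams, in the style of LeetCode 438) can be sketched with a fixed-size window and count comparison; this is one common arrangement, not the only one:

```python
from collections import Counter

def findAnagrams(s, p):
    # Slide a window of fixed size len(p) over s; a window is an
    # anagram of p exactly when its character counts match p's
    n, m = len(s), len(p)
    if m > n:
        return []
    need = Counter(p)
    window = Counter(s[:m])
    res = [0] if window == need else []
    for right in range(m, n):
        window[s[right]] += 1          # char entering the window
        left_char = s[right - m]       # char leaving the window
        window[left_char] -= 1
        if window[left_char] == 0:
            del window[left_char]      # keep counts clean for ==
        if window == need:
            res.append(right - m + 1)
    return res
```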

Complexity Comparison

Method Time Space Notes
Sorted string O(n·k log k) O(n·k) Simple, general
Counter + sort O(n·k) O(n·k) Faster, works for any charset
Fixed array O(n·k) O(n·k) Fastest, limited to lowercase English

Example Walkthrough

Input: strs = ["eat", "tea", "tan", "ate", "nat", "bat"]

Using sorted string keys:

String Sorted Key Group
"eat" "aet" Group A
"tea" "aet" Group A
"tan" "ant" Group B
"ate" "aet" Group A
"nat" "ant" Group B
"bat" "abt" Group C

Output: [["eat","tea","ate"], ["tan","nat"], ["bat"]]

Edge Cases

Edge Case 1: Empty String

strs = [""]
# Output: [[""]]

Edge Case 2: Single Character

strs = ["a"]
# Output: [["a"]]

Edge Case 3: No Anagrams

strs = ["abc", "def", "ghi"]
# Output: [["abc"], ["def"], ["ghi"]]

Edge Case 4: All Anagrams

strs = ["abc", "bca", "cab"]
# Output: [["abc", "bca", "cab"]]

Real-World Applications

  • Spell checking: Group typos by letter composition
  • Genomics: Cluster DNA sequences with same nucleotide counts
  • Data deduplication: Find records with same attributes (different order)

Advanced Hash Table Patterns

Pattern 1: Complement Search (Two Sum Variant)

Use when: You need to find pairs satisfying a condition

Examples:

  • Three Sum (extend to triplets)
  • Four Sum
  • Count pairs with given difference
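As a sketch of the last example, counting distinct value pairs at a fixed difference k uses the same complement idea (the function name and the assumptions — k > 0, pairs counted over distinct values — are mine):

```python
def count_pairs_with_diff(nums, k):
    # For each value v in the set, its "complement" is v + k;
    # assumes k > 0 and counts each distinct value pair once
    values = set(nums)
    return sum(1 for v in values if v + k in values)
```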

Pattern 2: Frequency Counting

Use when: You need to track occurrences

Examples:

  • Top K Frequent Elements
  • First Unique Character
  • Valid Anagram (simpler than sorting)
from collections import Counter

def isAnagram(s, t):
    return Counter(s) == Counter(t)

Pattern 3: Sliding Window + Hash Map

Use when: Fixed-size window with tracking

Example: Longest Substring with At Most K Distinct Characters

from collections import defaultdict

def lengthOfLongestSubstringKDistinct(s, k):
    left = 0
    char_count = defaultdict(int)
    max_len = 0

    for right in range(len(s)):
        char_count[s[right]] += 1

        # Shrink window if too many distinct chars
        while len(char_count) > k:
            char_count[s[left]] -= 1
            if char_count[s[left]] == 0:
                del char_count[s[left]]
            left += 1

        max_len = max(max_len, right - left + 1)

    return max_len

Pattern 4: Prefix Sum with Hash Map

Use when: You need subarray sums

Example: Subarray Sum Equals K

from collections import defaultdict

def subarraySum(nums, k):
    prefix_sum = 0
    count = 0
    sum_count = defaultdict(int)
    sum_count[0] = 1  # Base case: empty prefix

    for num in nums:
        prefix_sum += num
        # Check if (prefix_sum - k) exists
        if prefix_sum - k in sum_count:
            count += sum_count[prefix_sum - k]
        sum_count[prefix_sum] += 1

    return count

Hash Table Internals (Bonus)

How Python dict Works

Under the hood: 1. Hash function: Converts key to integer hash(key) 2. Bucket selection: index = hash(key) % table_size 3. Collision handling: Python uses open addressing with probing

Load factor: When num_entries / table_size > 2/3, Python resizes the table (doubles size).
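A toy sketch of steps 1-2 (illustrative only: real CPython uses the hash's low bits plus a perturbation sequence rather than a plain modulo):

```python
def bucket_index(key, table_size=8):
    # Step 1: hash the key; Step 2: map the hash into the table
    return hash(key) % table_size

# Deterministic: the same key always lands in the same bucket
# (string hashes are randomized per process, but stable within one)
print(bucket_index("alice") == bucket_index("alice"))

# Small integers hash to themselves in CPython, so the modulo is visible
print(bucket_index(10))  # 10 % 8 == 2
```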

Collision Resolution

Open Addressing (Python)

When bucket is occupied, probe next slots:

  • Linear probing: Try index+1, index+2, ...
  • Quadratic probing: Try index+1^2, index+2^2, ...

Chaining (Java HashMap)

Each bucket stores a linked list (or tree if too long) of colliding entries.

Hash Function Quality

Good hash function:

  • Deterministic (same key → same hash)
  • Uniform distribution (avoids clustering)
  • Fast to compute

Python's hash():

  • Integers: hash(x) == x (for small x)
  • Strings: Uses SipHash (cryptographic quality)
  • Custom objects: Override __hash__ and __eq__
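A minimal sketch of the custom-object case (the Point class is illustrative). The contract to uphold: objects that compare equal must also hash equal, so __hash__ should be built from the same fields __eq__ compares:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same tuple that __eq__ compares:
        # equal objects must have equal hashes
        return hash((self.x, self.y))

points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))  # → 2 (the duplicate Point(1, 2) collapses)
```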

Common Mistakes & Debugging Checklist

Mistake 1: Forgetting to Check Before Inserting

Wrong:

seen[num] = i
if target - num in seen:
    return [seen[target - num], i]

Right: Check before inserting (to avoid self-pairing)

if target - num in seen:
    return [seen[target - num], i]
seen[num] = i

Mistake 2: Using Unhashable Keys

Error:

key = [1, 2, 3]  # List is not hashable!
groups[key].append(value) # TypeError

Fix: Convert to tuple

key = tuple([1, 2, 3])

Mistake 3: Mutating Hash Keys

Danger:

key = [1, 2]
groups[tuple(key)] = "value"
key.append(3) # Don't mutate after hashing!

Rule: Treat hash keys as immutable.

Mistake 4: Off-by-One in Index Calculations

Double-check:

  • Are you returning [i, j] or [j, i]?
  • Does the problem ask for 0-indexed or 1-indexed output?

Debugging Checklist

Before coding: 1. What should the key be? (value, count, sorted form?) 2. What should the value be? (index, list, count?) 3. Do I need dict or set?

After coding: 1. Test with empty input 2. Test with single element 3. Test with duplicates 4. Test with all unique elements 5. Print hash table contents at each step (for small inputs)


Interview Tips

Tip 1: Verbalize Your Thought Process

Template: > "I notice we're looking for pairs/groups/matches, which suggests a hash table. I'll store [X] as keys and [Y] as values. The lookup will be [Z]."

Tip 2: Start with Brute Force

Even if you know the hash table solution, say: > "The brute force is O(n²) with nested loops. We can optimize to O(n) using a hash map."

Tip 3: Clarify Constraints

Ask:

  • Can the array have duplicates?
  • Is the array sorted?
  • What's the expected size? (Affects space complexity concerns)
  • Are there negative numbers?

Tip 4: Analyze Trade-offs

After your solution, mention: > "This trades O(n) space for O(n) time. If memory is tight, we could fall back to the O(n log n) sorting approach."

Tip 5: Handle Edge Cases Out Loud

Walk through:

  • Empty array
  • Single element
  • All duplicates
  • All unique

Time-Saving Tricks

Trick 1: Use collections.Counter

Instead of:

freq = {}
for x in nums:
    freq[x] = freq.get(x, 0) + 1

Write:

from collections import Counter
freq = Counter(nums)

Trick 2: Use collections.defaultdict

Instead of:

groups = {}
for item in items:
    # key: the grouping key derived from item (e.g., a sorted string)
    if key not in groups:
        groups[key] = []
    groups[key].append(item)

Write:

from collections import defaultdict
groups = defaultdict(list)
for item in items:
    # key: the grouping key derived from item
    groups[key].append(item)

Trick 3: Use enumerate for Index+Value

Instead of:

for i in range(len(nums)):
    num = nums[i]

Write:

for i, num in enumerate(nums):

Performance Comparison: Hash Table vs Alternatives

Problem Brute Force Sorting Hash Table Winner
Two Sum O(n²) O(n log n) O(n) Hash
Longest Consecutive O(n³) O(n log n) O(n) Hash
Group Anagrams O(n²·k) O(n·k log k) O(n·k) Hash
Find Duplicates O(n²) O(n log n) O(n) Hash

When sorting wins:

  • Need sorted output anyway
  • Space is extremely constrained (O(1) required)
  • Data already partially sorted

Practice Problems (10 Recommended)

Easy

  1. Contains Duplicate (LeetCode 217)
  2. Valid Anagram (LeetCode 242)
  3. Intersection of Two Arrays (LeetCode 349)

Medium

  1. Group Anagrams (LeetCode 49) ← Covered in this guide
  2. Top K Frequent Elements (LeetCode 347)
  3. Subarray Sum Equals K (LeetCode 560)
  4. Longest Substring Without Repeating Characters (LeetCode 3)

Hard

  1. Longest Consecutive Sequence (LeetCode 128) ← Covered in this guide
  2. Substring with Concatenation of All Words (LeetCode 30)
  3. First Missing Positive (LeetCode 41) ← Tricky: use array as hash table!

Summary: Hash Table in One Page

When to use:

  • Need fast lookups (O(1) average)
  • Finding pairs/complements
  • Grouping by key
  • Frequency counting
  • Checking membership

Common patterns: 1. Complement search: Store seen values, check for target - current 2. Canonical form: Group by sorted/normalized key 3. Frequency map: Count occurrences with Counter or defaultdict(int) 4. Sliding window: Track state in fixed-size window

Pitfalls to avoid:

  • Check before inserting (avoid self-pairs)
  • Use immutable keys (tuples, not lists)
  • Handle duplicates correctly
  • Don't forget edge cases (empty, single element)

Interview template: 1. Identify need for fast lookup 2. Decide key and value types 3. Walk through example 4. Code with dict/set 5. Analyze complexity 6. Test edge cases

Memory cheat: > Hash tables trade space for time: O(n) memory for O(1) lookups. Perfect for "find pair/group/match" problems.


What's Next?

In Part 2 (Two Pointers Techniques), we'll explore:

  • Collision pointers: Squeeze from both ends (Container With Most Water)
  • Fast-slow pointers: Detect cycles (Floyd's algorithm)
  • Sliding window: Dynamic window sizing (Longest Substring)
  • When to use hash table vs. two pointers

Preview question: How would you solve Three Sum without using O(n) extra space? See you in Part 2!


❓ Q&A: Hash Table Interview Tips

Q1: When should I use a hash table instead of an array for lookups?

Answer: Use a hash table when you need non-sequential key access or when keys are sparse (not consecutive integers starting from 0).

Array advantages:

  • Direct indexing: arr[i] is O(1) guaranteed
  • Better cache locality (contiguous memory)
  • Lower memory overhead (no hash function, no collision handling)
  • Predictable performance (no worst-case degradation)

Hash table advantages:

  • Flexible keys (strings, tuples, custom objects)
  • Sparse keys without wasting memory
  • Dynamic resizing without reindexing

Comparison table:

Scenario | Array | Hash Table | Winner
Consecutive integer keys | [0..n-1] direct access, O(1) | O(1) average | Array (simpler, faster)
Sparse integer keys | [1, 100, 1000] possible but wastes memory | O(1) average | Hash Table (space efficient)
String keys | Not possible | O(1) average | Hash Table (only option)
Fixed-size, known range | O(1) guaranteed | O(1) average | Array (predictable)
Unknown/dynamic keys | Not practical | O(1) average | Hash Table (flexible)

Example:

# Array: Perfect for consecutive indices
scores = [85, 90, 78, 92]  # Student 0, 1, 2, 3
print(scores[2])  # O(1), direct access

# Hash table: Perfect for sparse/non-integer keys
student_scores = {"Alice": 85, "Bob": 90, "Charlie": 78}
print(student_scores["Bob"])  # O(1) average, flexible keys

Interview tip: If the problem mentions "indices" or "positions", consider arrays first. If it mentions "values" or "identifiers", hash tables are likely better.


Q2: What are the different collision handling strategies, and when should I use each?

Answer: There are two main approaches: chaining and open addressing. Each has trade-offs.

Chaining (Separate Chaining):

  • Each bucket stores a linked list (or tree) of colliding entries
  • Used by: Java HashMap, C++ std::unordered_map (default)
  • Pros: Simple, handles high load factors well, no clustering
  • Cons: Extra memory for pointers, cache misses from linked lists
# Conceptual representation of separate chaining
buckets = [
    [("key1", value1), ("key2", value2)],  # Bucket 0: collision chain
    [("key3", value3)],                    # Bucket 1: single entry
    [],                                    # Bucket 2: empty
]

Open Addressing:

  • When collision occurs, probe next available slot
  • Used by: Python dict (CPython)
  • Probing methods:
    • Linear probing: (hash(key) + i) % size for i = 0, 1, 2, ...
    • Quadratic probing: (hash(key) + i^2) % size
    • Double hashing: (hash1(key) + i * hash2(key)) % size
  • Pros: Better cache locality, no extra pointers
  • Cons: Performance degrades with high load factor, clustering issues
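
To make linear probing concrete, here is a toy open-addressing table — a teaching sketch only (it omits resizing and deletion tombstones, which real implementations must handle):

```python
class LinearProbingTable:
    """Toy open-addressing hash table using linear probing."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size  # each slot: None or (key, value)

    def _probe(self, key):
        # Visit slots in linear-probe order: h, h+1, h+2, ... (mod size)
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def put(self, key, value):
        for idx in self._probe(key):
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("table full")

    def get(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is None:
                raise KeyError(key)
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        raise KeyError(key)
```

Colliding keys simply walk forward to the next free slot, which is why performance degrades as the table fills up.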

Comparison:

Aspect | Chaining | Open Addressing
Load factor tolerance | High (0.8-1.0) | Lower (0.5-0.7)
Memory overhead | Higher (pointers) | Lower (no pointers)
Cache performance | Worse (scattered) | Better (contiguous)
Deletion complexity | Simple | Complex (tombstones)
Worst-case guarantee | O(n) per operation | O(n) per operation

When to use:

  • Chaining: High load factors expected, many deletions, simple implementation
  • Open Addressing: Memory constrained, cache performance critical, few deletions

Interview note: You rarely implement collision handling yourself. Understanding the trade-offs helps explain why hash table performance can degrade and when to consider alternatives.


Q3: When should I use dict vs set in Python?

Answer: Use dict when you need key-value pairs, and set when you only need membership testing.

dict (dictionary):

  • Stores key-value mappings: {key: value}
  • Use cases: Counting frequencies, storing metadata, mapping relationships
# Counting frequencies
freq = {}
for num in [1, 2, 2, 3, 3, 3]:
    freq[num] = freq.get(num, 0) + 1
# Result: {1: 1, 2: 2, 3: 3}

# Storing indices
index_map = {}
for i, val in enumerate([10, 20, 30]):
    index_map[val] = i
# Result: {10: 0, 20: 1, 30: 2}

set (hash set):

  • Stores only unique keys: {key1, key2, key3}
  • Use cases: Deduplication, membership testing, set operations
# Deduplication
unique_nums = set([1, 2, 2, 3, 3, 3])
# Result: {1, 2, 3}

# Fast membership testing
seen = set()
for num in nums:
    if num in seen:  # O(1) average
        print("Duplicate!")
    seen.add(num)

# Set operations
set1 = {1, 2, 3}
set2 = {2, 3, 4}
intersection = set1 & set2  # {2, 3}
union = set1 | set2  # {1, 2, 3, 4}

Decision tree:

Need to store values with keys?
├─ Yes → Use dict
│   └─ Examples: frequency counting, index mapping, caching
└─ No → Use set
    └─ Examples: deduplication, membership testing, set operations

Performance comparison:

Operation | dict | set | Notes
Insertion | O(1) average | O(1) average | Similar performance
Lookup | O(1) average | O(1) average | Similar performance
Memory | Higher (stores values) | Lower (keys only) | set saves ~30-40% memory
Use case | Key-value mapping | Membership testing | Different purposes

Common mistake: Using dict when you only need set:

# ❌ Inefficient: storing dummy values
seen = {}
for num in nums:
    if num not in seen:
        seen[num] = True  # Wasteful!

# ✅ Efficient: use set
seen = set()
for num in nums:
    if num not in seen:
        seen.add(num)

Q4: What makes a good hash function, and what are common pitfalls?

Answer: A good hash function should be deterministic, uniform, and fast. Poor hash functions cause collisions and degrade performance.

Properties of a good hash function:

  1. Deterministic: Same input → same output (always)
  2. Uniform distribution: Keys should map evenly across buckets
  3. Fast computation: Should be O(1) or O(k), where k is the key size
  4. Avalanche effect: Small input changes → large hash changes

Example: String hashing:

# ❌ Bad: Simple sum (many collisions)
def bad_hash(s):
    return sum(ord(c) for c in s) % 1000
# Problem: "abc" and "cba" hash to the same value!

# ✅ Good: Polynomial rolling hash
def good_hash(s):
    hash_val = 0
    prime = 31
    for c in s:
        hash_val = (hash_val * prime + ord(c)) % (2**31)
    return hash_val
# Better distribution, fewer collisions

Common pitfalls:

Pitfall 1: Non-uniform distribution

# ❌ Bad: All keys hash to the same bucket
def terrible_hash(key):
    return 0  # Everything collides!

# ✅ Good: Use built-in hash() or a proper algorithm
hash_val = hash(key)

Pitfall 2: Ignoring key characteristics

# For integers: identity hash is fine
hash(42) == 42  # Works well

# For strings: need to consider all characters
# Python's hash() uses SipHash for strings (collision-resistant)

Pitfall 3: Hash function that's too slow

# ❌ Bad: Expensive computation per hash
def slow_hash(s):
    import hashlib
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)
# Too slow for frequent lookups!

# ✅ Good: Fast polynomial hash
def fast_hash(s):
    h = 0
    for c in s:
        h = h * 31 + ord(c)
    return h

Hash function quality metrics:

Metric | Good | Bad
Collision rate | Low (< 5% for random keys) | High (> 20%)
Distribution | Uniform across buckets | Clustered
Speed | O(k), where k is key size | Slower than O(k)
Avalanche | Small input change → large hash change | Small change → small hash change

Interview tip: You rarely implement hash functions. Python's hash() handles most cases. For custom objects, override __hash__() and __eq__():

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        return hash((self.x, self.y))  # Use tuple hash

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

# Now Point can be used as a dict key
points = {Point(1, 2): "A", Point(3, 4): "B"}

Q5: What is the memory overhead of hash tables compared to arrays?

Answer: Hash tables have significant memory overhead due to:

  1. Hash table structure (buckets, metadata)
  2. Collision handling (chaining pointers or extra slots)
  3. Load factor (typically 50-70% full to maintain performance)
  4. Hash function storage (for some implementations)

Memory breakdown:

Array:

  • Stores only the data: O(n) bytes, no per-entry metadata
  • Example: [1, 2, 3, 4, 5] → 5 integers = 20 bytes (assuming 4-byte ints)

Hash table (Python dict):

  • Overhead per entry: ~24-48 bytes (key, value, hash, pointers)
  • Load factor: ~50-66% (table is 1.5-2x larger than entries)
  • Example: {1: "a", 2: "b", 3: "c"} → ~200-300 bytes (vs 12 bytes for array)

Comparison table:

Data Structure | Memory for n integers | Overhead Factor
Array | ~4n bytes | 1x (baseline)
Hash Set | ~24n to ~48n bytes | 6-12x
Hash Map | ~32n to ~64n bytes | 8-16x

Example calculation:

import sys

# Array: minimal overhead
arr = [i for i in range(1000)]
print(sys.getsizeof(arr))  # ~8040 bytes (1000 ints + small overhead)

# Set: significant overhead
s = set(range(1000))
print(sys.getsizeof(s))  # ~32968 bytes (~4x more!)

# Dict: even more overhead
d = {i: i for i in range(1000)}
print(sys.getsizeof(d))  # ~36968 bytes (~4.5x more!)

When memory matters:

Use arrays when:

  • Keys are consecutive integers [0..n-1]
  • Memory is extremely constrained
  • You need predictable memory usage

Use hash tables when:

  • Keys are sparse or non-integer
  • Memory overhead is acceptable
  • Fast lookups are critical

Space optimization techniques:

  1. Use set instead of dict when possible:

    # ❌ Wasteful
    seen = {num: True for num in nums}

    # ✅ Efficient
    seen = set(nums)

  2. Pre-allocate size if known:

    # Python dicts resize automatically; creating all keys
    # up front avoids incremental resizing
    d = dict.fromkeys(range(1000))  # 1000 keys, values None

  3. Consider array-based solutions for dense integer keys:

    # For keys [0..999], an array is better
    arr = [0] * 1000                 # ~4000 bytes of int storage
    d = {i: 0 for i in range(1000)}  # ~37000 bytes

Interview tip: Always mention the space-time trade-off. Hash tables trade memory for speed. If memory is constrained, consider sorted arrays with binary search (O(log n) lookup) or other alternatives.


Q6: What's the difference between OrderedDict and regular dict in Python?

Answer: OrderedDict (Python 3.7+) maintains insertion order, while older dict implementations didn't guarantee order. In Python 3.7+, regular dict also maintains insertion order, but OrderedDict offers additional features.

Historical context:

  • Python < 3.7: dict had arbitrary order (implementation detail)
  • Python 3.7+: dict maintains insertion order (language guarantee)
  • OrderedDict: Always maintained order, even in older Python versions

Current behavior (Python 3.7+):

# Regular dict: maintains insertion order
d = {}
d['first'] = 1
d['second'] = 2
d['third'] = 3
print(list(d.keys()))  # ['first', 'second', 'third']

# OrderedDict: also maintains insertion order
from collections import OrderedDict
od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
print(list(od.keys()))  # ['first', 'second', 'third']

When to use OrderedDict:

Use OrderedDict when you need:

  1. Reordering operations:

    from collections import OrderedDict

    od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

    # Move to end
    od.move_to_end('a')
    print(list(od.keys()))  # ['b', 'c', 'a']

    # Move to beginning
    od.move_to_end('a', last=False)
    print(list(od.keys()))  # ['a', 'b', 'c']

  2. Pop from specific end:

    # Pop last item (LIFO)
    last = od.popitem(last=True)  # ('c', 3)

    # Pop first item (FIFO)
    first = od.popitem(last=False)  # ('a', 1)

  3. Equality based on order:

    # OrderedDict considers order in equality
    od1 = OrderedDict([('a', 1), ('b', 2)])
    od2 = OrderedDict([('b', 2), ('a', 1)])
    print(od1 == od2)  # False (different order)

    # Regular dict doesn't care about order (Python 3.7+)
    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 2, 'a': 1}
    print(d1 == d2)  # True (same items)

Performance comparison:

Operation | dict | OrderedDict | Notes
Insertion | O(1) average | O(1) average | Similar
Lookup | O(1) average | O(1) average | Similar
Memory | Lower | Higher (~20% more) | OrderedDict stores extra pointers
Reordering | Not supported | O(1) move_to_end() | Only in OrderedDict

Common use cases:

  1. LRU Cache (Least Recently Used):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.cache = OrderedDict()
            self.capacity = capacity

        def get(self, key):
            if key not in self.cache:
                return -1
            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]

        def put(self, key, value):
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                # Remove least recently used (first item)
                self.cache.popitem(last=False)

  2. Maintaining insertion order (though dict does this now):

    # Both work, but OrderedDict is more explicit
    def process_items(items):
        result = OrderedDict()  # Clear intent: order matters
        for item in items:
            result[item.id] = item.process()
        return result

Interview tip: In Python 3.7+, regular dict is usually sufficient. Use OrderedDict only when you need move_to_end() or popitem(last=False), or when you need to support older Python versions with guaranteed ordering.


Q7: How do hash tables behave in multi-threaded environments?

Answer: Most hash table implementations are not thread-safe by default. Concurrent modifications can cause data corruption, infinite loops, or crashes. Use synchronization primitives or thread-safe alternatives.

The problem:

Race condition example:

import threading

# Shared hash table
counter = {}

def increment(key):
    if key not in counter:
        counter[key] = 0
    counter[key] += 1  # Not atomic!

# Multiple threads modifying simultaneously
threads = []
for i in range(10):
    t = threading.Thread(target=increment, args=('count',))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(counter['count'])  # Might not be 10! Could be 5, 7, etc.

What can go wrong:

  1. Lost updates: Two threads read, both increment, both write → one update lost
  2. Inconsistent state: Hash table resizing during concurrent access → corruption
  3. Infinite loops: Iteration while another thread modifies → undefined behavior

Solutions:

Option 1: Locks (synchronization):

import threading

counter = {}
lock = threading.Lock()

def increment(key):
    with lock:  # Acquire lock
        if key not in counter:
            counter[key] = 0
        counter[key] += 1
    # Lock released automatically

Option 2: Thread-safe data structures:

Python: Use queue.Queue or external libraries:

# For simple cases, use a queue
from queue import Queue
q = Queue()  # Thread-safe

# For dict-like structures, wrap access in a lock
from threading import Lock

class ThreadSafeDict:
    def __init__(self):
        self._dict = {}
        self._lock = Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._dict[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._dict[key] = value

Java: Use ConcurrentHashMap:

import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
// Thread-safe operations
map.put("key", 1);
map.get("key");

C++: Use std::shared_mutex or external libraries:

#include <shared_mutex>
#include <unordered_map>

std::unordered_map<std::string, int> map;
std::shared_mutex mtx;

// Read lock (multiple readers allowed)
{
    std::shared_lock<std::shared_mutex> lock(mtx);
    int value = map.at("key");  // at() doesn't insert; operator[] would
}

// Write lock (exclusive)
{
    std::unique_lock<std::shared_mutex> lock(mtx);
    map["key"] = value;
}

Performance trade-offs:

Approach | Thread Safety | Performance | Complexity
No synchronization | ❌ Unsafe | ⚡ Fastest | ✅ Simple
Coarse-grained lock | ✅ Safe | 🐌 Slow (serialized) | ✅ Simple
Fine-grained locks | ✅ Safe | ⚡ Faster (parallel reads) | ❌ Complex
Lock-free structures | ✅ Safe | ⚡⚡ Fastest | ❌❌ Very complex

Best practices:

  1. Avoid shared mutable state when possible:

    # ✅ Good: Each thread has its own dict
    def process_chunk(chunk):
        local_dict = {}  # Thread-local
        for item in chunk:
            local_dict[item] = process(item)
        return local_dict

    # ❌ Bad: Shared dict
    shared_dict = {}
    def process_chunk(chunk):
        for item in chunk:
            shared_dict[item] = process(item)  # Race condition!

  2. Use immutable data structures:

    # Instead of modifying a shared dict, return a new dict
    def merge_results(results):
        # Combine thread results (no concurrent modification)
        final = {}
        for result in results:
            final.update(result)
        return final

  3. Read-only access is usually safe (but verify implementation):

    # Multiple threads reading is typically safe
    def lookup(key):
        return shared_dict.get(key)  # Read-only, usually safe

Interview tip: Mention that hash tables are not thread-safe by default. If asked about concurrent access, discuss locks, thread-safe alternatives, or architectural solutions (avoiding shared state).


Q8: What are common interview follow-up questions after solving a hash table problem?

Answer: Interviewers often probe deeper into your understanding. Here are common follow-ups and how to handle them:

Follow-up 1: "Can you optimize the space complexity?"

Example: After solving Two Sum with O(n) space:

# Original: O(n) space
def twoSum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i

Optimization: If the array is sorted, use two pointers (O(1) space):

# Optimized: O(1) space (but requires sorted array)
def twoSum_sorted(nums, target):
    left, right = 0, len(nums) - 1
    while left < right:
        total = nums[left] + nums[right]
        if total == target:
            return [left, right]
        elif total < target:
            left += 1
        else:
            right -= 1

Follow-up 2: "What if we need to return all pairs, not just one?"

Example: Return all pairs that sum to target:

def twoSum_all_pairs(nums, target):
    seen = {}
    result = []
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            # Add all previous indices
            for prev_idx in seen[complement]:
                result.append([prev_idx, i])
        # Store current index
        if num not in seen:
            seen[num] = []
        seen[num].append(i)
    return result

Follow-up 3: "What if the array can have duplicates?"

Answer: The hash table solution handles duplicates correctly (as shown in the Two Sum problem). Explain that you check before inserting to avoid self-pairing.
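
A quick trace shows why checking before inserting handles duplicates such as nums = [3, 3], target = 6:

```python
def twoSum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:   # check BEFORE inserting nums[i]
            return [seen[complement], i]
        seen[num] = i            # so an element never pairs with itself
    return []

# [3, 3], target 6: at i=0 nothing is seen yet; at i=1,
# complement 3 is already in seen (index 0) -> returns [0, 1]
```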

Follow-up 4: "How would you handle very large datasets that don't fit in memory?"

Answer: Use external hashing or streaming algorithms:

# Streaming approach: process in chunks
def twoSum_streaming(stream, target):
    seen = {}
    chunk_size = 10000
    current_index = 0

    for chunk in read_in_chunks(stream, chunk_size):
        for num in chunk:
            complement = target - num
            if complement in seen:
                return [seen[complement], current_index]
            seen[num] = current_index
            current_index += 1
        # Optionally: evict old entries if memory constrained

Follow-up 5: "What's the worst-case time complexity?"

Answer: Hash table operations are O(1) average case, but O(n) worst case due to:

  • All keys hashing to the same bucket (adversarial input)
  • Hash table resizing (O(n) for that insert, O(1) amortized)

Follow-up 6: "Can you solve this without extra space?"

Answer: Sometimes yes (if array is sorted or has constraints):

  • Sorted array: Two pointers (O(1) space)
  • Array with constraints: Use the array itself as a hash table (First Missing Positive pattern)
  • Otherwise: Usually needs O(n) space for O(n) time

Follow-up 7: "How would you test this solution?"

Answer: Cover these cases:

test_cases = [
    ([2, 7, 11, 15], 9, [0, 1]),         # Normal case
    ([3, 3], 6, [0, 1]),                 # Duplicates
    ([3, 2, 4], 6, [1, 2]),              # Not at start
    ([-1, -2, -3, -4, -5], -8, [2, 4]),  # Negative numbers
    ([1, 2], 3, [0, 1]),                 # Edge: two elements
]

Follow-up 8: "What if we need to find three numbers that sum to target?"

Answer: Extend the pattern:

def threeSum(nums, target):
    nums.sort()  # O(n log n)
    result = []

    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i-1]:
            continue  # Skip duplicates

        left, right = i + 1, len(nums) - 1
        while left < right:
            total = nums[i] + nums[left] + nums[right]
            if total == target:
                result.append([nums[i], nums[left], nums[right]])
                # Skip duplicates
                while left < right and nums[left] == nums[left+1]:
                    left += 1
                while left < right and nums[right] == nums[right-1]:
                    right -= 1
                left += 1
                right -= 1
            elif total < target:
                left += 1
            else:
                right -= 1

    return result

Interview strategy:

  1. Acknowledge the trade-off: "The hash table solution uses O(n) space for O(n) time. If space is constrained..."
  2. Show alternatives: "We could use sorting + two pointers for O(1) space but O(n log n) time."
  3. Ask clarifying questions: "Are we optimizing for time or space? Is the array sorted?"


Q9: What are edge cases in time complexity analysis for hash tables?

Answer: Hash table operations are O(1) amortized, but several factors can affect actual performance:

Edge case 1: Hash collisions (worst-case)

Scenario: All keys hash to the same bucket:

# Adversarial input: all keys collide
class BadHash:
    def __hash__(self):
        return 42  # Always the same hash!

bad_keys = [BadHash() for _ in range(1000)]
d = {key: i for i, key in enumerate(bad_keys)}

# Lookup becomes O(n) - linear search through the chain
value = d[bad_keys[500]]  # O(n) worst case!

Mitigation: Good hash function + load factor management

Edge case 2: Hash table resizing (amortized analysis)

Scenario: When load factor exceeds threshold, table doubles in size:

# Inserting n elements
d = {}
for i in range(1000000):
    d[i] = i  # Triggers resizing at ~66k, ~133k, ~266k, etc.

Cost analysis:

  • Individual insert: usually O(1), but O(n) when a resize is triggered
  • Amortized cost: O(1) per insert (resizing happens rarely)

Proof sketch: if the table doubles whenever it fills, then over n inserts:

  • Resizes happen at sizes 1, 2, 4, 8, ..., n
  • A resize at size k costs O(k) (rehash all entries)
  • Total resize cost: 1 + 2 + 4 + ... + n < 2n = O(n)
  • Adding the n O(1) insertions gives O(n) total, i.e. O(1) amortized per insert

Edge case 3: Iteration complexity

Scenario: Iterating through hash table:

d = {i: i*2 for i in range(1000)}

# Iteration: O(n) time
for key in d:  # O(n)
    print(key, d[key])

Complexity: O(n) time; O(n) extra space only if you materialize the keys/items into a list

Edge case 4: String keys (hash computation cost)

Scenario: Long strings as keys:

# Hash computation: O(k) where k is string length
long_strings = ["a" * 10000 for _ in range(1000)]
d = {s: i for i, s in enumerate(long_strings)}

# Lookup: O(1) + O(k) = O(k) where k is key length
value = d[long_strings[500]]  # O(10000) to compute hash!

Complexity: O(k), where k is the key length (not O(1) if keys are large)

Edge case 5: Custom hash functions

Scenario: Expensive hash computation:

import hashlib

def expensive_hash(obj):
    # Cryptographic hash: slow!
    return int(hashlib.sha256(str(obj).encode()).hexdigest(), 16)

class ExpensiveKey:
    def __hash__(self):
        return expensive_hash(self)

# Every lookup recomputes the expensive hash
keys = [ExpensiveKey() for _ in range(1000)]
d = {k: i for i, k in enumerate(keys)}
value = d[keys[500]]  # Slow due to hash computation

Mitigation: Cache hash values or use faster hash functions

Complexity summary table:

Operation | Average | Worst Case | Amortized | Notes
Insert | O(1) | O(n) | O(1) | Worst: all collisions
Lookup | O(1) | O(n) | N/A | Worst: all collisions
Delete | O(1) | O(n) | N/A | Worst: all collisions
Iteration | O(n) | O(n) | N/A | Always linear
Resize | O(n) | O(n) | O(1) per insert | Amortized analysis

Interview tip: Always mention "average case" and acknowledge worst-case. Explain that worst-case is rare with good hash functions and proper load factor management.


Q10: What are space optimization techniques for hash table problems?

Answer: Several techniques can reduce memory usage while maintaining performance:

Technique 1: Use array as hash table (when keys are dense integers)

Scenario: Keys are integers in a known range [0..n-1]:

# ❌ Wasteful: O(n) extra space for a hash table
def find_duplicate_dict(nums):
    seen = {}
    for num in nums:
        if num in seen:
            return num
        seen[num] = True

# ✅ Efficient: Use an array as the hash table
def find_duplicate_array(nums):
    # Array indices act as hash keys
    seen = [False] * (len(nums) + 1)
    for num in nums:
        if seen[num]:
            return num
        seen[num] = True

Space savings: Both use O(n) space, but the array has a much lower constant factor (~4x less memory)

Technique 2: Bit manipulation for boolean flags

Scenario: Tracking presence/absence (boolean values):

# ❌ Wasteful: dict stores full integers
def has_duplicate_dict(nums):
    seen = {}
    for num in nums:
        if num in seen:
            return True
        seen[num] = True  # Stores key + value
    return False

# ✅ Efficient: set stores only keys
def has_duplicate_set(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

# ✅✅ Most efficient: Bit vector (if range is small)
def has_duplicate_bitvector(nums):
    # Assuming nums are in range [1..32]
    bits = 0
    for num in nums:
        bit = 1 << num
        if bits & bit:
            return True
        bits |= bit
    return False

Space comparison:

  • dict: ~48 bytes per entry
  • set: ~24 bytes per entry
  • bit vector: 1 bit per entry (32 entries = 4 bytes!)

Technique 3: In-place modification (use input array as hash table)

Scenario: First Missing Positive (LeetCode 41):

# Problem: Find smallest missing positive integer
# Constraint: O(1) extra space

def firstMissingPositive(nums):
    n = len(nums)

    # Step 1: Replace negatives/zeros with n+1
    for i in range(n):
        if nums[i] <= 0:
            nums[i] = n + 1

    # Step 2: Use array indices as hash keys
    # Mark presence by making the value negative
    for i in range(n):
        num = abs(nums[i])
        if num <= n:
            nums[num - 1] = -abs(nums[num - 1])

    # Step 3: Find first positive index
    for i in range(n):
        if nums[i] > 0:
            return i + 1

    return n + 1

Key insight: Use array indices [0..n-1] to represent numbers [1..n]. Mark presence by negating values.

Technique 4: Sliding window with single pass

Scenario: Longest Substring Without Repeating Characters:

# ❌ Wasteful: Store all seen characters per starting index
def lengthOfLongestSubstring_bad(s):
    max_len = 0
    for i in range(len(s)):
        seen = set()  # New set for each starting position
        for j in range(i, len(s)):
            if s[j] in seen:
                break
            seen.add(s[j])
        max_len = max(max_len, len(seen))
    return max_len

# ✅ Efficient: Sliding window, reuse hash table
def lengthOfLongestSubstring_good(s):
    char_index = {}  # Reused across windows
    left = 0
    max_len = 0

    for right in range(len(s)):
        if s[right] in char_index:
            # Move left pointer past last occurrence
            left = max(left, char_index[s[right]] + 1)
        char_index[s[right]] = right
        max_len = max(max_len, right - left + 1)

    return max_len

Space savings: one hash table of size O(min(n, alphabet)) reused across the window, vs a fresh set per starting index

Technique 5: Frequency counting with array instead of dict

Scenario: Counting characters (limited alphabet):

# ❌ General but wasteful for a small alphabet
from collections import Counter
freq = Counter(s)  # Dict overhead

# ✅ Efficient for lowercase English (26 letters)
freq = [0] * 26
for c in s:
    freq[ord(c) - ord('a')] += 1

Space comparison:

  • Counter (dict): ~26 × 48 bytes = ~1248 bytes
  • Array: 26 × 4 bytes = 104 bytes (~12x less!)

Technique 6: Two-pass instead of storing all data

Scenario: Finding first unique character:

# ❌ Store all indices
def firstUniqChar_bad(s):
    char_indices = {}
    for i, c in enumerate(s):
        if c not in char_indices:
            char_indices[c] = []
        char_indices[c].append(i)

    for c, indices in char_indices.items():
        if len(indices) == 1:
            return indices[0]
    return -1

# ✅ Two passes: count first, then find
def firstUniqChar_good(s):
    freq = {}
    for c in s:
        freq[c] = freq.get(c, 0) + 1

    for i, c in enumerate(s):
        if freq[c] == 1:
            return i
    return -1

Space savings: Store counts (integers) instead of lists of indices

Summary table:

Technique | When to Use | Space Savings | Trade-off
Array as hash | Dense integer keys [0..n-1] | 4-8x less | Requires known range
Bit vector | Boolean flags, small range | 8-32x less | Limited to small ranges
In-place mod | Can modify input array | O(1) extra space | Destroys input
Sliding window | Substring/subarray problems | Reuse vs recreate | More complex logic
Array vs dict | Small, fixed alphabet | 10-20x less | Less flexible
Two-pass | Can avoid storing indices | Counts vs index lists | Extra pass

Interview tip: Always consider whether you can use the input array itself as a hash table, especially when the problem asks for O(1) extra space. This is a common pattern in "hard" LeetCode problems.


🎓 Summary: Hash Table Patterns Cheat Sheet

Quick Reference

When to use hash tables:

  • ✅ Fast lookups needed (O(1) average)
  • ✅ Finding pairs/complements
  • ✅ Grouping by key
  • ✅ Frequency counting
  • ✅ Membership testing
  • ✅ Sparse or non-integer keys

When NOT to use hash tables:

  • ❌ Dense consecutive integer keys [0..n-1] → Use array
  • ❌ Memory extremely constrained → Consider sorted array + binary search
  • ❌ Need sorted order → Use sorted data structure
  • ❌ Thread-safe required → Use synchronization or thread-safe alternatives
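
For the memory-constrained case, the standard fallback is a sorted array plus binary search via the stdlib bisect module — O(log n) lookups with essentially no per-entry overhead (a minimal sketch):

```python
import bisect

def contains(sorted_list, x):
    """O(log n) membership test on a sorted list."""
    i = bisect.bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x

data = sorted([42, 7, 19, 3, 88])  # sort once: O(n log n)
```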

Pattern Library

Pattern | Key Insight | Example Problem
Complement Search | Store seen values, check for target - current | Two Sum, Three Sum
Canonical Form | Group by sorted/normalized key | Group Anagrams
Frequency Counting | Count occurrences with Counter | Top K Frequent
Sliding Window | Track state in a moving window | Longest Substring
Prefix Sum | Store prefix sums, find subarray sums | Subarray Sum Equals K
Sequence Detection | Check num-1 not in set to find starts | Longest Consecutive
Array as Hash | Use indices as keys for dense integers | First Missing Positive

Complexity Cheat Sheet

Operation | Average | Worst | Space
Insert | O(1) | O(n) | O(n) total
Lookup | O(1) | O(n) | —
Delete | O(1) | O(n) | —
Iterate | O(n) | O(n) | —

Python Quick Reference

# Basic operations
d = {}                       # Create dict
d[key] = value               # Insert/update
value = d[key]               # Lookup (KeyError if missing)
value = d.get(key, default)  # Safe lookup
del d[key]                   # Delete
key in d                     # Membership test

# Useful imports
from collections import Counter, defaultdict, OrderedDict
freq = Counter(items)        # Frequency counting
groups = defaultdict(list)   # Group by key
od = OrderedDict()           # Maintain order

# Common patterns
seen = set()                 # Membership only
seen.add(item)               # Add to set
if item in seen: ...         # Check membership

# Key design
key = tuple(sorted(items))   # Immutable, hashable
key = ''.join(sorted(s))     # String canonical form
key = tuple(count_array)     # Array → tuple for hashing

Common Pitfalls Checklist

  • Check before inserting (avoid self-pairs)
  • Use immutable keys (tuples, not lists)
  • Handle duplicates correctly
  • Don't forget edge cases (empty, single element)

Interview Template

  1. Identify: "This needs fast lookup → hash table"

  2. Design key: "I'll use [X] as the key because..."

  3. Design value: "I'll store [Y] to track..."

  4. Walk through: Trace example with small input

  5. Code: Implement with dict/set

  6. Analyze: State the time and space complexity

  7. Optimize: "If space constrained, we could..."

  8. Test: Edge cases (empty, duplicates, etc.)

Memory Optimization Quick Wins

  1. Use set instead of dict when values aren't needed → ~50% memory savings
  2. Use array instead of dict for dense integer keys → ~75% memory savings
  3. Use bit vector for boolean flags (small range) → ~95% memory savings
  4. Reuse hash table in sliding window → Avoid recreation overhead
  5. Two-pass approach → Store counts instead of indices when possible

Further Reading

  • Books:
    • "Cracking the Coding Interview" (Chapter 1: Hash Tables)
    • "Elements of Programming Interviews" (Chapter 13: Hash Tables)
  • Online:
    • Python dict implementation: https://github.com/python/cpython/blob/main/Objects/dictobject.c
    • Hash table visualization: https://visualgo.net/en/hashtable
  • LeetCode: Hash Table tag (200+ problems)

Hash tables aren't magic — they're strategic memory allocation. Master the patterns, and you'll unlock dozens of O(n) solutions!

  • Post title: LeetCode (1): Hash Tables - Patterns, Pitfalls & Three Classic Problems Deep Dive
  • Post author: Chen Kai
  • Create time: 2023-03-25 00:00:00
  • Post link: https://www.chenk.top/en/leetcode-hash-tables/
  • Copyright Notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.