Hamming Distance Calculator
Hamming distance is a metric used to compare two strings of equal length. It measures the number of positions at which the corresponding symbols differ. For binary data, it's simply the number of bit flips required to change one string into the other. Named after Richard Hamming, it's fundamental in coding theory for error detection and correction.
Hamming distance determines the error-detecting and error-correcting capabilities of codes. It's used in designing reliable digital communication systems, storage systems, and even in bioinformatics for comparing genetic sequences. Understanding Hamming distance helps engineers and scientists quantify the difference between data strings.
Key Hamming distance concepts:
- Metric properties: d(x,y) ≥ 0, d(x,y)=0 iff x=y, symmetric, triangle inequality
- Minimum distance of a code: d_min = min{d(c_i, c_j) for i≠j}
- Error detection capability: Can detect up to (d_min - 1) errors
- Error correction capability: Can correct up to floor((d_min - 1)/2) errors
- Weight: Hamming weight = number of non-zero bits (distance from zero)
This calculator computes Hamming distance for both binary strings and integers:
- Binary Strings: Enter two binary strings (only 0/1). They are padded to equal length automatically.
- Integers: Enter two integers and optionally choose bit length. Distance computed as popcount(x XOR y).
The calculator provides:
- Hamming distance value (number of differing bits)
- Detailed comparison: For binary mode, shows aligned strings with differences highlighted
- Bitwise statistics: String length, matching bits, differing bits
- Interpretation: Explanation of the result in context of error correction
- Common examples: Preloaded examples for quick reference
Illustrative examples of Hamming distance in various contexts:
| Data A | Data B | Binary Representation | Hamming Distance | Notes |
|---|---|---|---|---|
| 101010 | 110011 | 101010 vs 110011 | 3 | Positions 2,4,6 differ |
| 1111 | 0000 | 1111 vs 0000 | 4 | All bits differ |
| 42 | 57 | 101010 vs 111001 (6 bits) | 4 | 42=101010, 57=111001 |
| 0 | 255 | 00000000 vs 11111111 (8 bits) | 8 | All bits differ |
| Hamming(7,4) 1010 | Hamming(7,4) 1011 | Encoded: 1010101 vs 1011010? | 3 | Minimum distance of code is 3 |
| ASCII 'A' (65) | ASCII 'a' (97) | 01000001 vs 01100001 (8 bits) | 1 | Only bit 5 differs |
d=0: Strings are identical
d=1: Single-bit error – detectable, not correctable without parity
d=2: Can detect single errors, but cannot correct (e.g., simple parity)
d=3: Can correct single errors (e.g., Hamming codes)
d=4: Can detect up to 3 errors, correct single (extended Hamming) or double error detection
Below are answers to frequently asked questions about Hamming distance:
Use XOR and then count set bits (population count).
x = 42 (binary 101010), y = 57 (binary 111001)
x XOR y = 42 ^ 57 = 27 (binary 011011)
Number of 1s in 27 = 4 (because 27 = 16+8+2+1 = 11011? Actually 27 decimal = 11011 binary, but we need 6 bits: 011011 → four 1s).
Hamming distance = 4.
Efficient methods: Use built-in popcount in many languages: `__builtin_popcount` in C/C++, `bin(x^y).count('1')` in Python, `(x ^ y).toString(2).split('1').length-1` in JavaScript.
Hamming distance is defined for equal-length strings. For unequal strings, common approaches:
- Pad with zeros (or a neutral symbol) to equal length – our calculator does this for binary strings.
- Use edit distance (Levenshtein) which allows insertions/deletions.
- Truncate to the shorter length – only if appropriate for the application.
For integers, we pad to the chosen bit length (auto mode uses the minimum bits needed to represent the larger number).
Error-correcting codes are designed to have a minimum Hamming distance d_min between codewords.
| d_min | Error detection (max) | Error correction (max) | Example code |
|---|---|---|---|
| 1 | 0 | 0 | No redundancy |
| 2 | 1 (single error detection) | 0 | Parity bit |
| 3 | 2 | 1 (single error correction) | Hamming(7,4) |
| 4 | 3 | 1 (or detect double) | Extended Hamming |
| 5 | 4 | 2 (double error correction) | BCH, Reed-Solomon |
When a received word is closer (in Hamming distance) to one codeword than any other, it is decoded as that codeword. This is nearest neighbor decoding.
In bioinformatics, Hamming distance compares DNA strings (A,C,G,T) of equal length. It measures genetic distance, useful for:
- Mutation detection: Number of point mutations between sequences
- Phylogenetics: Building evolutionary trees
- Sequence alignment: When sequences are already aligned
- Error correction in DNA storage: Similar to digital error correction
For unequal lengths, other metrics like Levenshtein distance are used.
Hamming weight w(x) is the number of non-zero symbols in a string (for binary, number of 1s). Then Hamming distance d(x,y) = w(x XOR y). Also, d(x,y) = w(x) + w(y) - 2*w(x AND y).
Example: x=1010 (w=2), y=1100 (w=2). x XOR y = 0110 (w=2) = d. x AND y = 1000 (w=1). Then w(x)+w(y)-2*w(AND) = 2+2-2*1 = 2, matches.
Hamming weight is also the distance from the zero vector.
The Hamming bound (sphere-packing bound) gives an upper limit on the size of a code with given length n and minimum distance d.
A(n,d) ≤ 2ⁿ / Σ_{k=0}^{t} C(n,k)
where t = floor((d-1)/2) is the error-correction capability. The sum in denominator counts the number of words in each Hamming ball of radius t. Codes that achieve equality are called perfect codes (e.g., Hamming codes, Golay codes).