There are many different methods for string hashing, but in this course, we’ll discuss one of the most common and intuitive ones to implement. It’s widely used in competitive programming contests and algorithmic interviews.
Imagine we are given a string s with letters . We would like to obtain a method that would allow us to easily compute the hash for any contiguous substring of s.
Picking a Strong Hash Function
One example of a strong hash function that works in practice is a function that takes a string and multiplies each integer value of a character by a power of some prime number and takes it modulo m:
Here both p and m are prime numbers. p can be any prime number like 1997, 127, etc. The larger m, the less likely it is to have a collision. Yet, picking m too large can slow down computations. So, many implementations set m to or . This can be implemented as:
s = input() # User input of arbitrary length
p = 1997 # p: a prime number
m = 1_000_000_007 # m: a sufficiently large prime
h =  * (len(s) + 1) # Initialize h with 0s
for i in range(len(s)):
h[i + 1] = h[i] + ord(s[i]) # Add the current character
h[i + 1] *= p # Raise all the powers of p by 1
h[i + 1] %= m # Take modulo m on every iteration
Here we compute the rolling hash function for each index (which might seem similar to a prefix sum array):
Note that with this implementation the first index of h is a dummy variable equal to 0. On top of that, we can see that each h value depends on the previous one:
Computing the Hash for Any Substring of s
Suppose we want to compare two substrings s[l1; r1) vs s[l2; r2) of the same length and check their equality. We would like to compute a normalized hash for each of those segments and in case those hashes are the same, we’ll say that the substrings are equal.
Now that we have the hash function h, we can calculate the hash for any substring of s. Yet, it’s not as straightforward as taking the difference between the right and the left indices.
We would like the substring s[l; r) to have the same hash as if the computation was started right from the first character of s.
Given the left index l and the right index r, we would like to calculate the hash for the substring s[l; r). Ideally, we would like the resulting hash to be equal to the hash obtained by calculating the hash from scratch - only on the substring s[l; r). As we’ve multiplied each character by the corresponding power of the prime p, we should adjust the left side of the hash by the difference between r and l:
This is exactly what we would obtain if we would compute the hash for s[l; r) from scratch (as a completely new string). Notice that the first power of p has a power r-l, which is the length of the substring, while the last power has a power 1, which is exactly what we would have for the last letter in the hash computation for the whole string.
This assures that when computing the hash function for different intervals like [1; 5) vs [4; 8), the different powers of p don’t affect the final result. If the characters in the interval [1; 5) are the same as the characters in the interval [4; 8), we would like their hashes to be the same. This can be done by multiplying h[l] by the difference between r and l:
# Precompute the powers
pows =  * (len(s) + 1) # Precompute all the powers
for i in range(len(s)): # all the powers 1...n
pows[i + 1] = pows[i] * p # p^i = p^(i-1) * p
pows[i + 1] %= m # Take modulo m on every iteration
# Calculate the hash for [l; r) in O(1)
res = h[r] - h[l] * pows[r - l] # Scale h[l] by the difference (r-l)
res %= m # Take the result modulo m
In case the input string s is hello hellyo, the values of the h and pows would be as follows, and performing some queries would yield the following results:
# s -> h e l l o h e l l y o
h = [0, 207688, 414954633, 664611981, 230332444, 974109122, 295966923, 46148782, 159318707, 159671329, 863857463, 123583173, 795816426]
pows = [1, 1997, 3988009, 964053924, 215672753, 698484731, 873998049, 374091638, 60995857, 808725582, 24975949, 876969810, 308698313]
# i -> 0 1 2 3 4 5 6 7 8 9 10 11 12
# hash for s[0: 3] (hel) -> 664611981 (l = 0, r = e)
# hash for s[6: 9] (hel) -> 664611981 (l = 6, r = 9)
# hash for s[0: 5] (hello) -> 974109122 (l = 0, r = 5)
# hash for s[6: 11] (helly) -> 974129092 (l = 6, r = 11)
Are you ready to hash a string and answer queries like what would be the hash for a specific substring?
The first line of the input contains the string s (1 ≤ |s| ≤ ).
The second line contains a single integer q (1 ≤ q ≤ 100 000) the number of queries.
The next q lines contain pairs of indices (0 ≤ ≤ |s|).
The program should print the hash of the string on the interval .