Given n
strings (consisting of lowercase english letters), each of length m
,
the Blobstore team at Rubrik is considering introducing a deduplication feature to save space when
storing these strings on disk. Your task is to help them maximize these space savings with the following method:
A block is defined as a contiguous, non-overlapping substring of a string. The length of a block, l
,
divides each string into several blocks, with the last block possibly having a length of < l
.
For a block b
in string si
, if an identical block b'
exists
anywhere within the previous or the same string (i.e., string i-1
or i
), then b
can be deduplicated to b'
. In other words, instead of storing the content of block b
,
we store a reference to previously stored b'
, saving storage space. For this problem, we will assume
that a reference does not take any space. The goal is to find the ideal block size l
that maximizes
the storage savings. However, l
must not be smaller than a provided threshold k
.
You should print the ideal block size and the corresponding storage savings in terms of the number of characters saved. If there are multiple block sizes which yield the maximum savings, return the smallest such block size.
Example -
Consider the below 3 string:
If the block length is 2, then strings will be divided into blocks in the following manner:
And they can be stored as the following after deduplication process -
Because of the deduplication, we need not write 7 characters, so we saved 7 character writes.
Function Description
Complete the function unalignedDedupe
in the editor.
unalignedDedupe
has the following parameters:
- 1.
String[] strings
: an array of strings - 2.
int m
: the length of each string - 3.
int k
: the minimum block size
Returns
int[]
: an array of two integers where the first integer is the ideal block size and the second integer is the corresponding storage savings
Example 1:
Input: strings = ["aand", "adaa", "bend"], m = 4, k = 1
Output: [1, 7]
Explanation:n/a
Example 2:
Input: strings = ["abcdef", "defbac", "abdefa"], m = 6, k = 2
Output: [3, 6]
Explanation:n/a
- 1 <= n <= 1.5 * 106
- 1 <= m <= 1.5 * 106
- 1 <= n*m <= 1.5 * 106
- 1 <= k <= m

input:
output: