Data analysts at Amazon are analyzing a data set of n strings in the array
dataSet[], each consisting of lowercase English letters. Each character in a
string corresponds to a particular feature.
The information gain obtained by training a model with two strings, dataSet[i]
and dataSet[j], is the difference between the lengths of the strings i.e.,
|len(dataSet[i]) - len(dataSet[j])|. To avoid too many overlapping features, two
strings can be selected only if the number of common features between them does not exceed a
given threshold, max_common_features. The number of common features here is equal
to the number of common characters between the two strings. For example, "abc" and "bcd" have
2 common features 'b' and 'c'. While "aa" and "aaa" have two common features, the "a" two
'a' characters.
Given dataSet and max_common_features, determine the maximum
information gain possible.
Complete the function getMaxInformationGain in the editor.
getMaxInformationGain takes the following arguments:
String[] dataSet: the strings of featuresint max_common_features: the maximum number of common features allowed between data points
Returns
int: the maximum possible information gain
𓇼 ⋆.˚ 𓆝 𓆡⋆.˚ 𓇼Forever thankful chizzy_elect 🍀
dataSet = ["abofh", "ab", "mo"] max_common_features = 1 return = 3
dataSet = ["a", "bcdef"] max_common_features = 1 return = 4
2 ≤ n ≤ 10001 ≤ len(dataSet[i]) ≤ 10001 ≤ max_common_features ≤ 1000
- Count Promotional PeriodsOA · Seen Jun 2026
- Find Maximum Total Amount (SDE I, Fungible :)Seen Jun 2026
- Get Minimum AmountOA · Seen Jun 2026
- Find Minimum CostOA · Seen Jun 2026
- Get Smallest Base SegmentOA · Seen Jun 2026
- Maximum Non-Adjacent House ValueONSITE INTERVIEW · Seen Jun 2026
- Running Delivery Time MediansONSITE INTERVIEW · Seen Jun 2026
- Select Least Resource TasksOA · Seen Jun 2026
public int getMaxInformationGain(String[] dataSet, int max_common_features) {
// write your code here
}