Distance-based methods (such as KNN) usually require the features they compare to be on similar scales. Otherwise, a feature with a large numeric range dominates the distance calculation, and the algorithm produces inferior results. Standardization is one of the most commonly used techniques to "make the features comparable". It works by processing each feature in the following way:
Subtract its average from each value to center around zero.
Then divide by how much the values usually "wiggle" around that average (the standard deviation).
The result answers: "How many usual steps (standard deviations) above or below average is this value?"
Values near 0 are typical; +1 is one usual step above; -2 is two below.
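The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the challenge solution; the function name `standardize` and the sample values are my own, and it uses the population standard deviation (dividing by the number of values).

```python
def standardize(values):
    """Standardize a list of numbers: subtract the mean, divide by the std."""
    mean = sum(values) / len(values)
    # Population standard deviation: root of the average squared deviation.
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    # A constant feature has std == 0; map every value to 0 in that case.
    return [0.0 if std == 0 else (v - mean) / std for v in values]

# The middle value equals the mean, so it standardizes to exactly 0.
print(standardize([2.0, 4.0, 6.0]))
```

Here 4.0 is the average, so it maps to 0; 2.0 and 6.0 land symmetrically about one step below and above it.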
Challenge: Fair Fruit Match
You’re building a tiny KNN matcher for fruits using two features: sweetness and weight. Sweetness and weight live on different scales, so one can overpower the other. To compare fairly, you decide to standardize those features.
You are asked to read some preexisting data and then standardize new entries using the averages and standard deviations computed from the preexisting data only. If a feature has a constant value (its standard deviation is zero), the standardized value should be 0.
The first line of the input contains a single integer n representing how many rows of preexisting data you have. Each of the next n lines contains two floating-point numbers: sweetness and weight.
The next line contains a single integer q representing how many new rows you want to standardize. Each of the next q lines contains two floating-point numbers: sweetness and weight, to be transformed using the averages and standard deviations from the preexisting data.
The program should print q lines. Each line should contain two floating-point numbers: the standardized sweetness and the standardized weight for the corresponding new row.
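One possible solution could be structured as below. This is a sketch under two assumptions the statement leaves open: it uses the population standard deviation (dividing by n), and it prints the numbers with Python's default float formatting, since no output precision is specified. The helper names `fit_standardizer` and `transform` are my own.

```python
import sys

def fit_standardizer(rows):
    """Compute per-feature (mean, std) pairs from the preexisting rows only."""
    n = len(rows)
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        # Population standard deviation (assumption: statement does not
        # specify population vs. sample).
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        stats.append((mean, std))
    return stats

def transform(row, stats):
    """Standardize one row; a constant feature (std == 0) maps to 0."""
    return [0.0 if s == 0 else (v - m) / s for v, (m, s) in zip(row, stats)]

def main():
    data = sys.stdin.read().split()
    it = iter(data)
    n = int(next(it))
    train = [[float(next(it)), float(next(it))] for _ in range(n)]
    stats = fit_standardizer(train)
    q = int(next(it))
    for _ in range(q):
        row = [float(next(it)), float(next(it))]
        print(*transform(row, stats))

if __name__ == "__main__":
    main()
```

Note that the statistics are fitted on the n preexisting rows and only applied to the q new rows; the new rows never influence the means or standard deviations, which mirrors how a scaler is fitted on training data in a real KNN pipeline.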