Prepares query expression data by:
1. Finding the genes shared with the reference
2. Validating data properties (non-negative, integer counts)
3. Scaling each column by its standard deviation (without centering)
The scaling step matches the preprocessing in Python starCAT, sklearn.preprocessing.scale(X, with_mean=False), which divides by the population standard deviation (ddof=0).
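
The three steps can be sketched in Python with NumPy alone. This is a minimal illustration, not the package's actual implementation: the function name `prepare_query`, its arguments, the cells-by-genes layout, and the choice to order shared genes as they appear in the reference are all assumptions made for the example. Constant columns are left unscaled, which mirrors how sklearn treats a zero standard deviation.

```python
import numpy as np


def prepare_query(counts, query_genes, ref_genes):
    """Hypothetical helper: subset to shared genes, validate, scale by std.

    counts      -- (cells x genes) raw count matrix, columns follow query_genes
    query_genes -- gene names for the columns of counts
    ref_genes   -- gene names used by the reference
    """
    counts = np.asarray(counts, dtype=float)

    # 1. Keep genes present in both query and reference (reference order).
    query_index = {gene: i for i, gene in enumerate(query_genes)}
    shared = [gene for gene in ref_genes if gene in query_index]
    if not shared:
        raise ValueError("no genes overlap with the reference")
    X = counts[:, [query_index[gene] for gene in shared]]

    # 2. Validate: values must be finite, non-negative, integer counts.
    if not np.all(np.isfinite(X)):
        raise ValueError("counts contain NaN or infinite values")
    if np.any(X < 0):
        raise ValueError("counts must be non-negative")
    if not np.all(X == np.round(X)):
        raise ValueError("counts must be integer-valued")

    # 3. Divide each gene column by its population std (ddof=0), no centering.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # assumption: constant columns are left unscaled
    return X / std, shared
```

Calling `prepare_query(counts, query_genes, ref_genes)` returns the scaled matrix together with the list of shared genes, so downstream code can align the columns with the reference.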