python - scikit-learn normalization removes column headers


I have a pandas DataFrame with 22 columns and a datetime index.

I am trying to normalize the data using the following code:

from sklearn.preprocessing import MinMaxScaler

# normalization
scaler = MinMaxScaler(copy=False)
normal_data = scaler.fit_transform(all_data2)

The problem is that I lose a lot of information when applying this function. For example, here is the data before:

all_data2.head(n = 5)
Out[105]:
                      btc_price  btc_change  btc_change_label  eth_price  \
time
2017-09-02 21:54:00   4537.8338   -0.066307                 0    330.727
2017-09-02 22:29:00   4577.6050   -0.056294                 0    337.804
2017-09-02 23:04:00   4566.3600   -0.059716                 0    336.938
2017-09-02 23:39:00   4590.0313   -0.056242                 0    342.929
2017-09-03 00:14:00   4676.1925   -0.035857                 0    354.171

                      block_size    difficulty  estimated_btc_sent  \
time
2017-09-02 21:54:00  142521291.0  8.880000e+11        2.040000e+13
2017-09-02 22:29:00  136524566.0  8.880000e+11        2.030000e+13
2017-09-02 23:04:00  134845546.0  8.880000e+11        2.010000e+13
2017-09-02 23:39:00  133910638.0  8.880000e+11        1.990000e+13
2017-09-03 00:14:00  130678099.0  8.880000e+11        2.010000e+13

                     estimated_transaction_volume_usd     hash_rate  \
time
2017-09-02 21:54:00                       923315359.5  7.417412e+09
2017-09-02 22:29:00                       918188066.9  7.152505e+09
2017-09-02 23:04:00                       910440915.6  7.240807e+09
2017-09-02 23:39:00                       901565929.9  7.284958e+09
2017-09-03 00:14:00                       922422228.4  7.152505e+09

                     miners_revenue_btc   ...   n_blocks_mined  \
time                                      ...
2017-09-02 21:54:00              2395.0   ...            168.0
2017-09-02 22:29:00              2317.0   ...            162.0
2017-09-02 23:04:00              2342.0   ...            164.0
2017-09-02 23:39:00              2352.0   ...            165.0
2017-09-03 00:14:00              2316.0   ...            162.0

                     n_blocks_total   n_btc_mined      n_tx  nextretarget  \
time
2017-09-02 21:54:00        483207.0  2.100000e+11  241558.0      483839.0
2017-09-02 22:29:00        483208.0  2.030000e+11  236661.0      483839.0
2017-09-02 23:04:00        483216.0  2.050000e+11  238682.0      483839.0
2017-09-02 23:39:00        483220.0  2.060000e+11  237159.0      483839.0
2017-09-03 00:14:00        483223.0  2.030000e+11  237464.0      483839.0

                     total_btc_sent  total_fees_btc      totalbtc  \
time
2017-09-02 21:54:00    1.620000e+14    2.959788e+10  1.650000e+15
2017-09-02 22:29:00    1.600000e+14    2.920230e+10  1.650000e+15
2017-09-02 23:04:00    1.600000e+14    2.923498e+10  1.650000e+15
2017-09-02 23:39:00    1.580000e+14    2.899158e+10  1.650000e+15
2017-09-03 00:14:00    1.580000e+14    2.917904e+10  1.650000e+15

                     trade_volume_btc  trade_volume_usd
time
2017-09-02 21:54:00         102451.92       463497284.7
2017-09-02 22:29:00         102451.92       463497284.7
2017-09-02 23:04:00         102451.92       463497284.7
2017-09-02 23:39:00         102451.92       463497284.7
2017-09-03 00:14:00          96216.78       440710136.1

[5 rows x 22 columns]

Afterwards, I am left with a NumPy array whose new index has been normalized (and is no longer the date column), and all of the column headers have been removed.

Can I somehow normalize only selected columns of the original DataFrame while keeping them in place?

If not, how can I select the desired columns from the normalized NumPy array and insert them back into the original DataFrame?

Try sklearn.preprocessing.scale. There is no need for a class-based scaler here.

Standardize a dataset along any axis. Center to the mean and component wise scale to unit variance.

You can use it like so:

import numpy as np
import pandas as pd
from sklearn.preprocessing import scale

df = pd.DataFrame({'col1': np.random.randn(10),
                   'col2': np.arange(10, 30, 2),
                   'col3': np.arange(10)},
                  index=pd.date_range('2017', periods=10))

# specify the columns to scale to N~(0,1)
to_scale = ['col2', 'col3']
df.loc[:, to_scale] = scale(df[to_scale])
print(df)

               col1     col2     col3
2017-01-01 -0.28292 -1.56670 -1.56670
2017-01-02 -1.55172 -1.21854 -1.21854
2017-01-03  0.51800 -0.87039 -0.87039
2017-01-04 -1.75596 -0.52223 -0.52223
2017-01-05  1.34857 -0.17408 -0.17408
2017-01-06  0.12600  0.17408  0.17408
2017-01-07  0.21887  0.52223  0.52223
2017-01-08  0.84924  0.87039  0.87039
2017-01-09  0.32555  1.21854  1.21854
2017-01-10  0.54095  1.56670  1.56670
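If you specifically want min-max normalization, as in your original MinMaxScaler code, the same assign-back-through-.loc pattern keeps the index and column headers intact. A minimal sketch, reusing the df and to_scale names from the example above:

from sklearn.preprocessing import MinMaxScaler

# fit_transform returns a plain NumPy array, but assigning it back
# through .loc preserves the DataFrame's datetime index and column names
scaler = MinMaxScaler()
df.loc[:, to_scale] = scaler.fit_transform(df[to_scale])
print(df.head())  # headers and index are still there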

To return a modified copy instead:

new_df = df.copy()
new_df.loc[:, to_scale] = scale(df[to_scale])

As a warning: it is hard to say without seeing your data, but you have some very large values (e.g. 7.417412e+09). If you get a warning here, I would venture it is safe to ignore; it is being thrown because there is a tolerance test checking whether the new mean is equal to 0, and that test is failing. To see whether the scaling itself actually worked, use new_df.mean() and new_df.std() to check that the columns have been normalized to N~(0,1).
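For example, a quick sanity check along those lines (a small sketch, again reusing new_df and to_scale from above):

# the scaled columns should have mean ~0 and standard deviation ~1
print(new_df[to_scale].mean())
print(new_df[to_scale].std())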

