Python beginner question: how do I combine two column vectors into an n×2 matrix?

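I have been learning the random forest classification algorithm and ran into a problem. I tried various methods found online, but none of them gave the correct result, so I am asking for help here.
The situation: test_lables holds the true labels of a binary classification test set with 692 samples, and test_hat holds the predicted values. I want to combine the two into a 692×2 matrix, so that each predicted value sits next to its true value. The source code is as follows: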

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
#from sklearn import datasets

dataframe = pd.read_csv( "D:/Research/TuPo_sel0.Train.csv", header = None )
train_features = dataframe.iloc[ :, 0:24 ]
train_lables = dataframe.iloc[ :, 24 ]

test_data = pd.read_csv( "D:/Research/TuPo_sel0.Valid.csv", header = None )
test_features = test_data.iloc[ :, 0:24 ]
test_lables = test_data.iloc[ :, 24 ]

dummy = DummyClassifier( strategy = 'uniform', random_state = 1 )
dummy.fit( train_features, train_lables )
print( "dummy_score =", dummy.score( test_features, test_lables ) )

style = 1

if style == 1:
    max_features = 19
    n_estimators = 400
    randomforest = RandomForestClassifier( max_features = max_features, n_estimators = n_estimators, random_state=1, n_jobs=-1 )
    model = randomforest.fit( train_features, train_lables )
    test_hat = model.predict( test_features )
    test_hat1 = np.hstack( ( test_hat, test_lables ) )
    test_hat1.reshape( -1, 2 )
    print( test_hat1.shape )
    print( test_hat1 )
    print( "max_features =", max_features, "; n_estimators =", n_estimators,
           "; randomforest_score =", randomforest.score( test_features, test_lables ) )

The output is as follows:
runfile('D:/Python Programs/TryLoadData.py', wdir='D:/Python Programs')
dummy_score = 0.5447976878612717
(1384,)
[0 0 1 ... 0 0 0]
max_features = 19 ; n_estimators = 400 ; randomforest_score = 0.6416184971098265

How should I change the code to get the correct result?


5 replies

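One more question while I am at it: how do I compute the precision on the test set, that is, the fraction of samples predicted as 1 that really are 1?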


import numpy as np

# Suppose you have two column vectors (stored here as 1-D arrays)
col1 = np.array([1, 2, 3])
col2 = np.array([4, 5, 6])

# Method 1: np.column_stack() - the most direct
matrix1 = np.column_stack((col1, col2))

# Method 2: np.hstack() - horizontal stacking of explicit (n, 1) columns
matrix2 = np.hstack((col1.reshape(-1, 1), col2.reshape(-1, 1)))

# Method 3: np.concatenate() along axis 1
matrix3 = np.concatenate((col1.reshape(-1, 1), col2.reshape(-1, 1)), axis=1)

# Method 4: stack vertically, then transpose (if the vectors are stored as row vectors)
row1 = np.array([[1, 2, 3]])
row2 = np.array([[4, 5, 6]])
matrix4 = np.vstack((row1, row2)).T

print("Method 1 result:")
print(matrix1)
print("\nShape:", matrix1.shape)

# Check that all four methods give the same result
print("\nAll methods agree:",
      np.array_equal(matrix1, matrix2)
      and np.array_equal(matrix2, matrix3)
      and np.array_equal(matrix3, matrix4))

I recommend np.column_stack(); it is the most intuitive.

test_hat1 = np.hstack((test_hat.reshape(-1, 1), test_lables.values.reshape(-1, 1)))
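
For anyone wondering why the original attempt printed (1384,): np.hstack on two 1-D inputs joins them end to end, and reshape returns a new array instead of changing test_hat1 in place. Even with the result assigned, reshaping the flat array would pair neighbouring elements rather than prediction i with label i. A minimal sketch, where pred and truth are made-up stand-ins for test_hat and test_lables:

```python
import numpy as np

# Hypothetical stand-ins for test_hat and test_lables (4 samples instead of 692).
pred = np.array([0, 1, 1, 0])
truth = np.array([0, 1, 0, 0])

# hstack on 1-D inputs concatenates end to end: shape (8,), not (4, 2).
flat = np.hstack((pred, truth))

# reshape returns a new array, so the result has to be assigned.
# Even then, rows pair neighbouring elements of flat, not pred[i] with truth[i].
wrong = flat.reshape(-1, 2)

# column_stack treats each 1-D input as one column: row i is (pred[i], truth[i]).
paired = np.column_stack((pred, truth))

print(flat.shape)
print(wrong)
print(paired)
```

Since np.column_stack converts its inputs to arrays, it should also accept a pandas Series such as test_lables directly, which avoids the reshape calls altogether.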

To review the results, you can look at the summary report from sklearn.metrics.classification_report.
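
As a rough illustration of the precision question, the sketch below computes the precision for class 1 by hand (correct predictions of 1 divided by all predictions of 1) and compares it with sklearn.metrics.precision_score. The arrays y_true and y_pred are made-up examples, not the data from the original post; in the original code they would correspond to test_lables and test_hat.

```python
import numpy as np
from sklearn.metrics import classification_report, precision_score

# Made-up labels for illustration only.
y_true = np.array([0, 1, 0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 0, 1, 0])

# Manual precision for class 1: among samples predicted as 1,
# the fraction whose true label is also 1.
predicted_pos = y_pred == 1
manual_precision = (y_true[predicted_pos] == 1).mean()

# The same quantity from scikit-learn.
sk_precision = precision_score(y_true, y_pred, pos_label=1)

print(manual_precision, sk_precision)

# The report lists precision, recall, and F1 for each class.
print(classification_report(y_true, y_pred))
```

Here three samples are predicted as 1 and two of them are truly 1, so both values come out as 2/3.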

Thanks enenaaa, that solved it!

np.vstack([a, b]).T
