Python: getting the error "Expected 2D array, got 1D array instead" when splitting a dataset (CSV) to fit a linear model


python jupyter-notebook


I have split my dataset like this:

X = []
y = []
# first, compute the number of samples in the training set:
n_train = int(len(df) * 0.7)

# The training set is the first n_train samples in the dataset
X_train = df[: n_train]
Y_train = df[: n_train] # INSERT YOUR CODE HERE

# The test set is the remaining samples in the dataset
X_test = df[n_train:] 
Y_test = df[n_train:]

# Print the number of samples in the training set
print('The number of samples in the training set:')
# INSERT YOUR CODE HERE
print(len(Y_train))

# Print the number of samples in the test set
print('The number of samples in the test set:')
# INSERT YOUR CODE HERE
print(len(Y_test))

Next, I created a linear model like this:

lr = linear_model.LinearRegression()

But when I try to fit it to my training data:

lr.fit(X_train, Y_train)

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-9d85ca185925> in <module>
      2 
      3 # INSERT YOUR CODE HERE
----> 4 lr.fit(X_train, Y_train)

~\Anaconda3\ana01\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
    456         n_jobs_ = self.n_jobs
    457         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 458                          y_numeric=True, multi_output=True)
    459 
    460         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

~\Anaconda3\ana01\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    754                     ensure_min_features=ensure_min_features,
    755                     warn_on_dtype=warn_on_dtype,
--> 756                     estimator=estimator)
    757     if multi_output:
    758         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\Anaconda3\ana01\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    550                     "Reshape your data either using array.reshape(-1, 1) if "
    551                     "your data has a single feature or array.reshape(1, -1) "
--> 552                     "if it contains a single sample.".format(array))
    553 
    554         # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2938 entries, 0 to 2937
Data columns (total 22 columns):
Country                            2938 non-null object
Year                               2938 non-null int64
Status                             2938 non-null object
Life                               2938 non-null float64
Adult Mortality                    2938 non-null float64
infant deaths                      2938 non-null int64
Alcohol                            2938 non-null float64
percentage expenditure             2938 non-null float64
Hepatitis B                        2938 non-null float64
Measles                            2938 non-null int64
BMI                                2938 non-null float64
under-five deaths                  2938 non-null int64
Polio                              2938 non-null float64
Total expenditure                  2938 non-null float64
Diphtheria                         2938 non-null float64
HIV/AIDS                           2938 non-null float64
GDP                                2938 non-null float64
Population                         2938 non-null float64
thinness  1-19 years               2938 non-null float64
thinness 5-9 years                 2938 non-null float64
Income composition of resources    2938 non-null float64
Schooling                          2938 non-null float64
dtypes: float64(16), int64(4), object(2)
memory usage: 505.0+ KB
None



Sample dataset

Please follow the procedure given below:

import pandas as pd
df = pd.read_csv("example.csv")

X = df.drop('Target_variable' , axis = 1)
Y = df['Target_variable']

n_train = int(len(df) * 0.7)

# The training set is the first n_train samples in the dataset
X_train = X[: n_train]
Y_train = Y[: n_train] # INSERT YOUR CODE HERE

# The test set is the remaining samples in the dataset
X_test = X[n_train:]
Y_test = Y[n_train:]

# Print the number of samples in the training set
print('The number of samples in the training set:')
# INSERT YOUR CODE HERE
print(len(Y_train))

# Print the number of samples in the test set
print('The number of samples in the test set:')
# INSERT YOUR CODE HERE
print(len(Y_test))

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, Y_train)
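
The same split can be sketched end to end on a small made-up DataFrame. The column names (`Country`, `Schooling`, `GDP`, `Life`) are borrowed from the question's dataset, but the values are invented, and treating `Life` as the target is an assumption. Because `LinearRegression` needs numeric input, this sketch also keeps only the numeric feature columns; encoding string columns such as `Country` would be an alternative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only; values are not from the real dataset.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"],
    "Schooling": [10.0, 10.5, 12.0, 12.5, 8.0, 8.5, 14.0, 14.5, 9.0, 9.5],
    "GDP": [500.0, 550.0, 900.0, 950.0, 300.0, 320.0, 1500.0, 1550.0, 400.0, 420.0],
    "Life": [65.0, 65.5, 70.0, 70.5, 60.0, 60.5, 75.0, 75.5, 62.0, 62.5],
})

target = "Life"  # assumption: the column to predict

# Features: every numeric column except the target (this drops "Country").
X = df.drop(columns=[target]).select_dtypes(include="number")
y = df[target]

# The first 70% of the rows form the training set, the rest the test set.
n_train = int(len(df) * 0.7)
X_train, y_train = X.iloc[:n_train], y.iloc[:n_train]
X_test, y_test = X.iloc[n_train:], y.iloc[n_train:]

# X must be 2D (samples, features); y may be 1D (samples,).
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

lr = LinearRegression()
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
```

Slicing `X` and `y` separately, rather than slicing `df` twice, is what keeps the features 2D and the target out of the feature matrix.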


Please print the shapes of the X_train and y_train arrays.

When I run "np.ma.shape(X_train)" I get "(0,)", and for y_train I get "(2056,)".

Do you have only one feature? Could you add a sample dataset (Excel or CSV) to your question?

I have now added my dataset to my post.

I have added a sample dataset.
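
The shape check requested in the comments can be sketched as follows. The arrays here are invented for illustration, and `np.shape` is used in place of `np.ma.shape` as a plain-NumPy substitute. An empty list, such as the question's `X = []`, has shape `(0,)`, which is 1D. A single-feature 1D array can be turned into the 2D layout scikit-learn expects with `reshape(-1, 1)`, as the error message suggests.

```python
import numpy as np

# An empty Python list converts to an empty 1D array.
empty = np.array([])
print(np.shape(empty))          # (0,)

# One feature, four samples (made-up values): still 1D.
feature = np.array([1.0, 2.0, 3.0, 4.0])
print(feature.shape)            # (4,)

# reshape(-1, 1) gives one column with one row per sample: 2D.
feature_2d = feature.reshape(-1, 1)
print(feature_2d.shape)         # (4, 1)
```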