Is it possible to convert a string timestamp to float or datetime using Java?


java python pandas scikit-learn


I am writing Java code that generates random numbers from 1 to 1000, each with a timestamp. I format the timestamp with the following code:

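// Formats the current time, e.g. 2020/09/22 12:46:009 ("sss" pads the seconds to three digits)
DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:sss");
Date date = new Date();
String a = dateFormat.format(date);
System.out.println(a);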

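I was able to store the data in a .txt file containing 1000 random numbers and their corresponding timestamps. When I load that .txt file in Python with a pandas DataFrame, it loads successfully and displays as follows: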

    HR  Age  RR  SPo2  Temperature             Timestamp
0   89   70  15   100           36  2020/09/22 12:46:009
1  130   27  15    96           37  2020/09/22 12:46:009
2   93   47  13   100           36  2020/09/22 12:46:009
3  116   53  15    98           36  2020/09/22 12:46:009
4  100   63  14    98           36  2020/09/22 12:46:009

After that, I tried to fit a random forest after a train/test split:

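from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=100, criterion='gini', random_state=1, max_depth=3)
classifier.fit(X_train, y_train)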

But I get an error:

ValueError                                Traceback (most recent call last)
<ipython-input-52-8f779aefd162> in <module>
     20 #Create a Gaussian Classifier
     21 classifier=RandomForestClassifier(n_estimators=100, criterion='gini', random_state=1, max_depth=3)
---> 22 classifier.fit(X_train,y_train)
     23 
     24 #y_pred=classifier.predict(X_test)

~/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    293         """
    294         # Validate or convert input data
--> 295         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    296         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    297         if sample_weight is not None:

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    529                     array = array.astype(dtype, casting="unsafe", copy=False)
    530                 else:
--> 531                     array = np.asarray(array, order=order, dtype=dtype)
    532             except ComplexWarning:
    533                 raise ValueError("Complex data not supported\n"

~/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: could not convert string to float: '2020/09/22 12:46:009'


I am confused by this. Can anyone help me resolve it?

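The problem here is that the Timestamp column holds strings, and the classifier cannot handle them: it needs numbers. You have to encode the dates as numeric data before fitting.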
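A minimal sketch of that conversion in pandas, assuming the file is already loaded into a DataFrame with a `Timestamp` column like the one in the question. The helper name `parse_padded_timestamp` and the `elapsed_s` column are made up for illustration. The three-digit seconds field (`009`) does not match the usual `%S` directive, so the sketch splits it off and adds it back as a timedelta.

```python
import numpy as np
import pandas as pd

# Two sample rows shaped like the question's data (values are illustrative).
df = pd.DataFrame({
    "HR": [89, 130],
    "Timestamp": ["2020/09/22 12:46:009", "2020/09/22 12:47:015"],
})

def parse_padded_timestamp(series):
    """Parse 'yyyy/MM/dd HH:mm:sss' strings whose seconds field has three digits."""
    # Split on the last colon: column 0 is '2020/09/22 12:46', column 1 is '009'.
    parts = series.str.rsplit(":", n=1, expand=True)
    minutes = pd.to_datetime(parts[0], format="%Y/%m/%d %H:%M")
    seconds = pd.to_timedelta(parts[1].astype(int), unit="s")
    return minutes + seconds

# String -> datetime64
df["Timestamp"] = parse_padded_timestamp(df["Timestamp"])

# datetime64 -> float: seconds elapsed since the earliest row
df["elapsed_s"] = (df["Timestamp"] - df["Timestamp"].min()).dt.total_seconds()

# The same float32 conversion that failed in the traceback now succeeds.
X = df.drop(columns=["Timestamp"])
X_float = np.asarray(X, dtype=np.float32)
```

The sketch uses elapsed seconds rather than seconds since 1970 because the forest converts its input to float32, and at the magnitude of a 2020 epoch value (about 1.6e9) float32 can only resolve steps of 128 seconds.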

Before passing the data to the classifier, you can process all the dates with an encoder from sklearn.

But as mentioned elsewhere, it is beneficial to preserve the cyclical nature of dates:

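You want to preserve the cyclical nature of your inputs. One approach is to split the datetime variable into four variables: year, month, day, and hour. Then decompose each of these (except year) into two.

You create a sine and a cosine facet of each of these three variables (i.e. month, day, hour), which retains the fact that hour 24 is closer to hour 0 than to hour 21, and that month 12 is closer to month 1 than to month 10.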
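The sine/cosine decomposition described above could be sketched as follows. The function names `cyclical_encode` and `add_cyclical_features` are hypothetical, the input column is assumed to already be a datetime column, and using a fixed period of 31 for day-of-month is a simplifying assumption (months differ in length).

```python
import numpy as np
import pandas as pd

def cyclical_encode(values, period):
    """Map values in [0, period) onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def add_cyclical_features(df, column="Timestamp"):
    """Replace a datetime column with year plus sin/cos of month, day, hour."""
    ts = df[column]
    out = df.drop(columns=[column]).copy()
    out["year"] = ts.dt.year
    for name, values, period in [
        ("month", ts.dt.month - 1, 12),  # shift 1..12 to 0..11
        ("day", ts.dt.day - 1, 31),      # assumption: treat every month as 31 days
        ("hour", ts.dt.hour, 24),
    ]:
        out[name + "_sin"], out[name + "_cos"] = cyclical_encode(values, period)
    return out

# Illustrative rows: late on 31 December, midnight on 1 January, an October evening.
demo = pd.DataFrame({
    "HR": [89, 130, 93],
    "Timestamp": pd.to_datetime([
        "2020-12-31 23:00:00",
        "2020-01-01 00:00:00",
        "2020-10-15 20:00:00",
    ]),
})
feats = add_cyclical_features(demo)
```

In `feats`, the first two rows end up close together in the hour and month features even though their raw values (23 vs 0, 12 vs 1) are far apart, which is the property the quoted passage is after.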

So, essentially, you need to think about how to convert your datetime into numbers so that the classifier can use it.