Equivalent of pd.Series.str.slice() and pd.Series.apply() in cuDF

Tags: python, pandas, series, rapids, cudf

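I want to convert the following code, which runs in pandas, so that it runs in cuDF.

Sample data from the head of the Series being manipulated is plugged into the original code in the third code cell down; it should run as-is when copied and pasted.

Original code in pandas

Data being manipulated

Code adjusted to start from this sample data

Below is how the code looks when it uses the data provided above instead of the entire dataframe.

Judging by the errors I hit while attempting the conversion, the problem is at the Series level, so converting the cell below to run in cuDF should solve it.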

import pandas as pd

# series of values from df_train['rawcensustractandblock'].head()
data = pd.Series([60371066.461001, 60590524.222024, 60374638.00300401, 
                  60372963.002002, 60590423.381006])

# how the first line looks using the series
s_rawcensustractandblock = data.apply(lambda x: str(x))

# adjust/set new tract number 
census_tractnumber = s_rawcensustractandblock.str.slice(4,11)

# adjust block number
block_number = s_rawcensustractandblock.str.slice(start=11)
block_number = block_number.apply(lambda x: x[:4]+'.'+x[4:]+'0' )
block_number = block_number.apply(lambda x: int(round(float(x),0)) )
block_number = block_number.apply(lambda x: str(x).ljust(4,'0') )

Expected output

df_train['census_tractnumber'].head()

# out
0    1066.46
1    0524.22
2    4638.00
3    2963.00
4    0423.38
Name: census_tractnumber, dtype: object

df_train['block_number'].head()

# out
0    1001
1    2024
2    3004
3    2002
4    1006
Name: block_number, dtype: object
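
As a cross-check for any port, the per-value logic of the pandas code above can be written as a plain Python function. This is only a minimal sketch and not part of the original question: the name `split_tract_and_block` is hypothetical, and it assumes that `str()` of each float yields the same text as the `apply(lambda x: str(x))` step.

```python
from typing import Tuple


def split_tract_and_block(value: float) -> Tuple[str, str]:
    """Hypothetical helper mirroring the pandas steps for a single value."""
    raw = str(value)                # same as the apply(lambda x: str(x)) step
    tract = raw[4:11]               # same as .str.slice(4, 11)
    block = raw[11:]                # same as .str.slice(start=11)
    block = block[:4] + "." + block[4:] + "0"
    block = str(int(round(float(block), 0))).ljust(4, "0")
    return tract, block


# Sample values taken from the question.
sample = [60371066.461001, 60590524.222024, 60374638.00300401,
          60372963.002002, 60590423.381006]
results = [split_tract_and_block(v) for v in sample]
for tract, block in results:
    print(tract, block)
```

Running any candidate cuDF version on the same five values and comparing against these pairs is a quick way to confirm the port behaves the same. The final `ljust(4, '0')` matters when `str()` drops trailing zeros: a value such as 60371066.461 leaves only `'1'` after position 11, which is padded back to `'1000'`.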
For-loop solution

Original pandas code

cuDF solution code


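You can use cuDF string methods (via nvStrings) for almost everything you are trying to do. Converting these floats to strings in cuDF can lose some precision, although that probably does not matter in the example above, so for this example I simply converted them beforehand. If possible, I would recommend creating rawcensustractandblock as a string column in the first place rather than as a float column.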

import cudf
import pandas as pd

gdata = cudf.from_pandas(pd_data.astype('str'))

tractnumber = gdata.str.slice(4, 11)
blocknumber = gdata.str.slice(11)
blocknumber = blocknumber.str.slice(0, 4).str.cat(blocknumber.str.slice(4), '.')
blocknumber = blocknumber.astype('float').round(0).astype('int')
blocknumber = blocknumber.astype('str').str.ljust(4, '0')

tractnumber
0    1066.46
1    0524.22
2    4638.00
3    2963.00
4    0423.38
dtype: object

blocknumber
0    1001
1    2024
2    3004
3    2002
4    1006
dtype: object

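
The precision caveat in this answer can be illustrated without a GPU. The sketch below is a stand-in under stated assumptions, not cuDF's actual conversion: it uses Python's `.6f` formatting to mimic a float-to-string conversion that keeps fewer digits, and a hypothetical in-memory CSV holding two of the sample values. It shows that the slice positions stay the same while the block digits change, and that keeping the column as text sidesteps the issue.

```python
import csv
import io

# Hypothetical two-row CSV; the values come from the sample data in the question.
csv_text = "rawcensustractandblock\n60371066.461001\n60374638.00300401\n"

# csv.DictReader never converts types, so every value stays a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
as_text = [row["rawcensustractandblock"] for row in rows]

# Assumption: a conversion that keeps only six decimals, used here purely
# to illustrate what a lossy float-to-string step does to the slices.
as_float = [float(v) for v in as_text]
six_decimals = [f"{v:.6f}" for v in as_float]

block_from_text = as_text[1][11:]        # all block digits survive
block_from_six = six_decimals[1][11:]    # trailing block digits are gone
print(block_from_text, block_from_six)
```

In pandas, the analogous choice is to pass `dtype=str` (or a per-column mapping) to `read_csv` so the column is never parsed as float.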
Thanks, Nick. I just tested it, and gdata = cudf.Series(cudf_data.values_to_string()) works as well.

# Original pandas code (for-loop solution)
import pandas as pd

# data from df_train.rawcensustractandblock.head()
pd_data = pd.Series([60371066.461001, 60590524.222024, 60374638.00300401, 
                     60372963.002002, 60590423.381006])

# using series instead of dataframe
pd_raw_block = pd_data.apply(lambda x: str(x))

# adjust/set new tract number 
pd_tractnumber = pd_raw_block.str.slice(4,11)

# set/adjust block number
pd_block_number = pd_raw_block.str.slice(11)
pd_block_number = pd_block_number.apply(lambda x: x[:4]+'.'+x[4:]+'0')
pd_block_number = pd_block_number.apply(lambda x: int(round(float(x),0)))
pd_block_number = pd_block_number.apply(lambda x: str(x).ljust(4,'0'))

# print(list(pd_tractnumber))
# print(list(pd_block_number))

# cuDF solution code (for-loop version)
import cudf

# data from df_train.rawcensustractandblock.head()
cudf_data = cudf.Series([60371066.461001, 60590524.222024, 60374638.00300401, 
                         60372963.002002, 60590423.381006])

# using series instead of dataframe
cudf_tractnumber = cudf_data.values_to_string()
# adjust/set new tract number
for i in range(len(cudf_tractnumber)):
  funct = slice(4,11)
  cudf_tractnumber[i] = cudf_tractnumber[i][funct]

# using series instead of dataframe
cudf_block_number = cudf_data.values_to_string()
# set/adjust block number
for i in range(len(cudf_block_number)):
  funct = slice(11, None)
  cudf_block_number[i] = cudf_block_number[i][funct]
  cudf_block_number[i] = cudf_block_number[i][:4]+'.'+cudf_block_number[i][4:]+'0'
  cudf_block_number[i] = int(round(float(cudf_block_number[i]), 0))
  cudf_block_number[i] = str(cudf_block_number[i]).ljust(4,'0')


# print(cudf_tractnumber)
# print(cudf_block_number)