Python: Format a numpy array of timestamps as a concatenated string

python pandas string numpy

I have a set of Unix timestamps:

import numpy as np
import pandas as pd

d = {'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]}
df = pd.DataFrame(data=d)
timestamps = df[['timestamp']].values

I want to format it as a concatenated string, like this:

'1551675611;1551676489;1551676511;1551676533;1551676554'

So far I have:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp:f}" for timestamp in timestamps])
    return timestamps

Running:

format_timestamps(timestamps)

gives the following error:

TypeError: unsupported format string passed to numpy.ndarray.__format__

Since I'm new to Python, I'm having a hard time understanding how to fix this error.

A quick fix to your code:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp[0]}" for timestamp in timestamps])
    return timestamps

Here I simply replaced timestamp:f with timestamp[0], so each timestamp is a scalar rather than an array.
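A minimal self-contained sketch of the quick fix, assuming the DataFrame from the question. It shows that each row of the 2-D array is itself a length-1 ndarray, and that indexing it with [0] yields a NumPy integer scalar that an f-string can format normally.

```python
import numpy as np
import pandas as pd

d = {'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]}
df = pd.DataFrame(data=d)
timestamps = df[['timestamp']].values  # 2-D array: one row per DataFrame row


def format_timestamps(timestamps: np.ndarray) -> str:
    # each `timestamp` is a row such as array([1551675611]); [0] pulls out the scalar
    return ";".join([f"{timestamp[0]}" for timestamp in timestamps])


first_row = timestamps[0]
print(type(first_row).__name__, first_row.shape)  # ndarray (1,)
print(isinstance(first_row[0], np.integer))        # True
print(format_timestamps(timestamps))
```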

This is because in your list comprehension, timestamp is a numpy.ndarray object. Just flatten the array first and convert it to strings:

>>> ";".join(timestamps.flatten().astype(str))
'1551675611;1551676489;1551676511;1551676533;1551676554'
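flatten() always returns a new copy, whereas ravel() returns a view of the original array when the memory layout allows it; for a single column of a few values either one works. A small sketch comparing the two on the question's data (imports added so it runs on its own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]})
timestamps = df[['timestamp']].values  # shape (5, 1)

flat_copy = timestamps.flatten()  # always a new 1-D array
flat_view = timestamps.ravel()    # 1-D; may share memory with `timestamps`

joined_copy = ";".join(flat_copy.astype(str))
joined_view = ";".join(flat_view.astype(str))
print(joined_copy == joined_view)  # True
```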

Since you have pandas, why not consider a pandas solution using str.cat:

df['timestamp'].astype(str).str.cat(sep=';')
# '1551675611;1551676489;1551676511;1551676533;1551676554'

If NaNs or invalid data may occur, you can use pd.to_numeric:

(pd.to_numeric(df['timestamp'], errors='coerce')
   .dropna()
   .astype(int)
   .astype(str)
   .str.cat(sep=';'))
# '1551675611;1551676489;1551676511;1551676533;1551676554'
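To see what errors='coerce' buys you, here is a sketch with a hypothetical column containing one non-numeric string and one missing value (the sample values are made up for illustration). The unparsable entries become NaN, dropna() removes them, and astype(int) prevents a trailing ".0" on the remaining values.

```python
import pandas as pd

# hypothetical dirty data: one unparsable string and one missing value
df = pd.DataFrame({'timestamp': [1551675611, 'not-a-time', None, 1551676489]})

numeric = pd.to_numeric(df['timestamp'], errors='coerce')  # invalid entries -> NaN (float64)
result = (numeric
          .dropna()      # drop the rows that could not be parsed
          .astype(int)   # back to integers, so no "1551675611.0"
          .astype(str)
          .str.cat(sep=';'))
print(result)  # 1551675611;1551676489
```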

Another idea is to iterate over the list of timestamps and join them:

';'.join([f'{t}' for t in df['timestamp'].tolist()])
# '1551675611;1551676489;1551676511;1551676533;1551676554'

Why the error? You get this error because of how you extract the 'timestamp' column values in this line:

timestamps = df[['timestamp']].values

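Accessing DataFrame column values by passing a list of column names (as done here) returns a multidimensional ndarray: the top-level ndarray holds one row per DataFrame row, and each row contains the values of every column named in the list. This approach is generally only useful when selecting multiple columns by name.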
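A quick shape check makes the difference concrete. This sketch contrasts selecting with a list of labels against selecting with a single label; to_numpy() is included because the pandas documentation recommends it over .values, and it returns the same 1-D array here.

```python
import pandas as pd

df = pd.DataFrame({'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]})

two_d = df[['timestamp']].values      # list of labels -> DataFrame -> 2-D array
one_d = df['timestamp'].values        # single label -> Series -> 1-D array
one_d_alt = df['timestamp'].to_numpy()  # same 1-D result via the recommended accessor

print(two_d.shape)        # (5, 1)
print(one_d.shape)        # (5,)
print(type(two_d[0]))     # <class 'numpy.ndarray'>: a row, not a number
```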

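Your function throws the error because each timestamp in

";".join([f"{timestamp:f}" for timestamp in timestamps])

is, when timestamps is defined as in the original post, an array holding a single value, whereas the :f format specifier expects a scalar number.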
To fix this error in your code, simply replace:

timestamps = df[['timestamp']].values

with:

timestamps = df['timestamp'].values

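By passing a single string to extract a single column from the DataFrame, timestamps becomes a one-dimensional array in which each element is that row's 'timestamp' value, so format_timestamps receives plain scalar timestamps and runs without error.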

With this change, running format_timestamps(timestamps) with your original format_timestamps implementation returns:

'1551675611.000000;1551676489.000000;1551676511.000000;1551676533.000000;1551676554.000000'

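This is better (at least there's no error), but it's still not what you want. The root of the problem is that when joining the timestamp values you pass f as the format specifier, which formats each value as a float, when you actually want each value formatted as an int (format specifier d).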
You can change the format specifier from f to d in the function definition:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp:d}" for timestamp in timestamps])
    return timestamps

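Or simply don't pass a format specifier at all, since the timestamp values are already of type numpy.int64:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp}" for timestamp in timestamps])
    return timestamps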

Running format_timestamps(timestamps) with either definition above returns what you want:

'1551675611;1551676489;1551676511;1551676533;1551676554'
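If you want the function to accept either the 2-D array from df[['timestamp']].values or the 1-D array from df['timestamp'].values, one option (an addition beyond what the answers above propose) is to flatten inside the function with np.ravel. A minimal sketch:

```python
import numpy as np
import pandas as pd


def format_timestamps(timestamps: np.ndarray) -> str:
    # np.ravel turns (n, 1) or (n,) input into a 1-D sequence of scalars
    return ";".join(f"{timestamp}" for timestamp in np.ravel(timestamps))


df = pd.DataFrame({'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]})
print(format_timestamps(df[['timestamp']].values))
print(format_timestamps(df['timestamp'].values))
```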

Does replacing {timestamp:f} with {timestamp[0]} work?

.str.cat, ah, I always forget about that one.