Python/NiFi: ExecuteScript in Python to convert UTF-16 text files to UTF-8

python utf-8 apache-nifi


I have an ExecuteScript processor, and I am trying to convert any files that pass through it to UTF-8 if they were originally UTF-16.

What I have so far:

flowFileList = session.get(100)
if not flowFileList.isEmpty():
  for flowFile in flowFileList: 
     # Process each FlowFile here:
     flowFileList.decode("utf-16").encode("utf-8")

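I feel this should be a fairly simple operation, as several existing answers describe.

This throws an error saying the object has no attribute 'decode'.

If this is a silly question, feel free to say so. Thanks.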

Reference: the NiFi ExecuteScript Cookbook.

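The problem is that you are calling decode on the flowFileList object rather than on an individual FlowFile. In addition, you need to actually access the flowfile content and then set the content with the new encoding. Right now you are treating the flowfile object as a string, which it is not. I'm away from my computer, but I will post working example code later.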

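Update

I will provide runnable Python code to demonstrate this, but why can't you just use the ConvertCharacterSet processor? It accepts an input character set and an output character set.

Below is working code that converts incoming flowfile content from UTF-16 to UTF-8. You should either filter out content that is already UTF-8 so that it skips this processor, or add code that recognizes it and leaves it unprocessed.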

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_16)
        outputStream.write(bytearray(text.encode('utf-8')))
# end class

flowFileList = session.get(100)
if not flowFileList.isEmpty():
    for flowFile in flowFileList:
        flowFile = session.write(flowFile, PyStreamCallback())
        flowFile = session.putAttribute(flowFile, 'script_character_set', 'UTF-8')
        session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
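
The answer suggests adding code that recognizes content which is already UTF-8 and leaves it alone. A minimal sketch of that idea in plain Python 3 (outside NiFi, not Jython) might look like the following. It assumes the UTF-16 content begins with a byte order mark (BOM), which is not guaranteed for every file; the function name to_utf8 is hypothetical and is not part of NiFi.

```python
import codecs


def to_utf8(data):
    """Re-encode UTF-16 bytes as UTF-8; return anything else unchanged.

    Assumption: UTF-16 input always begins with a BOM. UTF-16 without a
    BOM is not detected by this sketch and passes through untouched.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16").encode("utf-8")
    return data


# Usage: Python's "utf-16" encoder prepends a BOM, so this input is detected.
converted = to_utf8("héllo".encode("utf-16"))
untouched = to_utf8(b"plain ascii")
print(converted, untouched)
```

Note that a UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM, so this check would misclassify such input. Inside ExecuteScript, the same check would have to be applied to the bytes read from the input stream before decoding.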


Unfortunately, I know nothing about Python. I appreciate your help, and this is a good learning opportunity. I will test it tomorrow.

Long story short: if you know which incoming content is UTF-16 and which is not, just route the UTF-16 content to a ConvertCharacterSet processor configured with explicit input and output character sets. If you don't know, you will have to determine the character set in code and then selectively convert it using the code above.

To answer why ConvertCharacterSet didn't work: the content it returned was completely beyond the pale, hence ExecuteScript.

It threw an error on line 18, inside the for loop, at flowFile = session.write(flowFile, PyStreamCallback()), saying TypeError: write(): 1st arg can't be coerced to byte[]. Does this have something to do with the class?

Interestingly, when I removed the bytearray in front of text.encode, the file went through. However, just as with ConvertCharacterSet, it came back as random Chinese characters.
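
Seeing "random Chinese characters" is what one would expect when bytes that are not UTF-16 are decoded as UTF-16: every two bytes are read as one 16-bit code unit, and pairs of ASCII letters happen to land in the CJK range. The thread does not confirm that this is what happened here, so the following plain Python 3 illustration assumes the input was really ASCII or UTF-8 text.

```python
# ASCII/UTF-8 bytes misread as UTF-16: every two bytes become one code unit.
raw = "hell".encode("utf-8")      # bytes 68 65 6C 6C

as_le = raw.decode("utf-16-le")   # code units 0x6568, 0x6C6C
as_be = raw.decode("utf-16-be")   # code units 0x6865, 0x6C6C


def is_cjk(ch):
    """True if ch lies in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


print(as_le, all(is_cjk(c) for c in as_le))
print(as_be, all(is_cjk(c) for c in as_be))
```

If converted files look like this, checking the first bytes of the original content for a BOM, and confirming the real source encoding, would be a reasonable next step before blaming the processor.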