What's the proper way to handle back-pressure in a Node.js Transform stream?
Intro
This is my first venture into writing server-side Node.js. It's been fun so far, but I'm having some difficulty understanding the proper way to implement something related to Node.js streams.
Problem
For testing and learning purposes, I'm working with large files whose content is zlib-compressed. The compressed content is binary data, and each data packet is 38 bytes long. I'm trying to create a resulting file that looks almost identical to the original, except that there is an uncompressed 31-byte header for every 1024 38-byte packets.
original file content (decompressed)
resulting file content
As you can see, it's somewhat of a translation problem. Meaning, I take some source stream as input and then slightly transform it into some output stream, so implementing a Transform stream seemed natural. The class therefore only attempts to take the stream as input, transform it, and pass each resulting chunk on through the pipeline via
this.push(chunk)
A use case would be something like:
var fs = require('fs');
var me = require('./me'); // Where my Transform stream code sits
var inp = fs.createReadStream('depth_1000000');
var out = fs.createWriteStream('depth_1000000.out');
inp.pipe(me.createMyTranslate()).pipe(out);
Question(s):
Assuming Transform is a good choice for this use case, I seem to be running into a possible back-pressure issue: my call to this.push(chunk) within _transform keeps returning false. Why would that be, and how should such things be handled?

I think Transform is suitable for this, but I would perform the inflate as a separate step in the pipeline. Here's a quick and untested example:
var zlib = require('zlib');
var stream = require('stream');

var transformer = new stream.Transform();

// Properties used to keep internal state of transformer.
transformer._buffers = [];
transformer._inputSize = 0;
transformer._targetSize = 1024 * 38;

// Dump one 'output packet'.
transformer._dump = function(done) {
  // Concatenate buffers and convert to a binary string.
  var buffer = Buffer.concat(this._buffers).toString('binary');

  // Take the first 1024 packets.
  var packetBuffer = buffer.substring(0, this._targetSize);

  // Keep the rest and reset the counter.
  this._buffers = [ Buffer.from(buffer.substring(this._targetSize), 'binary') ];
  this._inputSize = this._buffers[0].length;

  // Output header.
  this.push('HELLO WORLD');

  // Output compressed packet buffer.
  zlib.deflate(packetBuffer, function(err, compressed) {
    // TODO: handle `err`
    this.push(compressed);
    if (done) {
      done();
    }
  }.bind(this));
};

// Main transformer logic: buffer chunks and dump them once the
// target size has been met.
transformer._transform = function(chunk, encoding, done) {
  this._buffers.push(chunk);
  this._inputSize += chunk.length;

  if (this._inputSize >= this._targetSize) {
    this._dump(done);
  } else {
    done();
  }
};

// Flush any remaining buffers, passing the callback through so the
// stream does not finish until the final dump completes.
transformer._flush = function(done) {
  this._dump(done);
};

// Example:
var fs = require('fs');

fs.createReadStream('depth_1000000')
  .pipe(zlib.createInflate())
  .pipe(transformer)
  .pipe(fs.createWriteStream('depth_1000000.out'));
If the stream you're writing to (in this case, the file output stream) buffers too much data, push will return false. Since you're writing to disk, this makes sense: you're processing data faster than you can write it out. When out's buffer is full, the transform stream will fail to push and start buffering data itself. If that buffer fills, then inp's will start to fill up. This is how things should work: the piped streams only process data as fast as the slowest link in the chain can handle it (once your buffers are full).
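To make this throttling visible, here's a small illustrative sketch (not from the answer above; the sizes and delay are arbitrary) in which a deliberately slow Writable causes the transform's push() to start returning false:

var stream = require('stream');

// A pass-through transform that logs whenever push() signals back-pressure.
var meter = new stream.Transform({
  transform: function(chunk, encoding, done) {
    if (!this.push(chunk)) {
      console.log('push() returned false: the downstream buffer is full');
    }
    done();
  }
});

// A deliberately slow writable: one chunk per 100 ms, small buffer.
var slowWriter = new stream.Writable({
  highWaterMark: 1024,
  write: function(chunk, encoding, done) {
    setTimeout(done, 100);
  }
});

// A fast source: ~1 MB written up front. pipe() handles the pausing.
var source = new stream.PassThrough();
source.pipe(meter).pipe(slowWriter);
for (var i = 0; i < 64; i++) {
  source.write(Buffer.alloc(16 * 1024, 0x2e));
}
source.end();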

This question from 2013 is all I was able to find on how to deal with back-pressure when creating node Transform streams.

From the node 7.10.0 Transform stream and Readable stream documentation, what I gathered was that once push returns false, nothing else should be pushed until _read is called.

The Transform documentation doesn't mention _read, except to say that the base Transform class implements it (and _write). I found the information about push returning false and _read being called in the Readable stream documentation.

The only other authoritative comment I found on Transform back-pressure merely mentions it as an issue, in a comment at the top of the node file _stream_transform.js. Here's the section from that comment about back-pressure:
// This way, back-pressure is actually determined by the reading side,
// since _read has to be called to start processing a new chunk. However,
// a pathological inflate type of transform can cause excessive buffering
// here. For example, imagine a stream where every byte of input is
// interpreted as an integer from 0-255, and then results in that many
// bytes of output. Writing the 4 bytes {ff,ff,ff,ff} would result in
// 1kb of data being output. In this case, you could write a very small
// amount of input, and end up with a very large amount of output. In
// such a pathological inflating mechanism, there'd be no way to tell
// the system to stop doing the transform. A single 4MB write could
// cause the system to run out of memory.
//
// However, even in such a pathological case, only a single written chunk
// would be consumed, and then the rest would wait (un-transformed) until
// the results of the previous transformed chunk were consumed.
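To make the pathological case concrete, here is a short sketch of such an inflating transform (my illustration, not part of the quoted comment). Each input byte N fans out into N output bytes, and nothing in _transform slows it down:

var stream = require('stream');

// Pathological inflate: each input byte N produces N output bytes.
// A tiny write can balloon into a huge amount of buffered output,
// because this naive _transform ignores push()'s return value.
var inflater = new stream.Transform({
  transform: function(chunk, encoding, done) {
    for (var i = 0; i < chunk.length; i++) {
      if (chunk[i] > 0) {
        this.push(Buffer.alloc(chunk[i], 0x78)); // 'x' repeated N times
      }
    }
    done();
  }
});

// Writing the 4 bytes {ff,ff,ff,ff} asks inflater to emit just over 1 KB.
inflater.write(Buffer.from([0xff, 0xff, 0xff, 0xff]));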
Solution example
Here's the solution I pieced together to handle back-pressure in a Transform stream, which I'm pretty sure works. (I haven't written any real tests, which would require writing a Writable stream to control the back-pressure.)

This is a rough line transform: it needs work as an actual line transform, but it does demonstrate handling the back-pressure.
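A minimal sketch of such a line transform follows. The names LineTransform and _continueTransform follow the stack trace below, but the body is my illustration, assuming newline-delimited text input: stop pushing when push() returns false, park the remaining work, and resume it from _read.

'use strict';

const stream = require('stream');

class LineTransform extends stream.Transform {
  constructor(options) {
    super(options);
    this._lastLine = '';            // partial line carried between chunks
    this._continueTransform = null; // parked work, resumed from _read
  }

  _transform(chunk, encoding, callback) {
    const lines = (this._lastLine + chunk.toString()).split('\n');
    this._lastLine = lines.pop(); // keep any trailing partial line

    let index = 0;
    const continueTransform = () => {
      while (index < lines.length) {
        if (!this.push(lines[index++] + '\n')) {
          // Back-pressure: park the rest until _read is called.
          this._continueTransform = continueTransform;
          return;
        }
      }
      callback(); // chunk fully consumed; called exactly once
    };
    continueTransform();
  }

  _read(size) {
    const resume = this._continueTransform;
    if (resume) {
      this._continueTransform = null;
      resume();
    } else {
      super._read(size);
    }
  }

  _flush(callback) {
    if (this._lastLine.length > 0) {
      this.push(this._lastLine);
      this._lastLine = '';
    }
    callback();
  }
}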
Test code:

const fs = require('fs');
let inStrm = fs.createReadStream("testdata/largefile.txt", { encoding: "utf8" });
let lineStrm = new LineTransform({ encoding: "utf8", decodeStrings: false });
inStrm.pipe(lineStrm).pipe(process.stdout);
Helpful debugging hint
When I originally wrote this, I didn't realize that _read could be called before _transform returned, so I hadn't implemented that and was getting the following error:
Error: no writecb in Transform class
at afterTransform (_stream_transform.js:71:33)
at TransformState.afterTransform (_stream_transform.js:54:12)
at LineTransform._continueTransform (/userdata/mjl/Projects/personal/srt-shift/dist/textfilelines.js:44:13)
at LineTransform._transform (/userdata/mjl/Projects/personal/srt-shift/dist/textfilelines.js:46:21)
at LineTransform.Transform._read (_stream_transform.js:167:10)
at LineTransform._read (/userdata/mjl/Projects/personal/srt-shift/dist/textfilelines.js:56:15)
at LineTransform.Transform._write (_stream_transform.js:155:12)
at doWrite (_stream_writable.js:331:12)
at writeOrBuffer (_stream_writable.js:317:5)
at LineTransform.Writable.write (_stream_writable.js:243:11)
Looking at the node implementation, I realized that this error means the callback given to _transform was being called more than once. There wasn't much information to be found about this error either, so I thought I'd include what I figured out here.

I recently ran into a similar issue, needing to handle back-pressure in an inflating transform stream. The secret to handling push() returning false is to register and handle the 'drain' event on the stream:
_transform(data, enc, callback) {
  const continueTransforming = () => {
    // ... do some work / parse the data, keep state of where we're at, etc.
    // `event` and `allDone` below are placeholders for that parsing logic.
    if (!this.push(event)) {
      // Will get called again once the reader can consume more data.
      this._readableState.pipes.once('drain', continueTransforming);
      return;
    }
    if (allDone) {
      callback();
    }
  };
  continueTransforming();
}
Note that we're digging into the internals here: pipes can even be an array of Readables, but it works for ...pipe(transform).pipe(...). It would be great if someone from the Node community could suggest a "correct" way to handle .push() returning false.

I ended up following Ledion's example and created a utility Transform class which assists with handling back-pressure. The utility adds an async method named addData, which the implementing transform can await.
'use strict';

const { Transform } = require('stream');

/**
 * The BackPressureTransform class adds a utility method addData which
 * allows for pushing data to the Readable, while honoring back-pressure.
 */
class BackPressureTransform extends Transform {
  constructor(...args) {
    super(...args);
  }

  /**
   * Asynchronously add a chunk of data to the output, honoring back-pressure.
   *
   * @param {String} data
   * The chunk of data to add to the output.
   *
   * @returns {Promise<void>}
   * A Promise resolving after the data has been added.
   */
  async addData(data) {
    // if .push() returns false, it means the readable buffer is full;
    // when this occurs, we must wait for the piped destination to emit
    // the 'drain' event, signalling it is ready for more data
    if (!this.push(data)) {
      await new Promise((resolve, reject) => {
        const errorHandler = error => {
          this.emit('error', error);
          reject(error);
        };
        this._readableState.pipes.on('error', errorHandler);
        this._readableState.pipes.once('drain', () => {
          this._readableState.pipes.removeListener('error', errorHandler);
          resolve();
        });
      });
    }
  }
}

module.exports = {
  BackPressureTransform
};
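Here's an example transform which builds on BackPressureTransform, awaiting addData from within _transform and _flush: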
'use strict';

const { BackPressureTransform } = require('./back-pressure-transform');

/**
 * The Formatter class accepts the transformed row to be added to the output file.
 * The class provides generic support for formatting the result file.
 */
class Formatter extends BackPressureTransform {
  constructor() {
    super({
      encoding: 'utf8',
      readableObjectMode: false,
      writableObjectMode: true
    });
    this.anyObjectsWritten = false;
  }

  /**
   * Called when the data pipeline is complete.
   *
   * @param {Function} callback
   * The function which is called when final processing is complete.
   *
   * @returns {Promise<void>}
   * A Promise resolving after the flush completes.
   */
  async _flush(callback) {
    // if any object is added, close the surrounding array
    if (this.anyObjectsWritten) {
      await this.addData('\n]');
    }
    callback(null);
  }

  /**
   * Given the transformed row from the ETL, format it to the desired layout.
   *
   * @param {Object} sourceRow
   * The transformed row from the ETL.
   *
   * @param {String} encoding
   * Ignored in object mode.
   *
   * @param {Function} callback
   * The callback function which is called when the formatting is complete.
   *
   * @returns {Promise<void>}
   * A Promise resolving after the row is transformed.
   */
  async _transform(sourceRow, encoding, callback) {
    // before the first object is added, surround the data as an array;
    // between each object, add a comma separator
    await this.addData(this.anyObjectsWritten ? ',\n' : '[\n');

    // update state
    this.anyObjectsWritten = true;

    // add the object to the output
    const parsed = JSON.stringify(sourceRow, null, 2).split('\n');
    for (const [index, row] of parsed.entries()) {
      // prepend the row with 2 additional spaces since we're inside a larger array
      await this.addData(`  ${row}`);

      // add line breaks except for the last row
      if (index < parsed.length - 1) {
        await this.addData('\n');
      }
    }
    callback(null);
  }
}

module.exports = {
  Formatter
};
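A minimal usage sketch (mine, not the answer's; the module path, rows, and file name are illustrative) driving the object-mode Formatter:

'use strict';

const fs = require('fs');
const { Readable } = require('stream');
const { Formatter } = require('./formatter'); // hypothetical path to the class above

// Feed a few rows through the Formatter and write the JSON array to disk.
const rows = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }];
Readable.from(rows)
  .pipe(new Formatter())
  .pipe(fs.createWriteStream('result.json'));

Another answer takes a lower-level approach: when push() returns false, temporarily override _read so that the transform resumes exactly when the readable side asks for more data: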
_transform(buf, enc, callback) {
  // Prepend any unused data from the prior chunk.
  if (this.prev) {
    buf = Buffer.concat([ this.prev, buf ]);
    this.prev = null;
  }

  // Keep transforming until buf runs low on data.
  if (buf.length < this.requiredData) {
    this.prev = buf;
    return callback();
  }

  var result = buf.slice(0, this.requiredData); // do something with the data...
  var nextbuf = buf.slice(this.requiredData);

  if (this.push(result)) {
    // Continue transforming this chunk.
    this._transform(nextbuf, enc, callback);
  }
  else {
    // Node is warning us to slow down (applying "back-pressure").
    // Temporarily override _read to continue the transform on demand.
    this._read = function() {
      delete this._read;
      this._transform(nextbuf, enc, callback);
    };
  }
}
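For context, here is the same pattern wrapped in a complete, self-contained class. The class name and the fixed 38-byte record size are my illustrative choices, echoing the question:

'use strict';

const { Transform } = require('stream');

// Illustrative host class for the pattern above: emits fixed-size
// records, parking unfinished work in a temporary _read override
// whenever push() returns false.
class RecordTransform extends Transform {
  constructor(options) {
    super(options);
    this.prev = null;       // leftover bytes carried between chunks
    this.requiredData = 38; // bytes per output record, as in the question
  }

  _transform(buf, enc, callback) {
    if (this.prev) {
      buf = Buffer.concat([ this.prev, buf ]);
      this.prev = null;
    }
    if (buf.length < this.requiredData) {
      this.prev = buf;
      return callback();
    }
    const result = buf.slice(0, this.requiredData); // placeholder transformation
    const nextbuf = buf.slice(this.requiredData);
    if (this.push(result)) {
      this._transform(nextbuf, enc, callback);
    } else {
      this._read = function() {
        delete this._read; // restore the inherited _read
        this._transform(nextbuf, enc, callback);
      };
    }
  }
}

module.exports = { RecordTransform };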