C++: Sharing a Lua state between pthreads seg faults if not executing a coroutine

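First of all, I know my question looks familiar, but I am actually not asking why a seg fault occurs when a Lua state is shared between different pthreads. I am asking why it does not seg fault in the specific case described below. I tried to organize it as well as I could, but I realize it is long. Sorry about that.

A bit of background: I am writing a program that uses the Lua interpreter as the base for the user to execute instructions, and uses the ROOT libraries to display graphs, histograms, etc. All of this works fine, but I then tried to implement a way for the user to start a background task while keeping the ability to type commands at the Lua prompt, so that they can do something else entirely while the task finishes, or request that the task stop.

My first attempt was the following. First, on the Lua side, I load some helper functions and initialize global variables: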

-- Lua script
RootTasks = {}
NextTaskToStart = nil

function SetupNewTask(taskname, fn, ...)
  local task = function(...)
      local rets = table.pack(fn(...))

      RootTasks[taskname].status = "done"

      return table.unpack(rets)
    end

  RootTasks[taskname] = {
    task = SetupNewTask_C(task, ...),
    status = "waiting",
  }

  NextTaskToStart = taskname
end

Then on the C side:

// inside the C++ script
int SetupNewTask_C ( lua_State* L )
{
    // just a function to check if the argument is valid
    if ( !CheckLuaArgs ( L, 1, true, "SetupNewTask_C", LUA_TFUNCTION ) ) return 0;

    int nvals = lua_gettop ( L );

    lua_newtable ( L );

    for ( int i = 0; i < nvals; i++ )
    {
        lua_pushvalue ( L, 1 );
        lua_remove ( L, 1 );
        lua_seti ( L, -2, i+1 );
    }

    return 1;
}

// In the C++ script
// lua, called below, is a pointer to the lua_State 
// created when starting the Lua interpreter

void* NewTaskFn ( void* arg )
{
    // helper function to get global fields from 
    // strings like "something.field.subfield"
    // Retrieve the name of the task to be started (has been pushed as 
    // a global variable by previous call to SetupNewTask_C)
    TryGetGlobalField ( lua, "NextTaskToStart" );

    if ( lua_type ( lua, -1 ) != LUA_TSTRING )
    {
        cerr << "Next task to schedule is undetermined..." << endl;
        return nullptr;
    }

    string nextTask = lua_tostring ( lua, -1 );
    lua_pop ( lua, 1 );

    // Now we get the actual table with the function to execute 
    // and the arguments
    TryGetGlobalField ( lua, ( string ) ( "RootTasks."+nextTask ) );

    if ( lua_type ( lua, -1 ) != LUA_TTABLE )
    {
        cerr << "This task does not exists or has an invalid format..." << endl;
        return nullptr;
    }

    // The field "task" from the previous table contains the 
    // function and arguments
    lua_getfield ( lua, -1, "task" );

    if ( lua_type ( lua, -1 ) != LUA_TTABLE )
    {
        cerr << "This task has an invalid format..." << endl;
        return nullptr;
    }

    lua_remove ( lua, -2 );

    int taskStackPos = lua_gettop ( lua );

    // The first element of the table we retrieved is the function so the
    // number of arguments for that function is the table length - 1
    int nargs = lua_rawlen ( lua, -1 ) - 1;

    // That will be the function
    lua_geti ( lua, taskStackPos, 1 );

    // And the arguments...
    for ( int i = 0; i < nargs; i++ )
    {
        lua_geti ( lua, taskStackPos, i+2 );
    }

    lua_remove ( lua, taskStackPos );

    // I just reset the global variable NextTaskToStart as we are 
    // about to start the scheduled one.
    lua_pushnil ( lua );
    TrySetGlobalField ( lua, "NextTaskToStart" );

    // Let's go!
    lua_pcall ( lua, nargs, LUA_MULTRET, 0 );

    // A pthread start routine must return a void*
    return nullptr;
}

int StartNewTask_C ( lua_State* L )
{
    pthread_t newTask;

    pthread_create ( &newTask, nullptr, NewTaskFn, nullptr );

    return 0;
}

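For example, calling StartNewTask from the Lua interpreter with the task name "PeriodicPrint", a function that prints its argument and then sleeps for one second, ten times over, and the argument "Hello" prints "Hello" once per second for the next 10 seconds. It then returns from execution and everything is fine. Now, if I hit the ENTER key while the task is running, the program dies with horrible seg faults (which I do not copy here, because the error log is different every time it seg faults, and sometimes there is no error at all). So I read a bit online about what could be going on, and I found several mentions that a lua_State is not thread safe. I do not really understand why just hitting ENTER makes it blow up, but that is not the point here.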

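I discovered by accident that this approach works without any seg fault after a small modification. If a coroutine is executed instead of running the function directly, everything I wrote above works fine.

Replace the previous Lua-side function SetupNewTask with: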

function SetupNewTask(taskname, fn, ...)
  local task = coroutine.create( function(...)
      local rets = table.pack(fn(...))

      RootTasks[taskname].status = "done"

      return table.unpack(rets)
    end)

  local taskfn = function(...)
    coroutine.resume(task, ...)
  end

  RootTasks[taskname] = {
    task = SetupNewTask_C(taskfn, ...),
    routine = task,
    status = "waiting",
  }

  NextTaskToStart = taskname
end

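I can then run multiple tasks at once for an extended period of time without getting any seg fault. So we finally come to my question: why does using a coroutine work? What is the fundamental difference in this case? I just call coroutine.resume and I do not do any yield (or anything else of note). I then just wait for the coroutine to finish, and that is it. Are coroutines doing something I do not suspect?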

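That nothing seems to be broken does not mean that it actually works, so…

What's in the box? (The box being a coroutine.)

A lua_State stores the state of this coroutine: most importantly its stack, its CallInfo list, a pointer to the global_State, and a bunch of other things.

If you hit return at the interpreter prompt, the interpreter tries to run the code you typed. (An empty line is also a program.) This involves putting it on the Lua stack, calling some functions, and so on. If your code is running in a different OS thread that also uses the same Lua stack/state… well, I think it is clear why this breaks, right? (Part of the problem is caching things that "don't"/"shouldn't" change, but do change because another thread is also working on them. Both threads push and pop things on the same stack and step on each other's feet. If you want to dig through the code, this may be a good place to start.)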
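
The "stepping on each other's feet" problem can be illustrated with a toy model. This is only a sketch and not the Lua API: ToyState, push_pop_many and run_two_threads are hypothetical names, and the std::mutex is an assumed stand-in for whatever serialisation a real embedding would need. It shows that two OS threads pushing and popping on one shared stack stay consistent only as long as every access is serialised.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an interpreter state: one stack, one lock.
struct ToyState {
    std::vector<int> stack;
    std::mutex lock;
};

// Push a value and pop it again, `rounds` times, holding the lock for
// each push/pop pair so no other thread sees a half-finished operation.
void push_pop_many(ToyState& s, int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::lock_guard<std::mutex> guard(s.lock);
        s.stack.push_back(i);
        assert(s.stack.back() == i);  // nobody else touched the top
        s.stack.pop_back();
    }
}

// Run the same push/pop loop on two OS threads sharing one ToyState and
// return the final stack size (0 if every push was matched by its pop).
std::size_t run_two_threads(int rounds) {
    ToyState s;
    std::thread a(push_pop_many, std::ref(s), rounds);
    std::thread b(push_pop_many, std::ref(s), rounds);
    a.join();
    b.join();
    return s.stack.size();
}
```

Without the lock_guard, the two loops would race on the same vector, which is undefined behaviour and roughly corresponds to the prompt and the background task both working on one Lua stack.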
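So now you are using two separate coroutines, and all the obvious sources of problems are gone. Now it works, right…? No, because coroutines share state:

The global_State

This is where the "registry", the string cache, and everything related to garbage collection live. While you have eliminated the main "high-frequency" source of errors (stack handling), many other "low-frequency" sources remain. A brief (non-exhaustive!) list of some of them: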

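You can trigger a garbage collection step through any allocation, which then runs the GC for a while, using its shared structures. Although an allocation usually does not trigger the GC, the GCdebt counter that controls the GC is part of the global state, so once it crosses the threshold, allocations on multiple threads at the same time have a good chance of starting the GC on several threads at once. (If that happens, it will almost certainly blow up violently.) Any allocation means, among other things:

  creating tables, coroutines, userdata, …
  concatenating strings, reading from files, tostring(), …
  calling functions (!) (if that requires growing the stack or allocating a new CallInfo slot)
  etc.

(Re-)setting an object's metatable may modify GC structures. (If the metatable has __gc or __mode, it gets added to a list.)

Adding new fields to a table, which may trigger a resize. If you are also accessing the table from another thread during the resize (even just reading existing fields), then… *boom*. (Or no boom at all: although the data may have been moved to a different area, the memory where it used to be may still be accessible. So it may "work", or only lead to silent corruption.)

Even with the GC stopped, creating new strings is unsafe because it may modify the string cache.

And then there are probably lots of other things…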
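
To make the "separate stacks, shared global state" picture concrete, here is a minimal single-threaded sketch. It is a toy model, not Lua's real data structures: ToyGlobal, ToyCoroutine, toy_alloc and the threshold value are all assumptions made for illustration. It shows that allocations made through two different coroutines charge the same counter, so either one can be the allocation that starts a collection.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the state shared by all coroutines.
struct ToyGlobal {
    long gc_debt = 0;      // bytes allocated since the last collection
    long threshold = 100;  // collect when the debt reaches this value
    int collections = 0;   // how many collections have run
};

// Hypothetical model of one coroutine: private stack, shared global.
struct ToyCoroutine {
    std::vector<int> stack;
    ToyGlobal* g;
};

// Every allocation, from any coroutine, charges the shared counter and
// may start a collection that works on the shared structures.
void toy_alloc(ToyCoroutine& co, long bytes) {
    co.g->gc_debt += bytes;
    if (co.g->gc_debt >= co.g->threshold) {
        co.g->collections += 1;
        co.g->gc_debt = 0;
    }
}

// Interleave allocations from two coroutines that share one ToyGlobal
// and report how many collections were triggered.
int collections_after_interleaved_allocs() {
    ToyGlobal g;
    ToyCoroutine main_co{{}, &g};
    ToyCoroutine task_co{{}, &g};
    main_co.stack.push_back(1);      // private: does not affect task_co
    toy_alloc(main_co, 60);          // debt is 60, below the threshold
    toy_alloc(task_co, 60);          // debt reaches 120: collection starts
    assert(task_co.stack.empty());   // the stacks really are separate
    return g.collections;
}
```

In this model neither coroutine alone crosses the threshold; the collection is triggered by their combined allocations. If the two toy_alloc calls ran on different OS threads without synchronisation, both the counter update and the collection would race.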
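Making it fail

For fun, you can rebuild Lua and define HARDSTACKTESTS and HARDMEMTESTS (e.g. at the very top of luaconf.h). This enables some code that reallocates the stack and runs a full GC cycle in many places. (For me, it performs 260 stack reallocations and 235 collections before the prompt comes up. Just hitting return (running an empty program) performs 13 stack reallocations and 6 collections.) Running your program, which seems to work, with these options enabled will probably make it crash… or maybe not.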

Why it might still "work"

Example call from the question, as entered in the Lua interpreter:

StartNewTask("PeriodicPrint", function(str)
  for i=1,10 do print(str); sleep(1); end
end, "Hello")

This prints "Hello" once per second for the next 10 seconds.