C++: Linux does not respect SCHED_FIFO priority? (normal or GDB execution)

c++ linux c++11 gdb

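TL;DR

On a multiprocessor/multicore machine, several RT SCHED_FIFO threads can be scheduled on several execution units at the same time. So a thread with priority 60 and a thread with priority 40 can run simultaneously on two different cores.

This can be counter-intuitive, especially when simulating an embedded system that runs on a single-core processor (as is still common today) and relies on strict priority execution.

See the summary in this post.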
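
To get single-core behaviour from inside the program rather than from the shell, the calling thread can be pinned to one CPU before any other thread is created; threads created afterwards inherit that affinity mask. The following is a minimal sketch, assuming Linux with glibc and g++ (which defines _GNU_SOURCE by default). The helper names pin_to_cpu, allowed_cpu_count and first_allowed_cpu are illustrative and are not part of the original post.

```cpp
#include <cerrno>
#include <sched.h>   // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux-specific)

// Restrict the calling thread to a single CPU.
// Returns 0 on success, or the errno value reported by sched_setaffinity.
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread"
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return errno;
    return 0;
}

// Number of CPUs the calling thread is currently allowed to run on, or -1 on error.
int allowed_cpu_count()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

// First CPU in the current affinity mask, or -1 if none could be determined.
// CPU 0 is not always available (for example inside a restricted cpuset).
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Calling pin_to_cpu(first_allowed_cpu()) at the top of main, before the threads are created, should have an effect similar to launching the program under taskset -c N.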

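Original problem description

Even with very simple code, I struggle to make Linux respect the priority of my threads under the SCHED_FIFO scheduling policy.

See the MCVE at the end of the question.
See the modified MCVE.

This situation arises from the need to simulate embedded code on a Linux PC in order to run integration tests.

The main thread, which has a FIFO priority, starts the threads divisor and ratio.

The divisor thread should get priority 2 so that the ratio thread, with priority 1, does not evaluate a/b before b has a proper value (this is a purely hypothetical scenario for the MCVE only, not a real case, which would use semaphores or condition variables).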

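Potential prerequisite: you need to be root or, better, to set a capability on the program with setcap so that it can change its scheduling policy and priority.

sudo setcap cap_sys_nice+ep main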

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ getcap main
main = cap_sys_nice+ep
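
Whether the capability is actually in effect can also be checked from inside the program. The sketch below queries the SCHED_FIFO priority range and attempts to switch the calling thread to that policy. The function names are illustrative and not part of the original post. Note that pthread_setschedparam returns the error number directly instead of setting errno.

```cpp
#include <cstring>
#include <iostream>
#include <pthread.h>
#include <sched.h>
#include <utility>

// Valid SCHED_FIFO priority range (min, max) as reported by the system.
std::pair<int, int> fifo_priority_range()
{
    return { sched_get_priority_min(SCHED_FIFO),
             sched_get_priority_max(SCHED_FIFO) };
}

// Try to switch the calling thread to SCHED_FIFO at the given priority.
// Returns 0 on success or a positive error number (typically EPERM
// when the process has neither root rights nor cap_sys_nice).
int try_set_fifo(int priority)
{
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

// Print the priority range and whether SCHED_FIFO could be obtained.
void report_fifo_capability()
{
    const auto range = fifo_priority_range();
    std::cout << "SCHED_FIFO priorities: " << range.first << ".." << range.second << "\n";

    const int rc = try_set_fifo(range.first);
    if (rc == 0)
        std::cout << "SCHED_FIFO granted\n";
    else
        std::cout << "SCHED_FIFO refused: " << std::strerror(rc) << '(' << rc << ")\n";
}
```

Calling report_fifo_capability() early in main makes a missing setcap visible right away, instead of leaving the threads silently on the default policy.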

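The first experiments were done in a VirtualBox environment with 2 vCPUs (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git). There, the behaviour was almost always OK under normal execution, but NOK under GDB.

Further experiments on a native Ubuntu 20.04 showed that NOK behaviour is very frequent even under normal execution on an I3-1005 2C/4T (gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1).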

Basically:

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ g++ main.cc -o main -pthread

Normal execution is sometimes OK and sometimes NOK (with an error message if run without root or setcap):

johndoe@VirtualBox:~/Code/gdb_sched_fifo$ ./main
Problem with setschedparam: Operation not permitted(1)  <<-- err msg if no root or setcap
Result: 0.333333 or Result: Inf                         <<-- 1/3 or div by 0

Now, if you want to debug this program, you get the error message again:

(gdb) run
Starting program: /home/johndoe/Code/gdb_sched_fifo/main 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f929a6a9700 (LWP 2633)]
Problem with setschedparam: Operation not permitted(1)     <<--- ERROR MSG
Result: inf                                                <<--- DIV BY 0
[New Thread 0x7f9299ea8700 (LWP 2634)]
[Thread 0x7f929a6a9700 (LWP 2633) exited]
[Thread 0x7f9299ea8700 (LWP 2634) exited]
[Inferior 1 (process 2629) exited normally]

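Hence the conclusions and the question

I thought the only problem came from GDB.
Tests on another (non-virtual) target showed even worse results under normal execution.

I have seen other questions about RT SCHED_FIFO not being respected, but the answers I found had no conclusion or an unclear one. My MCVE is also much smaller, with fewer potential side effects.

The comments brought some answers, but I am still not convinced... (...that it is supposed to work this way)

MCVE: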

#include <chrono>
#include <cstring>
#include <iostream>
#include <thread>
#include <pthread.h>

double a = 1.0F;
double b = 0.0F;

void ratio(void)
{
    struct sched_param param;
    param.sched_priority = 1;
    // pthread_setschedparam returns the error number directly; it does not set errno
    int ret = pthread_setschedparam(pthread_self(),SCHED_FIFO,&param);
    if ( 0 != ret )
        std::cout << "Problem with setschedparam: " << std::strerror(ret) << '(' << ret << ')' << "\n" << std::flush;

    std::cout << "Result: " << a/b << "\n" << std::flush;
}

void divisor(void)
{
    struct sched_param param;
    param.sched_priority = 2;
    pthread_setschedparam(pthread_self(),SCHED_FIFO,&param);

    b = 3.0F;

    std::this_thread::sleep_for(std::chrono::milliseconds(2000u));
}


int main(int argc, char * argv[])
{
    struct sched_param param;
    param.sched_priority = 10;
    pthread_setschedparam(pthread_self(),SCHED_FIFO,&param);

    std::thread thr_ratio(ratio);
    std::thread thr_divisor(divisor);

    thr_ratio.join();
    thr_divisor.join();

    return 0;
}

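Your MCVE has some obvious problems:

You have a data race on b, i.e. undefined behaviour, so anything can happen.

You expect the divisor thread to finish its pthread_setschedparam call before the ratio thread starts computing the ratio.
But there is absolutely no guarantee that the first thread will not run to completion before the second thread is even created.
In fact, that is probably what happens under GDB: it has to trap thread creation and destruction events in order to keep track of all threads, so thread creation is much slower under GDB than outside it.

To fix the second problem, add a counting semaphore and have the two threads rendezvous after each has executed its pthread_setschedparam call.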
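
The rendezvous suggested above could look roughly like the sketch below, which uses two POSIX counting semaphores so that neither thread continues until both have made their pthread_setschedparam call. The Rendezvous type, the helper names and the configured counter are illustrative assumptions and do not come from the original answer. The scheduling call is best effort here so that the sketch also runs without cap_sys_nice.

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <thread>

// Two-thread rendezvous built from two counting semaphores.
struct Rendezvous {
    sem_t first_arrived;
    sem_t second_arrived;

    Rendezvous()  { sem_init(&first_arrived, 0, 0); sem_init(&second_arrived, 0, 0); }
    ~Rendezvous() { sem_destroy(&first_arrived); sem_destroy(&second_arrived); }

    static void wait(sem_t& s) {
        // Retry if a signal handler interrupts the wait
        while (sem_wait(&s) == -1 && errno == EINTR) {}
    }
    // Each side announces its own arrival, then blocks until the other side has arrived.
    void arrive_first()  { sem_post(&first_arrived);  wait(second_arrived); }
    void arrive_second() { sem_post(&second_arrived); wait(first_arrived); }
};

// Best effort: the return value is ignored, so without privileges the
// thread simply keeps its default scheduling policy.
static void set_fifo_priority(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

// Returns true if, right after the rendezvous, each thread observes that
// both threads have already finished their pthread_setschedparam call.
bool rendezvous_demo() {
    Rendezvous rv;
    std::atomic<int> configured{0};
    bool ok_divisor = false;
    bool ok_ratio = false;

    std::thread divisor([&] {
        set_fifo_priority(2);
        ++configured;
        rv.arrive_first();
        ok_divisor = (configured.load() == 2);
    });
    std::thread ratio([&] {
        set_fifo_priority(1);
        ++configured;
        rv.arrive_second();
        ok_ratio = (configured.load() == 2);
    });

    divisor.join();
    ratio.join();
    return ok_divisor && ok_ratio;
}
```

This addresses only the second problem. The access to b would still need an atomic, a mutex or a condition variable to remove the data race, and on a multicore machine the order in which the two threads proceed after the rendezvous is not determined by their priorities alone.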
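I tried many solutions but never got "defect-free" code. See also this post.
The code with the best, though still imperfect, success rate is the one below. It uses the traditional pthread C API, which allows the threads to be created with the right attributes from the start.
I am still surprised to see that even with this code I still get errors (the same as with the question's MCVE, but using the pure pthread API).
To stress the code, I found the following sequence: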
$ seq 1000 | parallel ./main | grep inf
Result: inf
Result: inf
....

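inf denotes the erroneous division-by-0 result. In my case the defect rate is about 10/1000.
A command like for i in {1..1000}; do ./main; done | grep inf takes longer.

The threads are started from higher priority to lower priority.
The divisor thread is now created first, with the higher RT priority (2 > 1 > main, which stays in SCHED_OTHER, i.e. non-RT scheduling).

So I wonder why I still get a division by 0.
Finally, I tried reducing the set of CPUs with taskset. When the shell is restricted to a single core: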
$ taskset -pc 0 $$
pid 2414's current affinity list: 0,1
pid 2414's new affinity list: 0
$ for i in {1..1000}; do ./main_oss ; done   <<-- no need for parallel in this case
Result: 0.333333
Result: 0.333333
Result: 0.333333
Result: 0.333333
Result: 0.333333
...

$ taskset -pc 0,1 $$
pid 2414's current affinity list: 0
pid 2414's new affinity list: 0,1
$ seq 1000 | parallel ./main_oss
Result: 0.333333          | <<-- display by group of 2
Result: 0.333333          |
Result: inf             |   <<--
Result: 0.333333        |
...

#include <chrono>
#include <cstring>
#include <iostream>
#include <thread>
#include <pthread.h>

double a = 1.0F;
double b = 0.0F;

// Lower-priority thread: prints a/b, which is inf while b is still 0
void * ratio(void*)
{
    std::cout << "Result: " << a/b << "\n" << std::flush;
    return nullptr;
}

// Higher-priority thread: gives b its proper value
void * divisor(void*)
{
    b = 3.0F;
    std::this_thread::sleep_for(std::chrono::milliseconds(500u));
    return nullptr;
}


int main(int argc, char * argv[])
{
    struct sched_param param;

    pthread_t thr[2];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // Use the policy and priority stored in attr instead of inheriting them from main
    pthread_attr_setschedpolicy(&attr,SCHED_FIFO);
    pthread_attr_setinheritsched(&attr,PTHREAD_EXPLICIT_SCHED);

    // divisor is created first, with the higher priority
    param.sched_priority = 2;
    pthread_attr_setschedparam(&attr,&param);
    pthread_create(&thr[0],&attr,divisor,nullptr);

    param.sched_priority = 1;
    pthread_attr_setschedparam(&attr,&param);
    pthread_create(&thr[1],&attr,ratio,nullptr);

    pthread_join(thr[0],nullptr);
    pthread_join(thr[1],nullptr);

    return 0;
}
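
The code above ignores the return values of the pthread calls. Without root rights or cap_sys_nice, pthread_create with PTHREAD_EXPLICIT_SCHED and SCHED_FIFO is expected to fail with EPERM, in which case no thread is created and the later pthread_join would operate on an uninitialized handle. A hedged sketch of a checked variant follows. The names create_fifo_thread, lacks_privilege and noop_thread are illustrative and not part of the original answer.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sched.h>

// Create a thread that starts directly under SCHED_FIFO with the given priority.
// Returns 0 on success, or the error number of the first pthread call that failed.
int create_fifo_thread(pthread_t* thr, int priority, void* (*fn)(void*), void* arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    sched_param param{};
    param.sched_priority = priority;

    // Set the policy first so that the priority is checked against SCHED_FIFO
    rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0) rc = pthread_attr_setschedparam(&attr, &param);
    // Without PTHREAD_EXPLICIT_SCHED the new thread would inherit the creator's scheduling
    if (rc == 0) rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0) rc = pthread_create(thr, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}

// True if the error number indicates missing privileges (no root, no cap_sys_nice).
bool lacks_privilege(int rc)
{
    return rc == EPERM;
}

// Trivial thread body used only to exercise the helper.
void* noop_thread(void*)
{
    return nullptr;
}
```

A caller would join the thread only when create_fifo_thread returned 0, and could print a hint about setcap when lacks_privilege(rc) is true.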

$ ./main
Result: inf

taskset -c 0 ./main
Result: 0.333333

gdb -ex 'set exec-wrapper taskset -c 0' ./main
--> mixed results depending on conditions (native/virtualized? number of cores?)
sometimes 0.333333, sometimes inf
--> problems setting breakpoints
--> I still have work to do to summarize this issue

taskset -c 0 gdb main
...
(gdb) r
...
Result: inf

taskset -c N chrt 99 gdb main <<-- where N is a core number (*)
...                           <<-- 99 here denotes the highest priority on your system
(gdb) r
...
Result: 0.333333

taskset -c N chrt 99 code