Executing awk in a shell command from Python (Python, Shell, Subprocess)


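I have a shell command that parses specific content and gives me the output I need. I need to do the same thing in Python, but the shell command contains a newline character "\n", and it does not take effect when the command is run from Python.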

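Among the many lines in the output log, the line I need looks like this:

configurationFile=/app/log/conf/the_jvm_name.4021.logback.xml

I only need the JVM name from the line above. The syntax will always be the same. The shell command works fine.

Shell command: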

ps -ef | grep 12345 | tr " " "\n" | grep logback.configurationFile | awk -F"/" '{print $NF}'| cut -d. -f1

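Python (with all the required double quotes escaped):

With Python I cannot get the required output. It only prints the configurationFile part of the command.

What am I missing? Is there a better way to get these details?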

You can get what you want with a regular-expression substitution in Python:

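import re
import subprocess

# universal_newlines=True makes check_output return text rather than bytes
output = subprocess.check_output(["ps", "-ef"], universal_newlines=True)
for line in output.splitlines():
    if re.search("12345", line):
        name = re.sub(r".*configurationFile=.*/([^.]+).*", r"\1", line)
        print(name)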

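This captures the part of the configuration file path after the last / and up to the next dot.

You can make this more robust by looking for 12345 only in the second column (the PID), either by splitting each line on whitespace: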

cols = re.split(r"\s+", line)
if len(cols) > 1 and cols[1] == "12345":

or by using a better regular expression, for example:

if re.match(r"\S+\s+12345\s", line):
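Putting the pieces above together, a minimal sketch might look like the following. The function name `jvm_name_for_pid` and the sample `ps -ef` text are hypothetical, made up for illustration. The sketch assumes the PID is the second column, and escapes the PID with `re.escape` so it can be passed in dynamically.

```python
import re


def jvm_name_for_pid(ps_output, pid):
    """Return the JVM name for `pid` from `ps -ef` style text, or None.

    Hypothetical helper: assumes the PID is the second column and that the
    command line contains configurationFile=<dir>/<name>.<rest>.
    """
    # Match the PID only in the second column, not anywhere in the line.
    pid_re = re.compile(r"\S+\s+" + re.escape(str(pid)) + r"\s")
    # Capture what follows the last slash of the argument, up to the first dot.
    name_re = re.compile(r"configurationFile=\S*/([^./\s]+)")
    for line in ps_output.splitlines():
        if pid_re.match(line):
            found = name_re.search(line)
            if found:
                return found.group(1)
    return None


# Made-up sample resembling `ps -ef` output.
sample = (
    "UID        PID  PPID  C STIME TTY          TIME CMD\n"
    "app      12345     1  0 10:00 ?        00:00:05 java "
    "-Dlogback.configurationFile=/app/log/conf/the_jvm_name.4021.logback.xml -jar x.jar\n"
    "app      22222     1  0 10:00 ?        00:00:00 grep 12345\n"
)
print(jvm_name_for_pid(sample, 12345))  # the_jvm_name
```

Because the PID is anchored to the second column, the `grep 12345` line in the sample is not mistaken for the target process.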

Note that you can also shorten the pipeline like this:

ps -ef | sed -nE '/12345/ { s/.*configurationFile=.*\/([^.]*).*/\1/; p }'

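Your shell command works, but it has to deal with too many output lines and too many fields per line. A simpler solution is to tell the ps command to give you only one line and, on that line, only the one field you care about. For example, on my system: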

ps -o cmd h 979

outputs:

/usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3

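The -o cmd flag outputs only the cmd column, and the h argument tells ps to omit the header. Finally, 979 is the process ID, which tells ps to output information for that process only.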

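This output is not exactly the same as the one in your question, but it is very similar. Once the output is limited this way, the other commands such as grep, awk, ... are no longer needed. At this point, we can use a regular expression to extract what we want: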

from __future__ import print_function
import re
import subprocess

pid = '979'
command = ['ps', '-o', 'cmd', 'h', pid]
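# universal_newlines=True returns text, so the str pattern below also works on Python 3
output = subprocess.check_output(command, universal_newlines=True)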

pattern = re.compile(r"""
    config-file=  # Literal string search
    .+\/          # Everything up to the last forward slash
    ([^.]+)       # Non-dot chars, this is what we want
""", re.VERBOSE)

matched = pattern.search(output)

if matched:
    print(matched.group(1))

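Notes

For the regular expression I used the verbose form, which lets me annotate the pattern with comments. I like it this way because regular expressions can be hard to read.
On your system, adjust the config-file= part of the pattern to match your own output.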
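One way to make that adjustment is to pass the option name in as a parameter. This is only a sketch: the function `extract_name` and its `key` parameter are hypothetical names introduced here, and the pattern uses `\S+` instead of `.+` (a small deviation from the code above) so that the match cannot run past the end of the argument into later ones.

```python
import re


def extract_name(cmdline, key="config-file"):
    """Return the text after the last '/' and before the first '.', or None.

    `key` is the literal option name to look for. It is a hypothetical
    parameter added so the same pattern can serve different ps outputs.
    """
    pattern = re.compile(r"""
        {key}=     # Literal option name (escaped)
        \S+/       # Everything up to the last forward slash of this argument
        ([^.\s]+)  # Non-dot, non-space chars: this is what we want
    """.format(key=re.escape(key)), re.VERBOSE)
    matched = pattern.search(cmdline)
    return matched.group(1) if matched else None


dbus = ("/usr/bin/dbus-daemon "
        "--config-file=/usr/share/defaults/at-spi2/accessibility.conf "
        "--nofork --print-address 3")
jvm = ("java -Dlogback.configurationFile="
       "/app/log/conf/the_jvm_name.4021.logback.xml -cp /opt/lib/x.jar Main")

print(extract_name(dbus))                           # accessibility
print(extract_name(jvm, key="configurationFile"))   # the_jvm_name
```

The second sample command line is made up to resemble the one described in the question.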

The simplest approach is to split and parse the subprocess output in Python, rather than relying on grep + tr + grep + awk.
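As a rough illustration of that idea, the sketch below mirrors each stage of the original pipeline with plain string operations. The function name `jvm_name_from_cmdline` and the sample command line are hypothetical, and the sketch assumes the path argument contains no spaces.

```python
def jvm_name_from_cmdline(cmdline, marker="logback.configurationFile"):
    """Return the JVM name found in a process command line, or None."""
    for token in cmdline.split():               # like: tr " " "\n"
        if marker in token:                     # like: grep logback.configurationFile
            filename = token.rsplit("/", 1)[-1]  # like: awk -F"/" '{print $NF}'
            return filename.split(".")[0]       # like: cut -d. -f1
    return None


# Made-up command line resembling the one described in the question.
cmdline = ("java -Dlogback.configurationFile="
           "/app/log/conf/the_jvm_name.4021.logback.xml -jar app.jar")
print(jvm_name_from_cmdline(cmdline))  # the_jvm_name
```

The command line itself could come from `subprocess.check_output` with `universal_newlines=True`, so that the result is text rather than bytes.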
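I have already tried that, but it breaks the output. Can we use a regular expression?

Tom, the number 12345 is dynamic; it keeps changing depending on the output I get from another command. It is a process ID. I will check this. Thanks.

@sdgd In that case you can build the regular expression dynamically, or simply split the line into columns and compare directly (I updated the answer).

Thanks for the detailed answer. In the regular expression, shouldn't we also write the ending part? Something like a left boundary and a right boundary: configurationFile=/app/log/conf/the_jvm_name.4021.logback.xml. I need what lies between configurationFile=/app/log/conf and .4021.logback.xml, because the number 4021 is also dynamic. So in the end the match should extend all the way to xml.

We could do that, but if this works, why do more?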
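For completeness, the left and right boundaries raised in the comments could be written roughly as follows. This is a sketch, not part of the original answer: the names `BOUNDED` and `bounded_jvm_name` are hypothetical, and the pattern assumes the file name always has the form name.number.logback.xml.

```python
import re

# Hypothetical pattern with an explicit right boundary.
BOUNDED = re.compile(r"""
    configurationFile=   # Left boundary: literal option name
    \S*/                 # Directory part, up to the last slash
    ([^./\s]+)           # The JVM name
    \.\d+                # The dynamic number (4021 in the question)
    \.logback\.xml       # Right boundary: fixed suffix
""", re.VERBOSE)


def bounded_jvm_name(text):
    """Return the JVM name only if the whole file name has the expected shape."""
    found = BOUNDED.search(text)
    return found.group(1) if found else None


line = "-Dlogback.configurationFile=/app/log/conf/the_jvm_name.4021.logback.xml"
print(bounded_jvm_name(line))  # the_jvm_name
```

The stricter pattern rejects file names that lack the numeric part or the logback.xml suffix, which the shorter pattern would accept.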