
How do I extract the values that follow each field label using Python 3?

python regex python-3.x


This is my sample record:

Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never

I want to extract the value that follows each label. The output must look like this:

9211
Administrator first
Administrator
first
Administrator@example.com
1999-12-23 3:8:52
2000-06-10 4:8:55
Never

The words Administrator first must be extracted and also separated into individual words, as shown above.
To extract the User name from the sample, I tried the following, but got no output:

re.findall(r'User name:           (\w+)', i)

Please tell me how I can do this. The result should contain only the extracted data, without the whitespace that comes before it.


You can split each line with the string .split() method, which returns a list. For example, if I split the phrase "Good People" on " " (a space), I get a list with two items: index 0 is "Good" and index 1 is "People".

I may not have explained this very well, so you may want to look at other posts about the split method.
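Here is a minimal sketch of the split() idea described above. It shows that split() is called on a string (not on a list) and returns a list. The variable names are only for illustration.

```python
# str.split is called on a string and returns a new list.
phrase = "Good People"
parts = phrase.split(" ")
print(parts)     # ['Good', 'People']
print(parts[0])  # Good
print(parts[1])  # People

# The same idea on one line of the record: split once at the first ':'
# to separate label and value, strip the padding, then split the value on whitespace.
line = "User name:           Administrator first"
value = line.split(":", 1)[1].strip()
print(value)          # Administrator first
print(value.split())  # ['Administrator', 'first']
```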

You can use a dict comprehension:

import re

string = """
Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never
"""

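# key: everything before the first ':' on a line; value: the rest of the line after any whitespace
rx = re.compile(r'^(?P<key>[^:\n]+):\s*(?P<value>.+)', re.MULTILINE)
# build {label: value} from every matching line
result = {m.group('key'): m.group('value') for m in rx.finditer(string)}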
print(result)
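One way to get from this dictionary to the exact output requested in the question is to walk the dict in insertion order (guaranteed since Python 3.7) and also emit the individual words of the User name value. This is a sketch and not part of the original answer. The sample string is repeated so the block runs on its own.

```python
import re

string = """
Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never
"""

rx = re.compile(r'^(?P<key>[^:\n]+):\s*(?P<value>.+)', re.MULTILINE)
result = {m.group('key'): m.group('value') for m in rx.finditer(string)}

output = []
for key, value in result.items():      # insertion order = order in the record
    output.append(value)
    if key == 'User name':
        output.extend(value.split())   # also list each word of the user name

print("\n".join(output))
```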

You can use a simple approach:

text = """Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never"""

# cut text at newline chars
for line in text.splitlines():
    # find the first ':'
    idx = line.index(':')
    # remove spaces from the start
    strippedLine = line[idx+1:].lstrip()
    if 'User name' in line:
        print(strippedLine)
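The same "cut at the first colon" idea can also be written with str.partition. It splits only at the first ':', so colons inside timestamps such as 3:8:52 stay in the value. Unlike line.index(':'), it does not raise on a line that has no ':'. This sketch collects every value instead of only the user name. Using partition here is a suggestion and was not part of the answer.

```python
text = """Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never"""

values = []
for line in text.splitlines():
    # partition returns (text before first ':', ':', text after it);
    # for a line without ':' the last part is simply ''.
    _, _, rest = line.partition(':')
    values.append(rest.strip())

print(values)

# Why "first colon only" matters: a plain split(':') also cuts the timestamp apart.
created = 'When created:         1999-12-23 3:8:52'
print(len(created.split(':')))           # 4
print(created.split(':', 1)[1].strip())  # 1999-12-23 3:8:52
```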

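Use

r'User name:\s*(\w+\s*\w*)'

as the regex string. The problem seems to be the whitespace between the field name and the value. There is also whitespace between the first and last word of the value (for values that have more than one word), which is why the pattern allows \s* in both places.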
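This short sketch uses the regex from this answer with re.search and re.findall and stores the result in a variable. Keep two caveats in mind. First, \w+\s*\w* captures at most two words. Second, \s* can also match a newline, so a one-word user name would pick up the first word of the next line. The second pattern, [ \t]*(.+), is an alternative suggested here and was not part of the answer.

```python
import re

sample_string = """Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never"""

# re.search returns the first match object (or None); group(1) is the captured value.
match = re.search(r'User name:\s*(\w+\s*\w*)', sample_string)
if match:
    user_name = match.group(1)
    print(user_name)          # Administrator first
    print(user_name.split())  # ['Administrator', 'first']

# re.findall returns a list of every captured value instead.
all_user_names = re.findall(r'User name:\s*(\w+\s*\w*)', sample_string)
print(all_user_names)         # ['Administrator first']

# Alternative pattern: [ \t]* cannot cross a line break, and (.+) takes the
# rest of that line, however many words it has.
line_match = re.search(r'User name:[ \t]*(.+)', sample_string)
full_value = line_match.group(1) if line_match else None
print(full_value)             # Administrator first
```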

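Why not split on ':' and then strip()? If the split produces more than 2 items, discard the first one and join the rest back together. If there are exactly 2 items, you only need the second one (and then strip it).

@mpf82 No need to join:

line.split(':', 1)[1].strip()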

@Chris_Rands But what about lines with a time (or any other line with more than one ':')? e.g. 2000-06-10 4:8:55

@mpf82 It works, because I'm using the second argument of str.split(). Try

'When created:1999-12-23 3:8:52'.split(':', 1)[1].strip()

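@Chris_Rands Right, sorry, I didn't notice you set the maxsplit parameter on split.

You don't "use the split method on a list". You use the split method on a string to get a list.

What if I only need the user name?

I modified the answer to address that.

What if I have a file of records like this? I tried going through it line by line but got nothing. Could you help by adding something to the answer?

@JafferWilson: Do you have all of the items every time?

Most of the time, yes.

@JafferWilson: Updated; I introduced a class.

Could you add a line of code that shows how to use re.search or re.findall with the regex and store the output in a variable? Please let me know if that's possible.

In the Python shell, after importing re and putting the provided sample into sample_string:

>>> user_name_regex = re.compile(r'User name:\s*(\w+\s*\w*)')
>>> found_user_names = user_name_regex.findall(sample_string)
>>> found_user_names
['Administrator first']

so you can get it from found_user_names[0].

Thanks for the reply.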
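The comments ask about a file that contains many such records. The class mentioned there is not shown on this page, so what follows is only a sketch under assumptions. It assumes each record starts with a 'Record ID' line, and it uses io.StringIO as a stand-in for the real file. The filename records.txt and the second, shorter record are hypothetical. Each record becomes a dict, so a missing field comes back as None from .get() instead of raising an error.

```python
import io

# Stand-in for a real file; in practice: open("records.txt", encoding="utf-8").
# The filename and the second (shorter) record are hypothetical.
sample_file = io.StringIO("""Record ID:           9211
User name:           Administrator first
User principal name: Administrator@example.com
When created:         1999-12-23 3:8:52
When changed:         2000-06-10 4:8:55
Account expires:      Never

Record ID:           9212
User name:           Guest user
Account expires:      Never
""")


def parse_records(lines):
    """Yield one dict per record, assuming every record starts with 'Record ID'."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if ':' not in line:
            continue  # skip blank lines and anything that is not "label: value"
        key, _, value = line.partition(':')  # split at the first ':' only
        key, value = key.strip(), value.strip()
        if key == 'Record ID' and record:
            yield record  # a new record begins, so the previous one is complete
            record = {}
        record[key] = value
    if record:
        yield record  # the last record in the file


records = list(parse_records(sample_file))
for rec in records:
    # .get() returns None instead of raising when a field is missing
    print(rec.get('Record ID'), rec.get('User name'), rec.get('User principal name'))
```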