What is the most stable and most Pythonic cross-platform way to split a string into three separate variables: path, filename and ext?


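I am trying to split a string into three variables, so that C:\Example\readme.txt is read as C:\Example, readme and .txt, for use in a script. The script may be deployed in both Windows and Unix environments, and it may have to handle either Windows or Unix paths, so I need a method that satisfies both requirements. I have read about several functions that do something similar, but I would like some guidance on how best to handle a single string within a function.

*Note: I am running IronPython 2.6 in this environment, and I am not sure whether it differs enough from standard Python 2.7 that I would need to adjust my usage.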

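Edit: I know I can use os.path.splitext to get the extension from a filename, but finding a platform-independent way to get the path and the filename (which I would then pass to splitext) has me puzzled.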

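I tend to use the os.path module, whose implementation depends on the operating system you are running. Importing os.path, however, should always find the right one. If you prefer, you can check manually which operating system is in use: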

import platform
platform.platform()

Then import the appropriate path toolkit from os. But simply importing os.path really is much easier.

So, what you are interested in is:

os.path.basename(path) # To get the name of the file with extension.
os.path.splitext(os.path.basename(path))[0] # To get just the name, even when it contains several dots.
os.path.dirname(path) # To get the directory leading to the file.

Hope this helps.

Caveat: I do not guarantee that this is the best way.

You want os.path.split and os.path.splitext. Next time, please take the time to read the documentation; it will be faster than posting here.
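As a minimal sketch of how split and splitext could be combined into the three variables the question asks for: the function name split_path is hypothetical, and using ntpath (the Windows flavour of os.path, which accepts both \ and / as separators on any host) is an assumption made here so that both path styles are handled on either platform. The trade-off is that a Unix filename containing a literal backslash would be split at that backslash.

```python
import ntpath  # Windows flavour of os.path; importable on any platform


def split_path(path):
    """Hypothetical helper: return (directory, name, ext) for a path string."""
    # ntpath.split cuts at the last '\\' or '/', whichever comes last.
    directory, filename = ntpath.split(path)
    # splitext separates the final extension (including its dot) from the name.
    name, ext = ntpath.splitext(filename)
    return directory, name, ext


win = split_path(r'C:\Example\readme.txt')
nix = split_path('/this/is/a/unix/folder.txt')
print(win)  # ('C:\\Example', 'readme', '.txt')
print(nix)  # ('/this/is/a/unix', 'folder', '.txt')
```

If only paths native to the host system need to be handled, replacing ntpath with os.path in this sketch should be enough.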

Taking ChrisP's comment into account, I have modified my code:

import re
from os.path import sep
rs = re.escape(sep)

basepat = ('(/?.*?)(?=%s?[^%s]*\Z)'
           '(?:%s([^.]*)(\.[^.]+)?)?\Z')

print '* On a Windows platform'
sep = '\\'
print 'sep: %s  repr(s): %r' % (sep,sep)
print 'rs = re.escape(sep)'
rs = re.escape(sep)
print 'rs: %s   repr(rs): %r' % (rs,rs)
rgx = re.compile(basepat % (rs,rs,rs))
for fn in (r'C:\Example\readme.txt',
           r'C:\Example\.txt',
           r'C:\Example\readme',
           'C:\Example\\readme\\',
           'C:\Example\\rod\pl\\',
           'C:\Example\\rod\p2',
           r'C:\Egz\rod\pl\zu.pdf',
           'C:\Example\\',
           'C:\Example',
           'C:\\'):
    m = rgx.match(fn)
    if m:  print '%-21s  %r' %(fn,m.groups(''))
    else:  print fn
print

print '\n* On a Linux platform'
sep = '/'
print 'sep: %s  repr(s): %r' % (sep,sep)
print 'rs = re.escape(sep)'
rs = re.escape(sep)
print 'rs: %s   repr(rs): %r' % (rs,rs)
rgx = re.compile(basepat % (rs,rs,rs))
for fn in ('/this/is/a/unix/folder.txt',
           '/this/is/a/unix/.txt',
           '/this/is/a/unix/folder',
           '/this/is/a/unix/folder/',
           '/this/', 
           '/this'):
    m = rgx.match(fn)
    if m:  print '%-21s  %r' %(fn,m.groups(''))
    else:  print fn

Result

* On a Windows platform
sep: \  repr(s): '\\'
rs = re.escape(sep)
rs: \\   repr(rs): '\\\\'
C:\Example\readme.txt  ('C:\\Example', 'readme', '.txt')
C:\Example\.txt        ('C:\\Example', '', '.txt')
C:\Example\readme      ('C:\\Example', 'readme', '')
C:\Example\readme\     ('C:\\Example\\readme', '', '')
C:\Example\rod\pl\     ('C:\\Example\\rod\\pl', '', '')
C:\Example\rod\p2      ('C:\\Example\\rod', 'p2', '')
C:\Egz\rod\pl\zu.pdf   ('C:\\Egz\\rod\\pl', 'zu', '.pdf')
C:\Example\            ('C:\\Example', '', '')
C:\Example             ('C:', 'Example', '')
C:\                    ('C:', '', '')


* On a Linux platform
sep: /  repr(s): '/'
rs = re.escape(sep)
rs: \/   repr(rs): '\\/'
/this/is/a/unix/folder.txt  ('/this/is/a/unix', 'folder', '.txt')
/this/is/a/unix/.txt   ('/this/is/a/unix', '', '.txt')
/this/is/a/unix/folder  ('/this/is/a/unix', 'folder', '')
/this/is/a/unix/folder/  ('/this/is/a/unix/folder', '', '')
/this/                 ('/this', '', '')
/this                  ('/this', '', '')

C:\Example\readme.txt       ('C:\\Example', 'readme', '.txt')
C:\Example\.txt             ('C:\\Example', '', '.txt')
C:\Example\readme           ('C:\\Example', 'readme', '')
C:\Example\readme\          ('C:\\Example\\readme', '', '')
C:\Example\rod\pl\          ('C:\\Example\\rod\\pl', '', '')
C:\Example\rod\p2           ('C:\\Example\\rod', 'p2', '')
C:\Egz\rod\pl\zu.pdf        ('C:\\Egz\\rod\\pl', 'zu', '.pdf')
C:\Example\                 ('C:\\Example', '', '')
C:\Example                  ('C:', 'Example', '')
C:\                         ('C:', '', '')
/this/is/a/unix/folder.txt  ('/this/is/a/unix', 'folder', '.txt')
/this/is/a/unix/.txt        ('/this/is/a/unix', '', '.txt')
/this/is/a/unix/folder      ('/this/is/a/unix', 'folder', '')
/this/is/a/unix/folder/     ('/this/is/a/unix/folder', '', '')
/this/                      ('/this', '', '')
/this                       ('/this', '', '')
\machine\share\folder       ('\\machine\\share', 'folder', '')
c:/folderolder2            ('c:', 'folder\x0colder2', '')
c:\folder\..\folder2        ('c:\\folder\\..', 'folder2', '')
c:\folder\..\fofo2.txt      ('c:\\folder\\..', 'fofo2', '.txt')
c:\folder\..\ki/fofo2.txt   ('c:\\folder\\..\\ki', 'fofo2', '.txt')

basepat is the same for both cases, Windows and Linux, in the code above. So the real code would be:

import re
from os.path import sep
rs = re.escape(sep)
rgx = re.compile('(/?.*?)(?=%s?[^%s]*\Z)'
                 '(?:%s([^.]*)(\.[^.]+)?)?\Z'
                 % (rs,rs,rs))
etc...
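
One way the "etc..." part might be filled in is sketched below. The function name split3 and its sep parameter are hypothetical additions for illustration; the pattern itself is the one shown above, written with raw strings so that it also runs without warnings on recent Python versions. Passing sep explicitly makes it possible to try both separators on one machine.

```python
import re
import os.path


def split3(path, sep=os.path.sep):
    """Hypothetical wrapper: return (directory, name, ext) or None if no match."""
    rs = re.escape(sep)
    # Same pattern as above; adjacent string literals are joined before % is applied.
    rgx = re.compile(r'(/?.*?)(?=%s?[^%s]*\Z)'
                     r'(?:%s([^.]*)(\.[^.]+)?)?\Z'
                     % (rs, rs, rs))
    m = rgx.match(path)
    # groups('') replaces optional groups that did not take part with ''.
    return m.groups('') if m else None


print(split3(r'C:\Example\readme.txt', '\\'))     # ('C:\\Example', 'readme', '.txt')
print(split3('/this/is/a/unix/folder.txt', '/'))  # ('/this/is/a/unix', 'folder', '.txt')
```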

The regex pattern

'(.*)\\\\([^.\\\\]*)(\.[^.]+)?\Z'

is easier to read in the following form:

('(.*)'
 '\\\\'
 '([^.\\\\]*)'
 '(\.[^.]+)?'
 '\Z')

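1) (.*) means "as many characters as possible, and this sequence of characters must be kept as group(1)".

* is a quantifier. When it is not followed by ?, it is a greedy quantifier -> "as many as possible…"
The parentheses are the symbols that delimit the matched sequence of characters to keep in a group. Since these parentheses are the first in the pattern, the group is numbered 1.
Note that the character \ must be present in the analyzed text, which is expressed by \\\\ in the pattern. As a result, the greediness of .* is limited: the regex engine stops before the last \ in the text to keep a match possible for the compiled regex, which would not be the case if .* greedily consumed the whole analyzed string.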
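
The way a mandatory backslash limits the greedy group can be checked with a small sketch. The simplified raw-string pattern and the sample path are illustrative assumptions, not the answer's full pattern:

```python
import re

# Greedy group, then one literal backslash, then the rest of the string.
# In a raw string, \\ is the regex symbol for a single backslash character.
rgx = re.compile(r'(.*)\\(.*)\Z')

m = rgx.match(r'C:\Egz\rod\pl\zu.pdf')
head, tail = m.groups()
print(head)  # C:\Egz\rod\pl
print(tail)  # zu.pdf
```

The first group first swallows the whole string, then gives characters back until the required backslash can match, which is why it ends just before the last backslash.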

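2) \\\\ : when re.compile() sees this sequence, it interprets it as "a \ character must be at this position in the analyzed string".
Why the character \ has to be written with 4 characters, \\\\, in the string that represents the regex pattern is harder to understand.

The re.compile() function must be given regex symbols. But the only way to specify symbols to re.compile() is to pass it a string as an argument. That is why I wrote "the string that represents the regex pattern" above: re.compile() does not process the string as such; it compiles a series of symbols that it first obtains by interpreting the pattern string.
For ordinary characters it is simple: the symbol for such a character is written as the character itself.

For characters that have a special meaning for the regex engine, such as *, +, ?, etc., the literal character must be written with the escape character \ : thus \. means "a dot", \? means "a question mark", and so on.

But the \ character itself has a special meaning in a string: it escapes the character that follows it. For example, in '\'', the character \ escapes the ' character so that no error is raised.

The problem is that the string character \ is consumed by this escaping. So if aaa\\bbb is written in the string of a regex pattern, the double backslash is interpreted as representing the single string character \, not the regex symbol for the character \.

Hence \\ is not enough to symbolize the character \ for the regex engine; that symbolization is done with 4 string characters: \\\\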
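
The two layers of escaping can be observed directly. This is a small sketch, in which the variable names and the sample path are assumptions made for illustration; it also shows the raw-string spelling, which avoids doubling the backslashes at the string level:

```python
import re

plain = '\\\\'  # four characters typed in the source code
raw = r'\\'     # raw-string spelling of the same pattern

print(len(plain))    # 2: the string object holds two backslash characters
print(plain == raw)  # True

# The regex engine reads those two characters as "one literal backslash".
found = re.findall(plain, 'C:\\Example\\readme.txt')
print(len(found))    # 2: the path contains two backslashes

# A pattern string holding a single backslash is an incomplete escape
# for the regex engine, so compiling it fails.
try:
    re.compile('\\')
    compiled = True
except re.error:
    compiled = False
print(compiled)      # False
```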

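3)

[a4:] means "one character that can be a or 4 or :".

[^a4:] means "any character except one of the three characters a, 4 and :".

Then [^.\\\\]* means "any character except a dot and the character \", and the star means "the character defined just before may be repeated, or be absent".
Note that inside the brackets the dot loses its special meaning, so there is no need to escape it.

Since this is between two parentheses, the consecutive characters matched by [^.\\\\]* are kept in the second group.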
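
A short sketch of these character classes in action; the sample strings are assumptions chosen for illustration, and the patterns are written as raw strings, so the backslash inside the class is typed twice rather than four times:

```python
import re

# [a4:] matches one character that is 'a', '4' or ':'.
inside = re.findall(r'[a4:]', 'a4:b5;')
print(inside)   # ['a', '4', ':']

# [^a4:] matches one character that is none of those three.
outside = re.findall(r'[^a4:]', 'a4:b5;')
print(outside)  # ['b', '5', ';']

# [^.\\]* matches a run, possibly empty, of characters that are
# neither a dot nor a backslash; matching stops at the first dot.
stem = re.match(r'[^.\\]*', 'readme.txt').group()
print(stem)     # readme
```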
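5)

(\.[^.]+)? means "a dot followed by a series of characters, each of which can be any character except a dot". Here the dot in the pattern must be escaped so that it symbolizes "a dot character" and not "any character".

Because of the +, the dot (if present in the analyzed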