PowerShell: split lines into words and save as a new file

Suppose I have a text file test.txt on the C: drive:

On the face of things, we seem to be merely talking about text-based files, containing only 
the letters of the English Alphabet (and the occasional punctuation mark).
On deeper inspection, of course, this isn't quite the case. What this site
offers is a glimpse into the history of writers and artists bound by the 128 
characters that the American Standard Code for Information Interchange (ASCII) 
 allowed them. The focus is on mid-1980's textfiles and the world as it was then, 
but even these files are sometime retooled 1960s and 1970s works, and offshoots 
 of this culture exist to this day.

I want to split all the lines into words and save them as a new file. In the new file, each line contains only one word.

So the new file would be:

       On
       the
       face
       of
       things
       we
       seem
       to
       ....

The delimiter is whitespace; please skip all punctuation.

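You haven't even tried. Next time I'll vote to close the question. PowerShell uses 99% of C# syntax, and "all" the .NET classes are available, so if you know C#, you can get a long way in PowerShell by spending 5 minutes on Google and trying a few commands.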

#create array
$words = @()

#read file
$lines = [System.IO.File]::ReadAllLines("C:\Users\Frode\Desktop\in.txt")

#split words
foreach ($line in $lines) {
    # Cast to char[] so each of space, comma, and period is its own delimiter
    $words += $line.Split([char[]]" ,.", [System.StringSplitOptions]::RemoveEmptyEntries)
}

#save words
[System.IO.File]::WriteAllLines("C:\Users\Frode\Desktop\out.txt", $words)

In PowerShell you can also do it like this:

Get-Content .\in.txt | ForEach-Object { 
    $_.Split([char[]]" ,.", [System.StringSplitOptions]::RemoveEmptyEntries)
} | Set-Content out.txt
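
A related sketch, assuming the same one-word-per-line goal: how `.Split()` treats a plain string separator can differ between Windows PowerShell and PowerShell 7, so the `-split` operator with a regex character class is one way to sidestep that. The sample lines below are hypothetical stand-ins for the output of Get-Content.

```powershell
# Hypothetical sample input; in practice this would come from Get-Content.
$lines = @(
    'On the face of things, we seem',
    ' allowed them. The focus'
)

# -split takes a regex: break on any run of whitespace, commas, or periods.
# Where-Object { $_ } drops the empty string left by a leading separator.
$words = $lines |
    ForEach-Object { $_ -split '[\s,.]+' } |
    Where-Object { $_ }

$words
```

Piping `$words` to Set-Content would then write one word per line, as in the pipeline above.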

Here is a solution using regular expressions, which will:

  • Remove special characters
  • Parse words based on word boundaries (\b in regex)

Code:

$Text = @'
On the face of things, we seem to be merely talking about text-based files, containing only 
the letters of the English Alphabet (and the occasional punctuation mark).
On deeper inspection, of course, this isn't quite the case. What this site
offers is a glimpse into the history of writers and artists bound by the 128 
characters that the American Standard Code for Information Interchange (ASCII) 
 allowed them. The focus is on mid-1980's textfiles and the world as it was then, 
but even these files are sometime retooled 1960s and 1970s works, and offshoots 
 of this culture exist to this day.
'@;

# Remove special characters
$Text = $Text -replace '\(|\)|''|\.|,','';
# Match words
$MatchList = ([Regex]'(?<word>\b\w+\b)').Matches($Text);
# Get just the text values of the matches
$WordList = $MatchList | % { $PSItem.Groups['word'].Value; };
# Examine the 'Count' of words
$WordList.Count
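
If stripping the apostrophe out of words such as isn't is not wanted, one possible variation is to let the pattern accept apostrophes and hyphens between word characters. This is an assumption about what should count as a word, not part of the answer above; the sample string and the temp-file path are hypothetical.

```powershell
# Hypothetical sample text with an apostrophe, a hyphen, and parentheses.
$sample = "this isn't quite the case (ASCII) mid-1980's text-based files."

# \w+ starts a word; (?:['-]\w+)* lets it continue across an in-word ' or -.
$pattern = "\w+(?:['-]\w+)*"
$wordList = @([regex]::Matches($sample, $pattern) | ForEach-Object { $_.Value })

# Write one word per line to a throwaway file (path is illustrative only).
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'words-sketch.txt'
$wordList | Set-Content $outFile
```

With this pattern, parentheses and trailing punctuation are simply never matched, so no separate removal step is needed.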

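I wouldn't bother splitting the string, since the result is going to be written back to a file anyway. Just replace all punctuation (and perhaps parentheses) with spaces, replace every run of consecutive whitespace with a newline, and write everything back to a file: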

$in  = 'C:\test.txt'
$out = 'C:\test2.txt'

(Get-Content $in | Out-String) -replace '[.,;:?!()]',' ' -replace '\s+',"`r`n" |
  Set-Content $out
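
One detail worth checking with this approach: Out-String ends its output with a newline, so the `\s+` replacement can leave an empty last line in the result. A minimal sketch of trimming first, using a hypothetical in-memory string in place of the file contents:

```powershell
# Hypothetical stand-in for (Get-Content $in | Out-String).
$text = "On the face, of things.`r`n allowed them (ASCII).`r`n"

# Punctuation -> space, trim the ends, then collapse whitespace runs to CRLF.
$result = ($text -replace '[.,;:?!()]', ' ').Trim() -replace '\s+', "`r`n"

$result
```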

Have you tried to solve this yourself? Can you share what you tried and what the results were?

I'm not good at PowerShell. I did it in C#, but the code isn't concise.

Good answer, but for some words, such as "isn't", it will split on the apostrophe. That doesn't seem like the ideal result, since "isn" and "t" aren't words. In my answer, I use a regex to strip all special characters first, and then match the words.

That's a good point. Turning isn't into isnt is a problem too, though. A challenging problem. It may need more than one regex.

Agreed, it has its own problems, but it meets the asker's specification.