Linux sed does not work with special characters
I have a cipher in which some digits are encoded as non-ASCII characters. I am using sed to encode the file, but when I try to replace these characters with sed I get unexpected results. The input looks like this:
¤»¤ ¡ 1 3
3ô1ô ôôôôô1ô
ôôôô
ôôôôô¤ôôôôô»ôôôôô¤ôôôôôô ô¡ ô 1 3ô
The commands I tried are:
sed -r 's/`echo ô`/5/g' new.txt
sed -r 's/\ô/5/g' new.txt
And with Perl:
perl -pe 's/\ô/5/g' < new.txt
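I need help. Thanks.

I think the way to solve this is to first get the characters in an unambiguous form (in both files). Then walk through the mapping file, adding each unambiguous character to a hash together with its mapped value. Finally, loop through the unambiguous sample characters (each one is 16 characters long in its unambiguous form) and replace each with its hash value. This can go wrong if the sample file contains ASCII characters, because their unambiguous form is not 16 characters long. You may need to fix this depending on your input, but if your sample text is representative of your actual file there will be no problem. If the result is not what you expect, let me know.

Run it like this: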
./translate.pl CharMap.txt sample.txt
Contents of translate.pl:
#!/usr/bin/perl
use strict;
use warnings;
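# open both files for reading so the script dies early if either is missing.
# the handles are not read from; the data itself is read through sed below.
# $ARGV[0] is the first file listed, 'CharMap.txt'
# $ARGV[1] is the second file listed, 'sample.txt'
open(CHARMAP, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(SAMPLE, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";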
# execute `sed -n 'l0'` on each file and capture output into two arrays
# the '-n' flag suppresses printing of pattern space
# the 'l0' command simply means print the pattern space in an unambiguous form
my @charmap = `sed -n 'l0' $ARGV[0]`;
my @sample = `sed -n 'l0' $ARGV[1]`;
# declare a hash
my %charhash;
# loop through the array of character mappings
for (@charmap) {
    # use a subroutine to sanitize each element
    $_ = sanitize($_);
    # the first character of the line is the replacement value; everything
    # from offset 2 onward is the unambiguous form, which becomes the hash key
    $charhash{ substr $_, 2 } = substr $_, 0, 1;
}
# now loop through the unambiguous sample data
# in your sample file there is only a single element so the loop is unnecessary
for (@sample) {
    # use a subroutine to sanitize each element
    $_ = sanitize($_);
    # each encoded character is 16 readable characters long in its unambiguous
    # form, so we walk through the line 16 characters at a time, capturing each
    # chunk in $1, look up $1 in the hash and print the value stored there
    print $charhash{$1} while $_ =~ /(.{16})/g;
    # print a newline to replace the chomped one
    print "\n";
}
close CHARMAP;
close SAMPLE;
sub sanitize {
    # read in the element passed to the subroutine
    my $line = shift;
    # remove the newline ending
    chomp $line;
    # both files started with the 12-character sequence \357\273\277, which is
    # how sed prints the bytes EF BB BF (the UTF-8 byte order mark). It is
    # invisible in most editors. For convenience it is removed from every
    # line, even though it was only found on the first line.
    $line =~ s/^\\357\\273\\277//;
    # trim off the '$' that sed's l command appends to mark the end of a line
    $line =~ s/\$$//;
    # trim off a trailing carriage return, which sed prints as the two characters \r
    $line =~ s/\\r$//;
    return $line;
}
Result:
3177191281013,997,094
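To see what the "unambiguous form" looks like before relying on the 16-character chunk size, it can help to run sed's l command on a few known bytes. The following is a minimal sketch, not part of translate.pl: the function name show_unambiguous is made up for illustration, and the input bytes are assumptions (the two-byte UTF-8 encoding of ô, and a byte order mark followed by a digit and a Windows line ending). Each escaped byte takes four characters, so a 16-character unit corresponds to four bytes in the file.

```shell
# show_unambiguous (hypothetical helper): print stdin the way `sed -n l` does.
# LC_ALL=C makes sed treat every byte above 0x7F as non-printable,
# so each such byte is shown as a backslash followed by three octal digits.
show_unambiguous() {
    LC_ALL=C sed -n 'l'
}

# 'ô' encoded as UTF-8 is the two bytes 0xC3 0xB4 (octal 303 264).
printf '\303\264\n' | show_unambiguous

# A UTF-8 byte order mark (0xEF 0xBB 0xBF), the digit 1, then CR LF.
printf '\357\273\2771\r\n' | show_unambiguous
```

With GNU sed the first command prints \303\264$ and the second prints \357\273\2771\r$, which shows where the byte order mark, the trailing $ and the \r removed by sanitize come from.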
See the sed documentation for more information about sed l0.

This works well, but I don't know Perl, so could you please walk me through the code?
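For readers who would rather stay in the shell, a single character can sometimes be replaced directly by its bytes. This is only a sketch under stated assumptions: the function name replace_bytes is made up for illustration, the byte sequence (octal 303 264, ô in UTF-8) is an assumption that should be checked against the real file with sed -n 'l' first, and the sequence must not contain regex metacharacters or a slash.

```shell
# replace_bytes OCTAL_ESCAPES REPLACEMENT (hypothetical helper):
# replace every occurrence of a byte sequence read from stdin.
replace_bytes() {
    # Build the raw bytes with printf rather than typing the character, so the
    # pattern does not depend on the encoding of the terminal or script file.
    pattern=$(printf "$1")
    # Double quotes let the shell expand $pattern. Inside single quotes, as in
    # 's/`echo ô`/5/g', nothing is expanded and sed receives the text literally.
    # LC_ALL=C makes sed match byte by byte instead of decoding characters.
    LC_ALL=C sed "s/$pattern/$2/g"
}

# Sample line "3ô1ô" written as explicit bytes; every ô becomes 5.
printf '3\303\2641\303\264\n' | replace_bytes '\303\264' 5
```

On this input the command prints 3515. For a whole mapping table the Perl script above remains the more practical route, since it handles every entry in one pass.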