Is there a way with Perl (or Python) and Excel to determine the font type used in multi-line text in a cell?
My data comes from an Excel file in which some cells contain strings that include earlier versions of the data, shown as strikethrough characters. I know how to parse and manipulate Excel files with Perl and OLE, but I only see text formatting exposed at the cell level. Is there a way to access the formatting character by character? The goal is to find and delete all text formatted as strikethrough.

Here's a VBA solution, since I don't have Python installed on my machine. Hopefully it shows the method for accessing the formatting of individual characters.
For the sample text in Range("A1"), it gives the output:
strikethrough at character 17
strikethrough at character 18
strikethrough at character 19
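Use Spreadsheet::ParseExcel to access both simple cells and complex cells that contain multiple formats. Complex cells use rich text formatting, which you can read with the $cell->get_rich_text() method. Below is an example that finds strikeout formatting both in a cell that is struck out as a whole and in part of a multi-format cell (adapted from an existing outline).

parse_lazy_dog.pl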
    #!/usr/bin/env perl
    use warnings;
    use strict;
    use Spreadsheet::ParseExcel;

    my $file     = 'lazy_dog.xls';
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($file);
    if ( !defined $workbook ) {
        die $parser->error(), ".\n";
    }

    for my $worksheet ( $workbook->worksheets() ) {
        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();
        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {
                my $cell = $worksheet->get_cell( $row, $col );
                next unless $cell;
                print "Row, Col = ($row, $col)\n";
                print "Value = ", $cell->value(), "\n";
                print "Unformatted Value = ", $cell->unformatted(), "\n";
                if ( my $rich = $cell->get_rich_text() ) {
                    # Multiple formats inside one cell.
                    # Each element is [start position, font]; the font
                    # applies from that position onwards.
                    print " STRIKEOUT -> ";
                    my $pos = 0;
                    for my $rich_elem (@$rich) {
                        my ($char_pos, $font) = @$rich_elem;
                        if ($font->{Strikeout}) {
                            # A strikeout run starts here: pad up to it
                            while ($pos++ < $char_pos) {
                                print " ";
                            }
                        } else {
                            # A plain run starts here: mark the run before
                            # it, which this script assumes was struck out
                            while ($pos++ <= $char_pos) {
                                print "^";
                            }
                        }
                    }
                    print "\n";
                } else {
                    # Entire cell has same format
                    my $format       = $cell->get_format();
                    my $is_strikeout = $format->{Font}->{Strikeout};
                    if ($is_strikeout) {
                        print " STRIKEOUT -> ";
                        print "^" x ( length( $cell->unformatted() ) );
                        print "\n";
                    }
                    print "\n";
                }
            }
        }
    }
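Since the stated goal is to delete the struck-through text rather than only mark it, the same list of runs can be used to rebuild the string. This is a minimal sketch, not part of the original answer: strip_strikeout is a hypothetical helper, the plain hashes stand in for the font objects returned by get_rich_text(), and it assumes the runs are sorted by start position and that the positions are character offsets into the cell text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return $text with every struck-out run removed.
#   $runs        - array ref of [start position, font] pairs, in the
#                  shape returned by get_rich_text()
#   $base_strike - true if the cell's own font (which covers the text
#                  before the first run) is struck out
sub strip_strikeout {
    my ( $text, $runs, $base_strike ) = @_;
    my $kept   = '';
    my $pos    = 0;
    my $strike = $base_strike ? 1 : 0;
    for my $run (@$runs) {
        my ( $start, $font ) = @$run;
        $start = length $text if $start > length $text;    # clamp stray offsets
        # Keep the stretch that ends here unless it was struck out
        $kept .= substr( $text, $pos, $start - $pos ) unless $strike;
        $pos    = $start;
        $strike = $font->{Strikeout} ? 1 : 0;
    }
    # Last stretch runs to the end of the text
    $kept .= substr( $text, $pos ) unless $strike;
    return $kept;
}

# Stand-in data shaped like the multi-format cell in lazy_dog.xls:
# "UNDER" occupies positions 27..31.
my $text = 'THE QUICK BROWN FOX JUMPED UNDER OVER THE LAZY DOG.';
my $runs = [ [ 27, { Strikeout => 1 } ], [ 32, { Strikeout => 0 } ] ];
print strip_strikeout( $text, $runs, 0 ), "\n";
```

In the script above you would call it as strip_strikeout( $cell->value(), $rich, $cell->get_format()->{Font}->{Strikeout} ). The spaces on both sides of a removed word are kept, so you may want to collapse doubled spaces afterwards.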
It also looks like you can use Win32::OLE to access individual characters and character ranges.

For lazy_dog.xls, parse_lazy_dog.pl prints:
Row, Col = (0, 0)
Value = The
Unformatted Value = The
Row, Col = (0, 1)
Value = quick
Unformatted Value = quick
Row, Col = (0, 2)
Value = brown
Unformatted Value = brown
Row, Col = (0, 3)
Value = fox
Unformatted Value = fox
Row, Col = (0, 4)
Value = jumped
Unformatted Value = jumped
Row, Col = (0, 5)
Value = under
Unformatted Value = under
STRIKEOUT -> ^^^^^
Row, Col = (0, 6)
Value = over
Unformatted Value = over
Row, Col = (0, 7)
Value = the
Unformatted Value = the
Row, Col = (0, 8)
Value = lazy
Unformatted Value = lazy
Row, Col = (0, 9)
Value = dog.
Unformatted Value = dog.
Row, Col = (1, 0)
Value = THE QUICK BROWN FOX JUMPED UNDER OVER THE LAZY DOG.
Unformatted Value = THE QUICK BROWN FOX JUMPED UNDER OVER THE LAZY DOG.
STRIKEOUT -> ^^^^^
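For the Win32::OLE route mentioned above, a rough sketch could look like the following. It assumes Windows with Excel installed and relies on the Characters(Start, Length) member of Excel's Range object together with Font.Strikethrough and Characters.Delete. The helper name delete_struck_chars, the choice of the first worksheet and cell A1, and taking the workbook path from the command line are all assumptions made for illustration, not something from the original answers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Delete every struck-through character in one cell (an Excel Range object).
# Walks backwards so that deleting a character does not shift the
# positions still to be visited. Returns the number of characters deleted.
sub delete_struck_chars {
    my ($cell) = @_;
    my $count   = $cell->Characters->Count;    # characters in the cell text
    my $deleted = 0;
    for my $i ( reverse 1 .. $count ) {         # Characters() is 1-based
        my $ch = $cell->Characters( $i, 1 );    # a one-character range
        next unless $ch->Font->{Strikethrough};
        $ch->Delete;
        $deleted++;
    }
    return $deleted;
}

# Only talk to Excel when a workbook path is given on the command line.
if (@ARGV) {
    require Win32::OLE;                         # needs Windows and Excel
    my $excel = Win32::OLE->new( 'Excel.Application', 'Quit' )
        or die "Cannot start Excel: ", Win32::OLE->LastError(), "\n";
    my $book = $excel->Workbooks->Open( $ARGV[0] )    # use an absolute path
        or die "Cannot open $ARGV[0]: ", Win32::OLE->LastError(), "\n";
    my $cell = $book->Worksheets(1)->Range('A1');     # hypothetical target
    printf "Deleted %d struck-through character(s)\n",
        delete_struck_chars($cell);
    $book->Save;
    $book->Close;
}
```

Going through the cell from the last character to the first keeps the remaining indexes valid after each deletion. Try it on a copy of the workbook first, because the script saves over the file it opens.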