Java and StuartMacKay's transform swf library
I need to extract all the text from some swf files. I am using Java because I already have many modules developed in that language. So I searched the web for all the free Java libraries for handling SWF files. In the end I found the library developed by StuartMacKay. The library, named transform swf, is available on GitHub.
The question is: once I have extracted the GlyphIndexes from a TextSpan, how do I convert the glyphs to characters?
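Please provide a complete, working, and tested example. Theoretical answers will not be accepted, nor will answers such as "it can't be done" or "it's impossible".
What I know and what I have done
I know that GlyphIndexes are built with a TextTable, which is constructed from an integer representing the font size and the font description provided by a DefineFont2 object. However, when I decode the DefineFont2 objects, their advances all have length zero.
Here is what I did: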
// Create a Movie object from an swf file.
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
List<MovieTag> list = movie.getObjects();

// Save all the decoded DefineFont2 objects, keyed by identifier.
Map<Integer, DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
    if (object instanceof DefineFont2) {
        DefineFont2 df2 = (DefineFont2) object;
        fonts.put(df2.getIdentifier(), df2);
    }
}

// Now retrieve all the texts.
for (MovieTag object : list) {
    if (object instanceof DefineText2) {
        DefineText2 dt2 = (DefineText2) object;
        for (TextSpan ts : dt2.getSpans()) {
            Integer fontIdentifier = ts.getIdentifier();
            if (fontIdentifier != null) {
                int fontSize = ts.getHeight();
                // Here I try to create an object that should
                // reverse the process done by a TextTable.
                ReverseTextTable rtt =
                        new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
                System.out.println(rtt.charactersForText(ts.getCharacters()));
            }
        }
    }
}
Unfortunately, the advances list of each DefineFont2 is empty, and the ReverseTextTable constructor throws an ArrayIndexOutOfBoundsException.
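
It seems hard to achieve. You are trying to decompile the file, but I'm sorry, that is not possible. What I suggest is converting it to a bitmap (if possible), or trying some other approach.
There are some tools that do this, and you could also check some related questions. Decompiling a compiled swf is very difficult (impossible, as far as I know). If you like, you can look into this, or try using some other language.

Honestly, I don't know how to do this in pure Java. I'm not saying it's impossible, and I do believe there is a way. However, you said that many libraries can do this, and you also named one of them. So I suggest using that library to extract the text from the flash file. To do that, you can use Apache Commons Exec (the org.apache.commons.exec package used in the code) to run it from the command line. Personally, I prefer it to the standard library that ships with the JDK. Let me show you how. The executable you need is "swfstrings.exe". Suppose it is placed in "C:\", and that the flash file, for example page.swf, is in the same folder. I then tried the following code, which works well (it is the snippet that calls swfstrings.exe, shown further down):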
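I know this is not exactly the answer you asked for, but it works well.

I had a similar problem with the transform swf library when processing long strings. Get the source code and debug it.
I believe there is a small bug in the class com.flagstone.transform.coder.SWFDecoder.
At line 540 (in version 3.0.2), change
dest += length
to
dest += count
This should help you (it concerns string extraction).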
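To illustrate why advancing by the wrong variable only hurts with large inputs, here is a generic sketch of a chunked read loop. It is not the library's source: the class ChunkedRead, the method readFully, and the variable names are hypothetical, and the assumption is simply that the decoder fills a destination buffer in several reads. When everything fits in one read, the requested length and the returned count coincide, so the bug stays hidden.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration, not code from transform swf. */
final class ChunkedRead {

    /** Fills dest from in, asking for at most 'chunk' bytes per read. Returns bytes stored. */
    static int readFully(InputStream in, byte[] dest, int chunk) {
        int pos = 0;
        try {
            while (pos < dest.length) {
                int wanted = Math.min(chunk, dest.length - pos);
                int count = in.read(dest, pos, wanted);
                if (count < 0) {
                    break; // end of stream
                }
                // Advance by what was actually read. Advancing by 'wanted'
                // would leave gaps whenever a read returns fewer bytes.
                pos += count;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pos;
    }

    public static void main(String[] args) {
        byte[] data = "a fairly long string".getBytes();
        // A stream that never hands out more than 3 bytes per read.
        InputStream slow = new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
        byte[] dest = new byte[data.length];
        readFully(slow, dest, 8);
        System.out.println(new String(dest)); // prints "a fairly long string"
    }
}
```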
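I have also notified Stuart. The problem only occurs when the strings are very large.

I happen to be decompiling SWFs in Java right now, and I came across this question while working out how to reverse-engineer the original text. After looking at the source code, I realized it is quite simple. Each font has an assigned sequence of characters, which can be retrieved by calling DefineFont2.getCodes(), and the glyphIndex is the index of the matching character in DefineFont2.getCodes().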
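The lookup described above can be sketched without the library on the classpath. In this minimal example, plain lists stand in for the values that DefineFont2.getCodes() and GlyphIndex.getGlyphIndex() would supply; the class GlyphDecoder and its decode method are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in sketch: decodes glyph indices through a font's code table. */
final class GlyphDecoder {

    /**
     * @param codes        code table, in the role of DefineFont2.getCodes()
     * @param glyphIndices indices, in the role of GlyphIndex.getGlyphIndex()
     */
    static String decode(List<Integer> codes, List<Integer> glyphIndices) {
        StringBuilder sb = new StringBuilder();
        for (int glyphIndex : glyphIndices) {
            if (glyphIndex < 0 || glyphIndex >= codes.size()) {
                sb.append('\uFFFD'); // index outside the table: mark as unknown
            } else {
                sb.appendCodePoint(codes.get(glyphIndex));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical font that embeds only the glyphs for 'H', 'e', 'l', 'o'.
        List<Integer> codes = Arrays.asList((int) 'H', (int) 'e', (int) 'l', (int) 'o');
        List<Integer> glyphs = Arrays.asList(0, 1, 2, 2, 3);
        System.out.println(decode(codes, glyphs)); // prints "Hello"
    }
}
```

The bounds check matters in practice: a glyph index that falls outside the table usually means the text was paired with the wrong font.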
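However, when a single SWF file uses several fonts, it is hard to match each DefineText with its DefineFont2, because there is no attribute that identifies the DefineFont2 used by each DefineText.
To solve this, I came up with a self-learning algorithm that tries to guess the correct DefineFont2 for each DefineText and so derives the original text correctly.
To reverse-engineer the original text, I created a class named FontLearner: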
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
// DefineFont2, DefineText, TextSpan and GlyphIndex come from the transform swf library.

public class FontLearner {

    private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
    private final HashMap<Integer, HashMap<Character, Integer>> advancesMap =
            new HashMap<Integer, HashMap<Character, Integer>>();

    /**
     * The same characters from the same font will have similar advance values.
     * This constant defines the allowed difference between two advance values
     * before they are treated as the same character
     */
    private static final int ADVANCE_THRESHOLD = 10;

    /**
     * Some characters have outlier advance values despite being compared
     * to the same character
     * This constant defines the minimum accuracy level for each String
     * before it is associated with the given font
     */
    private static final double ACCURACY_THRESHOLD = 0.9;

    /**
     * This method adds a DefineFont2 to the learner, and a DefineText
     * associated with the font to teach the learner about the given font.
     *
     * @param font The font to add to the learner
     * @param text The text associated with the font
     */
    public void addFont(DefineFont2 font, DefineText text) {
        fonts.add(font);
        HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
        advancesMap.put(font.getIdentifier(), advances);
        List<Integer> codes = font.getCodes();
        List<TextSpan> spans = text.getSpans();
        for (TextSpan span : spans) {
            List<GlyphIndex> characters = span.getCharacters();
            for (GlyphIndex character : characters) {
                int glyphIndex = character.getGlyphIndex();
                char c = (char) (int) codes.get(glyphIndex);
                int advance = character.getAdvance();
                advances.put(c, advance);
            }
        }
    }

    /**
     * Guesses the font used by the given text and decodes its glyphs.
     *
     * @param text The DefineText to retrieve the original String from
     * @return The String retrieved from the given DefineText
     */
    public String getString(DefineText text) {
        StringBuilder sb = new StringBuilder();
        List<TextSpan> spans = text.getSpans();
        DefineFont2 font = null;
        for (DefineFont2 getFont : fonts) {
            List<Integer> codes = getFont.getCodes();
            HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
            if (advances == null) {
                advances = new HashMap<Character, Integer>();
                advancesMap.put(getFont.getIdentifier(), advances);
            }
            boolean notFound = true;
            int totalMisses = 0;
            int totalCount = 0;
            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                totalCount += characters.size();
                int misses = 0;
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (codes.size() > glyphIndex) {
                        char c = (char) (int) codes.get(glyphIndex);
                        Integer getAdvance = advances.get(c);
                        if (getAdvance != null) {
                            notFound = false;
                            if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
                                misses += 1;
                            }
                        }
                    } else {
                        // glyph index outside this font's table: count the whole span as missed
                        notFound = false;
                        misses = characters.size();
                        break;
                    }
                }
                totalMisses += misses;
            }
            double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;
            if (accuracy > ACCURACY_THRESHOLD && !notFound) {
                font = getFont;
                // teach this DefineText to the FontLearner if there are
                // any new characters
                for (TextSpan span : spans) {
                    List<GlyphIndex> characters = span.getCharacters();
                    for (GlyphIndex character : characters) {
                        int glyphIndex = character.getGlyphIndex();
                        char c = (char) (int) codes.get(glyphIndex);
                        int advance = character.getAdvance();
                        if (advances.get(c) == null) {
                            advances.put(c, advance);
                        }
                    }
                }
                break;
            }
        }
        if (font != null) {
            List<Integer> codes = font.getCodes();
            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    char c = (char) (int) codes.get(glyphIndex);
                    sb.append(c);
                }
                sb = new StringBuilder(sb.toString().trim());
                sb.append(" ");
            }
        }
        return sb.toString().trim();
    }
}
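The scoring idea inside getString can be tried in isolation. The sketch below is a simplified stand-in, not the class above: AdvanceMatcher and accuracy are hypothetical names, each glyph is a plain {glyphIndex, advance} pair, spans are flattened into one array, and a font whose table is too short is rejected outright rather than counted per span.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, library-free sketch of the advance-matching heuristic. */
final class AdvanceMatcher {

    static final int ADVANCE_THRESHOLD = 10;

    /**
     * Fraction of glyphs whose advance agrees with what was learned for a font.
     *
     * @param codes   the candidate font's code table
     * @param learned advances already learned for that font, per character
     * @param glyphs  {glyphIndex, advance} pairs taken from the text
     */
    static double accuracy(List<Integer> codes, Map<Character, Integer> learned, int[][] glyphs) {
        if (glyphs.length == 0) {
            return 0.0;
        }
        int misses = 0;
        for (int[] glyph : glyphs) {
            if (glyph[0] >= codes.size()) {
                return 0.0; // the font has no such glyph, so it cannot match
            }
            char c = (char) (int) codes.get(glyph[0]);
            Integer known = learned.get(c);
            if (known != null && Math.abs(glyph[1] - known) > ADVANCE_THRESHOLD) {
                misses++;
            }
        }
        return (glyphs.length - misses) / (double) glyphs.length;
    }

    public static void main(String[] args) {
        // Two made-up fonts with different glyph widths.
        List<Integer> wide = Arrays.asList((int) 'a', (int) 'b');
        Map<Character, Integer> wideAdvances = new HashMap<>();
        wideAdvances.put('a', 500);
        wideAdvances.put('b', 520);

        List<Integer> narrow = Arrays.asList((int) 'i', (int) 'l');
        Map<Character, Integer> narrowAdvances = new HashMap<>();
        narrowAdvances.put('i', 200);
        narrowAdvances.put('l', 210);

        int[][] text = {{0, 505}, {1, 515}};
        System.out.println(accuracy(wide, wideAdvances, text));     // prints 1.0
        System.out.println(accuracy(narrow, narrowAdvances, text)); // prints 0.0
    }
}
```

The text's advances sit within the threshold of the wide font and far from the narrow one, so only the wide font clears a 0.9 accuracy cut-off.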
The swfstrings.exe code referred to above:
import java.io.ByteArrayOutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.ExecuteException;
import org.apache.commons.exec.PumpStreamHandler;

Path pathToSwfFile = Paths.get("C:\\", "page.swf");
CommandLine commandLine = CommandLine.parse("C:\\swfstrings.exe");
commandLine.addArgument("\"" + pathToSwfFile.toString() + "\"");
DefaultExecutor executor = new DefaultExecutor();
executor.setExitValues(new int[]{0, 1}); // Notice that swfstrings.exe returns 1 for success,
                                         // 0 for file not found, -1 for error
ByteArrayOutputStream stdout = new ByteArrayOutputStream();
PumpStreamHandler psh = new PumpStreamHandler(stdout);
executor.setStreamHandler(psh);
int exitValue;
try {
    exitValue = executor.execute(commandLine);
} catch (ExecuteException ex) {
    exitValue = ex.getExitValue(); // keep the exit code so the check below still works
    psh.stop();
}
if (!executor.isFailure(exitValue)) {
    String out = stdout.toString("UTF-8"); // here you have the extracted text
}
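For readers who would rather avoid the extra dependency, the same call can be sketched with the JDK's own ProcessBuilder. This swaps in a different technique from the answer above, which uses Apache Commons Exec. The class SwfStringsRunner and its methods are hypothetical names, the paths are taken from the command line, and readAllBytes assumes Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** JDK-only sketch: runs an external text extractor and captures its output. */
final class SwfStringsRunner {

    /** Builds the command line: the executable followed by the swf file. */
    static List<String> command(Path exe, Path swf) {
        return Arrays.asList(exe.toString(), swf.toString());
    }

    /** Runs the command and returns everything it printed, decoded as UTF-8. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        byte[] output = process.getInputStream().readAllBytes(); // Java 9+
        process.waitFor();
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: SwfStringsRunner <path to swfstrings> <path to swf>");
            return;
        }
        System.out.println(run(command(Paths.get(args[0]), Paths.get(args[1]))));
    }
}
```

ProcessBuilder passes each list element as a separate argument, so paths containing spaces need no manual quoting.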
FontLearner can then be used like this:
Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());
FontLearner learner = new FontLearner();
DefineFont2 font = null;
List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
    if (object instanceof DefineFont2) {
        font = (DefineFont2) object;
    } else if (object instanceof DefineText) {
        DefineText text = (DefineText) object;
        if (font != null) {
            // the first DefineText after a DefineFont2 is used to teach that font
            learner.addFont(font, text);
            font = null;
        }
        String line = learner.getString(text); // reverse engineers the line
    }
}