Java and StuartMacKay's transform-swf library


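I need to extract all the text from some SWF files. I'm using Java because I have a lot of modules developed in this language. So I searched the web for every free Java library for handling SWF files. In the end, I found the library developed by StuartMacKay. The library, named transform-swf, can be found on GitHub.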

The question is: once I have extracted the GlyphIndexes from a TextSpan, how can I convert the glyphs into characters?

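Please provide a complete, working and tested example. No theoretical answers will be accepted, nor answers like "it can't be done", "it's not possible", etc.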

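What I know and what I have done: I know that a GlyphIndex is built by a TextTable, which is constructed from an integer representing the font size and the font description provided by a DefineFont2 object. However, when I decode all the DefineFont2 objects, all of them have zero-length advance lists.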
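For context, the character-to-glyph direction described above amounts to a table lookup: each character is replaced by its position in the font's list of character codes. The sketch below shows only that mapping with plain Java collections. GlyphEncoder is a hypothetical name, the code list is assumed to be what DefineFont2.getCodes() returns, and it does not reproduce TextTable's own implementation (which, judging from its constructor, also takes the font size into account for the advances).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of transform-swf: maps text to glyph indexes
// using a font's code list, where codes.get(i) is the character drawn by glyph i.
final class GlyphEncoder {

    private final Map<Character, Integer> indexByChar = new HashMap<>();

    GlyphEncoder(List<Integer> codes) {
        // Invert the code list: character -> position in the list (its glyph index).
        for (int i = 0; i < codes.size(); i++) {
            indexByChar.putIfAbsent((char) (int) codes.get(i), i);
        }
    }

    /** Returns the glyph index of each character; throws if the font has no glyph for it. */
    int[] encode(String text) {
        int[] glyphs = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Integer index = indexByChar.get(text.charAt(i));
            if (index == null) {
                throw new IllegalArgumentException("No glyph for '" + text.charAt(i) + "'");
            }
            glyphs[i] = index;
        }
        return glyphs;
    }
}
```

Reversing this only requires the same code list: the character for a glyph index i is codes.get(i).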

Here is what I have done:

// Create a Movie object from an SWF file (out holds the path to the file).
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
List<MovieTag> list = movie.getObjects();

// Save all the decoded DefineFont2 objects, keyed by their identifier.
Map<Integer, DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
    if (object instanceof DefineFont2) {
        DefineFont2 df2 = (DefineFont2) object;
        fonts.put(df2.getIdentifier(), df2);
    }
}

// Now retrieve all the texts.
for (MovieTag object : list) {
    if (object instanceof DefineText2) {
        DefineText2 dt2 = (DefineText2) object;
        for (TextSpan ts : dt2.getSpans()) {
            Integer fontIdentifier = ts.getIdentifier();
            if (fontIdentifier != null) {
                int fontSize = ts.getHeight();
                // ReverseTextTable is my own class: it should reverse
                // the process performed by a TextTable.
                ReverseTextTable rtt =
                    new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
                System.out.println(rtt.charactersForText(ts.getCharacters()));
            }
        }
    }
}

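Unfortunately, the advances list of the DefineFont2 is empty, and the constructor of ReverseTextTable throws an ArrayIndexOutOfBoundsException.

This seems very hard to achieve: you are trying to reverse a compiled file, and I'm sorry, but that is not possible. What I suggest is converting it into a bitmap (if possible), or trying some other method.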


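There are some tools that do this, and you can also look at some related questions, because reversing a compiled SWF is very difficult (as far as I know, impossible). If you like, you can check this one, or try a project written in some other language.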

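To be honest, I don't know how to do this in Java. I'm not saying it's impossible; I believe there is a way. However, you said there are many libraries that do this, and you also suggested one of them. So I suggest using that library after all to extract the text from the flash file. To do that, you can simply run its command-line tool from Java.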

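Personally, I prefer Apache Commons Exec to the standard process API that ships with the JDK. Well, let me show you how to do it. The executable you should use is "swfstrings.exe". Suppose it is placed in "C:\", and suppose the flash file, for example page.swf, is in the same folder. I then tried the Apache Commons Exec code listed further down this page (it works well).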


I know this isn't exactly the answer you asked for, but it works well.
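If you would rather avoid the Apache Commons Exec dependency, the JDK's own ProcessBuilder can run the executable and capture its output. This is a minimal sketch: CommandRunner and runAndCapture are hypothetical names, and the exit code is deliberately left to the caller because, per the answer above, swfstrings.exe does not use the usual 0-for-success convention.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: runs a command with the JDK's ProcessBuilder and
// returns everything it printed (stderr is merged into stdout).
final class CommandRunner {

    private CommandRunner() {
    }

    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // one stream holds all output
                .start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Drain the output before waiting, so a full pipe cannot block the child.
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        process.waitFor(); // exit code intentionally not interpreted here
        return out.toString(StandardCharsets.UTF_8.name());
    }
}
```

With the setup from the answer, a call would look like CommandRunner.runAndCapture(Arrays.asList("C:\\swfstrings.exe", "C:\\page.swf")); the UTF-8 decoding is an assumption about swfstrings' output encoding.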

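I ran into a similar problem with long strings when using the transform-swf library.

I got the source code and debugged it. I believe there is a small bug in the class com.flagstone.transform.coder.SWFDecoder.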

At line 540 (for version 3.0.2), change

dest+=length

to

dest+=count

This should help you (it's about extracting strings).
I have also notified Stuart. The problem only shows up when the strings are very large.
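The answer does not show the code around line 540, so the following is only a generic illustration, not SWFDecoder's code, of the kind of bug such a fix usually addresses, assuming length is the number of bytes requested and count the number actually read. InputStream.read may return fewer bytes than requested, so a loop filling a buffer must advance its destination offset by the count it got back. Large reads make short reads more likely, which would fit the observation that only very large strings are affected. StreamUtil and readFully are hypothetical names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Generic illustration, not transform-swf code: fill dest[off, off + len) from a stream.
final class StreamUtil {

    private StreamUtil() {
    }

    static void readFully(InputStream in, byte[] dest, int off, int len) throws IOException {
        int remaining = len;
        while (remaining > 0) {
            int count = in.read(dest, off, remaining);
            if (count == -1) {
                throw new EOFException("Stream ended with " + remaining + " bytes unread");
            }
            off += count;       // advance by what was actually read, not by len
            remaining -= count;
        }
    }
}
```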

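I happen to be decompiling SWF files in Java right now, and I came across this question while figuring out how to reverse engineer the original text.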

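After looking at the source code, I realized it is quite simple. Each font has an assigned sequence of characters, which can be retrieved by calling DefineFont2.getCodes(), and the glyphIndex is the index of the matching character in DefineFont2.getCodes().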
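Decoding is then just an index into that list. A minimal sketch with plain Java collections, assuming codes is the list returned by DefineFont2.getCodes() for the font the text was drawn with, and glyphIndexes holds the values of GlyphIndex.getGlyphIndex(); GlyphDecoder is a hypothetical helper, not a transform-swf class. Indexes outside the table become U+FFFD instead of throwing, since an out-of-range index usually means the wrong font was picked.

```java
import java.util.List;

// Hypothetical helper, not part of transform-swf.
final class GlyphDecoder {

    private GlyphDecoder() {
    }

    /** Maps each glyph index back to its character; unknown indexes become U+FFFD. */
    static String decode(List<Integer> codes, List<Integer> glyphIndexes) {
        StringBuilder sb = new StringBuilder(glyphIndexes.size());
        for (int glyphIndex : glyphIndexes) {
            if (glyphIndex >= 0 && glyphIndex < codes.size()) {
                sb.append((char) (int) codes.get(glyphIndex));
            } else {
                // Index outside this font's table: probably the wrong font.
                sb.append('\uFFFD');
            }
        }
        return sb.toString();
    }
}
```

With transform-swf, glyphIndexes would be collected by calling getGlyphIndex() on each element of TextSpan.getCharacters().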

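However, when multiple fonts are used in a single SWF file, it is hard to match each DefineText with its corresponding DefineFont2, because there is no attribute identifying which DefineFont2 each DefineText uses.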
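One thing that may be worth checking before falling back to guessing: the question's own code reads a font identifier from each span with TextSpan.getIdentifier() and looks it up in a map keyed by DefineFont2.getIdentifier(). If those identifiers are populated in your files, they may let you pair spans with fonts directly. The sketch below uses plain stand-in types (SpanData and SpanFontResolver are hypothetical, not transform-swf classes) and assumes, as in the SWF text record format, that a span without a font identifier keeps the font of the previous span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a TextSpan: its font id (null if not set) and glyph indexes.
final class SpanData {
    final Integer fontId;
    final List<Integer> glyphIndexes;

    SpanData(Integer fontId, List<Integer> glyphIndexes) {
        this.fontId = fontId;
        this.glyphIndexes = glyphIndexes;
    }
}

final class SpanFontResolver {

    private SpanFontResolver() {
    }

    /**
     * Decodes each span with codesByFont (font id -> code list). A span without
     * a font id is assumed to reuse the font of the previous span.
     */
    static List<String> decode(List<SpanData> spans, Map<Integer, List<Integer>> codesByFont) {
        List<String> lines = new ArrayList<>();
        Integer currentFont = null;
        for (SpanData span : spans) {
            if (span.fontId != null) {
                currentFont = span.fontId;
            }
            List<Integer> codes = currentFont == null ? null : codesByFont.get(currentFont);
            StringBuilder sb = new StringBuilder();
            for (int glyphIndex : span.glyphIndexes) {
                boolean known = codes != null && glyphIndex >= 0 && glyphIndex < codes.size();
                sb.append(known ? (char) (int) codes.get(glyphIndex) : '\uFFFD');
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```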

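To solve this, I came up with a self-learning algorithm that tries to guess the correct DefineFont2 for each DefineText, and so exports the original text correctly. It relies on the fact that the same character drawn with the same font has a similar advance, so a candidate font is accepted when most of the decoded characters have advances close to the ones learned earlier for that font.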
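The core of that guess can be sketched on its own: decode each glyph with a candidate font, look up the advance previously learned for the resulting character, and count how many advances fall within a tolerance. This is a simplified, hypothetical per-glyph variant of the scoring done inside FontLearner.getString (FontScorer is not part of the answer's code); the tolerance of 10 mirrors the answer's ADVANCE_THRESHOLD.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only sketch of the font-guessing score.
// learned: advance previously seen for each character in the candidate font.
// glyphIndexes / advances: parallel lists taken from one DefineText's spans.
final class FontScorer {

    static final int ADVANCE_THRESHOLD = 10;

    private FontScorer() {
    }

    /**
     * Returns the fraction of glyphs whose advance matches what was learned for
     * the decoded character, or -1 if no decoded character had a learned advance.
     */
    static double accuracy(List<Integer> codes, Map<Character, Integer> learned,
                           List<Integer> glyphIndexes, List<Integer> advances) {
        int misses = 0;
        boolean anyKnown = false;
        for (int i = 0; i < glyphIndexes.size(); i++) {
            int glyphIndex = glyphIndexes.get(i);
            if (glyphIndex < 0 || glyphIndex >= codes.size()) {
                misses++; // the candidate font has no such glyph
                continue;
            }
            Integer expected = learned.get((char) (int) codes.get(glyphIndex));
            if (expected != null) {
                anyKnown = true;
                if (Math.abs(advances.get(i) - expected) > ADVANCE_THRESHOLD) {
                    misses++;
                }
            }
        }
        if (!anyKnown) {
            return -1;
        }
        return (glyphIndexes.size() - misses) / (double) glyphIndexes.size();
    }
}
```

A candidate font would then be accepted when the score exceeds a cutoff such as the answer's ACCURACY_THRESHOLD of 0.9; -1 signals that the font has not yet been taught any of these characters.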

To reverse engineer the original text, I created a class called FontLearner:

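import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
// Also import DefineFont2, DefineText, TextSpan and GlyphIndex from transform-swf.

public class FontLearner {

    private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
    private final HashMap<Integer, HashMap<Character, Integer>> advancesMap = new HashMap<Integer, HashMap<Character, Integer>>();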

    /**
     * The same characters from the same font will have similar advance values.
     * This constant defines the maximum difference between two advance values
     * for them to still be treated as the same character.
     */
    private static final int ADVANCE_THRESHOLD = 10;

    /**
     * Some characters have outlier advance values even when compared
     * with the same character.
     * This constant defines the minimum accuracy a String must reach
     * before it is associated with the given font.
     */
    private static final double ACCURACY_THRESHOLD = 0.9;

    /**
     * This method adds a DefineFont2 to the learner, and a DefineText
     * associated with the font to teach the learner about the given font.
     * 
     * @param font The font to add to the learner
     * @param text The text associated with the font
     */
    public void addFont(DefineFont2 font, DefineText text) {
        fonts.add(font);
        HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
        advancesMap.put(font.getIdentifier(), advances);

        List<Integer> codes = font.getCodes();

        List<TextSpan> spans = text.getSpans();
        for (TextSpan span : spans) {
            List<GlyphIndex> characters = span.getCharacters();
            for (GlyphIndex character : characters) {
                int glyphIndex = character.getGlyphIndex();
                char c = (char) (int) codes.get(glyphIndex);

                int advance = character.getAdvance();
                advances.put(c, advance);
            }
        }
    }

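    /**
     * Reverse engineers the text of a DefineText, guessing which of the
     * learned fonts it was drawn with.
     *
     * @param text The DefineText to retrieve the original String from
     * @return The String retrieved from the given DefineText, or an empty
     *         String if no learned font matches
     */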
    public String getString(DefineText text) {
        StringBuilder sb = new StringBuilder();

        List<TextSpan> spans = text.getSpans();

        DefineFont2 font = null;
        for (DefineFont2 getFont : fonts) {
            List<Integer> codes = getFont.getCodes();
            HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
            if (advances == null) {
                advances = new HashMap<Character, Integer>();
                advancesMap.put(getFont.getIdentifier(), advances);
            }

            boolean notFound = true;
            int totalMisses = 0;
            int totalCount = 0;

            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                totalCount += characters.size();

                int misses = 0;
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (codes.size() > glyphIndex) {
                        char c = (char) (int) codes.get(glyphIndex);

                        Integer getAdvance = advances.get(c);
                        if (getAdvance != null) {
                            notFound = false;

                            if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
                                misses += 1;
                            }
                        }
                    } else {
                        notFound = false;
                        misses = characters.size();

                        break;
                    }
                }

                totalMisses += misses;
            }

            double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;

            if (accuracy > ACCURACY_THRESHOLD && !notFound) {
                font = getFont;

                // teach this DefineText to the FontLearner if there are
                // any new characters
                for (TextSpan span : spans) {
                    List<GlyphIndex> characters = span.getCharacters();
                    for (GlyphIndex character : characters) {
                        int glyphIndex = character.getGlyphIndex();
                        if (glyphIndex >= codes.size()) {
                            continue; // no such glyph in this font; nothing to learn
                        }
                        char c = (char) (int) codes.get(glyphIndex);

                        int advance = character.getAdvance();
                        if (advances.get(c) == null) {
                            advances.put(c, advance);
                        }
                    }
                }
                break;
            }
        }

        if (font != null) {
            List<Integer> codes = font.getCodes();

            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (glyphIndex >= codes.size()) {
                        continue; // skip glyphs the chosen font cannot map
                    }
                    char c = (char) (int) codes.get(glyphIndex);
                    sb.append(c);
                }
                sb = new StringBuilder(sb.toString().trim());
                sb.append(" ");
            }
        }

        return sb.toString().trim();
    }
}
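
Code for the swfstrings.exe approach described above (it uses Apache Commons Exec):

    // Imports needed: java.io.ByteArrayOutputStream, java.nio.file.Path,
    // java.nio.file.Paths, org.apache.commons.exec.CommandLine,
    // org.apache.commons.exec.DefaultExecutor, org.apache.commons.exec.PumpStreamHandler.
    // The enclosing method must also handle IOException.
    Path pathToSwfFile = Paths.get("C:\\page.swf");
    CommandLine commandLine = CommandLine.parse("C:\\swfstrings.exe");
    // addArgument() quotes the argument itself if it contains spaces.
    commandLine.addArgument(pathToSwfFile.toString());
    DefaultExecutor executor = new DefaultExecutor();
    executor.setExitValues(new int[]{0, 1}); // Notice that swfstrings.exe returns 1 for success,
                                             // 0 for file not found, -1 for error

    ByteArrayOutputStream stdout = new ByteArrayOutputStream();
    PumpStreamHandler psh = new PumpStreamHandler(stdout);
    executor.setStreamHandler(psh);
    int exitValue;
    try {
        exitValue = executor.execute(commandLine);
    } catch (org.apache.commons.exec.ExecuteException ex) {
        // Thrown for exit values not listed in setExitValues(); keep the code.
        exitValue = ex.getExitValue();
    }
    if (!executor.isFailure(exitValue)) {
        String out = stdout.toString("UTF-8"); // here you have the extracted text
    }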

Example usage:

Movie movie = new Movie();
// response is an HTTP response whose body is the SWF file.
movie.decodeFromStream(response.getEntity().getContent());

FontLearner learner = new FontLearner();
DefineFont2 font = null;

List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
    if (object instanceof DefineFont2) {
        font = (DefineFont2) object;
    } else if (object instanceof DefineText) {
        DefineText text = (DefineText) object;
        if (font != null) {
            // Teach the learner using the first text that follows a new font.
            learner.addFont(font, text);
            font = null;
        }
        String line = learner.getString(text); // reverse engineers the line
        System.out.println(line);
    }
}