AppleScript duplicate word count


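How can I create an AppleScript that counts the repeated words in a PDF and then displays the results as a ranked list, with the most repeated word at the top (with its count), the second most repeated word next, and so on? I want to use this at school, so that after converting a PPT to PDF I can run the script to see what matters most in the presentation.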

Ideally, it would filter out words such as: the, so, it, etc.

The last part of what you are looking for is simple.

Just build a list and check whether the word is in it.

    -- Words to leave out of the count
    set ignoreList to {"to", "is"}
    set reportFile to "/Users/USERNAME/Desktop/Word Frequencies.txt"
    set theTextFile to "/Users/USERNAME/Desktop/foo.txt"

    -- Read the text file through the shell and split it into words
    set word_list to every word of (do shell script "cat " & quoted form of theTextFile)

    set word_frequency_list to {}

    repeat with the_word_ref in word_list
        set the_current_word to contents of the_word_ref
        if the_current_word is not in ignoreList then

            -- Look for an existing record for this word
            set word_info to missing value
            repeat with record_ref in word_frequency_list
                if the_word of record_ref = the_current_word then
                    set word_info to contents of record_ref
                    exit repeat
                end if
            end repeat

            if word_info = missing value then
                -- First occurrence: add a new record
                set word_info to {the_word:the_current_word, the_count:1}
                set end of word_frequency_list to word_info
            else
                set the_count of word_info to (the_count of word_info) + 1
            end if

        end if
    end repeat
    --return word_frequency_list

    -- Build one report line per word
    set the_report_list to {}
    repeat with word_info in word_frequency_list
        set end of the_report_list to quote & the_word of word_info & ¬
            quote & "  - appears " & the_count of word_info & " times."
    end repeat

    -- Join the lines, write them to the report file, then open it
    set AppleScript's text item delimiters to return
    set the_report to the_report_list as text
    do shell script "echo " & quoted form of the_report & " > " & quoted form of reportFile
    set AppleScript's text item delimiters to ""
    delay 1
    do shell script "open " & quoted form of reportFile

I also changed some of the code to use shell scripts to read and write the files, simply because I prefer that to using TextEdit.
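The report above lists words in the order they first appear, while the question asks for the most frequent word at the top. One possible way to reorder it, sketched in shell since the script already shells out, is shown below. The function name `sort_report` is hypothetical and not part of the original answer. It assumes the line format produced above (`"word"  - appears N times.`), so the count is the fourth whitespace-separated field, and it converts carriage returns to newlines first because the script joins lines with AppleScript's `return`.

```shell
#!/bin/sh
# sort_report FILE
# Hypothetical helper: prints the report lines ordered by count, highest first.
# Assumes each line looks like:  "word"  - appears N times.
sort_report() {
    # AppleScript's `return` is a carriage return, so turn CRs into newlines,
    # drop empty lines, then sort numerically (descending) on field 4,
    # breaking ties alphabetically on the quoted word in field 1.
    tr '\r' '\n' < "$1" | awk 'NF' | LC_ALL=C sort -k4,4nr -k1,1
}
```

For example, `sort_report "/Users/USERNAME/Desktop/Word Frequencies.txt"` prints the sorted lines to standard output, which could then be redirected to a new file.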

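Although this can be done in AppleScript, as markhunte has shown, it is very slow, especially when processing larger pieces of text or a large number of files. In my tests I gave up on it. So here is a short shell script that is very fast and can be called from AppleScript if needed.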

    #!/bin/sh

    # Both arguments are required: the exclusion list and the text file
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "$0 [wordsfile] [textfile]"
        exit 1
    fi

    INFILE="$2"
    WORDS="${2}.words"
    EXWORDS="$1"

    echo "File $INFILE has $(wc -w < "$INFILE") words."
    echo "Excluding the $(wc -w < "$EXWORDS") words."

    echo "Extracting words from file and removing common words..."
    # Keep words of three or more characters, then drop those on the exclusion list
    grep -o -E '\w{3,}' "$INFILE" | grep -x -i -v -f "$EXWORDS" > "$WORDS"

    echo "Top 10 most frequent words in $INFILE are..."
    tr '[:upper:]' '[:lower:]' < "$WORDS" | sort | uniq -c | sort -rn | head -10

    # Clean up
    rm "$WORDS"
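
To make the pipeline above easier to try on a small sample, here is a minimal variant written as a shell function. It is a sketch under stated assumptions, not the answer's script: the name `top_words` and its argument order are hypothetical, it swaps `grep -o -E '\w{3,}'` for `tr -cs`, which splits on anything that is not a letter, and it writes no temporary file.

```shell
#!/bin/sh
# top_words EXCLUDE_FILE TEXT_FILE [N]
# Hypothetical helper: prints the N (default 10) most frequent words of
# TEXT_FILE as "count word", skipping words listed one per line in EXCLUDE_FILE.
top_words() {
    tr -cs '[:alpha:]' '\n' < "$2" |      # one word per line
        tr '[:upper:]' '[:lower:]' |      # fold case so "The" and "the" match
        awk 'length($0) >= 3' |           # keep words of three or more letters
        grep -v -x -i -F -f "$1" |        # drop the excluded words
        LC_ALL=C sort | uniq -c |         # count identical words
        awk '{ print $1, $2 }' |          # normalise the padding uniq adds
        LC_ALL=C sort -k1,1nr -k2,2 |     # highest count first, ties A to Z
        head -n "${3:-10}"
}
```

For instance, with an exclusion file containing `the` and `and`, the text `The cell divides. The Cell grows, and the cell wall is rigid.` gives `3 cell` as the first line, followed by the words that appear once.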

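So far I have code that I got from an online book, but AppleScript Editor runs into a problem at line 17: the script compiles, but it does not create the new TextEdit document. Also, its comments say it shows the most frequent word first, the second most frequent word second, and so on. I still need to filter out certain "common" words. I then found the place where that guy tried to pass this code off as his own, which suggested removing the line

    return word_frequency_list

and removing it lets the script create the new TextEdit document. Now I just need to sort the words by their counts and filter out specific words.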

I just completed an example that I found on the internet.

That does not mean he was trying to pass the code off as his own. Whichever way you got there...