How to use a custom Golang scanner that splits on a string literal, and expand memory to load the entire file into memory?


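I have been trying to figure out how to implement what I originally thought would be a simple program. I have a text file of quotes, all separated by "$$".

I want the program to parse the quotes file, randomly select 3 quotes, and write them to standard output.

There are 1022 quotes in the file.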

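When I try to split the file, I get this error: missing '

I can't figure out how to match $$ with a string literal. I keep getting:
missing '

This is the custom scanner: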

onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) {  
    for i := 0; i < len(data); i++ { 
        //if data[i] == "$$" {              # this is what I did originally
        //if data[i:i+2] == "$$" {    # (mismatched types []byte and string)
        //if data[i:i+2] == `$$` {    # throws (mismatched types []byte and string)
        // below throws syntax error: unexpected $ AND missing '
        if data[1:i+2] == '$$' {   
            return i + 1, data[:i], nil  
        }  
    }

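In Go, single quotes (') delimit a single character (a "rune", which internally is an int32 holding a Unicode code point), while double quotes delimit a string, which may be longer than one character: "$$". That is why the parser expects the closing ' of the rune literal right after the first dollar sign.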
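To make the rune-versus-string distinction concrete, here is a minimal sketch of three ways to test a byte slice for the "$$" separator without a type mismatch. The helper name hasSeparatorAt is hypothetical and not part of the original code; it runs all three checks only to show that they agree.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasSeparatorAt (hypothetical helper) reports whether the two bytes starting
// at index i are "$$". Any one of the three checks is enough in real code.
func hasSeparatorAt(data []byte, i int) bool {
	if i < 0 || i+2 > len(data) {
		return false // not enough bytes left for a two-byte separator
	}
	byBytes := data[i] == '$' && data[i+1] == '$'     // byte compared with a rune literal
	byString := string(data[i:i+2]) == "$$"           // slice converted to a string
	byEqual := bytes.Equal(data[i:i+2], []byte("$$")) // two byte slices compared
	return byBytes && byString && byEqual
}

func main() {
	data := []byte("first$$second")
	fmt.Println(hasSeparatorAt(data, 5)) // true
	fmt.Println(hasSeparatorAt(data, 0)) // false
	fmt.Printf("%T %T\n", '$', "$$")     // int32 string
}
```

The bounds check at the top matters: indexing data[i+1] on the last byte of the buffer would panic.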

Update: if you want to avoid converting all of data to a string, you can check it this way:

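...
    onDollarSign := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        // Stop one byte early so that data[i+1] is always in range.
        for i := 0; i+1 < len(data); i++ {
            if data[i] == '$' && data[i+1] == '$' { // compare bytes with rune literals
                // Advance past both '$' bytes; the token excludes the separator.
                return i + 2, data[:i], nil
            }
        }
        if !atEOF || len(data) == 0 {
            return 0, nil, nil // no separator yet: request more data (or stop at EOF)
        }
        // At EOF, whatever remains is the last token.
        return len(data), data, bufio.ErrFinalToken
    }
...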

I rewrote your split function based on a stdlib split function.
I have not tested it thoroughly, so you should exercise it. You should also decide how to handle whitespace, such as the newline at the end of the file.
func onDollarSign(data []byte, atEOF bool) (advance int, token []byte, err error) {

    // If we are at the end of the file and there's no more data then we're done
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }

    // If we are at the end of the file and there IS more data return it
    if atEOF {
        return len(data), data, nil
    }

    // If we find "$$", advance past both bytes and return a token up to but
    // not including the first $. bytes.Index skips a lone $ and never indexes
    // past the end of data.
    if i := bytes.Index(data, []byte("$$")); i >= 0 {
        return i + 2, data[0:i], nil
    }

    // Request more data.
    return 0, nil, nil
}
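As a rough illustration of how a split function like the one above plugs into bufio.Scanner, here is a self-contained sketch. The names splitOnDollars and scanAll are hypothetical, and the input is an in-memory string rather than the quotes file.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOnDollars is a bufio.SplitFunc that yields the text between "$$" separators.
func splitOnDollars(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil // nothing left
	}
	if i := bytes.Index(data, []byte("$$")); i >= 0 {
		return i + 2, data[:i], nil // skip the separator, return what precedes it
	}
	if atEOF {
		return len(data), data, nil // final token without a trailing separator
	}
	return 0, nil, nil // request more data
}

// scanAll (hypothetical helper) collects every token found in input.
func scanAll(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(splitOnDollars)
	var tokens []string
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	return tokens, scanner.Err()
}

func main() {
	tokens, err := scanAll("one$$two$$three")
	fmt.Println(tokens, err) // [one two three] <nil>
}
```

Note that a Scanner rejects tokens larger than bufio.MaxScanTokenSize (64 KiB) by default; if a single quote could be longer, scanner.Buffer can raise the limit.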

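Using a scanner is a bit complicated if you end up reading the whole file anyway. I would read the whole file and then simply split it into a list of quotes: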
package main

import (
    "bytes"
    "log"
    "math/rand"
    "os"
)

func main() {
    // Slurp file. os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
    contents, err := os.ReadFile("/Users/bryan/Dropbox/quotes_file.txt")
    if err != nil {
        log.Fatal(err)
    }

    // Split the quotes
    separator := []byte("$$") // Convert string to []byte
    quotes := bytes.Split(contents, separator)

    // Select three random quotes and write them to stdout.
    // rand.Intn may pick the same quote more than once.
    for i := 0; i < 3; i++ {
        n := rand.Intn(len(quotes))
        quote := quotes[n]

        os.Stdout.Write(quote)
        os.Stdout.Write([]byte{'\n'}) // new line, if necessary
    }
}
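Because each rand.Intn call is independent, the loop above can print the same quote twice. If three different quotes are wanted, one possible variation is to shuffle the indices with rand.Perm and take the first three. This is only a sketch: pickDistinct is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions about how the file should be cleaned up.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// pickDistinct (hypothetical helper) returns up to n distinct, non-empty
// quotes from contents, in random order.
func pickDistinct(contents []byte, n int) [][]byte {
	var quotes [][]byte
	for _, q := range bytes.Split(contents, []byte("$$")) {
		// Assumption: surrounding whitespace and empty entries are unwanted.
		if q = bytes.TrimSpace(q); len(q) > 0 {
			quotes = append(quotes, q)
		}
	}
	if n < 0 {
		n = 0
	}
	if n > len(quotes) {
		n = len(quotes)
	}
	picked := make([][]byte, 0, n)
	// rand.Perm returns a random permutation of 0..len-1, so indices never repeat.
	for _, idx := range rand.Perm(len(quotes))[:n] {
		picked = append(picked, quotes[idx])
	}
	return picked
}

func main() {
	contents := []byte("a$$b$$c$$d\n") // stand-in for the file contents
	for _, q := range pickDistinct(contents, 3) {
		fmt.Printf("%s\n", q)
	}
}
```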

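Using a scanner would make sense if you selected the three quotes before reading the file; you could then stop reading once you reach the last of them.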
scanQuotes is similar to bufio.ScanLines. For example:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
    "strings"
)

func dropCRLF(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\n' {
        data = data[0 : len(data)-1]
        if len(data) > 0 && data[len(data)-1] == '\r' {
            data = data[0 : len(data)-1]
        }
    }
    return data
}

func scanQuotes(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(dropCRLF(data)) == 0 {
        return len(data), nil, nil
    }
    sep := []byte("$$")
    if i := bytes.Index(data, sep); i >= 0 {
        return i + len(sep), dropCRLF(data[0:i]), nil
    }
    if atEOF {
        return len(data), dropCRLF(data), nil
    }
    return 0, nil, nil
}

func main() {
    /*
       quote_file, err := os.Open("/Users/bryan/Dropbox/quotes_file.txt")
       if err != nil {
        log.Fatal(err)
       }
    */
    quote_file := strings.NewReader(shakespeare) // test data

    var quotes []string
    scanner := bufio.NewScanner(quote_file)
    scanner.Split(scanQuotes)
    for scanner.Scan() {
        quotes = append(quotes, scanner.Text())
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading quotes:", err)
    }

    fmt.Println(len(quotes))
    for i, quote := range quotes {
        fmt.Println(i, quote)
    }
}

var shakespeare = `To be, or not to be: that is the question$$All the world‘s a stage, and all the men and women merely players. They have their exits and their entrances; And one man in his time plays many parts.$$Romeo, Romeo! wherefore art thou Romeo?$$Now is the winter of our discontent$$Is this a dagger which I see before me, the handle toward my hand?$$Some are born great, some achieve greatness, and some have greatness thrust upon them.$$Cowards die many times before their deaths; the valiant never taste of death but once.$$Full fathom five thy father lies, of his bones are coral made. Those are pearls that were his eyes. Nothing of him that doth fade, but doth suffer a sea-change into something rich and strange.$$A man can die but once.$$How sharper than a serpent’s tooth it is to have a thankless child!` + "\n"

Output:
10
0 To be, or not to be: that is the question
1 All the world‘s a stage, and all the men and women merely players. They have their exits and their entrances; And one man in his time plays many parts.
2 Romeo, Romeo! wherefore art thou Romeo?
3 Now is the winter of our discontent
4 Is this a dagger which I see before me, the handle toward my hand?
5 Some are born great, some achieve greatness, and some have greatness thrust upon them.
6 Cowards die many times before their deaths; the valiant never taste of death but once.
7 Full fathom five thy father lies, of his bones are coral made. Those are pearls that were his eyes. Nothing of him that doth fade, but doth suffer a sea-change into something rich and strange.
8 A man can die but once.
9 How sharper than a serpent’s tooth it is to have a thankless child!

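I changed it to double quotes: cannot convert "$$" to type byte

I think that is caused by the comparison data[i] == '$'. Try data[i:i+1] == "$$". Could you put the code and some sample data on the playground so it can be run live?

My mistake: data[i:i+2] == "$$"

./quote_generator.go:38: invalid operation: data[i:i+2] == "$$" (mismatched types []byte and string)

Because of the external quotes file, I'm not sure how to add this to play.golang.org. I tried creating a string of elements separated by $$, but I get: cannot use quote_file (type string) as type io.Reader in argument to bufio.NewScanner: string does not implement io.Reader (missing Read method)

So a scanner should only be used when processing part of a file? In that case, how would you count the total number of quotes in the file without reading it completely?

If you don't know the number of quotes, you have no choice but to read the whole file. In that case, using a scanner is more complicated than slurping the file and splitting the bytes.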