How to read columns from a text file and save them into separate arrays in C?
As an exercise to get familiar with pointers, I wrote a short C program that reads text from a file. I want to stick with ANSI C. The program works well, but now I want to go further: read columns from a text file and save them into separate arrays. Similar questions have been asked, and the answers used `strtok`, `fgets` or `sscanf`, but when should I use one rather than another?

Here is my commented code:
```c
#include <stdio.h>
#include <stdlib.h>

char *read_file(const char *file_input); /* reads a whole file into a string */

int main(void)
{
    char *string = read_file("file.txt"); /* pointer to the file contents */

    if (string) {
        /* write the string to stdout and append a newline */
        puts(string);
        /* release the memory allocated by read_file() */
        free(string);
    }
    return 0;
}

/* Returns a pointer to a newly allocated, null-terminated copy of the
   file contents, or NULL on error. The caller must free() the result. */
char *read_file(const char *file_input)
{
    char *buffer = NULL;
    long string_size;
    size_t read_size;
    FILE *input_stream = fopen(file_input, "r");

    /* check that the file could be opened */
    if (input_stream == NULL) {
        perror(file_input);
        return NULL;
    }

    /* move the file position to the end (offset 0 from SEEK_END) */
    fseek(input_stream, 0, SEEK_END);
    /* ftell gives the current position, i.e. the size of the file in bytes */
    string_size = ftell(input_stream);
    /* set the file position indicator back to the start of the file */
    rewind(input_stream);

    if (string_size < 0) {              /* ftell failed */
        fclose(input_stream);
        return NULL;
    }

    /* allocate room for the whole file plus the terminating '\0';
       no (char *) cast is needed in C, and sizeof(char) is always 1 */
    buffer = malloc((size_t)string_size + 1);
    if (buffer == NULL) {
        fclose(input_stream);
        return NULL;
    }

    /* read everything in one operation; fread (not fgets) takes an element
       size and a count, and returns the number of elements actually read */
    read_size = fread(buffer, sizeof(char), (size_t)string_size, input_stream);
    /* fread does not add a '\0', so terminate the data to make it a string */
    buffer[read_size] = '\0';

    /* the size reported by ftell should equal the number of bytes read;
       note that in text mode on some systems (e.g. Windows, CRLF line
       endings) fread can legitimately return fewer bytes than ftell */
    if (read_size != (size_t)string_size) {
        /* something went wrong: throw away the memory */
        free(buffer);
        buffer = NULL;
    }

    /* always remember to close the file */
    fclose(input_stream);
    return buffer;
}
```
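On further research, I found that `fread` is meant to let a program read and write large blocks of data in one step, so reading individual columns is probably not what `fread` is for. My implementation is therefore wrong for this kind of job. Should I use `getc`, `strtok`, `sscanf` or `getline` to read a text file like this? I'm trying to stick to good programming principles and allocate memory dynamically.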
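EDIT: By that I mean (but am not limited to) using good C programming technique and dynamic memory allocation. My first idea was to replace `fread` with `fgets`.

UPDATE: Thanks to your help, I've finally made some progress: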
```c
/* Allocate a buffer that can hold the whole file if needed,
   +1 for the terminating '\0' (no (char *) cast needed in C) */
buffer = malloc(string_size + 1);
if (buffer != NULL) {
    /* fgets reads one line per call (including its '\n')
       and returns NULL at end of file */
    while (fgets(buffer, string_size + 1, input_stream)) {
        printf("%s", buffer);
    }
}
```
For my text file, it prints:

```
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
B 08902768 1060 800 Test3000
```
I also removed the newline left by `fgets()` with:

```c
strtok(buffer, "\n");
```
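How do I go on to save the columns into separate arrays?

If you know what the column delimiter is and how many columns there are, use the column delimiter first, then the row delimiter. Here is `getline`:

It's very handy because it allocates the space for you, so you don't need to know how many bytes your columns or rows have.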
Or just use `getline`, as in the code sample at the link, to read a whole line, then "parse" it and extract the columns as needed.
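If you paste how you want to run the program and your input, I can try to write a quick C program as a proper answer. For now these are just comment-style answers with too many words for a comment :-(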
Or is there some reason you can't use libraries?
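While waiting for a better question, I'll note that you can use `awk` to read columns from a text file, but that may not be what you want. What are you actually trying to do? Depending on the data, you could use `scanf` or a parser generated with yacc/lex.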
I want to go on and read only certain columns of this text file.
You can do that with any of the input functions: `getc`, `fgets`, `sscanf`, `getline`… but you must first define precisely what a particular column means:
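- Columns may be delimited by a specific character, such as `,`, `;` or TAB. In that case `strtok()` is definitely not the right choice, because it treats any sequence of delimiter characters as a single delimiter: `a,,b` would therefore be seen as having only two columns, `a` and `b`.
- If columns are separated by white space, i.e. any sequence of spaces and/or TABs, `strtok`, `strpbrk` or `strspn` may come in handy.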
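`fgets` reads the file line by line, but you may run into trouble with very long lines. `getline` is a solution, but it may not be available on all systems (it is specified by POSIX, not by ISO C).

Best practice is somewhat subjective, but "fully validated, logical and readable" should always be the goal.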
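For reading a fixed number of fields (in your case, columns 1, 2 and 5 as string values of unknown length, and columns 3 and 4 as simple `int` values), you can read an unknown number of lines from a file by allocating storage for a reasonably expected number of lines, keeping track of how many lines have been filled, and reallocating more storage as needed whenever you reach the limit of what you allocated.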
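An efficient way to handle reallocation is to add some reasonable number of additional blocks of memory each time reallocation is needed (rather than calling `realloc` for every additional line of data). You can add a fixed number of new blocks, multiply the current number by 3/2 or 2, or use any other reasonable scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.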
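Since there is a fixed number of fields of unknown size, you can simplify things by using `sscanf` to separate the five fields and validating that 5 conversions took place by checking the return of `sscanf`. If you were reading an unknown number of fields, you would simply use the same reallocation scheme to handle column-wise expansion, just as discussed above for reading an unknown number of lines. (In that case there is no requirement that every line have the same number of fields, but you can enforce a check by saving the number of fields read from the first line in a variable and then validating that every subsequent line has the same number of fields…)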
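As mentioned in the comments, reading a line of data with a line-oriented input function (such as `fgets` or POSIX `getline`) and then parsing it, either by tokenizing with `strtok` or, in this case with a fixed number of fields, simply with `sscanf`, is a solid approach. It has the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of the data into the needed values. (While less flexible, for some data sets you can do both in a single step with `fscanf`, but that also inj…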
Putting it all together, a short example that reads the five columns of each line into an array of structs, growing the array with `realloc` as needed:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ 2     /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)                                 /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);                /* get length */
    char *sdup = malloc (len + 1);          /* allocate length + 1 */

    if (!sdup)                              /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);       /* pointer to copied string */
}

/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {        /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);                            /* free structs */
}

int main (int argc, char **argv) {

    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;                             /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }

        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) {    /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) {    /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            free (data[row].col1);  /* row isn't counted, so free it here */
            break;
        }
        data[row].col3 = c3;                /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) {    /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            free (data[row].col1);  /* row isn't counted, so free them here */
            free (data[row].col2);
            break;
        }
        row++;  /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}
```
Example input file and use/output:

```
$ cat dat/fivefields.txt
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3

$ ./bin/fgets_fields <dat/fivefields.txt
values stored in struct
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
```

Memory use/error check:

```
$ valgrind ./bin/fgets_fields <dat/fivefields.txt
==1721== Memcheck, a memory error detector
==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==1721== Command: ./bin/fgets_fields
==1721==
values stored in struct
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
==1721==
==1721== HEAP SUMMARY:
==1721==     in use at exit: 0 bytes in 0 blocks
==1721==   total heap usage: 11 allocs, 11 frees, 243 bytes allocated
==1721==
==1721== All heap blocks were freed -- no leaks are possible
==1721==
==1721== For counts of detected and suppressed errors, rerun with: -v
==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
```