MATLAB: how to read a .bib file as text and split it; what are the bibliometric target fields?


matlab


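I have a .bib file from which I am extracting different fields with MATLAB. The goal is to compute various bibliometric indicators, such as the h-index. I tried textscan(), but because the fields differ from article to article, it did not do the whole job. The .bib file looks like this: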

@article{LIM20072054,
title = "Prevention of cardiovascular disease in high-risk individuals in low-income and middle-income countries: health effects and costs",
journal = "The Lancet",
volume = "370",
number = "9604",
pages = "2054 - 2062",
year = "2007",
issn = "0140-6736",
doi = "https://doi.org/10.1016/S0140-6736(07)61699-7",
url = "http://www.sciencedirect.com/science/article/pii/S0140673607616997",
author = "Stephen S Lim and Thomas A Gaziano and Emmanuela Gakidou and K Srinath Reddy and Farshad Farzadfar and Rafael Lozano and Anthony Rodgers",
abstract = "Summary}

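I tried fgetl() to get the lines, but I need to read the whole file at once, perhaps separating the articles by their { and }. Given that we know the field names, does anyone know how to extract unformatted text whose fields differ from entry to entry? This was my first attempt: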

a = fopen('C:\Users\u3f\Downloads\a.bib');
textI='@article{%s title = %q %*s %*s %*s %*s year = %q %*s %*s %*s %*s abstract = %q %*s';
C = textscan(a,textI,'Delimiter','\n')
fclose(a)
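The h-index mentioned above is the largest h such that h of the papers have at least h citations each. A .bib export like this one carries no citation counts, so those would have to come from another source; the counts below are made-up illustrative input, not data from the question. A minimal sketch:

```matlab
% Made-up citation counts, one per paper (not present in the .bib file).
citations = [10 8 5 4 3 0];

% Sort in descending order. Because the sorted counts never increase while
% the rank does, "count(i) >= i" holds for a prefix of the list, so the
% number of true entries is the largest such i, i.e. the h-index.
sorted = sort(citations(:), 'descend');
ranks = (1:numel(sorted))';
h = sum(sorted >= ranks);
fprintf('h-index: %d\n', h);   % prints "h-index: 4" for the counts above
```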

This may take a bit of work, but it should get you what you want.

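The idea is to scan line by line, first looking for a line that starts with @article{. Then create a block and append the following lines to it until you find a line that ends with } (note that this may need some modification if a BibTeX field itself ends with }).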

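When the end of a block is found, it is converted into a struct in which each entry of the BibTeX record becomes a field. The entry's key is also added as a field called name. Once all blocks are processed, you will have a cell array called entryList with one struct per BibTeX entry.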

Keep in mind that for complex entries you may need to do more elaborate text parsing to get everything working.

a = fopen('a.bib');
if(a == -1), error('Could not open a.bib'); end
insideEntry = false;
currEntry = {};
entryList = {};
while true
  lin = fgetl(a); % Pull one line at a time
  if(~ischar(lin)), break; end % fgetl returns -1 at end of file
  if(insideEntry) % If you are inside an @article block
    currEntry{end+1} = lin; % Append line
    if(~isempty(regexp(lin, '\}\s*$', 'once'))) % A line ending in } closes the block
      insideEntry = false;
      entryname = extractBetween(currEntry{1}, '@article{',',');
      entryStruct = struct;
      entryStruct.name = entryname{1};
      for it = 2:length(currEntry)
        eqIdx = strfind(currEntry{it}, '='); % Split only on the first '=' so values (e.g. URLs) may contain '='
        if(~isempty(eqIdx))
          fieldName = strrep(strtrim(currEntry{it}(1:eqIdx(1)-1)),'-','_'); % Fix the keyword name (so it can be a field in a structure)
          value = strtrim(currEntry{it}(eqIdx(1)+1:end));
          if(it == length(currEntry))
            value = regexprep(value, '\}$', ''); % Drop the } that closes the entry
          end
          value = regexprep(value, ',$', ''); % Fix end of entry: drop the field separator
          value = regexprep(value, '^["{]|["}]$', ''); % Drop one opening and one closing delimiter
          entryStruct.(fieldName) = value; % Assign text to the struct field
        end
      end
      entryList{end+1} = entryStruct; % Append to the entry list
      currEntry = {};
    end
  elseif(contains(lin, '@article{')) % Look for @article block start line
    insideEntry = true;
    currEntry = {lin};
  end
end
fclose(a);

For the BibTeX sample you provided, this should produce:

entryList{1}

ans = 

struct with fields:

    name: 'LIM20072054'
   title: 'Prevention of cardiovascular disease in high-risk individuals in low-income and middle-income countries: health effects and costs'
 journal: 'The Lancet'
  volume: '370'
  number: '9604'
   pages: '2054 - 2062'
    year: '2007'
    issn: '0140-6736'
     doi: 'https://doi.org/10.1016/S0140-6736(07)61699-7'
     url: 'http://www.sciencedirect.com/science/article/pii/S0140673607616997'
  author: 'Stephen S Lim and Thomas A Gaziano and Emmanuela Gakidou and K Srinath Reddy and Farshad Farzadfar and Rafael Lozano and Anthony Rodgers'
abstract: 'Summary'
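The question asks about reading the whole file at once, and the loop above only recognizes @article and relies on contains/extractBetween (R2016b and later). Below is a minimal alternative sketch that parses the whole text with regexp, under these assumptions: every entry starts on its own line with @, values are quoted, braced (at most one level of nested braces) or bare words, and there are no @string macros or # concatenation. The field names name and entrytype are choices made for this sketch. Note that the abstract line in the question's sample lacks its closing quote, so this stricter pattern would skip that field.

```matlab
% Sample text standing in for the file; for a real file use:
% txt = fileread('a.bib');
txt = sprintf(['@article{LIM20072054,\n' ...
    '  title = "Prevention of cardiovascular disease",\n' ...
    '  year = "2007",\n' ...
    '  pages = {2054 - 2062}\n' ...
    '}\n' ...
    '@book{SMITH2001,\n' ...
    '  title = {A {BibTeX} Primer},\n' ...
    '  year = 2001\n' ...
    '}\n']);

% Split wherever a line starts with '@'; chunk 1 is whatever precedes the first entry
chunks = regexp(txt, '^\s*@', 'split', 'lineanchors');
entryList = {};
for k = 2:numel(chunks)
    % Header "type{key," gives the entry type and the citation key
    hdr = regexp(chunks{k}, '^(\w+)\s*\{\s*([^,\s]+)\s*,', 'tokens', 'once');
    if isempty(hdr)
        continue;  % not an entry this sketch understands
    end
    entryStruct = struct('entrytype', lower(hdr{1}), 'name', hdr{2});
    % field = "quoted" | {braced, one nesting level} | bare word/number
    pairs = regexp(chunks{k}, ...
        '([\w-]+)\s*=\s*("[^"]*"|\{(?:[^{}]|\{[^{}]*\})*\}|\w+)', 'tokens');
    for p = 1:numel(pairs)
        fieldName = strrep(lower(pairs{p}{1}), '-', '_');
        value = pairs{p}{2};
        if any(value(1) == '"{')
            value = value(2:end-1);  % drop only the outer delimiters
        end
        entryStruct.(fieldName) = value;
    end
    entryList{end+1} = entryStruct; %#ok<AGROW>
end
```

Because it uses only regexp, lower and strrep, this sketch should also avoid the R2016b-only functions.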

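You could use a BibTeX parsing library to read the file and convert it into a data structure for you. That way you do not have to parse these files at a low level yourself with the basic I/O functions, which is all MATLAB offers for file formats it has no built-in support for.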

MATLAB can use Java libraries, so you could use a Java BibTeX parsing library. (Here it is.)

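You need to write a parser. BibTeX is a language, just like MATLAB, C, XML, etc., and a parser uses knowledge of the language's grammar to interpret text. The line-by-line answer is a good first step toward a parser, but it may still take a lot of work to make it general enough to parse arbitrary BibTeX. For example, where you look for '@article{', you should probably look for any of the 15 or so known entry types, or make it generic and accept any token after '@'.

Which version of MATLAB are you using? Mine is R2014a, and it throws errors at contains and extractBetween, which were introduced in R2016b.

You can use strcmp, strfind and/or regexprep to get the same functionality.

This is a nice solution, and I can use Java libraries in MATLAB, but is there a way to install a Java library from MATLAB without using a Java IDE such as Eclipse?

Yes. Not directly from MATLAB code, but it can be done by hand. Just download the JAR files, keep them somewhere in your source tree (I like "lib/java/<libname>/whatever.jar"), and use MATLAB's javaaddpath() function to add the JAR files to the MATLAB Java class path. (If there are many JAR files, write a small function that finds "lib/java/*/*.jar" and calls javaaddpath() on all of them.) Alternatively, you can use Maven's "mvn install dependency:copy-dependencies" to fetch all the JARs at once.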
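The small helper suggested for many JAR files could look roughly like the sketch below. It assumes the lib/java/*/*.jar layout under the current folder, is written as a script for brevity, builds absolute paths, and calls javaaddpath once with the whole cell array.

```matlab
% Assumed layout: lib/java/<one folder per library>/*.jar under the current folder.
rootDir = fullfile(pwd, 'lib', 'java');
jarPaths = {};
listing = dir(rootDir);   % empty if the folder does not exist
for k = 1:numel(listing)
    if ~listing(k).isdir || any(strcmp(listing(k).name, {'.', '..'}))
        continue;   % skip plain files and the '.' / '..' entries
    end
    jars = dir(fullfile(rootDir, listing(k).name, '*.jar'));
    for j = 1:numel(jars)
        jarPaths{end+1} = fullfile(rootDir, listing(k).name, jars(j).name); %#ok<AGROW>
    end
end
if ~isempty(jarPaths)
    javaaddpath(jarPaths);   % add every JAR to the dynamic Java class path in one call
end
```

Afterwards, javaclasspath('-dynamic') lists the entries that were added.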