FITS text extraction with Tika, similar to NetCDF?

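I am working with NetCDF and FITS files. I have Tika extracting the header text from NetCDF files, but for FITS files I only get basic file metadata. Does header text extraction not work for FITS files?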

For FITS, I only see the basic file metadata and nothing from the header.

This is what I use for NetCDF files (I also used tika --gui to view the header text):

curl -X PUT --data-binary @age4_timeseries.nc --header "Content-type: text/-t"
curl -T age4_timeseries.nc --header "Accept: text/plain"
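For comparison, here is a minimal sketch of the same two requests wrapped in shell functions, so the FITS and NetCDF files can be sent the same way. The base URL `http://localhost:9998` and the `TIKA_URL` variable are assumptions (the default tika-server port), not something stated in the question. The `/tika` and `/meta` endpoints are those of tika-server, and the function names are made up for illustration.

```shell
#!/bin/sh
# ASSUMPTION: a tika-server instance is listening here; adjust as needed.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# tika_text FILE: upload FILE and ask for the extracted plain text.
tika_text() {
    curl -s -T "$1" "$TIKA_URL/tika" --header "Accept: text/plain"
}

# tika_meta FILE: upload FILE and ask for its metadata as JSON.
tika_meta() {
    curl -s -T "$1" "$TIKA_URL/meta" --header "Accept: application/json"
}

# Example calls (not executed here):
#   tika_text age4_timeseries.nc
#   tika_meta mirc0000.fits
```

Comparing `tika_meta` output for the two file types shows which parser handled each file, via the `X-Parsed-By` entries.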

I looked through the Tika Jira and found a reference from 2012:

but it does not appear to have been added to Tika.

This is what I get back from Tika:

Content-Length: 40968000
Content-Type: application/fits
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser
X-TIKA:digest:MD5: cce03f62a68c09ec562f9e8e05b54b40
X-TIKA:digest:SHA256: b3f0c61409cbd7f2c9aeb8bdfa0798d529383db699c1055b8a12a68267b948dd
resourceName: mirc0000.fits

But I would like to get the contents of the header, like this:

SIMPLE  =                    T / file does conform to FITS standard 
BITPIX  =                   16 / number of bits per data pixel                  
NAXIS   =                    3 / number of data axes                            
NAXIS1  =                 1280 / length of data axis 1                          
NAXIS2  =                   16 / length of data axis 2                          
NAXIS3  =                 1000 / length of data axis 3                          
EXTEND  =                    T / FITS dataset may contain extensions            
COMMENT   FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H 
BZERO   =                32768 / offset data range to that of unsigned short    
BSCALE  =                    1 / default scaling factor                         
DATE    = '2006-09-01T04:01:02' / File creation date (YYYY-MM-DDThh:mm:ss UTC)  
TELESCOP= 'CHARA array 330m max baseline, 6dishes' / Telescope                  
INSTURME= 'MIRC spectro/combiner' / The data acquisition instrument             
ORIGIN  = 'Mount Wilson Institute' / Origin of the Observation                  
SITELAT = '34.13   '           / Latitude (Geodetic, VLBI, to be verified)      
SITELONG= '118.03  '           / Longitude (Geodetic, VLBI, to be verified)     
SITEELEV= '1742.00 '           / Altitude above MSL, to be verified             
HISTORY = 'Multi-Dish FITS data' / File modification history                    
OBJECT  = 'HD_174639'          / Target name                                    
DATE-OBS= '09/01/2006'         / UT date (YYYY-MM-DD)                           
UTC-OBS = '04:00:10'           / Universal Time hh:mm:ss                        
LST-OBS = '18:48:41'           / Local Sidereal Time hh:mm:ss                   
CHARA-TM= '04:00:11'           / CHARA time  hh:mm:ss                           
LOST-TKS= '       0'           / CHARA lost Ticks in RT Clock t                 
LOST-SEC= '       0'           / CHARA lost seconds in rt clock s               
S1-TARGE=         41.342992001 / Delay line S1 target metrology                 
S2-TARGE=         38.610911409 / Delay line S2 target metrology                 
E1-TARGE=                   0. / Delay line E1 target metrology                 
E2-TARGE=                  44. / Delay line E2 target metrology                 
W1-TARGE=                   0. / Delay line W1 target metrology                 
W2-TARGE=                   0. / Delay line W2 target metrology                 
WAVELEN =                 1.65 / Central wavelength                             
BANDWID =                  0.3 / Bandwidth of spectrum                          
EXPOSURE=             5.483692 / Effective integration time in ms               
ROWOFFS =                    5 / Sub-image Y offset prom pixel 0                
COLOFFS =                   38 / Sub-image X offset prom pixel 0                
NREADS  =                    8 / Number of multiple reads for pixel             
FRMPRST =                 1000 / Number of frames per reset                     
VOFFSET =                   4. / PICNIC offset voltage                          
VD      =                   5. / PICNIC drain bias                              
ICTL    =                  3.3 / PICNIC warm OA offset voltage                  
END             
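As a cross-check that does not depend on Tika, the header above can be dumped straight from the file. This is a rough sketch, assuming the standard FITS layout of 80-byte ASCII cards ending with an END card. The function name `fits_header` is made up for illustration.

```shell
#!/bin/sh
# fits_header FILE: print the primary header, one card per line.
# Assumes 80-byte ASCII cards with no embedded newlines, ending at END.
fits_header() {
    # fold -b cuts the byte stream into 80-byte lines; sed strips the
    # trailing blank padding and quits right after printing the END card,
    # so the binary data that follows is never written out.
    LC_ALL=C fold -b -w 80 "$1" | LC_ALL=C sed -e 's/ *$//' -e '/^END$/q'
}

# Example call (not executed here):
#   fits_header mirc0000.fits
```

Only the primary header is printed. Extension headers later in the file would need a proper FITS reader such as CFITSIO.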

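Got it working! To make this work, you have to install the CFITSIO library before building GDAL. CFITSIO library info:

Download GDAL from here:

gunzip

tar xvf

./configure --with-cfitsio

make

make install


Run Tika as usual. Now it works like a champ.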
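Before re-running Tika, it may be worth confirming that the rebuilt GDAL actually picked up CFITSIO. This is a small sketch that checks the driver list from `gdalinfo --formats` for a FITS entry. The helper name `has_fits_driver` is made up for illustration, and the pattern assumes the driver's short name starts each line of the listing.

```shell
#!/bin/sh
# has_fits_driver: succeed if the GDAL driver list on stdin mentions FITS.
has_fits_driver() {
    grep -qi '^ *FITS '
}

# Only query GDAL when gdalinfo is actually on the PATH.
if command -v gdalinfo >/dev/null 2>&1; then
    if gdalinfo --formats | has_fits_driver; then
        echo "GDAL was built with FITS support"
    else
        echo "No FITS driver found; rebuild GDAL with CFITSIO installed"
    fi
fi
```

If the driver is listed, running `gdalinfo` on the FITS file itself should show the header keywords in its metadata section.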

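Can you give a more detailed example of what you would hope to see? Tika uses GDAL under the hood for FITS files. GDAL's FITS support, in turn, is a bit primitive, but it should at least extract the header keywords into GDAL's metadata structure.

Ultimately, I would like to pull these results into Solr so they can be indexed for future searches.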