FITS text extraction like NetCDF?
I am working with NetCDF and FITS files. I have Tika extracting the header text from NetCDF files, but for FITS files I only get basic file metadata. Does header text extraction not work for FITS files?

For FITS, I only see the basic file metadata, not the text from the header.

This is what I use for the NetCDF files (I also used tika --gui to view the header text):

curl -X PUT --data-binary @age4_timeseries.nc --header "Content-type: text/-t"
curl -T age4_timeseries.nc --header "Accept: text/plain"

I looked through the Tika Jira and found a reference from 2012, but it does not seem to have been added to Tika.

I received this from Tika:
Content-Length: 40968000
Content-Type: application/fits
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser
X-TIKA:digest:MD5: cce03f62a68c09ec562f9e8e05b54b40
X-TIKA:digest:SHA256: b3f0c61409cbd7f2c9aeb8bdfa0798d529383db699c1055b8a12a68267b948dd
resourceName: mirc0000.fits
But I was hoping to receive the contents of the header, like this:
SIMPLE = T / file does conform to FITS standard
BITPIX = 16 / number of bits per data pixel
NAXIS = 3 / number of data axes
NAXIS1 = 1280 / length of data axis 1
NAXIS2 = 16 / length of data axis 2
NAXIS3 = 1000 / length of data axis 3
EXTEND = T / FITS dataset may contain extensions
COMMENT FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H
BZERO = 32768 / offset data range to that of unsigned short
BSCALE = 1 / default scaling factor
DATE = '2006-09-01T04:01:02' / File creation date (YYYY-MM-DDThh:mm:ss UTC)
TELESCOP= 'CHARA array 330m max baseline, 6dishes' / Telescope
INSTURME= 'MIRC spectro/combiner' / The data acquisition instrument
ORIGIN = 'Mount Wilson Institute' / Origin of the Observation
SITELAT = '34.13 ' / Latitude (Geodetic, VLBI, to be verified)
SITELONG= '118.03 ' / Longitude (Geodetic, VLBI, to be verified)
SITEELEV= '1742.00 ' / Altitude above MSL, to be verified
HISTORY = 'Multi-Dish FITS data' / File modification history
OBJECT = 'HD_174639' / Target name
DATE-OBS= '09/01/2006' / UT date (YYYY-MM-DD)
UTC-OBS = '04:00:10' / Universal Time hh:mm:ss
LST-OBS = '18:48:41' / Local Sidereal Time hh:mm:ss
CHARA-TM= '04:00:11' / CHARA time hh:mm:ss
LOST-TKS= ' 0' / CHARA lost Ticks in RT Clock t
LOST-SEC= ' 0' / CHARA lost seconds in rt clock s
S1-TARGE= 41.342992001 / Delay line S1 target metrology
S2-TARGE= 38.610911409 / Delay line S2 target metrology
E1-TARGE= 0. / Delay line E1 target metrology
E2-TARGE= 44. / Delay line E2 target metrology
W1-TARGE= 0. / Delay line W1 target metrology
W2-TARGE= 0. / Delay line W2 target metrology
WAVELEN = 1.65 / Central wavelength
BANDWID = 0.3 / Bandwidth of spectrum
EXPOSURE= 5.483692 / Effective integration time in ms
ROWOFFS = 5 / Sub-image Y offset prom pixel 0
COLOFFS = 38 / Sub-image X offset prom pixel 0
NREADS = 8 / Number of multiple reads for pixel
FRMPRST = 1000 / Number of frames per reset
VOFFSET = 4. / PICNIC offset voltage
VD = 5. / PICNIC drain bias
ICTL = 3.3 / PICNIC warm OA offset voltage
END
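For comparison, the header text can be read straight from the file without Tika. This is only a sketch: it assumes a standard-conforming file, where the primary header is a run of 80-character ASCII cards with no line breaks, ending at the END card. The function name fits_header is made up for illustration.

```shell
# Print the primary header of a FITS file, one card per line.
# Assumption: standard FITS layout (80-byte ASCII cards, no newlines,
# header terminated by an END card). Stops reading at END.
fits_header() {
  LC_ALL=C fold -b -w 80 < "$1" |
    LC_ALL=C sed -e 's/ *$//' -e '/^END$/q'
}
```

For example, fits_header mirc0000.fits should print the cards from SIMPLE through END, with trailing padding removed.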
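Got it working! To make this work, you have to install the CFITSIO library before building GDAL.

CFITSIO library info:

Download GDAL from here:

gunzip
tar xvf
./configure --with-cfitsio
make
make install

Run Tika as usual. Now it works like a champ.

Can you give a more detailed example of what you are hoping to see? Tika uses GDAL under the hood for FITS files. GDAL, in turn, is a bit primitive but can at least extract the header keywords into its metadata structure.

Ultimately, I would like to pull these results into Solr so they can be indexed for future searches.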
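A quick way to check whether a GDAL build picked up CFITSIO is to look for FITS in the driver list that gdalinfo --formats prints. The helper below is a sketch, and its name has_fits_driver is hypothetical. It reads the list from stdin and assumes each driver is shown on its own line with the short name first.

```shell
# Succeed if a GDAL driver list on stdin includes the FITS driver.
# Assumption: one driver per line, short name first, as printed by
# `gdalinfo --formats`.
has_fits_driver() {
  grep -Eq '^[[:space:]]*FITS[[:space:]]'
}
```

Usage: gdalinfo --formats | has_fits_driver && echo "FITS driver available". If the driver is present, gdalinfo mirc0000.fits should list the header keywords in its metadata section.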