How to create variables from a CSV header row in Ruby

ruby csv

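I've tried a few things but can't figure out how to do this in 2-3 lines of concise Ruby, even though it seems it should be quick to do with the right coding tricks.

The header row of my "file.csv" looks like this: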

Ticker,"Price","Market Cap","Average Volume","Analyst Recom","Relative Strength Index (14)","Sector","Industry","Dividend Yield","Beta","52-Week Low","52-Week High","50-Day Low","50-Day High","Company","50-Day Simple Moving Average","Country","P/E","Forward P/E","PEG","P/S","P/B","P/Cash","P/Free Cash Flow","Payout Ratio","EPS (ttm)","EPS growth this year","EPS growth next year","EPS growth past 5 years","EPS growth next 5 years","Sales growth past 5 years","EPS growth quarter over quarter","Sales growth quarter over quarter","Shares Outstanding","Shares Float","Insider Ownership","Insider Transactions","Institutional Ownership","Institutional Transactions","Float Short","Short Ratio","Return on Assets","Return on Equity","Return on Investment","Current Ratio","Quick Ratio","LT Debt/Equity","Total Debt/Equity","Gross Margin","Operating Margin","Profit Margin","Performance (Week)","Performance (Month)","Performance (Quarter)","Performance (Half Year)","Performance (Year)","Performance (Year)","Average True Range","Volatility (Week)","Volatility (Month)","20-Day Simple Moving Average","200-Day Simple Moving Average","Change from Open","Gap","Relative Volume","Change","Volume","Earnings Date","No."

It is followed by about 7,000 rows that look like:

FCD,27.89,,0.94,,66.75,"Financial","Exchange Traded Fund",3.13%,,19.75%,-0.36%,6.37%,-0.36%,"Focus Morningstar Consumer Defensive ETF",2.28%,"USA",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.36%,3.07%,9.93%,10.85%,,2.01%,0.12,0.04%,0.21%,1.26%,6.69%,0.00%,-0.04%,0.96,-0.04%,900,,2186
FCE-A,14.59,2496.64,960.33,2.50,54.76,"Financial","Property Management",,2.83,56.55%,-24.87%,36.61%,-7.77%,"Forest City Enterprises Inc.",11.49%,"USA",,69.48,,2.25,1.58,10.87,,,-0.02,410.77%,250.00%,-10.06%,8.00%,1.54%,-28.77%,-9.00%,171.12,136.94,0.26%,-8.25%,74.80%,-0.13%,4.62%,6.59,0.46%,-0.12%,0.54%,,,4.35,4.35,39.54%,4.82%,4.60%,-4.01%,8.96%,25.45%,13.10%,-22.80%,23.43%,0.44,3.07%,2.98%,-0.89%,1.49%,-1.62%,0.00%,0.47,-1.62%,449874,12/8/2010 4:30:00 PM,2187

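Given a ticker such as "FCD", I'm trying to bulk-assign about 30 new variables, named after the header fields, to the values in the row that matches "FCD".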

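Each new variable name starts with the prefix fv_, followed by the field name minus all punctuation, spaces, quotes, and anything else that is not variable-friendly.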

So, for "FCD", I'm trying to get my script to end up with:

fv_Ticker="FCD"  
fv_Price=27.89  
fv_MarketCap=""  
fv_VolatilityMonth=0.21  # if get String not Float because of trailing % in "0.21%" that's okay, will deal with it later
etc.
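
The naming rule described above can be sketched as a small helper. This is only an illustration: the method name fv_name and the sample header list are made up here, and the rule simply drops every character that is not a letter or digit. Note that the real header contains "Performance (Year)" twice, so two fields would collapse into the same name under this rule.

```ruby
# Hypothetical helper (not from the question): build an fv_-prefixed name
# from a CSV header by dropping everything except letters and digits.
def fv_name(header)
  "fv_" + header.gsub(/[^A-Za-z0-9]/, "")
end

# A few headers taken from the file's header row.
headers = ["Ticker", "Market Cap", "Volatility (Month)", "52-Week Low", "P/E"]
names = headers.map { |h| fv_name(h) }

puts names.inspect
# => ["fv_Ticker", "fv_MarketCap", "fv_VolatilityMonth", "fv_52WeekLow", "fv_PE"]
```

The fv_ prefix also keeps names such as fv_52WeekLow legal, since a Ruby identifier cannot start with a digit.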

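I should note that I gave up on using any kind of CSV.read or CSV.foreach because the read takes several minutes, which is unacceptable in a real-time application that runs repeatedly.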

Instead, I've been using Ruby pipes to awk to assign individual variables that are read from the file instantly, like this:

$stock="FCD"
$dividend_yield = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $9}' finviz.AllStocks.csv")[0].to_f
$beta = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $10}' AllStocks.csv")[0].to_f

But this is now getting too hairy to generalize. It needs to handle any similar CSV file whose fields are unknown until the first line has been read.

I think I've figured something out. A middle road: let Ruby see only two lines of CSV input, so native processing should be fast:

puts IO.readlines(%`|sed -n "1p; /^#{$stock},/p" AllStocks.csv`) # sed filters out the gazillion other lines that make Ruby slow

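It works. Because Ruby only sees a short CSV, one header line and one data line, thanks to sed, it loads in an instant, and I have my hash (the associative array I was looking for).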
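
For comparison, the same "header plus one matching line" filter can be approximated in plain Ruby without shelling out. This is a sketch under the same assumptions as the sed command: every record sits on one physical line and the ticker is the unquoted first field. The helper name header_and_row and the temporary sample file are made up for the example, and whether this is fast enough on the real 7,000-line file is untested here.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: return the header line plus the first line that
# starts with "<ticker>,", reading the file line by line and stopping early.
def header_and_row(path, ticker)
  prefix = "#{ticker},"
  header = nil
  File.foreach(path) do |line|
    if header.nil?
      header = line                 # first line is the header
    elsif line.start_with?(prefix)
      return header + line          # stop at the first match
    end
  end
  header.to_s                       # no match: header only (or "" for an empty file)
end

# Small sample file standing in for AllStocks.csv.
sample = Tempfile.new(["stocks", ".csv"])
sample.write(<<~EOS)
  Ticker,"Price","Market Cap","Company"
  FCD,27.89,,"Focus Morningstar Consumer Defensive ETF"
  FCE-A,14.59,2496.64,"Forest City Enterprises Inc."
EOS
sample.close

table = CSV.parse(header_and_row(sample.path, "FCE-A"),
                  headers: true, header_converters: :symbol, converters: :all)
row = table[0]
puts row[:company]
puts row[:market_cap]
```

Only two lines ever reach the CSV parser, which is the same idea as the sed pre-filter.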

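The one-liner version below is even better than the earlier attempts. It is shorter and simpler, and, what I find especially appealing, it converts all the headers into unique names that are (almost) usable as variables, so I don't have to worry about parsing most of that out later: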

require 'csv'
csv_data = CSV.parse(IO.read(%`|sed -n "1p; /^#{$stock},/p" AllStocks.csv`), :headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all)  # options passed as keywords, not as a hash literal, so this also works on Ruby 3

The result:

csv_data.size
=> 1

csv_data.to_a
=> [[:ticker, :price, :market_cap, :average_volume, :analyst_recom, :relative_strength_index_14, :sector, :industry, :dividend_yield, :beta, :"52week_low", :"52week_high", :"50day_low", :"50day_high", :company, :"50day_simple_moving_average", :country, :pe, :forward_pe, :peg, :ps, :pb, :pcash, :pfree_cash_flow, :payout_ratio, :eps_ttm, :eps_growth_this_year, :eps_growth_next_year, :eps_growth_past_5_years, :eps_growth_next_5_years, :sales_growth_past_5_years, :eps_growth_quarter_over_quarter, :sales_growth_quarter_over_quarter, :shares_outstanding, :shares_float, :insider_ownership, :insider_transactions, :institutional_ownership, :institutional_transactions, :float_short, :short_ratio, :return_on_assets, :return_on_equity, :return_on_investment, :current_ratio, :quick_ratio, :lt_debtequity, :total_debtequity, :gross_margin, :operating_margin, :profit_margin, :performance_week, :performance_month, :performance_quarter, :performance_half_year, :performance_year, :performance_year, :average_true_range, :volatility_week, :volatility_month, :"20day_simple_moving_average", :"200day_simple_moving_average", :change_from_open, :gap, :relative_volume, :change, :volume, :earnings_date, :no], ["ANAD", 2.57, 175.2, 442.65, 2.9, 38.21, "Technology", "Semiconductor - Integrated Circuits", nil, 2.3, "33.85%", "-52.84%", "27.23%", "-20.19%", "Anadigics, Inc.", "-2.63%", "USA", nil, nil, nil, 1.15, 1.05, 3.08, nil, nil, -0.73, "-4002.36%", "57.90%", "-7.14%", "15.67%", "-1.69%", "-564.96%", "-39.37%", 68.17, 66.25, "3.87%", "-7.51%", "47.64%", "-6.24%", "3.30%", 4.95, "-23.34%", "-26.70%", "-26.33%", 4.89, 3.9, 0.0, 0.0, "20.35%", "-32.79%", "-32.27%", "-5.17%", "-7.22%", "21.23%", "-6.20%", "-50.39%", "17.35%", 0.16, "5.14%", "5.51%", "-13.04%", "-3.71%", "-4.10%", "0.00%", 0.95, "-4.10%", 421200, "2/22/2012 7:00:00 AM", 280]]


$company = csv_data[:company][0]
=> "Anadigics, Inc."

csv_data[:volatility_month]
=> ["5.51%"]
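
Starting from a one-row table like the one above, a hash with fv_-prefixed keys is a short step away. The sketch below uses a shortened two-line CSV string in place of the sed output; the names two_lines, fv, and holder are made up for the example. If real variables are wanted rather than a hash, instance variables can be set dynamically, whereas local variables created through eval are not visible to the code around the eval call.

```ruby
require "csv"

# Two-line CSV standing in for the sed output (header plus one matching row).
two_lines = <<~EOS
  Ticker,"Price","Market Cap","Volatility (Month)","Company"
  ANAD,2.57,175.2,5.51%,"Anadigics, Inc."
EOS

table = CSV.parse(two_lines, headers: true, header_converters: :symbol, converters: :all)
row = table[0]

# Hash keyed by fv_-prefixed names, e.g. "fv_market_cap".
fv = row.to_h.transform_keys { |key| "fv_#{key}" }
puts fv["fv_company"]

# Optional: expose the same values as instance variables on a holder object.
holder = Object.new
fv.each { |name, value| holder.instance_variable_set("@#{name}", value) }
puts holder.instance_variable_get(:@fv_price)

# Values with a trailing % stay strings; one way to convert them afterwards:
puts fv["fv_volatility_month"].delete("%").to_f
```

A plain hash is usually easier to work with when the field names are not known until the header has been read.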

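Why variables? Why not a hash of {"FCD" => someObject}? A hash would be fine, if not better, but it needs to be faster. Reading the CSV into Ruby failed. Populating the symbol table with variables can't be less efficient than a hash. Also, using global variables for anything, and especially for this, is not a good idea. I see in the bash solution that you're using Google Finance. Can Google Finance provide output in a format more efficient than CSV or XML? Perhaps a binary format? YAML? JSON? All of those are more efficient. Might be useful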
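
The {"FCD" => someObject} idea from the comments can be sketched as follows. It assumes the process is long-lived enough to parse the file once and then serve many lookups from memory, which may not match the use case in the question. The inline sample data and the name by_ticker are made up for the example.

```ruby
require "csv"

# Inline sample standing in for the full CSV file.
data = <<~EOS
  Ticker,"Price","Dividend Yield","Beta"
  FCD,27.89,3.13%,
  FCE-A,14.59,,2.83
EOS

# Parse once, then index every row by its ticker.
by_ticker = {}
CSV.parse(data, headers: true, header_converters: :symbol, converters: :all).each do |row|
  by_ticker[row[:ticker]] = row.to_h
end

puts by_ticker["FCD"][:dividend_yield]
puts by_ticker["FCE-A"][:beta]
```

Each later lookup is then a single hash access, with empty fields showing up as nil.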
