Java: reading data from an API

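I have written a function that reads some data from an external API. It calls that API once for each line while reading a file from disk. I want to optimize my code so it copes with large files (35,000 records). Can you give me a suggestion?

Here is my code: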

public void readCSVFile() {

    try {

        br = new BufferedReader(new FileReader(getFileName()));

        while ((line = br.readLine()) != null) {


            String[] splitLine = line.split(cvsSplitBy);

            String campaign = splitLine[0];
            String adGroup =  splitLine[1];
            String url = splitLine[2];              
            long searchCount = getSearchCount(url);             

            StringBuilder sb = new StringBuilder();
            sb.append(campaign + ",");
            sb.append(adGroup + ",");               
            sb.append(searchCount + ",");               
            writeToFile(sb, getNewFileName());

        }

    } catch (Exception e) {
        e.printStackTrace();
    }
}

private long getSearchCount(String url) {
    long recordCount = 0;
    try {

        DefaultHttpClient httpClient = new DefaultHttpClient();

        HttpGet getRequest = new HttpGet(
                "api.com/querysearch?q="
                        + url);
        getRequest.addHeader("accept", "application/json");

        HttpResponse response = httpClient.execute(getRequest);

        if (response.getStatusLine().getStatusCode() != 200) {
            throw new RuntimeException("Failed : HTTP error code : "
                    + response.getStatusLine().getStatusCode());
        }

        BufferedReader br = new BufferedReader(new InputStreamReader(
                (response.getEntity().getContent())));

        String output;

        while ((output = br.readLine()) != null) {
            try {

                JSONObject json = (JSONObject) new JSONParser()
                        .parse(output);
                JSONObject result = (JSONObject) json.get("result");
                recordCount = (long) result.get("count");
                System.out.println(url + "=" + recordCount);

            } catch (Exception e) {
                System.out.println(e.getMessage());
            }

        }

        httpClient.getConnectionManager().shutdown();

    } catch (Exception e) {
        e.printStackTrace();
    }
    return recordCount;

}

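Since remote calls are slower than local disk access, you will want to parallelize or batch the remote calls in some way. If you cannot make batched calls to the remote API, but it allows multiple concurrent reads, you could use something like a thread pool to make the remote calls: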

public void readCSVFile() {
    // exception handling ignored for space
    br = new BufferedReader(new FileReader(getFileName()));
    List<Future<String>> futures = new ArrayList<Future<String>>();
    ExecutorService pool = Executors.newFixedThreadPool(5);

    while ((line = br.readLine()) != null) {
        final String[] splitLine = line.split(cvsSplitBy);
        futures.add(pool.submit(new Callable<String>() {
            public String call() {
                long searchCount = getSearchCount(splitLine[2]);
                return new StringBuilder()
                    .append(splitLine[0]+ ",")
                    .append(splitLine[1]+ ",")
                    .append(searchCount + ",")
                    .toString();
            }
        }));
    }

    for (Future<String> fs: futures) {
        writeToFile(fs.get(), getNewFileName());
    }

    pool.shutdown();
}
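
A self-contained variant of the pool approach above may be easier to experiment with. This is only a sketch, assuming Java 8 or later. The class name `ParallelLookup`, the `process` method, and the hard-coded comma separator are illustrative and not part of the original code. The lookup is passed in as a parameter so that a stub can stand in for `getSearchCount` while testing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

class ParallelLookup {

    /**
     * Builds one output row per input line, running the lookup for the third
     * column on a fixed-size pool. Rows come back in input order because the
     * futures are read in the order they were submitted.
     */
    static List<String> process(List<String> lines, ToLongFunction<String> lookup, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String line : lines) {
                final String[] cols = line.split(","); // assumed separator
                futures.add(pool.submit(
                        () -> cols[0] + "," + cols[1] + "," + lookup.applyAsLong(cols[2])));
            }
            List<String> rows = new ArrayList<>();
            for (Future<String> f : futures) {
                rows.add(f.get()); // blocks until this row's lookup has finished
            }
            return rows;
        } finally {
            pool.shutdown(); // release the worker threads even if a lookup failed
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("c1,g1,http://a.example", "c2,g2,http://bb.example");
        // Stub lookup: the URL length stands in for the remote search count.
        for (String row : process(lines, String::length, 2)) {
            System.out.println(row);
        }
    }
}
```

In real use the stub would be replaced by the actual remote call, and the pool size tuned to whatever concurrency the API tolerates.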

Ideally, though, you would read from the remote API in a single batch, if that is possible.

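Your bottleneck is almost certainly the HTTP part, so that is what I would optimize. Perhaps avoid closing the connection after each call, or fetch results in bulk if possible.

Yes, that is the problem. The thing is, I have to call this API with a GET parameter that comes from the file.

Thanks for your suggestion. By the way, I can't do a single batch read, but multiple concurrent reads are allowed.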
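
The suggestion in the comments about not closing the connection could look roughly like the sketch below. It swaps in the JDK's own `java.net.http.HttpClient` (Java 11+) for the `DefaultHttpClient` used in the question, which is an assumption about the available runtime. The class name `SearchClient`, the helper names, and the `localhost` endpoint are placeholders. The point is that one client is shared by every call, so it can reuse connections instead of opening and shutting one down for each record.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SearchClient {

    // One shared client for the whole run; it can be used from several threads.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Appends the URL-encoded value to an endpoint that ends in "?q=" (placeholder endpoint). */
    static URI buildUri(String endpoint, String url) {
        return URI.create(endpoint + URLEncoder.encode(url, StandardCharsets.UTF_8));
    }

    /** Sends one GET request and returns the raw JSON body; parsing is left to the caller. */
    static String fetch(String endpoint, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(buildUri(endpoint, url))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new RuntimeException("Failed : HTTP error code : " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds the request URI; no network call is made here.
        System.out.println(
                buildUri("http://localhost:8080/querysearch?q=", "http://a.example/x?y=1"));
    }
}
```

Encoding the parameter also matters here, because the value taken from the file is itself a URL and may contain characters such as `?` and `&`.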