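Writing a large ResultSet to an Excel file using POI
This is along the same lines as "Writing a large ResultSet to a File", except that the file in question is an Excel file.
I'm using the Apache POI library to write an Excel file containing a large data set retrieved from a ResultSet object. The data could range from a few thousand records to about one million; I don't know how that translates into bytes on the file system in Excel format.
Below is some test code I wrote to check how long it takes to write such a large result set, and what the performance impact is on CPU and memory.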
    protected void writeResultsetToExcelFile(ResultSet rs, int numSheets, String fileNameAndPath) throws Exception {
        BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileNameAndPath));
        int numColumns = rs.getMetaData().getColumnCount();

        Workbook wb = ExcelFileUtil.createExcelWorkBook(true, numSheets);
        Row heading = wb.getSheetAt(0).createRow(1);

        ResultSetMetaData rsmd = rs.getMetaData();
        for(int x = 0; x < numColumns; x++) {
            Cell cell = heading.createCell(x+1);
            cell.setCellValue(rsmd.getColumnLabel(x+1));
        }

        int rowNumber = 2;
        int sheetNumber = 0;

        while(rs.next()) {
            if(rowNumber == 65001) {
                log("Sheet " + sheetNumber + "written; moving onto to sheet " + (sheetNumber + 1));
                sheetNumber++;
                rowNumber = 2;
            }
            Row row = wb.getSheetAt(sheetNumber).createRow(rowNumber);
            for(int y = 0; y < numColumns; y++) {
                row.createCell(y+1).setCellValue(rs.getString(y+1));
                wb.write(bos);
            }
            rowNumber++;
        }
        //wb.write(bos);
        bos.close();
    }
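I haven't had much luck with the code above. The file it creates grows rapidly (about 70 MB per second). I stopped the run after roughly 10 minutes (killing the JVM once the file reached 7 GB) and tried to open the file in Excel 2007. The moment I opened it, the file size dropped to 8 KB (!), and only the header and the first row had been created. I'm not sure what I'm missing here.
Any ideas?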
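Oh. I think you're writing the workbook out 944,000 times. Your wb.write(bos) call is inside the inner loop. I'm not sure that is consistent with the semantics of the Workbook class. From what I can tell from that class's Javadoc, the method writes the entire workbook out to the given output stream. So as the workbook grows, every row added so far gets written out again each time you add a cell.
That also explains why you're seeing exactly one row. The first workbook written to the file (with one row) is all that gets displayed, followed by 7 GB of junk.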
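To make the growth concrete, here is a small model of the effect. It deliberately does not use POI: FakeWorkbook is a hypothetical stand-in whose write() serializes every row added so far, which is how the answer above reads the Javadoc for Workbook.write. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a workbook: write() always emits every row added so far. */
final class FakeWorkbook {
    private final List<String> rows = new ArrayList<>();

    void addRow(String row) {
        rows.add(row);
    }

    void write(OutputStream out) throws IOException {
        for (String row : rows) {
            out.write((row + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }
}

final class WriteOnceDemo {
    /** Mirrors the question: the whole workbook is appended to the stream after each new row. */
    static int bytesWhenWritingInsideLoop(int rowCount) {
        FakeWorkbook wb = new FakeWorkbook();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < rowCount; i++) {
                wb.addRow("row");
                wb.write(out); // rewrites rows 0..i every time
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    /** The fix: build everything in memory, then write exactly once after the loop. */
    static int bytesWhenWritingOnce(int rowCount) {
        FakeWorkbook wb = new FakeWorkbook();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int i = 0; i < rowCount; i++) {
                wb.addRow("row");
            }
            wb.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }
}
```

In this model the output grows quadratically with the row count when write() sits inside the loop and linearly when it is called once. In the question's code, the equivalent change would be to move wb.write(bos) to where the commented-out call already is, just before bos.close().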
Use SXSSF (POI 3.8):
    package example;

    import java.io.FileInputStream;
    import java.io.FileOutputStream;

    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.util.CellReference;
    import org.apache.poi.xssf.streaming.SXSSFSheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    public class SXSSFexample {

        public static void main(String[] args) throws Throwable {
            FileInputStream inputStream = new FileInputStream("mytemplate.xlsx");
            XSSFWorkbook wb_template = new XSSFWorkbook(inputStream);
            inputStream.close();

            SXSSFWorkbook wb = new SXSSFWorkbook(wb_template);
            wb.setCompressTempFiles(true);

            SXSSFSheet sh = (SXSSFSheet) wb.getSheetAt(0);
            sh.setRandomAccessWindowSize(100);// keep 100 rows in memory, exceeding rows will be flushed to disk
            for(int rownum = 4; rownum < 100000; rownum++){
                Row row = sh.createRow(rownum);
                for(int cellnum = 0; cellnum < 10; cellnum++){
                    Cell cell = row.createCell(cellnum);
                    String address = new CellReference(cell).formatAsString();
                    cell.setCellValue(address);
                }
            }

            FileOutputStream out = new FileOutputStream("tempsxssf.xlsx");
            wb.write(out);
            out.close();
        }
    }
This requires:
- poi-ooxml-3.8.jar,
- poi-3.8.jar,
- poi-ooxml-schemas-3.8.jar,
- stax-api-1.0.1.jar,
- xml-apis-1.0.b2.jar,
- xmlbeans-2.3.0.jar,
- commons-codec-1.5.jar,
- dom4j-1.6.1.jar
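Unless you have to write formulas or formatting, you should consider writing out a .csv file instead. It is far simpler and far faster, and Excel opens it directly and can save it as .xls or .xlsx.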
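As a rough sketch of the CSV route, the helper below uses only the JDK. The class and method names (CsvSketch, escape, writeRow, writeResultSet, toCsv) are hypothetical, and the quoting follows the common RFC 4180 convention, which is an assumption about what your consumers expect rather than something Excel mandates.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CsvSketch {
    /** Quotes a field containing a comma, quote, CR or LF; embedded quotes are doubled. */
    static String escape(String field) {
        if (field == null) {
            return ""; // SQL NULL becomes an empty field
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Writes one comma-separated record terminated by CRLF. */
    static void writeRow(Writer out, List<String> fields) throws IOException {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields.get(i)));
        }
        out.write("\r\n");
    }

    /** Streams a header row plus every record; only one row is held in memory at a time. */
    static void writeResultSet(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData md = rs.getMetaData();
        int columns = md.getColumnCount();
        List<String> fields = new ArrayList<>(columns);
        for (int c = 1; c <= columns; c++) {
            fields.add(md.getColumnLabel(c));
        }
        writeRow(out, fields);
        while (rs.next()) {
            fields.clear();
            for (int c = 1; c <= columns; c++) {
                fields.add(rs.getString(c));
            }
            writeRow(out, fields);
        }
    }

    /** Convenience for small in-memory data, mainly useful for trying the escaping out. */
    static String toCsv(List<List<String>> rows) {
        StringWriter sw = new StringWriter();
        try {
            for (List<String> row : rows) {
                writeRow(sw, row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

For a real export you would pass writeResultSet a BufferedWriter over a FileOutputStream with an explicit charset, so memory use stays flat regardless of the number of rows.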
You can use the SXSSFWorkbook implementation of Workbook, and if you use styles in your Excel file, you can improve performance by caching them with the Flyweight pattern.
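A minimal sketch of such a cache, written generically so it runs without POI on the classpath. StyleCache is a hypothetical name; with POI, the key might be a data-format string and the factory would typically call workbook.createCellStyle() and configure the result once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Flyweight cache: one shared style instance per distinct key, created on first request. */
final class StyleCache<K, S> {
    private final Map<K, S> cache = new HashMap<>();
    private final Function<K, S> factory;
    private int created = 0;

    StyleCache(Function<K, S> factory) {
        this.factory = factory;
    }

    /** Returns the cached instance for the key, building it only if it is missing. */
    S get(K key) {
        S style = cache.get(key);
        if (style == null) {
            style = factory.apply(key);
            cache.put(key, style);
            created++;
        }
        return style;
    }

    /** Number of instances actually built, handy for checking that reuse is happening. */
    int createdCount() {
        return created;
    }
}
```

The point of sharing is that each cell then references an existing style instead of creating a new one per cell, which keeps the number of styles in the workbook small no matter how many rows are written.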
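For now I've taken @Gian's advice: I limit the number of records per Workbook to 500k and roll the rest over into the next Workbook. It seems to work decently. With the above configuration, each workbook took me about 10 minutes.
I updated BigGridDemo to support multiple sheets.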
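The rollover bookkeeping is simple arithmetic. The sketch below uses the 500k limit mentioned above; the class and method names are hypothetical, and record indexes are assumed to be 0-based.

```java
final class RolloverSketch {
    /** Records per workbook, taken from the limit described in the text. */
    static final int RECORDS_PER_WORKBOOK = 500_000;

    /** Which workbook (0-based) a given 0-based record lands in. */
    static int workbookIndex(long recordIndex) {
        return (int) (recordIndex / RECORDS_PER_WORKBOOK);
    }

    /** Position of the record inside its workbook (0-based, header row not counted). */
    static int rowInWorkbook(long recordIndex) {
        return (int) (recordIndex % RECORDS_PER_WORKBOOK);
    }

    /** Number of workbooks needed for a result set of the given size (ceiling division). */
    static int workbooksNeeded(long totalRecords) {
        return (int) ((totalRecords + RECORDS_PER_WORKBOOK - 1) / RECORDS_PER_WORKBOOK);
    }
}
```

In the write loop, a change in workbookIndex between consecutive records is the signal to write and close the current workbook and open the next one.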
BigExcelWriterImpl.java
    package com.gdais.common.apache.poi.bigexcelwriter;

    import static com.google.common.base.Preconditions.*;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.util.Enumeration;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import java.util.zip.ZipOutputStream;

    import javax.annotation.Nonnull;
    import javax.annotation.Nullable;

    import org.apache.commons.io.FilenameUtils;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.xssf.usermodel.XSSFSheet;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    import com.google.common.base.Function;
    import com.google.common.collect.ImmutableList;
    import com.google.common.collect.Iterables;

    public class BigExcelWriterImpl implements BigExcelWriter {

        private static final String XML_ENCODING = "UTF-8";

        @Nonnull
        private final File outputFile;

        @Nullable
        private final File tempFileOutputDir;

        @Nullable
        private File templateFile = null;

        @Nullable
        private XSSFWorkbook workbook = null;

        @Nonnull
        private LinkedHashMap<String, XSSFSheet> addedSheets = new LinkedHashMap<String, XSSFSheet>();

        @Nonnull
        private Map<XSSFSheet, File> sheetTempFiles = new HashMap<XSSFSheet, File>();

        BigExcelWriterImpl(@Nonnull File outputFile) {
            this.outputFile = outputFile;
            this.tempFileOutputDir = outputFile.getParentFile();
        }

        @Override
        public BigExcelWriter createWorkbook() {
            workbook = new XSSFWorkbook();
            return this;
        }

        @Override
        public BigExcelWriter addSheets(String... sheetNames) {
            checkState(workbook != null, "workbook must be created before adding sheets");

            for (String sheetName : sheetNames) {
                XSSFSheet sheet = workbook.createSheet(sheetName);
                addedSheets.put(sheetName, sheet);
            }

            return this;
        }

        @Override
        public BigExcelWriter writeWorkbookTemplate() throws IOException {
            checkState(workbook != null, "workbook must be created before writing template");
            checkState(templateFile == null, "template file already written");

            templateFile = File.createTempFile(FilenameUtils.removeExtension(outputFile.getName())
                    + "-template", ".xlsx", tempFileOutputDir);
            System.out.println(templateFile);
            FileOutputStream os = new FileOutputStream(templateFile);
            workbook.write(os);
            os.close();

            return this;
        }

        @Override
        public SpreadsheetWriter createSpreadsheetWriter(String sheetName) throws IOException {
            if (!addedSheets.containsKey(sheetName)) {
                addSheets(sheetName);
            }

            return createSpreadsheetWriter(addedSheets.get(sheetName));
        }

        @Override
        public SpreadsheetWriter createSpreadsheetWriter(XSSFSheet sheet) throws IOException {
            checkState(!sheetTempFiles.containsKey(sheet), "writer already created for this sheet");

            File tempSheetFile = File.createTempFile(
                    FilenameUtils.removeExtension(outputFile.getName())
                            + "-sheet" + sheet.getSheetName(), ".xml", tempFileOutputDir);

            Writer out = null;
            try {
                out = new OutputStreamWriter(new FileOutputStream(tempSheetFile), XML_ENCODING);
                SpreadsheetWriter sw = new SpreadsheetWriterImpl(out);

                sheetTempFiles.put(sheet, tempSheetFile);
                return sw;
            } catch (RuntimeException e) {
                if (out != null) {
                    out.close();
                }
                throw e;
            }
        }

        private static Function<XSSFSheet, String> getSheetName = new Function<XSSFSheet, String>() {

            @Override
            public String apply(XSSFSheet sheet) {
                return sheet.getPackagePart().getPartName().getName().substring(1);
            }
        };

        @Override
        public File completeWorkbook() throws IOException {
            FileOutputStream out = null;
            try {
                out = new FileOutputStream(outputFile);
                ZipOutputStream zos = new ZipOutputStream(out);

                Iterable<String> sheetEntries = Iterables.transform(sheetTempFiles.keySet(),
                        getSheetName);
                System.out.println("Sheet Entries: " + sheetEntries);
                copyTemplateMinusEntries(templateFile, zos, sheetEntries);

                for (Map.Entry<XSSFSheet, File> entry : sheetTempFiles.entrySet()) {
                    XSSFSheet sheet = entry.getKey();
                    substituteSheet(entry.getValue(), getSheetName.apply(sheet), zos);
                }
                zos.close();
                out.close();

                return outputFile;
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }

        private static void copyTemplateMinusEntries(File templateFile,
                ZipOutputStream zos, Iterable<String> entries) throws IOException {

            ZipFile templateZip = new ZipFile(templateFile);

            @SuppressWarnings("unchecked")
            Enumeration<ZipEntry> en = (Enumeration<ZipEntry>) templateZip.entries();
            while (en.hasMoreElements()) {
                ZipEntry ze = en.nextElement();
                if (!Iterables.contains(entries, ze.getName())) {
                    System.out.println("Adding template entry: " + ze.getName());
                    zos.putNextEntry(new ZipEntry(ze.getName()));
                    InputStream is = templateZip.getInputStream(ze);
                    copyStream(is, zos);
                    is.close();
                }
            }
        }

        private static void substituteSheet(File tmpfile, String entry, ZipOutputStream zos)
                throws IOException {
            System.out.println("Adding sheet entry: " + entry);
            zos.putNextEntry(new ZipEntry(entry));
            InputStream is = new FileInputStream(tmpfile);
            copyStream(is, zos);
            is.close();
        }

        private static void copyStream(InputStream in, OutputStream out) throws IOException {
            byte[] chunk = new byte[1024];
            int count;
            while ((count = in.read(chunk)) >= 0) {
                out.write(chunk, 0, count);
            }
        }

        @Override
        public Workbook getWorkbook() {
            return workbook;
        }

        @Override
        public ImmutableList<XSSFSheet> getSheets() {
            return ImmutableList.copyOf(addedSheets.values());
        }

    }
SpreadsheetWriterImpl.java
    package com.gdais.common.apache.poi.bigexcelwriter;

    import java.io.IOException;
    import java.io.Writer;
    import java.util.Calendar;

    import org.apache.poi.ss.usermodel.DateUtil;
    import org.apache.poi.ss.util.CellReference;

    class SpreadsheetWriterImpl implements SpreadsheetWriter {

        private static final String XML_ENCODING = "UTF-8";

        private final Writer _out;
        private int _rownum;

        SpreadsheetWriterImpl(Writer out) {
            _out = out;
        }

        @Override
        public SpreadsheetWriter closeFile() throws IOException {
            _out.close();
            return this;
        }

        @Override
        public SpreadsheetWriter beginSheet() throws IOException {
            _out.write("<?xml version=\"1.0\" encoding=\"" + XML_ENCODING + "\"?>"
                    + "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">");
            _out.write("<sheetData>\n");
            return this;
        }

        @Override
        public SpreadsheetWriter endSheet() throws IOException {
            _out.write("</sheetData>");
            _out.write("</worksheet>");
            closeFile();
            return this;
        }

        /**
         * Insert a new row
         *
         * @param rownum
         *            0-based row number
         */
        @Override
        public SpreadsheetWriter insertRow(int rownum) throws IOException {
            _out.write("<row r=\"" + (rownum + 1) + "\">\n");
            this._rownum = rownum;
            return this;
        }

        /**
         * Insert row end marker
         */
        @Override
        public SpreadsheetWriter endRow() throws IOException {
            _out.write("</row>\n");
            return this;
        }

        @Override
        public SpreadsheetWriter createCell(int columnIndex, String value, int styleIndex)
                throws IOException {
            String ref = new CellReference(_rownum, columnIndex).formatAsString();
            // The element is <c> with an r (reference) attribute, so the space is required.
            _out.write("<c r=\"" + ref + "\" t=\"inlineStr\"");
            if (styleIndex != -1) {
                _out.write(" s=\"" + styleIndex + "\"");
            }
            _out.write(">");
            _out.write("<is><t>" + value + "</t></is>");
            _out.write("</c>");
            return this;
        }

        @Override
        public SpreadsheetWriter createCell(int columnIndex, String value) throws IOException {
            createCell(columnIndex, value, -1);
            return this;
        }

        @Override
        public SpreadsheetWriter createCell(int columnIndex, double value, int styleIndex)
                throws IOException {
            String ref = new CellReference(_rownum, columnIndex).formatAsString();
            // Same fix as above: "<c r=", not "<cr=".
            _out.write("<c r=\"" + ref + "\" t=\"n\"");
            if (styleIndex != -1) {
                _out.write(" s=\"" + styleIndex + "\"");
            }
            _out.write(">");
            _out.write("<v>" + value + "</v>");
            _out.write("</c>");
            return this;
        }

        @Override
        public SpreadsheetWriter createCell(int columnIndex, double value) throws IOException {
            createCell(columnIndex, value, -1);
            return this;
        }

        @Override
        public SpreadsheetWriter createCell(int columnIndex, Calendar value, int styleIndex)
                throws IOException {
            createCell(columnIndex, DateUtil.getExcelDate(value, false), styleIndex);
            return this;
        }

        @Override
        public SpreadsheetWriter createCell(int columnIndex, Calendar value) throws IOException {
            createCell(columnIndex, value, -1);
            return this;
        }
    }
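One thing to watch in the string variant of createCell above: the value is concatenated into the <t> element as-is, so data containing & or < would produce malformed sheet XML. Here is a minimal escaping sketch using only the JDK. The names XmlEscapeSketch, escapeXml and inlineStringCell are hypothetical and not part of BigGridDemo or POI, and dropping disallowed control characters is one possible policy, not the only one.

```java
final class XmlEscapeSketch {
    /** Escapes XML markup characters and drops control characters XML 1.0 does not allow. */
    static String escapeXml(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                default:
                    boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
                    if (c >= 0x20 || allowedWhitespace) {
                        sb.append(c);
                    }
                    // other control characters are silently skipped
            }
        }
        return sb.toString();
    }

    /** Builds the same inline-string cell markup as createCell, with the value escaped. */
    static String inlineStringCell(String ref, String value) {
        return "<c r=\"" + ref + "\" t=\"inlineStr\"><is><t>" + escapeXml(value) + "</t></is></c>";
    }
}
```

In SpreadsheetWriterImpl, the equivalent change would be to pass value through such a helper before writing it between the <t> tags.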