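How do I download and save a file from the Internet using Java?
There is an online file (such as http://www.example.com/information.asp) that I need to grab and save to a directory. I know there are several ways to grab and read online files (URLs) line by line, but is there a way to simply download and save the file using Java?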
Give Java NIO a try:

    // Needs java.net.URL, java.io.FileOutputStream,
    // java.nio.channels.Channels and java.nio.channels.ReadableByteChannel
    URL website = new URL("http://www.website.com/information.asp");
    ReadableByteChannel rbc = Channels.newChannel(website.openStream());
    FileOutputStream fos = new FileOutputStream("information.html");
    fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
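Using transferFrom() is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.
The FileChannel Javadoc has more details.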
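Note: the third parameter of transferFrom is the maximum number of bytes to transfer. Integer.MAX_VALUE transfers at most 2^31 - 1 bytes; Long.MAX_VALUE allows at most 2^63 - 1 bytes (larger than any file in existence).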
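The Javadoc for transferFrom does not promise that a single call moves every byte, so a more cautious variant calls it in a loop. The following is only a sketch under assumptions: the class name `ChannelDownload`, the method name `download` and the 16 MiB chunk size are made up for illustration, and it assumes a blocking source channel, for which a return value of 0 is treated as end of stream.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelDownload {
    // Hypothetical chunk size; any positive value works.
    private static final long CHUNK = 1L << 24; // 16 MiB

    /** Copies the resource at url into target and returns the number of bytes written. */
    static long download(URL url, Path target) throws IOException {
        try (ReadableByteChannel in = Channels.newChannel(url.openStream());
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long transferred;
            // Assumption: the source is blocking, so 0 bytes transferred means it is exhausted.
            while ((transferred = out.transferFrom(in, position, CHUNK)) > 0) {
                position += transferred;
            }
            return position;
        }
    }
}
```

Both channels are opened in a try-with-resources statement, so they are closed even when the transfer fails part-way.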
Use Apache Commons IO; it takes just one line of code:
FileUtils.copyURLToFile(URL, File)
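Simpler NIO usage:

    URL website = new URL("http://www.website.com/information.asp");
    // Example destination; the original snippet left "target" undefined
    Path target = Paths.get("information.html");
    try (InputStream in = website.openStream()) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }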
    public void saveUrl(final String filename, final String urlString)
            throws MalformedURLException, IOException {
        BufferedInputStream in = null;
        FileOutputStream fout = null;
        try {
            in = new BufferedInputStream(new URL(urlString).openStream());
            fout = new FileOutputStream(filename);

            // Copy the stream in 1 KB chunks until end of stream (-1)
            final byte data[] = new byte[1024];
            int count;
            while ((count = in.read(data, 0, 1024)) != -1) {
                fout.write(data, 0, count);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (fout != null) {
                fout.close();
            }
        }
    }

You will need to handle exceptions, probably outside this method.
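Downloading a file requires you to read it; either way, you will have to go through the file in some fashion. Instead of reading it line by line, you can read it by bytes from the stream:

    // "out" is assumed to be an already-open OutputStream for the destination file
    BufferedInputStream in = new BufferedInputStream(
            new URL("http://www.website.com/information.asp").openStream());
    byte data[] = new byte[1024];
    int count;
    while ((count = in.read(data, 0, 1024)) != -1) {
        out.write(data, 0, count);
    }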
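On Java 9 and later, the byte-copy loop described above is available as InputStream.transferTo(OutputStream). A minimal sketch, assuming the hypothetical names `StreamDownload` and `download` (neither appears in the answers here):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamDownload {
    /** Copies every byte from the URL's stream to target and returns the byte count. */
    static long download(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream();
             // Default options: create the file, or truncate it if it already exists
             OutputStream out = Files.newOutputStream(target)) {
            return in.transferTo(out); // Java 9+; reads and writes until end of stream
        }
    }
}
```

It would be called as `StreamDownload.download(new URL("http://www.website.com/information.asp"), Paths.get("information.html"))`.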
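Using Java 7+
Use the following method to download a file from the Internet and save it to some directory:

    private static Path download(String sourceURL, String targetDirectory) throws IOException {
        URL url = new URL(sourceURL);
        // Everything after the last '/' is used as the local file name
        String fileName = sourceURL.substring(sourceURL.lastIndexOf('/') + 1);
        Path targetPath = new File(targetDirectory + File.separator + fileName).toPath();
        // try-with-resources closes the stream once the copy finishes
        try (InputStream in = url.openStream()) {
            Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }
        return targetPath;
    }

See the Javadoc for Files.copy for details.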
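Taking everything after the last '/' as the file name also picks up any query string or fragment, and yields an empty name for URLs that end in a slash. One possible way around that is sketched below; `UrlFileNames`, `fileNameOf` and the `fallback` parameter are hypothetical names introduced only for this illustration.

```java
import java.net.URI;

final class UrlFileNames {
    /**
     * Returns the last path segment of sourceUrl, ignoring query and fragment,
     * or fallback when the URL has no usable file name.
     */
    static String fileNameOf(String sourceUrl, String fallback) {
        // URI.getPath() excludes "?query" and "#fragment"; it is null for opaque URIs
        String path = URI.create(sourceUrl).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return fallback;
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Note that URI.create throws IllegalArgumentException for strings that are not valid URIs (for example, ones containing unescaped spaces), so callers may want to handle that case.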
This answer is almost exactly like the selected answer, but with two enhancements: it is a method, and it closes the FileOutputStream object:

    public static void downloadFileFromURL(String urlString, File destination) {
        try {
            URL website = new URL(urlString);
            ReadableByteChannel rbc;
            rbc = Channels.newChannel(website.openStream());
            FileOutputStream fos = new FileOutputStream(destination);
            fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
            fos.close();
            rbc.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    import java.io.*;
    import java.net.*;

    public class filedown {

        // Downloads the resource at address into localFileName
        public static void download(String address, String localFileName) {
            OutputStream out = null;
            URLConnection conn = null;
            InputStream in = null;

            try {
                URL url = new URL(address);
                out = new BufferedOutputStream(new FileOutputStream(localFileName));
                conn = url.openConnection();
                in = conn.getInputStream();
                byte[] buffer = new byte[1024];

                int numRead;
                long numWritten = 0;

                while ((numRead = in.read(buffer)) != -1) {
                    out.write(buffer, 0, numRead);
                    numWritten += numRead;
                }

                System.out.println(localFileName + "\t" + numWritten);
            } catch (Exception exception) {
                exception.printStackTrace();
            } finally {
                try {
                    if (in != null) {
                        in.close();
                    }
                    if (out != null) {
                        out.close();
                    }
                } catch (IOException ioe) {
                    // errors while closing are ignored
                }
            }
        }

        // Uses the part of the address after the last '/' as the local file name
        public static void download(String address) {
            int lastSlashIndex = address.lastIndexOf('/');
            if (lastSlashIndex >= 0 && lastSlashIndex < address.length() - 1) {
                download(address, address.substring(lastSlashIndex + 1));
            } else {
                System.err.println("Could not figure out local file name for " + address);
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++) {
                download(args[i]);
            }
        }
    }
Personally, I have found Apache's HttpClient to be more than capable of everything I have needed to do in this regard. Its tutorial is a good place to start.
Here is another Java 7 variant, based on Brian Risk's answer, that uses a try-with-resources statement:

    public static void downloadFileFromURL(String urlString, File destination) throws IOException {
        URL website = new URL(urlString);
        // Both resources are closed automatically, even if the transfer fails
        try (ReadableByteChannel rbc = Channels.newChannel(website.openStream());
             FileOutputStream fos = new FileOutputStream(destination)) {
            fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }
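There are many elegant and efficient answers here, but conciseness can make us lose some useful information. In particular, one often does not want to treat a connection error as an exception, and one might want to handle some kinds of network-related errors differently, for example to decide whether the download should be retried.
Here is a method that does not throw exceptions for network errors (only for truly exceptional problems, such as a malformed URL or problems writing to the file):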
    /**
     * Downloads from a (http/https) URL and saves to a file.
     * Does not consider a connection error an Exception. Instead it returns:
     *
     *    0=ok
     *    1=connection interrupted, timeout (but something was read)
     *    2=not found (FileNotFoundException) (404)
     *    3=server error (500...)
     *    4=could not connect: connection timeout (no internet?) java.net.SocketTimeoutException
     *    5=could not connect: (server down?) java.net.ConnectException
     *    6=could not resolve host (bad host, or no internet - no dns)
     *
     * @param file File to write. Parent directory will be created if necessary
     * @param url  http/https url to connect
     * @param secsConnectTimeout Seconds to wait for connection establishment
     * @param secsReadTimeout Read timeout in seconds - transmission will abort if it freezes more than this
     * @return See above
     * @throws IOException Only if URL is malformed or if could not create the file
     */
    public static int saveUrl(final Path file, final URL url,
            int secsConnectTimeout, int secsReadTimeout) throws IOException {
        Files.createDirectories(file.getParent()); // make sure parent dir exists, this can throw exception
        URLConnection conn = url.openConnection(); // can throw exception if bad url
        if (secsConnectTimeout > 0) conn.setConnectTimeout(secsConnectTimeout * 1000);
        if (secsReadTimeout > 0) conn.setReadTimeout(secsReadTimeout * 1000);
        int ret = 0;
        boolean somethingRead = false;
        try (InputStream is = conn.getInputStream()) {
            try (BufferedInputStream in = new BufferedInputStream(is);
                 OutputStream fout = Files.newOutputStream(file)) {
                final byte data[] = new byte[8192];
                int count;
                while ((count = in.read(data)) > 0) {
                    somethingRead = true;
                    fout.write(data, 0, count);
                }
            }
        } catch (java.io.IOException e) {
            int httpcode = 999;
            try {
                httpcode = ((HttpURLConnection) conn).getResponseCode();
            } catch (Exception ee) {}
            if (somethingRead && e instanceof java.net.SocketTimeoutException) ret = 1;
            else if (e instanceof FileNotFoundException && httpcode >= 400 && httpcode < 500) ret = 2;
            else if (httpcode >= 400 && httpcode < 600) ret = 3;
            else if (e instanceof java.net.SocketTimeoutException) ret = 4;
            else if (e instanceof java.net.ConnectException) ret = 5;
            else if (e instanceof java.net.UnknownHostException) ret = 6;
            else throw e;
        }
        return ret;
    }
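Returning a code instead of throwing makes the retry decision mentioned above easy to write. The following is a sketch only: `RetryingDownload`, `Attempt`, `isRetryable` and `withRetries` are hypothetical names, and treating codes 1, 4 and 5 as retryable is an assumption for illustration, not something the answer prescribes.

```java
import java.io.IOException;

final class RetryingDownload {
    /** One download attempt that reports its outcome as a saveUrl-style code. */
    interface Attempt {
        int run() throws IOException;
    }

    /** Assumption: codes 1, 4 and 5 (timeouts, refused connection) are worth retrying. */
    static boolean isRetryable(int code) {
        return code == 1 || code == 4 || code == 5;
    }

    /**
     * Runs the attempt at least once and at most maxAttempts times, doubling the
     * pause between tries, and returns the code of the last attempt.
     */
    static int withRetries(Attempt attempt, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        int code = attempt.run();
        long delay = initialDelayMillis;
        for (int tries = 1; tries < maxAttempts && isRetryable(code); tries++) {
            Thread.sleep(delay);
            delay *= 2; // simple exponential backoff
            code = attempt.run();
        }
        return code;
    }
}
```

With the saveUrl method above it could be used as `RetryingDownload.withRetries(() -> saveUrl(file, url, 10, 30), 3, 1000L)`.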
There is a problem with the simple usage of:
org.apache.commons.io.FileUtils.copyURLToFile(URL, File)
if you need to download and save very big files or, more generally, if you need automatic retries in case the connection is dropped.
In such cases I suggest using Apache HttpClient together with org.apache.commons.io.FileUtils. For example:

    // "client" is assumed to be an existing org.apache.commons.httpclient.HttpClient (3.x API)
    GetMethod method = new GetMethod(resource_url);
    try {
        int statusCode = client.executeMethod(method);
        if (statusCode != HttpStatus.SC_OK) {
            logger.error("Get method failed: " + method.getStatusLine());
        } else {
            // Only save the body when the request succeeded
            org.apache.commons.io.FileUtils.copyInputStreamToFile(
                    method.getResponseBodyAsStream(), new File(resource_file));
        }
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        method.releaseConnection();
    }
It is possible to download the file with Apache's HttpComponents instead of Commons IO alone. This code lets you download a file in Java from its URL and save it at a specific destination.

    public static boolean saveFile(URL fileURL, String fileSavePath) {
        boolean isSucceed = true;

        CloseableHttpClient httpClient = HttpClients.createDefault();

        HttpGet httpGet = new HttpGet(fileURL.toString());
        httpGet.addHeader("User-Agent",
                "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0");
        httpGet.addHeader("Referer", "https://www.google.com");

        // try-with-resources closes the response once the body has been saved
        try (CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            HttpEntity fileEntity = httpResponse.getEntity();

            if (fileEntity != null) {
                FileUtils.copyInputStreamToFile(fileEntity.getContent(), new File(fileSavePath));
            }
        } catch (IOException e) {
            isSucceed = false;
        } finally {
            httpGet.releaseConnection();
        }

        return isSucceed;
    }
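In contrast to the single line of code:

    FileUtils.copyURLToFile(fileURL, new File(fileSavePath), URLS_FETCH_TIMEOUT, URLS_FETCH_TIMEOUT);

this code gives you more control over the process and lets you specify not only timeouts but also the User-Agent and Referer values, which are critical for many websites.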
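For comparison, timeouts and the same two headers can also be set without any external library through java.net.URLConnection. This is a different technique from the HttpComponents code above, shown only as a sketch; `HeaderDownload`, `download` and its parameters are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class HeaderDownload {
    /** Downloads url to target with the given headers and timeout; returns bytes copied. */
    static long download(URL url, Path target, String userAgent, String referer,
                         int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection(); // does not connect yet
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        // Request properties must be set before the connection is opened
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Referer", referer);
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

HttpComponents still offers more (connection pooling, fine-grained redirect and retry policies), so this sketch only covers the timeout and header part of the argument.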
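To summarize (and somewhat polish and update) the previous answers: the three following methods are practically equivalent. (I added explicit timeouts because I think they are a must; nobody wants a download to freeze forever when the connection is lost.)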
    // Variant 1: buffered stream copy with a manual read/write loop
    public static void saveUrl1(final Path file, final URL url,
            int secsConnectTimeout, int secsReadTimeout)
            throws MalformedURLException, IOException {
        // Files.createDirectories(file.getParent()); // optional, make sure parent dir exists
        try (BufferedInputStream in = new BufferedInputStream(
                     streamFromUrl(url, secsConnectTimeout, secsReadTimeout));
             OutputStream fout = Files.newOutputStream(file)) {
            final byte data[] = new byte[8192];
            int count;
            while ((count = in.read(data)) > 0)
                fout.write(data, 0, count);
        }
    }

    // Variant 2: NIO channels with transferFrom
    public static void saveUrl2(final Path file, final URL url,
            int secsConnectTimeout, int secsReadTimeout)
            throws MalformedURLException, IOException {
        // Files.createDirectories(file.getParent()); // optional, make sure parent dir exists
        try (ReadableByteChannel rbc = Channels.newChannel(
                     streamFromUrl(url, secsConnectTimeout, secsReadTimeout));
             FileChannel channel = FileChannel.open(file,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.WRITE)) {
            channel.transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    // Variant 3: Files.copy
    public static void saveUrl3(final Path file, final URL url,
            int secsConnectTimeout, int secsReadTimeout)
            throws MalformedURLException, IOException {
        // Files.createDirectories(file.getParent()); // optional, make sure parent dir exists
        try (InputStream in = streamFromUrl(url, secsConnectTimeout, secsReadTimeout)) {
            Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Opens the URL with optional connect/read timeouts (in seconds; 0 or less means none set)
    public static InputStream streamFromUrl(URL url, int secsConnectTimeout, int secsReadTimeout)
            throws IOException {
        URLConnection conn = url.openConnection();
        if (secsConnectTimeout > 0) conn.setConnectTimeout(secsConnectTimeout * 1000);
        if (secsReadTimeout > 0) conn.setReadTimeout(secsReadTimeout * 1000);
        return conn.getInputStream();
    }
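I did not find significant differences; all three seem right to me. They are safe and efficient. (Differences in speed seem hardly relevant: writing 180 MB from a local server to an SSD took between roughly 1.2 and 1.5 seconds in my runs.) They do not require external libraries. All of them work with arbitrary sizes and (in my experience) with HTTP redirects.
Additionally, FileNotFoundException is thrown if the resource is not found (typically error 404), and java.net.UnknownHostException is thrown if DNS resolution fails; other IOExceptions correspond to errors during transmission.
(Marked as community wiki; feel free to add information or corrections.)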
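The exception behaviour described above can be turned into an explicit result by catching the specific exception types before the general IOException. A minimal sketch, assuming the hypothetical names `DownloadOutcome`, `Result` and `tryDownload`:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.UnknownHostException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DownloadOutcome {
    enum Result { OK, NOT_FOUND, UNKNOWN_HOST, IO_ERROR }

    /** Downloads url to target and reports what happened instead of throwing. */
    static Result tryDownload(URL url, Path target) {
        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return Result.OK;
        } catch (FileNotFoundException e) {
            return Result.NOT_FOUND;      // resource missing (typically HTTP 404)
        } catch (UnknownHostException e) {
            return Result.UNKNOWN_HOST;   // DNS resolution failed
        } catch (IOException e) {
            return Result.IO_ERROR;       // any other I/O failure, including writing the target
        }
    }
}
```

The order of the catch clauses matters: both specific exceptions are subclasses of IOException, so they have to come first.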
    public class DownloadManager {

        static String urls = "[WEBSITE NAME]";

        public static void main(String[] args) throws IOException {
            URL url = verify(urls);
            if (url == null) {
                // verify() returns null for anything that is not a valid http:// URL
                System.err.println("[SYSTEM/ERROR]: Invalid URL: " + urls);
                return;
            }

            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            InputStream in = null;
            String filename = url.getFile();
            filename = filename.substring(filename.lastIndexOf('/') + 1);
            FileOutputStream out = new FileOutputStream(
                    "C:\\Java2_programiranje/Network/DownloadTest1/Project/Output"
                    + File.separator + filename);
            in = connection.getInputStream();

            // Copy the response body in 4 KB chunks
            int read = -1;
            byte[] buffer = new byte[4096];
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                System.out.println("[SYSTEM/INFO]: Downloading file...");
            }
            in.close();
            out.close();
            System.out.println("[SYSTEM/INFO]: File Downloaded!");
        }

        private static URL verify(String url) {
            if (!url.toLowerCase().startsWith("http://")) {
                return null;
            }
            URL verifyUrl = null;

            try {
                verifyUrl = new URL(url);
            } catch (Exception e) {
                e.printStackTrace();
            }
            return verifyUrl;
        }
    }
You can do this in one line using netloader for Java:
new NetFile(new File("my/zips/1.zip"), "https://example.com/example.zip", -1).load(); //returns true if succeed, otherwise false.
The underscore-lodash library has a method $.fetch().
In pom.xml:

    <dependency>
        <groupId>com.github.javadev</groupId>
        <artifactId>underscore-lodash</artifactId>
        <version>1.23</version>
    </dependency>

Code example:

    import com.github.underscore.lodash.$;

    public class Download {
        public static void main(String ... args) {
            String text = $.fetch("https://stackoverflow.com/questions"
                + "/921262/how-to-download-and-save-a-file-from-internet-using-java").text();
        }
    }