How do I download and save a file from the Internet using Java?

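There is an online file (such as http://www.example.com/information.asp) that I need to grab and save to a directory. I know there are several ways to grab and read online files (URLs) line by line, but is there a way to simply download and save the file using Java?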

Give Java NIO a try:

    URL website = new URL("http://www.website.com/information.asp");
    ReadableByteChannel rbc = Channels.newChannel(website.openStream());
    FileOutputStream fos = new FileOutputStream("information.html");
    fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);

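Using transferFrom() is potentially much more efficient than a simple loop that reads from the source channel and writes to this channel. Many operating systems can transfer bytes directly from the source channel into the filesystem cache without actually copying them.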

Read more about it in the transferFrom() documentation.

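Note: the third parameter of transferFrom is the maximum number of bytes to transfer. Integer.MAX_VALUE transfers at most about 2^31 bytes; Long.MAX_VALUE allows at most about 2^63 bytes (larger than any file in existence).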
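As a rough sketch of the same NIO idea with the resources closed automatically: the class name NioDownload, the method name download, and the 16 MiB chunk size are assumptions made for illustration, not part of the answer above. Because transferFrom() is allowed to move fewer bytes than requested, the sketch calls it in a loop until nothing more arrives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioDownload {
    // Upper bound per transferFrom() call; an arbitrary choice for this sketch.
    private static final long CHUNK = 1L << 24; // 16 MiB

    /** Copies the content behind url into target and returns the number of bytes written. */
    public static long download(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream();
             ReadableByteChannel source = Channels.newChannel(in);
             FileChannel sink = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.WRITE)) {
            long position = 0;
            long transferred;
            // transferFrom() may move fewer bytes than requested, so keep calling
            // it until it reports that nothing more was transferred.
            while ((transferred = sink.transferFrom(source, position, CHUNK)) > 0) {
                position += transferred;
            }
            return position;
        }
    }
}
```

A call would look like NioDownload.download(new URL("http://www.website.com/information.asp"), Paths.get("information.html")).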

Use Apache Commons IO; it takes just one line of code:

 FileUtils.copyURLToFile(URL, File)

An even simpler approach:

    URL website = new URL("http://www.website.com/information.asp");
    try (InputStream in = website.openStream()) {
        // target is the java.nio.file.Path to save to
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public void saveUrl(final String filename, final String urlString)
            throws MalformedURLException, IOException {
        BufferedInputStream in = null;
        FileOutputStream fout = null;
        try {
            in = new BufferedInputStream(new URL(urlString).openStream());
            fout = new FileOutputStream(filename);

            final byte data[] = new byte[1024];
            int count;
            while ((count = in.read(data, 0, 1024)) != -1) {
                fout.write(data, 0, count);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (fout != null) {
                fout.close();
            }
        }
    }

You'll need to handle the exceptions, probably outside this method.
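One way the caller-side handling could look, as a sketch only: the class DownloadCaller and the method tryDownload are hypothetical names, and saveUrl is reduced here to a compact stand-in with the same "throws on failure" contract so that the example is self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadCaller {
    /** Compact stand-in for saveUrl: any failure surfaces as an exception. */
    static void saveUrl(Path target, String urlString) throws IOException {
        try (InputStream in = new URL(urlString).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Caller-side handling: reports the problem and returns false instead of throwing. */
    public static boolean tryDownload(String urlString, Path target) {
        try {
            saveUrl(target, urlString);
            return true;
        } catch (MalformedURLException e) {
            // Must be caught before IOException, which is its superclass.
            System.err.println("Bad URL: " + e.getMessage());
            return false;
        } catch (IOException e) {
            System.err.println("Download failed: " + e);
            return false;
        }
    }
}
```

Whether to log, retry, or rethrow in those catch blocks is an application decision; printing to System.err is just a placeholder.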

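Downloading a file requires you to read it; either way, you will have to go through the file somehow. Instead of reading line by line, you can read it as bytes from the stream: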

    BufferedInputStream in = new BufferedInputStream(
            new URL("http://www.website.com/information.asp").openStream());
    byte data[] = new byte[1024];
    int count;
    // out is the OutputStream you are saving to, e.g. a FileOutputStream
    while ((count = in.read(data, 0, 1024)) != -1) {
        out.write(data, 0, count);
    }

With Java 7+, use the following method to download a file from the Internet and save it to some directory:

    private static Path download(String sourceURL, String targetDirectory) throws IOException {
        URL url = new URL(sourceURL);
        String fileName = sourceURL.substring(sourceURL.lastIndexOf('/') + 1);
        Path targetPath = new File(targetDirectory + File.separator + fileName).toPath();
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = url.openStream()) {
            Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }
        return targetPath;
    }

See the Files.copy documentation.
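Taking everything after the last '/' of the raw URL string also keeps any query string (for example "?id=7") in the file name. A small sketch of one alternative, which reads the name from the URL's path component instead; the class UrlFileNames, the method fileNameFromUrl, and the fallback parameter are illustrative assumptions.

```java
import java.net.URL;

public class UrlFileNames {
    /**
     * Returns the last path segment of url, or fallback when the path
     * has no usable segment (for example "http://host/").
     */
    public static String fileNameFromUrl(URL url, String fallback) {
        String path = url.getPath(); // excludes the "?query" and "#fragment" parts
        String name = path.substring(path.lastIndexOf('/') + 1);
        // Refuse empty names and the special directory entries "." and "..".
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return fallback;
        }
        return name;
    }
}
```

The segment is returned as it appears in the URL, so percent-encoded characters such as %20 are not decoded.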

This answer is almost exactly like the selected answer, but with two enhancements: it is a method, and it closes the FileOutputStream object:

    public static void downloadFileFromURL(String urlString, File destination) {
        try {
            URL website = new URL(urlString);
            ReadableByteChannel rbc;
            rbc = Channels.newChannel(website.openStream());
            FileOutputStream fos = new FileOutputStream(destination);
            fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
            fos.close();
            rbc.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    import java.io.*;
    import java.net.*;

    public class filedown {

        // Downloads address into localFileName and prints the name and byte count.
        public static void download(String address, String localFileName) {
            OutputStream out = null;
            URLConnection conn = null;
            InputStream in = null;

            try {
                URL url = new URL(address);
                out = new BufferedOutputStream(new FileOutputStream(localFileName));
                conn = url.openConnection();
                in = conn.getInputStream();
                byte[] buffer = new byte[1024];

                int numRead;
                long numWritten = 0;

                while ((numRead = in.read(buffer)) != -1) {
                    out.write(buffer, 0, numRead);
                    numWritten += numRead;
                }

                System.out.println(localFileName + "\t" + numWritten);
            } catch (Exception exception) {
                exception.printStackTrace();
            } finally {
                try {
                    if (in != null) {
                        in.close();
                    }
                    if (out != null) {
                        out.close();
                    }
                } catch (IOException ioe) {
                }
            }
        }

        // Derives the local file name from the last path segment of the address.
        public static void download(String address) {
            int lastSlashIndex = address.lastIndexOf('/');
            if (lastSlashIndex >= 0 && lastSlashIndex < address.length() - 1) {
                download(address, address.substring(lastSlashIndex + 1));
            } else {
                System.err.println("Could not figure out local file name for " + address);
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++) {
                download(args[i]);
            }
        }
    }

Personally, I've found Apache's HttpClient to be more than capable of everything I've needed to do in this regard. Here is a great tutorial on using HttpClient.

This is another Java 7 variant, based on Brian Risk's answer, that uses a try-with-resources statement:

    public static void downloadFileFromURL(String urlString, File destination) throws Throwable {
        URL website = new URL(urlString);
        try (
            ReadableByteChannel rbc = Channels.newChannel(website.openStream());
            FileOutputStream fos = new FileOutputStream(destination);
        ) {
            fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

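There are many elegant and efficient answers here, but their conciseness can cost us some useful information. In particular, one often does not want to treat a connection error as an exception, and one may want to handle certain kinds of network-related errors differently, for example to decide whether the download should be retried.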

Here is a method that does not throw exceptions for network errors (only for truly exceptional problems, such as a malformed URL or trouble writing to the file):

    /**
     * Downloads from a (http/https) URL and saves to a file.
     * Does not consider a connection error an Exception. Instead it returns:
     *
     *    0=ok
     *    1=connection interrupted, timeout (but something was read)
     *    2=not found (FileNotFoundException) (404)
     *    3=server error (500...)
     *    4=could not connect: connection timeout (no internet?) java.net.SocketTimeoutException
     *    5=could not connect: (server down?) java.net.ConnectException
     *    6=could not resolve host (bad host, or no internet - no dns)
     *
     * @param file File to write. Parent directory will be created if necessary
     * @param url  http/https url to connect
     * @param secsConnectTimeout Seconds to wait for connection establishment
     * @param secsReadTimeout Read timeout in seconds - transmission will abort if it freezes more than this
     * @return See above
     * @throws IOException Only if URL is malformed or if could not create the file
     */
    public static int saveUrl(final Path file, final URL url,
            int secsConnectTimeout, int secsReadTimeout) throws IOException {
        Files.createDirectories(file.getParent()); // make sure parent dir exists, this can throw exception
        URLConnection conn = url.openConnection(); // can throw exception if bad url
        if (secsConnectTimeout > 0) conn.setConnectTimeout(secsConnectTimeout * 1000);
        if (secsReadTimeout > 0) conn.setReadTimeout(secsReadTimeout * 1000);
        int ret = 0;
        boolean somethingRead = false;
        try (InputStream is = conn.getInputStream()) {
            try (BufferedInputStream in = new BufferedInputStream(is);
                 OutputStream fout = Files.newOutputStream(file)) {
                final byte data[] = new byte[8192];
                int count;
                while ((count = in.read(data)) > 0) {
                    somethingRead = true;
                    fout.write(data, 0, count);
                }
            }
        } catch (java.io.IOException e) {
            int httpcode = 999;
            try {
                httpcode = ((HttpURLConnection) conn).getResponseCode();
            } catch (Exception ee) {}
            if (somethingRead && e instanceof java.net.SocketTimeoutException) ret = 1;
            else if (e instanceof FileNotFoundException && httpcode >= 400 && httpcode < 500) ret = 2;
            else if (httpcode >= 400 && httpcode < 600) ret = 3;
            else if (e instanceof java.net.SocketTimeoutException) ret = 4;
            else if (e instanceof java.net.ConnectException) ret = 5;
            else if (e instanceof java.net.UnknownHostException) ret = 6;
            else throw e;
        }
        return ret;
    }
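The integer codes could also be expressed as an enum that carries the retry decision with it. The following is only a sketch: the names DownloadErrors, Outcome, and classify are hypothetical, it looks at the exception type alone (not the HTTP status), and which outcomes count as retryable is an assumed policy rather than something the answer prescribes.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class DownloadErrors {
    /** Coarse failure categories; the retryable flags are an assumed policy. */
    public enum Outcome {
        NOT_FOUND(false),
        TIMEOUT(true),
        CONNECT_FAILED(true),
        UNKNOWN_HOST(false),
        OTHER(false);

        private final boolean retryable;

        Outcome(boolean retryable) {
            this.retryable = retryable;
        }

        public boolean isRetryable() {
            return retryable;
        }
    }

    /** Maps an IOException raised while downloading to a category. */
    public static Outcome classify(IOException e) {
        if (e instanceof FileNotFoundException) return Outcome.NOT_FOUND;
        if (e instanceof SocketTimeoutException) return Outcome.TIMEOUT;
        if (e instanceof ConnectException) return Outcome.CONNECT_FAILED;
        if (e instanceof UnknownHostException) return Outcome.UNKNOWN_HOST;
        return Outcome.OTHER;
    }
}
```

A caller might wrap the download in a loop and try again only while classify(e).isRetryable() is true.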

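There is a problem with the simple usage of:

 org.apache.commons.io.FileUtils.copyURLToFile(URL, File)

when you need to download and save very large files, or more generally when you need automatic retries in case the connection is dropped.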

What I suggest in such cases is Apache HttpClient together with org.apache.commons.io.FileUtils. For example:

    GetMethod method = new GetMethod(resource_url);
    try {
        int statusCode = client.executeMethod(method);
        if (statusCode != HttpStatus.SC_OK) {
            logger.error("Get method failed: " + method.getStatusLine());
        }
        org.apache.commons.io.FileUtils.copyInputStreamToFile(
                method.getResponseBodyAsStream(), new File(resource_file));
    } catch (HttpException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        method.releaseConnection();
    }
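For comparison, retrying can be sketched with the JDK alone. This is not what the HttpClient snippet above does internally; it is a plain loop under stated assumptions: the names RetryingDownload and download, the attempt limit, and the fixed pause are all illustrative, and every attempt restarts the transfer from the beginning rather than resuming it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RetryingDownload {
    /**
     * Tries the download up to maxAttempts times, pausing between attempts.
     * Returns the number of attempts used; rethrows the last IOException
     * when every attempt has failed.
     */
    public static int download(URL url, Path target, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (InputStream in = url.openStream()) {
                // Each attempt starts over and overwrites any partial file.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

For very large files, restarting from zero is wasteful; resuming would need HTTP Range requests, which this sketch deliberately leaves out.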

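It is possible to download the file with Apache's HttpComponents instead of Commons IO. This code lets you download a file in Java from its URL and save it at a specific destination.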

    public static boolean saveFile(URL fileURL, String fileSavePath) {

        boolean isSucceed = true;

        CloseableHttpClient httpClient = HttpClients.createDefault();

        HttpGet httpGet = new HttpGet(fileURL.toString());
        httpGet.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0");
        httpGet.addHeader("Referer", "https://www.google.com");

        try {
            CloseableHttpResponse httpResponse = httpClient.execute(httpGet);
            HttpEntity fileEntity = httpResponse.getEntity();

            if (fileEntity != null) {
                FileUtils.copyInputStreamToFile(fileEntity.getContent(), new File(fileSavePath));
            }

        } catch (IOException e) {
            isSucceed = false;
        }

        httpGet.releaseConnection();

        return isSucceed;
    }

In contrast to the single line of code:

 FileUtils.copyURLToFile(fileURL, new File(fileSavePath), URLS_FETCH_TIMEOUT, URLS_FETCH_TIMEOUT);

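this code gives you more control over the process and lets you specify not only timeouts but also the User-Agent and Referer values, which are critical for many websites.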
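The same kind of control is also available without external libraries through java.net.HttpURLConnection. The sketch below is an illustration under assumptions, not a replacement for the HttpComponents code: the class HeaderDownload and its download method are hypothetical names, one timeout value is reused for both connect and read, and the cast only works for http and https URLs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class HeaderDownload {
    /**
     * Sends a GET with the given User-Agent and (optional) Referer, saves the
     * body to target when the server answers 200, and returns the status code.
     */
    public static int download(URL url, Path target, String userAgent,
                               String referer, int timeoutMillis) throws IOException {
        // Assumes an http/https URL; other protocols would fail this cast.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestProperty("User-Agent", userAgent);
        if (referer != null) {
            conn.setRequestProperty("Referer", referer);
        }
        try {
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_OK) {
                try (InputStream in = conn.getInputStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }
}
```

Returning the status code lets the caller tell a missing resource from a server error without relying on exception types.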

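To summarize (and somewhat polish and update) the previous answers: the following three methods are practically equivalent. (I added explicit timeouts because I think they are a must; nobody wants a download to freeze forever when the connection is lost.)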

    public static void saveUrl1(final Path file, final URL url, int secsConnectTimeout,
            int secsReadTimeout) throws MalformedURLException, IOException {
        // Files.createDirectories(file.getParent()); // optional, make sure parent dir exists
        try (BufferedInputStream in = new BufferedInputStream(
                 streamFromUrl(url, secsConnectTimeout, secsReadTimeout));
             OutputStream fout = Files.newOutputStream(file)) {
            final byte data[] = new byte[8192];
            int count;
            while ((count = in.read(data)) > 0)
                fout.write(data, 0, count);
        }
    }

    public static void saveUrl2(final Path file, final URL url, int secsConnectTimeout,
            int secsReadTimeout) throws MalformedURLException, IOException {
        // Files.createDirectories(file.getParent()); // optional, make sure parent dir exists
        try (ReadableByteChannel rbc = Channels.newChannel(
                 streamFromUrl(url, secsConnectTimeout, secsReadTimeout));
             FileChannel channel = FileChannel.open(file,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.TRUNCATE_EXISTING,
                 StandardOpenOption.WRITE)) {
            channel.transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    public static void saveUrl3(final Path file, final URL url,
            int secsConnectTimeout, int secsReadTimeout) throws MalformedURLException, IOException {
        // Files.createDirectories(file.getParent()); // optional, make sure parent dir exists
        try (InputStream in = streamFromUrl(url, secsConnectTimeout, secsReadTimeout)) {
            Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Opens the URL with optional connect/read timeouts (a value <= 0 leaves the default).
    public static InputStream streamFromUrl(URL url, int secsConnectTimeout, int secsReadTimeout)
            throws IOException {
        URLConnection conn = url.openConnection();
        if (secsConnectTimeout > 0) conn.setConnectTimeout(secsConnectTimeout * 1000);
        if (secsReadTimeout > 0) conn.setReadTimeout(secsReadTimeout * 1000);
        return conn.getInputStream();
    }

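I don't find significant differences; all of them seem right to me. They are safe and efficient. (Differences in speed seem hardly relevant: writing 180 MB from a local server to an SSD takes times that fluctuate around 1.2 to 1.5 seconds.) They don't require external libraries. All of them work with arbitrary sizes and (in my experience) with HTTP redirections.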

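Additionally, all of them throw FileNotFoundException if the resource is not found (typically error 404) and java.net.UnknownHostException if DNS resolution fails; other IOExceptions correspond to errors during transmission.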

(Marked as community wiki; feel free to add information or corrections.)

    public class DownloadManager {

        static String urls = "[WEBSITE NAME]";

        public static void main(String[] args) throws IOException {
            URL url = verify(urls);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            InputStream in = null;
            String filename = url.getFile();
            filename = filename.substring(filename.lastIndexOf('/') + 1);
            FileOutputStream out = new FileOutputStream(
                    "C:\\Java2_programiranje/Network/DownloadTest1/Project/Output" + File.separator + filename);
            in = connection.getInputStream();
            int read = -1;
            byte[] buffer = new byte[4096];
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                System.out.println("[SYSTEM/INFO]: Downloading file...");
            }
            in.close();
            out.close();
            System.out.println("[SYSTEM/INFO]: File Downloaded!");
        }

        // Returns null unless the address starts with "http://" and parses as a URL.
        private static URL verify(String url) {
            if (!url.toLowerCase().startsWith("http://")) {
                return null;
            }
            URL verifyUrl = null;

            try {
                verifyUrl = new URL(url);
            } catch (Exception e) {
                e.printStackTrace();
            }
            return verifyUrl;
        }
    }

You can do this in one line using netloader for Java:

 new NetFile(new File("my/zips/1.zip"), "https://example.com/example.zip", -1).load(); //returns true if succeed, otherwise false.

The underscore library has a $.fetch() method.

In pom.xml:

  <groupId>com.github.javadev</groupId> <artifactId>underscore-lodash</artifactId> <version>1.23</version>

Code example:

    import com.github.underscore.lodash.$;

    public class Download {
        public static void main(String ... args) {
            String text = $.fetch("https://stackoverflow.com/questions"
                + "/921262/how-to-download-and-save-a-file-from-internet-using-java").text();
        }
    }
