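Saving a page's HTML output after its JavaScript has run

There is a website I want to scrape. It first loads an HTML/JS page that modifies the form's input fields with JavaScript and then POSTs the form. How can I get the final HTML output of the POSTed page?

I tried to do this with PhantomJS, but it only seems to have an option to render image files. Googling suggests it should be possible, but I can't work out how. My attempt: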

    var page = require('webpage').create();
    var fs = require('fs');

    page.open('https://www.somesite.com/page.aspx', function () {
        page.evaluate(function(){
        });
        page.render('export.png');
        fs.write('1.html', page.content, 'w');
        phantom.exit();
    });

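This code will run on a client's machine, and I can't expect him to install many packages (nodejs, casperjs, etc.).

Thanks

The output code you have is correct, but there is a timing problem. Your output lines run before the page has finished loading. You can bind to the onLoadFinished callback to find out when loading is done. See the full code below.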

    var page = require('webpage').create();
    var fs = require('fs');

    // Runs each time the page finishes loading
    page.onLoadFinished = function() {
        console.log("page load finished");
        page.render('export.png');              // screenshot of the rendered page
        fs.write('1.html', page.content, 'w');  // HTML after the page's JS has run
        phantom.exit();
    };

    page.open("http://www.google.com", function() {
        page.evaluate(function() {
        });
    });

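A site like Google can be deceptive here: it loads so quickly that you can often get away with grabbing the screen inline, as you do. Timing is a tricky thing in PhantomJS, and sometimes I use setTimeout to test whether timing is the problem.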
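As a rough sketch of that setTimeout check: the helper below writes the page content only after a fixed delay. If the saved HTML changes when the delay is raised, timing is the likely culprit. The helper name `saveAfterDelay`, the 2000 ms delay and the output path are assumptions made for illustration, not part of the original script.

```javascript
// Hypothetical helper: write the page's current HTML after a fixed delay.
// `page` needs a `content` property and `fs` a `write(path, data, mode)`
// method, which matches PhantomJS's webpage and fs modules.
function saveAfterDelay(page, fs, path, delayMs, done) {
    setTimeout(function () {
        fs.write(path, page.content, 'w');
        done();
    }, delayMs);
}

// Only wire it up when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var fs = require('fs');

    page.open('http://www.google.com', function (status) {
        if (status !== 'success') {
            console.log('page failed to load');
            phantom.exit(1);
            return;
        }
        // Assumed delay of 2 seconds; adjust while diagnosing.
        saveAfterDelay(page, fs, '1.html', 2000, function () {
            phantom.exit();
        });
    });
}
```

A fixed delay is only a diagnostic. Once timing is confirmed as the cause, waiting on a callback or on a condition is more reliable than guessing a number of milliseconds.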

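When I copied your code directly and changed the URL to www.google.com, it ran fine and saved two files:

1.html
export.png

Keep in mind that these files are written to the directory you run the script from, not the directory your .js file is in.

I tried several approaches for a similar task and got the best results with Selenium.

Before that I tried PhantomJS and Cheerio. Phantom often crashed while executing JS on the page.

I am using CasperJS to run tests on PhantomJS. I added this code to my tearDown function: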

    var require = patchRequire(require);
    var fs = require('fs');

    casper.test.begin("My Test", {
        tearDown: function(){
            casper.capture("export.png");
            fs.write("1.html", casper.getHTML(undefined, true), 'w');
        },
        test: function(test){
            // test code
            casper.run(function(){
                test.done();
            });
        }
    });

See the documentation for capture and getHTML.

After two days of struggling, I finally solved a similar problem. The trick was the waitfor.js example on the official PhantomJS site. Enjoy!

    "use strict";

    function waitFor(testFx, onReady, timeOutMillis) {
        var maxtimeOutMillis = timeOutMillis ? timeOutMillis : 3000, //< Default max timeout is 3s
            start = new Date().getTime(),
            condition = false,
            interval = setInterval(function() {
                if ( (new Date().getTime() - start < maxtimeOutMillis) && !condition ) {
                    // If not time-out yet and condition not yet fulfilled
                    condition = (typeof(testFx) === "string" ? eval(testFx) : testFx()); //< defensive code
                } else {
                    if(!condition) {
                        // If condition still not fulfilled (timeout but condition is 'false')
                        console.log("'waitFor()' timeout");
                        phantom.exit(1);
                    } else {
                        // Condition fulfilled (timeout and/or condition is 'true')
                        console.log("'waitFor()' finished in " + (new Date().getTime() - start) + "ms.");
                        typeof(onReady) === "string" ? eval(onReady) : onReady(); //< Do what it's supposed to do once the condition is fulfilled
                        clearInterval(interval); //< Stop this interval
                    }
                }
            }, 250); //< repeat check every 250ms
    };

    var page = require('webpage').create();

    // Open Twitter on 'sencha' profile and, onPageLoad, do...
    page.open("http://twitter.com/#!/sencha", function (status) {
        // Check for page load success
        if (status !== "success") {
            console.log("Unable to access network");
        } else {
            // Wait for 'signin-dropdown' to be visible
            waitFor(function() {
                // Check in the page if a specific element is now visible
                return page.evaluate(function() {
                    return $("#signin-dropdown").is(":visible");
                });
            }, function() {
                console.log("The sign-in dialog should be visible now.");
                phantom.exit();
            });
        }
    });

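Besides using a headless browser, another approach that comes to mind is to simulate the AJAX calls yourself and reassemble the page afterwards, request by request. However, this tends to be tricky and should be a last resort, unless you really enjoy digging through JavaScript code.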
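To give an idea of what simulating the request involves, here is a minimal sketch that builds the URL-encoded body the form would send once you know which values the page's JavaScript puts into the inputs. The helper name `buildFormBody` and the field names are hypothetical. The encoding only approximates what a browser produces, since encodeURIComponent leaves a few characters such as `!` and `*` unescaped.

```javascript
// Hypothetical helper: encode form fields roughly as
// application/x-www-form-urlencoded (spaces become '+').
function buildFormBody(fields) {
    return Object.keys(fields).map(function (name) {
        return encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]);
    }).join('&').replace(/%20/g, '+');
}

// Field names and values below are made up for illustration.
var body = buildFormBody({ user: 'a b', token: 'x&y=1' });
console.log(body);
```

The resulting string can then be sent as the POST data by whatever HTTP client is available. Working out which fields the page's script sets, and to what values, is the hard part this sketch does not cover.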

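This can be done quite easily with some PHP code and JavaScript: use fopen() and fwrite() on the PHP side to save the file, and this call to capture the generated source: var generatedSource = new XMLSerializer().serializeToString(document);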
