Manipulating a string of 30 million characters

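I am downloading a CSV file from another server as a data feed from a vendor.

I am using curl to get the contents of the file and saving them into a variable called $contents.

I can get that far just fine, but when I try to explode the string on \r and \n to get an array of lines, it fails with an "out of memory" error.

I echo strlen($contents) and it is about 30.5 million characters.

I need to manipulate the values and insert them into a database. What do I need to do to avoid memory allocation errors?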

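PHP is choking because it is running out of memory. Instead of having curl populate a PHP variable with the contents of the file, use the CURLOPT_FILE option to save the file to disk.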

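    // Pseudo, untested code to give you the idea
    $ch = curl_init($url); // $url is a placeholder for the vendor's CSV location
    $fp = fopen('path/to/save/file', 'w');
    curl_setopt($ch, CURLOPT_FILE, $fp); // write the response body to $fp
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);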

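Then, once the file is saved, instead of using the file or file_get_contents functions (which would load the entire file into memory, killing PHP again), use fopen and fgets to read the file one line at a time.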
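As a minimal sketch of that second step, the helper below reads a saved file line by line. The function name processSavedFile and the callback parameter are made up for illustration; only fopen, fgets and fclose come from the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): reads a saved file
// one line at a time and hands each line to a callback.
function processSavedFile(string $path, callable $handleLine): int
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // fgets() returns false at end of file, so only the current line
    // is held in memory at any time.
    while (($line = fgets($fp)) !== false) {
        $handleLine(rtrim($line, "\r\n")); // strip the \r and \n line endings
        $count++;
    }

    fclose($fp);
    return $count;
}
```

The callback is where each line would be split into values and inserted into the database, so memory use stays roughly proportional to the longest line rather than to the whole file.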

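As the other answers said:

you can't have all of that in memory
a solution would be to use CURLOPT_FILE

But you might not want to actually create a file; you might want to work with the data in memory, using it as soon as it "arrives".

One possible solution is to define your own stream wrapper and pass that, instead of a real file, to CURLOPT_FILE.

First of all, see:

stream_wrapper_register
The streamWrapper class
Example class registered as stream wrapper

Now, let's go with an example.

First, let's create our stream wrapper class: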

    class MyStream {
        protected $buffer = '';

        function stream_open($path, $mode, $options, &$opened_path) {
            // Has to be declared, it seems...
            return true;
        }

        public function stream_write($data) {
            // Extract the lines; in my tests, $data was 8192 bytes long, never more
            $lines = explode("\n", $data);

            // The buffer contains the end of the last line from the previous call
            // => it goes at the beginning of the first line we are getting this time
            $lines[0] = $this->buffer . $lines[0];

            // And the last line is only partial
            // => save it for next time, and remove it from the list this time
            $nb_lines = count($lines);
            $this->buffer = $lines[$nb_lines-1];
            unset($lines[$nb_lines-1]);

            // Here, do your work with the complete lines you have in $lines
            var_dump($lines);
            echo '<hr />';

            return strlen($data);
        }
    }

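What I do is:

work on the chunks of data as they arrive (I use var_dump, but you would do your usual processing instead)
note that you don't get "full lines": the end of a line is at the beginning of a chunk, and the beginning of that same line was at the end of the previous chunk; so you have to keep part of each chunk between the calls to stream_write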
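The carry-over between chunks can be shown on its own, without curl. The following is only a sketch: the function name linesFromChunks and the sample chunks are invented for illustration, and it assumes lines end with "\n".

```php
<?php
// Hypothetical helper: turns arbitrarily split chunks into complete lines.
function linesFromChunks(iterable $chunks): Generator
{
    $carry = ''; // unfinished tail of the previous chunk
    foreach ($chunks as $chunk) {
        $parts = explode("\n", $carry . $chunk);
        // The last element is either '' (the chunk ended on a newline)
        // or the beginning of a line that continues in the next chunk.
        $carry = array_pop($parts);
        foreach ($parts as $line) {
            yield $line;
        }
    }
    if ($carry !== '') {
        yield $carry; // final line that had no trailing newline
    }
}

// Three chunks that cut lines in the middle, as a network transfer might.
foreach (linesFromChunks(["id,na", "me\n1,Al", "ice\n2,Bob"]) as $line) {
    echo $line, "\n";
}
```

With these sample chunks the loop prints the three reassembled lines id,name then 1,Alice then 2,Bob, even though none of the chunks contained a whole line.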

Next, we register this stream wrapper, to be used with the pseudo-protocol "test":

    // Register the wrapper
    stream_wrapper_register("test", "MyStream")
        or die("Failed to register protocol");

And now we do our curl request just as we would when writing to a "real" file, as the other answers suggested:

    // Open the "file"
    $fp = fopen("test://MyTestVariableInMemory", "r+");

    // Configuration of curl
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.rue89.com/");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_BUFFERSIZE, 256);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FILE, $fp); // Data will be sent to our stream ;-)

    curl_exec($ch);

    curl_close($ch);

    // Don't forget to close the "file" / stream
    fclose($fp);

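Note that we don't work with a real file, but with our pseudo-protocol.

This way, each time a chunk of data arrives, the MyStream::stream_write method is called and can work on a small amount of data (when I tested, I always got 8192 bytes, whatever value I used for CURLOPT_BUFFERSIZE).

A few notes:

Obviously, you need to test this more than I did
My stream_write implementation may not work if lines are longer than 8192 bytes; it is up to you to patch it ;-)
This is only meant as a few pointers, not a fully working solution: you have to test again, and probably write a bit more code!

Still, I hope this helps ;-)
Have fun!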

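Darren Cook's comment on Pascal MARTIN's answer is really interesting. In modern PHP + curl versions, the CURLOPT_WRITEFUNCTION option can be set so that curl invokes a callback for each received "chunk" of data. Specifically, the callable receives two parameters: the first is the curl handle making the call, and the second is the chunk of data. The function should return strlen($data) so that curl keeps sending more data.

The callable can be an object method in PHP. Using all of this, I have developed a possible solution that I find more readable than the previous one (although Pascal Martin's answer is really great, things have changed since then). I have used public properties for simplicity, but I am sure readers can adapt and improve the code. You can even abort the curl request once a given number of lines (or bytes) has been reached. I hope this will be useful to others.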

    <?php
    class SplitCurlByLines {

        public $currentLine = '';
        public $totalLineCount = 0;
        public $totalLength = 0;

        public function curlCallback($curl, $data) {

            $this->currentLine .= $data;
            $lines = explode("\n", $this->currentLine);
            // The last line could be unfinished. We should not
            // process it yet.
            $numLines = count($lines) - 1;
            $this->currentLine = $lines[$numLines]; // Save for the next callback.

            for ($i = 0; $i < $numLines; ++$i) {
                $this->processLine($lines[$i]); // Do whatever you want
                ++$this->totalLineCount; // Statistics.
                $this->totalLength += strlen($lines[$i]) + 1;
            }
            return strlen($data); // Ask curl for more data (any other value will stop).

        }

        public function processLine($str) {
            // Do whatever you want (split CSV, ...).
            echo $str . "\n";
        }

    } // SplitCurlByLines

    // Just for testing, I will echo the content of the Stack Overflow
    // main page. To avoid artifacts, I will inform the browser about
    // the plain text MIME type, so the source code should be visible.
    header('Content-type: text/plain');

    $splitter = new SplitCurlByLines();

    // Configuration of curl
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://stackoverflow.com/");
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, array($splitter, 'curlCallback'));

    curl_exec($ch);

    // Process the last line.
    $splitter->processLine($splitter->currentLine);

    curl_close($ch);

    error_log($splitter->totalLineCount . " lines; " .
        $splitter->totalLength . " bytes.");
    ?>
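The answer mentions that the request can be aborted once enough lines have arrived. A minimal sketch of that idea follows; the class LimitedLineReader and the function fetchFirstLines are hypothetical names, and the sketch relies on the behaviour described above, namely that returning anything other than strlen($data) from the callback stops the transfer.

```php
<?php
// Hypothetical class: collects complete lines and asks curl to stop
// once $maxLines of them have been seen.
class LimitedLineReader
{
    public $lines = [];
    private $carry = '';
    private $maxLines;

    public function __construct(int $maxLines)
    {
        $this->maxLines = $maxLines;
    }

    // Same parameters as the callback above: (curl handle, data chunk).
    public function onChunk($curl, string $data): int
    {
        $parts = explode("\n", $this->carry . $data);
        $this->carry = array_pop($parts); // possibly unfinished last line
        foreach ($parts as $line) {
            $this->lines[] = $line;
            if (count($this->lines) >= $this->maxLines) {
                return -1; // differs from strlen($data), so the transfer stops
            }
        }
        return strlen($data); // ask curl for more data
    }
}

// Hypothetical usage; $url would be whatever feed you are downloading.
function fetchFirstLines(string $url, int $maxLines): array
{
    $reader = new LimitedLineReader($maxLines);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, [$reader, 'onChunk']);
    curl_exec($ch); // returns false when the callback aborts; expected here
    return $reader->lines;
}
```

Because the abort is reported by curl as a failed transfer, a caller that also wants to detect genuine network errors would need to tell the two cases apart, for example by checking whether the line limit was actually reached.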

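You might want to consider saving it to a temporary file and then reading it one line at a time with fgets or fgetcsv.

This way you avoid the big initial array you would get from exploding such a large string.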
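A rough sketch of this approach, under a few assumptions: the helper names downloadToTempFile and eachCsvRow are invented, the feed is assumed to be comma-separated with double-quote enclosures, and the temporary file is created with tmpfile() so it is removed automatically when closed.

```php
<?php
// Hypothetical helper: streams a URL into an anonymous temporary file
// and returns the handle, rewound to the start.
function downloadToTempFile(string $url)
{
    $fp = tmpfile();
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp); // curl writes the body to $fp
    if (curl_exec($ch) === false) {
        fclose($fp);
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }
    rewind($fp);
    return $fp;
}

// Hypothetical helper: calls $handleRow once per CSV record in $fp.
function eachCsvRow($fp, callable $handleRow): int
{
    $rows = 0;
    // Length 0 means "no line length limit"; delimiter, enclosure and
    // escape character are passed explicitly.
    while (($row = fgetcsv($fp, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $handleRow($row);
        $rows++;
    }
    return $rows;
}
```

Compared with exploding the string yourself, fgetcsv also handles quoted fields that contain commas, so each row arrives already split into values ready for a database insert.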

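Increase memory_limit in php.ini.
Read the data using fopen() and fgets().

Spool it to a file. Don't try to hold all the data in memory at once.

Note:

"Basically, if you open a file with fopen, fclose it and then unlink it, it works fine. But if between fopen and fclose you give the file handle to cURL to do some writing into the file, then the unlink fails. Why this is happening is beyond me. I think it may be related to Bug #48676"

http://bugs.php.net/bug.php?id=49517

So be careful if you are on an older version of PHP. That page gives a simple fix, which is to close the file resource twice: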

    fclose($fp);
    if (is_resource($fp))
        fclose($fp);
