Why does reading record struct fields from std::istream fail, and how can I fix it?
Suppose we have the following situation:
- A record struct is declared as follows

    struct Person {
        unsigned int id;
        std::string name;
        uint8_t age;
        // ...
    };
- Records are stored in a file using the following format:

    ID      Forename Lastname Age
    ------------------------------
    1267867 John     Smith    32
    67545   Jane     Doe      36
    8677453 Gwyneth  Miller   56
    75543   J. Ross  Unusual  23
    ...
The file should be read in to collect an arbitrary number of the Person records described above:

    std::ifstream ifs("SampleInput.txt");
    std::vector<Person> persons;

    Person actRecord;
    while (ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
        persons.push_back(actRecord);
    }

    if (!ifs) {
        std::cerr << "Input format error!" << std::endl;
    }
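Question: (this is a frequently asked question, in one form or another)
What can I do to read the individual values into the fields of the one actRecord variable?
The code sample above ends up with a runtime error: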
    Runtime error    time: 0 memory: 3476 signal:-1
    stderr: Input format error!
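To make the failure concrete, here is a minimal sketch that runs the same extraction chain over an in-memory stream instead of the file. The function names `read_like_question` and `demo_failure` are hypothetical, and the sketch assumes the two header lines have already been removed from the input. Extraction into `std::string` stops at the first whitespace, so `name` only receives "John"; `uint8_t` is a character type on common platforms, so `age` receives the single character 'S'; the next `id` extraction then hits "mith" and puts the stream into a failed state.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Same extraction chain as in the question, applied to any input stream.
// Returns the records collected before the stream entered a failed state.
std::vector<Person> read_like_question(std::istream& is) {
    std::vector<Person> persons;
    Person actRecord;
    while (is >> actRecord.id >> actRecord.name >> actRecord.age) {
        persons.push_back(actRecord);
    }
    return persons;
}

// Hypothetical helper: feeds two header-less sample lines through the loop.
std::vector<Person> demo_failure() {
    std::istringstream input("1267867 John Smith 32\n67545 Jane Doe 36\n");
    return read_like_question(input);
}
```

Calling `demo_failure()` yields a single, wrong record (name "John", age holding the character 'S') rather than the two records one would expect.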
One viable solution is to reorder the input fields (if that is possible):

    ID      Age Forename Lastname
    1267867 32  John     Smith
    67545   36  Jane     Doe
    8677453 56  Gwyneth  Miller
    75543   23  J. Ross  Unusual
    ...
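and read in the records as follows:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Person {
        unsigned int id;
        std::string name;
        uint8_t age;
        // ...
    };

    int main() {
        std::istream& ifs = std::cin; // Open file alternatively
        std::vector<Person> persons;

        Person actRecord;
        unsigned int age; // numeric temporary; uint8_t would be read as a character
        // Read the two numeric fields first; the rest of the line is the name.
        // std::ws drops the whitespace between the age and the name.
        while (ifs >> actRecord.id >> age &&
               std::getline(ifs >> std::ws, actRecord.name)) {
            actRecord.age = uint8_t(age);
            persons.push_back(actRecord);
        }

        return 0;
    }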
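There is whitespace between the forename and the lastname. Change your class to have the forename and lastname as separate strings and it should work. The other thing you can do is read into two separate variables, such as name1 and name2, and assign them as

    actRecord.name = name1 + " " + name2;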
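As a rough sketch of the two-variable approach, the function below reads exactly two name words and joins them. The name `read_two_word_name` is hypothetical, and the age is stored as `unsigned int` here so that it is extracted as a number. Note that this only works when every name consists of exactly two words; a record such as "J. Ross Unusual" makes the extraction fail.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct Person {
    unsigned int id;
    std::string name;
    unsigned int age;  // numeric type, so >> parses digits rather than one character
};

// Reads "ID Forename Lastname Age" and joins the two name words.
// Returns false if the record does not have exactly this shape.
bool read_two_word_name(std::istream& is, Person& p) {
    std::string name1, name2;
    if (!(is >> p.id >> name1 >> name2 >> p.age)) {
        return false;
    }
    p.name = name1 + " " + name2;
    return true;
}
```

For "1267867 John Smith 32" the function fills in the name "John Smith"; for "75543 J. Ross Unusual 23" it returns false, because "Unusual" cannot be parsed as the age.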
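Here is an implementation of a manipulator I came up with that counts delimiters as it goes through each extracted character. Using the number of words you specify, it extracts that many words from the input stream. It assumes the words are separated by exactly one delimiter character.

    #include <cstddef>
    #include <istream>
    #include <iterator>
    #include <string>

    template<class charT>
    struct word_inserter_impl {
        word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
            : str_(str)
            , delim_(delim)
            , words_(words)
        { }

        friend std::basic_istream<charT>&
        operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
            // The sentry skips leading whitespace (if skipws is set).
            typename std::basic_istream<charT>::sentry ok(is);

            if (ok) {
                wi.str_.clear(); // do not append to a previous record's value
                std::istreambuf_iterator<charT> it(is), end;
                std::back_insert_iterator<std::basic_string<charT>> dest(wi.str_);

                while (it != end && wi.words_) {
                    // Stop at the delimiter that follows the last requested word.
                    if (*it == wi.delim_ && --wi.words_ == 0) {
                        break;
                    }
                    dest++ = *it++;
                }
            }
            return is;
        }
    private:
        std::basic_string<charT>& str_;
        charT delim_;
        mutable std::size_t words_;
    };

    template<class charT = char>
    word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
        return word_inserter_impl<charT>(words, str, delim);
    }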
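Now you can just do:

    unsigned int age; // read the age as a number, not as a single character
    while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> age) {
        actRecord.age = static_cast<uint8_t>(age);
        std::cout << actRecord.id << " " << actRecord.name << " " << age << '\n';
    }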
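A solution would be to read the first entry into an ID variable.
Then read all the other words from the line (just push them into a temporary vector) and construct the individual's name from all the elements except the last entry, which is the age.
This lets you keep the age in the last position while still being able to deal with names like "J. Ross Unusual".
Update: some code to illustrate the theory above: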
    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Person {
        unsigned int id;
        std::string name;
        int age;
    };

    int main() {
        std::ifstream ifs("in.txt");
        std::vector<Person> persons;

        std::string line;
        while (std::getline(ifs, line)) {
            std::istringstream iss(line);

            // first: the ID, simply read it; lines that do not start with a
            // number (such as the header lines) are skipped
            Person actRecord;
            if (!(iss >> actRecord.id)) {
                continue;
            }

            // next: read in everything else
            std::string temp;
            std::vector<std::string> tempvect;
            while (iss >> temp) {
                tempvect.push_back(temp);
            }

            // we need at least one name part plus the age
            if (tempvect.size() < 2) {
                continue;
            }

            // then: the name; join all name parts but the last one with a
            // trailing space, then append the last name part without one
            std::ostringstream oss;
            std::copy(tempvect.begin(), tempvect.end() - 2,
                      std::ostream_iterator<std::string>(oss, " "));
            oss << *(tempvect.end() - 2);
            actRecord.name = oss.str();

            // and the age (std::stoi throws if the token is not a number)
            actRecord.age = std::stoi(tempvect.back());

            persons.push_back(actRecord);
        }

        for (std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); ++it) {
            std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
        }
    }
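Since we can easily split a line on whitespace, and we know that the only value that can contain separators is the name, a possible solution is to use a deque per line that holds the whitespace-separated elements of that line. The id and the age can easily be retrieved from the two ends of the deque, and the remaining elements can be concatenated to form the name: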
    #include <algorithm>
    #include <cstdint>
    #include <deque>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    struct Person {
        unsigned int id;
        std::string name;
        uint8_t age;
    };

    int main() {
        // Assumes the header lines have been removed from the file;
        // std::stoul / std::stoi throw on non-numeric tokens.
        std::ifstream ifs("SampleInput.txt");
        std::vector<Person> records;

        std::string line;
        while (std::getline(ifs, line)) {
            std::istringstream ss(line);

            std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});

            // need at least the id, one name part, and the age
            if (info.size() < 3) {
                continue;
            }

            Person record;
            record.id = static_cast<unsigned int>(std::stoul(info.front()));
            info.pop_front();
            record.age = static_cast<uint8_t>(std::stoi(info.back()));
            info.pop_back();

            // whatever is left in the deque belongs to the name
            std::ostringstream name;
            std::copy(info.begin(), info.end(),
                      std::ostream_iterator<std::string>(name, " "));
            record.name = name.str();
            record.name.pop_back(); // drop the trailing space

            records.push_back(std::move(record));
        }

        for (auto& record : records) {
            std::cout << record.id << " " << record.name << " "
                      << static_cast<unsigned int>(record.age) << std::endl;
        }

        return 0;
    }
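Another solution is to require specific delimiter characters for a particular field, and to provide a special extraction manipulator for this purpose.
Suppose we define the double quote (") as the delimiter character; the input should then look like this:

    1267867 "John Smith"      32
    67545   "Jane Doe"        36
    8677453 "Gwyneth Miller"  56
    75543   "J. Ross Unusual" 23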
The generally needed includes:

    #include <cctype>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <vector>

The record declaration:

    struct Person {
        unsigned int id;
        std::string name;
        uint8_t age;
        // ...
    };
Declaration and definition of a proxy class (struct) that can be used with the global operator overload std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&):

    struct delim_field_extractor_proxy {
        delim_field_extractor_proxy(std::string& field_ref, char delim = '"')
            : field_ref_(field_ref), delim_(delim) {}

        friend std::istream& operator>>(std::istream& is,
                                        const delim_field_extractor_proxy& extractor_proxy);

        void extract_value(std::istream& is) const {
            field_ref_.clear();
            char input;
            bool addChars = false; // true while we are between two delimiters
            while (is) {
                is.get(input);
                if (is.eof()) {
                    break;
                }
                if (input == delim_) {
                    addChars = !addChars;
                    if (!addChars) {
                        break;    // closing delimiter: the field is complete
                    } else {
                        continue; // opening delimiter: start collecting
                    }
                }
                if (addChars) {
                    field_ref_ += input;
                }
            }
            // consume whitespace following the field
            while (std::isspace(is.peek())) {
                is.get();
            }
        }

        std::string& field_ref_;
        char delim_;
    };

    std::istream& operator>>(std::istream& is,
                             const delim_field_extractor_proxy& extractor_proxy) {
        extractor_proxy.extract_value(is);
        return is;
    }
Plumbing everything together and instantiating the delim_field_extractor_proxy:

    int main() {
        std::istream& ifs = std::cin; // Open file alternatively
        std::vector<Person> persons;

        Person actRecord;
        int act_age; // numeric temporary for the uint8_t field
        while (ifs >> actRecord.id
                   >> delim_field_extractor_proxy(actRecord.name, '"')
                   >> act_age) {
            actRecord.age = uint8_t(act_age);
            persons.push_back(actRecord);
        }

        for (auto it = persons.begin(); it != persons.end(); ++it) {
            std::cout << it->id << ", " << it->name << ", " << int(it->age) << std::endl;
        }
        return 0;
    }
Note:
This solution also works well when a TAB character (\t) is specified as the delimiter, which is useful for parsing standard .csv formats.
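What can I do to read the separate words forming the name into the one actRecord.name variable?
The general answer is: you can't, not without an additional delimiter specification and special parsing for the parts that form the intended actRecord.name contents.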
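This is because a std::string field is parsed only up to the next occurrence of a whitespace character.
It is worth noting that some standard formats (such as .csv) may require distinguishing blanks (' ') from TABs ('\t') or other characters in order to delimit certain record fields (which may not be visible at first glance).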
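As an illustration of treating TAB differently from a blank, the sketch below splits one line on '\t' with the delimiter overload of `std::getline`, so blanks inside the name survive. The function name `parse_tab_separated` and the assumed line layout (ID, TAB, name, TAB, age) are illustrative choices, not something the file format above prescribes.

```cpp
#include <exception>
#include <sstream>
#include <string>

struct Person {
    unsigned int id;
    std::string name;
    unsigned int age;
};

// Parses one line of the assumed form  ID<TAB>name<TAB>age.
// Blanks are ordinary characters here; only TAB separates fields.
bool parse_tab_separated(const std::string& line, Person& p) {
    std::istringstream iss(line);
    std::string id_text, age_text;
    if (!std::getline(iss, id_text, '\t') ||
        !std::getline(iss, p.name, '\t') ||
        !std::getline(iss, age_text)) {
        return false; // fewer than three TAB-separated fields
    }
    try {
        p.id = static_cast<unsigned int>(std::stoul(id_text));
        p.age = static_cast<unsigned int>(std::stoul(age_text));
    } catch (const std::exception&) {
        return false; // a numeric field did not contain a number
    }
    return true;
}
```

A line that uses blanks instead of TABs is rejected, because the whole line ends up in the first field and nothing is left for the name.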
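Also note:
To read a uint8_t value as numeric input, you have to take a detour through a temporary unsigned int value. Reading directly into an unsigned char (aka uint8_t) extracts a single character instead of a number and messes up the stream parsing state.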
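A small sketch of the difference, assuming (as on common platforms) that `uint8_t` is an alias for `unsigned char`; the two function names are hypothetical:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Extracts directly into a uint8_t: the stream treats it as a character
// type and stores the first non-whitespace character, not a number.
std::uint8_t read_age_direct(const std::string& text) {
    std::istringstream iss(text);
    std::uint8_t age = 0;
    iss >> age;
    return age;
}

// Extracts into a numeric temporary first, then narrows to uint8_t.
std::uint8_t read_age_via_temp(const std::string& text) {
    std::istringstream iss(text);
    unsigned int tmp = 0;
    iss >> tmp;
    return static_cast<std::uint8_t>(tmp);
}
```

For the input "32", `read_age_direct` returns the character '3' and leaves the "2" in the stream for the next extraction, while `read_age_via_temp` returns the number 32.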
Another attempt at solving the parsing problem.

    #include <cctype>
    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    // Person is the struct declared in the question.

    int main() {
        std::ifstream ifs("test-115.in");
        std::vector<Person> persons;

        while (true) {
            Person actRecord;
            // Read the ID and the first part of the name.
            if (!(ifs >> actRecord.id >> actRecord.name)) {
                break;
            }

            // Read the rest of the line.
            std::string line;
            std::getline(ifs, line);

            // Pick up the rest of the name from the rest of the line.
            // The last token in the rest of the line is the age.
            // All other tokens are part of the name.
            // The tokens can be separated by ' ' or '\t'.
            size_t pos = 0;
            size_t iter = 0;
            while ((iter = line.find_first_of(" \t", pos)) != std::string::npos) {
                actRecord.name += line.substr(pos, (iter - pos + 1));
                pos = iter + 1;

                // Skip multiple whitespace characters.
                while (pos < line.size() &&
                       std::isspace(static_cast<unsigned char>(line[pos]))) {
                    ++pos;
                }
            }

            // Trim the last whitespace from the name.
            actRecord.name.erase(actRecord.name.size() - 1);

            // Extract the age.
            // std::stoi returns an integer. We are assuming that
            // it will be small enough to fit into a uint8_t.
            actRecord.age = static_cast<uint8_t>(std::stoi(line.substr(pos)));

            // Debugging aid: make sure we have extracted the data correctly.
            std::cout << "ID: " << actRecord.id
                      << ", name: " << actRecord.name
                      << ", age: " << (int)actRecord.age << std::endl;
            persons.push_back(actRecord);
        }

        // If we came here before EOF was reached, there was an
        // error in the input file.
        if (!(ifs.eof())) {
            std::cerr << "Input format error!" << std::endl;
        }
    }
When I see an input file like this, I think it is not a (new-style) delimited file but a good old fixed-size-fields one, like the ones Fortran and Cobol programmers used to deal with. So I would parse it like this (note that I separated the forename and the lastname):

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Person {
        unsigned int id;
        std::string forename;
        std::string lastname;
        uint8_t age;
        // ...
    };

    int main() {
        std::ifstream ifs("file.txt");
        std::vector<Person> persons;

        std::string line;
        // Column widths of ID, forename, lastname and age.
        // Assumes every line is long enough to reach the age column.
        int fieldsize[] = {8, 9, 9, 4};

        while (std::getline(ifs, line)) {
            Person person;
            int start = 0;
            std::string::size_type last;
            std::stringstream fieldtxt;

            fieldtxt.str(line.substr(start, fieldsize[0]));
            fieldtxt >> person.id;
            start += fieldsize[0];

            // Cut the column out and strip the padding blanks on the right.
            person.forename = line.substr(start, fieldsize[1]);
            last = person.forename.find_last_not_of(' ') + 1;
            person.forename.erase(last);
            start += fieldsize[1];

            person.lastname = line.substr(start, fieldsize[2]);
            last = person.lastname.find_last_not_of(' ') + 1;
            person.lastname.erase(last);
            start += fieldsize[2];

            fieldtxt.clear(); // reset the state flags before reusing the stream
            fieldtxt.str(line.substr(start, fieldsize[3]));
            int age = 0;      // numeric temporary for the uint8_t field
            fieldtxt >> age;
            person.age = static_cast<uint8_t>(age);

            persons.push_back(person);
        }
        return 0;
    }