Python recursive folder read
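I have a C++/Obj-C background and I'm just discovering Python (I've been writing it for about an hour). I'm writing a script to recursively read the contents of text files in a folder structure.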
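The problem I have is that the code I've written only works one folder deep. I can see why in the code (see `# hardcoded path`); I just don't know how to move forward, since my experience with Python is brand new.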
Python code:
```python
import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()
```
Make sure you understand the three return values of `os.walk`:
```python
for root, subdirs, files in os.walk(rootdir):
```
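with the following meanings:

- `root`: the current path being "walked"
- `subdirs`: the entries in `root` that are directories
- `files`: the entries in `root` (not in `subdirs`) that are anything other than directories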
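To see the three values concretely, here is a small self-contained sketch that builds a throwaway tree in a temporary directory (the names `a.txt`, `sub` and `b.txt` are made up for illustration) and records what `os.walk` yields for each directory it visits:

```python
import os
import tempfile

seen = {}
with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout:  <top>/a.txt  and  <top>/sub/b.txt
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write(rel)

    # One (root, subdirs, files) tuple per directory in the tree.
    for root, subdirs, files in os.walk(top):
        rel_root = os.path.relpath(root, top)  # "." for the top directory
        seen[rel_root] = (sorted(subdirs), sorted(files))
        print(rel_root, sorted(subdirs), sorted(files))
```

For this tree the top directory (`.`) is reported with `['sub']` and `['a.txt']`, and `sub` with no subdirectories and `['b.txt']`; there is no need to recurse into `subdirs` yourself.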
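And please use `os.path.join` instead of concatenating with a slash! Your problem is `filePath = rootdir + '/' + file`: you must join with the folder currently being "walked", not the topmost folder. So it must be `filePath = os.path.join(root, file)`. By the way, `file` is a builtin in Python 2, so you normally shouldn't use it as a variable name.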
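A minimal sketch of the difference, assuming a hypothetical tree with one file at the top (`top.txt`) and one in a subfolder (`sub/nested.txt`). It checks whether each path built inside the loop actually exists:

```python
import os
import tempfile

wrong, right = [], []
with tempfile.TemporaryDirectory() as rootdir:
    os.mkdir(os.path.join(rootdir, "sub"))
    for rel in ("top.txt", os.path.join("sub", "nested.txt")):
        with open(os.path.join(rootdir, rel), "w") as fh:
            fh.write("hello")

    for root, subdirs, files in os.walk(rootdir):
        for filename in files:
            # Buggy: always joins with the *starting* directory.
            wrong.append(os.path.exists(rootdir + "/" + filename))
            # Correct: joins with the directory os.walk is currently in.
            right.append(os.path.exists(os.path.join(root, filename)))

print(wrong)  # [True, False]: nested.txt is not found under rootdir
print(right)  # [True, True]
```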
Another problem is your loop structure, which should look something like this, for example:
```python
import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')
```
In case you didn't know, the `with` statement for files is shorthand:
```python
with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
```
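A quick sketch that checks the equivalence in practice: both forms read the same data and leave the file closed, and the `with` form also closes it when the block raises. The file name `demo.txt` is just an illustrative choice:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write("data")

    # The with-statement form.
    with open(path, "rb") as f:
        content_with = f.read()
    closed_after_with = f.closed

    # The equivalent try/finally form.
    g = open(path, "rb")
    try:
        content_try = g.read()
    finally:
        g.close()
    closed_after_finally = g.closed

    # The file is still closed when the body raises an exception.
    try:
        with open(path, "rb") as h:
            raise ValueError("boom")
    except ValueError:
        pass
    closed_after_error = h.closed
```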
Agree with Dave Webb: `os.walk` will yield one item for each directory in the tree. The fact is, you just don't have to care about `subFolders`.
Code like this should work:
```python
import os
import sys

rootdir = sys.argv[1]

for folder, subs, files in os.walk(rootdir):
    with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
        for filename in files:
            with open(os.path.join(folder, filename), 'r') as src:
                dest.write(src.read())
```
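The same loop can be wrapped in a function so it is easy to try on a scratch tree. This is a sketch: the helper name `concat_per_folder` is hypothetical, and the explicit skip of the output file is an added assumption so that a second run does not treat the previous `python-outfile.txt` as an input:

```python
import os
import tempfile

OUT_NAME = "python-outfile.txt"  # same output name as the loop above

def concat_per_folder(rootdir, out_name=OUT_NAME):
    """Write one out_name file per directory holding that directory's files."""
    for folder, subs, files in os.walk(rootdir):
        with open(os.path.join(folder, out_name), "w") as dest:
            for filename in sorted(files):
                if filename == out_name:  # don't feed old output back in
                    continue
                with open(os.path.join(folder, filename), "r") as src:
                    dest.write(src.read())

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel, text in (("a.txt", "A"), ("b.txt", "B"),
                      (os.path.join("sub", "c.txt"), "C")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write(text)

    concat_per_folder(top)
    concat_per_folder(top)  # re-running gives the same result

    with open(os.path.join(top, OUT_NAME)) as fh:
        top_result = fh.read()
    with open(os.path.join(top, "sub", OUT_NAME)) as fh:
        sub_result = fh.read()
```

Each directory ends up with its own output file containing only its own files' contents (`"AB"` at the top, `"C"` in `sub`).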
If you are using Python 3.5 or later, you can do this with a single `glob` call:
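```python
import glob
import os

# root_dir: the directory to search; os.path.join supplies the separator,
# so root_dir does not need a trailing slash.
for filename in glob.iglob(os.path.join(root_dir, '**', '*.txt'), recursive=True):
    print(filename)
```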
As mentioned in the documentation: "If recursive is true, the pattern `**` will match any files and zero or more directories and subdirectories."
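If you want every entry, you can use the following (note that `'**/*'` also matches the directories themselves, not just files):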
```python
import glob
import os

for filename in glob.iglob(os.path.join(root_dir, '**', '*'), recursive=True):
    print(filename)
```
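A small sketch showing that `recursive=True` reaches every depth, including the top level (where `**` matches zero directories). The helper name `find_txt_files` and the sample tree are illustrative assumptions:

```python
import glob
import os
import tempfile

def find_txt_files(root_dir):
    """Return all .txt paths under root_dir at any depth (Python 3.5+)."""
    pattern = os.path.join(root_dir, "**", "*.txt")
    return sorted(glob.iglob(pattern, recursive=True))

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub", "deeper"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"),
                os.path.join("sub", "deeper", "c.txt"), "skip.py"):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("x")
    # Paths relative to the search root, for readability.
    found = [os.path.relpath(p, top) for p in find_txt_files(top)]
```

`found` lists `a.txt`, `sub/b.txt` and `sub/deeper/c.txt`; `skip.py` is excluded by the `*.txt` pattern.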
Use `os.path.join()` to build your paths; it's neater:
```python
import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):
    for folder in subFolders:
        outfileName = os.path.join(root,folder,"py-outfile.txt")
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName
        for file in files:
            filePath = os.path.join(root,file)
            toWrite = open( filePath).read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
        folderOut.close()
```
`os.walk` is recursive by default. For each directory, starting from the root, it yields a 3-tuple `(dirpath, dirnames, filenames)`:
```python
from os import walk
from os.path import splitext, join

def select_files(root, files):
    """
    simple logic here to filter out interesting files
    .py files in this example
    """

    selected_files = []

    for file in files:
        #do concatenation here to get full path
        full_path = join(root, file)
        ext = splitext(file)[1]

        if ext == ".py":
            selected_files.append(full_path)

    return selected_files

def build_recursive_dir_tree(path):
    """
    path    -    where to begin folder scan
    """
    selected_files = []

    for root, dirs, files in walk(path):
        selected_files += select_files(root, files)

    return selected_files
```
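If the list can get large, the same filtering can be done lazily with a generator. This is an alternative sketch, not the answer's code; the name `iter_files_with_ext` and the sample files are hypothetical:

```python
import os
import tempfile

def iter_files_with_ext(path, ext=".py"):
    """Lazily yield full paths of files under path whose extension is ext."""
    for root, dirs, files in os.walk(path):
        for name in files:
            if os.path.splitext(name)[1] == ext:
                yield os.path.join(root, name)

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "pkg"))
    for rel in ("setup.py", "README.txt", os.path.join("pkg", "mod.py")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("")
    py_files = sorted(os.path.relpath(p, top) for p in iter_files_with_ext(top))
```

Because nothing is collected up front, a caller can stop iterating early (for example after the first match) without walking the rest of the tree.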
I think the problem is that you're not processing the output of `os.walk` correctly.
First, change:
filePath = rootdir + '/' + file
to:
filePath = root + '/' + file
`rootdir` is your fixed starting directory; `root` is the directory currently returned by `os.walk`.
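Second, you don't need to nest your file-processing loop inside the subdirectory loop, since running it once per subdirectory makes no sense. `os.walk` will set `root` to each subdirectory in turn. You don't need to handle subdirectories by hand unless you want to do something with the directories themselves.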
Try this:
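```python
import os
import sys

path = sys.argv[1]  # starting directory, as in the question

for root, subdirs, files in os.walk(path):
    for file in os.listdir(root):
        filePath = os.path.join(root, file)
        if os.path.isdir(filePath):
            pass  # skip subdirectories; os.walk visits them itself
        else:
            with open(filePath, 'r') as f:
                pass  # Do Stuff with f here
```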
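One case where you do want to touch the directories themselves is pruning the walk. With the default `topdown=True`, `os.walk` lets you edit the directory-name list in place, and any name you remove is not visited. A sketch, assuming (for illustration only) that you want to skip dot-directories such as `.git`; the helper name `walk_skipping_hidden` is hypothetical:

```python
import os
import tempfile

def walk_skipping_hidden(top):
    """Yield (root, files) pairs, never descending into dot-directories."""
    for root, subdirs, files in os.walk(top):
        # Slice assignment mutates the list os.walk holds, so pruned
        # directories are never visited.
        subdirs[:] = [d for d in subdirs if not d.startswith(".")]
        yield root, files

with tempfile.TemporaryDirectory() as top:
    for d in ("src", ".git"):
        os.mkdir(os.path.join(top, d))
        with open(os.path.join(top, d, "f.txt"), "w") as fh:
            fh.write("")
    visited = sorted(os.path.relpath(r, top) for r, _ in walk_skipping_hidden(top))
```

Only the top directory and `src` are visited; rebinding `subdirs = [...]` instead of assigning to `subdirs[:]` would not prune anything.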