How can I download all emails with attachments from Gmail?
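How do I connect to Gmail and determine which messages have attachments? I'd then like to download each attachment, printing out the Subject: and From: of each message as I process it.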
Hard one :-)
import email, getpass, imaplib, os

detach_dir = '.'  # directory where to save attachments (default: current)
user = raw_input("Enter your GMail username:")
pwd = getpass.getpass("Enter your password: ")

# connecting to the gmail imap server
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)
m.select("[Gmail]/All Mail")  # here you can choose a mail box like INBOX instead
# use m.list() to get all the mailboxes

resp, items = m.search(None, "ALL")  # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split()  # getting the mails id

# counter for attachments without a filename; kept outside the loop so generated names stay unique
counter = 1

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)")  # fetching the mail, "(RFC822)" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1]  # getting the mail content
    mail = email.message_from_string(email_body)  # parsing the mail content to get a mail object

    # Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue

    print "[%s] : %s" % (mail["From"], mail["Subject"])

    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headache
    for part in mail.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, '.bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        # Check if it's already there
        if not os.path.isfile(att_path):
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()
Wowww! That was something ;-) But try the same in Java, just for fun!
By the way, I tested this in a shell, so some errors probably remain.
Enjoy
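On Python 3 the same loop can be shortened with the email package's newer API. The sketch below is an illustrative Python 3 adaptation, not part of the answer above: it parses raw RFC 822 bytes (what `m.fetch(emailid, "(RFC822)")` returns in `data[0][1]`) with `policy.default`, which decodes encoded `From:`/`Subject:` headers and provides `iter_attachments()`. The function name `list_attachments` is hypothetical, and the sample message is built locally so no Gmail connection is needed. Note that `iter_attachments()` only inspects the top-level sub-parts; if attachments can be nested deeper, `walk()` as in the answer is the safer choice.

```python
import email
from email import policy
from email.message import EmailMessage


def list_attachments(raw_bytes):
    """Return (From, Subject, [(filename, payload_bytes), ...]) for one raw message."""
    # policy.default gives an EmailMessage: headers come back decoded and
    # iter_attachments() skips the candidate body parts (e.g. the plain-text body).
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    files = [(part.get_filename(), part.get_payload(decode=True))
             for part in msg.iter_attachments()]
    return msg["From"], msg["Subject"], files


if __name__ == "__main__":
    # Build a sample message locally instead of fetching one from Gmail.
    sample = EmailMessage()
    sample["From"] = "alice@example.com"
    sample["Subject"] = "Quarterly report"
    sample.set_content("See attached.")
    sample.add_attachment(b"a,b\n1,2\n", maintype="text", subtype="csv",
                          filename="report.csv")

    sender, subject, files = list_attachments(sample.as_bytes())
    print("[%s] : %s" % (sender, subject))
    for name, data in files:
        print(name, len(data), "bytes")
```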
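EDIT:
Since mailbox names can change from one country to another, I recommend running m.list() first and picking an item from its output before calling m.select("the mailbox name"), to avoid this error:
imaplib.error: command SEARCH illegal in state AUTH, only allowed in states SELECTED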
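Following that advice programmatically, one option is to pick the mailbox by its flag instead of its localized name. The sketch below assumes the server returns RFC 6154 special-use flags in its LIST output (Gmail typically marks "All Mail" with `\All`); `parse_list_line` and `find_mailbox_with_flag` are hypothetical helpers, and mailbox names sent as IMAP literals are simply skipped.

```python
import re

# One line of an IMAP LIST response, e.g.  (\HasNoChildren \All) "/" "[Gmail]/All Mail"
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)$')


def parse_list_line(line):
    """Split one LIST response line into (flags, delimiter, mailbox_name)."""
    if isinstance(line, bytes):
        line = line.decode("ascii")  # mailbox names use modified UTF-7, which is ASCII
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError("unrecognised LIST line: %r" % line)
    delim = match.group("delim")
    delim = None if delim == "NIL" else delim[1:-1]
    name = match.group("name")
    if len(name) >= 2 and name[0] == name[-1] == '"':
        name = name[1:-1]  # drop the surrounding quotes
    return match.group("flags").split(), delim, name


def find_mailbox_with_flag(list_lines, flag="\\All"):
    """Return the first mailbox whose LIST flags include `flag`, or None."""
    for line in list_lines:
        if not isinstance(line, (bytes, str)):
            continue  # names sent as IMAP literals (tuples) are not handled here
        flags, _delim, name = parse_list_line(line)
        if flag in flags:
            return name
    return None
```

Usage would be along the lines of `typ, lines = m.list()` followed by `name = find_mailbox_with_flag(lines)`. With Python 3's imaplib you will likely have to quote a name containing spaces yourself, e.g. `m.select('"%s"' % name)`.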
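I'm no Perl expert, but what I do know is that GMail supports IMAP and POP3, two completely standard protocols that allow you to do this.
Maybe this helps you get started.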
#!/usr/bin/env python
"""Save all attachments for given gmail account."""
import os, sys
from libgmail import GmailAccount

ga = GmailAccount("your.account@gmail.com", "pA$$w0Rd_")
ga.login()

# folders: inbox, starred, all, drafts, sent, spam
for thread in ga.getMessagesByFolder('all', allPages=True):
    for msg in thread:
        sys.stdout.write('.')
        if msg.attachments:
            print "\n", msg.id, msg.number, msg.subject, msg.sender
            for att in msg.attachments:
                if att.filename and att.content:
                    attdir = os.path.join(thread.id, msg.id)
                    if not os.path.isdir(attdir):
                        os.makedirs(attdir)
                    with open(os.path.join(attdir, att.filename), 'wb') as f:
                        f.write(att.content)
Untested
- Make sure the TOS allows such scripts, otherwise your account will be suspended
- There might be better options: GMail offline mode, Thunderbird + ExtractExtensions, GmailFS, Gmail Drive, etc.
Take a look at Mail::Webmail::Gmail:
GETTING ATTACHMENTS
There are two ways to get an attachment:
1 -> By sending a reference to a specific attachment returned by get_indv_email
# Creates an array of references to every attachment in your account
my $messages = $gmail->get_messages();
my @attachments;

foreach ( @{ $messages } ) {
    my $email = $gmail->get_indv_email( msg => $_ );
    if ( defined( $email->{ $_->{ 'id' } }->{ 'attachments' } ) ) {
        foreach ( @{ $email->{ $_->{ 'id' } }->{ 'attachments' } } ) {
            push( @attachments, $gmail->get_attachment( attachment => $_ ) );
            if ( $gmail->error() ) {
                print $gmail->error_msg();
            }
        }
    }
}
2 -> Or by sending the attachment ID and message ID
#retrieve specific attachment
my $msgid = 'F000000000';
my $attachid = '0.1';
my $attach_ref = $gmail->get_attachment( attid => $attachid, msgid => $msgid );
(Returns a reference to a scalar that holds the data from the attachment.)
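In Gmail, you can filter on "has:attachment" and use it to identify the messages you should be getting when testing. Note that this appears to match both messages with attached files (paperclip icon shown) and messages with inline images (no paperclip shown).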
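The same `has:attachment` query can be run over IMAP through Gmail's `X-GM-RAW` search extension, so the server does the filtering instead of the client downloading every message. A minimal sketch, assuming `conn` is an `imaplib.IMAP4_SSL` connection to Gmail on which `select()` has already been called; `search_with_attachments` is a hypothetical helper, the extra query must not contain double quotes, and the inline-image caveat above presumably applies here too.

```python
def search_with_attachments(conn, extra_query=""):
    """Return the ids of messages matching Gmail's has:attachment query.

    `conn` is assumed to be an imaplib.IMAP4_SSL connection to imap.gmail.com
    with a mailbox already selected (SEARCH is illegal in the AUTH state).
    X-GM-RAW is Gmail specific; other IMAP servers will reject it.
    """
    query = ("has:attachment " + extra_query).strip()
    typ, data = conn.search(None, "X-GM-RAW", '"%s"' % query)
    if typ != "OK":
        raise RuntimeError("SEARCH failed: %r" % (data,))
    return data[0].split()
```

The returned ids can replace `items` in the first answer's loop, e.g. `search_with_attachments(m, "from:bank")`.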
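There is no Gmail API, so IMAP or POP are your only real options. The JavaMail API may be of some help, as well as this very terse article on downloading attachments from IMAP using Perl. Some previous questions here on SO may also help.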
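For the POP route, Python's standard `poplib` module is enough. This is a sketch, assuming `pop` is a `poplib.POP3_SSL("pop.gmail.com", 995)` object on which `user()` and `pass_()` already succeeded; `iter_pop_messages` is a hypothetical name. Which messages Gmail exposes over POP depends on the account's POP settings.

```python
import email
from email import policy


def iter_pop_messages(pop):
    """Yield every message in the POP3 maildrop as a parsed EmailMessage.

    `pop` is assumed to be an authenticated poplib.POP3_SSL object, e.g.
        pop = poplib.POP3_SSL("pop.gmail.com", 995)
        pop.user(user); pop.pass_(password)
    """
    count, _size = pop.stat()
    for number in range(1, count + 1):  # POP3 message numbers start at 1
        _resp, lines, _octets = pop.retr(number)
        yield email.message_from_bytes(b"\r\n".join(lines), policy=policy.default)
```

Each yielded message can then go through the same `walk()` loop used in the IMAP answers.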
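This PHP example may help too. Unfortunately, from what I can see, the imap_header contains no attachment information, so you have to download the body to be able to see the X-Attachment-Id field. (Someone please prove me wrong.)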
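Once the body is downloaded, the `X-Attachment-Id` header can be read from each MIME part. A sketch under the assumption, taken from the answer above, that Gmail adds this header to attachment parts; `attachment_ids` is a hypothetical helper, and the demo builds a message locally and sets the header by hand.

```python
import email
from email import policy
from email.message import EmailMessage


def attachment_ids(raw_bytes):
    """Map X-Attachment-Id -> filename for every MIME part carrying that header."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    found = {}
    for part in msg.walk():
        att_id = part.get("X-Attachment-Id")
        if att_id is not None:
            found[str(att_id)] = part.get_filename()
    return found


if __name__ == "__main__":
    # Local stand-in for a message downloaded from Gmail.
    sample = EmailMessage()
    sample["Subject"] = "invoice"
    sample.set_content("Invoice attached.")
    sample.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf",
                          filename="invoice.pdf")
    sample.get_payload()[-1]["X-Attachment-Id"] = "f_abc123"  # set by hand here
    print(attachment_ids(sample.as_bytes()))
```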
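In case any of you have moved to Python 3.3: I took the 2.7 script from here and updated it to 3.3. I also fixed some issues with the way Gmail returns the information.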
# Something in lines of http://stackoverflow.com/questions/348630/how-can-i-download-all-emails-with-attachments-from-gmail
# Make sure you have IMAP enabled in your gmail settings.
# Right now it won't download same file name twice even if their contents are different.

import email
import getpass, imaplib
import os
import time

detach_dir = '.'
# create the target folder inside detach_dir (not the current directory)
os.makedirs(os.path.join(detach_dir, 'attachments'), exist_ok=True)

userName = input('Enter your GMail username:\n')
passwd = getpass.getpass('Enter your password:\n')

try:
    imapSession = imaplib.IMAP4_SSL('imap.gmail.com', 993)
    typ, accountDetails = imapSession.login(userName, passwd)
    if typ != 'OK':
        print('Not able to sign in!')
        raise RuntimeError('login failed')  # a bare raise has nothing to re-raise here

    imapSession.select('Inbox')
    typ, data = imapSession.search(None, 'ALL')
    if typ != 'OK':
        print('Error searching Inbox.')
        raise RuntimeError('search failed')

    # Iterating over all emails
    for msgId in data[0].split():
        typ, messageParts = imapSession.fetch(msgId, '(RFC822)')
        if typ != 'OK':
            print('Error fetching mail.')
            raise RuntimeError('fetch failed')

        emailBody = messageParts[0][1]
        # Gmail as of now returns in bytes but just in case they go back to string this line is left here.
        #mail = email.message_from_string(emailBody)
        mail = email.message_from_bytes(emailBody)
        for part in mail.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            fileName = part.get_filename()

            if bool(fileName):
                filePath = os.path.join(detach_dir, 'attachments', fileName)
                if not os.path.isfile(filePath):
                    print(fileName)
                    fp = open(filePath, 'wb')
                    fp.write(part.get_payload(decode=True))
                    fp.close()
    imapSession.close()
    imapSession.logout()
except Exception as e:
    print('Not able to download all attachments:', e)
    time.sleep(3)
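The script above skips an attachment whenever a file with the same name already exists, and it joins the attachment's filename straight onto the target directory, so a name like `../../something` could write outside it. One possible alternative, sketched here with a hypothetical helper `unique_attachment_path`, keeps only the base name and appends a counter instead of skipping; using it for `filePath` would change the script's behavior from "skip duplicates" to "keep every copy".

```python
import os
import tempfile


def unique_attachment_path(directory, filename):
    """Return a not-yet-existing path for `filename` inside `directory`.

    Only the base name is kept (so "../../x" cannot escape `directory`), and
    " (1)", " (2)", ... is inserted before the extension if the name is taken.
    """
    safe = os.path.basename(filename.replace("\\", "/"))
    if safe in ("", ".", ".."):
        safe = "attachment"
    stem, ext = os.path.splitext(safe)
    candidate = os.path.join(directory, safe)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, "%s (%d)%s" % (stem, counter, ext))
        counter += 1
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as target:
        first = unique_attachment_path(target, "report.pdf")
        open(first, "wb").close()  # simulate an already saved attachment
        print(unique_attachment_path(target, "report.pdf"))
```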
This question is quite old, and at the time the Gmail API was not available. Google now provides the Gmail API as an alternative to IMAP. See Google's Gmail API here. Also see google-api-python-client on PyPI.
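With the Gmail API, messages come back as JSON rather than raw MIME. The sketch below assumes a message resource already fetched with google-api-python-client (the commented calls follow the `users.messages` and `users.messages.attachments` methods; credentials setup is omitted). It walks the `payload` part tree for parts that have a filename and a `body.attachmentId`, and decodes the base64url `data` returned for an attachment. Parts whose content is inline in `body.data` (no `attachmentId`) are not covered. `find_attachment_parts`, `header_value` and `decode_body_data` are hypothetical names.

```python
import base64


def header_value(payload, name):
    """Return the first header called `name` (case-insensitive) from a MessagePart."""
    for header in payload.get("headers", []):
        if header["name"].lower() == name.lower():
            return header["value"]
    return None


def find_attachment_parts(payload):
    """Yield (filename, attachmentId) for each attachment in a MessagePart tree."""
    stack = [payload]
    while stack:
        part = stack.pop()
        body = part.get("body", {})
        if part.get("filename") and "attachmentId" in body:
            yield part["filename"], body["attachmentId"]
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(part.get("parts", [])))


def decode_body_data(data):
    """Decode base64url body data, restoring any padding that was stripped."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


# Fetching, roughly (requires google-api-python-client and OAuth credentials;
# only the first page of results, follow nextPageToken for more):
#   service = googleapiclient.discovery.build("gmail", "v1", credentials=creds)
#   listing = service.users().messages().list(userId="me", q="has:attachment").execute()
#   for ref in listing.get("messages", []):
#       msg = service.users().messages().get(userId="me", id=ref["id"]).execute()
#       print(header_value(msg["payload"], "From"), header_value(msg["payload"], "Subject"))
#       for filename, att_id in find_attachment_parts(msg["payload"]):
#           att = service.users().messages().attachments().get(
#               userId="me", messageId=ref["id"], id=att_id).execute()
#           data = decode_body_data(att["data"])
```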
/*based on http://www.codejava.net/java-ee/javamail/using-javamail-for-searching-e-mail-messages*/
package getMailsWithAtt;

import java.io.File;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

import javax.mail.Address;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.NoSuchProviderException;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.Store;
import javax.mail.internet.MimeBodyPart;
import javax.mail.search.AndTerm;
import javax.mail.search.SearchTerm;
import javax.mail.search.ReceivedDateTerm;
import javax.mail.search.ComparisonTerm;

public class EmailReader {
    private String saveDirectory;

    /**
     * Sets the directory where attached files will be stored.
     *
     * @param dir absolute path of the directory
     */
    public void setSaveDirectory(String dir) {
        this.saveDirectory = dir;
    }

    /**
     * Downloads new messages and saves attachments to disk if any.
     *
     * @param host
     * @param port
     * @param userName
     * @param password
     * @param startDate only messages received before this date are read
     * @param endDate   only messages received after this date are read
     */
    public void downloadEmailAttachments(String host, String port,
            String userName, String password, Date startDate, Date endDate) {
        Properties props = System.getProperties();
        props.setProperty("mail.store.protocol", "imaps");
        try {
            Session session = Session.getDefaultInstance(props, null);
            Store store = session.getStore("imaps");
            store.connect(host, Integer.parseInt(port), userName, password);
            // ...

            Folder inbox = store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            SearchTerm olderThan = new ReceivedDateTerm(ComparisonTerm.LT, startDate);
            SearchTerm newerThan = new ReceivedDateTerm(ComparisonTerm.GT, endDate);
            SearchTerm andTerm = new AndTerm(olderThan, newerThan);
            //Message[] arrayMessages = inbox.getMessages(); <--get all messages
            Message[] arrayMessages = inbox.search(andTerm);
            for (int i = arrayMessages.length; i > 0; i--) { //from newer to older
                Message msg = arrayMessages[i - 1];
                Address[] fromAddress = msg.getFrom();
                String from = fromAddress[0].toString();
                String subject = msg.getSubject();
                String sentDate = msg.getSentDate().toString();
                String receivedDate = msg.getReceivedDate().toString();

                String contentType = msg.getContentType();
                String messageContent = "";

                // store attachment file name, separated by comma
                String attachFiles = "";

                if (contentType.contains("multipart")) {
                    // content may contain attachments
                    Multipart multiPart = (Multipart) msg.getContent();
                    int numberOfParts = multiPart.getCount();
                    for (int partCount = 0; partCount < numberOfParts; partCount++) {
                        MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
                        if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
                            // this part is attachment
                            String fileName = part.getFileName();
                            attachFiles += fileName + ", ";
                            part.saveFile(saveDirectory + File.separator + fileName);
                        } else {
                            // this part may be the message content
                            messageContent = part.getContent().toString();
                        }
                    }

                    if (attachFiles.length() > 1) {
                        attachFiles = attachFiles.substring(0, attachFiles.length() - 2);
                    }
                } else if (contentType.contains("text/plain")
                        || contentType.contains("text/html")) {
                    Object content = msg.getContent();
                    if (content != null) {
                        messageContent = content.toString();
                    }
                }

                // print out details of each message
                System.out.println("Message #" + i + ":");
                System.out.println("\t From: " + from);
                System.out.println("\t Subject: " + subject);
                System.out.println("\t Sent: " + sentDate);
                System.out.println("\t Received: " + receivedDate);
                System.out.println("\t Message: " + messageContent);
                System.out.println("\t Attachments: " + attachFiles);
            }

            // disconnect
            inbox.close(false);
            store.close();
        } catch (NoSuchProviderException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (MessagingException e) {
            e.printStackTrace();
            System.exit(2);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    /**
     * Runs this program with the Gmail IMAP server
     * @throws ParseException
     */
    public static void main(String[] args) throws ParseException {
        String host = "imap.gmail.com";
        String port = "993";
        String userName = "user@gmail.com";
        String password = "pass";
        Date startDate = new SimpleDateFormat("yyyy-MM-dd").parse("2014-06-30");
        Date endDate = new SimpleDateFormat("yyyy-MM-dd").parse("2014-06-01");
        String saveDirectory = "C:\\Temp";

        EmailReader receiver = new EmailReader();
        receiver.setSaveDirectory(saveDirectory);
        receiver.downloadEmailAttachments(host, port, userName, password, startDate, endDate);
    }
}
Maven dependency:
<dependency>
    <groupId>com.sun.mail</groupId>
    <artifactId>javax.mail</artifactId>
    <version>1.5.1</version>
</dependency>
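Since Gmail supports the standard POP and IMAP protocols, any platform, tool, application, component, or API that provides the client side of either protocol should work.
I suggest doing a Google search for your favorite language/platform (e.g. "python"), plus "pop", plus "imap", plus perhaps "open source", plus perhaps "download" or "review", and seeing what options you get.
There are numerous free applications and components. Pick a few that seem worthy, check for reviews, then download and enjoy.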
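You should be aware that you need SSL to connect to GMail (for both POP3 and IMAP; this is of course also true for their SMTP servers apart from port 25, but that's another story).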
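In Python both protocols have SSL variants in the standard library. A sketch of opening the connection with certificate verification; `open_gmail` and `GMAIL_ENDPOINTS` are hypothetical names, the hosts and ports are the ones used elsewhere on this page (imap.gmail.com:993, pop.gmail.com:995), and authentication is left to the caller.

```python
import imaplib
import poplib
import ssl

# Gmail's implicit-SSL endpoints; plain-text connections are not accepted.
GMAIL_ENDPOINTS = {
    "imap": ("imap.gmail.com", 993),
    "pop3": ("pop.gmail.com", 995),
}


def open_gmail(protocol):
    """Open an SSL-wrapped IMAP or POP3 connection to Gmail (not yet logged in)."""
    try:
        host, port = GMAIL_ENDPOINTS[protocol]
    except KeyError:
        raise ValueError("protocol must be 'imap' or 'pop3', got %r" % protocol) from None
    context = ssl.create_default_context()  # verifies the server certificate and host name
    if protocol == "imap":
        return imaplib.IMAP4_SSL(host, port, ssl_context=context)
    return poplib.POP3_SSL(host, port, context=context)
```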
Here's something I wrote to download my bank statements, in Groovy (a dynamic language for the Java platform).
import javax.mail.*
import java.util.Properties

String gmailServer
int gmailPort
def user, password, LIMIT
def inboxFolder, root, StartDate, EndDate

// Downloads all attachments from a gmail mail box as per some criteria
// to a specific folder
// Based on code from
// http://agileice.blogspot.com/2008/10/using-groovy-to-connect-to-gmail.html
// http://stackoverflow.com/questions/155504/download-mail-attachment-with-java
//
// Requires:
//    java mail jars in the class path (mail.jar and activation.jar)
//    openssl, with gmail certificate added to java keystore (see agileice blog)
//
// further improvement: maybe findAll could be used to filter messages
// subject could be added as another criteria
////////////////////// <CONFIGURATION> //////////////////////
// Maximum number of emails to access in case parameter range is too high
LIMIT = 10000

// gmail credentials
gmailServer = "imap.gmail.com"
gmailPort = 993

user = "gmailuser@gmail.com"
password = "gmailpassword"

// gmail label, or "INBOX" for inbox
inboxFolder = "finance"

// local file system where the attachment files need to be stored
root = "D:\\AttachmentStore"

// date range dd-mm-yyyy
StartDate = "31-12-2009"
EndDate = "1-6-2010"
////////////////////// </CONFIGURATION> //////////////////////

StartDate = Date.parse("dd-MM-yyyy", StartDate)
EndDate = Date.parse("dd-MM-yyyy", EndDate)

Properties props = new Properties();
props.setProperty("mail.store.protocol", "imaps");
props.setProperty("mail.imaps.host", gmailServer);
props.setProperty("mail.imaps.port", gmailPort.toString());
props.setProperty("mail.imaps.partialfetch", "false");

def session = javax.mail.Session.getDefaultInstance(props, null)
def store = session.getStore("imaps")

store.connect(gmailServer, user, password)

int i = 0;
def folder = store.getFolder(inboxFolder)

folder.open(Folder.READ_ONLY)

for (def msg : folder.messages) {

    //if (msg.subject?.contains("bank Statement"))
    println "[$i] From: ${msg.from} Subject: ${msg.subject} -- Received: ${msg.receivedDate}"

    if (msg.receivedDate < StartDate || msg.receivedDate > EndDate) {
        println "Ignoring due to date range"
        continue
    }

    if (msg.content instanceof Multipart) {
        Multipart mp = (Multipart) msg.content;

        for (int j = 0; j < mp.count; j++) {

            Part part = mp.getBodyPart(j);

            println " ---- ${part.fileName} ---- ${part.disposition}"

            if (part.disposition?.equalsIgnoreCase(Part.ATTACHMENT)) {

                if (part.content) {

                    def name = msg.receivedDate.format("yyyy_MM_dd") + " " + part.fileName
                    println "Saving file to $name"

                    def f = new File(root, name)

                    //f << part.content
                    try {
                        if (!f.exists())
                            f << part.content
                    }
                    catch (Exception e) {
                        println "*** Error *** $e"
                    }
                }
                else {
                    println "NO Content Found!!"
                }
            }
        }
    }

    if (i++ > LIMIT)
        break;
}
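The Groovy script walks every message in the label and compares dates on the client. IMAP can do this filtering on the server with the `SINCE` and `BEFORE` search keys (RFC 3501). A Python sketch, assuming an `imaplib` connection with a mailbox already selected; `imap_date` and `date_range_criteria` are hypothetical helpers. `SINCE` is inclusive and `BEFORE` is exclusive, so to include the script's end date (1-6-2010) you pass the following day as `end`.

```python
from datetime import date, timedelta

# RFC 3501 search dates always use English month names, so strftime("%b"),
# which follows the current locale, is avoided here.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date as an IMAP search date such as 1-Jun-2010."""
    return "%d-%s-%04d" % (d.day, _MONTHS[d.month - 1], d.year)


def date_range_criteria(start, end):
    """Criteria for messages received on or after `start` and before `end`."""
    return "(SINCE %s BEFORE %s)" % (imap_date(start), imap_date(end))


if __name__ == "__main__":
    # The Groovy script's range, with the end date made inclusive.
    criteria = date_range_criteria(date(2009, 12, 31), date(2010, 6, 1) + timedelta(days=1))
    print(criteria)  # pass to conn.search(None, criteria) after selecting a mailbox
```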
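Have you taken a look at the GMail third-party add-ons on Wikipedia?
In particular, PhpGmailDrive is an open-source add-on that you may be able to use as-is, or perhaps study for inspiration?
For Java, you may find G4J useful. It is a set of APIs for communicating with Google Mail from Java (the screenshot on its homepage is a demo email client built around it).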