13.8. Removing Attachments from an Email Message

Credit: Anthony Baxter

Problem

You're handling email in Python and need to remove from email messages any attachments that might be dangerous.

Solution

Regular expressions can help us identify dangerous content types and file extensions, and thus code a function to remove any potentially dangerous attachments:
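A minimal sketch of such a function, not the recipe's original listing, might look like this with the standard email package in Python 3. The two regular expressions, the wording of the replacement notice, and the helper name sanitise_text are assumptions made for illustration; only the name sanitise and the overall approach come from the discussion in this recipe.

```python
import re
from email import message_from_string

# Assumed patterns for illustration; tune both lists for your own site.
BAD_CONTENT_RE = re.compile(
    r"application/(x-msdownload|x-msdos-program|hta)", re.IGNORECASE)
BAD_FILEEXT_RE = re.compile(
    r"\.(exe|com|bat|cmd|pif|scr|vbs|js)$", re.IGNORECASE)

# Assumed wording for the notice that replaces a removed attachment.
REPLACEMENT = (
    "An attachment was removed from this message because it looked unsafe.\n"
    "Original content type: %(content_type)s\n"
    "Original filename: %(filename)s\n"
)


def sanitise(msg):
    """Replace dangerous parts of msg, in place, with a text notice."""
    content_type = msg.get_content_type()
    filename = msg.get_filename()  # None when the part carries no filename
    if BAD_CONTENT_RE.search(content_type) or (
            filename and BAD_FILEEXT_RE.search(filename)):
        notice = REPLACEMENT % {
            "content_type": content_type, "filename": filename}
        # Drop the headers that described the removed content, then
        # describe the replacement as plain text.
        for header in ("Content-Type", "Content-Transfer-Encoding",
                       "Content-Disposition"):
            del msg[header]
        msg["Content-Type"] = "text/plain"
        msg.set_payload(notice)
    elif msg.is_multipart():
        # Recurse into the subparts and install the sanitised list.
        msg.set_payload([sanitise(part) for part in msg.get_payload()])
    return msg


def sanitise_text(raw):
    """Hypothetical convenience wrapper: raw message text in, clean text out."""
    return sanitise(message_from_string(raw)).as_string()
```

The Content-Type header is deleted and re-created here, rather than edited, so that parameters such as name="..." from the removed attachment do not survive on the replacement part.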
Discussion

This issue has come up a few times on the newsgroup comp.lang.python, so I decided to post a cookbook entry to show how easy it is to deal with this kind of task. Specifically, this recipe shows how to read in an email message, strip out any dangerous or suspicious attachments, and replace them with a harmless text message informing the user of the alterations that we've performed. This kind of task is particularly important when end users are using something like Microsoft Outlook, which is targeted by harmful virus and worm messages (collectively known as malware) on a daily basis.

The email parser in Python 2.4 has been completely rewritten to be robust first, correct second. Prior to that version, the parser was written for correctness first. But focusing on correctness was a problem because viruses, worms, and other malware routinely send email messages that are broken and nonconformant, malformed to the point that the old email parser chokes and dies. The new parser is designed to never actually break when reading a message. Instead, it tries its best to fix whatever it can fix in the message. (If you have a message that causes the parser to crash, please let us, the core Python developers, know. It's a bug, and we'll fix it. Please include a copy of the message that makes the parser crash, or else it's very unlikely that we can reproduce your problem!)

The recipe's code itself is fairly well commented and should be easy enough to follow. A mail message consists of one or more parts; each of these parts can contain nested parts. We call the sanitise function on the top-level Message object, and it calls itself recursively on the subobjects if and as needed. The sanitise function first checks the Content-Type of the part, and if there's a filename, it also checks that filename's extension against a known-to-be-bad list.
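To see the nested structure that sanitise has to walk, it can help to list each part's content type and filename before changing anything. The following is a small sketch for Python 3; the function name describe_parts and the SAMPLE message are made up for illustration and are not part of the recipe.

```python
from email import message_from_string

# Hypothetical sample message: one text part and one executable attachment.
SAMPLE = (
    "From: a@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUND"\n'
    "\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--BOUND\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="evil.exe"\n'
    "\n"
    "not really a program\n"
    "--BOUND--\n"
)


def describe_parts(msg, depth=0):
    """Return (depth, content_type, filename) for msg and every nested part."""
    rows = [(depth, msg.get_content_type(), msg.get_filename())]
    if msg.is_multipart():
        # A multipart payload is a list of Message objects; visit each one.
        for part in msg.get_payload():
            rows.extend(describe_parts(part, depth + 1))
    return rows


for depth, content_type, filename in describe_parts(message_from_string(SAMPLE)):
    print("  " * depth + content_type, filename)
```

The two values printed per part, the content type and the filename, are exactly what the sanitise function examines when deciding whether a part is bad.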
If the message part is bad, we replace its payload with a short text description of the now-removed part, set the part's Content-Type to 'text/plain', and remove the other headers that described the removed content.

Finally, we check whether the message is a multipart message. If so, it means the message has subparts, so we recursively call the sanitise function on each of them. We then replace the payload with our list of sanitised subparts.

If you're interested in working further on this recipe, the most important extra functionality, which is easy to add with a small amount of work, might be to store the attached file in some directory (instead of destroying all suspect attachments), and give the user a link to that file. Also consider extending the check in sanitise that filters dangerous attachments to have it verify more than just the content type and file extension; other headers may be able to carry known signs of worm or virus messages.
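The first extension suggested above, storing a suspect attachment instead of destroying it, could be sketched as follows for Python 3. The function name quarantine_part, the ".quarantined" suffix, and the choice of a random file name are all assumptions for illustration; the recipe itself does not prescribe them.

```python
import os
import uuid
from email.message import Message


def quarantine_part(part: Message, directory: str) -> str:
    """Write the decoded payload of part into directory; return the new path."""
    os.makedirs(directory, exist_ok=True)
    # decode=True undoes base64 or quoted-printable encoding and returns
    # bytes; it returns None for a multipart container, hence the fallback.
    data = part.get_payload(decode=True) or b""
    # Never reuse the sender's filename on disk: generate a fresh name
    # with a harmless extension instead.
    path = os.path.join(directory, uuid.uuid4().hex + ".quarantined")
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

A sanitising function could call this just before it replaces a bad part's payload, and then mention the returned path in the replacement text so that the user knows where the original attachment went.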