TR Forums

cheesyking · Sat Dec 19, 2009 9:35 am

I realise there are many of these already, but I wanted to do a specific task and brush up on my Python programming anyway, so I thought I'd write my own...

So, my goal is to create some web pages where users of a mail server can check whether any mail sent to them got filtered before reaching their inboxes. Eventually it will give details like why it got filtered and give the opportunity to download the message. Right now it just creates two basic HTML files: one lists all the senders who got filtered for each recipient, and the other lists all the recipients for each sender.

Here's a rough outline of how it works:

I use awk to search the log file for filtered mails and return a string with the from and to addresses for each (I plan to extract more info in the future, like quarantine file, date, mail ID etc.).

Then I split the string in the getaddrs function, which strips off any junk from the log file and returns a nested list of from and to addresses.

Next I convert these into two dictionaries using either the sender or recipient address as the key.

Finally these are passed to the buildHTML**** functions, which create the two static HTML pages.

Besides the extra functionality I mentioned, I'm also thinking of a complete rewrite to use OO code rather than functions (hey, I learned to program before OO came along and it's still what comes most naturally to me on small projects) and perhaps using a DB rather than static HTML pages.

Any suggestions for improving what I've done? I'm sure there must be better ways of doing some (probably most) of this.
Here's the code:

import os

def getaddrs(data):
    #split the from and to addresses
    addresses = data.split(" ")
    faddr = str(addresses[0])
    taddr = str(addresses[1])
    #4 types of junk to remove: from=<>, to=<>, <> (both from and to)
    #NB slice the prefix off rather than using lstrip("from=<"): lstrip removes
    #any of those characters, so it would also eat the start of some addresses
    if faddr.startswith("from=<"):
        faddr = faddr[len("from=<"):].rstrip(">")
    if faddr.startswith("<"):
        faddr = faddr.lstrip("<").rstrip(">")
    if taddr.startswith("to=<"):
        taddr = taddr[len("to=<"):].rstrip(">\n")
    if taddr.startswith("<"):
        taddr = taddr.lstrip("<").rstrip(">,\n")
    #split taddr into a list of to addresses (since a single mail can have
    #multiple recipients)
    taddrs = taddr.split(",")
    #remove the junk from these to addresses and create a list for all recipients
    to = []
    for i in taddrs:
        j = i.lstrip("<").rstrip(">")
        to.append(j)
    return [faddr, to]

def buildHTMLrecipients(recipients):
    f = open("recipients.html", "w")
    f.write("<table border=1><th>Recipients</th><th>Sender</th>\n")
    for r, s in recipients.iteritems():
        line = "<tr><td>"+ r + "</td><td>"+ str(s) + "</td></tr>\n"
        f.write(line)
    f.write("</table>")
    f.close()
    return 0

def buildHTMLsenders(senders):
    f = open("senders.html", "w")
    f.write("<table border=1><th>Sender</th><th>Recipients</th>\n")
    for s, r in senders.iteritems():
        recipientString = str(r)
        line = "<tr><td>"+ s + "</td><td>"+ recipientString + "</td></tr>\n"
        f.write(line)
    f.write("</table>")
    f.close()
    return 0



#postfix log file
logfile = "mail.log"

#commands must return a string in the format:
#FROM_ADDRESS TO_ADDRESSES
cmd1 = "awk '/spamcop/ {print $25, $26}' "+logfile
cmd2 = "awk '/Blocked SPAM/ {print $11, $13}' "+logfile

#Run the commands and build a list of all mails
mails = []
loglines=os.popen(cmd1)
for line in loglines:
    mails.append(getaddrs(str(line)))

loglines=os.popen(cmd2)
for line in loglines:
    mails.append(getaddrs(str(line)))

#build sender dictionary of recipients (IE a list of all recipients for a given sender)
senders = {}
for mail in mails:
    #if the sender is already in the dictionary then add this mail's recipients
    #to the sender's existing list
    fromAddr = str(mail[0])
    # we're iterating through mail[1] since it's a list
    for toAddresses in mail[1]:
        if fromAddr in senders:
            senders[fromAddr].append(toAddresses)
        else:
            senders[fromAddr] = [toAddresses]

#build recipient dictionary of senders (IE a list of all senders for a given recipient)
recipients = {}
for mail in mails:
    #if the recipient is already in the dictionary then add this mail's sender
    #to the recipient's existing list
    #NB we must iterate through mail[1] since it is a list
    for toAddresses in mail[1]:
        toAddr = str(toAddresses)
        if toAddr in recipients:
            recipients[toAddr].append(mail[0])
        else:
            recipients[toAddr] = [mail[0]]

buildHTMLrecipients(recipients)
buildHTMLsenders(senders)

bitvector · Sat Dec 19, 2009 11:51 pm

cheesyking wrote:
Any suggestions for improving what I've done? I'm sure there must be better ways of doing some (probably most) of this.

So I'm not sure what kind of suggestions you'd like, but I have a few Python style suggestions to make the code a little more succinct (or in one case, just switching from a deprecated module). Maybe that's not what you're looking for, but I'll put them here for posterity.

mails = []
loglines=os.popen(cmd1)
for line in loglines:
    mails.append(getaddrs(str(line)))

loglines=os.popen(cmd2)
for line in loglines:
    mails.append(getaddrs(str(line)))

When you're iterating over a list to build a new list, you should consider using list comprehensions instead:

from itertools import chain
loglines1=os.popen(cmd1) 
loglines2=os.popen(cmd2)
mails = [getaddrs(str(line)) for line in chain(loglines1, loglines2)]

Same thing with this:

    to = []
    for i in taddrs:
        j = i.lstrip("<").rstrip(">")
        to.append(j)

It becomes:

to = [i.lstrip("<").rstrip(">") for i in taddrs]
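For anyone who wants to try the two rewrites without a mail log to hand, here is a minimal sketch. The two lists stand in for the os.popen output, the addresses are made up, and strip_brackets is a hypothetical helper that is not part of the original script.

```python
from itertools import chain

# Made-up stand-ins for the lines the two awk commands would return.
loglines1 = ["from=<a@example.com> to=<b@example.com>\n"]
loglines2 = ["<c@example.com> <d@example.com>,<e@example.com>\n"]

def strip_brackets(addrs):
    # Hypothetical helper: same per-address clean-up as the original loop.
    return [a.lstrip("<").rstrip(">") for a in addrs]

# chain() walks both iterables in turn, so one comprehension covers both.
lines = [line.rstrip("\n") for line in chain(loglines1, loglines2)]

# The recipient clean-up as a comprehension over the comma-separated field.
recipients = strip_brackets("<d@example.com>,<e@example.com>".split(","))

print(lines)
print(recipients)
```

chain() is lazy, so it should behave the same way when the inputs are the file objects from the popen calls rather than lists.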

Also, when you have code like this for a dictionary:

if fromAddr in senders:
    senders[fromAddr].append(toAddresses)
else:
    senders[fromAddr] = [toAddresses]

You might consider using setdefault:

lst = senders.setdefault(fromAddr, [])
lst.append(toAddresses)
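Here is a small sketch of both dictionaries built that way, with collections.defaultdict shown as an alternative that does much the same job. The mails list is made-up data in the shape getaddrs returns, and the from_addr/to_addr names are only for illustration.

```python
from collections import defaultdict

# Made-up (sender, recipients) pairs in the shape getaddrs returns.
mails = [
    ["a@example.com", ["x@example.com", "y@example.com"]],
    ["b@example.com", ["x@example.com"]],
    ["a@example.com", ["z@example.com"]],
]

# setdefault returns the existing list, or stores and returns a new one.
senders = {}
for from_addr, to_addrs in mails:
    for to_addr in to_addrs:
        senders.setdefault(from_addr, []).append(to_addr)

# defaultdict(list) creates the empty list on first access instead.
recipients = defaultdict(list)
for from_addr, to_addrs in mails:
    for to_addr in to_addrs:
        recipients[to_addr].append(from_addr)

print(senders)
print(dict(recipients))
```

One difference worth knowing: merely looking up a missing key in a defaultdict creates an entry for it, whereas a plain dict with setdefault only grows when you call setdefault.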

Whenever you have string concatenation like this:

recipientString = str(r)
line = "<tr><td>"+ s + "</td><td>"+ recipientString + "</td></tr>\n"

you might consider using the % based string replacement:

line = "<tr><td>%s</td><td>%s</td></tr>\n" % (s, r)

Although some people find C format string style replacement terse, I find it can be easier to edit and maintain than the sequence of string literals separated by concatenation operators (same deal with C++ iostreams).
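A quick sketch of what the % version produces, using made-up sample data. Note that %s calls str() on the list just as the original code did, so joining the list first is an optional extra that gives a tidier table cell.

```python
sender = "a@example.com"                          # made-up sample data
recipients = ["x@example.com", "y@example.com"]

# %s calls str() on its argument, so a list prints with brackets and quotes.
raw = "<tr><td>%s</td><td>%s</td></tr>\n" % (sender, recipients)

# Joining first gives a plain comma-separated cell instead.
row = "<tr><td>%s</td><td>%s</td></tr>\n" % (sender, ", ".join(recipients))

# Named placeholders take a dict, which helps once a row has many fields.
named = "<tr><td>%(s)s</td><td>%(r)s</td></tr>\n" % {
    "s": sender,
    "r": ", ".join(recipients),
}

print(raw, row, named)
```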

Also, os.popen is now considered deprecated in favor of the subprocess module:

loglines1=os.popen(cmd1) 
loglines2=os.popen(cmd2)

can be replaced with subprocess's Popen:

from subprocess import Popen, PIPE
loglines1 = Popen(cmd1, shell=True, stdout=PIPE).stdout
loglines2 = Popen(cmd2, shell=True, stdout=PIPE).stdout
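Popen also accepts the command as a list of arguments, which skips the shell and the quoting worries that come with it. The sketch below is only an illustration: it runs the Python interpreter as a stand-in child process so it works without awk or a mail log, and the output line is made up.

```python
import sys
from subprocess import Popen, PIPE

# Stand-in child process so the sketch runs without awk or a mail log.
# For the script above the list would be something like
# ["awk", "/spamcop/ {print $25, $26}", logfile].
args = [sys.executable, "-c", "print('from=<a@example.com> to=<b@example.com>')"]

# With a list and no shell=True, each element reaches the program as-is.
proc = Popen(args, stdout=PIPE)
out, _ = proc.communicate()     # reads all output and waits for exit

# On Python 3 the pipe yields bytes, so decode before splitting into lines.
lines = out.decode().splitlines()
print(lines, proc.returncode)
```

communicate() also gives you the exit status via proc.returncode, which is handy for noticing when the log file is missing.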

Also, in situations like this:

    addresses = data.split(" ")
    faddr = str(addresses[0])
    taddr = str(addresses[1])

You can unpack directly into the faddr and taddr vars (splitting a string gives you strings already):

faddr, taddr = data.split(" ")

Now when you do this, you'll end up with a "too many values to unpack" exception if you have more than two entries in the list that data.split returns. You could fix that by slicing the return to contain two elements (i.e. data.split(" ")[:2]) or by adding the "maxsplit" second parameter to split (but that would give all of the left-overs in taddr). You might not want to use it, but the last suggestion is more to make you aware of Python's comma-based multiple assignment capability if you weren't already familiar with it.
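The three behaviours described above, side by side on a made-up line with extra fields:

```python
data = "from=<a@example.com> to=<b@example.com> extra junk\n"  # made-up line

# Plain unpacking fails here because split returns four items, not two.
try:
    faddr, taddr = data.split(" ")
except ValueError as err:
    error = str(err)

# Slicing keeps the first two fields and drops the rest.
faddr, taddr = data.split(" ")[:2]

# maxsplit=1 splits once, so everything after the first space lands in rest.
first, rest = data.split(" ", 1)

print(error)
print(faddr, taddr)
print(repr(rest))
```

Neither fix helps when the line has fewer than two fields; unpacking still raises ValueError in that case, so a malformed log line may need its own check.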

cheesyking · Sun Dec 20, 2009 7:00 am

Excellent, that's exactly the kind of thing I was looking for, thanks!

Now I've just got to digest it all


postfix log analyser
