Experience with parsing exchange server emails

Posted on Fri Apr 29, 2011 9:50 am

Hey guys and gals,

I have a new task set upon me in the next few weeks, and it's something I've never done before - so before I start reinventing the wheel, I thought I'd ask around and see if anyone else has done this before.

Basically, an account we set up on our MS Exchange server will be receiving emails from various shipping companies (FedEx, DHL, UPS), so I will have to parse 3 different types of emails. All 3 emails share a common piece of information, akin to an order ID, that I'll be parsing and storing in a SQL database for use on a central webpage that will show each user, regardless of their shipping method, the status of their order. Because we're the middleman and not the actual order processors, that's about as much information as I can rely on. Further, I'm not sure I can rely on the format of the emails not changing.

So what I am envisioning right now is a custom parser written for each email, based on spacing and character counts: using something like a SubString() function to find a key piece of the email, then reading X characters after it and relying on that being correct. There are many things I don't like about this approach, and I think they're obvious.

I was hoping someone had experience with this before. Even if it's a paid-for solution, as long as it works I'm sure we'd shell out for something robust. Barring that, how would one go about gaining access to the files? It's my understanding that the files are stored as .eml. If that's correct, is there anything identifiable in the file name that would tell me these are the files I am looking for? How does Exchange store all this information? I assumed some sort of database. From the research I've done so far, I'm under the impression that using MIME you can at least separate the header/attachment/body/subject etc.
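To illustrate the MIME point, here is a minimal sketch of splitting a message into headers, body, and attachment names with Python's standard `email` package. The sample message, the addresses, and the `split_message` helper are made up for illustration; real carrier emails will look different.

```python
from email import message_from_string, policy

# Hypothetical sample message; real carrier emails will differ.
RAW = """\
From: tracking@example-carrier.com
To: orders@example.org
Subject: Shipment update for order 12345
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Your package for order 12345 has shipped.
"""


def split_message(raw_text):
    """Split a raw RFC 5322 message into the pieces worth storing."""
    # policy.default yields an EmailMessage with get_body()/iter_attachments().
    msg = message_from_string(raw_text, policy=policy.default)

    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # File names of any attachments (empty for a simple text message).
    attachments = [part.get_filename() for part in msg.iter_attachments()]

    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "body": body,
        "attachments": attachments,
    }


parts = split_message(RAW)
print(parts["subject"])
```

The same function should work on the contents of an .eml file, since .eml is generally just the raw message text.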

Thanks in advance for any advice/information you can lend.
Corsair 600T | ASUS P8P67 PRO | Intel 2500k @ 4.4Ghz | EVGA 560 TI | G.SKILL Ripjaws Series 8GB | Corsair HX650 650W
steelcity_ballin
Gerbilus Supremus
Silver subscriber
 
 
Posts: 11924
Joined: Mon May 26, 2003 5:55 am
Location: Pittsburgh PA

Re: Experience with parsing exchange server emails

Posted on Fri Apr 29, 2011 10:18 am

One thing I'm not following here is where exactly you are going to run these scripts. Do you want to run them on the MS Exchange server itself, with the script interfacing with Exchange to access those e-mails? Is there a specific language you want, or don't want, to use? Because if it's simply an address that receives order e-mails, it's relatively easy to, for instance, use PHP (or Perl, whatever), pull them up via POP/IMAP, and proceed from there.

From the description, it looks like you can rely on the domain of the sender address (e.g. something@dhl.com) to determine the company, and on the e-mail subject line to determine the order number.
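A rough sketch of that idea in Python, using only the standard library. The domain table and the order-number pattern are assumptions for illustration: the thread doesn't say what the real sender addresses or subject lines look like, so both would need adjusting against actual emails.

```python
import re
from email.utils import parseaddr

# Assumed sender domains; check these against the real emails.
CARRIER_DOMAINS = {"fedex.com": "FedEx", "dhl.com": "DHL", "ups.com": "UPS"}

# Hypothetical subject format such as "Order #12345 has shipped".
ORDER_ID_PATTERN = re.compile(r"\border\s*#?\s*(\d+)", re.IGNORECASE)


def identify_carrier(from_header):
    """Map a From header to a carrier name, or None if unrecognized."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    for known, carrier in CARRIER_DOMAINS.items():
        # Accept the domain itself and its subdomains (mail.ups.com).
        if domain == known or domain.endswith("." + known):
            return carrier
    return None


def extract_order_id(subject):
    """Pull the order number out of a subject line, or None if absent."""
    match = ORDER_ID_PATTERN.search(subject)
    return match.group(1) if match else None


print(identify_carrier("DHL Tracking <something@dhl.com>"))
print(extract_order_id("Your Order #12345 has shipped"))
```

Searching for a pattern instead of counting characters means the parser keeps working if the carrier adds or removes text around the order number. Keep in mind that a From header is easy to forge, so it probably shouldn't be the only check.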
There is a fixed amount of intelligence on the planet, and the population keeps growing :(
morphine
Grand Admiral Gerbil
Silver subscriber
 
 
Posts: 10069
Joined: Fri Dec 27, 2002 8:51 pm
Location: Portugal (that's next to Spain)

Re: Experience with parsing exchange server emails

Posted on Fri Apr 29, 2011 10:42 am

We're basically a 3rd party to an over-complicated process... /sigh

I would write an executable that runs as a scheduled task to gather all the emails like you said, of a particular sender address, and parse them out for the information I want. I'm still waiting to talk to our IT to see what is available for me. I found this program, http://www.afterlogic.com/mailbee-net/imap-component, that seems like it would make a semi-cumbersome task a little easier, and it's not that expensive.

I guess my initial question wasn't laid out so clearly: how does Exchange store all the emails? Is it a database? I've seen the .eml extension thrown around. Is there some folder on the Exchange server that houses all of these files?

Here's a Stack Overflow thread that seems to suggest it's painful to try to do what I'm proposing: http://stackoverflow.com/questions/6525 ... email-in-c

edit: Part of the article I missed, WE HAVE EXCHANGE 2007, I CAN USE SOAP! http://msdn.microsoft.com/en-us/library/bb204119.aspx
steelcity_ballin

Re: Experience with parsing exchange server emails

Posted on Fri Apr 29, 2011 11:00 am

Languages: I suggest PHP or Perl for what you're trying to accomplish, assuming you're okay with either. I'll use PHP as an example here.

Scenario 1: Access the destination account via IMAP, and use the IMAP functions to retrieve and parse the messages. You'll see that the functions do most of the heavy lifting for you and will let you access the necessary e-mail parts reasonably directly.

Scenario 2: Access Exchange's web services via SOAP using their API (see SOAP reference here). If I had to guess, it's more work, and you'll lock yourself into Exchange this way, whereas IMAP/POP3 are open standards. That's why I would go with scenario 1 if possible.

Accessing .eml files or a database directly is a bad idea in my book, even if it were possible. Either of the above scenarios provides the necessary data abstraction. For example, imagine that you accessed Exchange's database directly. They change a table name in the next version, and your script stops working. They change a field name, and your script stops working. And so on, and so on. Granted, with .eml files this would be less of a problem, but even then, imagine that your backend stopped using .eml files in some future version. Then you'd be screwed.
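For reference, scenario 1 could look roughly like this. The sketch uses Python's standard `imaplib` rather than the PHP IMAP functions, purely for illustration. The host name, account, and password in `main()` are placeholders, and it assumes IMAP access is enabled on the Exchange server, which is something to confirm with IT.

```python
import email
import imaplib
from email import policy


def fetch_messages_from(conn, sender):
    """Return parsed messages in INBOX whose From header contains `sender`.

    `conn` is an already logged-in imaplib.IMAP4 / IMAP4_SSL connection.
    """
    # readonly=True avoids marking messages as read while we look at them.
    conn.select("INBOX", readonly=True)
    status, data = conn.search(None, "FROM", f'"{sender}"')
    if status != "OK":
        return []

    messages = []
    for num in data[0].split():
        status, msg_data = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # msg_data[0] is an (envelope, raw message bytes) pair.
        raw_bytes = msg_data[0][1]
        messages.append(email.message_from_bytes(raw_bytes, policy=policy.default))
    return messages


def main():
    # Placeholder host and credentials; replace with real values.
    conn = imaplib.IMAP4_SSL("mail.example.org")
    try:
        conn.login("orders@example.org", "password")
        for msg in fetch_messages_from(conn, "dhl.com"):
            print(msg["From"], "|", msg["Subject"])
    finally:
        conn.logout()
```

Calling `main()` from a scheduled task would then poll the mailbox. Because the script only speaks IMAP, it shouldn't care how the server stores the messages internally.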
morphine

Re: Experience with parsing exchange server emails

Posted on Fri Apr 29, 2011 12:18 pm

Personally, I prefer Python to either PHP or (especially) Perl. It's truly object-oriented (as is PHP, not so much Perl) and, for a scripting language, it's a really robust programming language. But this is mainly a personal preference thing.

Honestly, though, you're probably better off sticking with Microsoft technology languages so you can leverage their libraries. C# is probably the best choice in this regard. In older versions of Exchange Server, you could even embed executable script into the server folders themselves, which could be triggered by server events. Not sure if that's been deprecated in newer versions, but it's worth looking into.
Buub
Maximum Gerbil
Silver subscriber
 
 
Posts: 4214
Joined: Sat Nov 09, 2002 11:59 pm
Location: Seattle, WA

Re: Experience with parsing exchange server emails

Posted on Fri Apr 29, 2011 12:19 pm

Oh and SOAP... bleck! There is a reason REST is the future and SOAP is fading... :-)
Buub

Re: Experience with parsing exchange server emails

Posted on Sun May 01, 2011 2:03 am

The problem I see is that all these 3rd parties are guaranteed to send you something funny in terms of format, be it HTML, RTF, or text. And then they may use tables or anything. Of course, in a perfect world they would send you some standardized XML, and then you would only need to write this once. And I can almost guarantee that they are going to change the email text regularly, so this will be pretty much a never-ending catch-up hack job. :(
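One way to soften the format problem is to reduce every message to plain text before looking for the order number. Below is a minimal Python sketch, assuming the emails arrive as text/plain or text/html. RTF is not handled, and the sample message and helper names are made up for illustration.

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def message_text(msg):
    """Return the message body as plain text, whatever format it came in."""
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    content = part.get_content()
    if part.get_content_type() == "text/html":
        return html_to_text(content)
    return content.strip()


# Hypothetical HTML-only email that lays the data out in a table.
HTML_ONLY = """\
From: tracking@example-carrier.com
Subject: Status
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"

<html><body><table><tr><td>Order</td><td>12345</td></tr></table></body></html>
"""

text = message_text(message_from_string(HTML_ONLY, policy=policy.default))
print(text)
```

This doesn't stop the carriers from rewording their emails, but it means a switch from plain text to an HTML table is less likely to break the parser on its own.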
The Model M is not for the faint of heart. You either like them or hate them.

Gerbils unite! Fold for UnitedGerbilNation, team 2630.
Flying Fox
Gerbil God
 
Posts: 24524
Joined: Mon May 24, 2004 2:19 am

