Noob Trying to Parse .rtf File

Posted on Tue Jul 07, 2009 9:24 am

I'm working with a rather annoying program at work, which of course has very limited output options. On top of that, it only exports the data in .rtf format. When you add up all these wonderful limitations, you get a nice 1200+ page .rtf document. Naturally, what I want from the document is spread throughout it.

One of the problems I'm having is figuring out how to parse the data. Here's a snippet of the format it's in:

Title
CODE: II/Moderate
STATUS:

Description (centered up on the line)
blah blah blah

Then it repeats the same format, from 'Title' down to the text (blah blah):

Title
CODE: I/Unknown
STATUS:

Description (centered up on the line)
blah blah blah

What I need are all the CODE: I records and nothing else. So it seems to me I need to:
  • Find the text that matches CODE: I/ (the part after the slash doesn't always say Moderate).
  • Copy the one line above the matched string (the Title) to another file.
  • Copy the line with the matched string to that same file.
  • Keep copying everything until it finds the next text that matches CODE: I (the terminator part of the parse).
  • Then, if I care about neatness, I can go in and remove the last copied line so I don't have the next Title in there... but meh, that's a minor thing.
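The steps above can be sketched in Ruby roughly as follows. This is only a sketch under a few assumptions that are not confirmed anywhere in the thread: the .rtf has already been exported to plain text, the Title is always the single line directly above its CODE: line, and a record runs until the line before the next record's Title. The method name `extract_code_i` is made up for illustration.

```ruby
# Sketch: pull out every record whose code line starts with "CODE: I/".
# Assumes plain text input and that each record's Title is the single
# line directly above its "CODE:" line.
def extract_code_i(lines)
  lines = lines.map(&:chomp)

  # Indices of every "CODE:" line; each one marks a record.
  code_idx = lines.each_index.select { |i| lines[i].start_with?("CODE:") }

  records = []
  code_idx.each_with_index do |ci, n|
    # "CODE: I/" does not match "CODE: II/..." because of the slash.
    next unless lines[ci].start_with?("CODE: I/")

    first = [ci - 1, 0].max # the Title line
    # The record ends just before the next record's Title,
    # or at the end of the input for the last record.
    last = n + 1 < code_idx.size ? code_idx[n + 1] - 2 : lines.size - 1
    records << lines[first..last].join("\n").strip
  end
  records
end

# Possible usage (file names are placeholders):
#   records = extract_code_i(File.readlines("export.txt"))
#   File.write("code_i.txt", records.join("\n\n") + "\n")
```

Because each record is cut off before the next Title, the cleanup step from the last bullet isn't needed with this approach.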

Now I'm sure people who have parsed things before are pulling their hair out at my approach, but that's why I came here for help. I have multiple OSes I can use, Red Hat Linux being my main one... but I'm willing to use Windows if need be. :)
malebolgia

Re: Noob Trying to Parse .rtf File

Posted on Tue Jul 07, 2009 10:12 am

Could you load the RTF into a word processor (OO.o, Abiword, whatever), tell it to export to .TXT, and run your parser on the text file?
bthylafh

Re: Noob Trying to Parse .rtf File

Posted on Tue Jul 07, 2009 10:18 am

awk
notfred

Re: Noob Trying to Parse .rtf File

Posted on Tue Jul 07, 2009 10:36 am

I've seen some amazing things done with awk, so that gets a +1. If you're looking for the RTF spec, here it is:

http://msdn.microsoft.com/en-us/library ... 10%29.aspx
drsauced

Re: Noob Trying to Parse .rtf File

Posted on Wed Jul 08, 2009 12:30 am

Maintain a Queue. Analyze the next line in.

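Ruby example.

Code:
# Keep a two-line window so the Title line above each CODE: line is still around.
window = []

File.foreach("filename.rtf") do |line|
  window.push(line.chomp)
  window.shift if window.size > 2 # drop the oldest line

  # window.last is the line just read; window.first is the line before it
  if window.last.downcase.start_with?("code:")
    title = window.size == 2 ? window.first : nil
    # found the line... let's do some stuff.
    puts "#{title} | #{window.last}"
  end
end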
Nitrodist

Re: Noob Trying to Parse .rtf File

Posted on Wed Jul 08, 2009 8:05 am

Nitrodist wrote: Maintain a Queue. Analyze the next line in.

That may not work. RTF is a markup language like HTML, so a logical "line" may span multiple physical lines in the file. You also need to account for formatting control words sitting in or around the keywords, which basically means a regex is a must when searching for them.
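To make that concrete, here is a crude Ruby sketch that splits RTF on `\par` (the paragraph mark) and throws away control words and group braces before matching. It is not a real RTF parser: it assumes the header groups (font table, color table, and so on) are not in the input, and it ignores `\'hh` hex escapes, escaped braces, and Unicode. The method name `rtf_logical_lines` is made up for illustration.

```ruby
# Crude sketch: turn RTF body text into logical lines.
# Ignores header groups, \'hh escapes, escaped braces, and Unicode.
def rtf_logical_lines(rtf)
  # \par ends a paragraph; the lookahead keeps \pard and similar words intact.
  paragraphs = rtf.split(/\\par(?![a-z])/)

  cleaned = paragraphs.map do |para|
    # Drop control words such as \b, \fs24, \qc, plus their delimiter space.
    text = para.gsub(/\\[a-z]+-?\d* ?/, "")
    # Drop group braces.
    text = text.delete("{}")
    # Physical line breaks inside a paragraph carry no meaning.
    text.gsub(/\s+/, " ").strip
  end

  cleaned.reject(&:empty?)
end

# Possible usage: find the class I code lines after flattening.
#   rtf_logical_lines(File.read("filename.rtf")).grep(/\ACODE:\s*I\//)
```

With the markup flattened like this, a line-by-line search has a fighting chance, though a proper converter is still the safer route for a 1200+ page document.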

Awk may be fine. Ruby may be fine too, given some additional logic and knowledge of the RTF spec. A quick Google search turned up some RTF-to-text converters, so one of those may be of use. I also think Word automation with a VBScript or macro may work here as well.
Flying Fox