Personal computing discussed

Moderators: renee, SecretSquirrel, just brew it!

 
malebolgia
Gerbil Elite
Topic Author
Posts: 973
Joined: Fri Apr 05, 2002 7:00 pm
Location: USA

Noob Trying to Parse .rtf File

Tue Jul 07, 2009 9:24 am

I'm working with a rather annoying program at work, which of course has very limited output options. On top of that, it only exports the data in the .rtf format. When you add up all these wonderful limitations, you get a nice 1200+ page .rtf document. Naturally, what I want in the document is spread throughout it.

One of the problems I'm having is figuring out how to parse the data. Here's a snippet of the format it's in:

Title
CODE: II/Moderate
STATUS:

Description (centered up on the line)
blah blah blah

then it repeats the same format; from 'Title' to the text (blah blah).

Title
CODE: I/Unknown
STATUS:

Description (centered up on the line)
blah blah blah

What I need are all the CODE: I entries and nothing else. So it seems to me I need to:
  • First find the text that matches CODE: I/ (as it doesn't always say moderate).
  • Copy the one line above the matched string (the Title) to another file.
  • Then copy the line of the matched string to another file.
  • Copy everything after it until the next line that matches CODE: (the terminator part of the parse) to another file.
  • Then if I care about neatness I can go in and remove the last copied line... so I don't have the next Title there.. but meh, that's a minor thing.

Now I'm sure people who have parsed things before are pulling their hair out at my approach, but that's why I came here for help. I have multiple OSes I can use, Red Hat Linux being my main OS... but I'm willing to use Windows if need be. :)
 
bthylafh
Maximum Gerbil
Posts: 4320
Joined: Mon Dec 29, 2003 11:55 pm
Location: Southwest Missouri, USA

Re: Noob Trying to Parse .rtf File

Tue Jul 07, 2009 10:12 am

Could you load the RTF into a word processor (OO.o, Abiword, whatever), tell it to export to .TXT, and run your parser on the text file?
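
Once it's plain text, the parser step could be something like the sketch below. It assumes every record is a Title line followed directly by a line starting with "CODE:" (no leading spaces), as in your snippet. The helper name extract_code_one and the sample titles are made up for illustration.

```ruby
# Hypothetical helper: returns the records whose CODE line is level I.
# Assumes each record is a Title line followed directly by a "CODE:" line.
def extract_code_one(lines)
  # Indexes of every line that starts a CODE field.
  code_idx = lines.each_index.select { |i| lines[i].start_with?("CODE:") }

  records = []
  code_idx.each_with_index do |ci, n|
    # "CODE: I/" matches level I only; the slash rules out II, III, IV.
    next unless lines[ci] =~ /\ACODE:\s*I\//

    first = [ci - 1, 0].max                    # Title sits one line above
    nxt   = code_idx[n + 1]
    last  = nxt ? nxt - 2 : lines.length - 1   # stop before the next Title
    records << lines[first..last]
  end
  records
end

# Made-up sample data in the shape of the snippet from the first post.
sample = <<~TEXT.lines.map(&:chomp)
  Title A
  CODE: II/Moderate
  STATUS:

  Description
  blah blah blah
  Title B
  CODE: I/Unknown
  STATUS:

  Description
  more blah
TEXT

extract_code_one(sample).each { |rec| puts rec, "----" }
```

For the real file you would pass File.readlines on the exported .txt (with the line endings chomped) instead of the sample. If the export indents the CODE lines, strip each line first.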
 
notfred
Maximum Gerbil
Posts: 4610
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Re: Noob Trying to Parse .rtf File

Tue Jul 07, 2009 10:18 am

awk
 
drsauced
Gerbil Jedi
Posts: 1543
Joined: Mon Apr 21, 2003 1:38 pm
Location: Here!

Re: Noob Trying to Parse .rtf File

Tue Jul 07, 2009 10:36 am

I've seen some amazing things done with awk, so that gets a +1. If you're looking for the rtf specs, here:

http://msdn.microsoft.com/en-us/library ... 10%29.aspx
 
Nitrodist
Grand Gerbil Poohbah
Posts: 3281
Joined: Wed Jul 19, 2006 1:51 am
Location: Minnesota

Re: Noob Trying to Parse .rtf File

Wed Jul 08, 2009 12:30 am

Maintain a Queue. Analyze the next line in.

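Ruby example (a sketch; a plain Array stands in for the queue so it can be indexed).

# Two-line sliding window: the previous line and the current one.
queue = []

File.foreach("filename.rtf") do |line|
  queue.push(line)
  queue.shift if queue.size > 2   # drop the oldest line

  # "code:" is five characters, so start_with? is safer than slicing [0, 4].
  if queue.last.downcase.start_with?("code:")
    # Found the line... let's do some stuff.
    # queue.first is the line above it (the Title) once two lines are queued.
  end
end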
 
Flying Fox
Gerbil God
Posts: 25690
Joined: Mon May 24, 2004 2:19 am
Contact:

Re: Noob Trying to Parse .rtf File

Wed Jul 08, 2009 8:05 am

Nitrodist wrote:
Maintain a Queue. Analyze the next line in.

That may not work. RTF is a markup language like HTML, so a logical "line" may span multiple physical lines in the file. You also need to account for formatting tags mixed in with the keywords (which basically means a regex is a must when searching for them).
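
To give an idea of what I mean, here is a very rough regex pass that drops control words and group braces and turns \par into a line break. It is a sketch and not a real RTF parser: it ignores \'hh and Unicode escapes, escaped braces, tables, and it will leave header junk (font table entries and the like) as plain text. The names RTF_TOKEN and strip_rtf, and the sample fragment, are made up for illustration.

```ruby
# Matches a control word (\b, \fs24, \par ...) plus one optional delimiter
# space, or a group brace, or a raw line break (which carries no meaning in RTF).
RTF_TOKEN = /\\([a-z]+)(-?\d+)? ?|[{}\r\n]/

# Rough RTF-to-text pass; only \par is turned into a newline.
def strip_rtf(rtf)
  rtf.gsub(RTF_TOKEN) { $1 == "par" ? "\n" : "" }
end

# Made-up fragment: the keyword and its value sit in separate formatting groups.
fragment = '{\b Title}\par' + "\n" +
           '{\b CODE:}{\i  I/Unknown}\par' + "\n" +
           'STATUS:\par'

puts strip_rtf(fragment)
```

After a pass like this the keywords are back on their own lines, so a line-by-line search has a chance of working.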

Awk may be fine. Ruby may be fine given some additional logic with RTF spec knowledge. I did a quick Google search and found some RTF -> text converters, so that may be of use. I also think Word automation with VBScript/Macro may work here as well.
