 
Asin
Gerbil Team Leader
Topic Author
Posts: 292
Joined: Tue Mar 09, 2004 10:36 pm
Location: Ontario, Canada

Indentation and Spacing Program

Fri Dec 09, 2005 8:29 pm

I've got a 3-week break between school and my next co-op term. I picture a very boring 3 weeks (aside from Christmas stuff).

I had this idea a long time ago but never really looked into it much: parse any Java file and output it as a perfectly indented and spaced program, with consistent brace placement, one space before and after operators, extraneous whitespace trimmed, etc.

On the face of it, this is a very simple program. But there are lots of intricacies in Java. And I plan on not using lexical analysis aids, like JLex.

What am I getting myself into? I don't see this as being any less than a thousand lines of code since there are "groups" of Java keywords that require different things in terms of braces (as an example), like "try", "for", "if", etc...

I'll probably need a small database for the keywords and a tokenizer method of some sort. Probably parsing one character at a time and some sort of (non-)deterministic finite automaton.
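
A minimal sketch of what that character-at-a-time tokenizer could look like, assuming a hard-coded keyword set stands in for the "small database". The class name SimpleTokenizer, the KEYWORDS set, and the "KIND:text" token format are all made up for illustration. It deliberately ignores comments, string literals, and multi-character operators, which a real version would need separate states for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and token format are illustrative only.
final class SimpleTokenizer {
    // Stand-in for the keyword "database"; deliberately incomplete.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "package", "class", "public", "private", "static", "void",
            "int", "if", "for", "try"));

    // Walks the input one character at a time; each branch acts as one
    // state of a small hand-written automaton.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // state: skipping whitespace
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i; // state: inside a word
                while (i < src.length()
                        && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                String kind = KEYWORDS.contains(word) ? "KEYWORD:" : "IDENT:";
                tokens.add(kind + word);
            } else if (Character.isDigit(c)) {
                int start = i; // state: inside an integer literal
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER:" + src.substring(start, i));
            } else {
                // Everything else is treated as a one-character symbol.
                tokens.add("SYMBOL:" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("private int iValue=0;"));
    }
}
```

Because whitespace is dropped during scanning, the formatter is free to decide later how much space goes between any two tokens.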
 
UberGerbil
Grand Admiral Gerbil
Posts: 10368
Joined: Thu Jun 19, 2003 3:11 pm

Fri Dec 09, 2005 9:03 pm

Yes, a fun project (as long as you consider it just a project -- it's not like source code formatters / pretty printers are a new idea, even for Java). Essentially you're writing a simple-minded parser that tokenizes the keywords and builds a simple graph for the program structure and then spits it all back out according to the formatting rules you define.
 
BigMadDrongo
Gerbil Elite
Posts: 909
Joined: Mon Apr 29, 2002 12:57 pm
Location: London, UK

Fri Dec 09, 2005 10:21 pm

Asin wrote:
I plan on not using lexical analysis aids, like JLex

I'd think about whether this is a good idea. If you want experience in writing a parser, then sure, skip the tools, but bear in mind it's a separate task from the code beautification. It probably is worth writing a recursive descent parser or two if you want to know how programming languages work, but as my compilers lecturer put it, "parsing is good for your soul, but you don't want things which are good for your soul too often in your life".

You could go halfway, maybe use a lexing library but write the parser yourself or something. I'm just saying that it sounds like the interesting part of the program would be the code formatter rather than the parser, so it might be worth saving yourself some effort on the latter. (You could even use a tool-driven parser to start with, then swap in a handwritten one later if you felt masochistic :P)
 
madlemming
Gerbil XP
Posts: 341
Joined: Fri Oct 15, 2004 2:22 pm

Fri Dec 09, 2005 10:58 pm

Are you doing the program itself in Java? It has some pretty good regex libraries ... but this seems like the sort of thing Perl would be good at.
 
Asin
Gerbil Team Leader
Topic Author
Posts: 292
Joined: Tue Mar 09, 2004 10:36 pm
Location: Ontario, Canada

Sat Dec 10, 2005 12:49 am

I wasn't aware of the terms used. I guess that I am looking to create a code formatter.

JIndent looks close to what I'm after. But mostly just shuffling Java code around and not the other stuff.

I've made my own compiler using JLex and Java CUP before. It took a subset of a high-level C++ language and turned it into MIPS assembly code. We did that last term.

Yeah, so not just a parser. I'll probably have to switch between matching whole strings with the java.lang.String.startsWith() method and character-by-character analysis.

EDIT:
UberGerbil wrote:
Essentially you're writing a simple-minded parser that tokenizes the keywords and builds a simple graph for the program structure and then spits it all back out according to the formatting rules you define.

I guess this would be the better way to do this. Much like how JLex does it.

With that idea in mind, it seems a little easier since I could just have a helper method that removes leading spaces and then use the startsWith() method. I guess to add to the challenge, I can always write my own startsWith() combination tokenizer.
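
Roughly what I have in mind, as an untested-in-anger sketch: a helper that strips leading whitespace and then tries each keyword with startsWith(). The class name PrefixMatcher, the three-keyword list, and both method names are placeholders. One catch worth noting is that startsWith("if") also matches "iffy", so the sketch checks that the keyword is not followed by another identifier character.

```java
// Hypothetical sketch: PrefixMatcher and its methods are illustrative names.
final class PrefixMatcher {
    // Placeholder keyword list; a real formatter would need far more.
    static final String[] KEYWORDS = {"try", "for", "if"};

    // Helper that removes leading whitespace.
    static String stripLeading(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return s.substring(i);
    }

    // Returns the keyword that begins the text (after leading whitespace),
    // or null if the text does not start with a whole keyword.
    static String leadingKeyword(String text) {
        String rest = stripLeading(text);
        for (String kw : KEYWORDS) {
            if (rest.startsWith(kw)) {
                boolean atEnd = rest.length() == kw.length();
                // Reject prefixes of longer identifiers, e.g. "format".
                if (atEnd || !Character.isJavaIdentifierPart(rest.charAt(kw.length()))) {
                    return kw;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(leadingKeyword("   if (x) {"));
        System.out.println(leadingKeyword("iffy = 1;"));
    }
}
```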
 
fc34
Minister of Gerbil Affairs
Posts: 2816
Joined: Wed May 08, 2002 8:39 am
Location: Somewhere

Mon Dec 12, 2005 11:57 pm

Shouldn't be too difficult, but it'll require a fair bit of planning. Not all lines end with ';'. Which lines don't end with ';'?
Windows XP - The 64-bit wannabe with a 32-bit graphics interface for 16-bit extensions to a 8-bit patch on a 4-bit operating system designed to run on a 2-bit processor by a company that can't stand 1-bit of competition
 
bitvector
Grand Gerbil Poohbah
Posts: 3293
Joined: Wed Jun 22, 2005 4:39 pm
Location: San Francisco, CA

Re: Indentation and Spacing Program

Tue Dec 13, 2005 12:14 am

Asin wrote:
I'll probably need a small database for the keywords and a tokenizer method of some sort. Probably parsing one character at a time and some sort of (non-)deterministic finite automaton.

I think you're going about this slightly askew. You'll probably want to do it the standard way most compilers do it. Write the scanning logic that can turn a byte stream into a token stream and take care of whitespace and comments. Then write a parser that can take a token stream and parse it based on the formal grammar of the Java language. The parser will build an intermediate representation in the form of an abstract syntax tree (plus comment annotations) and you'll walk the tree to re-output the code or do whatever kind of transformation you desire.
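
To make the shape of that pipeline concrete, here is a toy sketch of the parse, build-tree, walk-tree steps. It assumes the scanner has already produced a list of token strings, and it uses a made-up two-rule grammar (a statement ends in ';', a block is a header followed by '{ ... }') rather than the real Java grammar. The names TreeFormatter and Node are hypothetical. It would mis-parse a for header, whose semicolons sit inside parentheses, which is one reason to parse against the actual grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a recursive descent parser over a toy grammar:
//   items := item*
//   item  := word* ';'  |  word* '{' items '}'
final class TreeFormatter {
    // One tree node: either a statement or a block with nested children.
    static final class Node {
        final String text;
        final boolean isBlock;
        final List<Node> children = new ArrayList<>();
        Node(String text, boolean isBlock) {
            this.text = text;
            this.isBlock = isBlock;
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    TreeFormatter(List<String> tokens) {
        this.tokens = tokens;
    }

    // Parses items until a closing brace or the end of the input.
    List<Node> parseItems() {
        List<Node> items = new ArrayList<>();
        while (pos < tokens.size() && !tokens.get(pos).equals("}")) {
            items.add(parseItem());
        }
        return items;
    }

    // Collects words until ';' (statement) or '{' (block, parsed recursively).
    Node parseItem() {
        StringBuilder words = new StringBuilder();
        while (pos < tokens.size()) {
            String t = tokens.get(pos++);
            if (t.equals(";")) {
                return new Node(words + ";", false);
            }
            if (t.equals("{")) {
                Node block = new Node(words.toString(), true);
                block.children.addAll(parseItems());
                if (pos >= tokens.size()) {
                    throw new IllegalArgumentException("missing '}'");
                }
                pos++; // consume the matching '}'
                return block;
            }
            if (words.length() > 0) {
                words.append(' ');
            }
            words.append(t);
        }
        throw new IllegalArgumentException("unexpected end of input");
    }

    // Walks the tree and re-emits it with four spaces per nesting level.
    static void print(List<Node> nodes, int depth, StringBuilder out) {
        for (Node n : nodes) {
            indent(out, depth);
            if (n.isBlock) {
                out.append(n.text).append(" {\n");
                print(n.children, depth + 1, out);
                indent(out, depth);
                out.append("}\n");
            } else {
                out.append(n.text).append('\n');
            }
        }
    }

    static void indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("    ");
        }
    }

    static String format(List<String> tokens) {
        TreeFormatter parser = new TreeFormatter(tokens);
        List<Node> tree = parser.parseItems();
        if (parser.pos < tokens.size()) {
            throw new IllegalArgumentException("unmatched '}'");
        }
        StringBuilder out = new StringBuilder();
        print(tree, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(Arrays.asList(
                "class", "Test", "{", "int", "x", ";", "}")));
    }
}
```

The point of the intermediate tree is that the output step never looks at the original whitespace at all; indentation falls out of the nesting depth.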

But there are many tools that already do that, like Jalopy. And writing a LALR(1) parser from scratch instead of using a parser generator isn't all that hard, but constructing all of the tables and LR(1) items is grunt work. Same thing with not using scanner generators. I'd suggest you use something like ANTLR. And Sun has an example of a Java parser using ANTLR on their website: http://java.sun.com/developer/technicalArticles/Parser/SeriesPt3/.
 
fc34
Minister of Gerbil Affairs
Posts: 2816
Joined: Wed May 08, 2002 8:39 am
Location: Somewhere

Tue Dec 13, 2005 12:19 am

I think what he's trying to accomplish is more of an indenter/spacer. Basically making reading of messy code (for all the people who like to code in non-standard ways) easier.

I don't think that you should go about using a compiler's way, because at the end of the day, you'd have to spend as much time creating white spaces as removing them, and the final result will have lots of white space anyway.
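
For comparison, a rough sketch of the non-compiler route: re-indent existing lines purely by counting braces, with no tokens or tree. The class name BraceIndenter and the four-space indent are assumptions. It only re-indents lines that are already broken sensibly, and it would be fooled by braces inside string literals or comments.

```java
// Hypothetical sketch of a brace-counting re-indenter (no parsing involved).
final class BraceIndenter {
    static String reindent(String source) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (String raw : source.split("\n")) {
            String line = raw.trim(); // discard the existing indentation
            if (line.isEmpty()) {
                out.append('\n');
                continue;
            }
            // A line that starts with '}' closes a block, so it is printed
            // one level shallower than the current depth.
            int shown = line.startsWith("}") ? Math.max(depth - 1, 0) : depth;
            for (int i = 0; i < shown; i++) {
                out.append("    ");
            }
            out.append(line).append('\n');
            // Update the depth for the following lines.
            for (char c : line.toCharArray()) {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth = Math.max(depth - 1, 0);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(reindent("class A {\nint x;\n  void f() {\n}\n}"));
    }
}
```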
 
Asin
Gerbil Team Leader
Topic Author
Posts: 292
Joined: Tue Mar 09, 2004 10:36 pm
Location: Ontario, Canada

Tue Dec 13, 2005 9:12 pm

fc34 wrote:
Basically making reading of messy code (for all the people who like to code in non-standard ways) easier.

More or less. Not really for easier reading since I'm fairly familiar with Java and other people's coding habits already.

For me, it's just a very basic and straightforward project that I want to do. I'm learning stuff about design issues and abstraction, so I figured that this would be a good way to see what I've learned.

fc34 wrote:
I don't think that you should go about using a compiler's way, because at the end of the day, you'd have to spend as much time creating white spaces as removing them, and the final result will have lots of white space anyway.

If I wanted to make a compiler from scratch, I probably would have used JLex and just used the Java source that results as my lexical analyzer. It would be cheating, but it's easier. :P

Yeah, I'll probably want to try and make this work first and then worry about the rest. Learned about black box and white box testing two terms ago. Had lots of projects or assignments with various languages and never really used it. This might be good practice.
 
bitvector
Grand Gerbil Poohbah
Posts: 3293
Joined: Wed Jun 22, 2005 4:39 pm
Location: San Francisco, CA

Tue Dec 13, 2005 9:47 pm

Asin wrote:
fc34 wrote:
I don't think that you should go about using a compiler's way, because at the end of the day, you'd have to spend as much time creating white spaces as removing them, and the final result will have lots of white space anyway.

If I wanted to make a compiler from scratch, I probably would have used JLex and just used the Java source that results as my lexical analyzer. It would be cheating, but it's easier. :P

I'm not sure how using JLex to generate a scanner is 'cheating' unless your goal is to understand the construction process of scanning automata. It's really just gruntwork. Most compilers make use of scanner and parser generators. If using a tool to automate gruntwork is cheating, what are you doing using Java? I mean, after all, it's cheating since the compiler is writing all of the bytecode for you (not to mention all of the work by the garbage collector and runtime!).
 
Yahoolian
Grand Gerbil Poohbah
Posts: 3577
Joined: Sun Feb 16, 2003 3:43 pm
Location: MD

Thu Dec 15, 2005 2:26 pm

Just use Python, where whitespace determines nesting level, instead of curly braces.
 
Asin
Gerbil Team Leader
Topic Author
Posts: 292
Joined: Tue Mar 09, 2004 10:36 pm
Location: Ontario, Canada

Thu Dec 15, 2005 10:25 pm

I don't mean to brag, but like most people, my indentation is perfectly consistent with or without an IDE.

My goal is to turn this:

package something.whatever;/**
* Stuff
* @version 0.1
*/class Test{private int iValue=0;
// ...
public void aMethod(int a){System.out.println(a);}public static void main(String[]args){/* do something */}}


into this:

package something.whatever;

/**
 * Stuff
 * @version 0.1
 */
class Test {

    private int iValue = 0;

    // ...
    public void aMethod (int a) {
        System.out.println (a);
    } // end aMethod

    public static void main (String[] args) {
        /* do something */
    } // end main

}
