I am writing code for Stata. The code reads in data from a text file, then assigns readable variable names like "Price" to what is read in as ER34556, so that I actually know what the variables are. I have multiple text files of data, corresponding to different years in a panel survey. The variables carry over across years but have different names, e.g. ER34556 is "Price" in 2007 and ER45566 is "Price" in 2009.

There is online documentation for this data, where I can search for "ER34556" and it will return a list of the variables across years that correspond to it, e.g. I will get "[05]ER25588 [07]ER34556 [09]ER45566" as part of the returned information.

I have written the code for 2009 and copied and pasted it for the previous years. What I want to do is this: take each nonsensical variable name "ER..." from my Stata code (saved as a text file) and search the online documentation for it. Then, use the returned list of corresponding variables to update the code I have for the other years. As I see it (please correct me if I'm wrong at any point), this will require several steps:
- Extract the variable names from the 2009 code (commands are written: generate varname = ER...), so I would need everything to the right of the equals sign.
- Take each "ER" variable and search the documentation for it.
- Extract the list that is returned.
- Search the Stata code for additional instances of the "ER" variable.
- Assign the new variable name from the documentation on a year-by-year basis.
- Repeat the above as needed.
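
To make the steps above concrete, here is a rough Python sketch of how they might fit together. It is only an illustration under several assumptions: that every variable name is "ER" followed by digits, that commands are spelled out as `generate varname = ER...` (not abbreviated), and that the documentation text contains entries shaped like "[07]ER34556". The function names are hypothetical, and the actual documentation search is left out because it depends on how the site is queried; the sketch simply takes the returned text as a string.

```python
import re

# Assumption: ER names are "ER" followed by digits, as in the examples above.
ER_NAME = re.compile(r"\bER\d+\b")
# Assumption: the documentation returns entries shaped like "[07]ER34556".
YEAR_ENTRY = re.compile(r"\[(\d{2})\](ER\d+)")
# Assumption: commands are written out in full as "generate varname = ER...".
GENERATE = re.compile(r"^\s*generate\s+(\w+)\s*=\s*(.+)$")


def extract_er_names(stata_code):
    """Return ER names found right of '=' in generate commands, in first-seen order."""
    names = []
    for line in stata_code.splitlines():
        match = GENERATE.match(line)
        if match:
            for name in ER_NAME.findall(match.group(2)):
                if name not in names:
                    names.append(name)
    return names


def parse_crosswalk(doc_text):
    """Turn '[05]ER25588 [07]ER34556 [09]ER45566' into {'05': 'ER25588', ...}."""
    return dict(YEAR_ENTRY.findall(doc_text))


def translate_code(stata_code, crosswalks, target_year):
    """Replace each ER name with its equivalent for target_year.

    crosswalks maps a source ER name to its parsed year-to-name dict.
    Names with no entry for target_year are left unchanged.
    """
    def swap(match):
        name = match.group(0)
        return crosswalks.get(name, {}).get(target_year, name)

    # A single regex pass avoids one replacement feeding into another.
    return ER_NAME.sub(swap, stata_code)


code_2009 = "generate Price = ER45566\n"
# In practice this string would come from searching the documentation.
lookup = parse_crosswalk("[05]ER25588 [07]ER34556 [09]ER45566")
crosswalks = {name: lookup for name in extract_er_names(code_2009)}
code_2007 = translate_code(code_2009, crosswalks, "07")
print(code_2007)  # generate Price = ER34556
```

With this layout, the 2009 file acts as the template: each other year's code is produced by running `translate_code` with that year's two-digit label, and any variable that has no counterpart in that year stays as-is so it can be checked by hand.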
This should be possible, but I have no idea where to start (languages, methods, etc.). Any help at all would be greatly appreciated. If more information is needed to answer the question, please let me know. Thanks in advance.
