07 February, 2006
Useful things
I've been working with tabular data files all day. Some are comma-separated, others tab-separated, and still others have an arbitrary number of spaces between fields.
All because I'm using so many different applications: custom stuff for extracting features from images, R for principal components analysis, an on-line app for multiple regression (Have a look).
Thankfully, Emacs has some features that make translating and editing these files quite a lot easier:
- The obvious regexp-replace
- The less obvious kill-rectangle (C-x r k)
- And the matching yank-rectangle (C-x r y)
The last two aren't heard of very often, but they're really, really useful: set the mark at the top-left corner of the rectangle you want, move to the bottom-right corner and press C-x r k. Emacs magically cuts just the rectangle, and C-x r y pastes it back as a block wherever the cursor is.
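For anyone who doesn't live in Emacs, here is a rough sketch of what kill-rectangle does, written in Python. The function name and the sample rows are made up for illustration, and it glosses over things Emacs handles properly (tabs, lines shorter than the rectangle):

```python
def kill_rectangle(lines, top, left, bottom, right):
    """Cut character columns left..right-1 out of rows top..bottom (inclusive).

    Returns (remaining_lines, killed_rectangle). Hypothetical helper that
    only approximates what C-x r k does in Emacs.
    """
    rows = lines[top:bottom + 1]
    # The rectangle itself: the same column slice taken from every row.
    killed = [line[left:right] for line in rows]
    # What is left behind: each affected row with that slice removed.
    remaining = (
        lines[:top]
        + [line[:left] + line[right:] for line in rows]
        + lines[bottom + 1:]
    )
    return remaining, killed


# Made-up sample data: cut the middle column out of three rows.
lines = ["a  1.0  x", "b  2.0  y", "c  3.0  z"]
remaining, killed = kill_rectangle(lines, 0, 3, 2, 8)
```

Here killed holds the middle column (with its trailing spaces) and remaining holds the rows with that column gone.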
Also useful today was Python's eval builtin. The on-line multiple regression software I mentioned outputs a function that looks like: 1.5 V1[t] + 7.2 V2[t] + 1.1 (V2[t])^2. I wrote a little script that takes the copy-pasted function, uses regexps to insert the implied multiplication signs, Pythonises the hat powers into ** and then calls eval() to calculate the result.
The trick is to make a list for each variable (V1 and V2 in this example) so that the subscripts work: once t is set to a row number, V1[t] is just an ordinary list lookup. Emacs comes in handy again here - cut the rectangles from the data file and then regexp-replace the line breaks into commas.
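The script described above isn't reproduced here, but a minimal sketch along the same lines might look like this. The function names, the regexp and the sample numbers are all assumptions for illustration, and eval() should only ever be fed text you trust:

```python
import re


def pythonise(formula):
    """Rewrite the regression app's output as a valid Python expression."""
    # Hat powers become Python's ** operator.
    expr = formula.replace("^", "**")
    # Implied multiplication: a digit, then whitespace, then a variable
    # name or an opening parenthesis gets an explicit * in between.
    expr = re.sub(r"(\d)\s+(?=[A-Za-z(])", r"\1*", expr)
    return expr


def evaluate(formula, columns):
    """Evaluate the formula once per row; columns maps names to lists."""
    code = compile(pythonise(formula), "<formula>", "eval")
    n_rows = len(next(iter(columns.values())))
    results = []
    for t in range(n_rows):
        # Each variable is a list, so V1[t] is a plain subscript lookup.
        scope = dict(columns, t=t)
        results.append(eval(code, {"__builtins__": {}}, scope))
    return results


formula = "1.5 V1[t] + 7.2 V2[t] + 1.1 (V2[t])^2"
# Made-up data; in practice these lists are the pasted columns.
columns = {"V1": [1.0, 2.0], "V2": [2.0, 3.0]}
predictions = evaluate(formula, columns)
```

Compiling the expression once and re-using it for every row keeps the regexp work out of the loop; the empty __builtins__ dict is only a small guard against typos, not real sandboxing.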