Utility Program for Checking SAS Logs

In the interest of moving repeated tasks in research from the realm of the human to that of the machine, I’ve written a script that checks SAS logs for common errors and prints a summary. Finding useful information about errors in a SAS log has always felt to me like finding a needle in a haystack. The short program is written in Python, and I’ve named it checklog.

The name of the log file to check may be supplied as an optional argument; if no argument is supplied, the program searches the current directory for the most recently written SAS log (i.e., a file ending in “.log”).

The major value added by this program is a list I’ve compiled of SAS problems that show up in logs and seem important but don’t stop the SAS script from executing: things like creating a dataset that is empty or trying to create a variable that already exists, neither of which generally raises an ERROR in the log. The program strips out or summarizes WARNINGs and NOTEs I’ve identified as unimportant (e.g., notes that tables have been created) and prints any unexpected ones for human review. If the same error, note, or warning appears many times, the program prints the message once along with how many times it appeared, saving you from searching through the log only to find the same message repeated throughout. I’ve been using it for several months, and at a glance it generally tells me what I need to know to decide whether a program ran as expected.

After printing information about potential problems in the log, the program prompts for a y/n response to open the raw log in vi for further inspection, so if you’re used to opening your logs in vi to read them, this program gives you a lot of quick extra information at almost no cost.
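For a feel for how little code the core idea takes, here is a minimal sketch of the approach (this is not the actual checklog script; the particular SAS messages it matches and the output format are just illustrative assumptions):

import glob
import os
import re
import sys
from collections import Counter

# Use the file named on the command line, else the most recently written .log
if len(sys.argv) > 1:
    logfile = sys.argv[1]
else:
    logfile = max(glob.glob('*.log'), key=os.path.getmtime)

# Messages worth flagging: ERRORs, plus "quiet" problems that SAS reports
# only as NOTEs or WARNINGs (the real list is longer; these are examples)
patterns = ['^ERROR', '^WARNING', 'has 0 observations', 'already exists']

counts = Counter()
with open(logfile) as f:
    for line in f:
        if any(re.search(p, line) for p in patterns):
            counts[line.strip()] += 1

# Print each distinct message once, with the number of times it appeared
for message, n in counts.most_common():
    print('%6d  %s' % (n, message))

if input('Open the raw log in vi? (y/n) ').strip().lower().startswith('y'):
    os.system('vi ' + logfile)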

Update May 2016. Running the command checklog.py -help prints the following:

checklog.py [file]

file: If supplied, open and check this file for errors or important warnings.
If no file is supplied, then the most recent file in the current directory
is used.

After a check is run, a prompt will appear asking whether to open the log in vi.

Allowed responses to this prompt and their results are:
yes | y                opens the log in vi

no | n | :q | ctrl+D   closes this program without opening the log

cat                    will print the log then exit (equivalent to "cat ")

tail | t               will print the log to the screen as it is being
                       written to by SAS (equivalent to "tail -f ")

A convenient way to use this program is to alias it. To do this, Alex
puts the following line in ~/.bashrc:
alias checklog="python /bod/soi2/econ2/ambell/checklog.py "

Then, he just types the command checklog to call this program in the current
directory.

textme.py: Have important emails come in as text messages (for free!)

It’s sometimes more relaxing to have emails from certain people forwarded to your cell phone as text messages, rather than compulsively checking your email or being notified by the stupid iPhone new-mail sound of every spam message you get.

I’ve uploaded a Python script here that does just that. It’s in zip format because I’ve bundled the necessary Python dependencies with it (twilio-python, six, and httplib2), since installing them yourself in Python can sometimes be tricky. If you already have these or prefer to install them yourself, you can download the Python script by itself here.

This script logs in to your Gmail account and checks for new email every 10 seconds. If the From field of a new email matches a pattern you specify, it sends you a text message with the sender, subject, send time, and the first 100 characters of the email.
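If it helps to see the shape of the loop before downloading anything, here is a stripped-down sketch of the idea. It uses imaplib and the current twilio client rather than the bundled dependencies, the credentials and phone numbers are placeholders, and Gmail may require an app-specific password for this kind of login; treat it as an illustration, not the script itself.

import re
import time
import email
import imaplib
from twilio.rest import Client

# Placeholders; the real script reads these from the settings listed below
USERNAME, PASSWORD = 'you@gmail.com', 'your gmail password'
PATTERN = 'Bob B|Jane J'
CELL, TWILIONUM = '+12038675309', '+15005550006'
client = Client('ACCOUNT_SID', 'AUTH_TOKEN')

while True:
    # Ask Gmail, over IMAP, for messages that have not been read yet
    mail = imaplib.IMAP4_SSL('imap.gmail.com')
    mail.login(USERNAME, PASSWORD)
    mail.select('INBOX')
    _, data = mail.search(None, 'UNSEEN')
    for num in data[0].split():
        _, msg_data = mail.fetch(num, '(RFC822)')  # fetching marks it as read
        msg = email.message_from_bytes(msg_data[0][1])
        # Only forward it if the From field matches the pattern
        if re.search(PATTERN, msg.get('From', '')):
            snippet = (msg.get_payload(decode=True) or b'')[:100]
            client.messages.create(
                to=CELL, from_=TWILIONUM,
                body='%s | %s | %s | %s' % (msg['From'], msg['Subject'],
                                            msg['Date'],
                                            snippet.decode(errors='ignore')))
    mail.logout()
    time.sleep(10)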

How to make it work ASAP (note: requires making a free account on a web texting service, but it’s quick, free, and better yet hip):

  1. Download the script and dependencies I referenced above.
  2. Sign up for a free trial account at https://www.twilio.com. I think you may have to provide a credit card number, but I’ve had an account there for a while, and they seem like a respectable company (they’ve never charged me). Of course, you may decide you really like this and want to pay them a few cents per text message to remove the note on each message that it was sent from a Twilio trial account (it doesn’t bother me).
  3. Fill in the following lines at the top of the script. The first three come from your Twilio account; the next two are the Gmail login info used to check your inbox every 10 seconds; then put in the cell number you’d like the free text messages sent to and the regular-expression pattern the From field must match for a message to be forwarded. [Tip: to forward messages from Bob B and Jane J, use pattern = "Bob B|Jane J"; the middle character is a pipe, found above the return key, and means “or”.]

ACCOUNT_SID = "your twilio id"
AUTH_TOKEN = "your twilio token"
twilionum = "your twilio phone number"
username = "your gmail username"
password = "your gmail password"
cellnumber = "the number you want to receive text messages at, eg 2038675309"
pattern = "regular expression (aka pattern, eg a name) to search the from field"

  4. Launch the script by changing to the directory where textme.py is and typing python textme.py. It will keep running and sending you text messages until the process is terminated and/or the computer goes to sleep or restarts, so it’s probably easiest to have it running somewhere on a server that you know won’t restart for a while.
  5. I hope it’s helpful! As always, email with questions or comments.

Thesis turned in!

It’s been a hectic few weeks, but it felt great to finally hand in my thesis this afternoon. I’ve put up a page with the abstract of my paper and a graphical walk-through of my findings. The paper itself is available here. I’m especially grateful to my thesis advisor, Oded Galor, for so many conversations and comments. I’m also very appreciative of many friends for helpful discussions along the way about my findings (and particularly of Chris and Christine for comments this week!).

I’ll be presenting my thesis to the Economics department May 1 (anyone is welcome to attend). I’ll also be making a less technical presentation at Theories in Action on Sunday, April 28.

Submarine Patents from a 21st Century Vantage Point: Honors Thesis Proposal

I am writing an interdisciplinary senior thesis at Brown spanning the fields of computer science and economics. The subject is submarine patents.

A submarine patent is a patent whose prosecution review at the US Patent & Trademark Office is purposefully prolonged by the applicant, in the hopes of “emerging” some years later with a patent on what has become a fundamental technology, extracting licensing fees from businesses who have already built upon this technology without knowledge of the patent’s filing.

Two reforms made submarine patents much less worthwhile to pursue. While patent terms used to be determined from issue date, starting in June of 1995, all new patent filings would receive terms from date of filing — the fact that the clock was ticking during prosecution made stalling at this stage much less desirable. A further reform came in November of 2000, when the USPTO announced that most patent applications would be published to the world 18 months after filing. Given the changes in patent term structure and the lifting of the veil of secrecy surrounding patent applications, the ability for inventors to unexpectedly corner a market long after filing their invention has been effectively eliminated.

Despite the closure of these loopholes, submarine patents continue to issue. When one examines all patents issued over the past several decades, a single anomaly stands out: many applicants self-sorted to file just before the loophole closed. A tremendous number of patent applications were filed in those final days and weeks, and we now see that these were no ordinary applications. In fact, applications filed in those few weeks represent the most pronounced spike in average pendency in modern history. Identifying submarine patents as those filed just prior to this discontinuity offers a unique vantage point from which to study the motives for and outcomes of submarine patents.

First, I have downloaded the full text of every patent granted in the past three decades.

I transformed these documents into roughly the following relations (number of tuples in parentheses):

  • Basic bibliographic info — one line for each patent grant (4.8M) (dta sample)
  • Assignee: name, address, etc. (for all assignees of patent) (4.3M) (dta sample)
  • Inventor: name, address, etc. (for all inventors of patent) (11M) (dta sample)
  • References to other patents (in the US and abroad) (55M) (dta sample)
  • References to “non-patent literature” (papers, brochures, etc.) (16M) (dta sample)
  • Parents (1.8M) (dta sample)
  • Fields searched by the examiner for prior art (11M) (dta sample)

Presentation to Brown’s Economics Honors Thesis Class Nov. 20, 2012 (PDF):

A Python Tutorial for Economists

It’s been a few months since I’ve posted here; blogging was a bit taboo this summer at work (though it turns out I found plenty of other ways to raise red flags for the Cyber Security team using just Python + the Interwebs).

Working in an office with several other research assistants who were proficient with some statistical scripting languages (Stata, SAS), I began to think there’s probably a niche for a more general-purpose language in academic social science research (as well as in automating some of the tasks involved with casework around the office). I was already using Python in much of my work. What started out as a few trips to coworkers’ desks to help them write this or that script quickly turned into a few pages of notes, and that turned into some thirty pages of charts, explanations, and instructional tasks. (I must note, the final formatting was inspired by the style of my linear algebra lecture notes from last semester.)

I presented a version of Python for Economists to some coworkers at the FTC Bureau of Economics in July. I’ve taken three different college classes that taught Python from scratch, but I had never seen Python taught in a way that seemed right for students already familiar with statistical scripting languages such as Stata. I focus on two broad applications of Python I’ve found very useful in social science research: web scraping and textual processing (including regular expressions).

Downloads:

  • PDF of the booklet (34 pages, colored Python syntax highlighting)
  • Zipped supporting materials used in the exercises

What is on Google Patents?

I’m a bit disappointed now that I’m finally going through the data I downloaded from Google Patents throughout the semester. It doesn’t seem like it will be very useful for looking at patent trends prior to 2000. It’s unclear what sampling of patent applications they’re actually providing, and I wish they were more transparent about their coverage.

Global Interplays of Values, Wealth, and Geography

I wrote a research paper for my course in Geographic Information Systems (GIS). I had a blast writing it, and there was plenty of Stata and ArcGIS play to be done. I’m posting a download link here, in case anyone is as excited about this kind of stuff as I am.

Summary:

  • It’s true that distance to the equator is a good predictor of wealth: it explains 28% of the variance in per-capita GDP, which was a satisfying and highly significant result.
  • We can do similar studies with measures of cultural values. It seems that many values also change similarly with geography.
  • Using four fairly arbitrarily selected values as regressors, I can explain about two thirds of the variance in per-capita income. This was really surprising to me, and it seems like there’s a lot of similar work to be done here relating maps and cultural values.

Weaker Consistency Models

Here are slides I created for my databases class. The theme is weaker consistency models, and they cover the following papers:

  1. Werner Vogels, Eventually Consistent, Communications of the ACM, 2009
  2. J. Baker, et al., Megastore: Providing Scalable, Highly Available Storage For Interactive Services, CIDR, 2011
  3. Pat Helland, Life Beyond Distributed Transactions: An Apostate’s Opinion, CIDR, 2007

Slides in:

I’ve reached Google Images Fame

A search for "Christina Paxson"A friend just pointed this out to me. It’s quite touching, actually. If only I can find a way to jump ahead of that female inmate, I’ll be all set.

Have your computer email you when an action completes

I can’t imagine doing distributed or large batch work without an easy-to-use script that emails me once a command terminates. Before I had one, I went a little crazy checking in all the time to see whether my process was done or an error had occurred while the computer sat idle. I’ve called the script notify (linked here, or pasted at the end), and you can call it from the shell right after other commands.

I issue it in the terminal like this:
$ python ~/reallylongscript.py; notify
The semicolon separates the two commands, so as soon as the first one terminates (whether it terminates with success or with an error), the notification script is called.

You’ll get an email with a body like:

New notification from brunonia.

Where “brunonia” is the name of my computer (this script will determine the computer’s name automatically, which is very helpful when you need to know which of several computers needs attention!).

It’s been really useful for:

  • Running a long action on a remote machine, even if it’s as simple as deleting all the files in a large directory to free up space, so I don’t have to keep checking to see when it’s done.
  • Running a command on several machines over ssh (usually with screen in zombie mode) to let me know if a process fails and needs to be investigated and/or restarted. (The only downside for me so far is that if the filesystem croaks or some similar problem affects all nodes at once, my inbox needs a bit of clean-up work!).

This is where I got the base code, and they have a few other ideas on how to use it. This strategy can easily be integrated into existing scripts to keep you in the loop as they run. I suppose one could also write a wrapper script using Python’s os.system() method (or one of its newer equivalents) and the sys.argv list, taking the commands to run as parameters and sending the email after all of them have finished; a rough sketch of that idea follows the script below.

#!/usr/bin/python
# notify: email myself when a command finishes.

import smtplib, platform

# The machine's hostname, so the email says which computer it came from.
me = platform.node()

fromaddr = 'myusername@gmail.com'
toaddrs  = 'myusername@gmail.com'

# A minimal message: headers, a blank line, then the one-line body.
msg = ('From: %s\r\nTo: %s\r\nSubject: New notification from %s\r\n\r\n'
       'New notification from %s.' % (fromaddr, toaddrs, me, me))

# Gmail credentials (an app-specific password may be required)
username = 'myusername'
password = 'mypassword'

# Connect to Gmail's SMTP server, switch to TLS, log in, and send
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login(username, password)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
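A minimal sketch of that wrapper idea, with the same placeholder credentials (the file name notifyrun.py and everything else about it are hypothetical), might look like this:

#!/usr/bin/python
# notifyrun.py: each command-line argument is a command; run them one after
# another, then send the notification email.

import os
import sys
import smtplib
import platform

for command in sys.argv[1:]:
    os.system(command)

me = platform.node()
fromaddr = toaddrs = 'myusername@gmail.com'
msg = ('From: %s\r\nTo: %s\r\nSubject: New notification from %s\r\n\r\n'
       'All commands have finished on %s.' % (fromaddr, toaddrs, me, me))

server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login('myusername', 'mypassword')
server.sendmail(fromaddr, toaddrs, msg)
server.quit()

It would be invoked as, say, python notifyrun.py "python ~/reallylongscript.py" "gzip bigfile", with each quoted argument executed in turn.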