STA2453F08: Statistical Consulting

www.utstat.toronto.edu/~brunner/2453f08


Last Class Meeting: Tuesday April 7th at 10 a.m. in Ramsey Wright 141

 

 

 

Logistic Regression Examples:


proc logistic order=internal descending; /* Always use descending for 0-1 DV */
     title2 'Logistic regression on perjury vote';
     model perjury = ritewing cpercent next0 next2 firsterm;
     nextelec: test next0=next2=0;
     others:   test cpercent=next0=next2=firsterm=0; /* Ctrl for ritewing */
     allvars:  test ritewing=cpercent=next0=next2=firsterm=0;
     /* Just for comparison with Testing Global Null Hypothesis: BETA=0 */

proc logistic order=internal descending; /* Always use descending for 0-1 DV */
     title2 'SAS will make your dummy variables';
     class nextelec;
     model perjury = ritewing cpercent nextelec firsterm;

SAS code for the Principal Components analysis of the Walker Music data is now in the directory containing the homework data sets. Look Here.

 


Location and Time

When we get together this term, it will be Tuesday between 10:10 and noon in Ramsey Wright 141.

Instructors (in alphabetical order):

Name (Role)                                     Email                        Phone          Office
Jerry Brunner (Professor)                       brunner@utstat.utoronto.ca   416-978-7589   SS 6026E
Laurel Duquette (Consulting Service Director)   consult@utstat.utoronto.ca   416-978-4455   SS 3112

Course Plan

Assignments

Handouts

Computer Resources


Printing files at home

A convenient way to get a file to your home computer for printing is to email it to yourself. Try
       mail yourname@yourisp.com < fname
where yourname@yourisp.com is your email address and fname is the name of the file, like hw3.lst.

SSH (Secure Shell)

For security reasons, you need to connect using software that probably did not come with your computer. The protocol is SSH, which stands for "Secure SHell." When you use SSH, information travels over the Internet in encrypted form, so hackers have trouble intercepting your password and other information. You can download a free copy of SSH below.

With an Internet connection, SSH applications give you a text-only connection to utstat and other unix machines from your home computer. From utstat's prompt, you can run programs such as SAS, R and emacs.

Different SSH programs are recommended, depending on the operating system that you are using. To use these programs, you must be connected to the Internet, say with a broadband connection or via PPP over your phone line.

In any of these SSH programs, the first time you connect to a host, you will be told that the program can't verify that this host is really what it appears to be. Do you want to trust it? SSH is just being sanely paranoid. Say yes.

Copy-paste in PuTTY

Suppose you want to transfer fairly small amounts of text between the unix machine and your PC. In a normal Windows application like Explorer or Word, the edit menu has Copy and Paste items -- or you can use Control-C and Control-V. But PuTTY has no menus, and Control-C and Control-V don't do what you might expect, especially if emacs is running. But you can still copy-paste; here's how:

Importing data from Excel spreadsheets

Most clients seem to record and keep their data in Microsoft Excel spreadsheets. But on unix machines, SAS likes plain text data files. Transferring the data can be a pain, because even if you save the spreadsheet as plain text, SAS will choke on the tab characters, and also the conventions for line breaks differ in Windows and unix/linux. To overcome this minor technical nightmare, proceed as follows.

  1. Save the spreadsheet as comma-delimited text (.csv). Open the file in Word, and save as text with (DOS) line breaks. Word for the Mac calls it "Text Only with Line Breaks (MS-DOS)." In Word 2007 for Windows, I saved the file as plain text and then clicked two radio buttons: one for MS-DOS and another for Insert Line Breaks.
  2. Transfer the data to the unix computer. The full version of PuTTY has an SFTP (Secure File Transfer Protocol) tool called PSFTP. When you start this up, you get a text-only window with a unix-like prompt. To connect to utstat, I typed open brunner@utstat.toronto.edu, gave my password, and then at the prompt typed put, then space, and then dragged the icon of the plain text file to the PSFTP screen. This produced a correct pathname -- the full name and location of the file, which happened to be on the Desktop -- on the PSFTP screen. Then I pressed Enter and the file was transferred.
  3. Once your data are on the unix machine (say with the filename name1.txt), type something like this at the unix prompt:    dos2unix < name1.txt > name2.data     to convert the Windows line breaks to unix line breaks. The result is in a new file called name2.data. SAS can deal with it. The .data part of the file name is arbitrary, but I find it useful.
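
It can be reassuring to check that step 3 really removed the Windows line breaks before handing the file to SAS. The following is only a sketch under stated assumptions: it builds a tiny made-up file with Windows line breaks in a scratch directory, strips the carriage returns with tr (a stand-in for dos2unix, in case dos2unix is not installed on your machine), and counts the carriage returns before and after. The helper name count_cr and the sample data are invented for illustration.

```shell
#!/bin/sh
# Sketch: convert Windows (CRLF) line breaks to unix (LF) line breaks
# and verify the result. name1.txt and name2.data are the illustrative
# names from step 3; they live in a scratch directory here so that
# nothing real is overwritten.
workdir=$(mktemp -d)
printf 'id,score\r\n1,85\r\n2,90\r\n' > "$workdir/name1.txt"   # made-up Windows file

# Count the carriage returns in a file (0 means unix line breaks).
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# tr -d '\r' deletes every carriage return. For a text file whose only
# carriage returns sit at the ends of lines, this has the same effect as
#     dos2unix < name1.txt > name2.data
tr -d '\r' < "$workdir/name1.txt" > "$workdir/name2.data"

before=$(count_cr "$workdir/name1.txt")
after=$(count_cr "$workdir/name2.data")
echo "carriage returns before: $before, after: $after"
```

If the count after conversion is not zero, the file still has Windows line breaks somewhere and is worth inspecting before SAS reads it.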

This process is not pleasant, but there is one nice thing to report. The delimiter=',' option on the SAS infile statement will allow you to read your comma-delimited data directly without any more editing. I tried this and it works. My infile statement was

infile 'name2.data' delimiter=',';

For smaller data sets, it also seems reasonable that you could open the .csv file in Word, and then just copy-paste the whole thing into a PuTTY window where emacs is running. Would you still have to convert the line breaks in this case? I imagine so, but I haven't tried it.

Warning: It is very natural to leave missing data cells empty in an Excel spreadsheet, but if you do this and then export the data as described here, the data file will contain two consecutive commas, which SAS will treat as a single comma; the results are usually disastrous. SAS is being sensible in a way. This is just how it treats spaces. Two spaces are the same as one space unless it is reading the data using a fixed format.

So, if you are reading a raw data file consisting of comma-delimited text, it is important to make sure you never have two consecutive commas. If missing data are to appear blank in the spreadsheet, the best way to avoid this is to make sure each such cell contains an actual blank space (press the space bar) and is not completely empty. SAS treats blank space between two commas as a missing value. One space or several -- it does not matter. The result is still a single missing value.
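
If the spreadsheet has already been exported with empty cells, the blanks can also be put in afterwards at the unix prompt. What follows is a hedged sketch, not a tested recipe for every data set: it uses awk to replace each empty comma-delimited field with a single space. It assumes that no field contains a quoted comma, and the function name fill_empty_fields, the file name name3.data, and the sample data are all made up for illustration. It only rewrites the text; how SAS reads the result (especially an empty field at the very end of a line) should still be checked, for example with proc print.

```shell
#!/bin/sh
# Sketch: put a single blank into every empty comma-delimited field,
# so the file never contains two consecutive commas.
# Assumes no field contains a quoted comma; names are illustrative.
fill_empty_fields() {
    awk 'BEGIN { FS = OFS = "," }
         { for (i = 1; i <= NF; i++) if ($i == "") $i = " "; print }' "$1"
}

# Demonstration on a made-up file in a scratch directory.
workdir=$(mktemp -d)
printf 'id,age,score\n1,,85\n2,30,\n3,,\n' > "$workdir/name2.data"
fill_empty_fields "$workdir/name2.data" > "$workdir/name3.data"
cat "$workdir/name3.data"
```

On your own data, the call would look like fill_empty_fields name2.data > name3.data, with name3.data then named on the infile statement.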

Getting rid of those mysterious files with a tilde (~)

When you start editing files with emacs, you will notice that additional files ending with a tilde (~) keep turning up in your directory. These are backup files, automatically created by emacs for your protection. I suppose they might be useful sometimes, but I find them annoying. If you get tired of deleting them, use emacs to create a file called .emacs in your home directory. This is an initialization file used to set options for emacs. Beginning it with a period makes it invisible to the ls command (but try ls -a). However, it's still there if you create it, and you can edit it like any other file. In the .emacs file, put a single line saying (setq make-backup-files nil). Don't forget the parentheses! Exit, saving the file. Next time you run emacs, no backup file will be created. Needless to say, your .emacs file can be very long and do a lot if you wish.
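
The same setting can be added from the unix prompt instead of from inside emacs. The following is a minimal sketch, assuming a standard sh: the function names disable_emacs_backups and list_backups, and the optional file argument, are invented for illustration. The demonstration runs in a scratch directory so that your real .emacs file is not touched.

```shell
#!/bin/sh
# Sketch: turn off emacs backup files and list the ones already present.
# Function names and the optional arguments are illustrative.

# Append the setting to an init file (default: ~/.emacs) unless it is
# already there.
disable_emacs_backups() {
    init_file=${1:-"$HOME/.emacs"}
    setting='(setq make-backup-files nil)'
    touch "$init_file"
    grep -qF "$setting" "$init_file" || printf '%s\n' "$setting" >> "$init_file"
}

# List backup files (names ending in ~) under a directory, without
# deleting anything.
list_backups() {
    find "${1:-.}" -name '*~' -type f
}

# Demonstration on a scratch directory with a made-up backup file.
demo=$(mktemp -d)
touch "$demo/hw3.sas" "$demo/hw3.sas~"
disable_emacs_backups "$demo/.emacs"
disable_emacs_backups "$demo/.emacs"   # second call adds nothing
cat "$demo/.emacs"
list_backups "$demo"
```

To apply it for real, call disable_emacs_backups with no argument. After looking over the output of list_backups, the old backup files in the current directory can be removed with something like rm -i -- *~, which asks before deleting each one.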