A brief introduction – SAS, a Statistical Packaged Language
SAS originally stood for “Statistical Analysis System” and grew out of work at North Carolina State University in the late 1960s and 1970s. It is a productive and reliable tool to turn to when you are doing data mining and statistical learning. Compared with mainstream general-purpose programming languages such as C++, C# or Java, SAS is much more specialized. It also differs from R and MATLAB, which sit somewhere between the general-purpose languages and SAS. SAS is categorized as a statistical packaged language, while R is more of a semi-statistical language.
I will leave scripting languages (e.g. Python and Perl) and markup or declarative languages (e.g. XML and HTML) out of this comparison. SAS can in fact act somewhat like a scripting language to integrate with other applications, and I will touch on that briefly later.
These differences make SAS more productive and user-friendly, because you don’t have to worry about implementing data structures yourself. The most commonly used data structures in SAS are the table (strictly speaking it is processed as a vector; I’ll skip the details here), the array and the hash map (somewhat like a named list in R or a dictionary in Python, where you retrieve a value by its key). You can also work with matrices in SAS through the SAS/IML module, but that is a more advanced topic.
SAS Modules – Base & Advanced SAS Are Different from SAS/STAT
I circled the first sub-square, data manipulation, because this is the module you are tested and certified on. It deals mainly with data manipulation, namely arranging your raw data into a structured table or data set that can later be analyzed by procedures in the SAS/STAT module.
If you can spare me your attention here, I would earnestly like to clear up a common misunderstanding, in case you have attended or are attending courses such as Database Marketing, SAS Modeling and Marketing Analytics with SAS. What you learn in these courses (e.g. PROC REG, PROC LOGISTIC, PROC ANOVA, PROC GLM, PROC FACTOR, PROC CLUSTER, PROC FASTCLUS, PROC DISCRIM) are all procedures housed under the SAS/STAT module, and YOU ARE NOT GOING TO BE TESTED ON THESE TOPICS for your SAS Base or Advanced certificates. SAS/STAT has its own separate certificate. The SAS tests people usually talk about are the ones on Base SAS and Advanced SAS.
Topics for Base SAS include:
- DATA step.
- Basic procedures (PROC FREQ, PROC SORT, PROC MEANS, PROC SUMMARY and PROC TABULATE).
- Branching & Conditional Processing.
Topics for Advanced SAS include 4 parts:
- PROC SQL.
- SAS Macro.
- SAS Optimization.
- Some advanced data manipulation topics.
To conclude, I would put it this way: the SAS/STAT procedures and the mathematical concepts and algorithms you learn in Database Marketing, SAS Modeling, Marketing Analytics with SAS, BI and ABI (BI and ABI are courses offered by the MIS Department) are popular methodologies for revealing the story hidden deep in the data. You will see these methodologies applied routinely in everyday analysis and innovatively in ad hoc projects. However, taking these courses and studying SAS/STAT procedures does not spare you the time and effort you need to invest in studying the SAS Base and Advanced topics. And the green Little SAS Book is by no means enough.
I would like to take a detour here and show you some other SAS modules before I explain why the Little SAS Book is not enough and before I recommend some study materials. I want to give you a bird’s-eye view of SAS’s stunning capabilities and lay out the road map in front of you, so that you know what to study after you are certified and where to find help when you run into problems as a data analyst (usually, problems can be solved within SAS).
SAS/IML and Integrating with R
Below is a simple example I took from the SAS documentation website. It is simple, but it shows SAS’s capability of working with matrices and, more importantly, that by using “proc iml; submit /R; …block of code…; endsubmit;” SAS can interact with R code and call R functions.
X statement, DDE and Integrating with Excel VBA
The above screenshot shows how the X statement in SAS sends commands to the Windows OS, and how you pass Excel VBA code from SAS to Excel. The code within square brackets is actually Excel VBA code. (For those who don’t know Excel VBA: VBA is a language based on Microsoft Visual Basic that lets you write macros and develop applications in MS Excel, Access and Word.)
Without further ado, I will end my short detour on SAS/IML and DDE here and come back to the study materials and the application process for the SAS Base and Advanced certificate tests.
The Little SAS Book, Fifth Edition
The Little SAS Book is a very handy pocket reference. If I may use a metaphor, learning SAS is like learning a foreign language, say Japanese, and the Little SAS Book is like a pocket phrase book that lets you quickly learn how to bargain with souvenir vendors in Nagoya for a better deal. That is not the same as mastering the language and understanding the culture. Professors recommend the Little SAS Book because they want you to have an easier start. They are reluctant to put you through two steep learning curves at once, one for SAS and one for statistics. When they are trying to explain how two Weibull distributions can give you a logistic distribution, or the idea behind random effect and random coefficient models, they don’t want you to be struggling at the same time with SAS problems, such as how the way you name your array elements (using new names or existing names) gives you two totally different PDVs (Program Data Vectors). If this is the first time you have run into the acronym PDV, read on and don’t panic. I will tell you later where you can find a detailed explanation of the SAS PDV. In conclusion, don’t misread the Professors’ goodwill and take the Little SAS Book as your SAS Bible.
SAS Certification Prep Guide: Base Programming for SAS 9, Third Edition
http://goo.gl/YdOKKB (Click here to check it out on Amazon).
SAS Certification Prep Guide: Advanced Programming for SAS 9, Third Edition
http://goo.gl/OXOqXK (Click here to check it out on Amazon).
The Base guide has 808 pages and the Advanced guide 1040 pages; they cost $110.29 and $93.21, respectively. That is rather expensive, but if you search for these titles followed by “PDF” online, it is very easy to find free PDF downloads of the previous (second) edition.
For the Base guide, I don’t recommend reading the previous edition, because the changes are noticeable. I recommend going to our university’s Eugene McDermott Library and reading it online. You can either take notes or, if you find note-taking too time-consuming, take screenshots of the important sentences or paragraphs and paste them into Microsoft PowerPoint or OneNote for later reference.
For the Advanced guide, download the previous edition. The two editions are much the same, and once it is saved on your local drive you can quickly revisit it whenever your memory fails you.
Why You Have to Read the Prep Guides
The Base Prep Guide Teaches You How to VISUALIZE Data Manipulation in SAS
& The Advanced Prep Guide Is One of the Best Books to Learn SQL
I don’t want to fully expand on this topic here, because no matter how hard I try to explain the PDV and PROC SQL, this tiny blog does not have enough space for the full story. You have to read the books, and that is why the two books run to roughly 2000 pages in total.
The PDV (Program Data Vector) is a conceptual data structure in SAS. You won’t see it directly, because it lives inside SAS as a black box, but the Base Prep Guide draws it for you and shows how memory is allocated for each variable at compile time, how each variable is initialized, and how the RETAIN statement changes the way values in the PDV are initialized and updated. After reading the relevant sections of the guide, you will know how SAS reads the data in and populates the PDV row by row rather than reading the entire table or data set at once. The PDV is a single row (that is why it is called a vector rather than a matrix). Once you understand the PDV, you will really understand how the pointers work in a MERGE data step and how previously read values are overwritten or reset, and also why you have to explicitly OUTPUT the data in the PDV inside a loop. You will also see how different the PDV is when you specify MULTIPLE data sets in a SINGLE SET statement instead of MULTIPLE SET statements with only ONE data set in EACH. And you will see that giving array elements new names or existing names results in distinctly different PDVs.
The point of understanding the PDV is that only by doing so can you fully understand your data structure, which in turn becomes crucial when you move on to more complicated analysis.
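To make the PDV a little more concrete, here is a toy model written in Python rather than SAS. It is only a sketch of the row-by-row behaviour described above, not how SAS is actually implemented, and the variable names (amount, flag, total) and the function run_data_step are made up for illustration. It imitates a DATA step that reads one observation at a time with SET, keeps a running total with "RETAIN total 0", and sets a flag by plain assignment.

```python
MISSING = None  # stands in for a SAS missing value


def run_data_step(rows):
    """Toy model of a DATA step: a single PDV that is refilled once per input row."""
    # "Compile time" in this model: the PDV gets one slot per variable.
    # total is retained and initialized once, as with RETAIN total 0.
    pdv = {"amount": MISSING, "flag": MISSING, "total": 0}
    retained = {"total"}
    read_from_input = {"amount"}  # variables that come from the SET data set
    output = []

    for row in rows:
        # Top of each iteration: variables created by assignment are reset
        # to missing unless they are retained.
        for name in pdv:
            if name not in retained and name not in read_from_input:
                pdv[name] = MISSING

        pdv.update(row)                              # SET: read one observation
        pdv["total"] = pdv["total"] + pdv["amount"]  # survives across rows
        if pdv["amount"] > 100:
            pdv["flag"] = 1                          # does not carry over

        output.append(dict(pdv))                     # implicit OUTPUT at the end

    return output


rows = [{"amount": 50}, {"amount": 150}, {"amount": 20}]
result = run_data_step(rows)
for record in result:
    print(record)
```

Notice that total keeps accumulating because it is retained, while flag goes back to missing on the third row because it was only assigned, not retained. That is the kind of picture the Base Prep Guide draws for you, in far more detail.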
The first part of the Advanced Prep Guide is all about SQL. Although SAS extends SQL in PROC SQL, the SQL “SELECT… FROM… WHERE… GROUP BY… HAVING… ORDER BY…” block still works in the same way. The Advanced Prep Guide does a good job of breaking SQL down and teaching you the order in which each clause of a SQL statement is evaluated: how the Cartesian product is first produced from the tables in FROM, and how unqualified rows are then eliminated according to the subsetting criteria in the WHERE clause. It also covers subqueries and in-line views in the FROM clause. The mechanism of how SQL works is really important, because our first priority is the logic, not the syntax.
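If you want to see that logical order for yourself before you open the book, here is a small sketch. It uses Python’s built-in sqlite3 module instead of PROC SQL, so the dialect is SQLite’s and not SAS’s, and the tables customers and orders and their contents are invented for the example. Keep in mind that this is the logical model only; a real SQL engine is free to optimize and usually never builds the full Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 30), (1, 70), (2, 20);
""")

# FROM with two tables and no WHERE: every customer row is paired with
# every order row (2 x 3 = 6 rows), i.e. the Cartesian product.
cartesian = conn.execute("SELECT * FROM customers, orders").fetchall()

# WHERE then throws away the pairings that fail the condition.
matched = conn.execute(
    "SELECT c.name, o.amount "
    "FROM customers c, orders o "
    "WHERE c.id = o.customer_id"
).fetchall()

# GROUP BY collapses the surviving rows, HAVING filters the groups,
# and ORDER BY sorts whatever is left.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM customers c, orders o "
    "WHERE c.id = o.customer_id "
    "GROUP BY c.name "
    "HAVING SUM(o.amount) > 50 "
    "ORDER BY total DESC"
).fetchall()

conn.close()

print(len(cartesian), "rows in the Cartesian product")
print(len(matched), "rows left after WHERE")
print(totals)
```

Run the three queries one after another and compare the row counts; the drop from the first query to the second is exactly the elimination step that the WHERE clause performs.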
If you study these two Prep Guides dutifully, I assure you it will be very hard to fail the certificate tests.
Exam Sign-Up Walk-through
Follow this link and you will find all the information regarding the test. The navigation bar and the exam tab sections are quite self-explanatory.
For example, in the first exam tab you will find information on the exam time, number of questions, question types and exam topics. It mentions short-answer questions, but in fact they are all multiple-choice questions (at least that was the case when I took the test last year). I assume this is still so, and it significantly lowers the difficulty of the test, so don’t be anxious. Compared with the GRE, GMAT or LSAT, it’s a piece of cake.
If you click on the “Exam Registration” Tab, you would see how to sign up for the test.
www.pearsonvue.com/sas is the website where you sign up for your test; SAS delegates test administration to Pearson. To apply for the student discount, remember to scan your Student ID Card and print out an unofficial transcript from our Galaxy Student Portal. Please see the screenshot underneath.
Once you have electronic PDF files of your scanned Student ID Card and your transcript, email them to firstname.lastname@example.org. Within a day or two you will receive a coupon code, which you can apply when you check out at www.pearsonvue.com/sas. For details on how to apply for the discount, follow this link: http://support.sas.com/certify/faq.html. The coupon gives you a 50% discount on the original $180 test fee, which makes the fee for each Base or Advanced test $90.
You have to pass the Base test before you take the Advanced test. But if you are confident, you could study for both and take one in the morning and the other in the afternoon, because, as with the GMAT, the test result comes out immediately after you finish.
Driving directions to the nearest test location: if you drive out from the main entrance in front of JSOM, turn left and head straight toward Methodist Hospital. Once you see the 7-Eleven store at the intersection just past Methodist Hospital, turn right and the destination is on your left. It is a new test center, probably set up to accommodate UTD students, and it is only a 5-minute drive away.
My humble blog article ends here, but you should stride on and get certified. Getting certified is not the real issue, though; your proficiency in SAS coding and your ability to apply it to real-world problems should always be your primary concern. No pain, no gain. I wish you good luck.
For more information about our MS in Marketing program
———— Jerry Hao, 02/10/2014