As part of our ongoing SAS curriculum, we are currently working through the book "Learning SAS® by Example: A Programmer's Guide" by Ron Cody.
Below are the code and explanations for selected practice problems from Chapters 16-20, which we were given as a learning assignment. The code for each practice problem is summarized below.
The permanent library used for creating the data sets and procedure output is A15007.
libname a15007 "C:\Users\user\Desktop\sasbook\sas_assignment";
Chapter 16
Problems 1 and 2
1. Using the SAS data set College, compute the mean, median, minimum, and
maximum and the number of both missing and non-missing values for the variables
ClassRank and GPA. Report the statistics to two decimal places.
2. Repeat Problem 1, except compute the desired statistics for each combination of
Gender and SchoolSize. Do this twice, once using a BY statement, and once using a
CLASS statement.
Code:
*Data set COLLEGE;
proc format library=a15007;
value $yesno 'Y','1' = 'Yes'
'N','0' = 'No'
' ' = 'Not Given';
value $size 'S' = 'Small'
'M' = 'Medium'
'L' = 'Large'
' ' = 'Missing';
value $gender 'F' = 'Female'
'M' = 'Male'
' ' = 'Not Given';
run;
*Search the A15007 format catalog so the formats above are found when they are used below;
options fmtsearch=(a15007);
data a15007.college;
length StudentID $ 5 Gender SchoolSize $ 1;
do i = 1 to 100;
StudentID = put(round(ranuni(123456)*10000),z5.);
if ranuni(0) lt .4 then Gender = 'M';
else Gender = 'F';
if ranuni(0) lt .3 then SchoolSize = 'S';
else if ranuni(0) lt .7 then SchoolSize = 'M';
else SchoolSize = 'L';
if ranuni(0) lt .2 then Scholarship = 'Y';
else Scholarship = 'N';
GPA = round(rannor(0)*.5 + 3.5,.01);
if GPA gt 4 then GPA = 4;
ClassRank = int(ranuni(0)*60 + 41);
if ranuni(0) lt .1 then call missing(ClassRank);
if ranuni(0) lt .05 then call missing(SchoolSize);
if ranuni(0) lt .05 then call missing(GPA);
output;
end;
format Gender $gender1.
SchoolSize $size.
Scholarship $yesno.;
drop i;
run;
Problem 1
options fmtsearch=(a15007);
title "Statistics on the College Data Set";
proc means data=a15007.college
n
nmiss
mean
median
min
max
maxdec=2;
var ClassRank GPA;
run;
Problem 2
*Sort first because PROC MEANS with a BY statement requires sorted input;
proc sort data=a15007.college out=college;
by Gender SchoolSize;
run;
title "Statistics on the College Data Set - Using BY";
title2 "Broken down by Gender and School Size";
proc means data=college
n
nmiss
mean
median
min
max
maxdec=2;
by Gender SchoolSize;
var ClassRank GPA;
run;
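The problem statement also asks for a version that uses a CLASS statement, which is not shown above. A minimal sketch of that second run follows; it reuses the same statistics and variables, and the title text is an assumption rather than something taken from the book.

```sas
*Hypothetical title for the CLASS version (not from the original post);
title "Statistics on the College Data Set - Using CLASS";
title2 "Broken down by Gender and School Size";
*A CLASS statement does not require the data to be sorted, so the permanent data set is read directly;
proc means data=a15007.college
   n
   nmiss
   mean
   median
   min
   max
   maxdec=2;
   class Gender SchoolSize;
   var ClassRank GPA;
run;
```

One difference to keep in mind: with CLASS, observations that have a missing value for a class variable are left out unless the MISSING option is added to the PROC MEANS statement, whereas with BY the missing values of SchoolSize form their own BY group.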
Output:
Problem 4
Repeat Problem 3 (CLASS statement only), except group small and medium school
sizes together. Do this by writing a new format for SchoolSize (values are S, M, and
L). Do not use any DATA steps.
Code
proc format;
value $groupsize
'S','M' = 'Small and Medium'
'L' = 'Large';
run;
title "Statistics on the College Data Set";
title2 "Broken down by School Size";
proc means data=college
n
mean
median
min
max
maxdec=2;
class SchoolSize;
var ClassRank GPA;
format SchoolSize $groupsize.;
run;
Output
Chapter 17
Problem 2
Using the SAS data set BloodPressure, generate frequencies for the variable Age.
Use a user-defined format to group ages into three categories: 40 and younger, 41 to
60, and 61 and older. Use the appropriate options to omit the cumulative statistics
and percentages.
Code
/*Create the BloodPressure data set*/
data a15007.bloodpressure;
input Gender : $1.
Age
SBP
DBP;
datalines;
M 23 144 90
F 68 110 62
M 55 130 80
F 28 120 70
M 35 142 82
M 45 150 96
F 48 138 88
F 78 132 76
;
proc format;
value agegrp low-40 = '40 and lower'
41-60 = '41 to 60'
61-high = '61 and higher';
run;
title "Using a Format to Regroup Values";
proc freq data=a15007.bloodpressure;
tables age / nocum nopercent;
format age agegrp.;
run;
Output
Problem 6
Using the SAS data set College, produce a three-way table of Gender (page) by
Scholarship (row) by SchoolSize (column).
Code
title "Three-way Tables";
proc freq data=a15007.college;
tables Gender*Scholarship*SchoolSize;
run;
Output
Chapter 18
Problem 2
Produce the following table. Note that the ALL column has been renamed Total.
Demographics from COLLEGE Data Set
Code
title "Demographics from COLLEGE Data Set";
proc tabulate data=a15007.college format=6.;
class Gender Scholarship SchoolSize;
tables SchoolSize all,
Gender Scholarship all/ rts=15;
keylabel n=' '
all = 'Total';
run;
Output
Problem 4
Produce the following table. Note that the keyword ALL has been renamed Total,
Gender is formatted, and ClassRank (a continuous numeric variable) has been
formatted into two groups (0–70 and 71 and higher).
Demographics from COLLEGE Data Set
Code
proc format;
value $gender 'F' = 'Female'
'M' = 'Male';
value rank low-70 = 'Low to 70'
71-high = '71 and higher';
run;
title "Demographics from COLLEGE Data Set";
proc tabulate data=a15007.college format=6.;
class Gender Scholarship ClassRank;
tables Scholarship all,
(ClassRank)*(Gender all) / rts=15;
keylabel n=' '
all = 'Total';
format Gender $gender. ClassRank rank.;
run;
Output
Chapter 19
Problem 2
Run the same two procedures shown in Problem 1, except create a contents file, a
body file, and a frame file.
Code
ods listing close;
ods html body = 'prob19_2_body.html'
contents = 'prob19_2_contents.html'
frame = 'prob19_2_frame.html';
title "Using ODS to Create a Table of Contents";
proc print data=a15007.college(obs=8) noobs;
run;
proc means data=a15007.college n mean maxdec=2;
var GPA ClassRank;
run;
ods html close;
ods listing;
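Because the three file names above are relative, the HTML files are written to whatever the current SAS working directory happens to be. As a hedged variation, the PATH= option of the ODS HTML statement can send them to a known folder; the sketch below assumes the same folder as the A15007 library, which is a choice made here for illustration and not part of the original solution.

```sas
ods listing close;
*PATH= names the output folder (assumed here to be the library folder);
*URL=NONE keeps the links in the frame and contents files relative;
ods html path = 'C:\Users\user\Desktop\sasbook\sas_assignment' (url=none)
         body = 'prob19_2_body.html'
         contents = 'prob19_2_contents.html'
         frame = 'prob19_2_frame.html';
title "Using ODS to Create a Table of Contents";
proc print data=a15007.college(obs=8) noobs;
run;
proc means data=a15007.college n mean maxdec=2;
   var GPA ClassRank;
run;
ods html close;
ods listing;
```

Opening the frame file in a browser should then show the table of contents beside the body output, with all three files sitting together in the one folder.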
Output
Problem 4
Send the results of a PROC PRINT on the data set Survey to an RTF file.
Code
*Data set SURVEY;
data a15007.survey;
infile 'C:\Users\user\Desktop\sasbook\sas_assignment\survey.txt' pad;
input ID : $3.
Gender : $1.
Age
Salary
(Ques1-Ques5)(1.);
run;
ods listing close;
ods rtf file='C:\Users\user\Desktop\sasbook\sas_assignment\prob19_4.rtf';
title "Demonstrating RTF Output";
proc print data=a15007.survey noobs;
run;
ods rtf close;
ods listing;
Output
Chapter 20
Problem 2
Repeat Problem 1, except produce a pie chart instead of a bar chart.
Code
*Data set BICYCLES;
data a15007.bicycles;
input Country & $25.
Model & $14.
Manuf : $10.
Units : 5.
UnitCost : comma8.;
TotalSales = (Units * UnitCost) / 1000;
format UnitCost TotalSales dollar10.;
label TotalSales = "Sales in Thousands"
Manuf = "Manufacturer";
datalines;
USA             Road Bike      Trek        5000  $2,200
USA             Road Bike      Cannondale  2000  $2,100
USA             Mountain Bike  Trek        6000  $1,200
USA             Mountain Bike  Cannondale  4000  $2,700
USA             Hybrid         Trek        4500  $650
France          Road Bike      Trek        3400  $2,500
France          Road Bike      Cannondale  900   $3,700
France          Mountain Bike  Trek        5600  $1,300
France          Mountain Bike  Cannondale  800   $1,899
France          Hybrid         Trek        1100  $540
United Kingdom  Road Bike      Trek        2444  $2,100
United Kingdom  Road Bike      Cannondale  1200  $2,123
United Kingdom  Hybrid         Trek        800   $490
United Kingdom  Hybrid         Cannondale  500   $880
United Kingdom  Mountain Bike  Trek        1211  $1,121
Italy           Hybrid         Trek        700   $690
Italy           Road Bike      Trek        4500  $2,890
Italy           Mountain Bike  Trek        3400  $1,877
;
proc gchart data=a15007.bicycles;
pie Country Model;
run;
quit;
Output
Problem 4
Again, using the Bicycles data set, show the distribution of units sold (Units) for each
value of Model. Your chart should look like this:
Code
options ps=54;
title "Distribution of Units Sold by Model";
pattern value=empty;
proc gchart data=a15007.bicycles;
vbar Units / midpoints = 0 to 6000 by 2000
group = Model;
run;
quit;
Output
Please note: The code used in this assignment is available at the following URL:
https://www.dropbox.com/sh/ditfnjuvpxm7eog/AAA3zLMCXClg-5dwG8ED6kU6a?dl=0