An Example Data Set

Data usually enters a SAS program as lines of data (records). You describe the structure of your data (the variables present and the number of lines per observation) using the INPUT statement, and the system builds a data set from the resulting observations. You can then perform algebraic and logical data conversions to generate new variables or to transform variables that were read in.
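The following is a minimal sketch of that workflow, not a program taken from this documentation. The data set name DEMO and the variables X, Y, TOTAL, and BIG are hypothetical and chosen only for illustration.

```sas
/* Hypothetical example: names DEMO, X, Y, TOTAL, BIG are illustrative. */
data demo;
   input x y;              /* two numeric variables, one line per observation */
   total = x + y;          /* algebraic conversion: creates a new variable    */
   big   = (total > 10);   /* logical expression: 1 if true, 0 if false       */
   datalines;
3 4
8 9
;
run;

proc print data=demo;      /* list the resulting observations */
run;
```

DATALINES marks the start of in-stream data; CARDS is the older synonym for the same statement.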

The system allows alphabetic values in the input data. For example, instead of requiring a location to be coded as an integer, you can use state codes such as NC, SC, and VA for North Carolina, South Carolina, and Virginia.
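As a small sketch of reading such codes, assuming a hypothetical data set SITES with a made-up numeric variable SALES alongside the state code:

```sas
/* Hypothetical example: SITES and SALES are illustrative names and values. */
data sites;
   input state $ sales;    /* the $ marks STATE as a character variable */
   datalines;
NC 120
SC 95
VA 143
;
run;

proc print data=sites;
run;
```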

Data from a product evaluation survey is used as an example throughout this documentation to illustrate SAS procedures. An example SAS program that reads the data and prints a listing of the values of each variable in this data set is given in: survey.sas: A Sample Program.

In this survey, individuals were assigned identification numbers (ID) and asked to respond with sex, age, and income as well as their ratings (on a scale of 1 to 9) of three products (R1, R2, and R3). The data collected from ten of the respondents are illustrated below:


 OBS	ID	SEX	AGE	INC	R1	R2	R3

   1	 1	F	34	17	7	2	2
   2	17	M	40	14	5	5	3
   3	33	M	45	 6	7	2	7
   4	49	M	24	14	7	5	7
   5	65	F	52	 9	4	7	7
   6	81	M	45	11	7	7	7
   7	 2	F	24	17	6	5	3
   8	18	F	40	14	7	5	2
   9	34	F	45	 6	6	5	6
  10	50	M	34	17	5	7	5
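One way these ten records might be read and listed is sketched below. It uses simple list input and is not necessarily identical to the survey.sas sample program; the data set name SURVEY is an assumption. The OBS column is not entered as data, because PROC PRINT supplies observation numbers itself.

```sas
/* Sketch only: reads the ten survey records shown above with list input. */
data survey;
   input id sex $ age inc r1 r2 r3;   /* SEX is character; the rest are numeric */
   datalines;
 1 F 34 17 7 2 2
17 M 40 14 5 5 3
33 M 45  6 7 2 7
49 M 24 14 7 5 7
65 F 52  9 4 7 7
81 M 45 11 7 7 7
 2 F 24 17 6 5 3
18 F 40 14 7 5 2
34 F 45  6 6 5 6
50 M 34 17 5 7 5
;
run;

proc print data=survey;   /* prints the observation number plus all seven variables */
run;
```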


A VARIABLE is a set of data values for the same measurement. Sex, age, and income are variables in the above data set. SAS variable names can be up to 8 characters in length; valid characters are the letters A-Z, numerals 0-9, and underscore (_), and the first character must be a letter or an underscore. For additional information, see: What are the characteristics and limitations of SAS names?

A DATA VALUE is a single measurement. For example, the individual values for sex, age, or income in the survey data set are data values. Individual data values may be either character or numeric. In the survey data set, age is a numeric variable and sex is a character variable. You may use CHARACTER values that include letters, numbers, blanks, and special characters, but you should not include a semicolon within the data. Character data values may consist of up to 200 characters. A character variable is identified on a LENGTH or INPUT statement by a dollar sign ($) following its name. For example, the following values could be assigned as character values:

JOHN
Mary Anne O'Reilly
1968

Note: When the value 1968 is defined as a character value, SAS will automatically convert it to numeric when used in arithmetic computations. Missing values will be generated if an attempt is made to perform a numeric computation using invalid values of a character variable.
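The sketch below illustrates both points in the note, under the assumption of a hypothetical data set PEOPLE with variables NAME, YEAR, NEXTYR, and BAD. Column input is used so that the blanks inside a name are kept as part of the value.

```sas
/* Hypothetical example: PEOPLE, NAME, YEAR, NEXTYR, BAD are illustrative names. */
data people;
   length name $ 20;                 /* reserve 20 characters for NAME            */
   input name $ 1-20 year $ 22-25;   /* column input; both variables are character */
   nextyr = year + 1;   /* YEAR is converted to numeric automatically (1969)      */
   bad    = name + 1;   /* NAME is not a valid number, so BAD is missing (.)      */
   datalines;
JOHN                 1968
Mary Anne O'Reilly   1968
;
run;

proc print data=people;
run;
```

The SAS log notes the automatic character-to-numeric conversion and reports invalid numeric data for the NAME values.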

NUMERIC values must be numbers; a + or - sign can precede the number. Unless you use the COMMAw.d informat for reading your data, do not include commas in data values that are to be read by SAS. An integer can be written with or without a decimal point. For example, the following entries may be defined as numeric:

71
  .000328
-4.

Fortran exponential and double precision data types (e.g., 5.267E-07 or 7.923376723D12) are also supported; however, they may contain no blanks if free format input (which is described in The SAS Data Step and Formats for Input Data) is used.
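A brief sketch of reading such values follows. The data set NUMS and the variables X and AMOUNT are hypothetical, and the AMOUNT values are made up to show the COMMAw.d informat; the colon modifier lets the informat be used with free format (list) input.

```sas
/* Hypothetical example: NUMS, X, AMOUNT are illustrative names and values. */
data nums;
   input x amount : comma12.;   /* X: standard numeric; AMOUNT: commas removed */
   datalines;
71 1,234
.000328 56,789
-4. 1,000,000
5.267E-07 12
;
run;

proc print data=nums;
run;
```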

SAS represents missing numeric values with a single decimal point ('.'). You can code a '.' to indicate a missing value when entering your data, for both numeric and character values. Note, however, that when performing logical operations on character values, a missing value is treated as a blank (' ') unless a special format is used when reading in the data. The treatment of missing values in mathematical operations and logical comparisons is discussed in more detail in Data Assignment and Transformation Statements. For more information, please refer to the SAS Language: Reference - Version 6 Edition under missing values.
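As an illustration, the sketch below codes a period for a missing character value and for a missing numeric value, then tests for each. The data set MISS and the derived variables AGEGRP and NOSEX are hypothetical. The missing-value test comes first because a missing numeric value compares as smaller than any number.

```sas
/* Hypothetical example: MISS, AGEGRP, NOSEX are illustrative names. */
data miss;
   length agegrp $ 8;                  /* long enough for the longest label */
   input id sex $ age;
   if age = . then agegrp = 'unknown'; /* test for missing before comparing */
   else if age < 40 then agegrp = 'under 40';
   else agegrp = '40 plus';
   nosex = (sex = ' ');                /* 1 when SEX is missing (blank)     */
   datalines;
1 F 34
2 . 40
3 M .
;
run;

proc print data=miss;
run;
```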

An OBSERVATION is a set of data values for each sample or case. In this example, all of the values on each line are a single observation. An observation may include more than one line (record) in the input data set. There should be only one value for each variable in a single observation.
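For example, if each respondent's ratings were recorded on a second line, the observation could be read as sketched below. The two-line layout and the data set name TWOLINE are assumptions made for illustration; the values are those of the first two survey respondents.

```sas
/* Hypothetical layout: each observation spans two input lines. */
data twoline;
   input id sex $ age inc
       / r1 r2 r3;          /* the slash advances to the next input line */
   datalines;
1 F 34 17
7 2 2
17 M 40 14
5 5 3
;
run;

proc print data=twoline;    /* two observations, seven variables each */
run;
```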

A DATA SET is a collection of observations. Ignoring the observation numbers (OBS in the above table), the survey data set consists of 10 observations and 7 variables.

Virginia Tech Computing Center--Distributed Information Systems
Last updated: January 7, 1998