***Note: The PSID data center automatically merges PSID and CDS data. The instructions below are intended for informative purposes only and will help you understand the structure of the PSID data.***
This information is presented in four separate sections: a) PSID file structure, b) two methods of assembling a cross-year family-individual file, c) assembling a cross-year family file, and d) single-year family files and single-year family-individual files.
The traditional cross-year family-individual file used for the PSID through 1989 has been replaced by separate single-year family files and a cross-year individual file. For instance, through the 1992 data collection year there are 25 single-year family files containing family-level variables collected in each wave of the study from 1968 through 1992 and a single cross-year individual file containing all individual-level variables collected from 1968 to 1992 for both respondents and non-respondents. Thus the "main" PSID data files include two types of data files -- a) single-year family files and b) a cross-year individual file.
Each single-year family file contains one record for each family interviewed in the specified year. There are twenty-five such files, one for each year of the study from 1968 through 1992, and each holds all of the family-level variables collected in that wave. The records in each file are identified by, and sorted on, the family Interview Number for that year.

+-----+
|68fam|
+-----+
format:      family data 1968
records:     one record for each family in 1968
ids:         1968 family Interview Number
sort order:  1968 family Interview Number
N:           4,802 families
MB of data:  3.4 MB

+-----+
|69fam|
+-----+
format:      family data 1969
records:     one record for each family in 1969
ids:         1969 family Interview Number
sort order:  1969 family Interview Number
N:           4,460 families
MB of data:  4.4 MB

   .
   .
   .

+-----+
|92fam|
+-----+
format:      family data 1992
records:     one record for each family in 1992
ids:         1992 family Interview Number
sort order:  1992 family Interview Number
N:           9,829 families
MB of data:  22.0 MB
The cross-year individual file contains one record for each person ever in a PSID family from the beginning of the study through the current year. The records in the cross-year individual file are identified by 1968 family Interview Number (V30001) and Person Number (V30002) and are in sort order by these variables. The file also contains the Interview Number of the family with which the person was associated in each year after 1968 and all other individual-level variables from 1968 through 1992.
+--------+ +-----+-----+   +-----+
|sortid's| |68ind|69ind|...|92ind|
+--------+ +-----+-----+   +-----+
format:      individual data for 1968-1992
records:     one record for each person ever-in through 1992
ids:         1968 family Interview Number and Person Number
sort order:  1968 family Interview Number and Person Number
N:           50,915 persons
MB of data:  91.7 MB
Few analysts will want to analyze the full data file for all persons ever in the study, and so your first step is to decide which variables, individuals and years of data interest you.
The basic principle in merging data from a single-year family file with data from the cross-year individual file involves matching the two files using annual Interview Numbers for the year in which the family variables were collected. Thus it is critical that the annual Interview Number variables be retained as part of any subsetted data, either family or individual. The chart below shows the family Interview Number variables for the single-year family files and cross-year individual file.
------------------------------
Year     Family     Individual
         File       File
------------------------------
1968     V3         V30001
1969     V442       V30020
1970     V1102      V30043
1971     V1802      V30067
1972     V2402      V30091
1973     V3002      V30117
1974     V3402      V30138
1975     V3802      V30160
1976     V4302      V30188
1977     V5202      V30217
1978     V5702      V30246
1979     V6302      V30283
1980     V6902      V30313
1981     V7502      V30343
1982     V8202      V30373
1983     V8802      V30399
1984     V10002     V30429
1985     V11102     V30463
1986     V12502     V30498
1987     V13702     V30535
1988     V14802     V30570
1989     V16302     V30606
1990     V17702     V30642
1991     V19002     V30689
1992     V20302     V30733
------------------------------

Note that not every record in the cross-year individual file will have a matching record in every single-year family file. This happens when an individual who was once part of a responding family moves away or dies and is no longer associated with a family in the study; such a person is said to be "nonresponse". A nonresponse person's Interview Number in the cross-year individual file is filled with 0s (as are the other variables) for years in which no data were collected about him or her.
When merging the cross-year individual file with a single-year family file, both SPSS and SAS will fill in system missing values for the 19nn family variables for individuals who were not associated with a responding family in 19nn. Depending on your particular analysis needs, you may or may not wish to include individuals with missing family-year records. Provide appropriate instructions to the programs you use for merging to include or exclude individuals with missing family-year records.
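The include-or-exclude choice described above can be sketched in plain Python. This is only an illustration, not the SAS or SPSS code distributed with the data: the records are toy values, `FAMVAR_69` is a hypothetical stand-in for any 1969 family variable, and `None` plays the role of a system missing value. Only the Interview Number variables (V30020 in the individual file, V442 in the 1969 family file) come from the chart above.

```python
# Toy individual records: 1968 ID, Person Number, 1969 Interview Number.
individuals = [
    {"V30001": 1, "V30002": 1, "V30020": 101},
    {"V30001": 1, "V30002": 2, "V30020": 101},
    {"V30001": 2, "V30002": 1, "V30020": 0},  # nonresponse in 1969
]
# Toy 1969 family file; FAMVAR_69 is a hypothetical family-level variable.
family_1969 = [{"V442": 101, "FAMVAR_69": 5000}]


def merge_family_year(inds, fams, ind_key, fam_key, keep_unmatched=True):
    """One-to-many match of family records onto individual records."""
    fam_by_id = {fam[fam_key]: fam for fam in fams}
    fam_vars = sorted({name for fam in fams for name in fam if name != fam_key})
    merged = []
    for ind in inds:
        fam = fam_by_id.get(ind[ind_key])
        if fam is None and not keep_unmatched:
            continue  # exclude individuals with no family record that year
        row = dict(ind)
        for name in fam_vars:
            # None stands in for a system missing value.
            row[name] = fam[name] if fam is not None else None
        merged.append(row)
    return merged


kept = merge_family_year(individuals, family_1969, "V30020", "V442")
dropped = merge_family_year(
    individuals, family_1969, "V30020", "V442", keep_unmatched=False
)
```

Here `kept` retains the nonresponse person with a missing family value, while `dropped` contains only the two people attached to a responding 1969 family.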
We can think of several approaches to creating a cross-year family-individual file from the components. Two are described and illustrated below. SAS and SPSS statements provided in the SAS and SPSS sub-directories can be used to help construct the programs.
First select individuals and variables from the cross-year individual file (remembering to retain all relevant annual family Interview Number variables) and then match that data with the desired variables from a single-year family file, matching on the appropriate annual family Interview Number variable, using a one-to-many match.
Next, match the resulting file (which now contains one record for each individual with selected variables from the cross-year individual file and the first family file) with a second family file matching on the appropriate annual family Interview Number variable, using a one-to-many match.
Repeat with additional single-year family files until all required family data are obtained and merged with the cross-year individual data, as the diagram below shows.
See SPSS or SAS examples for an illustration of this approach using three years of family data.
. +---------------------------+ +--------------+
. |1968-1992 Individual File  | |1st Family    |
. |N=inds, subset if desired  | |File          |
. |                           | |N=1yr fam     |
. +---------------------------+ +--------------+
.               |                      |
.               +----------------------+
.               |  STEP 1: Sort and match on first annual family Interview Number
.               |
. +-------------------------+ +-----------+
. |1st Family + 1968-1992   | |2nd Family |
. |Individual File          | |File       |
. |N=inds, subset if desired| |N=2yr fam  |
. +-------------------------+ +-----------+
.               |                   |
.               +-------------------+
.               |  STEP 2: Sort and match on second annual family Interview Number
.               |
. +-------------------------+ +-----------+
. |1st Family + 2nd Family  | |3rd Family |
. |+ 1968-1992 Individual   | |File       |
. |N=inds, subset if desired| |N=3yr fam  |
. +-------------------------+ +-----------+
.               |                   |
.               +-------------------+
.               |  STEP 3: Sort and match on third annual family Interview Number
.               |
. +------------------------------------+
. |1st Family + 2nd Family + 3rd Family|
. |+ 1968-1992 Individual File         |
. |N=inds, subset if desired           |
. +------------------------------------+
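The sequential approach can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the distributed SAS or SPSS programs: the data are toy records, `FAMVAR_68`, `FAMVAR_69` and `FAMVAR_70` are hypothetical family variables, and a dictionary lookup stands in for the sort-and-match step. The Interview Number variable names are taken from the chart above.

```python
# year -> (family-file Interview Number, individual-file Interview Number)
ID_VARS = {
    1968: ("V3", "V30001"),
    1969: ("V442", "V30020"),
    1970: ("V1102", "V30043"),
}


def attach_family(inds, fams, fam_key, ind_key):
    """Add one year's family variables to every individual record."""
    fam_by_id = {fam[fam_key]: fam for fam in fams}
    fam_vars = [name for name in (fams[0] if fams else {}) if name != fam_key]
    out = []
    for ind in inds:
        fam = fam_by_id.get(ind[ind_key], {})  # {} when no family that year
        row = dict(ind)
        row.update({name: fam.get(name) for name in fam_vars})
        out.append(row)
    return out


def method_one(inds, family_files):
    """Merge family files into the individual file one year at a time."""
    merged = inds
    for year in sorted(family_files):
        fam_key, ind_key = ID_VARS[year]
        merged = attach_family(merged, family_files[year], fam_key, ind_key)
    return merged


individuals = [
    {"V30001": 1, "V30002": 1, "V30020": 10, "V30043": 20},
    {"V30001": 1, "V30002": 2, "V30020": 10, "V30043": 21},  # new family in 1970
    {"V30001": 2, "V30002": 1, "V30020": 0, "V30043": 0},    # nonresponse after 1968
]
family_files = {
    1968: [{"V3": 1, "FAMVAR_68": "a"}, {"V3": 2, "FAMVAR_68": "b"}],
    1969: [{"V442": 10, "FAMVAR_69": "c"}],
    1970: [{"V1102": 20, "FAMVAR_70": "d"}, {"V1102": 21, "FAMVAR_70": "e"}],
}
result = method_one(individuals, family_files)
```

The number of records stays equal to the number of individuals at every step; each pass only adds columns.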
Alternatively, you could do a series of one-to-many matches between each single-year family file and the cross-year individual file, matching on the appropriate annual family Interview Number. Then merge the resulting single-year family-individual files in a one-to-one match on the 1968 Interview Number and Person Number. Detailed steps are noted below.
Step 1: Subset annual family Interview Number and other selected variables and select cases from cross-year individual file.
Step 2a: Subset selected variables from the year-n family file.
Step 2b: Sort subsetted year-n family file from Step 2a by year-n family Interview Number.
Step 2c: Sort subsetted cross-year individual file from Step 1 by year-n family Interview Number.
Step 2d: Merge sorted cross-year individual file from Step 2c with sorted year-n subsetted family file from Step 2b (a one-to-many, family-to-individual, match) matching on the year-n family Interview Number.
Step 2e: Sort resulting year-n family-individual file from Step 2d by the individual identifiers, 1968 family Interview Number (V30001) and Person Number (V30002).
... Repeat Steps 2a-2e for all other years.
Step 3: Merge family-individual files from Step 2e by the individual identifiers, 1968 family Interview Number (V30001) and Person Number (V30002).
See the diagram for an illustration of this approach.
See SPSS or SAS examples for an illustration of this approach using 25 years of family data.
. +---------++---------++---------++---------++---------++---------+
. |68-92 In-||1st      ||68-92 In-||2nd      ||68-92 In-||3rd      |
. |dividual ||Family   ||dividual ||Family   ||dividual ||Family   |
. |File     ||File     ||File     ||File     ||File     ||File     |
. |N=inds   ||N=1yr fam||N=inds   ||N=2yr fam||N=inds   ||N=3yr fam|
. +---------++---------++---------++---------++---------++---------+
.      |          |          |          |          |          |
.      +----------+          +----------+          +----------+
.           |                     |                     |
Step 2:
    Match on 1st year     Match on 2nd year     Match on 3rd year
    Interview Number      Interview Number      Interview Number
.           |                     |                     |
.   +---------------+     +---------------+     +---------------+
.   |1st Family-    |     |2nd Family-    |     |3rd Family-    |
.   |Individual File|     |Individual File|     |Individual File|
.   |N=inds         |     |N=inds         |     |N=inds         |
.   +---------------+     +---------------+     +---------------+
.           |                     |                     |
.           +---------------------+---------------------+
.                                 |
Step 3: Match on 1968 Interview Number and Person Number
.                                 |
.               +-----------------------------------+
.               |                                   |
.               | Cross-year Family-Individual File |
.               | N=inds                            |
.               +-----------------------------------+
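A minimal Python sketch of this second approach is shown here, again as an illustration rather than the distributed SAS or SPSS code. The toy records and the `FAMVAR_69` / `FAMVAR_70` family variables are hypothetical; the identifier variables (V30001, V30002) and Interview Number variables come from the text and chart above.

```python
# year -> (family-file Interview Number, individual-file Interview Number)
ID_VARS = {1969: ("V442", "V30020"), 1970: ("V1102", "V30043")}


def family_individual_file(inds, fams, fam_key, ind_key):
    """Step 2: build one single-year family-individual file, sorted by person ID."""
    fam_by_id = {fam[fam_key]: fam for fam in fams}
    fam_vars = [name for name in (fams[0] if fams else {}) if name != fam_key]
    rows = []
    for ind in inds:
        fam = fam_by_id.get(ind[ind_key], {})  # {} when no family that year
        row = {"V30001": ind["V30001"], "V30002": ind["V30002"]}
        row.update({name: fam.get(name) for name in fam_vars})
        rows.append(row)
    return sorted(rows, key=lambda r: (r["V30001"], r["V30002"]))


def method_two(inds, family_files):
    """Step 3: one-to-one merge of the yearly files on V30001 and V30002."""
    merged = {(ind["V30001"], ind["V30002"]): dict(ind) for ind in inds}
    for year, fams in family_files.items():
        fam_key, ind_key = ID_VARS[year]
        for row in family_individual_file(inds, fams, fam_key, ind_key):
            merged[(row["V30001"], row["V30002"])].update(row)
    return [merged[key] for key in sorted(merged)]


individuals = [
    {"V30001": 2, "V30002": 1, "V30020": 0, "V30043": 0},    # nonresponse
    {"V30001": 1, "V30002": 1, "V30020": 10, "V30043": 20},
    {"V30001": 1, "V30002": 2, "V30020": 10, "V30043": 21},
]
family_files = {
    1969: [{"V442": 10, "FAMVAR_69": "c"}],
    1970: [{"V1102": 20, "FAMVAR_70": "d"}, {"V1102": 21, "FAMVAR_70": "e"}],
}
result = method_two(individuals, family_files)
```

Because every yearly file carries the same person identifiers, the final merge is strictly one-to-one and the yearly files can be built independently of one another.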
To assemble a 1992 cross-year family file from these files, a procedure similar to one of the above would be followed, but only the cross-year individual records of the 1992 head would be selected from the cross-year individual file. Merge data from the single-year family files using the annual family Interview Number variables to match as described in Method 1 or Method 2 above to create a merged 1968-1992 family-level file for currently responding families.
Each member of a family has a family Interview Number for each wave with a value identical to the values of that data item for all the other family members in that family that year. In addition, except in 1968, each individual is annually assigned a unique sequence number, which indicates the person's position and status for any given year's list of family members. Thus, the first person listed, always the Head of the family, is 01, the second person listed is 02, and so on.
To create a 1992 cross-year family-level file, select from the cross-year individual file those cases where V30734 (1992 Sequence Number) is equal to 01, since each family must have at least one member, although it may or may not have more.*
__________________________________________________________________________
* Variable V30734, Sequence Number, should be used instead of V30735, Relationship to Head, because although each family has one and only one current Head (i.e., where V30734 = 01-20 and V30735 = 10), it is possible that the prior year's Head has moved out since the previous interview and a new Head is present for the current interview. Relationship to Head for movers-out is coded with reference to the previous year's Head, so for both the current Head and the previous Head, V30735 = 10.
There is no 1968 Sequence Number variable; use V30003, Relationship to Head, instead. There was only one Head per household in 1968.
__________________________________________________________________________
To create other years' cross-year family-level files, use the Sequence Number variable for the latest desired year of data and merge with the appropriate single-year family files. Again, this produces a file of families who were still responding through the latest year and eliminates families who had already become nonresponding.
Producing single-year family files for cross-sectional analysis is straightforward: simply use the single-year family file.
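The selection of one record per 1992 family can be sketched as below. This is a toy illustration, not the distributed code: the records are invented and `FAMVAR_92` is a hypothetical family variable. V30734 (1992 Sequence Number), V30733 (1992 Interview Number in the individual file) and V20302 (1992 Interview Number in the family file) are the variables named in the text.

```python
individuals = [
    {"V30001": 1, "V30002": 1, "V30733": 500, "V30734": 1},  # 1992 Head
    {"V30001": 1, "V30002": 2, "V30733": 500, "V30734": 2},  # second listed
    {"V30001": 2, "V30002": 1, "V30733": 0, "V30734": 0},    # nonresponse in 1992
    {"V30001": 3, "V30002": 1, "V30733": 501, "V30734": 1},  # 1992 Head
]
# Toy 1992 family file; FAMVAR_92 is a hypothetical family-level variable.
family_1992 = [
    {"V20302": 500, "FAMVAR_92": "x"},
    {"V20302": 501, "FAMVAR_92": "y"},
]

# Keep exactly one individual record per 1992 family: Sequence Number 01.
heads_1992 = [person for person in individuals if person["V30734"] == 1]

# Attach the 1992 family variables; earlier years would be merged the same way
# using each year's Interview Number pair.
fam_by_id = {fam["V20302"]: fam for fam in family_1992}
family_level = [
    {**head, **fam_by_id[head["V30733"]]}
    for head in heads_1992
    if head["V30733"] in fam_by_id
]
```

The resulting `family_level` list has one record per currently responding family, keyed by the Head's individual identifiers.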
Single-year family-individual files are also relatively simple. Select all individuals whose Sequence Number for the desired year is non-zero (for 1968, use V30003, Relationship to Head, instead) and match the family Interview Number for that year from the individual file with the family Interview Number from the corresponding family file. The family Interview Numbers in the family and individual files are listed in a table in Section "B. Assembling A Cross-Year Family-Individual File", above.
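As a rough sketch of that selection and match for 1992, assuming toy records and a hypothetical family variable `FAMVAR_92` (the function name is also illustrative only):

```python
def single_year_family_individual(inds, fams, seq_var, ind_key, fam_key):
    """Keep people with a non-zero Sequence Number and add that year's family data."""
    fam_by_id = {fam[fam_key]: fam for fam in fams}
    rows = []
    for person in inds:
        if person[seq_var] == 0:
            continue  # not associated with a responding family that year
        fam = fam_by_id.get(person[ind_key], {})
        fam_vars = {name: value for name, value in fam.items() if name != fam_key}
        rows.append({**person, **fam_vars})
    return rows


individuals = [
    {"V30001": 1, "V30002": 1, "V30733": 500, "V30734": 1},
    {"V30001": 1, "V30002": 2, "V30733": 500, "V30734": 2},
    {"V30001": 2, "V30002": 1, "V30733": 0, "V30734": 0},    # nonresponse in 1992
    {"V30001": 3, "V30002": 1, "V30733": 501, "V30734": 1},
]
family_1992 = [
    {"V20302": 500, "FAMVAR_92": "x"},
    {"V20302": 501, "FAMVAR_92": "y"},
]

file_1992 = single_year_family_individual(
    individuals, family_1992, seq_var="V30734", ind_key="V30733", fam_key="V20302"
)
```

Unlike the family-level selection, every member of a responding family is kept here, so family values repeat across members of the same family.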