PSID File Structure and Merging PSID Data Files

Note: The PSID data center automatically merges PSID and CDS data. The instructions below are intended for informative purposes only and will help you understand the structure of the PSID data.

Contents
This information is presented in four separate sections: a) PSID file structure, b) two methods of assembling a cross-year family-individual file, c) assembling a cross-year family file, and d) single-year family files and single-year family-individual files.

A. PSID File Structure

The traditional cross-year family-individual file used for the PSID through 1989 has been replaced by separate single-year family files and a cross-year individual file. For instance, through the 1992 data collection year there are 25 single-year family files containing family-level variables collected in each wave of the study from 1968 through 1992 and a single cross-year individual file containing all individual-level variables collected from 1968 to 1992 for both respondents and non-respondents. Thus the "main" PSID data files include two types of data files -- a) single-year family files and b) a cross-year individual file.

1. The single-year family files

Each single-year family file contains one record for each family interviewed in the specified year. The twenty-five single-year family files (one for each year of the study from 1968 through 1992) contain all of the family-level variables collected in each wave. The records in each file are identified by the family Interview Number for that year, in sort order by that variable, and contain the family-level variables for that year.

Annual Family Files -- Contain Family-Level Data Collected In A Single Wave

+-----+
|68fam|
+-----+
format:      family data 1968
records:     one record for each family in 1968
ids:         1968 family Interview Number
sort order:  1968 family Interview Number
N:           4,802 families
MB of data:  3.4 MB

+-----+
|69fam|
+-----+
format:      family data 1969
records:     one record for each family in 1969
ids:         1969 family Interview Number
sort order:  1969 family Interview Number
N:           4,460 families
MB of data:  4.4 MB

   .
   .
   .
   .
+-----+
|92fam|
+-----+
format:      family data 1992
records:     one record for each family in 1992
ids:         1992 family Interview Number
sort order:  1992 family Interview Number
N:           9,829 families
MB of data:  22.0 MB

2. The cross-year individual file

The cross-year individual file contains one record for each person ever in a PSID family from the beginning of the study through the current year. The records in the cross-year individual file are identified by 1968 family Interview Number (ER30001) and Person Number (ER30002) and are in sort order by these variables. The file also contains the Interview Number of the family with which the person was associated in each year after 1968 and all other individual-level variables from 1968 through 2005.

1968-2005 Cross-Year Individual File -- Contains All Individual-Level Data Collected From 1968-2005

+--------+ +-----+-----+   +-----+
|sortid's| |68ind|69ind|...|05ind|
+--------+ +-----+-----+   +-----+
format:      individual data for 1968-2005
records:     one record for each person ever-in through 2005
ids:         1968 family Interview Number and Person Number
sort order:  1968 family Interview Number and Person Number
N:           67,271 persons
MB of data:  21.71 MB
B. Assembling A Cross-Year Family-Individual File

Few analysts will want to analyze the full data file for all persons ever in the study, so your first step is to decide which variables, individuals and years of data interest you. The basic principle in merging data from a single-year family file with data from the cross-year individual file is to match the two files on the annual Interview Number for the year in which the family variables were collected. Thus it is critical that the annual Interview Number variables be retained as part of any subsetted data, either family or individual. The chart below shows the family Interview Number variables for the single-year family files and the cross-year individual file.

Family Interview Numbers in Single-year Family Files and in Cross-year Individual File

______________________________
------------------------------
Year    Family      Individual
        File        File
------------------------------
1968    V3          ER30001
1969    V442        ER30020
1970    V1102       ER30043
1971    V1802       ER30067
1972    V2402       ER30091
1973    V3002       ER30117
1974    V3402       ER30138
1975    V3802       ER30160
1976    V4302       ER30188
1977    V5202       ER30217
1978    V5702       ER30246
1979    V6302       ER30283
1980    V6902       ER30313
1981    V7502       ER30343
1982    V8202       ER30373
1983    V8802       ER30399
1984    V10002      ER30429
1985    V11102      ER30463
1986    V12502      ER30498
1987    V13702      ER30535
1988    V14802      ER30570
1989    V16302      ER30606
1990    V17702      ER30642
1991    V19002      ER30689
1992    V20302      ER30733
1993    V21602      ER30806
1994    ER2002      ER33101
1995    ER5002      ER33201
1996    ER7002      ER33301
1997    ER10002     ER33401
1999    ER13002     ER33501
2001    ER17002     ER33601
2003    ER21002     ER33701
2005    ER25002     ER33801
------------------------------

Note that not every record in the cross-year individual file will have a matching record in every single-year family file. This happens when an individual who was once part of a responding family moves away or dies and is no longer associated with a family in the study; the person is said to be non-response.
The non-response person's Interview Number in the cross-year individual file is filled with 0s (as are the other variables) for years in which no data were collected about him or her. When merging the cross-year individual file with a single-year family file, both SPSS and SAS will fill in system missing values for the 19nn family variables for individuals who were not associated with a responding family in 19nn. Depending on your particular analysis needs, you may or may not wish to include individuals with missing family-year records. Provide appropriate instructions to the programs you use for merging to include or exclude individuals with missing family-year records.

We can think of several approaches to creating a cross-year family-individual file from the components. Two are described and illustrated below. SAS and SPSS statements provided in the SAS and SPSS sub-directories can be used to help construct the programs.

1. Method 1 - Merge Using Family Data Added Sequentially To Cross-Year Individual Data.

First select individuals and variables from the cross-year individual file (remembering to retain all relevant annual family Interview Number variables) and then match that data with the desired variables from a single-year family file, matching on the appropriate annual family Interview Number variable, using a one-to-many match. Next, match the resulting file (which now contains one record for each individual with selected variables from the cross-year individual file and the first family file) with a second family file matching on the appropriate annual family Interview Number variable, using a one-to-many match. Repeat with additional single-year family files until all required family data are obtained and merged with the cross-year individual data, as the diagram below shows. See SPSS or SAS examples for an illustration of this approach using three years of family data.

Merge Using Family Data Added Sequentially To Cross-Year Individual Data.
.  +---------------------------+     +--------------+
.  |1968-1992 Individual File  |     |1st Family    |
.  |N=inds, subset if desired  |     |    File      |
.  |                           |     |N=1yr fam     |
.  +---------------------------+     +--------------+
.                |                          |
.                +--------------------------+
.                             |
   STEP 1: Sort and match on first annual family Interview Number
.                             |
.  +-------------------------+     +-----------+
.  |1st Family + 1968-1992   |     |2nd Family |
.  |Individual File          |     |   File    |
.  |N=inds, subset if desired|     |N=2yr fam  |
.  +-------------------------+     +-----------+
.               |                        |
.               +------------------------+
.                             |
   STEP 2: Sort and match on second annual family Interview Number
.                             |
.  +-------------------------+     +-----------+
.  |1st Family + 2nd Family  |     |3rd Family |
.  |+ 1968-1992 Individual   |     |   File    |
.  |N=inds, subset if desired|     |N=3yr fam  |
.  +-------------------------+     +-----------+
.               |                        |
.               +------------------------+
.                             |
   STEP 3: Sort and match on third annual family Interview Number
.                             |
.           +------------------------------------+
.           |1st Family + 2nd family + 3rd Family|
.           |+ 1968-1992 Individual File         |
.           |N=inds, subset if desired           |
.           +------------------------------------+
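
The sequential merge of Method 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS or SPSS programs referred to above: the records are invented toy data, the family variables `FAMINC69` and `FAMINC70` and the helper `add_family_year` are hypothetical names, and only the Interview Number and Person Number variables (ER30001, ER30002, ER30020, ER30043, V442, V1102) come from the chart in Section B.

```python
# Toy extract of the cross-year individual file (values invented).
individuals = [
    {"ER30001": 1, "ER30002": 1, "ER30020": 101, "ER30043": 201},
    {"ER30001": 1, "ER30002": 2, "ER30020": 101, "ER30043": 202},
    # Interview Number 0 in 1970: this person was non-response that year.
    {"ER30001": 2, "ER30002": 1, "ER30020": 102, "ER30043": 0},
]

# Toy single-year family files; FAMINC69/FAMINC70 are hypothetical variables.
fam69 = [{"V442": 101, "FAMINC69": 9000}, {"V442": 102, "FAMINC69": 7000}]
fam70 = [{"V1102": 201, "FAMINC70": 9500}, {"V1102": 202, "FAMINC70": 4000}]


def add_family_year(persons, families, ind_key, fam_key, keep_unmatched=True):
    """Attach one year's family variables to every person record.

    One family record can match many person records (one-to-many).
    Persons without a matching family get None for that year's family
    variables, or are dropped when keep_unmatched is False.
    """
    by_id = {fam[fam_key]: fam for fam in families}
    fam_vars = [name for name in families[0] if name != fam_key]
    merged = []
    for person in persons:
        family = by_id.get(person[ind_key])  # 0 is not a key, so no match
        if family is None and not keep_unmatched:
            continue
        row = dict(person)
        for name in fam_vars:
            row[name] = family[name] if family is not None else None
        merged.append(row)
    return merged


# STEP 1: match on the 1969 Interview Number, STEP 2: on the 1970 one.
step1 = add_family_year(individuals, fam69, "ER30020", "V442")
step2 = add_family_year(step1, fam70, "ER30043", "V1102")
```

Calling `add_family_year(..., keep_unmatched=False)` instead drops individuals with a missing family-year record, which corresponds to the include-or-exclude choice described earlier in this section.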
2. Method 2 - Merge Using Multiple Family-Individual Files.

Alternatively, you could do a series of one-to-many matches of the single-year family files and the cross-year individual file matching on the appropriate annual family Interview Number and then merge the resulting single-year family-individual files in a one-to-one match using the 1968 Interview Number and Person Number. Detailed steps are noted below.
See the diagram for an illustration of this approach. See SPSS or SAS examples for an illustration of this approach using 25 years of family data.

Illustration Of Merge Using Multiple Family-Individual Files.

.  +---------++---------+   +---------++---------+   +---------++---------+
.  |68-92 In-||1st      |   |68-92 In-||2nd      |   |68-92 In-||3rd      |
.  |dividual ||Family   |   |dividual ||Family   |   |dividual ||Family   |
.  |File     ||File     |   |File     ||File     |   |File     ||File     |
.  |N=inds   ||N=1yr fam|   |N=inds   ||N=2yr fam|   |N=inds   ||N=3yr fam|
.  +---------++---------+   +---------++---------+   +---------++---------+
.       |          |             |          |             |          |
.       +----------+             +----------+             +----------+
.            |                        |                        |
   Step 2:
     Match on 1st year        Match on 2nd year        Match on 3rd year
     Interview Number         Interview Number         Interview Number
.            |                        |                        |
.    +---------------+        +---------------+        +---------------+
.    |1st Family-    |        |2nd Family-    |        |3rd Family-    |
.    |Individual File|        |Individual File|        |Individual File|
.    |N=inds         |        |N=inds         |        |N=inds         |
.    +---------------+        +---------------+        +---------------+
.            |                        |                        |
.            +------------------------+------------------------+
.                                     |
   Step 3: Match on 1968 Interview Number and Person Number
.                                     |
.                   +-----------------------------------+
.                   |                                   |
.                   | Cross-year Family-Individual File |
.                   | N=inds                            |
.                   +-----------------------------------+
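
Method 2 can be sketched the same way in Python. Again this is a minimal illustration with invented toy data, not the SAS or SPSS programs: `FAMINC69`, `FAMINC70` and the helper `family_individual_file` are hypothetical names, while the Interview Number and Person Number variables are taken from the chart in Section B. Each single-year family-individual file is built separately, keeping one row per individual, and the yearly files are then combined one-to-one on the 1968 Interview Number and Person Number.

```python
# Toy extract of the cross-year individual file (values invented).
individuals = [
    {"ER30001": 1, "ER30002": 1, "ER30020": 101, "ER30043": 201},
    {"ER30001": 1, "ER30002": 2, "ER30020": 101, "ER30043": 202},
    {"ER30001": 2, "ER30002": 1, "ER30020": 102, "ER30043": 0},  # no 1970 family
]

# Toy single-year family files; FAMINC69/FAMINC70 are hypothetical variables.
fam69 = [{"V442": 101, "FAMINC69": 9000}, {"V442": 102, "FAMINC69": 7000}]
fam70 = [{"V1102": 201, "FAMINC70": 9500}, {"V1102": 202, "FAMINC70": 4000}]


def family_individual_file(persons, families, ind_key, fam_key):
    """One-to-many match for a single year.

    Returns a dict keyed by (1968 Interview Number, Person Number) holding
    that year's family variables for each person; the values are None when
    the person had no responding family that year.
    """
    by_id = {fam[fam_key]: fam for fam in families}
    fam_vars = [name for name in families[0] if name != fam_key]
    rows = {}
    for person in persons:
        family = by_id.get(person[ind_key])
        person_id = (person["ER30001"], person["ER30002"])
        rows[person_id] = {
            name: (family[name] if family is not None else None)
            for name in fam_vars
        }
    return rows


# Step 2: one family-individual file per year.
fi69 = family_individual_file(individuals, fam69, "ER30020", "V442")
fi70 = family_individual_file(individuals, fam70, "ER30043", "V1102")

# Step 3: one-to-one match on 1968 Interview Number and Person Number.
cross_year = []
for person in individuals:
    person_id = (person["ER30001"], person["ER30002"])
    row = dict(person)
    for yearly_file in (fi69, fi70):
        row.update(yearly_file[person_id])
    cross_year.append(row)
```

Because every yearly file carries the same key, the final step never needs the annual Interview Numbers again; that is the main practical difference from the sequential approach of Method 1.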
C. Assembling A Cross-Year Family File

To assemble a 1992 cross-year family file from these files, a procedure similar to one of the above would be followed, but only the cross-year individual records of the 1992 head would be selected from the cross-year individual file. Merge data from the single-year family files using the annual family Interview Number variables to match as described in Method 1 or Method 2 above to create a merged 1968-1992 family-level file for currently responding families.

Each member of a family has a family Interview Number for each wave with a value identical to the values of that data item for all the other family members in that family that year. In addition, except in 1968, each individual is annually assigned a unique sequence number, which indicates the person's position and status for any given year's list of family members. Thus, the first person listed, always the Head of the family, is 01, the second person listed is 02, and so on.

To create a 1992 cross-year family-level file, select from the cross-year individual file those cases where ER30734 (1992 Sequence Number) is equal to 01, since each family must have at least one member, although it may or may not have more.*
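
The head-selection step can be sketched in Python as follows. The records are invented toy data and `select_heads` is a hypothetical helper; ER30733 (1992 Interview Number) and ER30734 (1992 Sequence Number) are the variables named in this document. The sketch assumes Sequence Numbers are stored as integers, so 01 is compared as 1.

```python
# Toy extract of the cross-year individual file (values invented).
individuals = [
    {"ER30001": 1, "ER30002": 1, "ER30733": 501, "ER30734": 1},
    {"ER30001": 1, "ER30002": 2, "ER30733": 501, "ER30734": 2},
    {"ER30001": 2, "ER30002": 1, "ER30733": 502, "ER30734": 1},
    {"ER30001": 3, "ER30002": 1, "ER30733": 0, "ER30734": 0},  # non-response
]


def select_heads(persons, sequence_var):
    """Keep only persons whose Sequence Number for the chosen year is 01."""
    return [person for person in persons if person[sequence_var] == 1]


# One record per 1992 family, carried by that family's Head.
heads_1992 = select_heads(individuals, "ER30734")
family_ids_1992 = [person["ER30733"] for person in heads_1992]
```

The resulting records can then be matched to the single-year family files on the annual Interview Numbers, using either Method 1 or Method 2.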
To create other years' cross-year family-level files, the Sequence Number variable for the latest desired year of data should be used and merges done with the appropriate single-year family files. Again, this produces a file of families that were still responding through the latest year and eliminates families that had already become nonresponding.

D. Single-Year Family Files And Single-Year Family-Individual Files

Producing single-year family files for cross-sectional analysis is straightforward: simply use the single-year file. Single-year family-individual files are also relatively simple. Select all individuals whose Sequence Number for the desired year is non-zero (for 1968, use ER30003, Relationship to Head, instead) and match the family Interview Number for that year from the individual file with the family Interview Number from the corresponding family file. The family Interview Numbers in the family and individual files are listed in a table in Section "B. Assembling A Cross-Year Family-Individual File", above.

E. Additional Help

If you have questions regarding this file, please contact us.
Institute for Social Research | University of Michigan