Document Information:
Package base version 2.7.1
Project 5: Statistical Learning with Multi-Scale Cardiovascular Data
Contact email: [email protected]
An R list is an
object consisting of an ordered collection of objects known as its components.
There is no particular need for the components to be of the same mode or type,
and, for example, a list could consist of a numeric vector, a logical value, a
matrix, a complex vector, a character array, a function, and so on.
Here is a simple example of how to make a list:
> Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))
Components of lists may be named. A component can be referred to by its number in double square brackets, by its name as a character string in double square brackets, or by its name after a $ sign. For example:
> Lst[[1]]
[1] "Fred"
> Lst$wife
[1] "Mary"
> Lst[[2]]
[1] "Mary"
> Lst$child.ages
[1] 4 7 9
This is a very
useful convention as it makes it easier to get access to each individual
component of a list.
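The access conventions above can be tried side by side. The following is a minimal sketch that rebuilds the Lst example; the variable names on the left-hand side (by_position, by_name, by_dollar, sub_list) are illustrative and not part of the original text.

```r
# A small list mirroring the Lst example above.
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

# Three equivalent ways to reach the same component.
by_position <- Lst[[2]]
by_name     <- Lst[["wife"]]
by_dollar   <- Lst$wife

# Single square brackets return a sub-list, not the component itself.
sub_list <- Lst["wife"]

names(Lst)           # the component names
length(Lst)          # the number of components
Lst$child.ages[2]    # second element of the child.ages component
```

Double square brackets and $ return the component itself, while single square brackets keep the list wrapper, which matters when the result is passed to a function that expects a vector.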
A data frame is the format we use most often for a dataset in R. Formally, a data frame is a list with class "data.frame" whose components are the variables (columns) of the dataset, all with the same number of rows.
There are restrictions on lists that may be made into data frames, namely:
(1) The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames;
(2) Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively;
(3) Numeric vectors, logicals and factors are included as is. In the R version documented here (2.7.1), character vectors are coerced to factors, whose levels are the unique values appearing in the vector; since R 4.0.0 the default is to keep them as character vectors unless stringsAsFactors=TRUE is given;
(4) Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same row size.
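The restrictions above can be checked on a small example. This is a sketch with made-up column names (id, grp, m1, m2); stringsAsFactors is set explicitly so that the result does not depend on the default of the installed R version.

```r
# Character columns: kept as character or coerced to factor,
# depending on stringsAsFactors.
df_chr <- data.frame(id = 1:3, grp = c("a", "b", "a"),
                     stringsAsFactors = FALSE)
df_fac <- data.frame(id = 1:3, grp = c("a", "b", "a"),
                     stringsAsFactors = TRUE)

class(df_chr$grp)    # "character"
class(df_fac$grp)    # "factor"
levels(df_fac$grp)   # the unique values: "a" "b"

# A data frame is a list with class "data.frame".
is.list(df_chr)
class(df_chr)

# A matrix contributes one variable per column.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("m1", "m2")))
df_mat <- data.frame(id = 1:3, m)
names(df_mat)        # "id" "m1" "m2"
```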
A data frame may
for many purposes be regarded as a matrix with columns possibly of differing
modes and attributes. It may be displayed in matrix form, and its rows and
columns extracted using matrix indexing conventions, for example:
> x<-data.frame(a=1:4,b=c(TRUE,TRUE,FALSE,TRUE),c=c("A","B","C","D"),d=13:16)
> x
  a     b c  d
1 1  TRUE A 13
2 2  TRUE B 14
3 3 FALSE C 15
4 4  TRUE D 16
> x[,1] # the 1st column of x
[1] 1 2 3 4
> x[,2] # the 2nd column of x
[1]  TRUE  TRUE FALSE  TRUE
> x[1,] # the 1st row of x
  a    b c  d
1 1 TRUE A 13
> x[3,] # the 3rd row of x
  a     b c  d
3 3 FALSE C 15
> x[3,4] # the element in 3rd row and 4th column
[1] 15
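Matrix-style indexing also accepts column names and logical vectors, which is often easier to read than positions. The sketch below rebuilds the data frame x from the printed output above; the definition d = 13:16 is an assumption inferred from that output, and the names on the left-hand side are illustrative.

```r
# Same data frame as in the example; d = 13:16 is inferred from the output.
x <- data.frame(a = 1:4, b = c(TRUE, TRUE, FALSE, TRUE),
                c = c("A", "B", "C", "D"), d = 13:16,
                stringsAsFactors = FALSE)

col_by_name  <- x[, "a"]           # same values as x[, 1] and x$a
two_columns  <- x[, c("a", "d")]   # a smaller data frame with two variables
rows_where_b <- x[x$b, ]           # logical row filter: rows 1, 2 and 4
one_cell     <- x[3, "d"]          # same element as x[3, 4]
```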
2. Merging Data Frames (function: merge() )
(1) merge() (online help file)
We often receive several data files for one project, each containing one piece of information. For a thorough analysis, we need to put all the files together in an organized way so that the analysis can be run on the whole dataset easily.
merge() puts two data frames together based on their common column names or row names. After the merging, the rows are by default lexicographically sorted on the common columns, unless "sort = FALSE" is specified in the merge() call. Explanations of the other arguments of merge() can be found in the R help files by typing "?merge" in the R console, or in the online help file. Merging datasets example:
> authors
   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia       no
> books
      name                         title     other.author
1    Tukey     Exploratory Data Analysis             <NA>
2 Venables Modern Applied Statistics ...           Ripley
3  Tierney                     LISP-STAT             <NA>
4   Ripley            Spatial Statistics             <NA>
5   Ripley         Stochastic Simulation             <NA>
6   McNeil     Interactive Data Analysis             <NA>
7   R Core          An Introduction to R Venables & Smith
> merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
   surname nationality deceased                         title     other.author
1   McNeil   Australia       no     Interactive Data Analysis             <NA>
2   R Core        <NA>     <NA>          An Introduction to R Venables & Smith
3   Ripley          UK       no            Spatial Statistics             <NA>
4   Ripley          UK       no         Stochastic Simulation             <NA>
5  Tierney          US       no                     LISP-STAT             <NA>
6    Tukey          US      yes     Exploratory Data Analysis             <NA>
7 Venables   Australia       no Modern Applied Statistics ...           Ripley
Another example:
> x
  k1 k2 data
1 NA  1    1
2 NA NA    2
3  3 NA    3
4  4  4    4
5  5  5    5
> y
  k1 k2 data
1 NA NA    1
2  2 NA    2
3 NA  3    3
4  4  4    4
5  5  5    5
> merge(x, y, by=c("k1","k2")) # NA's match
  k1 k2 data.x data.y
1  4  4      4      4
2  5  5      5      5
3 NA NA      2      1
> merge(x, y, by=c("k1","k2"),all=TRUE) # NA's match
  k1 k2 data.x data.y
1  2 NA     NA      2
2  3 NA      3     NA
3  4  4      4      4
4  5  5      5      5
5 NA  1      1     NA
6 NA  3     NA      3
7 NA NA      2      1
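The effect of the all, all.x and sort arguments is easiest to see by counting rows. The sketch below uses a shortened version of the authors and books tables; the reduced tables and the result names (inner, left, full, unsorted) are illustrative assumptions, not output from the original example.

```r
authors <- data.frame(
  surname     = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  stringsAsFactors = FALSE)
books <- data.frame(
  name  = c("Tukey", "Venables", "Ripley", "Ripley", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "Spatial Statistics", "Stochastic Simulation",
            "An Introduction to R"),
  stringsAsFactors = FALSE)

# Default: keep only rows whose key appears in both data frames.
inner <- merge(authors, books, by.x = "surname", by.y = "name")

# all.x = TRUE: also keep authors without a book (title becomes NA).
left <- merge(authors, books, by.x = "surname", by.y = "name", all.x = TRUE)

# all = TRUE: keep unmatched rows from both sides.
full <- merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

# sort = FALSE: skip the default sorting on the key column.
unsorted <- merge(authors, books, by.x = "surname", by.y = "name",
                  sort = FALSE)

c(inner = nrow(inner), left = nrow(left), full = nrow(full))
```

Ripley appears twice in books, so the matched author row is repeated once per book; this is why the merged results can have more rows than either input.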
In order to merge
data frames, we need to have all individual data files read as data frames in
R. So read.table() or read.csv() are needed here.
(2) read.table() and read.csv() (online help file)
Large data
objects will usually be read as values from external files rather than entered
during an R session at the keyboard. If variables are to be held mainly in data
frames, as we strongly suggest they should be, an entire data frame can be read
directly with the read.table() function.
To read an entire data frame directly, the external file will normally have a special form:
(1) The first line of the file should have a name for each variable in the data frame;
(2) Each additional line of the file has as its first item a row label, followed by the values for each variable.
For example, an input file with names and row labels:
     Price    Floor     Area   Rooms     Age  Cent.heat
01   52.00    111.0      830     5       6.2      no
02   54.75    128.0      710     5       7.5      no
03   57.50    101.0     1000     5       4.2      no
04   57.50    131.0      690     6       8.8      no
05   59.75     93.0      900     5       1.9     yes
...
The function read.table() can then be used to read the data frame directly:
> HousePrice <- read.table("houses.data", header=TRUE, as.is=TRUE)
Here the header=TRUE option specifies that the first line is a line of headings. Because the heading line has one fewer field than the data lines, read.table() takes the first item on each data line as the row label. as.is=TRUE means that character variables are not converted to factors. In the R version documented here, read.table() and read.csv() by default convert character variables (those not converted to logical, numeric or complex) to factors; since R 4.0.0 the default is to leave them as character vectors.
If the input data
file is in csv format, then use read.csv() with arguments defined above.
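The reading step can be tried without the original files. The sketch below writes the five sample lines shown above to a temporary file and reads them back; the temporary file names and the object HousePrice2 are assumptions made for illustration.

```r
# Write the sample lines to a temporary file (stand-in for "houses.data").
house_file <- tempfile(fileext = ".data")
writeLines(c(
  "Price Floor Area Rooms Age Cent.heat",
  "01 52.00 111.0 830 5 6.2 no",
  "02 54.75 128.0 710 5 7.5 no",
  "03 57.50 101.0 1000 5 4.2 no",
  "04 57.50 131.0 690 6 8.8 no",
  "05 59.75 93.0 900 5 1.9 yes"), house_file)

# The heading line has one fewer field than the data lines, so the
# first item on each data line is used as the row label.
HousePrice <- read.table(house_file, header = TRUE, as.is = TRUE)
str(HousePrice)
rownames(HousePrice)

# The same data as a csv file, read back with read.csv().
csv_file <- tempfile(fileext = ".csv")
write.csv(HousePrice, csv_file, row.names = FALSE)
HousePrice2 <- read.csv(csv_file, as.is = TRUE)
str(HousePrice2)
```

With as.is = TRUE the Cent.heat column stays a character vector in both cases, regardless of the R version in use.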
Now we are going to try a simple example on real data. In this example, we first read several data files into R, then perform several data manipulations required by the analysis, and finally merge the data files together to generate a whole dataset. The example uses most of the functions and commands that are frequently needed for data organization, so understanding it will help you understand data manipulation in R. The next two sections give the data links and the R scripts for this example.
Here are the
datasets used in the example:
(1) JHUcomb.csv (ECG)
(2) icd.data.oct11.2007.csv (SNP)
    (cleaned version of sept so that the JHUIDs are in the common format; got rid of "new" or "A" or "B")
(3) age.gender.csv (AGE.GENDER)
(4) ReynRaceData-100207.csv (RACE)
(5) firing.datainducibility.csv (
(6) img.data11.19.07.csv (IMAGE)
Here is the code used for the data above. There are two editions of the code. One is heavily commented for users with limited experience of R. The other one has no comments at all. They are equivalent and it is up to you which one to use.
(1) Commented Code (download here)
#
#
# Merge
#
# 1) JHUcomb.csv (ECG)
# 2) icd.data.oct11.2007.csv (SNP) -- cleaned version of sept so that the
#    JHUIDs are in the common format
#    got rid of "new" or "A" or "B"
# 3) age.gender.csv (AGE.GENDER)
# 4) ReynRaceData-100207.csv (RACE)
# 5) firing.datainducibility.csv (
# 6) img.data11.19.07.csv (IMAGE)
#
# Read data from file "JHUcomb.csv" in the current folder,
# which is specified under R menu->File->Change directory.
#
# The file is in csv format. We use the function
# read.csv("file path").
#
# "as.is=T" means not converting character variables to
# factors. read.csv() by default converts the character
# variables (which are not converted to logical, numeric or
# complex) to factors.
#
ECG<-read.csv("JHUcomb.csv", as.is=T)
#
# Read data from "icd.data.oct.11.2007.csv" in the current
# folder. And do not convert character variables to factors.
#
SNP<-read.csv("icd.data.oct.11.2007.csv",as.is=T)
#
# Read data from "age.gender.csv" in the current
# folder. And do not convert character variables to factors.
#
AGE.GENDER<-read.csv("age.gender.csv",as.is=T)
#
# Read data from "ReynRaceData-100207.csv" in the current
# folder. And do not convert character variables to factors.
#
RACE<-read.csv("ReynRaceData-100207.csv",as.is=T)
#
# Read data from "firing.data.4.09.2008.csv" in the current
# folder. And do not convert character variables to factors.
#
#
# Read data from "img.data11.19.07.csv" in the current
# folder. And do not convert character variables to factors.
#
IMAGE<-read.csv("img.data11.19.07.csv",as.is=T)
#
# Need to create an ID variable for the ECG data
# based on the first column which is a filename.
#
# Assign N.ECG the value of number of rows in ECG dataframe.
#
N.ECG<-dim(ECG)[1]
#
# Add a column called "ID" in ECG dataframe. Assign all ""s
# to the column.
#
# rep("",N.ECG) means generating a vector by repeating ""
# N.ECG times.
#
ECG$ID<-rep("",N.ECG)
#
# Loop over every row in the ECG dataframe. In each row, if
# the first column is a "NA", then assign the ID column "NA".
# Otherwise, if the first character of the first column is a ".",
# then assign ID column the value of "JHU"+the substring of
# the first column (from 7th character to the 9th character)
# and without any separation mark.
# If it is not a ".", then do the same assignment except using
# the string from 4th character to the 6th character.
#
for (i in seq(1,N.ECG))
{
  #
  # is.na() detects whether a value is NA or not. It returns
  # either TRUE or FALSE.
  #
  # ECG[i,1] means the element in the ith row and 1st column.
  #
  if (is.na(ECG[i,1]))
  {
    # assign the element in ith row and ID column the value of NA
    #
    ECG$ID[i]<-NA
  }
  else
  {
    #
    # Determine whether the 1st character of the string in ith row
    # and 1st column is a "."
    #
    # == is an operator to detect whether both sides are equal
    # It returns a logical value of TRUE or FALSE.
    #
    if (substring(ECG[i,1],1,1)==".")
    {
      # assign the element in ith row and ID column a string, which
      # comprises "JHU" and the 7th to 9th characters of the string
      # in the element in ith row and 1st column.
      #
      ECG$ID[i]<-paste("JHU",substring(ECG[i,1],7,9),sep="")
    }
    else
    {
      # assign the element in ith row and ID column a string, which
      # comprises "JHU" and the 4th to 6th characters of the string
      # in the element in ith row and 1st column.
      #
      ECG$ID[i]<-paste("JHU",substring(ECG[i,1],4,6),sep="")
    }
  }
}
#
# Recode the -999's in ECG as NA's
#
# Look through each row and each column to see if there is a -999 and
# replace it with NA.
#
# Look through each row in ECG.
#
for (i in seq(1,dim(ECG)[1]))
{
  # Look through each column.
  for (j in seq(1,dim(ECG)[2]))
  {
    # Determine whether the current cell is not a NA.
    #
    # ! is an operator for "not". For example, "!(1==2)" is TRUE
    if (!is.na(ECG[i,j]))
    {
      # Determine whether the current cell is -999, if TRUE, then
      # assign the current cell a NA.
      if (ECG[i,j]==-999)
      {
        ECG[i,j]<-NA
      }
    }
  }
}
#
# Make all SNP calls that equal ERROR, UNDETERMINED or - into NA's
#
# Look through each row and in column 2 to 7, find all the cells that are
# ERROR, UNDETERMINED or -, replace them with NA's.
#
# create indicator for all SNP's having been called
#
# Look through each row
#
for (i in seq(1,dim(SNP)[1]))
{
  # Look through each element from 2nd column to 7th column.
  for (j in seq(2,7))
  {
    # Determine whether the current cell is ERROR, UNDETERMINED or -, if TRUE,
    # then rewrite it as a NA.
    if ((SNP[i,j]=="UNDETERMINED")||(SNP[i,j]=="-")||(SNP[i,j]=="ERROR"))
    {
      SNP[i,j]<-NA
    }
  }
}
# Rename the SNP columns as "ID", "snp1", "snp2", "snp3", "snp4", "snp5"
# and "snp6".
#
names(SNP)<-c("ID","snp1","snp2","snp3","snp4","snp5","snp6")
#
# Recode blank gender as NA
#
# Look through each row in column Gender in AGE.GENDER dataframe. Replace
# all the blank cells with NA's.
#
for (i in seq(1,dim(AGE.GENDER)[1]))
{
  # Determine whether the current cell is blank. If so, then assign it a NA.
  if (AGE.GENDER$Gender[i]=="")
  {
    AGE.GENDER$Gender[i]<-NA
  }
}
#
# Recode RACE as NA if it is not A, B, W or O
#
# Look through the column Race in data frame RACE.
#
for (i in seq(1,dim(RACE)[1]))
{
  # Determine if the current cell is A, B, W or O, if not, assign it a NA.
  if ((RACE$Race[i]!="A")&&
      (RACE$Race[i]!="B")&&
      (RACE$Race[i]!="W")&&
      (RACE$Race[i]!="O"))
  {
    RACE$Race[i]<-NA
  }
}
#
# Rename the 1st column (PID variable) of data frame SNP, AGE.GENDER,
# RACE as ID.
#
names(AGE.GENDER)[1]<-"ID"
names(RACE)[1]<-"ID"
names(SNP)[1]<-"ID"
#
# Rename the IND Study.ID variable as ID
#
names(
#
#
# Rename the IMAGE ReynoldsNum variable as ID
#
names(IMAGE)[1]<-"ID"
#
# Clean the inducibility data so that
# (a) the ID's don't have trailing -I
# (b) the phenotype is either yes, no or NA
#
# Note that IND$Inducible is the variable telling us if we have
# inducible data (1) or not (0)
#
# Assign L the number of rows in data frame IND
#
L<-dim(
#
#
#
for (i in seq(1,L))
{
  #
  # fix the ID by extracting the first 6 characters
  #
  # Determine if the Inducible variable is a NA
  if (!is.na(
  {
    # Determine if the Inducible variable is "no" or "yes"
    if ((
    {
      # If the Inducible variable is "no, ", then we change it to
      # "no".
      if (
      {
      }
      # For all other cases, we assign it a NA.
      else
      {
      }
    }
  }
}
#
# Create indicators that just tell us if an ID is in a dataset
#
# Add a column in SNP named IDIN.SNP.IND with all 1s.
#
SNP$IDIN.SNP.IND<-rep(1,dim(SNP)[1])
#
# Add a column in ECG named IDIN.ECG.IND with all 1s.
#
ECG$IDIN.ECG.IND<-rep(1,dim(ECG)[1])
#
# Add a column in AGE.GENDER named IDIN.AGE.GENDER.IND with all 1s.
#
AGE.GENDER$IDIN.AGE.GENDER.IND<-rep(1,dim(AGE.GENDER)[1])
#
# Add a column in RACE named IDIN.RACE.IND with all 1s.
#
RACE$IDIN.RACE.IND<-rep(1,dim(RACE)[1])
#
# Add a column in IND named IDIN.IND.IND with all 1s.
#
#
# Add a column in IMAGE named IDIN.IMAGE.IND with all 1s.
#
IMAGE$IDIN.IMAGE.IND<-rep(1,dim(IMAGE)[1])
#
# Add a column in IMAGE named IMAGE.IND with all 1s.
#
IMAGE$IMAGE.IND<-rep(1,dim(IMAGE)[1])
#
# Merge ECG and SNP data frames together by the common column ID, and name
# it d1.
# With all=TRUE, extra rows are added to the output for each row in x that
# has no matching row in y, and for each row in y that has no matching row
# in x. These rows have NAs in the columns that are usually filled with
# values from the other data frame.
#
d1<-merge(ECG,SNP,by.x="ID",by.y="ID",all=TRUE)
#
# Merge d1 and AGE.GENDER data frames together by the common column ID,
# and name it d2.
#
d2<-merge(d1,AGE.GENDER,by.x="ID",by.y="ID",all=TRUE)
#
# Merge d2 and RACE data frames together by the common column ID, and
# name it d3.
#
d3<-merge(d2,RACE,by.x="ID",by.y="ID",all=TRUE)
#
# Merge d3 and IND data frames together by the common column ID, and
# name it d4.
#
d4<-merge(d3,
#
# Merge d4 and IMAGE data frames together by the common column ID, and
# name it d5.
#
d5<-merge(d4,IMAGE,by.x="ID",by.y="ID",all=TRUE)
#
# Rename d5 as d.
#
d<-d5
# remove the variables d1, d2, d3, d4, d5.
rm(d1)
rm(d2)
rm(d3)
rm(d4)
rm(d5)
#
# Create indicators of data available
#
# If all the cells in the same row in columns snp1, snp2, snp3, snp4, snp5,
# snp6 are not missing values (NA), then assign the indicator TRUE. Otherwise,
# assign indicator FALSE. Name the indicator SNP.ALL.IND.
#
d$SNP.ALL.IND<-complete.cases(d$snp1,d$snp2,d$snp3,d$snp4,d$snp5,d$snp6)
#
# If the cell in column QTVI_log is not a missing value (NA), then assign
# the indicator TRUE. Otherwise, assign indicator FALSE. Name the
# indicator ECG.IND.
#
d$ECG.IND<-complete.cases(d$QTVI_log)
#
# If the cell in column Birth.Year.x is not a missing value (NA), then assign
# the indicator TRUE. Otherwise, assign indicator FALSE. Name the indicator
# AGE.IND.
#
d$AGE.IND<-complete.cases(d$Birth.Year.x)
#
# If the cell in column Gender.x is not a missing value (NA), then assign
# the indicator TRUE. Otherwise, assign indicator FALSE. Name the indicator
# GENDER.IND.
#
d$GENDER.IND<-complete.cases(d$Gender.x)
#
# If the cell in column Race is not a missing value (NA), then assign
# the indicator TRUE. Otherwise, assign indicator FALSE. Name the indicator
# RACE.IND.
#
d$RACE.IND<-complete.cases(d$Race)
#
# If the cell in column Inducible is not a missing value (NA), then assign
# the indicator TRUE. Otherwise, assign indicator FALSE. Name the indicator
# IND.IND.
#
d$IND.IND<-complete.cases(d$Inducible)
#
# If the cell in column DEmass is not a missing value (NA), then assign
# the indicator TRUE. Otherwise, assign indicator FALSE. Name the indicator
# IMAGE.IND.
#
d$IMAGE.IND<-complete.cases(d$DEmass)
#
# Filter out the non-adults
#
#
# Set missing birth years to zero
#
d$Birth.Year.x[is.na(d$Birth.Year.x)]<-0
d$Birth.Year.y[is.na(d$Birth.Year.y)]<-0
#
# Create a new Birth.Year variable:
# if Birth.Year.x is missing, take Birth.Year.y, otherwise take
# Birth.Year.x
#
d$Birth.Year<-d$Birth.Year.x+d$Birth.Year.y*(d$Birth.Year.x==0)
#
# Keep only those born in or before 1995. Rows whose birth year is missing
# in both sources have Birth.Year equal to 0 at this point, so they are
# kept as well.
#
# d[d$Birth.Year<=1995,] means all rows in d that have Birth.Year less than
# or equal to 1995.
#
d<-d[d$Birth.Year<=1995,]
#
# Convert firing & implant dates to date format and create an indicator for
# implantation.
#
# If Implant.Date is not a NA, assign TRUE to IMPLANT.IND. Otherwise, assign
# FALSE.
#
d$IMPLANT.IND<-complete.cases(d$Implant.Date)
#
# Convert Firings from a string (e.g. "09/21/2008") to a Date and name it
# Firing.Date.
#
d$Firing.Date<-as.Date(d$Firings,format="%m/%d/%Y")
#
# Convert Implant.Date from a string (e.g. "09/21/2008") to a Date and still
# name it Implant.Date.
#
d$Implant.Date<-as.Date(d$Implant.Date,format="%m/%d/%Y")
#
# Calculate the number of TRUEs in column IMPLANT.IND
#
sum(d$IMPLANT.IND)
#
# Calculate the number of non-missing values in column Implant.Date
# after the conversion
#
sum(!is.na(d$Implant.Date))
#
# Calculate the days between Firing.Date and Implant.Date, assign it to
# Days.To.Firing
#
d$Days.To.Firing<-d$Firing.Date-d$Implant.Date
#
# Create an indicator FIRED.IND that is TRUE where Days.To.Firing is
# not NA.
#
d$FIRED.IND<-!is.na(d$Days.To.Firing)
#
# Compute days to today, assuming today is March 4, 2008
#
today<-as.Date("
d$Days.Of.Implant<-today-d$Implant.Date
#
# Create an indicator for AP.vs.IAP. If AP.vs.IAP is "AP", then assign the
# indicator TRUE, otherwise, assign FALSE.
#
d$APP.FIRED.IND<-(d$AP.vs.IAP=="AP")
#
#
# Write data frame d to a csv file
#
write.csv(d,file="data.csv",row.names=F)
#
#
# Make data frame of those for which we have inducibility data
#
dind<-d[d$IND.IND,]
#
#
# Write data frame "dind" to a csv file
#
write.csv(dind,file="data.ind.csv",row.names=F)
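The row-by-row loops in the script can also be written with vectorized operations, which are usually shorter and faster in R. The following is a sketch on invented toy data, not the project files: the file names in the file column, the values, and the two-column SNP table are assumptions chosen so that the ID sits in characters 7 to 9 or 4 to 6, as in the script.

```r
# Toy stand-in for the ECG data; file names and values are invented.
ECG <- data.frame(file = c("./ecg/101.dat", "ecg102.dat", NA),
                  QTVI_log = c(-0.5, -999, -1.2),
                  stringsAsFactors = FALSE)

# Build the ID column in one step instead of looping over rows.
starts_with_dot <- substring(ECG$file, 1, 1) == "."
ECG$ID <- ifelse(is.na(ECG$file), NA_character_,
                 paste("JHU",
                       ifelse(starts_with_dot,
                              substring(ECG$file, 7, 9),
                              substring(ECG$file, 4, 6)),
                       sep = ""))

# Recode -999 as NA in every numeric column; which() skips existing NAs.
num_cols <- sapply(ECG, is.numeric)
ECG[num_cols] <- lapply(ECG[num_cols],
                        function(col) replace(col, which(col == -999), NA))

# Toy stand-in for the SNP data: recode failed calls as NA.
SNP <- data.frame(ID = c("JHU101", "JHU102"),
                  snp1 = c("AA", "ERROR"),
                  snp2 = c("-", "AG"),
                  stringsAsFactors = FALSE)
bad_calls <- c("ERROR", "UNDETERMINED", "-")
snp_cols <- c("snp1", "snp2")
SNP[snp_cols] <- lapply(SNP[snp_cols],
                        function(col) replace(col, col %in% bad_calls, NA))
```

Unlike a comparison with == inside if(), %in% returns FALSE rather than NA for cells that are already missing, so this version does not stop when a cell is NA.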
(2) Uncommented Code (download here)
ECG<-read.csv("JHUcomb.csv", as.is=T)
SNP<-read.csv("icd.data.oct.11.2007.csv",as.is=T)
AGE.GENDER<-read.csv("age.gender.csv",as.is=T)
RACE<-read.csv("ReynRaceData-100207.csv",as.is=T)
IMAGE<-read.csv("img.data11.19.07.csv",as.is=T)
N.ECG<-dim(ECG)[1]
ECG$ID<-rep("",N.ECG)
for (i in seq(1,N.ECG))
{
  if (is.na(ECG[i,1]))
  {
    ECG$ID[i]<-NA
  }
  else
  {
    if (substring(ECG[i,1],1,1)==".")
    {
      ECG$ID[i]<-paste("JHU",substring(ECG[i,1],7,9),sep="")
    }
    else
    {
      ECG$ID[i]<-paste("JHU",substring(ECG[i,1],4,6),sep="")
    }
  }
}
for (i in seq(1,dim(ECG)[1]))
{
  for (j in seq(1,dim(ECG)[2]))
  {
    if (!is.na(ECG[i,j]))
    {
      if (ECG[i,j]==-999)
      {
        ECG[i,j]<-NA
      }
    }
  }
}
for (i in seq(1,dim(SNP)[1]))
{
  for (j in seq(2,7))
  {
    if ((SNP[i,j]=="UNDETERMINED")||(SNP[i,j]=="-")||(SNP[i,j]=="ERROR"))
    {
      SNP[i,j]<-NA
    }
  }
}
names(SNP)<-c("ID","snp1","snp2","snp3","snp4","snp5","snp6")
for (i in seq(1,dim(AGE.GENDER)[1]))
{
  if (AGE.GENDER$Gender[i]=="")
  {
    AGE.GENDER$Gender[i]<-NA
  }
}
for (i in seq(1,dim(RACE)[1]))
{
  if ((RACE$Race[i]!="A")&&
      (RACE$Race[i]!="B")&&
      (RACE$Race[i]!="W")&&
      (RACE$Race[i]!="O"))
  {
    RACE$Race[i]<-NA
  }
}
names(AGE.GENDER)[1]<-"ID"
names(RACE)[1]<-"ID"
names(SNP)[1]<-"ID"
names(
names(IMAGE)[1]<-"ID"
L<-dim(
for (i in seq(1,L))
{
  if (!is.na(
  {
    if ((
    {
      if (
      {
      }
      else
      {
      }
    }
  }
}
SNP$IDIN.SNP.IND<-rep(1,dim(SNP)[1])
ECG$IDIN.ECG.IND<-rep(1,dim(ECG)[1])
AGE.GENDER$IDIN.AGE.GENDER.IND<-rep(1,dim(AGE.GENDER)[1])
RACE$IDIN.RACE.IND<-rep(1,dim(RACE)[1])
IMAGE$IDIN.IMAGE.IND<-rep(1,dim(IMAGE)[1])
IMAGE$IMAGE.IND<-rep(1,dim(IMAGE)[1])
d1<-merge(ECG,SNP,by.x="ID",by.y="ID",all=TRUE)
d2<-merge(d1,AGE.GENDER,by.x="ID",by.y="ID",all=TRUE)
d3<-merge(d2,RACE,by.x="ID",by.y="ID",all=TRUE)
d4<-merge(d3,
d5<-merge(d4,IMAGE,by.x="ID",by.y="ID",all=TRUE)
d<-d5
rm(d1)
rm(d2)
rm(d3)
rm(d4)
rm(d5)
d$SNP.ALL.IND<-complete.cases(d$snp1,d$snp2,d$snp3,d$snp4,d$snp5,d$snp6)
d$ECG.IND<-complete.cases(d$QTVI_log)
d$AGE.IND<-complete.cases(d$Birth.Year.x)
d$GENDER.IND<-complete.cases(d$Gender.x)
d$RACE.IND<-complete.cases(d$Race)
d$IND.IND<-complete.cases(d$Inducible)
d$IMAGE.IND<-complete.cases(d$DEmass)
d$Birth.Year.x[is.na(d$Birth.Year.x)]<-0
d$Birth.Year.y[is.na(d$Birth.Year.y)]<-0
d$Birth.Year<-d$Birth.Year.x+d$Birth.Year.y*(d$Birth.Year.x==0)
d<-d[d$Birth.Year<=1995,]
d$IMPLANT.IND<-complete.cases(d$Implant.Date)
d$Firing.Date<-as.Date(d$Firings,format="%m/%d/%Y")
d$Implant.Date<-as.Date(d$Implant.Date,format="%m/%d/%Y")
sum(d$IMPLANT.IND)
sum(!is.na(d$Implant.Date))
d$Days.To.Firing<-d$Firing.Date-d$Implant.Date
d$FIRED.IND<-!is.na(d$Days.To.Firing)
today<-as.Date("
d$Days.Of.Implant<-today-d$Implant.Date
d$APP.FIRED.IND<-(d$AP.vs.IAP=="AP")
write.csv(d,file="data.csv",row.names=F)
dind<-d[d$IND.IND,]
write.csv(dind,file="data.ind.csv",row.names=F)
d$CLINICAL.IND<-(d$AGE.IND)*(d$RACE.IND)*(d$GENDER.IND)
dind$CLINICAL.IND<-(dind$AGE.IND)*(dind$RACE.IND)*(dind$GENDER.IND)
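The chain of merges, the availability indicators, the birth-year filter and the date arithmetic can be tried on a few invented rows. This is a sketch under stated assumptions: the toy tables, their values, and the names d_adult, implant, firing and days_to_firing are illustrative, and Reduce() is used here as one compact way to repeat merge(), not as the method of the script above.

```r
# Toy stand-ins for three of the data frames; all values are invented.
ECG <- data.frame(ID = c("JHU101", "JHU102"),
                  QTVI_log = c(-0.5, NA), stringsAsFactors = FALSE)
AGE.GENDER <- data.frame(ID = c("JHU102", "JHU103"),
                         Birth.Year = c(1950, 1999),
                         stringsAsFactors = FALSE)
RACE <- data.frame(ID = c("JHU101", "JHU103"),
                   Race = c("W", NA), stringsAsFactors = FALSE)

# Merge all data frames on ID, keeping unmatched rows from every table.
d <- Reduce(function(left, right) merge(left, right, by = "ID", all = TRUE),
            list(ECG, AGE.GENDER, RACE))

# Availability indicator: TRUE where the ECG measurement is present.
d$ECG.IND <- !is.na(d$QTVI_log)

# Keep rows born in or before 1995, and rows with unknown birth year,
# without recoding the missing years as zero.
d_adult <- d[is.na(d$Birth.Year) | d$Birth.Year <= 1995, ]

# Dates: parse month/day/year strings, then subtract to get days.
implant <- as.Date(c("09/21/2007", NA), format = "%m/%d/%Y")
firing  <- as.Date(c("01/15/2008", "02/01/2008"), format = "%m/%d/%Y")
days_to_firing <- firing - implant
as.numeric(days_to_firing)   # 116 for the first pair, NA for the second
```

Handling the missing birth years with is.na() keeps the NA visible in the data, whereas the zero recoding in the script changes the stored value; which one is preferable depends on how the column is used later.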
Bill Venables and David M. Smith, An Introduction to R. www.r-project.org.
Phil Spector, Introduction to S & S-PLUS. Springer, 12/23/1993.