Group challenge 1
Spatial Statistics - BIOSTAT 696/896
Michele Peruzzi
University of Michigan
Groups?
- Have some groups been formed?
- Troubles?
- I have enabled Piazza if you need to find group partners
- You could e.g. post your background (MS/PhD/what year…) and look for complementary group members
Getting started
- This group challenge is about getting things started with point-referenced data
- The goal of this group activity is to make the first steps towards your final group project
- Some of the points I include can only be completed if you have your own data
- For these reasons, although not required, I very highly recommend that you begin searching for data for your final project
Point-referenced data requirements
- Spatial coordinates (latitude and longitude, or equivalent depending on the domain)
- Not necessarily environmental/ecology/satellite
- Must have a continuous variable that varies in space
- If you do any preprocessing (e.g., data transformations), you must document everything
- Check in with me before submitting your answers
Checklist for datasets, part 1
- Source of the data. Link, citation.
- Overall data description (number of observations, number of variables)
- Are there missing data? Is there a pattern or an intuitive reason for the missingness?
- What is in the dataset? Describe each column. Summary statistics. Correlation analysis (non-spatial is ok). Preliminary data analysis using methods that you know
- Make summary figures based on non-spatial descriptive statistics
- Describe the spatial component of the data: what is the spatial domain, its dimension, how are observations indexed in space? Do you think modeling covariance as decaying with distance is appropriate?
- Visualize the empirical variogram (or covariogram) for the variable of interest. Does it suggest spatial dependence in the data?
- Map the data. Map the spatial variables, and write short but meaningful captions for your figures.
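The empirical variogram item in the checklist above can be sketched as follows. This is a minimal illustration in Python with NumPy, not a required implementation: the function name `empirical_variogram`, the binning choices, and the synthetic data (simulated with an exponential covariance) are all assumptions made for the example. With your own data, replace `coords` and `values` with your coordinates and your variable of interest.

```python
import numpy as np


def empirical_variogram(coords, values, n_bins=10, max_dist=None):
    """Binned classical (Matheron) estimator of the semivariogram.

    Hypothetical helper written for this sketch.
    Returns bin centers, semivariance per bin, and pair counts per bin.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # All unique pairs of observations (i < j)
    i, j = np.triu_indices(len(values), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2

    if max_dist is None:
        # Assumption: only look at distances up to half the maximum
        max_dist = dists.max() / 2

    edges = np.linspace(0.0, max_dist, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_idx = np.digitize(dists, edges) - 1  # pairs beyond max_dist get n_bins

    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = bin_idx == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            # Semivariance: half the mean squared difference within the bin
            gamma[b] = 0.5 * sq_diffs[in_bin].mean()
    return centers, gamma, counts


# Synthetic point-referenced data (assumed for illustration only):
# 200 locations in the unit square, values drawn from a Gaussian process
# with exponential covariance exp(-distance / 0.2).
rng = np.random.default_rng(696)
n = 200
coords = rng.uniform(0.0, 1.0, size=(n, 2))
pairwise = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
cov = np.exp(-pairwise / 0.2) + 1e-8 * np.eye(n)  # small jitter for stability
values = np.linalg.cholesky(cov) @ rng.standard_normal(n)

centers, gamma, counts = empirical_variogram(coords, values)
for h, g, m in zip(centers, gamma, counts):
    print(f"distance {h:.3f}  semivariance {g:.3f}  pairs {m}")
```

When reading the output, a semivariance that increases with distance and then levels off is consistent with spatial dependence, while a roughly flat curve suggests little dependence at the scales examined. Bins with few pairs are noisy, so it is worth reporting the pair counts alongside the plot.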
Checklist for datasets, part 2
- How do you imagine spatial dependence may play a role between the variables in the dataset?
- What are some research questions that these data could help answer (at least one for each dataset)?
- Write an intuitive directed acyclic graphical model outlining the important variables, the parameters, and how they could be related to each other (no need for other assumptions – keep it simple)
- What are the potential results that you may anticipate?
- What are some potential pitfalls or shortcomings of your model?
- What additional data could be useful for the purpose of your analysis?
Tips
- Keep it simple!
- You can start thinking about your final project format now: slides or poster
- You can submit your answers as a slide set or in poster format
- If you choose poster: OK to leave lots of empty space now, or to cut later
- Submit your code too!
- Preferred format: a .qmd file that compiles into a .pdf document. Submit both as a .zip archive
Can’t find data?
- You can use the purpleair.csv dataset for point-referenced data
- Warning! If multiple groups use the same data, I will have to evaluate them by comparing their work