CPSY 501 Fall 2010
Data Analysis Project

Introduction

The 501 major project involves a focused quantitative analysis of an existing data set, including a detailed description of the analysis process. The purpose of the project is to practice and demonstrate what you are learning in this course and, as such, the major portion of your analysis should be a complex multiple regression or some kind of ANOVA (i.e., it is not sufficient to use only bivariate chi-square tests, t-tests, and/or correlations). Resorting to non-parametric methods should only be done with instructor permission.

The project can be completed in a group of two or three people. Unlike the class assignments, you will submit one single paper for the whole group. It is also possible to do the project individually, if you prefer.

You need to identify your topic and obtain a data set as soon as possible, so that your project can be approved in time for you to do the work. The project must be approved by the instructor before you proceed. You must also obtain permission from the TWU Research Ethics Board (REB) to conduct a re-analysis of an existing data set (since your analysis will differ from the purpose for which the data were originally collected). You are NOT permitted to collect a new set of data for this project; you will not have the time to recruit human participants and to collect and analyse a data set before the project is due.

Selecting a Suitable Dataset

You are expected to obtain an existing data set for re-analysis. You may use data that you have previously collected, obtain one of the existing data sets that are maintained by the department (with permission from the original owner and his/her supervisor), or obtain data from one of the many public archives available on the Internet. However, the analysis that you perform will ordinarily be new in some way (i.e., simply re-running analyses that have already been performed and reported in a thesis document or research report is usually inappropriate). Extending or correcting previous analyses may be possible, but typically such strategies are too complex for this kind of assignment. Normally, your data set needs to contain at least 50 cases / people. Make sure you confirm the size of your sample when selecting your data set. Exceptions to these requirements must be argued cogently (i.e., with very good reasons). Our course website has some ideas to help you find a dataset. You will need to have at least three separate variables in your analysis (e.g., 2 predictors and 1 outcome variable in a regression; 2 IVs and 1 DV in a factorial ANOVA). However, you may include several more variables in your analysis, as long as you have a sufficient sample size.

There are four milestones for your project. Each part has a written component that is due by the start of class (9am) on the Friday when it is due. All written components are to be turned in electronically via myCourses.

Data Set Description (5 marks)

Due Fri 1 Oct

After you have formed your group and obtained a set of data, you need to submit a brief written description of the data set, including:
  1. Name of the "owner(s)" of the data (and whether you have obtained their permission to use the data)
  2. Sample size
  3. The name of each variable that you may be using, and what it is supposed to represent
  4. Number of missing cases for each variable
  5. Level of measurement for each variable
  6. Means and standard deviations for all continuous / ratio / "scale" variables
  7. Box-plots and histograms for all continuous / ratio / "scale" variables
  8. The frequency of each category of response for all ordinal or nominal / categorical variables (possible variation: for ordinal variables with large numbers of discrete values, you need to present quartiles and boxplots instead)
  9. Reference for the data set (theses, published reports, data sets, as appropriate, all in APA style)
Note that, if you intend to use part of a larger data set, you only need to describe the variables that you are actually considering for your analysis; "extra" variables can be omitted from the description.
Your write-up should be formatted as a single document (Word or similar), with the box-plots, histograms, etc. included as figures. For this first assignment, strict APA style is not needed, but the document should be clean and easy to read.
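
As an optional cross-check on the descriptive statistics listed above, the following is a minimal sketch in Python (standard library only). It is not part of the required SPSS workflow. The variable names and values are made up for illustration, and missing cases are assumed to be coded as None.

```python
import statistics
from collections import Counter


def describe_scale(values):
    """Summarise a continuous / scale variable; None marks a missing case."""
    present = [v for v in values if v is not None]
    # Quartiles use Python's default "exclusive" method, which may differ
    # slightly from the quartiles that SPSS reports.
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample SD (n - 1 denominator)
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def describe_categorical(values):
    """Count missing cases and the frequency of each response category."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "frequencies": Counter(present),
    }


# Hypothetical example data (not from any real data set)
age = [23, 31, None, 27, 45, 38, 29]
program = ["MA", "MA", "PhD", None, "MA", "PhD", "MA"]

print(describe_scale(age))
print(describe_categorical(program))
```

The figures in your write-up should still come from your SPSS output; a sketch like this is only useful for confirming that the missing-case counts, means, and standard deviations you report are the ones you expect.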

Project Proposal (5 marks)

Due 24hrs before meeting, before Fri 8 Oct

Your entire team must meet with the instructor sometime before the due date. The instructor is available only at certain times, so be sure to book the meeting well before the deadline. All your team members must be present. At least 24 hours before your meeting, submit your project proposal online. The proposal is a short (half-page to 1 page) document describing Come to the meeting with an electronic copy of your dataset (upload to myCourses or bring a USB key) and come prepared to discuss your proposal. It is also helpful to bring printed copies of your dataset description and proposal.

REB Forms (3 marks)

Due Fri 22 Oct

After obtaining the instructor's permission to proceed with your project, you need to complete the appropriate ethics approval form from the university's Research Ethics Board (REB). Use the "Request for Ethical Review - Analysis of Existing Data Form" (not the "Request for Ethical Review" form), available for download on the TWU website at http://www.twu.ca/academics/research/ethics/approval-forms.html

Instructions for completing the form:
  1. List one member of your group as the principal investigator (it doesn't matter who) and your instructor as the supervisor
  2. Remove all identifying information (e.g., names, e-mail addresses) from the data set before completing the form.
  3. If your data set was originally collected at another institution, attach copies of the original REB application from that institution, and the certificate of approval. You also need a letter of permission from the current owner of the dataset (for theses, this is usually the thesis supervisor).
  4. If your data set is from a TWU thesis, find and note the REB file number for the original thesis.
  5. If your data set was downloaded from a public archive, provide full descriptions of the archive, links, and statements of ethical research practice.
Consult with your instructor if any parts of the REB form are confusing to fill out, but do so well before the due date. Upload the completed REB form to myCourses before the due date. If you have electronic copies of the supporting documents (e.g., permission letter, original REB application from another institution), upload those too. Submit two copies of the completed form, including the signature of whoever will be the Principal Investigator, to your instructor, who will then review the form, sign it, and submit it to the REB office (unless there are problems with it, in which case you'll need to fix them first!). You may not perform any analyses on your dataset until you have received REB approval! In the past this has taken as little as 2-3 weeks when expedited, but it could take as long as 4-6 weeks, depending on the workload of the TWU REB. Any errors or omissions in the form may extend the time!

Final Project Manuscript (32 marks)

Due Sat 18 Dec, 12:00 noon

Although the objective of this course is to prepare you for research, you should remember that this paper is not the same as a research journal article. The object of this paper is a detailed treatment of the statistical procedures and results. As a result, you will go into much more detail on the statistical analysis than you would for a typical journal article. The substantive issues that you are dealing with in CPSY are less important here: we only need enough details to know what kind of variables you have and how they fit within the existing theory. Unless otherwise specified, all sections of the paper should conform to APA manuscript format (see chapter 1 of the APA publication manual for an overview). Use the following structure, including the headings as listed here, to practice your APA style:

Title Page:

Proper APA title page with names, affiliations, running head, etc. (see pp. 296-297 of the APA publication manual for details).

Abstract Page:

A brief abstract, of no more than 250 words (see p. 298 of the APA manual).

Introduction:

The introduction should be brief and to the point; much shorter than what is typically found in an article or manuscript. Begin with a selective conceptual overview of the topic, and select a conceptual model or a piece of research on it (or, if theory is sparse, give reasoning about why this is an important topic to study). You will generally have access to some background literature in the reports you have reviewed (theses or publications). Then relate the model or study to your specific research questions: your research questions should follow clearly from your conceptual overview. At the end of the introduction, clearly state the questions you will be asking in your study (in non-statistical terms).

Method and Data Set:

Describe your participants (age, gender, other demographic information) and where they were recruited from, if known. Briefly describe data collection procedures, if known, including any experimental manipulation (e.g., randomization, different treatment conditions, etc.) that was applied. Next, describe each of your variables: For your outcome variable / DV, provide the mean and standard deviation (if you are comparing groups, also provide that information for each comparison group). Note: make sure you use your final, "cleaned-up" data set to obtain this information. For all predictors / IVs, describe how they were operationally defined and what they were intended to measure. If a standardized test was used, report the established reliability and validity of the test (for this assignment, you do not have to calculate the reliability that you obtained in your sample). Also, if you are using someone else's data, you must acknowledge your data source in full (i.e., data in this study were originally collected by ____ for a study on ____, and used with his/her permission). Finally, describe the actual analytical procedure that you used (providing sufficient information to allow readers to replicate your study), and explain why you chose that procedure (i.e., what makes it a better choice for answering your research question than other possible procedures).

Preliminary Analyses and Results:

In this section, you will go into much more detail than you would normally find in a published manuscript (remember, the whole point of the project is to show us what you have learned, so you should illustrate details that are not included in journals). First, describe how you explored and "cleaned up" your data. Identify the amount of missing data in the data set, and how you checked for systematic patterns of missing data. Describe how you checked for potential outliers, identify which cases may be potential outliers, and explain how you dealt with them. Describe all the assumptions that your chosen analytical procedure makes, and explain how you examined whether each one was met. For every test assumption that was violated, describe (a) the implications for interpretation of data and (b) what steps (if any) can be taken to correct the procedure. If you have violations of assumptions of normality, make every effort to correct the problem, whether by recoding/categorizing the variable, adjusting your set of predictors, applying an arithmetic transform to the variable, or researching other methods. Relying on non-parametric methods should only be a last resort, and with instructor permission. Different non-parametric methods also have their own assumptions; you need to justify those as well. After running the final analyses, describe the results of your analytical process, including any post hoc exploration that was done. Remember to assess and report effect sizes (not just significance tests). Also calculate and report the power that you obtained in your analysis (obtained, or "observed" power). Report the results in both words and statistical notation in accord with APA style (e.g., "The results demonstrated that first-year students are more familiar with APA format than third-year students, but the effect was only moderate in size, F(1,232) = 24.56, p = .003, η² = .09").
Please note that it is not a show-stopper to have non-significant results, as long as you have properly followed all the preceding steps. But the ultimate objective is to understand the dataset and explore its structure, the relationships between variables, etc., not just to give a "yes/no" answer to the RQs.
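
To make the link between a significance test and its effect size concrete, here is a minimal Python sketch (standard library only) that computes the F ratio and eta squared for a one-way between-groups ANOVA from raw scores. The scores are invented for illustration, and the function name is hypothetical. It does not replace the SPSS output you are required to submit, and it does not compute observed power.

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, degrees of freedom, and eta squared for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per group.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    # Eta squared: proportion of total variance explained by group membership
    eta_squared = ss_between / (ss_between + ss_within)

    return {
        "f_ratio": f_ratio,
        "df_between": df_between,
        "df_within": df_within,
        "eta_squared": eta_squared,
    }


# Hypothetical scores for three groups (not real data)
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

In a design with a single factor, eta squared and partial eta squared coincide; with more than one factor they generally differ, so state which one you are reporting.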

Discussion:

Describe how your results answer (or don't answer) the questions that you raised in the introduction. Also discuss the limitations of your research. In particular, focus on any statistical / analytical limitations that you have. Unlike a regular manuscript, do not talk about the future directions or the broader implications of your study.

References:

List all the references that you cited in your paper, using proper APA style. Please note that the style varies according to whether the citation is an article, book, book chapter, or electronic document (see pp. 215-283 of the APA publication manual for details).

Tables and Figures:

You are required to include at least one table or figure in your manuscript (I suggest either a summary of your main statistical analysis, a graph showing the direction or strength of your effect, or at least a summary of your participant demographic characteristics). Remember to refer to it in your written text. See pp. 147-200 of the APA publication manual for details on formatting tables and figures.

SPSS Output:

In addition to uploading your final manuscript (as a Word .doc or similar), also upload a cleaned, well-labelled SPSS output file (.spv) with only the relevant plots and analyses used in your paper. Please ensure there is no scratch work or duplicate runs in the output! Label and number the different parts of the output, to make it easy to match each output with your discussion in the methods and results sections. You can edit the output file to insert text boxes as labels.

The maximum length for your paper (excluding the SPSS output) is 15 pages. Quality is much more important than quantity; if you can do a thorough and clear job describing your analysis in fewer than 15 pages, all the better.