<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Statistical Consulting Center Forums - Design, Sample-Size, and Power</title>
        <description>Forum on design, sample-size, and power analysis.</description>
        <link>http://forums.stat.ucla.edu/list.php?11</link>
        <lastBuildDate>Mon, 23 Nov 2009 22:49:14 -0800</lastBuildDate>
        <generator>Phorum 5.2.10</generator>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,135,135#msg-135</guid>
            <title>Sample size (3 replies)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,135,135#msg-135</link>
            <description><![CDATA[ I have a question about sample size and generalizability to the wider population. I am looking at a study of motion picture data. The sample was taken at 35 movie theaters in 25 cities, together with opening-night box office information for a particular movie. Surveys were handed out at these 35 theaters, and 607 responses were collected in total. I am trying to figure out whether the information from this sample can be generalized to the population as a whole, or whether I can only use it as a guide that isn't necessarily representative of the whole population. For example, 13% of the respondents were females aged 18-24. Can I infer from this that 13% of the moviegoers who saw this film were females aged 18-24? Is this sample size large enough to be statistically meaningful? And in general, how many responses would be needed to generalize to the population as a whole? Is there other information I'm missing? (:<br />
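<br />
A rough way to gauge how precise the 13% figure is: a normal-approximation confidence interval for a proportion. This is a sketch under strong assumptions. It treats the 607 responses as a simple random sample of the film's audience, which a survey clustered in 35 theaters with self-selected respondents is not. The design_effect parameter is an illustrative knob (not something estimated from this survey) that inflates the variance to mimic clustering, and nothing here corrects for nonresponse bias.<br />
<br />
```python
import math


def proportion_ci(p_hat, n, z=1.96, design_effect=1.0):
    """Normal-approximation confidence interval for a proportion.

    design_effect = 1.0 assumes simple random sampling; values above 1
    widen the interval to reflect clustering (e.g. respondents grouped by
    theater). The value is an assumption, not estimated here.
    """
    se = math.sqrt(design_effect * p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def required_n(margin, p=0.5, z=1.96, design_effect=1.0):
    """Responses needed for a given margin of error (p = 0.5 is the worst case)."""
    return math.ceil(design_effect * z * z * p * (1 - p) / margin ** 2)


# 13% of 607 respondents, assuming simple random sampling
low, high = proportion_ci(0.13, 607)
print(f"SRS: {low:.3f} to {high:.3f}")

# Same estimate with a hypothetical design effect of 2 for theater clustering
low2, high2 = proportion_ci(0.13, 607, design_effect=2.0)
print(f"deff=2: {low2:.3f} to {high2:.3f}")

# Responses needed for a +/-3 percentage-point margin under SRS
print(required_n(0.03))
```
<br />
Under simple random sampling the interval is roughly 10% to 16% (about plus or minus 2.7 percentage points). Any clustering by theater widens it, and no sample size on its own fixes a sampling frame that is not representative of the audience.<br />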
<br />
Thanks so much,<br />
<br />
Dave]]></description>
            <dc:creator>DaveK</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Mon, 19 Oct 2009 14:07:32 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,121,121#msg-121</guid>
            <title>Variable picking for multiple regression (13 replies)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,121,121#msg-121</link>
            <description><![CDATA[ I have two quick questions about setting up variables for a regression analysis. First, I currently have a list of about 30 candidate variables for a regression analysis I am running. I need to winnow this list down before I start the actual testing, because otherwise the model would be too unwieldy (and would probably overfit, although my sample size is over 5,000 responses; would this still be overfitting?). About five of the candidate independent variables are nominal, each with anywhere from 20 to over 100 categories. What is the best way to start testing? Should I first run an individual univariate test for each independent variable to help winnow some variables out? Or should I create dummy variables for several of my nominal variables and run a multiple regression from the start? That seems unwieldy since, as I said, certain independent variables have 100 different nominal categories...<br />
<br />
   Second, I don't want to overfit my model by including too many unnecessary independent variables. But how can I drop any candidate variable (even after a univariate test finds no clear association with the dependent variable) without knowing about potential interaction effects among the variables being tested? A variable might be thrown out after a basic univariate test shows no correlation with my response variable, yet that variable together WITH another one might have been important!  <br />
<br />
Thank you so much for your help...<br />
<br />
-MP]]></description>
            <dc:creator>MarcoP</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Thu, 15 Oct 2009 11:13:50 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,81,81#msg-81</guid>
            <title>Interpreting standardized test scores (9 replies)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,81,81#msg-81</link>
            <description><![CDATA[ I have been trying to understand my state's (AK) reports of high school standardized test scores. While I teach MBA statistics, I do not have a background in psychometrics. I have been particularly confused by the meaning of the confidence intervals reported for individual students' test performance. When I asked about them, the consulting firm that designs and analyzes the state tests explained that psychometricians interpret standard errors differently than statisticians do: statisticians estimate standard errors from sample data, while psychometricians estimate them from model assumptions. The test in question is a Rasch test.<br />
<br />
I've attempted to read some documentation on Rasch tests, but the jargon is overwhelming and I cannot afford the time to become a psychometrician myself (nor do I care to, from what I have seen). I am looking for an intuitive explanation of how confidence intervals for individual students' test scores can be derived and what such confidence intervals mean. Repeated tests are not administered to any sample of students, which would be a straightforward way to measure the variation in scores from one test to another. So how is a confidence interval for an individual obtained from a single test administration?<br />
<br />
The interpretation that they offer for the confidence interval is that &quot;If [student name] were to take a similar test multiple times, the range of these scores would fall between xxx and yyy 80% of the time.&quot;<br />
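<br />
One common way such an interval can arise from a single administration, sketched under assumptions: under the Rasch model, the standard error of a student's ability estimate comes from the test information function (the sum over items of P(1 - P) evaluated at that ability), so it is a model-based quantity rather than one estimated from repeated testing. The item difficulties below are hypothetical, the ability is taken as already estimated on the logit scale, and the state's vendor may compute and rescale its intervals differently (reported scale scores are often a linear transformation of the logit scale).<br />
<br />
```python
import math
from statistics import NormalDist


def rasch_se(theta, difficulties):
    """Model-based standard error of a Rasch ability estimate.

    Under the Rasch model P(correct) = 1 / (1 + exp(-(theta - b))).
    Test information is the sum of P * (1 - P) over the items, and the
    standard error of the ability estimate is 1 / sqrt(information).
    """
    info = 0.0
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)


def ability_interval(theta, difficulties, level=0.80):
    """Symmetric interval theta +/- z * SE on the logit scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for an 80% interval
    se = rasch_se(theta, difficulties)
    return theta - z * se, theta + z * se


# Hypothetical 40-item test with difficulties spread evenly from -2 to 2 logits
items = [-2 + 4 * i / 39 for i in range(40)]
print(ability_interval(0.5, items))  # student estimated at 0.5 logits
print(ability_interval(3.0, items))  # wider: few items are near this ability
```
<br />
The 80% multiplier is about 1.28 (the 90th percentile of the standard normal), which matches the 80% wording in the report. Students whose ability is far from most item difficulties get wider intervals, because the items carry less information about them.<br />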
<br />
Thanks for any assistance you can provide.]]></description>
            <dc:creator>dlehman</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Thu, 17 Sep 2009 17:07:22 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,55,55#msg-55</guid>
            <title>Validation for a rubric-type questionnaire (1 reply)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,55,55#msg-55</link>
            <description><![CDATA[ I wonder whether we can statistically test the validity of a rubric-type instrument with only nine items, each with 5 descriptors (an ordinal scale). My sample is smaller than 30. <br />
 <br />
Thank you in advance for your input.<br />
 <br />
Eins]]></description>
            <dc:creator>Eins</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Mon, 15 Jun 2009 08:50:13 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,47,47#msg-47</guid>
            <title>Dealing with Left-Censored Data (1 reply)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,47,47#msg-47</link>
            <description><![CDATA[ I am analyzing the amounts of certain types of dioxins present in a sample (n = 65), and I want to compare them to a baseline such as NHANES 2003-04. To compare them, we would use the Wilcoxon rank-sum test. <br />
  The problem is that the data are left-censored: for some samples, the amount detected in the blood was below the detection limit (DL) of the instrument. The literature is inconsistent about how to fill in these nondetects (ND): some authors use ND = DL/2 or ND = 0, while others impute these values. Since our samples do not have the same DL for each dioxin, we should use the generalized Wilcoxon rank-sum test (Gehan's statistic). How can we run these tests in R?<br />
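<br />
The question asks about R; as a language-neutral illustration, here is a Python sketch of a Gehan-type generalized Wilcoxon statistic adapted to left-censored concentrations, written in Mantel's score form with the permutation variance and a normal approximation for the p-value. Assumptions: each observation is a (value, censored) pair where, for a nondetect, value is that sample's own detection limit; a pair counts only when its ordering is certain; and the normal approximation is adequate for groups of roughly 65 and 600. The toy data are made up. This is a sketch, not a validated implementation.<br />
<br />
```python
import math
from statistics import NormalDist


def definitely_greater(a, b):
    """True if observation a is known to exceed observation b.

    Each observation is (value, censored). For a nondetect (censored=True),
    value is its detection limit and the true amount lies below it.
    """
    a_val, a_cens = a
    b_val, b_cens = b
    if a_cens:
        # a is only known to lie below its limit, so it is never known to be larger
        return False
    if b_cens:
        # b lies below its limit, so a detected value at or above that limit is larger
        return a_val >= b_val
    return a_val > b_val


def gehan_test(x, y):
    """Gehan-type two-sample test for left-censored data.

    Returns (U, z, two-sided p). U is the sum over pairs (x_i, y_j) of
    +1 if x_i is known to exceed y_j, -1 if y_j is known to exceed x_i,
    and 0 if the ordering is ambiguous.
    """
    pooled = x + y
    # Mantel scores on the pooled sample: (# known smaller) - (# known larger)
    scores = []
    for a in pooled:
        smaller = sum(definitely_greater(a, b) for b in pooled)
        larger = sum(definitely_greater(b, a) for b in pooled)
        scores.append(smaller - larger)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum(scores[:n1])  # within-x comparisons cancel, leaving Gehan's U
    # Permutation variance of U given the pooled scores (which sum to zero)
    var_u = n1 * n2 * sum(s * s for s in scores) / (n * (n - 1))
    if var_u == 0:
        return u, 0.0, 1.0
    z = u / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Toy data (not the study's): detected values and nondetects with differing limits
site = [(5.2, False), (3.1, False), (1.0, True), (4.4, False), (0.5, True)]
baseline = [(2.0, False), (0.5, True), (1.0, True), (1.5, False), (0.8, False)]
print(gehan_test(site, baseline))
```
<br />
Computing one score per pooled observation keeps the cost at a single pass over all pairs, which is manageable for about 665 observations, and avoids substituting DL/2 or 0 for the nondetects.<br />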
   We believe imputation is inappropriate because our sample size is small, and choosing a distribution seems impractical with so few observations compared to our NHANES baseline, which has about 600 observations. Do you have any suggestions for the analysis?]]></description>
            <dc:creator>hilbertspaces</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Sat, 09 May 2009 10:28:15 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,38,38#msg-38</guid>
            <title>Taguchi multivariate experiment design (1 reply)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,38,38#msg-38</link>
            <description><![CDATA[ I'm designing an experiment based on Genichi Taguchi's methodology, which is in essence a fractional factorial design. I have 5 factors, each at two levels, and at least two of the factors interact. I have created the necessary L8 orthogonal array and assigned columns 1-5 to factors 1-5 and column 6 to the interaction between factors 1 and 2. <br />
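<br />
A quick way to check which column carries a given two-factor interaction, assuming the array follows Taguchi's standard L8 column order: the interaction of columns a and b appears in column a XOR b (the usual L8 interaction table). The sketch builds the standard array from three base columns and looks up interaction columns. If the array in use is the standard one, it suggests the 1×2 interaction falls in column 3 (aliased with factor 3 under the assignment described) while column 6 holds the 2×4 interaction, so the column assignment is worth double-checking against the array's own interaction table.<br />
<br />
```python
from itertools import combinations


def l8_column(j):
    """Column j (1-7) of the standard Taguchi L8 array, levels coded 1 and 2.

    Base columns are 1, 2 and 4; every other column is the XOR of the base
    columns whose numbers add up to j (e.g. column 3 = columns 1 and 2).
    """
    col = []
    for r in range(8):
        base = [(r >> 2) & 1, (r >> 1) & 1, r & 1]  # drive columns 1, 2, 4
        level = 0
        for k in range(3):
            if (j >> k) & 1:
                level ^= base[k]
        col.append(level + 1)
    return col


def interaction_column(a, b):
    """Column holding the interaction of columns a and b in the standard L8."""
    return a ^ b


array = {j: l8_column(j) for j in range(1, 8)}
for row in zip(*array.values()):
    print(row)

print("1x2 ->", interaction_column(1, 2))
print("2x4 ->", interaction_column(2, 4))

# Sanity check: the interaction column is level 1 where the two columns agree
for a, b in combinations(range(1, 8), 2):
    c = interaction_column(a, b)
    agree = [1 if x == y else 2 for x, y in zip(array[a], array[b])]
    assert array[c] == agree
```
<br />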
<br />
The data:<br />
<br />
All variables are dichotomous including the criterion.<br />
The criterion mean is estimated to be around 0.0157.<br />
<br />
Expectations:<br />
<br />
We want to detect a change in that mean as small as 5% with a sample size of 1,200,000, and to be able to calculate each factor's contribution to the change, identify the optimum configuration of factors and levels, and estimate the results with confidence intervals at a specified level. <br />
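<br />
A back-of-the-envelope power check, assuming the 5% means a relative change (0.0157 to about 0.0165), a two-sided test at alpha = 0.05, and 1,200,000 observations split evenly over the 8 runs. In a balanced two-level array each main effect compares the half of the observations at one level with the half at the other, so the sketch treats one main effect as a two-proportion z-test of 600,000 versus 600,000. It ignores the other factors, the interaction columns, and any run-to-run variation, so it is a rough sketch rather than a design calculation.<br />
<br />
```python
import math
from statistics import NormalDist


def two_proportion_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (equal groups)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2  # pooled proportion under H0
    se0 = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)  # SE of the difference under H0
    se1 = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)  # SE under H1
    diff = abs(p2 - p1)
    # Rejection in the direction of the true change; the opposite tail is negligible
    return norm.cdf((diff - z_crit * se0) / se1)


baseline = 0.0157
shifted = baseline * 1.05  # assumes a relative 5% change
print(round(two_proportion_power(baseline, shifted, 600_000), 3))
```
<br />
Under these assumptions the power comes out a little above 0.9. A smaller effect, or a comparison that uses only part of the data (such as one run against another), would need a fresh calculation.<br />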
<br />
<br />
This is my question:<br />
<br />
What would be the best type of analysis for the experiment results?<br />
<br />
a) ANOVA<br />
b) General Multivariate Linear Regression<br />
c) Logistic Regression<br />
<br />
Should I be considering any other type of analysis?<br />
<br />
<br />
Thanks]]></description>
            <dc:creator>bmillares</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Tue, 21 Apr 2009 13:08:58 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,32,32#msg-32</guid>
            <title>Repeated measures categorical data analysis (6 replies)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,32,32#msg-32</link>
            <description><![CDATA[ Sorry if this is a simple question, but I designed a study in which I gathered some incidental and opportunistic data without thinking properly about how I would analyse it. I am now stuck.<br />
<br />
32 participants (16 in each of two groups) answered a question. After answering it, participants indicated whether their answer was based on a guess, a feeling, or a memory, so the rating is a categorical variable with 3 levels. Each participant provided 22 of these categorical ratings. I initially thought I could convert the data into relative proportions and use a mixed analysis of variance with one between-subjects and one within-subjects factor. However, of course, the observations are not independent of one another. Moreover, because each participant's three proportions sum to 100%, they always average 33.33% across the within-subject levels. A chi-square test doesn't seem like a good idea either, since each subject's 22 ratings are likely more related to one another than to any of the other observations. What analysis can I use, or is it only possible to explore this data?]]></description>
            <dc:creator>jb421</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Sat, 26 Sep 2009 12:55:32 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,31,31#msg-31</guid>
            <title>Comparing Rates (2 replies)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,31,31#msg-31</link>
            <description><![CDATA[ I am working with a lot of data that are in the form of rates. They are not proportions because they can exceed 1.<br />
<br />
In one case, the rates are cost/effectiveness ratios such as REVENUE over EXPENSE, and these are typically much greater than 1, perhaps even in the thousands.<br />
<br />
To get to my point, the most general case I have is FOLLOWERS over FOLLOWING, which as a rate follows more of a power law, with many values between 0 and 1 and few greater than 1.<br />
<br />
Is there a general method (probably non-parametric) I can use to compare two rates? In the cases where the rates are computed from samples, the samples are typically pretty small (10-20).]]></description>
            <dc:creator>Ryan Rosario</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Thu, 09 Apr 2009 09:33:24 -0700</pubDate>
        </item>
        <item>
            <guid>http://forums.stat.ucla.edu/read.php?11,18,18#msg-18</guid>
            <title>First post (no replies)</title>
            <link>http://forums.stat.ucla.edu/read.php?11,18,18#msg-18</link>
            <description><![CDATA[ Welcome to the SCC's Design, Sample-Size, and Power forum. If you are subscribed to the feed or are a moderator you should receive notice of this posting.]]></description>
            <dc:creator>Jose Hales-Garcia</dc:creator>
            <category>Design, Sample-Size, and Power</category>
            <pubDate>Tue, 24 Mar 2009 08:31:18 -0700</pubDate>
        </item>
    </channel>
</rss>
