I fit a linear regression: Y = a + b*x

Then I get the values a (intercept) and b (slope).

I want to solve for the value of x, given Y=acp;

So I solve the equation x=(acp-a)/b;

The var0=stderr_intercept**2;

var1=stderr_x**2;

est_stderr=sqrt(var1*(acp**2)/(slope**4) + var0/(slope**2) + var1*(int**2)/(slope**4));

L95_log=x - TINV(0.975, df-1)*est_stderr;

U95_log=x + TINV(0.975, df-1)*est_stderr;

Now my regression model has changed, because the previous model showed lack of fit.

I want to fit a quadratic regression: Y = a*x^2 + b*x + c

I know that when Y=0, x = (-b +/- sqrt(b^2 - 4ac)) / (2a)

(Or go to link: [www.purplemath.com])

So I run a quadratic regression, and get the intercept, b (slope1), and a (slope2).

And how do I get the x value when Y=acp?

And how do I get the estimated stderr to calculate a confidence interval?
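For the quadratic case, getting x at Y=acp amounts to solving a*x^2 + b*x + (c - acp) = 0 with the quadratic formula. A minimal sketch in Python (the coefficients below are invented, purely for illustration):

```python
import math

def invert_quadratic(a, b, c, acp):
    """Solve a*x**2 + b*x + c = acp for x with the quadratic formula.

    Returns both roots; which root is meaningful depends on the range
    of x covered by the data (usually only one falls in range).
    """
    disc = b**2 - 4.0 * a * (c - acp)
    if disc < 0:
        raise ValueError("the fitted curve never reaches Y = acp")
    sq = math.sqrt(disc)
    return (-b + sq) / (2.0 * a), (-b - sq) / (2.0 * a)

# Invented coefficients for illustration: x**2 - 3x + 2 = 0
roots = invert_quadratic(a=1.0, b=-3.0, c=2.0, acp=0.0)
```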

Thank you.

var0=stderr_intercept**2;

var1=stderr_log_conc**2;

acp=log(mn); mn is a constant value

Assay Sensitivity=(acp-intercept)/slope;

est_log_stderr=sqrt(var1*(acp**2)/(slope**4)+var0/(slope**2)+var1*(int**2)/(slope**4) );

L95_log=AS-TINV(0.975, df-1)*est_log_stderr;

U95_log=AS+TINV(0.975, df-1)*est_log_stderr;

But this model shows lack of fit;

Then I use log(Y) = intercept + b1*log(x) + b2*log(x)^2 (a quadratic regression in log(x))

In this case, how would the Assay Sensitivity be calculated, since there are two slopes: one for log(x) and another for log(x)^2?

And also, how would the stderr be estimated so I can calculate the CI for the Assay Sensitivity? Thanks.
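The stderr formula above for the linear case is a delta-method approximation. One hedged way to extend it to the quadratic fit is a delta method with a numerical gradient, which needs the full covariance matrix of (intercept, b1, b2) from the regression output. A sketch, with every number below invented for illustration:

```python
import math

def delta_method_se(g, theta, cov, eps=1e-6):
    """SE of g(theta) via the delta method with a numerical gradient.

    theta : coefficient estimates (list)
    cov   : their covariance matrix, which the regression procedure
            must be asked to output (values used below are invented).
    """
    k = len(theta)
    grad = []
    for i in range(k):
        tp, tm = list(theta), list(theta)
        tp[i] += eps
        tm[i] -= eps
        grad.append((g(tp) - g(tm)) / (2.0 * eps))
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)

# Assay sensitivity for log(Y) = c + b1*log(x) + b2*log(x)**2 at log(Y) = acp:
def assay_sensitivity(theta, acp=0.5):
    c, b1, b2 = theta
    disc = b1**2 - 4.0 * b2 * (c - acp)
    return (-b1 + math.sqrt(disc)) / (2.0 * b2)  # root choice depends on the assay

se = delta_method_se(assay_sensitivity,
                     theta=[0.1, 1.2, -0.05],   # c, b1, b2 (invented)
                     cov=[[0.01, 0.0, 0.0],
                          [0.0, 0.02, 0.0],
                          [0.0, 0.0, 0.001]])
```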

I am doing a Cox regression analysis on 300 patients with up to five years of follow-up. There are 28 events.

In univariate analysis, one biomarker (BIO) is time-dependent (assessed with and without a time-dependent variable).

I would like to present the hazard ratio for subjects at one and three years of follow-up with BIO = 1 mg/L. I use SPSS 17.0. How do I calculate the 95% confidence interval for the hazard ratio? Yours, Mikke

Variables:

bio       beta coefficient 1.143   SE 0.308
bio*time  beta coefficient -0.582  SE 0.192

HR at one year: EXP(1*1.143 + 1*(-0.582)) = 1.75.

HR at three years: EXP(1*1.143 + 3*(-0.582)) = 0.55.
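For the CI, the variance of the log hazard ratio at time t is Var(b1) + t^2*Var(b2) + 2*t*Cov(b1, b2). A sketch of the calculation in Python; note the covariance between the two coefficients is not shown in the output above and would have to be requested from SPSS (it is a placeholder here):

```python
import math

def hr_ci(b1, b2, t, var1, var2, cov12, z=1.96):
    """95% CI for HR = exp(b1 + t*b2) when the effect is time-dependent.

    var1, var2 : squared standard errors of bio and bio*time
    cov12      : their covariance -- not shown in the output above, so
                 it must be requested from SPSS (placeholder below).
    """
    log_hr = b1 + t * b2
    se = math.sqrt(var1 + t**2 * var2 + 2.0 * t * cov12)
    return (math.exp(log_hr - z * se),
            math.exp(log_hr),
            math.exp(log_hr + z * se))

# Using the estimates above, with a hypothetical covariance of 0:
lo1, hr1, hi1 = hr_ci(1.143, -0.582, t=1, var1=0.308**2, var2=0.192**2, cov12=0.0)
```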

Thanks,

Bob

I'm trying to determine if a set of predictors can discriminate patients from the controls. Dependent variable: Patients or healthy controls

I used a sample of 9000 (2700 patients and 6300 controls) as the discovery dataset to figure out what are good predictors. I used logistic regression analysis to identify good predictors and have located 350 predictors that are significant at p=0.01.

Then I use these 350 predictors to run logistic regression on an independent dataset (1600 patients and 1800 controls), to get an estimate of r-square and calculate the probability of being a patient, in order to determine the performance of these 350 predictors on an independent dataset.

Of course, the r-square dropped from 0.75 (in the discovery dataset) to 0.27 (in the validation dataset), presumably because of overfitting in the discovery set.

My question is: is this analysis design OK? Is there any concern about using this approach? Thank you very much.
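For reference, the usual way to validate is to freeze the coefficients from the discovery fit and only score the validation set, rather than re-estimating a new model on it. A toy sketch with scikit-learn (all data below is invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy stand-ins for the discovery and validation datasets:
rng = np.random.default_rng(0)
X_disc, y_disc = rng.normal(size=(400, 10)), rng.integers(0, 2, 400)
X_val, y_val = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)

# Fit once on the discovery data...
model = LogisticRegression(max_iter=1000).fit(X_disc, y_disc)

# ...then score the validation set WITHOUT refitting, so the
# coefficients stay exactly as estimated in discovery.
p_val = model.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, p_val)
```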

Joanne

I'm a grad student working on the behaviour of zoo orangutans and how visitors affect them. I am having problems with my analysis: there are repeated measurements over time, there are categorical dependent variables, and there are lots of issues to consider. I have been reading till my head is spinning, and have tried to contact many statisticians both locally and overseas, but I'm either asking people in the wrong field, or they are too busy to advise!

I came across this stats forum and was really happy to see that the people here have experience in categorical multivariate stats. Hopefully someone can advise on what I am asking =)

Will describe my project as briefly as I can:

I'm interested in seeing if zoo visitors affect the behaviour of orangutans in my local (Singapore) zoo. From previous captive animal studies, things like

For my independent (visitor) variables --> 2 continuous and 1 categorical predictor

Dependent (orangutan) variables --> 1 or maybe 2 categorical variables

Data was collected on the dot, once every 5 minutes, from the same group of orangutans (4-5 individuals). Data from the visitors was also collected at the same time. I have repeated observations over 1-3 hour sessions, a few sessions a day, over a period spanning 4 months. i.e. about 200hours x 12 points per hour. This 200 hours is divided among 2 different groups of orangutans, who are alternated in 2 different exhibits (so 4 different permutations).

1) I was thinking of running everything in a multinomial logistic regression analysis (to avoid collapsing my data and also to allow simultaneous consideration of all factors). I understand that logistic regression needs independent data; however, I do not want to generalize any results beyond zoo orangutans, I just wish to investigate whether there is any relationship between visitors and orangutans in this particular zoo exhibit. Hence, would it make sense to use multinomial logistic regression if I put orangutan ID in as a factor?

2) Other consideration: behaviour of individuals may be non-independent over time.

If I calculate that the average time taken for individuals to change their activity is less than 10 min, and I subsample my data at 10-min intervals, might that be enough to overcome possible temporal dependence? It seems very complicated to test for autocorrelation in categorical data.

3) I understand that Generalized Estimating Equations (GEE) can handle correlated data. However, the outcome has to be binary or a count, so unless I run a separate analysis for each behaviour, I'm not sure this will work. I'm also unsure about the coding of time as a within-subjects variable, as the same time points are repeated over different days.

Hope my explanation is sufficient, and not too long, I look forward to any advice on this dataset!

Summary of my study design can be found at

http://docs.google.com/Doc?docid=0AZCZA37PzM97ZGRncWduNF8xZzNkNmcyaGg&hl=en

This is a hypothetical example, however my study problem is of a similar nature.

I want to investigate the effects of a combination of categorical variables on performance in a test. Hence I wish to simultaneously investigate a combination of factors, all naturally categorical or made into a categorical variable - like sex (male/female) x GRE score (high/low) x hair colour (black/blonde) - on ABC test performance (an aggregate of 2 similar tests, say test X and test Y; I am considering the dependent variable to be the average score of X and Y, calling it ABC and coding it in terms of good/impaired performance).

So, for instance, one potential question I would ask is: how would being a male with black hair and a high GRE score predict my performance on the ABC test?

Can I use logistic regression? If yes, then which one? What should I be careful about? Would the combination of independent variables affect my results?
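A sketch of what I have in mind: a binary logistic regression with the factors dummy-coded and interaction terms for the combinations. Everything below (data and variable names) is invented for illustration, using Python/statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data mirroring the example:
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "abc_impaired": rng.integers(0, 2, n),          # 0 = good, 1 = impaired
    "sex":  rng.choice(["male", "female"], n),
    "gre":  rng.choice(["high", "low"], n),
    "hair": rng.choice(["black", "blonde"], n),
})

# Binary logistic regression; C() dummy-codes each factor, and the "*"
# adds interaction terms so specific combinations can have their own effect.
fit = smf.logit("abc_impaired ~ C(sex) * C(gre) * C(hair)", data=df).fit(disp=0)

# Predicted probability of impairment for one combination of factors:
p = fit.predict(pd.DataFrame({"sex": ["male"], "gre": ["high"], "hair": ["black"]}))[0]
```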

Thanks in advance,

Diana

I am planning to use Cox regression analysis with proportional hazard model.

I am using SPSS 17.0

My question is: how many confounders can I adjust for? I have 20 events until end of follow-up.

Yours,

Mikke

I have two boxes, each containing 25 widgets. I would like to know if the widgets are equivalent.

For each widget there are measures along several dimensions, let's just say height, width, length, weight, and internal diameter for discussion of the problem.

So for each dimension I have 25 measures from box A and 25 measures from box B.

I could test each dimension individually: get the mean and standard deviation of width for box A, get the mean and standard deviation of width for box B, and do a t-test to see if the means are significantly different. But I see a problem with this approach if the t-tests do not all give the same result. Any thoughts on a better approach?
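The per-dimension approach I describe could be sketched like this (all measurements below are invented); one common patch for the multiple-tests problem is a Bonferroni-adjusted threshold, though it doesn't resolve the case where the tests disagree:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
dims = ["height", "width", "length", "weight", "diameter"]
box_a = {d: rng.normal(10.0, 1.0, 25) for d in dims}  # placeholder data
box_b = {d: rng.normal(10.0, 1.0, 25) for d in dims}

# One Welch t-test per dimension; a Bonferroni-adjusted threshold keeps
# the five tests together at roughly a 0.05 family-wise error rate.
alpha = 0.05 / len(dims)
results = {d: stats.ttest_ind(box_a[d], box_b[d], equal_var=False) for d in dims}
for d, r in results.items():
    print(f"{d}: t = {r.statistic:.2f}, p = {r.pvalue:.3f}, differs = {r.pvalue < alpha}")
```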

I have frequencies from two nominal factors (A, B) with 2 levels each. So I've got a 2 x 2 cross table. I could run a chi-square test of independence, but I have this table for every subject. I'm not sure if I have to use a log-linear model with SUBJECT, A, B as factors. And if so, do I have to mark that A and B are repeated measures?

Many thanks in advance for your help.

René

I'm trying a simple GEE model (gamma log link) in SPSS for the first time and wanted to build a prediction equation with 95% confidence interval for a continuous dependent variable. Using 7 years (7 repeated measures in each of N geographic areas) to build the model to predict EMS calls (911 call volume) in year 8 for each geographic area. Independent variables are a population estimate and a fixed effect coded 1 or 0. Not sure how to build the prediction equation from the output:

parameter     B        StdErr    95% Wald CI (lb, ub)    Exp(B)    95% Wald CI for Exp(B) (lb, ub)
intercept     6.619    0.2050    (6.218, 7.021)          749.559   (501.563, 1120.177)
population    3.1E-5   3.4E-6    (2.4E-5, 3.7E-5)        1         (1, 1)
fixed(0)      -0.320   0.1315    (-0.578, -0.062)        0.726     (0.561, 0.940)
fixed(1)      0 (reference)
scale         0.208
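For what it's worth, with a log link the prediction is the exponential of the linear predictor. A sketch of the equation I think this output implies, taking the fixed(0) coefficient as -0.320 (the sign implied by its Wald CI and Exp(B) in the output above):

```python
import math

def predict_calls(population, fixed):
    """Predicted call volume from the gamma log-link output above.

    With a log link the prediction is exp of the linear predictor.
    fixed(1) is the reference level (B = 0); the fixed(0) coefficient
    is taken as -0.320, the sign implied by its Wald CI and Exp(B).
    """
    eta = 6.619 + 3.1e-5 * population
    if fixed == 0:
        eta -= 0.320
    return math.exp(eta)

# e.g. an area with a population estimate of 50,000 coded fixed = 1:
calls = predict_calls(50_000, fixed=1)
```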

Appreciate any help...

thanks, Carol

Hello there,

I have a question about spatial regression modeling that I'm hoping you can help me with.

As background, there are a lot of studies that show that states lower the amount of money they give poor families (benefit levels) to keep them in line with those of surrounding jurisdictions. They do so to avoid drawing welfare recipients from surrounding states. To test this idea, they usually run ordinary least squares regression with states as the unit of analysis and the benefit level for a family of 3 as the outcome variable. The models include various state economic, political, and demographic characteristics. To test for the influence of surrounding states, they include a predictor variable that is the mean benefit level of neighboring states.

Is there a difference between this model and a regression model with a spatially lagged dependent variable, like this one?

y_i = B0 + B1*x_i + p*Sum_j(w_ij * y_j) + e_i

where Sum_j(w_ij * y_j) is the spatial lag, in which the values of the states assigned as neighbors in the connectivity vector (w_i) are modeled as a predictor variable?

Also, if the spatial lag is significant, is it correct to say that this model shows that a state's benefit level is related to the benefit levels of surrounding states *even after adjusting for similarities in economic, political, and demographic characteristics*? The idea here is that neighboring states may have similar benefit levels simply because they are more similar to one another than states that are far away.
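For concreteness, here is how the spatial lag variable can be constructed (toy numbers, invented). Note that when W is row-standardized with equal weights over neighbors, W*y is exactly the "mean benefit level of neighboring states" predictor:

```python
import numpy as np

# Toy example with 4 "states". W is a row-standardized connectivity
# matrix: each row puts equal weight on that state's neighbors.
W = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [1/3, 1/3, 0.0, 1/3],
    [0.0, 0.0, 1.0, 0.0],
])
y = np.array([300.0, 350.0, 400.0, 250.0])  # hypothetical benefit levels

# The spatial lag W@y is, for each state, the weighted mean benefit
# level of its neighbors -- with equal weights this is exactly the
# "mean benefit level of neighboring states" predictor.
lag = W @ y
```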

Thanks for your help. Have a good Thanksgiving!

Amanda

They may choose from five Likert-style responses:

'Strongly Disagree'

'Disagree'

'Neutral'

'Agree'

'Strongly Agree'

A colleague suggested dichotomizing the responses - i.e. 'Strongly Disagree' and 'Disagree' combined into '0/no', and 'Strongly Agree' and 'Agree' combined into '1/yes' - and then using logistic regression to regress the response about barriers to diagnosis on risk factors (family history, childhood health issues, and current (adult) health issues).

I am here for a second opinion.

I am also unsure of what to do with the 'Neutral' responses when dichotomizing.
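One hedged option for the recoding, sketched in Python: map both "agree" levels to 1, both "disagree" levels to 0, and leave 'Neutral' as missing, which drops those subjects from a logistic model - a real loss of information worth weighing against keeping the 5-point scale and using ordinal regression instead:

```python
# Both "agree" levels -> 1, both "disagree" levels -> 0,
# 'Neutral' -> None (missing), excluded from the logistic model.
RECODE = {
    "Strongly Disagree": 0,
    "Disagree": 0,
    "Neutral": None,
    "Agree": 1,
    "Strongly Agree": 1,
}

responses = ["Agree", "Neutral", "Strongly Disagree", "Strongly Agree"]
coded = [RECODE[r] for r in responses]
```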

Thank you!

I have 2 types of dependent variables: binary (yes/no) and ordinal (5 Likert categories). And 4 types of predictors: binary, continuous (age), ordinal (education), and interval (income).

I would like to analyse each outcome against the whole set of predictors, preferably with SPSS. I assume that I have to use binary logistic regression for the binary outcome, and ordinal logistic regression for the ordinal outcome. However, I don't know how to deal with the ordinal and interval predictors in both of the mentioned methods. All the books I have read only show how to use logistic regression with categorical predictors (using dummy variables), without considering the rank/order of the categorical predictor.
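The two common ways of entering an ordinal predictor can be sketched like this (Python/statsmodels; all data and variable names below are invented): enter it as a numeric score, which uses the ordering but assumes equal steps, or dummy-code it, which ignores the order entirely:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data with the same variable types as in the question:
rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "outcome":   rng.integers(0, 2, n),       # binary outcome
    "age":       rng.normal(40.0, 10.0, n),   # continuous
    "education": rng.integers(1, 5, n),       # ordinal, coded 1..4
    "income":    rng.normal(30.0, 8.0, n),    # interval (in thousands)
})

# Option 1: enter the ordinal predictor as a numeric score -- one
# coefficient, using the ordering but assuming equal steps.
fit_linear = smf.logit("outcome ~ age + education + income", data=df).fit(disp=0)

# Option 2: dummy-code it -- one coefficient per level, ignoring the order.
fit_dummy = smf.logit("outcome ~ age + C(education) + income", data=df).fit(disp=0)
```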

Many thanks in advance for your help.

The training dataset is about 2500 cases and 6500 controls. 700 variables were selected by comparing the case vs control group at p<0.001.

The independent dataset for validating the result is 500 cases and 500 controls.

So I ran logistic regression on the training dataset and selected 350 predicting variables (out of the 700 significant variables) using stepwise selection, with a cut-off at p=0.05.

Now I'd like to apply these 350 variables to the 500 case / 500 control dataset and see how well it can predict an individual being a patient. But that dataset is small, and 350 variables would be overfitting using logistic regression. I still apply logistic regression to do stepwise selection, using p=0.05 (35 variables selected), p=0.10 (60 variables selected), p=0.15 (76 variables selected), and p=0.20 (118 variables selected); the Hosmer & Lemeshow goodness-of-fit test gives p>0.05 for all four selections.

I'm not sure if this is a good approach. Do you have any suggestions on how to use the independent dataset to validate the 350 variables selected from the training dataset? Thank you very much.