Abstract
Background Theoretically, increased physical activity self-efficacy (PASE) should lead to increased physical activity, but few studies have reported this effect among youth. This failure may be at least partially attributable to measurement limitations. In this study, Item Response Modeling (IRM) was used to develop new physical activity and sedentary behavior change self-efficacy scales. The validity of the new scales was assessed against accelerometer measures of physical activity and sedentary behavior.
Methods New PASE and sedentary behavior change (TV viewing, computer video game use, and telephone use) self-efficacy items were developed. The scales were completed by 714 sixth-grade students in seven US cities. A subsample of 83 participants also wore an accelerometer for five days and provided at least three full days of complete data. The new scales were analyzed using Classical Test Theory (CTT) and IRM; a reduced set of items was produced with IRM and correlated with accelerometer counts per minute and with minutes of sedentary, light, and moderate to vigorous activity per day after school.
Results The PASE items discriminated between high and low levels of PASE. Full and reduced scales were weakly correlated (r = 0.18) with accelerometer counts per minute after school for boys, with comparable associations for girls. Weaker correlations were observed between PASE and minutes of moderate to vigorous activity (r = 0.09 to 0.11). The unidimensionality of the sedentary scales was established by both exploratory factor analysis and the fit of items to the underlying variable, and reliability was assessed across the length of the underlying variable, with some limitations. The reduced sedentary behavior scales had poor reliability. Among boys, the full scales were moderately correlated with light intensity physical activity after school (r = 0.17 to 0.33) and with sedentary behavior (r = -0.29 to -0.12); no such correlations were found among girls.
Conclusion The new physical activity and sedentary behavior change self-efficacy scales have fewer items than alternatives derived from classical test theory and have reasonable validity for boys, but more work is needed to develop comparable scales for girls. Fitting the items to an underlying variable could be useful in tailoring interventions to this scale.