In the context of regression analysis, we propose an estimation method that can produce estimates closer to the true parameters than standard estimators when the residuals are not normally distributed and when outliers are present. We achieve this improvement by minimizing the Lp norm of the errors for general p, rather than the L2 norm minimized by Ordinary Least Squares (OLS). The generalized model proposed here, the Ordinary Least Powers (OLP) model, can implicitly adjust its sensitivity to outliers through its parameter p, the exponent applied to the absolute value of the residuals. Especially for large residuals, such as those stemming from outliers or heavy-tailed distributions, different values of p implicitly assign different relative weights to the corresponding observations. We fitted OLS and OLP models to simulated data whose error distributions generate outlying observations, and compared the mean squared errors of the estimates relative to the true parameters. We found that OLP models with smaller values of p produce estimates closer to the true parameters when the error term follows an exponential or Cauchy distribution, while larger values of p produce closer estimates when the error term is uniformly distributed.
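To make the OLP idea concrete, here is a minimal sketch for a single predictor. OLP chooses b0 and b1 to minimize the sum over observations of |y_i - b0 - b1 x_i|^p; this gives the same coefficients as minimizing the Lp norm of the residuals, and p = 2 recovers OLS. The abstract does not say how the authors solve this problem, so this is not their implementation: it uses iteratively reweighted least squares (IRLS), a standard technique substituted here for illustration. In IRLS each observation receives the weight |r_i|^(p-2). That weight shows the mechanism described above: for p < 2, observations with large residuals receive relatively less weight than under OLS, where all weights are equal. The function names (`wls`, `ols`, `lp_loss`, `fit_olp`), the tolerances, and the toy data are hypothetical. The sketch also assumes 1 <= p <= 2. In that range IRLS is a majorize-minimize scheme that does not increase the objective, apart from the tiny clamp applied to near-zero residuals. Larger values of p, as in the uniform-error case, would need a different solver.

```python
def wls(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def ols(x, y):
    """Ordinary least squares: the p = 2 case, where all weights are equal."""
    return wls(x, y, [1.0] * len(x))


def lp_loss(x, y, b0, b1, p):
    """Sum of |residual|^p, the quantity an OLP fit minimizes."""
    return sum(abs(yi - b0 - b1 * xi) ** p for xi, yi in zip(x, y))


def fit_olp(x, y, p, max_iter=200, eps=1e-8, tol=1e-12):
    """Approximate OLP fit via IRLS (illustrative sketch; assumes 1 <= p <= 2)."""
    if not 1.0 <= p <= 2.0:
        raise ValueError("this IRLS sketch only supports 1 <= p <= 2")
    b0, b1 = ols(x, y)  # start from the OLS solution
    for _ in range(max_iter):
        # Weight each observation by |r|^(p - 2); residuals below eps are
        # clamped so the weight stays finite when a point is fitted exactly.
        w = [max(abs(yi - b0 - b1 * xi), eps) ** (p - 2) for xi, yi in zip(x, y)]
        new_b0, new_b1 = wls(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data: the line y = 1 + 2x with one gross outlier.
    x = [float(i) for i in range(10)]
    y = [1.0 + 2.0 * xi for xi in x]
    y[-1] += 50.0
    for p in (1.2, 1.5, 2.0):
        b0, b1 = fit_olp(x, y, p)
        print(f"p={p}: b0={b0:.3f}, b1={b1:.3f}")
```

On this toy data, the single outlier pulls the OLS slope (p = 2) far above 2. Fits with smaller p stay near the underlying line, which is the qualitative behavior the abstract reports for heavy-tailed errors.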
Published in: American Journal of Theoretical and Applied Statistics (Volume 13, Issue 6)
DOI: 10.11648/j.ajtas.20241306.12
Page(s): 193-202
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2024. Published by Science Publishing Group
Keywords: Regression Analysis, Least Squares, Robust Regression, Outliers, Simulation
APA Style
Hoffman, K., & Montesinos-Yufa, H. M. (2024). Assessing the Quality of Ordinary Least Squares in General Lp Spaces. American Journal of Theoretical and Applied Statistics, 13(6), 193-202. https://doi.org/10.11648/j.ajtas.20241306.12
BibTeX
@article{10.11648/j.ajtas.20241306.12,
  author = {Kevin Hoffman and Hugo Moises Montesinos-Yufa},
  title = {Assessing the Quality of Ordinary Least Squares in General Lp Spaces},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {13},
  number = {6},
  pages = {193-202},
  year = {2024},
  doi = {10.11648/j.ajtas.20241306.12},
  url = {https://doi.org/10.11648/j.ajtas.20241306.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20241306.12}
}
RIS
TY  - JOUR
T1  - Assessing the Quality of Ordinary Least Squares in General Lp Spaces
AU  - Kevin Hoffman
AU  - Hugo Moises Montesinos-Yufa
Y1  - 2024/11/18
PY  - 2024
N1  - https://doi.org/10.11648/j.ajtas.20241306.12
DO  - 10.11648/j.ajtas.20241306.12
T2  - American Journal of Theoretical and Applied Statistics
JF  - American Journal of Theoretical and Applied Statistics
JO  - American Journal of Theoretical and Applied Statistics
SP  - 193
EP  - 202
PB  - Science Publishing Group
SN  - 2326-9006
UR  - https://doi.org/10.11648/j.ajtas.20241306.12
VL  - 13
IS  - 6
ER  - 