regression-tests.csv
bootstrap-accuracy
(bootstrap-accuracy path values-path do-resampling)
Runs bootstrapping to estimate accuracy.

path: A path to the source csv file with the original sample.
values-path: A path to the csv file with values from another sample.
do-resampling: Method to get random resampling with replacement.
returns: A hash-map with csv file names and content.

Accuracy (single value):
  pho_j=|y~_j-y^_j|, y^_j=X'_j*b.
  y~_j: [N x 1] vector of the true observed values (in [values-path]).
  y^_j: [N x 1] vector of the fitted values from the model.
  X'_j: [N x (p+1)] matrix of the explanatory variables (in [values-path]).
  N: number of observations (in [values-path]).

Accuracy (in subset):
  pho(k,S)=argmin_(pho- >=0)[#{pho_i <= pho- | i in S} >= km].
  pho(k,S): the (100 x k)-th percentile of the accuracy sample.
  k: a value in [0,1].
  S: subset of values (subset in [values-path]).
  m: number of values in S.
  #: denotes the number of elements in the set.

Accuracy estimates:
  pho(Q_1,S)=pho(0.25,S), pho(Q_2,S)=pho(0.50,S), pho(Q_3,S)=pho(0.75,S), pho_max(S)=pho(1,S).
  Accuracy estimates are calculated for each group in [values-path].

Confidence intervals (bootstrapping):
  estimates: pho(Q_1), pho(Q_2), pho(Q_3), pho_max, all calculated after resampling with replacement.
  out: mean with 95% percentile confidence interval.
  bootstrap scheme: percentile bootstrap (Efron & Tibshirani, 1993).
    left border: the value at the position given by the largest integer not greater than alpha/2*[n-rep].
    right border: the value at the position given by the smallest integer not less than (1-alpha/2)*[n-rep].
  significance level: alpha=0.05 (95% confidence level).
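The two accuracy definitions above can be sketched in a few lines of Clojure. This is a minimal illustration, not the library's code: `abs-errors` and `accuracy-percentile` are hypothetical names, and the sketch assumes k in (0,1] and restricts the argmin to values that occur in the sample.

```clojure
;; Hypothetical helpers, not part of regression-tests.csv.

(defn abs-errors
  "pho_j = |y~_j - y^_j| for paired observed and fitted values."
  [observed fitted]
  (mapv (fn [y-obs y-fit] (Math/abs (double (- y-obs y-fit))))
        observed fitted))

(defn accuracy-percentile
  "Smallest sample value pho such that at least k*m of the m values
   in the subset are <= pho. Assumes 0 < k <= 1."
  [k values]
  {:pre [(< 0 k) (<= k 1) (seq values)]}
  (let [sorted (vec (sort values))
        m      (count sorted)
        ;; 1-based rank of the first value that satisfies the count condition
        pos    (long (Math/ceil (double (* k m))))]
    (sorted (dec pos))))

(abs-errors [1.0 2.0] [1.5 1.0])            ;; => [0.5 1.0]
(accuracy-percentile 0.5 [0.3 0.1 0.4 0.2]) ;; => 0.2
(accuracy-percentile 1 [0.3 0.1 0.4 0.2])   ;; => 0.4
```

With k=1 the function returns the sample maximum, which corresponds to pho_max(S) in the list of estimates.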
## Usage

    (require '[regression-tests.csv :refer :all])

    ;; sample.csv
    ;; y,x1,x2,x3
    ;; 1.1,1,1,4
    ;; 1,2,3,2
    ;; 0.95,2,2,3
    ;; 1.15,1.5,1.5,1.5
    ;; 2.1,3,3.1,5
    ;; 2.05,3.5,3,5.5
    ;; 3,4,3,6
    ;; 3.01,3.8,2.5,6.3
    ;; 3.02,3.9,2.7,6.5
    ;; 2.9,4.2,3.4,6

    ;; values.csv
    ;; y,urban,x1,x2,x3
    ;; 1,1,1,1,1
    ;; 2,1,1,1,1
    ;; 1.1,1,1.1,1.3,1.1
    ;; 1.4,1,2,1,3
    ;; 3,1,3,1,2
    ;; 2.1,1,1,1,2
    ;; 2.4,1,3,1,3
    ;; 1.3,1,1,0,1
    ;; 1.5,1,2,1,0
    ;; 3.4,1,3,4,1
    ;; 0.1,1,3,1,1
    ;; 0,1,1,1,1
    ;; 1,1,1,1,1

    ;; The third argument is a deterministic stand-in for random resampling:
    ;; it returns four fixed resamples (positions into indexes).
    (bootstrap-accuracy
      "sample.csv"
      "values.csv"
      (fn [indexes]
        (->> (map #(map (partial nth indexes) %&)
                  [0 1 2 0 4 5 6 7 8 6]
                  [0 1 5 3 5 5 6 7 0 9]
                  [0 1 2 3 5 5 6 7 8 9]
                  [0 1 2 3 4 5 8 7 8 9])
             (apply mapv vector))))

    => {:ci {"accuracy-bootstrap.csv"
             '(["id" "group-id" "95-percent-ci-1" "95-percent-ci-2" "mean"]
               ["p-25-percent-1" "1" "0.097270" "0.732594" "0.306091"]
               ["p-25-percent-all" "all" "0.097270" "0.732594" "0.306091"]
               ["p-50-percent-1" "1" "0.391024" "0.914979" "0.569725"]
               ["p-50-percent-all" "all" "0.391024" "0.914979" "0.569725"]
               ["p-75-percent-1" "1" "1.020029" "1.450733" "1.226903"]
               ["p-75-percent-all" "all" "1.020029" "1.450733" "1.226903"]
               ["p-max-1" "1" "2.215106" "3.011073" "2.525272"]
               ["p-max-all" "all" "2.215106" "3.011073" "2.525272"]
               ["p-min-1" "1" "0.078317" "0.128894" "0.098244"]
               ["p-min-all" "all" "0.078317" "0.128894" "0.098244"])}
        :samples {"accuracy-sample.csv"
                  '(["min" "quartile-1" "quartile-2" "quartile-3" "max"]
                    ["0.097507" "0.732594" "0.914979" "1.450733" "2.466349"]
                    ["0.094261" "0.097270" "0.674088" "1.020029" "3.011073"]
                    ["0.128894" "0.158486" "0.474098" "1.131372" "2.708930"]
                    ["0.092243" "0.272683" "0.394438" "1.264116" "2.224903"]
                    ["0.078317" "0.269425" "0.391024" "1.268268" "2.215106"])}}

## References

[1] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, 7(1): 1-26. DOI: 10.1214/aos/1176344552.
[2] Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman and Hall.
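The percentile borders described for `bootstrap-accuracy` can be checked against the `quartile-1` column of `accuracy-sample.csv` in the usage output. The sketch below is an illustration only: `percentile-ci` is a hypothetical name, and it assumes 1-based positions that are clamped to the range [1, n], which is not stated in the docstring. Under that assumption, the five `quartile-1` values give the borders reported in the `p-25-percent` rows.

```clojure
;; Hypothetical helper, not part of regression-tests.csv.
(defn percentile-ci
  "Percentile bootstrap interval over a sample of bootstrap estimates.
   Positions are 1-based and clamped to [1, n] (an assumption)."
  [values alpha]
  (let [sorted (vec (sort values))
        n      (count sorted)
        ;; largest integer not greater than alpha/2 * n
        lo-pos (-> (Math/floor (double (* (/ alpha 2) n))) long (max 1))
        ;; smallest integer not less than (1 - alpha/2) * n
        hi-pos (-> (Math/ceil (double (* (- 1 (/ alpha 2)) n))) long (min n))]
    [(sorted (dec lo-pos)) (sorted (dec hi-pos))]))

;; quartile-1 column from accuracy-sample.csv in the usage output
(percentile-ci [0.732594 0.097270 0.158486 0.272683 0.269425] 0.05)
;; => [0.09727 0.732594]
```

With only five replicates the borders fall on the sample minimum and maximum; a larger number of replicates is needed before the interval trims the tails.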
bootstrap-independence-tests
(bootstrap-independence-tests path neighbours-path do-resampling)
Runs bootstrapping to get estimates for the independence tests.

path: A path to the source csv file with the original sample.
neighbours-path: A path to the source csv file with neighbours data.
do-resampling: Method to get random resampling with replacement.
returns: A hash-map with csv file names and content.

Bootstrap hypothesis testing on spatial autocorrelation estimates: Moran's I (Moran, 1950) and Geary's C (Geary, 1954) coefficients.
  out: mean with 95% percentile confidence interval, p-value.
  bootstrapping: bootstrap sample (pairs) is drawn from original residuals with replacement.
  significance level: alpha=0.05 (95% confidence level).

The equal-tail p-value in the two-tailed test is calculated as twice the minimum of:
  1) the relative number of bootstrap statistics less than or equal to the test statistic (for the original sample);
  2) the relative number of bootstrap statistics greater than the test statistic (for the original sample).

The equal-tailed property means that the probability of a value falling to the left of the interval is the same as the probability of a value falling to the right of the interval (Efron and Tibshirani, 1993).

The original neighbours matrix of spatial proximity is normalized by the number of neighbours of the i-th observation.
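The equal-tail p-value rule described above can be sketched as follows. `equal-tail-p-value` is a hypothetical name, not a function of `regression-tests.csv`. The docstring does not say exactly which statistics enter the count, so this sketch is not expected to reproduce the p-values in the usage output; it only illustrates the "twice the smaller tail" rule.

```clojure
;; Hypothetical helper, not part of regression-tests.csv.
(defn equal-tail-p-value
  "Twice the smaller of the two tail proportions of the bootstrap
   statistics around the observed test statistic."
  [observed boot-stats]
  (let [n       (count boot-stats)
        n-left  (count (filter #(<= % observed) boot-stats)) ; <= observed
        n-right (count (filter #(> % observed) boot-stats))] ; >  observed
    (* 2.0 (/ (min n-left n-right) n))))

;; Two of five statistics lie at or below 0.0, three lie above it.
(equal-tail-p-value 0.0 [-1.0 -0.5 0.2 0.4 0.9]) ;; => 0.8
```

A statistic near the centre of the bootstrap distribution gives a p-value close to 1, while one beyond every bootstrap statistic gives 0.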
## Usage

    (require '[regression-tests.csv :refer :all])

    ;; sample.csv
    ;; y,x1,x2,x3
    ;; 1.1,1,1,4
    ;; 1,2,3,2
    ;; 0.95,2,2,3
    ;; 1.15,1.5,1.5,1.5
    ;; 2.1,3,3.1,5
    ;; 2.05,3.5,3,5.5
    ;; 3,4,3,6
    ;; 3.01,3.8,2.5,6.3
    ;; 3.02,3.9,2.7,6.5
    ;; 2.9,4.2,3.4,6

    ;; neighbours.csv
    ;; y,n1,n2,n3,n4
    ;; 0,1,5,,
    ;; 1,0,,,
    ;; 2,,,
    ;; 33,9,,,
    ;; 4,,,
    ;; 5,0,7,8,9
    ;; 6,,,
    ;; 7,5,,,
    ;; 8,5,,,
    ;; 9,33,5,,

    ;; The third argument is a deterministic stand-in for random resampling:
    ;; it returns four fixed resamples (positions into indexes).
    (bootstrap-independence-tests
      "sample.csv"
      "neighbours.csv"
      (fn [indexes]
        (->> (map #(map (partial nth indexes) %&)
                  [0 1 2 0 4 5 6 7 8 6]
                  [0 1 5 3 5 5 6 7 0 9]
                  [0 1 2 3 5 5 6 7 8 9]
                  [0 1 2 3 4 5 8 7 8 9])
             (apply mapv vector))))

    => {"independence-tests-bootstrap.csv"
        '(["statistics" "95-percent-ci-1" "95-percent-ci-2" "mean" "p-value"]
          ["morans-i-test" "-0.497198" "-0.184985" "-0.340847" "0.400000"]
          ["geary-c-test" "0.953672" "2.352850" "1.514549" "0.800000"])
        "morans-i-test-sample.csv"
        '(["value"]
          ["-0.453250"]
          ["-0.497198"]
          ["-0.184985"]
          ["-0.298643"]
          ["-0.270161"])
        "geary-c-test-sample.csv"
        '(["value"]
          ["2.352850"]
          ["1.380878"]
          ["0.953672"]
          ["1.457173"]
          ["1.428174"])}

## References

[1] Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman and Hall.
[2] Geary, R. (1954). The Contiguity Ratio and Statistical Mapping. The Incorporated Statistician, 5(3): 115-145. DOI: 10.2307/2986645.
[3] Moran, P. (1950). Notes on Continuous Stochastic Phenomena. Biometrika, 37(1-2): 17-23. DOI: 10.2307/2332142.
[4] Lin, K.-P., Long, Z.-H., & Ou, B. (2011). The Size and Power of Bootstrap Tests for Spatial Dependence in a Linear Regression Model. Computational Economics, 38(2): 153-171. DOI: 10.1007/s10614-010-9224-0.
bootstrap-regression
(bootstrap-regression path do-resampling)
Runs bootstrapping to obtain estimates from the regression model.

path: A path to the source csv file with the original sample.
do-resampling: Method to get random resampling with replacement.
returns: A hash-map with csv file names and content.

Confidence intervals (bootstrapping):
  estimates: b_0, b_1, ..., b_n; R-square, MSE (mean square error).
  out: mean with 95% percentile confidence interval.
  bootstrap scheme: percentile bootstrap (Efron & Tibshirani, 1993).
    left border: the value at the position given by the largest integer not greater than alpha/2*[n-rep].
    right border: the value at the position given by the smallest integer not less than (1-alpha/2)*[n-rep].

## Usage

    (require '[regression-tests.csv :refer :all])

    ;; sample.csv
    ;; y,x1,x2,x3
    ;; 1.1,1,1,4
    ;; 1,2,3,2
    ;; 0.95,2,2,3
    ;; 1.15,1.5,1.5,1.5
    ;; 2.1,3,3.1,5
    ;; 2.05,3.5,3,5.5
    ;; 3,4,3,6
    ;; 3.01,3.8,2.5,6.3
    ;; 3.02,3.9,2.7,6.5
    ;; 2.9,4.2,3.4,6

    ;; The second argument is a deterministic stand-in for random resampling:
    ;; it returns four fixed resamples (positions into indexes).
    (bootstrap-regression
      "sample.csv"
      (fn [indexes]
        (->> (map #(map (partial nth indexes) %&)
                  [0 1 2 0 4 5 6 7 8 6]
                  [0 1 5 3 5 5 6 7 0 9]
                  [0 1 2 3 5 5 6 7 8 9]
                  [0 1 2 3 4 5 8 7 8 9])
             (apply mapv vector))))

    => {"regression-stat-bootstrap.csv"
        '(["statistics" "95-percent-ci-1" "95-percent-ci-2" "mean"]
          ["x1" "0.509545" "1.008406" "0.810853"]
          ["x2" "-0.636473" "-0.117615" "-0.410165"]
          ["x3" "-0.014289" "0.262771" "0.098158"]
          ["intercept" "-0.387295" "0.736617" "0.238792"]
          ["r-squared" "0.940166" "0.957364" "0.948597"]
          ["mse" "0.051196" "0.074844" "0.062514"])}

## References

[1] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, 7(1): 1-26. DOI: 10.1214/aos/1176344552.
[2] Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman and Hall.
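The usage examples pass a fixed function as `do-resampling` so that the output is reproducible. For real runs a random version is needed. The sketch below shows one possible shape, inferred from the usage example only: the function receives a sequence of indexes and returns a vector of resamples, each as long as the input. `random-resampling` and its `n-rep` parameter are hypothetical names, not part of `regression-tests.csv`.

```clojure
;; Hypothetical helper, not part of regression-tests.csv.
(defn random-resampling
  "Returns a function that draws n-rep resamples with replacement
   from the given indexes. Each resample has the same length as the input."
  [n-rep]
  (fn [indexes]
    (let [idx (vec indexes)
          n   (count idx)]
      (vec (repeatedly n-rep
                       (fn [] (vec (repeatedly n #(rand-nth idx)))))))))

;; Three resamples of ten row indexes; the values differ on every call.
((random-resampling 3) (range 10))
```

Under these assumptions a call would look like `(bootstrap-regression "sample.csv" (random-resampling 1000))`, where 1000 is an arbitrary number of replicates.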
run-permutation-tests
(run-permutation-tests path do-shuffle)
Runs permutation tests.

path: A path to the source csv file with the original sample.
do-shuffle: Method to get random permutations.
returns: A hash-map with csv file names and content.

Hypothesis testing (permutation tests):
  1) Overall model significance: exact permutation test on R-square.
     H0: b_1=b_2=...=b_p=0.
     out: approximate p-value, calculated after permutations, with a 95% normal approximation interval.
  2) Significance of the i-th coefficient: approximate permutation test (Freedman & Lane, 1983) on the t-statistic.
     H0: b_i=0.
     out: approximate p-value, calculated after permutations, with a 95% normal approximation interval.

## Usage

    (require '[regression-tests.csv :refer :all])

    ;; sample.csv
    ;; y,x1,x2,x3
    ;; 1.1,1,1,4
    ;; 1,2,3,2
    ;; 0.95,2,2,3
    ;; 1.15,1.5,1.5,1.5
    ;; 2.1,3,3.1,5
    ;; 2.05,3.5,3,5.5
    ;; 3,4,3,6
    ;; 3.01,3.8,2.5,6.3
    ;; 3.02,3.9,2.7,6.5
    ;; 2.9,4.2,3.4,6

    ;; The second argument is a deterministic stand-in for random shuffling:
    ;; it returns three fixed permutations (positions into indexes).
    (run-permutation-tests
      "sample.csv"
      (fn [indexes]
        (->> (map #(map (partial nth indexes) %&)
                  [0 2 1 8 3 4 5 6 9 7]
                  [5 0 1 3 2 4 8 6 7 9]
                  [0 2 3 5 4 1 6 7 8 9])
             (apply mapv vector))))

    => {:tests {"permutation_tests.csv"
                '(["test" "p-value" "lower-bound-ci" "upper-bound-ci"]
                  ["overall-test-r2" "0.250000" "-0.174352" "0.674352"]
                  ["x1-test-t-stat" "0.000000" "0.000000" "0.000000"]
                  ["x2-test-t-stat" "0.000000" "0.000000" "0.000000"]
                  ["x3-test-t-stat" "0.250000" "-0.174352" "0.674352"])}
        :samples {"permutation_r2_sample.csv"
                  '(["value"]
                    ["0.946715"]
                    ["0.748369"]
                    ["0.797480"]
                    ["0.699669"])}}

## References

[1] Anderson, M. (2001). Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences, 58(3): 626-639. DOI: 10.1139/f01-004.
[2] Freedman, D., & Lane, D. (1983). A Nonstochastic Interpretation of Reported Significance Levels. Journal of Business & Economic Statistics, 1(4): 292-298. DOI: 10.2307/1391660.
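The 95% normal approximation interval reported next to each permutation p-value can be illustrated with a short sketch. `normal-approx-interval` is a hypothetical name, and the formula p ± z * sqrt(p(1-p)/n) with z = 1.96 is an assumption about how the interval is built. Under that assumption, p = 0.25 with n = 4 (the number of rows in `permutation_r2_sample.csv`) gives the bounds shown in the `overall-test-r2` row of the usage output.

```clojure
;; Hypothetical helper, not part of regression-tests.csv.
(defn normal-approx-interval
  "Normal approximation interval for a proportion p estimated from n
   statistics: p +/- z * sqrt(p * (1 - p) / n). The bounds are not
   truncated to [0, 1]."
  ([p n] (normal-approx-interval p n 1.96))
  ([p n z]
   (let [half (* z (Math/sqrt (/ (* p (- 1.0 p)) n)))]
     [(- p half) (+ p half)])))

(mapv #(format "%.6f" %) (normal-approx-interval 0.25 4))
;; => ["-0.174352" "0.674352"]

(normal-approx-interval 0.0 4)
;; => [0.0 0.0]
```

The negative lower bound shows that the interval is a plain normal approximation; with so few permutations it is too wide to be informative, so real runs need many more shuffles.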