Sampling Stats
Warning
This library is under development; none of the features presented here are available for download.
This module calculates statistical parameters for simple random sampling and stratified sampling from the volumes of the sampling units.
Class Parameters
SamplingStats(volume_df)
Parameters | Description |
---|---|
volume_df | The pandas DataFrame containing the volume data for each sampling unit. |
Class Methods
SamplingStats.simple(total_area, plot_id, plot_area, volume, error_lim=10, conf=95)
SamplingStats.stratified(total_area, stratum_id, stratum_area, plot_id, plot_area, volume, error_lim=10, conf=95)
SamplingStats.stratified_anova()

Parameters of .simple():
- total_area = Name of the column containing the total area, in square meters, of the assessed forest stand.
- plot_id = Name of the column containing the unique identifier of the plot (sample unit).
- plot_area = Name of the column containing the area, in square meters, of the plot (sample unit).
- volume = Name of the column containing the volume, in cubic meters, of each plot (sample unit).
- error_lim = (Optional) Numeric value, or name of the column, giving the acceptable error limit as a percentage. Default: 10.
- conf = (Optional) Numeric value, or name of the column, giving the confidence level as a percentage (e.g., 95) used in the statistical calculations. Default: 95.

Parameters of .stratified() (in addition to all parameters of .simple()):
- stratum_id = Name of the column containing the unique identifier of the stratum.
- stratum_area = Name of the column containing the area of the stratum.
Method | Description |
---|---|
.simple() | Returns a DataFrame containing statistical parameters and sample sufficiency for simple random sampling. |
.stratified() | Returns a DataFrame containing statistical parameters and sample sufficiency for stratified sampling. |
.stratified_anova() | Returns a DataFrame containing analysis of variance (ANOVA) for the stratification performed using the .stratified() method. |
Simple Sampling
Usage Example
Consider the following example, adapted from Sanquetta et al. (2014), which illustrates the calculation of statistics for simple random sampling.
Fazenda | Parcela | area_total (m²) | area_parcela (m²) | Volume (m³) | limite_erro(%) | nivel_confianca(%) |
---|---|---|---|---|---|---|
Fazenda 1 | 1 | 400000 | 600 | 20.85 | 10 | 95 |
Fazenda 1 | 2 | 400000 | 600 | 19.47 | 10 | 95 |
Fazenda 1 | 3 | 400000 | 600 | 24.13 | 10 | 95 |
Fazenda 1 | 4 | 400000 | 600 | 24.34 | 10 | 95 |
Fazenda 1 | 5 | 400000 | 600 | 25.13 | 10 | 95 |
Fazenda 1 | 6 | 400000 | 600 | 22.37 | 10 | 95 |
Fazenda 1 | 7 | 400000 | 600 | 22.51 | 10 | 95 |
Fazenda 1 | 8 | 400000 | 600 | 19.78 | 10 | 95 |
Fazenda 1 | 9 | 400000 | 600 | 25.05 | 10 | 95 |
Fazenda 1 | 10 | 400000 | 600 | 28.84 | 10 | 95 |
Fazenda 1 | 11 | 400000 | 600 | 23.70 | 10 | 95 |
Fazenda 1 | 12 | 400000 | 600 | 24.78 | 10 | 95 |
Fazenda 1 | 13 | 400000 | 600 | 22.58 | 10 | 95 |
Fazenda 1 | 14 | 400000 | 600 | 23.70 | 10 | 95 |
Fazenda 1 | 15 | 400000 | 600 | 36.16 | 10 | 95 |
Fazenda 1 | 16 | 400000 | 600 | 17.83 | 10 | 95 |
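You can also build the example data directly as a pandas DataFrame instead of loading it from the .xlsx file. Here is a minimal sketch. The column names are copied from the table header above; they are assumed, not confirmed, to match the names in the spreadsheet. The values are the same, written with decimal points.

```python
import pandas as pd

# Plot volumes (m³) from the example table, written with decimal points.
volumes = [
    20.85, 19.47, 24.13, 24.34, 25.13, 22.37, 22.51, 19.78,
    25.05, 28.84, 23.70, 24.78, 22.58, 23.70, 36.16, 17.83,
]

# One row per plot; scalar entries are broadcast to every row,
# mirroring the repeated area, error-limit and confidence columns.
df = pd.DataFrame(
    {
        "Fazenda": "Fazenda 1",
        "Parcela": list(range(1, len(volumes) + 1)),
        "area_total (m²)": 400_000,
        "area_parcela (m²)": 600,
        "Volume (m³)": volumes,
        "limite_erro(%)": 10,
        "nivel_confianca(%)": 95,
    }
)

print(df.shape)                             # (16, 7)
print(round(df["Volume (m³)"].mean(), 2))   # 23.83
```

This df plays the role of the DataFrame passed to SamplingStats in the example script.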
The example script sampling_stats_simple_example.py performs the following steps:
- Import the `SamplingStats` class.
- Import `pandas` for data manipulation.
- Load an `.xlsx` file containing the data.
- Create the variable `ss` by instantiating the `SamplingStats` class with the DataFrame `df`.
- Specify the column names for each parameter of the `.simple()` method and save the results in the variable `ss_result`.
- Save the results to a `simple_sampling_stats.xlsx` file for later viewing.

The .simple() method generates the following results:
metric | value |
---|---|
population | finite |
real_n_par | 16 |
ideal_n_par | 15 |
mean (m³/plot) | 23.83 |
variance (m³/plot) | 17.82 |
st_deviation (m³/plot) | 4.22 |
coeff_variation (%) | 17.72 |
variance_of_the_mean (m³/plot) | 1.09 |
st_error_of_the_mean (m³/plot) | 1.04 |
abs_sampl_error (m³/plot) | 2.24 |
rel_sampl_error (%) | 9.39 |
mean_confidence_interval (m³) | (21.59, 26.06) |
confidence_interval_total population (m³) | (14400.52, 17383.7) |
total (m³/ha) | 397.3 |
total_population (m³) | 15892.11 |
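The descriptive rows of this table can be checked with a short, self-contained sketch of the textbook simple random sampling estimators. This is not the library's implementation, and several choices are assumptions:
- The function name simple_random_sampling is hypothetical.
- The caller supplies t instead of it being looked up from the confidence level.
- The finite/infinite decision uses the common rule that a population is finite when 1 - n/N < 0.98.

```python
import math
import statistics


def simple_random_sampling(volumes, total_area, plot_area, t):
    """Textbook simple random sampling estimators (hypothetical helper).

    volumes:    plot volumes in m³
    total_area: stand area in m²
    plot_area:  plot area in m²
    t:          Student's t value chosen by the caller (assumption)
    """
    n = len(volumes)
    N = total_area / plot_area              # potential number of plots
    mean = statistics.mean(volumes)
    var = statistics.variance(volumes)      # sample variance, n - 1 denominator
    sd = math.sqrt(var)

    # Assumed rule: treat the population as finite when 1 - n/N < 0.98.
    finite = (1 - n / N) < 0.98
    var_mean = var / n * (1 - n / N) if finite else var / n
    se_mean = math.sqrt(var_mean)
    abs_error = t * se_mean

    return {
        "population": "finite" if finite else "infinite",
        "mean": mean,
        "variance": var,
        "st_deviation": sd,
        "coeff_variation": sd / mean * 100,
        "variance_of_the_mean": var_mean,
        "st_error_of_the_mean": se_mean,
        "abs_sampl_error": abs_error,
        "rel_sampl_error": abs_error / mean * 100,
        "mean_confidence_interval": (mean - abs_error, mean + abs_error),
        "total_population": mean * N,
    }


volumes = [
    20.85, 19.47, 24.13, 24.34, 25.13, 22.37, 22.51, 19.78,
    25.05, 28.84, 23.70, 24.78, 22.58, 23.70, 36.16, 17.83,
]
# 2.131 is the two-sided 95% t value for 15 degrees of freedom (illustrative choice).
stats = simple_random_sampling(volumes, total_area=400_000, plot_area=600, t=2.131)
for key in ("population", "mean", "variance", "coeff_variation", "variance_of_the_mean"):
    value = stats[key]
    print(key, value if isinstance(value, str) else round(value, 2))
```

For these volumes, the printed values (finite, 23.83, 17.82, 17.72, 1.09) agree with the table above. The error-based values depend on the t value supplied, so they may differ from the library's output.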
Stratified Sampling
Usage Example
The example script sampling_stats_stratified_example.py performs the following steps:
- Import the `SamplingStats` class.
- Import `pandas` for data manipulation.
- Load the `.xlsx` file containing the data.
- Create the variable `ss` by instantiating the `SamplingStats` class with the DataFrame `df_stratified`.
- Specify the column names for each parameter of the `.stratified()` method and save the results in the variable `ss_result`.
- Save the results to a `stratified_sampling_stats.xlsx` file for later viewing.
- Save the analysis of variance in the variable `anova`.
- Save the analysis of variance to a file named `anova.xlsx`.
This example also uses data from Sanquetta et al. (2014).
Download example file.
The .stratified() method generates the following results. Statistical summaries are produced both for the population as a whole (total) and for each individual stratum.
metrics | total | Estrato 1 | Estrato 2 |
---|---|---|---|
population | finite | finite | finite |
real_n_par | 24 | 12 | 12 |
ideal_n_par | 8 | 5.2 | 2.8 |
mean_stratified (m³/plot) | 107.25 | 89.08 | 125.42 |
variance (m³/plot) | 137.91 | 71.54 | 261.17 |
st_deviation (m³/plot) | 11.15 | 8.46 | 16.16 |
coeff_variation (%) | 10.4 | 9.49 | 12.89 |
variance_of_the_mean (m³/plot) | 5.05 | 5.85 | 21.02 |
st_error_of_the_mean (m³/plot) | 2.25 | 2.42 | 4.58 |
abs_sampl_error (m³) | 5.5 | 5.92 | 11.22 |
rel_sampl_error (%) | 5.13 | 6.64 | 8.94 |
mean_confidence_interval (m³) | (101.75, 112.75) | (83.16, 95.0) | (114.2, 136.63) |
confidence_interval_total population (m³) | (101752.9, 112747.1) | (54056.81, 61751.53) | (39969.52, 47822.15) |
total (m³/ha) | 107.25 | 89.08 | 125.42 |
total_population (m³) | 107250 | 57904.17 | 43895.83 |
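The per-stratum rows can be combined with the textbook stratified estimators. The following is a minimal sketch that works from stratum summaries, not raw plots. The helper name stratified_summary is hypothetical. The stratum weights W_h (0.65 and 0.35) and N = 1000 plots are assumptions back-calculated from the total_population row; the text does not state them.

```python
import math


def stratified_summary(strata, N):
    """Combine per-stratum summaries with textbook stratified estimators.

    strata: list of dicts with keys
            "W"    stratum weight W_h (share of the population),
            "n"    number of plots measured in the stratum (n_h),
            "mean" stratum mean, "var" stratum variance s_h²
    N:      potential number of plots in the whole population
    """
    sum_w_var = sum(s["W"] * s["var"] for s in strata)              # Σ W_h s_h²
    sum_w_sd = sum(s["W"] * math.sqrt(s["var"]) for s in strata)     # Σ W_h s_h
    weighted_mean = sum(s["W"] * s["mean"] for s in strata)          # Σ W_h x̄_h
    # Variance of the stratified mean, including the finite-population term.
    var_mean = sum(s["W"] ** 2 * s["var"] / s["n"] for s in strata) - sum_w_var / N
    return {
        "weighted_mean": weighted_mean,
        "sum_W_var": sum_w_var,
        "sum_W_sd": sum_w_sd,
        "variance_of_the_mean": var_mean,
        "st_error_of_the_mean": math.sqrt(var_mean),
    }


# Stratum means, variances and plot counts from the table above;
# the weights and N are back-calculated assumptions.
strata = [
    {"W": 0.65, "n": 12, "mean": 89.08, "var": 71.54},
    {"W": 0.35, "n": 12, "mean": 125.42, "var": 261.17},
]
result = stratified_summary(strata, N=1000)
print(round(result["sum_W_var"], 2))             # 137.91
print(round(result["sum_W_sd"], 2))              # 11.15
print(round(result["variance_of_the_mean"], 2))  # 5.05
```

Under these assumed weights, the three printed values coincide with the total column of the variance, st_deviation and variance_of_the_mean rows.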
The .stratified_anova() method generates the following analysis of variance (ANOVA):
Source of Variation | SS | df | MS | F | F_critical | H0 |
---|---|---|---|---|---|---|
Between Strata | 7920.667 | 1 | 7920.667 | 47.61273 | 4.30095 | Rejected |
Within Strata | 3659.833 | 22 | 166.356 | | | |
Total | 11580.5 | 23 | | | | |
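The following is a minimal, self-contained sketch of a one-way ANOVA between strata. The helper name one_way_anova is hypothetical, and the caller supplies the critical F value from an F table instead of having it computed (an assumption made to avoid extra dependencies). The data are a toy example, not the plots from Sanquetta et al. (2014).

```python
import statistics


def one_way_anova(groups, f_critical):
    """One-way ANOVA between strata (hypothetical helper).

    groups:     mapping of stratum id -> list of plot volumes
    f_critical: critical F value for (L - 1, n - L) degrees of freedom,
                taken from an F table by the caller (assumption)
    """
    means = {h: statistics.mean(v) for h, v in groups.items()}
    all_values = [x for v in groups.values() for x in v]
    grand_mean = statistics.mean(all_values)
    n, L = len(all_values), len(groups)

    ss_between = sum(len(v) * (means[h] - grand_mean) ** 2 for h, v in groups.items())
    ss_within = sum((x - means[h]) ** 2 for h, v in groups.items() for x in v)
    df_between, df_within = L - 1, n - L
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_value = ms_between / ms_within

    return {
        "SS": (ss_between, ss_within, ss_between + ss_within),
        "df": (df_between, df_within, n - 1),
        "MS": (ms_between, ms_within),
        "F": f_value,
        "H0": "Rejected" if f_value > f_critical else "Not rejected",
    }


# Toy data: two strata with three plots each; 7.71 is F(0.95; 1, 4).
anova = one_way_anova({"A": [1, 2, 3], "B": [4, 5, 6]}, f_critical=7.71)
print(anova["F"], anova["H0"])  # 13.5 Rejected
```

The same arithmetic applied to the table above gives MS_within = 3659.833 / 22 ≈ 166.356 and F = 7920.667 / 166.356 ≈ 47.61. This exceeds F_critical = 4.30, so H0 is rejected.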
Equations Used
Simple Sampling
Sample Intensity
Finite Populations
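Assuming the usual textbook expression for simple random sampling (not confirmed against the library's code), the number of plots required in a finite population is given below. The term \( E\,\bar{x}/100 \) converts the percentage error limit into an absolute error in m³/plot, and the result is usually rounded up to a whole plot.

\[
n = \frac{t^2 s_x^2}{\left(\dfrac{E\,\bar{x}}{100}\right)^2 + \dfrac{t^2 s_x^2}{N}}
\]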
Infinite Populations
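In the standard textbook form for simple random sampling (assumed, not confirmed from the library's source), the infinite-population sample size omits the \( N \) term:

\[
n = \frac{t^2 s_x^2}{\left(\dfrac{E\,\bar{x}}{100}\right)^2}
\]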
Statistics
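The rows of the simple sampling results table correspond to the standard estimators below. They are listed as an assumption about what the library computes. The symbols are:
- \( x_i \): volume of plot \( i \)
- \( E_a \): absolute sampling error
- \( E_r \): relative sampling error
- \( \hat{X} \): estimated population total

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad
CV = \frac{s_x}{\bar{x}} \cdot 100
\]
\[
s_{\bar{x}}^2 = \frac{s_x^2}{n}\left(1 - \frac{n}{N}\right) \ \text{(finite)}, \qquad
s_{\bar{x}}^2 = \frac{s_x^2}{n} \ \text{(infinite)}
\]
\[
E_a = t\, s_{\bar{x}}, \qquad
E_r = \frac{E_a}{\bar{x}} \cdot 100, \qquad
CI = \bar{x} \pm E_a, \qquad
\hat{X} = N\,\bar{x}
\]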
Stratified Sampling
Sample Intensity
Finite Populations
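For stratified sampling, a common textbook choice is proportional allocation, \( n_h = W_h\,n \). The library's code does not confirm this allocation. Assuming it, the finite-population sample size is given below, with \( \bar{x}_{st} = \sum_h W_h \bar{x}_h \):

\[
n = \frac{t^2 \sum_h W_h s_h^2}{\left(\dfrac{E\,\bar{x}_{st}}{100}\right)^2 + \dfrac{t^2 \sum_h W_h s_h^2}{N}}, \qquad n_h = W_h\, n
\]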
Infinite Populations
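Assuming proportional allocation, \( n_h = W_h\,n \) (a textbook choice, not confirmed from the source), the infinite-population form drops the \( N \) term:

\[
n = \frac{t^2 \sum_h W_h s_h^2}{\left(\dfrac{E\,\bar{x}_{st}}{100}\right)^2}
\]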
Statistics
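The textbook estimators for stratified random sampling are given below as an assumption about the library's calculations. Here \( W_h = N_h / N \), where \( N_h \) is the number of potential units in stratum \( h \). The second term of \( s_{\bar{x}(st)}^2 \) is the finite-population correction and is dropped for infinite populations.

\[
\bar{x}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_{ih}, \qquad
s_h^2 = \frac{\sum_{i=1}^{n_h} (x_{ih} - \bar{x}_h)^2}{n_h - 1}, \qquad
\bar{x}_{st} = \sum_h W_h\,\bar{x}_h
\]
\[
s_{\bar{x}(st)}^2 = \sum_h \frac{W_h^2 s_h^2}{n_h} - \frac{\sum_h W_h s_h^2}{N}, \qquad
E_a = t\, s_{\bar{x}(st)}, \qquad
\hat{X} = N\,\bar{x}_{st}
\]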
Analysis of Variance
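A one-way analysis of variance between strata can be written as follows. Here \( L \) is the number of strata and \( \bar{x} \) the unweighted mean of all \( n \) plots. This matches the degrees of freedom in the ANOVA table above: 1, 22 and 23 for two strata and 24 plots. \( H_0 \) (equal stratum means) is rejected when \( F \) exceeds \( F_{critical} \) at the chosen confidence level.

\[
SS_{\text{between}} = \sum_{h=1}^{L} n_h (\bar{x}_h - \bar{x})^2, \qquad
SS_{\text{within}} = \sum_{h=1}^{L}\sum_{i=1}^{n_h} (x_{ih} - \bar{x}_h)^2, \qquad
SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}
\]
\[
MS_{\text{between}} = \frac{SS_{\text{between}}}{L - 1}, \qquad
MS_{\text{within}} = \frac{SS_{\text{within}}}{n - L}, \qquad
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
\]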
Notation
- \( N \): Total number of units in the population (potential number of sampling units)
- \( n \): Number of sampled or measured units
- \( n_h \): Number of sampled or measured units in stratum \( h \)
- \( t \): Value of Student's t-distribution for the chosen confidence level
- \( s_x^2 \): Sample variance
- \( s_h^2 \): Variance of stratum \( h \)
- \( s_{\bar{x}(st)} \): Standard error of the stratified mean
- \( W_h \): Proportion of stratum \( h \) in the population
- \( E \): Tolerated error limit (%)
- \( \bar{x} \): Sample mean
- \( \bar{x}_h \): Sample mean of stratum \( h \)
- \( x_{ih} \): Volume of the \( i \)-th plot within stratum \( h \)
References
SANQUETTA, C. R.; CORTE, A. P. D.; RODRIGUES, A. L.; WATZLAWICK, L. F. (2014). Inventários florestais: planejamento e execução. Curitiba: Multi-Graphic, 406 p.