Sampling Stats

Warning

This library is under development; none of the presented solutions are available for download yet.

This module calculates statistical parameters for simple random sampling and stratified sampling from the volumes of the sampling units.


Class Parameters

SamplingStats(volume_df)
| Parameter | Description |
|---|---|
| volume_df | The DataFrame containing volume data for each sampling unit. |

Class Functions

Functions and parameters
  SamplingStats.simple(total_area, plot_id, plot_area,
                       volume, error_lim=10, conf=95)#(1)!
  SamplingStats.stratified(total_area, stratum_id, stratum_area,
                           plot_id, plot_area, volume, error_lim=10, conf=95)#(2)!
  SamplingStats.stratified_anova()

  1. total_area = Name of the column containing the total area value in square meters of the assessed forest stand.
    plot_id = Name of the column containing the unique identifier of the plot/sample unit.
    plot_area = Name of the column containing the area in square meters of the plot/sample unit.
    volume = Name of the column containing the volume values in cubic meters for each plot/sample unit.
    error_lim = (Optional) Numeric value or name of the column containing the acceptable error limit as a percentage.
    conf = (Optional) Numeric value or name of the column representing the confidence level (e.g., 95%) to be used in statistical calculations.

  2. total_area = Name of the column containing the total area value in square meters of the assessed forest stand.
    stratum_id = Name of the column containing the unique identifier of the stratum.
    stratum_area = Name of the column containing the area in square meters of the stratum.
    plot_id = Name of the column containing the unique identifier of the plot/sample unit.
    plot_area = Name of the column containing the area in square meters of the plot/sample unit.
    volume = Name of the column containing the volume values in cubic meters for each plot/sample unit.
    error_lim = (Optional) Numeric value or name of the column containing the acceptable error limit as a percentage.
    conf = (Optional) Numeric value or name of the column representing the confidence level (e.g., 95%) to be used in statistical calculations.

| Method | Description |
|---|---|
| .simple() | Returns a DataFrame containing statistical parameters and sample sufficiency for simple random sampling. |
| .stratified() | Returns a DataFrame containing statistical parameters and sample sufficiency for stratified sampling. |
| .stratified_anova() | Returns a DataFrame containing the analysis of variance (ANOVA) for the stratification performed with the .stratified() method. |

Simple Sampling

Usage Example

This example adapts the one used by Sanquetta et al. (2014) to illustrate how the statistics for simple random sampling are calculated.

| Fazenda | Parcela | area_total (m²) | area_parcela (m²) | Volume (m³) | limite_erro(%) | nivel_confianca(%) |
|---|---|---|---|---|---|---|
| Fazenda 1 | 1 | 400000 | 600 | 20.85 | 10 | 95 |
| Fazenda 1 | 2 | 400000 | 600 | 19.47 | 10 | 95 |
| Fazenda 1 | 3 | 400000 | 600 | 24.13 | 10 | 95 |
| Fazenda 1 | 4 | 400000 | 600 | 24.34 | 10 | 95 |
| Fazenda 1 | 5 | 400000 | 600 | 25.13 | 10 | 95 |
| Fazenda 1 | 6 | 400000 | 600 | 22.37 | 10 | 95 |
| Fazenda 1 | 7 | 400000 | 600 | 22.51 | 10 | 95 |
| Fazenda 1 | 8 | 400000 | 600 | 19.78 | 10 | 95 |
| Fazenda 1 | 9 | 400000 | 600 | 25.05 | 10 | 95 |
| Fazenda 1 | 10 | 400000 | 600 | 28.84 | 10 | 95 |
| Fazenda 1 | 11 | 400000 | 600 | 23.70 | 10 | 95 |
| Fazenda 1 | 12 | 400000 | 600 | 24.78 | 10 | 95 |
| Fazenda 1 | 13 | 400000 | 600 | 22.58 | 10 | 95 |
| Fazenda 1 | 14 | 400000 | 600 | 23.70 | 10 | 95 |
| Fazenda 1 | 15 | 400000 | 600 | 36.16 | 10 | 95 |
| Fazenda 1 | 16 | 400000 | 600 | 17.83 | 10 | 95 |

Download example file.

sampling_stats_simple_example.py
from fptools.sampling_stats import SamplingStats#(1)!

import pandas as pd#(2)!

  1. Import the SamplingStats class.
  2. Import pandas for data manipulation.

sampling_stats_simple_example.py
df = pd.read_excel(r'your_folder/volume_parcelas_simples_pt.xlsx')#(1)!

ss = SamplingStats(df)#(2)!

ss_results = ss.simple(total_area='area_total (m²)', plot_id='Parcela',
                       plot_area='area_parcela (m²)', volume='Volume (m³)',
                       error_lim='limite_erro(%)', conf='nivel_confianca(%)')#(3)!

ss_results.to_excel('simple_sampling_stats.xlsx', index=False)#(4)!

  1. Load an .xlsx file containing the data.
  2. Create the variable ss by instantiating the SamplingStats class with the DataFrame df.
  3. Specify the column names for each parameter of the .simple() function and save the results in the variable ss_results.
  4. Save the results to a simple_sampling_stats.xlsx file for later viewing.

A message in the code executor output indicates that the calculations succeeded.

The following information will be generated by the simple() function:

| metric | value |
|---|---|
| population | finite |
| real_n_par | 16 |
| ideal_n_par | 15 |
| mean (m³/plot) | 23.83 |
| variance (m³/plot) | 17.82 |
| st_deviation (m³/plot) | 4.22 |
| coeff_variation (%) | 17.72 |
| variance_of_the_mean (m³/plot) | 1.09 |
| st_error_of_the_mean (m³/plot) | 1.04 |
| abs_sampl_error (m³/plot) | 2.24 |
| rel_sampl_error (%) | 9.39 |
| mean_confidence_interval (m³) | (21.59, 26.06) |
| confidence_interval_total population (m³) | (14400.52, 17383.7) |
| total (m³/ha) | 397.3 |
| total_population (m³) | 15892.11 |
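
As a rough cross-check, the descriptive rows of this table can be recomputed by hand from the 16 plot volumes in the example file. The sketch below uses only Python's standard library and is not the library's implementation. It assumes that the potential number of plots N is the total area divided by the plot area, rounded to the nearest integer; the source does not state this, although it is consistent with the totals reported above.

```python
import math
import statistics

# Plot volumes (m³) copied from the example table.
volumes = [20.85, 19.47, 24.13, 24.34, 25.13, 22.37, 22.51, 19.78,
           25.05, 28.84, 23.70, 24.78, 22.58, 23.70, 36.16, 17.83]

total_area = 400_000  # m², column 'area_total (m²)'
plot_area = 600       # m², column 'area_parcela (m²)'

# Assumption: N is the rounded number of plots that fit in the stand.
N = round(total_area / plot_area)
n = len(volumes)

mean = statistics.mean(volumes)
variance = statistics.variance(volumes)   # sample variance (n - 1 denominator)
st_dev = math.sqrt(variance)
cv = st_dev / mean * 100                  # coefficient of variation (%)
var_mean = variance / n * (N - n) / N     # includes the finite population correction
se_mean = math.sqrt(var_mean)
total_population = N * mean               # estimated total volume (m³)

summary = {
    "mean (m³/plot)": mean,
    "variance": variance,
    "st_deviation": st_dev,
    "coeff_variation (%)": cv,
    "variance_of_the_mean": var_mean,
    "st_error_of_the_mean": se_mean,
    "total_population (m³)": total_population,
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The sampling error and confidence interval rows additionally depend on a Student's t value, which is left out here because the source does not state which degrees of freedom the library uses.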

Stratified Sampling

Usage Example

sampling_stats_stratified_example.py
from fptools.sampling_stats import SamplingStats#(1)!

import pandas as pd#(2)!

  1. Import the SamplingStats class.
  2. Import pandas for data manipulation.

sampling_stats_stratified_example.py
df_stratified = pd.read_excel(r'your_folder/volume_parcelas_estratificado_pt.xlsx')#(1)!

ss = SamplingStats(df_stratified)#(2)!

ss_results = ss.stratified(total_area='area_total (m²)',
                           stratum_id='Estrato',
                           stratum_area='Área do estrato (m²)',
                           plot_id='Parcela',
                           plot_area='area_parcela (m²)',
                           volume='Volume (m³)',
                           error_lim='limite_erro(%)',
                           conf='nivel_confianca(%)')#(3)!

ss_results.to_excel('stratified_sampling_stats.xlsx', index=False)#(4)!
anova = ss.stratified_anova()#(5)!
anova.to_excel("anova.xlsx", index=False)#(6)!

  1. Load the .xlsx file containing the data.
  2. Create the variable ss by instantiating the SamplingStats class with the DataFrame df_stratified.
  3. Specify the column names for each parameter of the .stratified() function and save the results in the variable ss_results.
  4. Save the results to a stratified_sampling_stats.xlsx file for later viewing.
  5. Save the variance analysis values in the variable anova.
  6. Save the variance analysis to a file named anova.xlsx.

For this example, we will also use the values obtained in Sanquetta et al. (2014).
Download example file.

The following information will be generated by the stratified() function. In this case, statistical summaries are generated for both the total and the individual strata.

| metrics | total | Estrato 1 | Estrato 2 |
|---|---|---|---|
| population | finite | finite | finite |
| real_n_par | 24 | 12 | 12 |
| ideal_n_par | 8 | 5.2 | 2.8 |
| mean_stratified (m³/plot) | 107.25 | 89.08 | 125.42 |
| variance (m³/plot) | 137.91 | 71.54 | 261.17 |
| st_deviation (m³/plot) | 11.15 | 8.46 | 16.16 |
| coeff_variation (%) | 10.4 | 9.49 | 12.89 |
| variance_of_the_mean (m³/plot) | 5.05 | 5.85 | 21.02 |
| st_error_of_the_mean (m³/plot) | 2.25 | 2.42 | 4.58 |
| abs_sampl_error (m³) | 5.5 | 5.92 | 11.22 |
| rel_sampl_error (%) | 5.13 | 6.64 | 8.94 |
| mean_confidence_interval (m³) | (101.75, 112.75) | (83.16, 95.0) | (114.2, 136.63) |
| confidence_interval_total population (m³) | (101752.9, 112747.1) | (54056.81, 61751.53) | (39969.52, 47822.15) |
| total (m³/ha) | 107.25 | 89.08 | 125.42 |
| total_population (m³) | 107250 | 57904.17 | 43895.83 |
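
The variance rows of the "total" column can be cross-checked against the stratified equations on this page using the per-stratum rows. The sketch below is an illustration only, not the library's code. The stratum sizes of 650 and 350 potential plots are not given in the source: they are inferred here by dividing each stratum's total_population by its mean, so treat them as an assumption.

```python
import math

# Per-stratum values copied from the table; N_h is an inferred assumption.
strata = [
    {"name": "Estrato 1", "n_h": 12, "variance": 71.54, "N_h": 650},
    {"name": "Estrato 2", "n_h": 12, "variance": 261.17, "N_h": 350},
]

N = sum(s["N_h"] for s in strata)

# Stratum weights W_h = N_h / N.
for s in strata:
    s["W_h"] = s["N_h"] / N

# Stratified variance: sum of W_h * s_h^2.
strat_variance = sum(s["W_h"] * s["variance"] for s in strata)

# Variance of the stratified mean:
# sum(W_h^2 * s_h^2 / n_h) - sum(W_h * s_h^2 / N).
var_strat_mean = (
    sum(s["W_h"] ** 2 * s["variance"] / s["n_h"] for s in strata)
    - sum(s["W_h"] * s["variance"] / N for s in strata)
)
se_strat_mean = math.sqrt(var_strat_mean)

print(f"stratified variance: {strat_variance:.2f}")
print(f"variance of the stratified mean: {var_strat_mean:.2f}")
print(f"standard error of the stratified mean: {se_strat_mean:.2f}")
```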

Analysis of variance (ANOVA) generated:

| Source of Variation | SS | df | MS | F | F_critical | H0 |
|---|---|---|---|---|---|---|
| Between Strata | 7920.667 | 1 | 7920.667 | 47.61273 | 4.30095 | Rejected |
| Within Strata | 3659.833 | 22 | 166.356 | | | |
| Total | 11580.5 | 23 | | | | |
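
The arithmetic behind this table can be checked in a few lines. The sketch below only re-derives the mean squares and the F value from the sums of squares and degrees of freedom printed above; the variable names are illustrative and the critical value is copied from the table rather than computed.

```python
# Values copied from the ANOVA table.
ss_between, df_between = 7920.667, 1
ss_within, df_within = 3659.833, 22
f_critical = 4.30095

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# Calculated F and the decision on H0 (no difference between stratum means).
f_value = ms_between / ms_within
reject_h0 = f_value > f_critical

print(f"MS between: {ms_between:.3f}")
print(f"MS within:  {ms_within:.3f}")
print(f"F:          {f_value:.5f}")
print("H0 rejected" if reject_h0 else "H0 not rejected")
```

Because the calculated F exceeds the critical value, H0 is rejected, which indicates that the stratum means differ and supports the stratification.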

Equations Used

  • Simple
  • Sample Intensity
    Finite Populations
    \[ \text{Ideal number of plots}: \quad n = \frac{N t^2 s_x^2}{N E^2 + t^2 s_x^2} \]
    Infinite Populations
    \[ \text{Ideal number of plots}: \quad n = \frac{t^2 s_x^2}{E^2} \]
    Statistics
    \[ \text{Arithmetic Mean}: \quad \bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} \]
    \[ \text{Variance}: \quad s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1} \]
    \[ \text{Standard Deviation}: \quad s_x = \sqrt{ \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1} } \]
    \[ \text{Variance of the Mean}: \quad s_{\bar{x}}^2 = \frac{s_x^2}{n} \cdot \left( \frac{N - n}{N} \right) \]
    \[ \text{Standard Error}: \quad s_{\bar{x}} = \pm \frac{s_x}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N}} \]
    \[ \text{Coefficient of Variation}: \quad \operatorname{cv}(\%) = \frac{s_x}{\bar{x}} \cdot 100 \]
    \[ \text{Absolute Sampling Error}: \quad E_a = \pm t \cdot s_{\bar{x}} \]
    \[ \text{Relative Sampling Error}: \quad E_r = \pm \frac{t \cdot s_{\bar{x}}}{\bar{x}} \cdot 100 \]
    \[ \text{Confidence Interval for the Mean}: \quad CI \left[ \bar{x} - (t \cdot s_{\bar{x}}) \leq \bar{X} \leq \bar{x} + (t \cdot s_{\bar{x}}) \right] = P \]
    \[ \text{Population Total}: \quad \hat{X} = N \cdot \bar{x} \]
    \[ \text{Confidence Interval for the Total}: \quad CI \left[ \hat{X} - N(t \cdot s_{\bar{x}}) \leq X \leq \hat{X} + N(t \cdot s_{\bar{x}}) \right] = P \]
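
The sample intensity formulas above can be sketched as two small functions. This is a minimal illustration, not the library's implementation. The function names are hypothetical, E is taken in the same units as the mean (the percentage limit applied to the mean), and the inputs are the rounded values from the simple sampling example. The t value of 2.145 (two-sided 95%, 14 degrees of freedom) and N = 667 are assumptions chosen for illustration, since the source does not state how the library obtains them. Rounding up to whole plots is also an assumption.

```python
import math


def ideal_n_finite(N, t, variance, E):
    """Ideal number of plots for a finite population."""
    return N * t**2 * variance / (N * E**2 + t**2 * variance)


def ideal_n_infinite(t, variance, E):
    """Ideal number of plots for an infinite population."""
    return t**2 * variance / E**2


# Rounded values from the simple sampling example output.
mean = 23.83       # m³/plot
variance = 17.82   # sample variance of the plot volumes
N = 667            # assumed potential number of plots
t = 2.145          # assumed Student's t value
error_lim = 10     # tolerated error limit (%)

# Convert the percentage limit into the units of the mean.
E = error_lim / 100 * mean

n_finite = ideal_n_finite(N, t, variance, E)
n_infinite = ideal_n_infinite(t, variance, E)

# Round up, since a fraction of a plot cannot be measured.
print(math.ceil(n_finite), math.ceil(n_infinite))
```

The finite-population result is never larger than the infinite-population one, because the correction term in the denominator only shrinks the required sample.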

  • Stratified
  • Sample Intensity
    Finite Populations
    \[ \text{Ideal number of plots}: \quad n = \frac{t^2 \sum_{h=1}^{L} W_h s_h^2}{E^2 + t^2 \sum_{h=1}^{L} \frac{W_h s_h^2}{N}} \]
    Infinite Populations
    \[ \text{Ideal number of plots}: \quad n = \frac{t^2 \sum_{h=1}^{L} W_h s_h^2}{E^2} \]
    Statistics
    \[ \text{Stratum Mean}: \quad \bar{x}_h = \frac{\sum_{i=1}^{n_h} x_{ih}}{n_h} \]
    \[ \text{Stratified Mean}: \quad \bar{x}_{st} = \frac{\sum_{h=1}^{L} \left( N_h \cdot \bar{x}_h \right)}{N} = \sum_{h=1}^{L} \left( W_h \cdot \bar{x}_h \right) \]
    \[ \text{Stratum Variance}: \quad s_h^2 = \frac{\sum_{i=1}^{n_h} (x_{ih} - \bar{x}_h)^2}{n_h - 1} \]
    \[ \text{Stratified Variance}: \quad s_{st}^2 = \sum_{h=1}^{L} \left( W_h s_h^2 \right) \]
    \[ \text{Variance of the Stratified Mean}: \quad s_{\bar{x}(st)}^2 = \sum_{h=1}^{L} W_h^2 \cdot \frac{s_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h s_h^2}{N} \]
    \[ \text{Stratified Standard Error}: \quad s_{\bar{x}(st)} = \sqrt{ \sum_{h=1}^{L} W_h^2 \cdot \frac{s_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h s_h^2}{N} } \]
    \[ \text{Absolute Sampling Error}: \quad E_a = \pm t \cdot s_{\bar{x}(st)} \]
    \[ \text{Relative Sampling Error}: \quad E_r = \pm \frac{t \cdot s_{\bar{x}(st)}}{\bar{x}_{st}} \cdot 100 \]
    \[ \text{Confidence Interval for the Stratified Mean}: \quad CI \left[ \bar{x}_{st} - (t \cdot s_{\bar{x}(st)}) \leq \bar{X} \leq \bar{x}_{st} + (t \cdot s_{\bar{x}(st)}) \right] = P \]
    \[ \text{Stratum Total}: \quad \hat{X}_h = N_h \cdot \bar{x}_h \]
    \[ \text{Population Total}: \quad \hat{X} = \sum_{h=1}^{L} \hat{X}_h = N \cdot \bar{x}_{st} \]
    \[ \text{Confidence Interval for the Total}: \quad CI \left[ \hat{X} - N(t \cdot s_{\bar{x}(st)}) \leq X \leq \hat{X} + N(t \cdot s_{\bar{x}(st)}) \right] = P \]
    Analysis of Variance
    \[ \text{Sum of Squares Between Strata}: \operatorname{SS}_b = \sum_{h=1}^{L} n_h \left( \bar{x}_h - \bar{x} \right)^2 \]
    \[ \text{Sum of Squares Within Strata}: \operatorname{SS}_w = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \left( x_{ih} - \bar{x}_h \right)^2 \]
    \[ \text{Total Sum of Squares}: \operatorname{SS}_t = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \left( x_{ih} - \bar{x} \right)^2 \]
    \[ \text{Mean Square Between Strata}: \operatorname{MS}_b = \frac{\operatorname{SS}_b}{\operatorname{df}_b} = \frac{\operatorname{SS}_b}{L - 1} \]
    \[ \text{Mean Square Within Strata}: \operatorname{MS}_w = \frac{\operatorname{SS}_w}{\operatorname{df}_w} = \frac{\operatorname{SS}_w}{n - L} \]
    \[ \text{Calculated F-Value}: \operatorname{F} = \frac{\operatorname{MS}_b}{\operatorname{MS}_w} \]
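
The stratified statistics and the analysis of variance above can be sketched from raw plot data as follows. This is an illustrative sketch using only the standard library, not the library's implementation. The plot volumes and stratum sizes are hypothetical values invented for the example, and all variable names are assumptions.

```python
from statistics import mean, variance

# Hypothetical plot volumes (m³) per stratum, for illustration only.
plots = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 22.0, 24.0, 26.0],
}
# Hypothetical potential number of plots per stratum (N_h).
N_h = {"A": 60, "B": 40}

N = sum(N_h.values())
W = {h: N_h[h] / N for h in plots}                 # stratum weights W_h
x_bar_h = {h: mean(v) for h, v in plots.items()}   # stratum means
s2_h = {h: variance(v) for h, v in plots.items()}  # stratum variances
n_h = {h: len(v) for h, v in plots.items()}

# Stratified mean, stratified variance, variance of the stratified mean.
strat_mean = sum(W[h] * x_bar_h[h] for h in plots)
strat_variance = sum(W[h] * s2_h[h] for h in plots)
var_strat_mean = (
    sum(W[h] ** 2 * s2_h[h] / n_h[h] for h in plots)
    - sum(W[h] * s2_h[h] / N for h in plots)
)

# One-way ANOVA: the overall mean is taken over all measured plots.
all_values = [x for v in plots.values() for x in v]
x_bar = mean(all_values)
n, L = len(all_values), len(plots)

ss_between = sum(n_h[h] * (x_bar_h[h] - x_bar) ** 2 for h in plots)
ss_within = sum((x - x_bar_h[h]) ** 2 for h, v in plots.items() for x in v)
ss_total = sum((x - x_bar) ** 2 for x in all_values)

ms_between = ss_between / (L - 1)
ms_within = ss_within / (n - L)
f_value = ms_between / ms_within

print(f"stratified mean: {strat_mean:.2f}")
print(f"variance of the stratified mean: {var_strat_mean:.3f}")
print(f"F: {f_value:.2f}")
```

A useful sanity check on any implementation is that the total sum of squares equals the sum of the between-strata and within-strata sums of squares.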

    Notation

    • \( N \): Total (potential) number of sampling units in the population
    • \( N_h \): Potential number of sampling units in stratum \( h \)
    • \( L \): Number of strata
    • \( n \): Number of sampled or measured units
    • \( n_h \): Number of sampled or measured units in stratum \( h \)
    • \( t \): Value from the Student's t-distribution
    • \( s_x^2 \): Variance
    • \( s_h^2 \): Variance of stratum \( h \)
    • \( s_{\bar{x}(st)} \): Standard error of the stratified mean
    • \( W_h \): Proportion of stratum \( h \) in the population (\( W_h = N_h / N \))
    • \( E \): Tolerated error limit, expressed in the units of the mean (the percentage limit applied to \( \bar{x} \))
    • \( P \): Confidence level of the interval
    • \( \bar{x} \): Sample mean
    • \( \bar{x}_h \): Sample mean of stratum \( h \)
    • \( x_{ih} \): Volume of the \( i \)-th plot within stratum \( h \)

    References

    SANQUETTA, C. R.; CORTE, A. P. D.; RODRIGUES, A. L.; WATZLAWICK, L. F. (2014). Inventários florestais: planejamento e execução. Curitiba: Multi-Graphic, 406 p.