Sampling Stats

Warning

This library is under development; none of the presented solutions are available for download.

This module calculates statistical parameters for simple random and stratified sampling from the volumes measured in each sampling unit.

Class Parameters

SamplingStats(volume_df)

Parameters	Description
volume_df	A pandas DataFrame containing the volume data for each sampling unit.

Class Functions

Functions and parameters

  SamplingStats.simple(total_area, plot_id, plot_area,
                       volume, error_lim=10, conf=95)#(1)!
  SamplingStats.stratified(total_area, stratum_id, stratum_area,
                           plot_id, plot_area, volume, error_lim=10, conf=95)#(2)!
  SamplingStats.stratified_anova()

Parameters of .simple():
total_area = Name of the column containing the total area, in square meters, of the assessed forest stand.
plot_id = Name of the column containing the unique identifier of each plot (sampling unit).
plot_area = Name of the column containing the area, in square meters, of each plot (sampling unit).
volume = Name of the column containing the volume, in cubic meters, of each plot (sampling unit).
error_lim = (Optional, default 10) Numeric value, or name of the column, giving the acceptable error limit as a percentage.
conf = (Optional, default 95) Numeric value, or name of the column, giving the confidence level as a percentage (e.g., 95) used in the statistical calculations.

Parameters of .stratified():
total_area = Name of the column containing the total area, in square meters, of the assessed forest stand.
stratum_id = Name of the column containing the unique identifier of each stratum.
stratum_area = Name of the column containing the area, in square meters, of each stratum.
plot_id = Name of the column containing the unique identifier of each plot (sampling unit).
plot_area = Name of the column containing the area, in square meters, of each plot (sampling unit).
volume = Name of the column containing the volume, in cubic meters, of each plot (sampling unit).
error_lim = (Optional, default 10) Numeric value, or name of the column, giving the acceptable error limit as a percentage.
conf = (Optional, default 95) Numeric value, or name of the column, giving the confidence level as a percentage (e.g., 95) used in the statistical calculations.
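Both `error_lim` and `conf` accept either a number or a column name. The sketch below shows one way such a parameter could be resolved. `resolve_setting` is a hypothetical helper, not part of fptools, and it assumes (this is not stated in the source) that a named column holds the same value in every row. Rows are plain dictionaries so the example runs without pandas.

```python
from typing import Mapping, Sequence, Union


def resolve_setting(rows: Sequence[Mapping[str, object]],
                    value: Union[int, float, str]) -> float:
    """Hypothetical helper (not fptools): return a numeric setting.

    A number is used as-is; a string is treated as a column name whose
    value is assumed to be identical in every row.
    """
    if isinstance(value, str):
        distinct = {row[value] for row in rows}  # values found in that column
        if len(distinct) != 1:
            raise ValueError(f"column {value!r} must hold a single value, got {distinct}")
        return float(distinct.pop())
    return float(value)


rows = [
    {"Parcela": 1, "limite_erro(%)": 10, "nivel_confianca(%)": 95},
    {"Parcela": 2, "limite_erro(%)": 10, "nivel_confianca(%)": 95},
]
print(resolve_setting(rows, "limite_erro(%)"))  # read from the column: 10.0
print(resolve_setting(rows, 95))                # numeric value used directly: 95.0
```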

Function	Description
.simple()	Returns a DataFrame containing statistical parameters and sample sufficiency for simple random sampling.
.stratified()	Returns a DataFrame containing statistical parameters and sample sufficiency for stratified sampling.
.stratified_anova()	Returns a DataFrame containing analysis of variance (ANOVA) for the stratification performed using the `.stratified()` method.

Simple Sampling

Usage Example

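Consider the following data, adapted from the example that Sanquetta et al. (2014) use to illustrate the statistics of simple random sampling. The column names are kept in Portuguese, as in the example file: Fazenda (farm), Parcela (plot), area_total (total area), area_parcela (plot area), limite_erro (error limit) and nivel_confianca (confidence level).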

Fazenda	Parcela	area_total (m²)	area_parcela (m²)	Volume (m³)	limite_erro(%)	nivel_confianca(%)
Fazenda 1	1	400000	600	20.85	10	95
Fazenda 1	2	400000	600	19.47	10	95
Fazenda 1	3	400000	600	24.13	10	95
Fazenda 1	4	400000	600	24.34	10	95
Fazenda 1	5	400000	600	25.13	10	95
Fazenda 1	6	400000	600	22.37	10	95
Fazenda 1	7	400000	600	22.51	10	95
Fazenda 1	8	400000	600	19.78	10	95
Fazenda 1	9	400000	600	25.05	10	95
Fazenda 1	10	400000	600	28.84	10	95
Fazenda 1	11	400000	600	23.70	10	95
Fazenda 1	12	400000	600	24.78	10	95
Fazenda 1	13	400000	600	22.58	10	95
Fazenda 1	14	400000	600	23.70	10	95
Fazenda 1	15	400000	600	36.16	10	95
Fazenda 1	16	400000	600	17.83	10	95

Download example file.

sampling_stats_simple_example.py
from fptools.sampling_stats import SamplingStats#(1)!

import pandas as pd#(2)!

Import the SamplingStats class.
Import pandas for data manipulation.

sampling_stats_simple_example.py
df = pd.read_excel(r'your_folder/volume_parcelas_simples_pt.xlsx')#(1)!

ss = SamplingStats(df)#(2)!

ss_results = ss.simple(total_area='area_total (m²)', plot_id='Parcela',
                       plot_area='area_parcela (m²)', volume='Volume (m³)',
                       error_lim='limite_erro(%)', conf='nivel_confianca(%)')#(3)!

ss_results.to_excel('simple_sampling_stats.xlsx', index=False)#(4)!

Load an .xlsx file containing the data.
Create an instance of SamplingStats from the DataFrame df and store it in the variable ss.
Specify the column name for each parameter of the .simple() function and store the results in the variable ss_results.
Save the results to a simple_sampling_stats.xlsx file for later viewing.

A message in the code executor's output confirms that the calculations completed successfully.

The simple() function returns the following metrics:

metric	value
population	finite
real_n_par	16
ideal_n_par	15
mean (m³/plot)	23.83
variance (m³/plot)	17.82
st_deviation (m³/plot)	4.22
coeff_variation (%)	17.72
variance_of_the_mean (m³/plot)	1.09
st_error_of_the_mean (m³/plot)	1.04
abs_sampl_error (m³/plot)	2.24
rel_sampl_error (%)	9.39
mean_confidence_interval (m³)	(21.59, 26.06)
confidence_interval_total population (m³)	(14400.52, 17383.7)
total (m³/ha)	397.3
total_population (m³)	15892.11
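The first rows of this output can be checked by hand. The sketch below recomputes them from the 16 volumes in the example table with Python's standard `statistics` module, assuming N = total area / plot area = 400000 / 600. It stops before the sampling errors and totals, because those depend on the Student t value and on the exact N used by the library, which the source does not state.

```python
import statistics

# Plot volumes (m³) from the example table, plots 1 to 16
volumes = [20.85, 19.47, 24.13, 24.34, 25.13, 22.37, 22.51, 19.78,
           25.05, 28.84, 23.70, 24.78, 22.58, 23.70, 36.16, 17.83]

n = len(volumes)
mean = statistics.mean(volumes)          # m³ per plot
variance = statistics.variance(volumes)  # sample variance, divisor n - 1
st_dev = statistics.stdev(volumes)
cv = st_dev / mean * 100                 # coefficient of variation (%)

# Assumption: N taken as total area / plot area
N = 400000 / 600
var_mean = variance / n * (N - n) / N    # with finite-population correction
se_mean = var_mean ** 0.5

print(f"n={n} mean={mean:.2f} var={variance:.2f} sd={st_dev:.2f} "
      f"cv={cv:.2f} var_mean={var_mean:.2f} se={se_mean:.2f}")
```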

Stratified Sampling

Usage Example

sampling_stats_stratified_example.py
from fptools.sampling_stats import SamplingStats#(1)!

import pandas as pd#(2)!

Import the SamplingStats class.
Import pandas for data manipulation.

sampling_stats_stratified_example.py
df_stratified = pd.read_excel(r'your_folder/volume_parcelas_estratificado_pt.xlsx')#(1)!

ss = SamplingStats(df_stratified)#(2)!

ss_results = ss.stratified(total_area='area_total (m²)',
                           stratum_id='Estrato',
                           stratum_area='Área do estrato (m²)',
                           plot_id='Parcela',
                           plot_area='area_parcela (m²)',
                           volume='Volume (m³)',
                           error_lim='limite_erro(%)',
                           conf='nivel_confianca(%)')#(3)!

ss_results.to_excel('stratified_sampling_stats.xlsx', index=False)#(4)!
anova = ss.stratified_anova()#(5)!
anova.to_excel("anova.xlsx", index=False)#(6)!

Load the .xlsx file containing the data.
Create an instance of SamplingStats from the DataFrame df_stratified and store it in the variable ss.
Specify the column name for each parameter of the .stratified() function and store the results in the variable ss_results.
Save the results to a stratified_sampling_stats.xlsx file for later viewing.
Compute the analysis of variance (ANOVA) for the strata and store it in the variable anova.
Save the analysis of variance to a file named anova.xlsx.

This example also uses data from Sanquetta et al. (2014).
Download example file.

The stratified() function returns the following metrics, summarized both for the population as a whole (total column) and for each stratum.

metrics	total	Estrato 1	Estrato 2
population	finite	finite	finite
real_n_par	24	12	12
ideal_n_par	8	5.2	2.8
mean (m³/plot)	107.25	89.08	125.42
variance (m³/plot)	137.91	71.54	261.17
st_deviation (m³/plot)	11.15	8.46	16.16
coeff_variation (%)	10.4	9.49	12.89
variance_of_the_mean (m³/plot)	5.05	5.85	21.02
st_error_of_the_mean (m³/plot)	2.25	2.42	4.58
abs_sampl_error (m³)	5.5	5.92	11.22
rel_sampl_error (%)	5.13	6.64	8.94
mean_confidence_interval (m³)	(101.75, 112.75)	(83.16, 95.0)	(114.2, 136.63)
confidence_interval_total population (m³)	(101752.9, 112747.1)	(54056.81, 61751.53)	(39969.52, 47822.15)
total (m³/ha)	107.25	89.08	125.42
total_population (m³)	107250	57904.17	43895.83
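The stratified variance and the variance of the stratified mean in the total column can be checked from the per-stratum rows. The stratum sizes are not listed in the source; the sketch assumes N_1 = 650 and N_2 = 350 plots, values inferred by dividing each stratum's total_population by its mean (57904.17 / 89.08 and 43895.83 / 125.42, both approximately whole numbers). Under that assumption it also shows that the per-stratum ideal_n_par values (5.2 and 2.8) correspond to the total of 8 split in proportion to the stratum weights.

```python
# Assumption: stratum sizes inferred from the table (total_population / mean),
# not stated in the source.
N_h = [650, 350]          # potential plots per stratum
n_h = [12, 12]            # measured plots per stratum (real_n_par)
s2_h = [71.54, 261.17]    # stratum variances from the table

N = sum(N_h)
W_h = [Nh / N for Nh in N_h]  # stratum weights W_h = N_h / N

# Stratified variance: sum of W_h * s_h^2
s2_st = sum(w * s2 for w, s2 in zip(W_h, s2_h))

# Variance of the stratified mean, including the finite-population term
var_mean_st = sum(w**2 * s2 / n for w, s2, n in zip(W_h, s2_h, n_h)) - s2_st / N
se_st = var_mean_st ** 0.5

# Proportional split of the 8 ideal plots reported in the total column
ideal_split = [8 * w for w in W_h]

print(f"W_h = {W_h}")
print(f"s2_st = {s2_st:.2f}, var_mean_st = {var_mean_st:.2f}, se_st = {se_st:.2f}")
print(f"ideal split = {[round(v, 1) for v in ideal_split]}")
```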

Analysis of variance (ANOVA) generated:

Source of Variation	SS	df	MS	F	F_critical	H₀
Between Strata	8,633.527	1.000	8,633.527	51.898	4.301	Rejected
Within Strata	3,659.833	22.000	166.356
Total	12,293.360	23.000
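The mean squares and the F value in this table follow from its sums of squares, with df_b = L - 1 and df_w = n - L (L = 2 strata, n = 24 plots). The sketch below repeats that arithmetic with the table's values. The critical value is copied from the table; with SciPy installed it could instead be obtained as `scipy.stats.f.ppf(0.95, 1, 22)`.

```python
# Values copied from the ANOVA table above
L, n = 2, 24              # number of strata, total number of plots
ss_between = 8633.527
ss_within = 3659.833
f_critical = 4.301        # 95th percentile of F(1, 22), from the table

df_between = L - 1        # 1
df_within = n - L         # 22
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

reject_h0 = F > f_critical   # H0: all stratum means are equal
print(f"MS_w = {ms_within:.3f}, F = {F:.3f}, H0 rejected: {reject_h0}")
```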

Equations Used

Simple

Sample Intensity

Finite Populations

\[ \text{Ideal number of plots}: \operatorname{n} = \frac{N t^2 S_x^2}{N E^2 + t^2 S_x^2} \]

Infinite Populations

\[ \text{Ideal number of plots}: \operatorname{n} = \frac{t^2 S_x^2}{E^2} \]
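Both sample-intensity formulas translate directly into a small function. This is a sketch, not the library's code: `ideal_n_simple` is a hypothetical name, t is passed in rather than looked up, and E is the tolerated error in the units of the mean, obtained from the percentage limit as E = error_lim / 100 · x̄ (an assumption consistent with the formulas, which combine E with the variance in absolute units). Rounding up to a whole number of plots is left to the caller.

```python
import math
from typing import Optional


def ideal_n_simple(t: float, s2: float, E: float, N: Optional[float] = None) -> float:
    """Ideal number of plots for simple random sampling (sketch).

    t: Student's t value for the chosen confidence level
    s2: sample variance of the plot values
    E: tolerated error in the units of the mean
    N: potential number of plots; None means an infinite population
    """
    if N is None:
        return t**2 * s2 / E**2                      # infinite population
    return N * t**2 * s2 / (N * E**2 + t**2 * s2)    # finite population


mean, error_lim = 20.0, 5.0
E = mean * error_lim / 100   # 5 % of the mean, i.e. 1.0 in plot units

n_inf = ideal_n_simple(t=2.0, s2=25.0, E=E)
n_fin = ideal_n_simple(t=2.0, s2=25.0, E=E, N=100)
print(n_inf, n_fin, math.ceil(n_fin))  # 100.0 50.0 50
```

As N grows, the finite-population result approaches the infinite-population one.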

Statistics

\[ \text{Arithmetic Mean}: \quad \bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} \]

\[ \text{Variance}: \quad s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1} \]

\[ \text{Standard Deviation}: \quad s_x = \sqrt{ \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1} } \]

\[ \text{Variance of the Mean}: \quad s_{\bar{x}}^2 = \frac{s_x^2}{n} \cdot \left( \frac{N - n}{N} \right) \]

\[ \text{Standard Error}: \quad s_{\bar{x}} = \pm \frac{s_x}{\sqrt{n}} \cdot \sqrt{1 - f} \]

\[ \text{Coefficient of Variation}: \quad \operatorname{cv}(\%) = \frac{s_x}{\bar{x}} \cdot 100 \]

\[ \text{Absolute Sampling Error}: \quad E_a = \pm t \cdot s_{\bar{x}} \]

\[ \text{Relative Sampling Error}: \quad E_r = \pm \frac{t \cdot s_{\bar{x}}}{\bar{x}} \cdot 100 \]

\[ \text{Confidence Interval for the Mean}: \quad IC \left[ \bar{x} - (t \cdot s_{\bar{x}}) \leq \bar{X} \leq \bar{x} + (t \cdot s_{\bar{x}}) \right] = P \]

\[ \text{Population Total}: \quad \hat{X} = N \cdot \bar{x} \]

\[ \text{Confidence Interval for the Total}: \quad IC \left[ \hat{X} - N(t \cdot s_{\bar{x}}) \leq X \leq \hat{X} + N(t \cdot s_{\bar{x}}) \right] = P \]
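The statistics above can be collected into one function. The sketch below is a hypothetical `simple_sampling_stats`, not the fptools implementation: it uses the standard `statistics` module, applies the finite-population correction (1 - f) only when N is given, and expects the caller to supply t.

```python
import math
import statistics
from typing import Optional, Sequence


def simple_sampling_stats(x: Sequence[float], t: float,
                          N: Optional[float] = None) -> dict:
    """Simple random sampling statistics following the equations above (sketch).

    x: plot values (e.g. volume per plot)
    t: Student's t value for the desired confidence level
    N: potential number of plots; None skips the finite-population terms
    """
    n = len(x)
    mean = statistics.mean(x)
    s2 = statistics.variance(x)                  # divisor n - 1
    fpc = 1.0 if N is None else (N - n) / N      # (1 - f), with f = n / N
    var_mean = s2 / n * fpc
    se = math.sqrt(var_mean)
    abs_err = t * se
    result = {
        "mean": mean,
        "variance": s2,
        "st_deviation": math.sqrt(s2),
        "coeff_variation": math.sqrt(s2) / mean * 100,
        "variance_of_the_mean": var_mean,
        "st_error_of_the_mean": se,
        "abs_sampl_error": abs_err,
        "rel_sampl_error": abs_err / mean * 100,
        "mean_ci": (mean - abs_err, mean + abs_err),
    }
    if N is not None:
        total = N * mean                         # population total X-hat
        result["total_population"] = total
        result["total_ci"] = (total - N * abs_err, total + N * abs_err)
    return result


print(simple_sampling_stats([1, 2, 3, 4, 5], t=2.0, N=10))
```

With SciPy available, t could be obtained as `scipy.stats.t.ppf(1 - alpha / 2, df)`; the source does not state which degrees of freedom the library uses.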

Stratified

Sample Intensity

Finite Populations

\[ \text{Ideal number of plots}: \operatorname{n}= \frac{t^2 \sum_{h=1}^{L} W_h s_h^2}{E^2 + {t^2} \sum_{h=1}^{L} \frac{W_h s_h^2}{N}} \]

Infinite Populations

\[ \text{Ideal number of plots}: \operatorname{n} = \frac{t^2 \sum_{h=1}^{L} W_h s_h^2}{E^2} \]
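The stratified sample-intensity formulas can be sketched the same way. `ideal_n_stratified` and `proportional_allocation` are hypothetical names; t and the absolute error E are supplied by the caller. Splitting the total among strata as n_h = n · W_h is shown as one common choice; the per-stratum ideal values in the example output (5.2 and 2.8, summing to 8) are consistent with such a proportional split.

```python
from typing import Optional, Sequence


def ideal_n_stratified(W: Sequence[float], s2: Sequence[float], t: float,
                       E: float, N: Optional[float] = None) -> float:
    """Ideal total number of plots for stratified sampling (sketch).

    W: stratum weights W_h (summing to 1)
    s2: stratum variances s_h^2
    E: tolerated error in the units of the mean
    N: potential number of plots; None means an infinite population
    """
    pooled = sum(w * v for w, v in zip(W, s2))   # sum of W_h * s_h^2
    if N is None:
        return t**2 * pooled / E**2
    return t**2 * pooled / (E**2 + t**2 * pooled / N)


def proportional_allocation(n: float, W: Sequence[float]) -> list:
    """Split n plots among strata in proportion to W_h (n_h = n * W_h)."""
    return [n * w for w in W]


n_total = ideal_n_stratified([0.5, 0.5], [16.0, 36.0], t=2.0, E=2.0, N=104)
print(n_total, proportional_allocation(n_total, [0.5, 0.5]))
```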

Statistics

\[ \text{Stratum Mean}: \quad \bar{x}_h = \frac{\sum_{i=1}^{n_h} x_{ih}}{n_h} \]

\[ \text{Stratified Mean}: \quad \bar{x}_{st} = \frac{\sum_{h=1}^{L} \left( N_h \cdot \bar{x}_h \right)}{N} = \sum_{h=1}^{L} \left( W_h \cdot \bar{x}_h \right) \]

\[ \text{Stratum Variance}: \quad s_h^2 = \frac{\sum_{i=1}^{n_h} (x_{ih} - \bar{x}_h)^2}{n_h - 1} \]

\[ \text{Stratified Variance}: \quad s_{st}^2 = \sum_{h=1}^{L} \left( W_h s_h^2 \right) \]

\[ \text{Variance of the Stratified Mean}: \quad s_{\bar{x}(st)}^2 = \sum_{h=1}^{L} W_h^2 \cdot \frac{s_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h s_h^2}{N} \]

\[ \text{Stratified Standard Error}: \quad s_{\bar{x}(st)} = \sqrt{ \sum_{h=1}^{L} W_h^2 \cdot \frac{s_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h s_h^2}{N} } \]

\[ \text{Absolute Sampling Error}: \quad E_a = \pm t \cdot s_{\bar{x}(st)} \]

\[ \text{Relative Sampling Error}: \quad E_r = \pm \frac{t \cdot s_{\bar{x}(st)}}{\bar{x}_{st}} \cdot 100 \]

\[ \text{Confidence Interval for the Stratified Mean}: \quad IC \left[ \bar{x}_{st} - (t \cdot s_{\bar{x}(st)}) \leq \bar{X} \leq \bar{x}_{st} + (t \cdot s_{\bar{x}(st)}) \right] = P \]

\[ \text{Stratum Total}: \quad \hat{X}_h = N_h \cdot \bar{x}_h \]

\[ \text{Population Total}: \quad \hat{X} = \sum_{h=1}^{L} \hat{X}_h = N \cdot \bar{x}_{st} \]

\[ \text{Confidence Interval for the Total}: \quad IC\left[ \hat{X} - N(t \cdot s_{\bar{x}(st)}) \leq X \leq \hat{X} + N(t \cdot s_{\bar{x}(st)}) \right] = P \]
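The stratified estimators can also be sketched as one function. `stratified_stats` is hypothetical, not the fptools implementation: it takes the plot values per stratum, the potential number of plots per stratum (N_h) and a caller-supplied t, and applies the formulas above. The small example also checks that the stratum totals add up to the population total.

```python
import math
import statistics
from typing import Dict, Sequence


def stratified_stats(strata: Dict[str, Sequence[float]],
                     N_h: Dict[str, float], t: float) -> dict:
    """Stratified sampling estimates following the equations above (sketch)."""
    N = sum(N_h[h] for h in strata)              # total potential plots
    W = {h: N_h[h] / N for h in strata}          # weights W_h = N_h / N
    n_h = {h: len(x) for h, x in strata.items()}
    mean_h = {h: statistics.mean(x) for h, x in strata.items()}
    s2_h = {h: statistics.variance(x) for h, x in strata.items()}

    mean_st = sum(W[h] * mean_h[h] for h in strata)
    s2_st = sum(W[h] * s2_h[h] for h in strata)
    var_mean_st = (sum(W[h] ** 2 * s2_h[h] / n_h[h] for h in strata)
                   - s2_st / N)
    se_st = math.sqrt(var_mean_st)
    abs_err = t * se_st
    total = N * mean_st
    return {
        "mean_st": mean_st,
        "variance_st": s2_st,
        "variance_of_the_mean_st": var_mean_st,
        "st_error_st": se_st,
        "abs_sampl_error": abs_err,
        "rel_sampl_error": abs_err / mean_st * 100,
        "mean_ci": (mean_st - abs_err, mean_st + abs_err),
        "stratum_totals": {h: N_h[h] * mean_h[h] for h in strata},
        "total_population": total,
        "total_ci": (total - N * abs_err, total + N * abs_err),
    }


r = stratified_stats({"A": [1, 2, 3], "B": [4, 5, 6]}, {"A": 30, "B": 10}, t=2.0)
print(r["mean_st"], r["stratum_totals"], r["total_population"])
```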

Analysis of Variance

\[ \text{Sum of Squares Between Strata}: \operatorname{SS}_b = \sum_{h=1}^{L} n_h \left( \bar{x}_h - \bar{x} \right)^2 \]

\[ \text{Sum of Squares Within Strata}: \operatorname{SS}_w = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \left( x_{ih} - \bar{x}_h \right)^2 \]

\[ \text{Total Sum of Squares}: \operatorname{SS}_t = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \left( x_{ih} - \bar{x} \right)^2 \]

\[ \text{Mean Square Between Strata}: \operatorname{MS}_b = \frac{\operatorname{SS}_b}{\operatorname{df}_b} \]

\[ \text{Mean Square Within Strata}: \operatorname{MS}_w = \frac{\operatorname{SS}_w}{\operatorname{df}_w} \]

\[ \text{Calculated F-Value}: \operatorname{F} = \frac{\operatorname{MS}_b}{\operatorname{MS}_w} \]
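The ANOVA formulas can be sketched as a small function over the raw plot values of each stratum. `stratum_anova` is a hypothetical name. The equations do not say whether x̄ is the plain mean of all plots or the weighted stratified mean; this sketch assumes the plain mean. The degrees of freedom df_b = L - 1 and df_w = n - L are the standard ones and match the example table (1 and 22).

```python
from typing import Sequence


def stratum_anova(strata: Sequence[Sequence[float]]) -> dict:
    """One-way ANOVA between strata (sketch; x-bar = plain mean of all plots)."""
    all_x = [v for x in strata for v in x]
    n, L = len(all_x), len(strata)
    grand_mean = sum(all_x) / n
    means = [sum(x) / len(x) for x in strata]

    ss_b = sum(len(x) * (m - grand_mean) ** 2 for x, m in zip(strata, means))
    ss_w = sum((v - m) ** 2 for x, m in zip(strata, means) for v in x)
    ss_t = sum((v - grand_mean) ** 2 for v in all_x)

    df_b, df_w = L - 1, n - L
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SS_b": ss_b, "SS_w": ss_w, "SS_t": ss_t,
            "df_b": df_b, "df_w": df_w,
            "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w}


a = stratum_anova([[1, 2, 3], [4, 5, 6]])
print(a)
```

With this choice of x̄, SS_t = SS_b + SS_w holds exactly.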

Notation

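\( N \): Total number of potential sampling units (plots) in the population
\( N_h \): Number of potential sampling units in stratum \( h \)
\( n \): Number of sampled (measured) units
\( n_h \): Number of sampled (measured) units in stratum \( h \)
\( L \): Number of strata
\( f \): Sampling fraction, \( f = n/N \)
\( t \): Value of Student's t-distribution for the chosen confidence level
\( P \): Confidence level (probability) associated with the interval
\( X_i \): Value (volume) of the \( i \)-th plot
\( s_x^2 \): Sample variance
\( s_h^2 \): Variance of stratum \( h \)
\( s_{\bar{x}(st)} \): Standard error of the stratified mean
\( W_h \): Proportion of stratum \( h \) in the population, \( W_h = N_h/N \)
\( E \): Tolerated error in the units of the mean, i.e., the error limit (%) applied to the mean (for a 10% limit, \( E = 0.10 \cdot \bar{x} \))
\( \bar{x} \): Sample mean
\( \bar{x}_h \): Sample mean of stratum \( h \)
\( x_{ih} \): Volume of the \( i \)-th plot within stratum \( h \)
\( \operatorname{df}_b = L - 1 \), \( \operatorname{df}_w = n - L \): Degrees of freedom between and within strata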

References

SANQUETTA, C. R.; CORTE, A. P. D.; RODRIGUES, A. L.; WATZLAWICK, L. F. (2014). Inventários florestais: planejamento e execução. Curitiba: Multi-Graphic, 406 p.