Evaluating convergence is important not only to describe the dynamics of EU Member States but also to support policy makers.

The R package convergEU is a suite of functions to download, clean and analyze data in order to study several features of convergence.

This document describes the convergEU package and illustrates its main functionalities.

1 Datasets on EU member states

Two types of sources are considered: data produced by Eurofound, available without an active Internet connection, and Eurostat data, which can be downloaded on the fly through this package when needed.

1.1 Locally accessible datasets

Some datasets are accessible from the convergEU package through the R function data(), for example:

data("emp_20_64_MS",package = "convergEU")
head(emp_20_64_MS)

Eurofound datasets are locally available within the convergEU package; all the datasets shipped with the package can be listed with:

data(package = "convergEU")

A description of each dataset is available through the R help, for example:

help(emp_20_64_MS)

Eurofound local data are considered below:

data(dbEurofound)
head(dbEurofound)
#> # A tibble: 6 × 17
#>    time geo   geo_label sex     lifesatisf health goodhealth_p trustlocal volunt
#>   <dbl> <chr> <chr>     <chr>        <dbl>  <dbl>        <dbl>      <dbl>  <dbl>
#> 1  1960 AD    Andorra   Females         NA     NA           NA         NA     NA
#> 2  1960 AD    Andorra   Males           NA     NA           NA         NA     NA
#> 3  1960 AD    Andorra   Total           NA     NA           NA         NA     NA
#> 4  1960 AL    Albania   Females         NA     NA           NA         NA     NA
#> 5  1960 AL    Albania   Males           NA     NA           NA         NA     NA
#> 6  1960 AL    Albania   Total           NA     NA           NA         NA     NA
#> # ℹ 8 more variables: volunt_p <dbl>, caring_h <dbl>, socialexc_i <dbl>,
#> #   JQIskill_i <dbl>, JQIenviron_i <dbl>, JQIintensity_i <dbl>,
#> #   JQItime_i <dbl>, exposdiscr_p <dbl>

where variable names are:

names(dbEurofound)
#>  [1] "time"           "geo"            "geo_label"      "sex"           
#>  [5] "lifesatisf"     "health"         "goodhealth_p"   "trustlocal"    
#>  [9] "volunt"         "volunt_p"       "caring_h"       "socialexc_i"   
#> [13] "JQIskill_i"     "JQIenviron_i"   "JQIintensity_i" "JQItime_i"     
#> [17] "exposdiscr_p"

and time spans the interval:

c(min(dbEurofound$time), max(dbEurofound$time))
#> [1] 1960 2017

although the dataset is not complete over this time range for all the considered countries.

Further details on the Eurofound dataset (metainformation) are available as follows:

data(dbEUF2018meta)
print(dbEUF2018meta,n=20,width=100)
#> # A tibble: 13 × 10
#>    DIMENSION          SUBDIMENSION      INDICATOR Code_in_database Official_code
#>    <chr>              <chr>             <chr>     <chr>            <chr>        
#>  1 Quality of life    Life satisfaction Mean lif… lifesatisf       y16_q4       
#>  2 Quality of life    Health            Mean hea… health           y16_q48      
#>  3 Quality of life    Health            Percenta… goodhealth_p     y16_q48      
#>  4 Quality of life    Quality of socie… Mean lev… trustlocal       y16_q35f     
#>  5 Quality of life    Quality of socie… Level of… volunt           y16_q29a     
#>  6 Quality of life    Quality of socie… Percenta… volunt_p         y16_q29a     
#>  7 Quality of life    Quality of socie… Hours pe… caring_h         y16_q43a     
#>  8 Quality of life    Quality of socie… Social E… socialexc_i      y16_socexind…
#>  9 Working conditions Working conditio… JQI_Skil… JQIskill_i       wq_slim -  J…
#> 10 Working conditions Working conditio… JQI_Phys… JQIenviron_i     envsec_slim …
#> 11 Working conditions Working conditio… JQI_Inte… JQIintensity_i   intens_slim …
#> 12 Working conditions Working conditio… JQI_Work… JQItime_i        wlb_slim - J…
#> 13 Working conditions Working conditio… Expositi… exposdiscr_p     disc_d -  Ha…
#>    Unit  Source_organisation Source_reference Disaggregation Bookmark_URL       
#>    <chr> <chr>               <chr>            <chr>          <chr>              
#>  1 --    Eurofound           EQLS             sex            https://www.eurofo…
#>  2 --    Eurofound           EQLS             sex            https://www.eurofo…
#>  3 %     Eurofound           EQLS             sex            https://www.eurofo…
#>  4 --    Eurofound           EQLS             sex            https://www.eurofo…
#>  5 --    Eurofound           EQLS             sex            https://www.eurofo…
#>  6 %     Eurofound           EQLS             sex            https://www.eurofo…
#>  7 hours Eurofound           EQLS             sex            https://www.eurofo…
#>  8 index Eurofound           EQLS             sex            https://www.eurofo…
#>  9 index Eurofound           EWCS             sex            https://www.eurofo…
#> 10 index Eurofound           EWCS             sex            https://www.eurofo…
#> 11 index Eurofound           EWCS             sex            https://www.eurofo…
#> 12 index Eurofound           EWCS             sex            https://www.eurofo…
#> 13 %     Eurofound           EWCS             sex            https://www.eurofo…

NOTE: within the convergEU package, Eurofound data are stored statically. Please update the package to obtain the most recent version of Eurofound data.

The first step of an analysis is data preparation. This amounts to choosing a time interval, an indicator and a set of countries (MS, Member States), for example the EU12 Member States:

convergEU_glb()$EU12$memberStates$codeMS
#>  [1] "BE" "DK" "FR" "DE" "EL" "IE" "IT" "LU" "NL" "PT" "ES" "UK"

then the indicator “lifesatisf” is selected from the column “Code_in_database” of dbEUF2018meta, here for females:

myTB <- extract_indicator_EUF(
    indicator_code = "lifesatisf", #Code_in_database
    fromTime=2003,
    toTime=2016,
    gender= c("Total","Females","Males")[2],
    countries= convergEU_glb()$EU12$memberStates$codeMS
    )
  
myTB
#> $res
#> # A tibble: 4 × 14
#>    time sex       BE    DE    DK    EL    ES    FR    IE    IT    LU    NL    PT
#>   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Femal…  7.38  7.48  8.37  6.75  7.43  6.97  7.89  7.12  7.77  7.54  5.86
#> 2  2007 Femal…  7.49  7.17  8.51  6.55  7.14  7.34  7.66  6.54  7.87  7.90  6.09
#> 3  2011 Femal…  7.47  7.27  8.30  6.13  7.51  7.17  7.41  6.86  7.72  7.74  6.72
#> 4  2016 Femal…  7.27  7.28  8.33  5.30  6.97  7.24  7.66  6.56  7.96  7.74  6.79
#> # ℹ 1 more variable: UK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

which results in a complete dataset ready for further analysis. IMPORTANT: the analysis of convergence is performed on clean and imputed data, i.e. a tidy dataset in the format years by countries. This means that the dataset must always be a rectangular table of times by countries, with quantitative variables and no missing values (the features verified by check_data(), see Section 1.2).

If missing values are present, then imputation is required, as described in the next sections.

Another illustrative example follows.

 
names(convergEU_glb())
#>  [1] "EUcodes"         "EA"              "EA19"            "EU12"           
#>  [5] "EU15"            "EU25"            "EU27_2007"       "EU27_2019"      
#>  [9] "EU27_2020"       "EU27"            "EU28"            "geoRefEUF"      
#> [13] "metaEUStat"      "tmpl_out"        "paralintags"     "rounDigits"     
#> [17] "epsilonV"        "scoreBoaTB"      "labels_clusters"
myTB <- extract_indicator_EUF(
    indicator_code = "JQIintensity_i", #Code_in_database
    fromTime= 1965,
    toTime=2016,
    gender= c("Total","Females","Males")[1],
    countries= convergEU_glb()$EU27_2020$memberStates$codeMS
    )
  
print(myTB$res,n=35,width=250)
#> # A tibble: 5 × 29
#>    time sex      AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1995 Total  48.8  33.2  NA    NA    NA    40.8  39.0  NA    40.9  34.2  47.1
#> 2  2000 Total  42.9  37.3  43.7  53.4  42.6  40.9  37.6  38.2  43.5  36.2  46.7
#> 3  2005 Total  47.6  42.8  33.8  50.7  45.8  46.9  47.9  41.7  50.5  41.2  49.6
#> 4  2010 Total  42.1  40.2  31.2  52.5  41.9  44.9  39.1  41.9  48.6  38.0  45.9
#> 5  2015 Total  42.4  41.5  34.6  57.2  36.7  40.2  45.0  38.7  49.3  46.5  41.1
#>      FR    HR    HU    IE    IT    LT    LU    LV    MT    NL    PL    PT    RO
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  38.4  NA    NA    39.0  34.1  NA    31.4  NA    NA    41.8  NA    36.2  NA  
#> 2  39.5  NA    42.1  42.2  39.7  29.9  37.6  32.8  50.0  41.3  40.9  31.8  47.8
#> 3  40.5  31.6  47.2  36.9  41.9  37.3  40.6  34.3  48.4  40.3  35.7  40.1  45.9
#> 4  43.0  39.5  48.7  47.0  40.8  33.2  40.8  31.9  44.0  38.5  31.4  31.6  43.3
#> 5  42.7  38.4  44.7  42.8  38.1  37.8  42.4  31.5  44.8  38.7  35.0  36.8  54.2
#>      SE    SI    SK
#>   <dbl> <dbl> <dbl>
#> 1  43.3  NA    NA  
#> 2  47.9  29.5  41.6
#> 3  48.1  49.2  39.6
#> 4  45.9  48.2  37.6
#> 5  46.1  43.0  35.9

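Several countries have missing values at the beginning of the series, so imputation must take place before any analysis. Here both tailMiss and headMiss are set to "constant"; in the output below, for instance, the missing 1995 value of BG is replaced by its 2000 value, 43.7, and the missing 1995 and 2000 values of HR by its 2005 value, 31.6: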

myTBinp <- impute_dataset(myTB$res, timeName = "time",
                          countries=convergEU_glb()$EU27_2020$memberStates$codeMS,
                          tailMiss = c("cut", "constant")[2],
                          headMiss = c("cut", "constant")[2]) 
print(myTBinp$res,n=35,width=250)
#> # A tibble: 5 × 29
#>    time sex      AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1995 Total  48.8  33.2  43.7  53.4  42.6  40.8  39.0  38.2  40.9  34.2  47.1
#> 2  2000 Total  42.9  37.3  43.7  53.4  42.6  40.9  37.6  38.2  43.5  36.2  46.7
#> 3  2005 Total  47.6  42.8  33.8  50.7  45.8  46.9  47.9  41.7  50.5  41.2  49.6
#> 4  2010 Total  42.1  40.2  31.2  52.5  41.9  44.9  39.1  41.9  48.6  38.0  45.9
#> 5  2015 Total  42.4  41.5  34.6  57.2  36.7  40.2  45.0  38.7  49.3  46.5  41.1
#>      FR    HR    HU    IE    IT    LT    LU    LV    MT    NL    PL    PT    RO
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  38.4  31.6  42.1  39.0  34.1  29.9  31.4  32.8  50.0  41.8  40.9  36.2  47.8
#> 2  39.5  31.6  42.1  42.2  39.7  29.9  37.6  32.8  50.0  41.3  40.9  31.8  47.8
#> 3  40.5  31.6  47.2  36.9  41.9  37.3  40.6  34.3  48.4  40.3  35.7  40.1  45.9
#> 4  43.0  39.5  48.7  47.0  40.8  33.2  40.8  31.9  44.0  38.5  31.4  31.6  43.3
#> 5  42.7  38.4  44.7  42.8  38.1  37.8  42.4  31.5  44.8  38.7  35.0  36.8  54.2
#>      SE    SI    SK
#>   <dbl> <dbl> <dbl>
#> 1  43.3  29.5  41.6
#> 2  47.9  29.5  41.6
#> 3  48.1  49.2  39.6
#> 4  45.9  48.2  37.6
#> 5  46.1  43.0  35.9

1.2 Metaresults and missing values check

Several functions in the convergEU package return a list with metainformation made of three components: res, msg and err. The first component, res, is the actual result, if computed. The second component, msg, is a message decorating the computed result, possibly a warning. The third component, err, is an error message or a list of errors when a result is not computed. This behavior is illustrated below for the function check_data().

The standard dataset is a rectangular table of times by countries, and all variables are quantitative. The following function checks these features:

check_data(emp_20_64_MS)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

where the list component res is TRUE, that is, all checks are passed.

If a variable is qualitative or some data are missing, the checks fail; for example, if time is qualitative:

# turn time into a qualitative variable (factor)
tmp <- dplyr::mutate(emp_20_64_MS, time = factor(time))
check_data(tmp)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: qualitative variables  in the dataframe."

the err component explains what went wrong.

Similar errors are signaled if the dataset is not complete:

tmp <-  emp_20_64_MS 
tmp[3:6,1]<- NA
check_data(tmp)
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."
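The res/msg/err convention and the two checks shown above can be mimicked in a few lines of base R. The function check_tidy() below is hypothetical, written only for illustration: it is not part of convergEU, its error strings merely imitate those printed above, and the toy tables contain made-up numbers.

```r
# Hypothetical helper, NOT part of convergEU: it mimics the res/msg/err
# convention and the two checks illustrated above.
check_tidy <- function(df) {
  out <- list(res = NULL, msg = NULL, err = NULL)
  isQuant <- vapply(df, is.numeric, logical(1))   # one flag per column
  if (!all(isQuant)) {
    out$err <- "Error: qualitative variables in the dataframe."
  } else if (any(is.na(df))) {
    out$err <- "Error: one or more missing values in the dataframe."
  } else {
    out$res <- TRUE                               # all checks passed
  }
  out
}

# toy time-by-countries tables (illustrative numbers only)
okTB  <- data.frame(time = 2000:2002, AT = c(70.1, 70.5, 71.0),
                    BE = c(65.2, 65.6, 66.0))
naTB  <- okTB;  naTB$BE[2]  <- NA                 # one missing value
facTB <- okTB;  facTB$time  <- factor(facTB$time) # qualitative time

check_tidy(okTB)$res    # TRUE
check_tidy(naTB)$err
check_tidy(facTB)$err
```

Returning a list instead of stopping with an error lets a caller inspect err and decide how to proceed, which is the design the package functions follow.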

1.3 Imputation for artificially generated missing values in the Eurofound database

Let’s consider the following indicator from the Eurofound database:


myTB <- extract_indicator_EUF(
    indicator_code = "exposdiscr_p", #Code_in_database
    fromTime=1966,
    toTime=2016,
    gender= c("Total","Females","Males")[1],
    countries= convergEU_glb()$EU12$memberStates$codeMS
    )

where no missing values are present:

sapply(myTB$res,function(vx)sum(is.na(vx)))
#> time  sex   BE   DE   DK   EL   ES   FR   IE   IT   LU   NL   PT   UK 
#>    0    0    0    0    0    0    0    0    0    0    0    0    0    0

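Thus, for testing purposes, an artificial dataset is built: the observed years are stacked three times and relabelled from 1975 to 2015, the first six years are perturbed with uniform noise, and some missing values are introduced: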

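set.seed(1999)
# stack the observed years three times and relabel them 1975, 1980, ..., 2015
myTB2 <- dplyr::bind_rows(myTB$res, myTB$res, myTB$res)
myTB2 <- dplyr::mutate(myTB2, time = seq(1975, 2015, 5))
# perturb the first six years of each country (columns 3 to 14)
for(aux in 3:14){
  myTB2[[aux]] <- myTB2[[aux]] + c(runif(6, -2.5, 2.5), 0, 0, 0)
}

# introduce missing values at the start, at the end and in the middle
myTB2[["BE"]][1:2] <- NA
myTB2[["DE"]][8:9] <- NA
myTB2[["IT"]][c(3, 4, 6, 7, 8)] <- NA
myTB2[["DK"]][6] <- NA
myTB2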
#> # A tibble: 9 × 14
#>    time sex      BE    DE    DK    EL    ES    FR    IE    IT    LU    NL    PT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1975 Total NA     6.68  4.62  9.77 0.134  6.25  8.94  4.20 10.8   8.56  2.97
#> 2  1980 Total NA     8.71  3.97 10.6  6.90  11.6   3.16  5.23 11.2   5.43  5.11
#> 3  1985 Total  9.57  6.35  3.49  7.86 4.67  10.8   6.37 NA    11.4  12.0   5.74
#> 4  1990 Total  5.68  6.29  4.61  9.55 3.97   6.09  6.99 NA     8.09  9.46  3.88
#> 5  1995 Total 13.0   8.74  3.06  9.36 3.97  11.9   5.60  2.68 10.7   5.04  1.56
#> 6  2000 Total  9.75  7.63 NA     6.51 5.85  11.4   8.39 NA    15.6  11.7   5.61
#> 7  2005 Total  6.14  4.53  5.66  7.89 2.13   5.08  6.75 NA     8.86  8.51  4.97
#> 8  2010 Total 11.0  NA     4.93  8.32 4.47  10.6   5.32 NA    10.9   6.02  3.88
#> 9  2015 Total  9.65 NA     5.40  7.88 4.88  11.2   6.83  6.75 13.6  12.2   3.61
#> # ℹ 1 more variable: UK <dbl>

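Now an imputation function may be called to prepare data for calculations on convergence. The two calls below differ only in the option tailMiss, and their outputs differ in how missing starting values are handled: with "cut" the years 1975 and 1980, where BE is missing, are dropped, while with "constant" they are kept and the first observed BE value (9.57) is carried backwards. In both cases the years 2010 and 2015, where DE is missing, are dropped.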

toBeProcessed <- c( "IT","BE", "DE", "DK","UK")

impute_dataset(myTB2, countries=toBeProcessed,
                            timeName = "time",
                            tailMiss = c("cut", "constant")[1],
                            headMiss = c("cut", "constant")[1]) 
#> $res
#> # A tibble: 5 × 14
#>    time sex      BE    DE    DK    EL    ES    FR    IE    IT    LU    NL    PT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1985 Total  9.57  6.35  3.49  7.86  4.67 10.8   6.37  4.38 11.4  12.0   5.74
#> 2  1990 Total  5.68  6.29  4.61  9.55  3.97  6.09  6.99  3.53  8.09  9.46  3.88
#> 3  1995 Total 13.0   8.74  3.06  9.36  3.97 11.9   5.60  2.68 10.7   5.04  1.56
#> 4  2000 Total  9.75  7.63  4.36  6.51  5.85 11.4   8.39  3.70 15.6  11.7   5.61
#> 5  2005 Total  6.14  4.53  5.66  7.89  2.13  5.08  6.75  4.71  8.86  8.51  4.97
#> # ℹ 1 more variable: UK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

impute_dataset(myTB2, countries=toBeProcessed,
                            timeName = "time",
                            tailMiss = c("cut", "constant")[2],
                            headMiss = c("cut", "constant")[1]) 
#> $res
#> # A tibble: 7 × 14
#>    time sex      BE    DE    DK    EL    ES    FR    IE    IT    LU    NL    PT
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  1975 Total  9.57  6.68  4.62  9.77 0.134  6.25  8.94  4.20 10.8   8.56  2.97
#> 2  1980 Total  9.57  8.71  3.97 10.6  6.90  11.6   3.16  5.23 11.2   5.43  5.11
#> 3  1985 Total  9.57  6.35  3.49  7.86 4.67  10.8   6.37  4.38 11.4  12.0   5.74
#> 4  1990 Total  5.68  6.29  4.61  9.55 3.97   6.09  6.99  3.53  8.09  9.46  3.88
#> 5  1995 Total 13.0   8.74  3.06  9.36 3.97  11.9   5.60  2.68 10.7   5.04  1.56
#> 6  2000 Total  9.75  7.63  4.36  6.51 5.85  11.4   8.39  3.70 15.6  11.7   5.61
#> 7  2005 Total  6.14  4.53  5.66  7.89 2.13   5.08  6.75  4.71  8.86  8.51  4.97
#> # ℹ 1 more variable: UK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

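These calculations have passed numerical tests and comparisons. If a country is processed but has no missing values, its values are left unchanged. Interior gaps are filled by interpolating linearly over time: for example, the missing DK value in 2000 becomes \((3.06 + 5.66)/2 = 4.36\), and the missing IT values in 1985 and 1990 become \(4.38\) and \(3.53\), one third and two thirds of the way from \(5.23\) (1980) to \(2.68\) (1995).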
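The imputation rules visible in the outputs above can be sketched in base R with stats::approx(): linear interpolation for interior gaps and, optionally, the nearest observed value for leading and trailing gaps. This is a minimal illustration under those assumptions, not the code of impute_dataset(); impute_series() is a hypothetical helper, and the input vectors are the rounded DK, BE and IT values printed for myTB2.

```r
# Minimal sketch, NOT the code of impute_dataset(): impute one country series.
impute_series <- function(time, y, constant_ends = TRUE) {
  ok <- !is.na(y)
  # rule = 2: outside the observed range, return the nearest observed value;
  # rule = 1: return NA there (those years would then have to be cut)
  stats::approx(x = time[ok], y = y[ok], xout = time,
                method = "linear", rule = if (constant_ends) 2 else 1)$y
}

years <- seq(1975, 2015, 5)
# rounded values printed for myTB2 above
DK <- c(4.62, 3.97, 3.49, 4.61, 3.06, NA, 5.66, 4.93, 5.40)
BE <- c(NA, NA, 9.57, 5.68, 13.0, 9.75, 6.14, 11.0, 9.65)
IT <- c(4.20, 5.23, NA, NA, 2.68, NA, NA, NA, 6.75)

impute_series(years, DK)[6]                           # DK in 2000
impute_series(years, BE)[1:2]                         # BE in 1975 and 1980
impute_series(years, IT)[3:4]                         # IT in 1985 and 1990
impute_series(years, BE, constant_ends = FALSE)[1:2]  # NA NA
```

With constant_ends = FALSE the leading and trailing gaps stay NA, and the corresponding years would then have to be dropped, as impute_dataset() does with the option "cut".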

2 On Convergence

Several measures of convergence have recently been proposed by Eurofound (Eurofound (2018), Upward convergence in the EU: Concepts, measurements and indicators, Publications Office of the European Union, Luxembourg; authors: Massimiliano Mascherini, Martina Bisello, Hans Dubois and Franz Eiffe).

In this section each measure is illustrated by one or more examples.

2.1 Beta-convergence

Let us assume a dataset (tibble) of values in the format sorted times by countries. The calculations are performed according to the following linear model: \[ \ln(y_{m,i,t+\tau})-\ln(y_{m,i,t}) = \beta_0 + \beta_1 \ln(y_{m,i,t}) +\epsilon_{m,i,t} \] where \(m\) represents the member state of the EU (country), \(i\) refers to an indicator of interest, \(t\) is the reference time and \(\tau \in \{1,2,\ldots\}\) is the length of the time window (typically \(1\) or more years).

In the simplest case, just two times are considered, \(t\) and \(t+\tau\), while in a more general setup all observed times in the set \(\{t,t+1,\ldots,t+\tau-1, t+\tau\}\) are included in the regression.

In this more general case, the current implementation of beta-convergence always keeps the same reference time \(t\) across the different years and, as an option, divides the left-hand side by the elapsed time; that is, the alternative formula \[ \tau^{-1}(\ln(y_{m,i,t+\tau})-\ln(y_{m,i,t})) = \beta_0 + \beta_1 \ln(y_{m,i,t}) +\epsilon_{m,i,t} \] is available.
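The two-time version of the model can be reproduced with base R's lm(), as a sketch assuming the division by \(\tau\) in the alternative formula above; beta_two_times() is a hypothetical helper, not the package's beta_conv(). On the testTB values used in this section, with time_0 = 2002 and time_t = 2004 (so \(\tau = 2\)), it should agree with the all_within = FALSE output printed further on: for countryA, for instance, \((\ln 1.2 - \ln 0.9)/2 \approx 0.144\), and \(\beta_1 \approx -0.0948\).

```r
# Minimal sketch (hypothetical helper, not convergEU::beta_conv()) of the
# two-time beta-convergence regression:
#   (ln y_{t+tau} - ln y_t) / tau = beta0 + beta1 * ln y_t + error
testTB <- data.frame(
  time     = 2000:2005,
  countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
  countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
  countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0)
)

beta_two_times <- function(tb, time_0, time_t, timeName = "time") {
  countryCols <- names(tb) != timeName
  y0  <- unlist(tb[tb[[timeName]] == time_0, countryCols])  # values at time_0
  yt  <- unlist(tb[tb[[timeName]] == time_t, countryCols])  # values at time_t
  tau <- time_t - time_0                                    # elapsed time
  workTB <- data.frame(deltaIndic = (log(yt) - log(y0)) / tau,
                       indic      = log(y0),
                       countries  = names(tb)[countryCols])
  fit <- stats::lm(deltaIndic ~ indic, data = workTB)
  list(workTB = workTB, beta1 = unname(stats::coef(fit)["indic"]))
}

sketch <- beta_two_times(testTB, time_0 = 2002, time_t = 2004)
sketch$workTB
sketch$beta1
```

A negative \(\beta_1\) means that countries starting from lower values grew faster over the window, which is the usual reading of beta-convergence.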

The output of beta_conv() is a list reporting the transformed data, the fitted model, the point estimate of \(\beta_1\) with a standard two-tailed test (p-value), and the adjusted R squared. A one-tailed test of \(H_0: \beta_1 \geq 0\) against \(H_1: \beta_1 < 0\) might be of some interest, but it is not implemented.
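Since the one-tailed test is not implemented, it can be obtained by hand from the t statistic of an ordinary lm() fit. The sketch below is an assumption-labelled illustration, not package code: it rebuilds the two-time regression on the testTB values for 2002 and 2004 and takes the lower-tail probability of the t distribution with the residual degrees of freedom.

```r
# Sketch of a one-tailed test H0: beta1 >= 0 vs H1: beta1 < 0, which
# beta_conv() does not provide; it reuses the t statistic of lm().
y0  <- c(countryA = 0.9, countryB = 2.9, countryC = 4.1)  # testTB values in 2002
yt  <- c(countryA = 1.2, countryB = 3.1, countryC = 4.1)  # testTB values in 2004
tau <- 2004 - 2002
indic      <- log(y0)
deltaIndic <- (log(yt) - log(y0)) / tau
fit <- stats::lm(deltaIndic ~ indic)

coefTab <- summary(fit)$coefficients
t_beta1 <- coefTab["indic", "t value"]
p_two   <- coefTab["indic", "Pr(>|t|)"]
# lower-tail probability of the t distribution with the residual df
p_one   <- stats::pt(t_beta1, df = fit$df.residual)
c(t = t_beta1, p_two_sided = p_two, p_one_sided = p_one)
```

When the estimated \(\beta_1\) is negative, as here, the one-tailed p-value is half the two-tailed one (about 0.00116 instead of 0.00232); with only three countries and one residual degree of freedom, such values should be read with great caution.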

An example of how to invoke the function follows:

library(tibble)  # provides tribble(), used below to build a small test dataset

testTB <- tribble(
  ~time, ~countryA ,  ~countryB,  ~countryC,
    2000,     0.8,   2.7,    3.9,
    2001,     1.2,   3.2,    4.2,
    2002,     0.9,   2.9,    4.1,
    2003,     1.3,   2.9,    4.0,
    2004,     1.2,   3.1,    4.1,
    2005,     1.2,   3.0,    4.0
  )
 
res <- beta_conv(tavDes = testTB, time_0 = 2002, time_t = 2004, 
                 all_within = TRUE, 
                 timeName = "time")
res
#> $res
#> $res$workTB
#> # A tibble: 6 × 3
#>   deltaIndic  indic countries
#>        <dbl>  <dbl> <chr>    
#> 1     0.184  -0.105 countryA 
#> 2     0       1.06  countryB 
#> 3    -0.0123  1.41  countryC 
#> 4     0.144  -0.105 countryA 
#> 5     0.0333  1.06  countryB 
#> 6     0       1.41  countryC 
#> 
#> $res$model
#> 
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#> 
#> Coefficients:
#> (Intercept)        indic  
#>      0.1495      -0.1156  
#> 
#> 
#> $res$summary
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)    0.149    0.0134     11.1  0.000371
#> 2 indic         -0.116    0.0131     -8.80 0.000920
#> 
#> $res$beta1
#> [1] -0.1156032
#> 
#> $res$adj.r.squared
#> [1] 0.9386146
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note, however, that this is not common practice, which considers only the first and last times.

In order to consider just two times, the starting and ending times, the option all_within = FALSE must be specified:

res <- beta_conv(tavDes = testTB, time_0 = 2002, time_t = 2004, 
                 all_within = FALSE, 
                 timeName = "time")
res
#> $res
#> $res$workTB
#> # A tibble: 3 × 3
#>   deltaIndic  indic countries
#>        <dbl>  <dbl> <chr>    
#> 1     0.144  -0.105 countryA 
#> 2     0.0333  1.06  countryB 
#> 3     0       1.41  countryC 
#> 
#> $res$model
#> 
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#> 
#> Coefficients:
#> (Intercept)        indic  
#>     0.13393     -0.09475  
#> 
#> 
#> $res$summary
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.134   0.000353      380. 0.00168
#> 2 indic        -0.0948  0.000345     -275. 0.00232
#> 
#> $res$beta1
#> [1] -0.09475194
#> 
#> $res$adj.r.squared
#> [1] 0.9999735
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that all_within = FALSE is the default.

2.2 Sigma-convergence

The key concept in sigma-convergence is variability with respect to the mean. Let \(Y_{m,i,t}\) be the value of indicator \(i\) for member state \(m\) at time \(t\), and \(\overline{Y}_{A,i,t}\) the average over an aggregation \(A\) of member states, for example the aggregation EU27_2020; then:

the average is \(\overline{Y}_{A,i,t} = n(A)^{-1}\sum_{m \in A} Y_{m,i,t}\), where \(n(A)\) is the number of member states within aggregation \(A\);
the standard deviation is \(s_{A,i,t} = \sqrt{n(A)^{-1} \sum_{m\in A} (Y_{m,i,t} - \overline{Y}_{A,i,t})^2}\);
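the coefficient of variation is \(CV(A,i,t) = 100\cdot \frac{s_{A,i,t}}{\overline{Y}_{A,i,t}}\) (note that the outputs of sigma_conv() below report the ratio \(s_{A,i,t}/\overline{Y}_{A,i,t}\) without the factor \(100\); for instance, the printed CV in 2000 is 0.517, not 51.7).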

For each year, the above summaries are calculated to quantify whether a reduction in heterogeneity took place.
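The yearly summaries above translate directly into base R. The sketch below uses the testTB values of this section and a hypothetical function sigma_sketch(), not the package's sigma_conv(); it divides by \(n(A)\), as in the formula for \(s_{A,i,t}\), and, to match the printed outputs, reports CV as a ratio rather than a percentage. It should reproduce, e.g., stdDev 1.28 and devianceT 4.89 in 2000.

```r
# Minimal base-R sketch of the yearly sigma-convergence summaries
# (unweighted mean, population standard deviation, CV ratio, deviance).
testTB <- data.frame(
  time     = 2000:2005,
  countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
  countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
  countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0)
)

sigma_sketch <- function(tb, timeName = "time") {
  vals <- as.matrix(tb[, names(tb) != timeName])  # years by countries
  n    <- ncol(vals)                              # n(A): number of member states
  mu   <- rowMeans(vals)                          # unweighted mean per year
  devianceT <- rowSums((vals - mu)^2)             # numerator of the variance
  stdDev    <- sqrt(devianceT / n)                # divisor n(A), not n(A) - 1
  data.frame(time = tb[[timeName]], stdDev = stdDev,
             CV = stdDev / mu, mean = mu, devianceT = devianceT)
}

sigTB <- sigma_sketch(testTB)
sigTB
```

Subtracting mu from the matrix works because R recycles the vector down the columns, so each row (year) is centred on its own mean; multiplying CV by 100 gives the percentage version of the formula.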

In this section we assume that all member states contributing to the unweighted mean are contained in the dataset, for example:

testTB <- tribble(
  ~time, ~countryA ,  ~countryB,  ~countryC,
    2000,     0.8,   2.7,    3.9,
    2001,     1.2,   3.2,    4.2,
    2002,     0.9,   2.9,    4.1,
    2003,     1.3,   2.9,    4.0,
    2004,     1.2,   3.1,    4.1,
    2005,     1.2,   3.0,    4.0
  )

sigma_conv(testTB,timeName="time")
#> $res
#> # A tibble: 6 × 5
#>    time stdDev    CV  mean devianceT
#>   <dbl>  <dbl> <dbl> <dbl>     <dbl>
#> 1  2000   1.28 0.517  2.47      4.89
#> 2  2001   1.25 0.435  2.87      4.67
#> 3  2002   1.32 0.501  2.63      5.23
#> 4  2003   1.11 0.406  2.73      3.69
#> 5  2004   1.20 0.430  2.8       4.34
#> 6  2005   1.16 0.424  2.73      4.03
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

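A time window can be selected through the arguments time_0 and time_t; as the two calls below give the same result, timeName can be omitted when the time column is named "time":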

sigma_conv(testTB,timeName="time",time_0 = 2002,time_t = 2004)
#> $res
#> # A tibble: 3 × 5
#>    time stdDev    CV  mean devianceT
#>   <dbl>  <dbl> <dbl> <dbl>     <dbl>
#> 1  2002   1.32 0.501  2.63      5.23
#> 2  2003   1.11 0.406  2.73      3.69
#> 3  2004   1.20 0.430  2.8       4.34
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
sigma_conv(testTB,time_0 = 2002,time_t = 2004)
#> $res
#> # A tibble: 3 × 5
#>    time stdDev    CV  mean devianceT
#>   <dbl>  <dbl> <dbl> <dbl>     <dbl>
#> 1  2002   1.32 0.501  2.63      5.23
#> 2  2003   1.11 0.406  2.73      3.69
#> 3  2004   1.20 0.430  2.8       4.34
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

More interesting calculations deal with the dataset emp_20_64_MS. Note that all and only the EU28 countries, i.e. those that contribute to the average, are included:

data(emp_20_64_MS)
mySTB <- sigma_conv(emp_20_64_MS)
mySTB
#> $res
#> # A tibble: 17 × 5
#>     time stdDev     CV  mean devianceT
#>    <dbl>  <dbl>  <dbl> <dbl>     <dbl>
#>  1  2002   6.34 0.0939  67.5     1125.
#>  2  2003   5.95 0.0878  67.8      991.
#>  3  2004   5.70 0.0839  67.9      909.
#>  4  2005   5.54 0.0809  68.4      858.
#>  5  2006   5.57 0.0801  69.6      869.
#>  6  2007   5.47 0.0775  70.6      838.
#>  7  2008   5.36 0.0755  71.0      804.
#>  8  2009   5.03 0.0730  69.0      710.
#>  9  2010   5.24 0.0769  68.1      768.
#> 10  2011   5.59 0.0821  68.1      875.
#> 11  2012   5.98 0.0880  68       1002.
#> 12  2013   6.28 0.0922  68.0     1103.
#> 13  2014   5.98 0.0867  69.0     1000.
#> 14  2015   5.74 0.0820  70.0      922.
#> 15  2016   5.60 0.0789  71.0      879.
#> 16  2017   5.37 0.0741  72.5      808.
#> 17  2018   5.30 0.0717  73.8      786.
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

As a first step, departures from the mean are characterized:

res <- departure_mean(oriTB = emp_20_64_MS, sigmaTB = mySTB$res)
names(res$res)
#> [1] "departures"      "squaredContrib"  "devianceContrib"
res$res$departures
#> # A tibble: 17 × 33
#>     time stdDev     CV  mean devianceT    AT    BE    BG    CY    CZ    DE    DK
#>    <dbl>  <dbl>  <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002   6.34 0.0939  67.5     1125.     0     0    -1     1     0     0     1
#>  2  2003   5.95 0.0878  67.8      991.     0     0    -1     1     0     0     1
#>  3  2004   5.70 0.0839  67.9      909.     0     0    -1     1     0     0     1
#>  4  2005   5.54 0.0809  68.4      858.     0     0    -1     1     0     0     1
#>  5  2006   5.57 0.0801  69.6      869.     0     0     0     1     0     0     1
#>  6  2007   5.47 0.0775  70.6      838.     0     0     0     1     0     0     1
#>  7  2008   5.36 0.0755  71.0      804.     0     0     0     1     0     0     1
#>  8  2009   5.03 0.0730  69.0      710.     0     0     0     1     0     1     1
#>  9  2010   5.24 0.0769  68.1      768.     1     0     0     1     0     1     1
#> 10  2011   5.59 0.0821  68.1      875.     1     0     0     0     0     1     1
#> 11  2012   5.98 0.0880  68       1002.     1     0     0     0     0     1     1
#> 12  2013   6.28 0.0922  68.0     1103.     1     0     0     0     0     1     0
#> 13  2014   5.98 0.0867  69.0     1000.     0     0     0     0     0     1     0
#> 14  2015   5.74 0.0820  70.0      922.     0     0     0     0     0     1     0
#> 15  2016   5.60 0.0789  71.0      879.     0     0     0     0     1     1     0
#> 16  2017   5.37 0.0741  72.5      808.     0     0     0     0     1     1     0
#> 17  2018   5.30 0.0717  73.8      786.     0     0     0     0     1     1     0
#> # ℹ 21 more variables: EE <dbl>, EL <dbl>, ES <dbl>, FI <dbl>, FR <dbl>,
#> #   HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>,
#> #   MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>,
#> #   SK <dbl>, UK <dbl>

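where \(-1, 0, 1\) indicate that the value of a country lies, respectively, more than one standard deviation below the mean, within one standard deviation of the mean, or more than one standard deviation above it; that is, the standardized departure \((Y_{m,i,t} - \overline{Y}_{A,i,t})/s_{A,i,t}\) is below \(-1\), within \((-1,1)\) or above \(+1\). Details on the contribution of each MS to the variance at a given time \(t\) are evaluated through the squared difference \((Y_{m,i,t} - \overline{Y}_{EU27,i,t})^2\) between the indicator \(i\) of country \(m\) at time \(t\) and the unweighted average over member states, say EU27: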

res$res$squaredContrib
#> # A tibble: 17 × 28
#>        AT     BE       BG       CY    CZ      DE    DK     EE    EL     ES    FI
#>     <dbl>  <dbl>    <dbl>    <dbl> <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
#>  1 11.4    7.94  121.     57.5     17.5  1.64e+0 116.   0.612  22.3 18.6   32.3 
#>  2 12.5   10.7    82.3    58.2     10.4  3.95e-1  92.7  1.77   15.8 12.1   26.3 
#>  3  0.243  4.44   45.0    60.7      4.81 5.10e-5 104.   5.26   13.0  7.33  21.1 
#>  4  3.91   3.69   42.5    35.7      5.19 9.58e-1  91.7 12.8    16.2  0.849 21.0 
#>  5  4.16   9.37   19.9    38.9      2.69 2.37e+0  96.8 40.2    15.7  0.314 18.8 
#>  6  4.84   8.41    4.84   38.4      1.96 5.29e+0  70.6 39.7    23.0  0.810 17.6 
#>  7  7.98   8.85    0.0756 30.5      2.03 9.15e+0  59.7 37.5    21.9  6.13  23.3 
#>  8 19.3    3.62    0.0414 39.6      3.60 2.70e+1  50.4  0.993  11.6 25.0   20.2 
#>  9 33.5    0.264  11.7    47.4      5.22 4.74e+1  46.0  1.73   18.6 28.2   23.9 
#> 10 37.7    0.579  26.6    28.5      8.06 7.12e+1  45.4  6.45   71.6 36.7   32.9 
#> 11 41.0    0.640  25       4.84    12.2  7.92e+1  39.7 17.6   169   70.6   36   
#> 12 43.0    0.710  20.6     0.710   19.9  8.57e+1  39.2 27.6   229.  89.2   27.6 
#> 13 27.3    2.81   15.0     1.89    20.5  7.61e+1  32.8 28.4   246.  82.4   17.0 
#> 14 18.6    7.74    8.31    4.34    23.2  6.43e+1  29.4 42.5   227.  63.7    8.51
#> 15 14.4   11.0    11.0     5.34    32.4  5.76e+1  24.9 31.2   219.  50.6    5.71
#> 16  8.37  16.1     1.46    2.91    35.9  4.48e+1  16.8 38.4   216.  49.1    2.87
#> 17  5.52  17.2     2.10    0.00250 36.6  3.66e+1  13.3 31.9   206.  46.9    6.00
#> # ℹ 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> #   RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
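The classification into \(-1, 0, 1\) and the squared contributions can be sketched in base R on the first three years of testTB. This is a hypothetical illustration, not departure_mean(); it assumes that a standardized departure exactly equal to \(\pm 1\) is coded \(0\), a boundary case the output above does not settle.

```r
# Sketch (not convergEU::departure_mean()): departures and squared contributions
vals <- cbind(countryA = c(0.8, 1.2, 0.9),
              countryB = c(2.7, 3.2, 2.9),
              countryC = c(3.9, 4.2, 4.1))
rownames(vals) <- 2000:2002                       # years as row labels

mu <- rowMeans(vals)                              # unweighted mean per year
s  <- sqrt(rowSums((vals - mu)^2) / ncol(vals))   # population sd per year
z  <- (vals - mu) / s                             # standardized departures

# -1: more than one sd below the mean; 1: more than one sd above; 0 otherwise
departures     <- ifelse(z < -1, -1, ifelse(z > 1, 1, 0))
squaredContrib <- (vals - mu)^2

departures
squaredContrib
```

In this toy case countryA is always more than one standard deviation below the mean and countryC always above it, while countryB stays within one standard deviation.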

It is also possible to decompose the numerator of the variance, called deviance, at each time, in order to assess the percentage contribution of each member state to the total deviance, \[100 \cdot \frac{(Y_{m,i,t} - \overline{Y}_{EU27,i,t})^2}{ \sum_{m} (Y_{m,i,t} - \overline{Y}_{EU27,i,t})^2 }\] for the indicator \(i\) of country \(m\) at time \(t\).

res$res$devianceContrib
#> # A tibble: 17 × 28
#>        AT     BE       BG       CY    CZ      DE    DK     EE    EL     ES    FI
#>     <dbl>  <dbl>    <dbl>    <dbl> <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
#>  1 1.02   0.706  10.8     5.11     1.56  1.46e-1 10.3  0.0544  1.98 1.66   2.87 
#>  2 1.26   1.08    8.30    5.87     1.05  3.99e-2  9.35 0.178   1.59 1.22   2.65 
#>  3 0.0267 0.488   4.95    6.68     0.529 5.61e-6 11.4  0.578   1.43 0.806  2.32 
#>  4 0.456  0.430   4.95    4.16     0.605 1.12e-1 10.7  1.49    1.88 0.0989 2.44 
#>  5 0.479  1.08    2.29    4.48     0.309 2.73e-1 11.1  4.63    1.81 0.0362 2.17 
#>  6 0.578  1.00    0.578   4.59     0.234 6.31e-1  8.42 4.74    2.75 0.0967 2.11 
#>  7 0.993  1.10    0.00941 3.80     0.253 1.14e+0  7.43 4.67    2.72 0.762  2.90 
#>  8 2.72   0.511   0.00584 5.59     0.507 3.81e+0  7.10 0.140   1.63 3.53   2.85 
#>  9 4.36   0.0344  1.52    6.17     0.680 6.17e+0  6.00 0.225   2.42 3.68   3.11 
#> 10 4.31   0.0662  3.04    3.26     0.922 8.14e+0  5.19 0.737   8.18 4.20   3.77 
#> 11 4.09   0.0639  2.50    0.483    1.22  7.91e+0  3.96 1.76   16.9  7.04   3.59 
#> 12 3.90   0.0644  1.87    0.0644   1.80  7.77e+0  3.55 2.51   20.8  8.08   2.51 
#> 13 2.73   0.280   1.50    0.189    2.05  7.61e+0  3.28 2.83   24.6  8.23   1.70 
#> 14 2.02   0.839   0.901   0.470    2.52  6.97e+0  3.18 4.61   24.7  6.91   0.923
#> 15 1.63   1.25    1.25    0.608    3.68  6.55e+0  2.83 3.56   25.0  5.75   0.650
#> 16 1.04   1.99    0.180   0.360    4.44  5.54e+0  2.07 4.74   26.8  6.07   0.354
#> 17 0.703  2.19    0.268   0.000318 4.66  4.66e+0  1.70 4.06   26.2  5.97   0.764
#> # ℹ 17 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> #   RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>

thus each row sums to \(100\).
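As a minimal base-R sketch of the percentage formula above (not the convergEU implementation), the hypothetical helper deviance_contrib() below assumes a wide data frame with a time column and one column per member state, and takes \(\overline{Y}_{EU27,i,t}\) to be the unweighted mean of the columns supplied:

```r
# Hypothetical helper, not part of convergEU: percentage contribution of
# each member state to the total deviance at every time point.
deviance_contrib <- function(wide, timeName = "time") {
  ms <- setdiff(names(wide), timeName)
  vals <- as.matrix(wide[, ms, drop = FALSE])
  # Squared departures from the unweighted cross-country mean of each row
  sq <- (vals - rowMeans(vals))^2
  # Each squared departure as a percentage of that row's total deviance
  pct <- 100 * sq / rowSums(sq)
  data.frame(time = wide[[timeName]], pct, check.names = FALSE)
}

# Toy data: three countries observed in two years
toy <- data.frame(time = c(2000, 2001),
                  countryA = c(0.8, 1.2),
                  countryB = c(2.7, 3.2),
                  countryC = c(3.9, 4.2))
res_toy <- deviance_contrib(toy)
print(res_toy)
rowSums(res_toy[, -1])  # each row sums to 100, up to rounding error
```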

The departures of the country time series can also be summarized graphically, as shown below:

myGG <- graph_departure(res$res$departures,
                timeName = "time",
                displace = 0.25,
                displaceh = 0.45,
                dimeFontNum = 4,
                myfont_scale = 1.35,
                x_angle = 45,
                color_rect = c("-1"='red1', "0"='gray80',"1"='lightskyblue1'),
                axis_name_y = "Countries",
                axis_name_x = "Time",
                alpha_color = 0.9
                )
myGG
#> $res

#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Any subset of countries can be displayed:

myGG <- graph_departure(res$res$departures[1:10],
                timeName = "time",
                displace = 0.25,
                displaceh = 0.45,
                dimeFontNum = 4,
                myfont_scale = 1.35,
                x_angle = 45,
                color_rect = c("-1"='red1', "0"='gray80',"1"='lightskyblue1'),
                axis_name_y = "Countries",
                axis_name_x = "Time",
                alpha_color = 0.29
                )

myGG
#> $res

#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

2.3 Gamma-convergence

We now introduce gamma-convergence, measured by an index based on ranks.

Let \(y_{m,i,t}\) be the value of indicator \(i\) for member state \(m\) at time \(t=0,1,\ldots, T\), and \(\{ \tilde{y}_{m,i,t}: m \in A \}\) the ranks of indicator \(i\) over the member states in the reference set \(A\), for example \(A = EU27\), at a given time \(t\). The sum of the ranks of member state \(m\) over time is: \[ \tilde{y}^{(s)}_{m,i} = \sum_{t=0}^T \tilde{y}_{m,i,t} \] thus the variance of the sums of ranks over the given interval \[ Var\left[ \{\tilde{y}^{(s)}_{m,i}: m \in A \} \right] \] may be compared to the variance of the ranks at the reference time \(t=0\): \[ Var\left[ \{\tilde{y}_{m,i,0}: m \in A \} \right] \]

The Kendall index KI, with respect to aggregation \(A\) of member states for the indicator \(i\) over a given time interval is: \[ KI(A,i,T) = \frac{Var\left[ \{\tilde{y}^{(s)}_{m,i}: m \in A \} \right] }{ (T+1)^2 ~~Var\left[\{\tilde{y}_{m,i,0}: m \in A \}\right] } \]
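The Kendall index can be sketched in base R as follows. The helper kendall_index() is hypothetical; it assumes a wide data frame with one row per time point (the earliest being the reference \(t=0\)) and one column per member state, and it follows the displayed formula literally. Its values therefore need not coincide with those printed by gamma_conv below: for the swapped toy data (timeCounTB2) it gives \(7/36 \approx 0.194\), whereas gamma_conv reports \(0.1428571 = 7/49\).

```r
# Hypothetical helper, not part of convergEU: Kendall index KI(A,i,T)
# computed literally from the formula above.
kendall_index <- function(wide, timeName = "time") {
  wide <- wide[order(wide[[timeName]]), , drop = FALSE]  # earliest time is t = 0
  vals <- as.matrix(wide[, setdiff(names(wide), timeName), drop = FALSE])
  # Rank member states within each time point: one row per time
  ranks <- t(apply(vals, 1, rank))
  n_times <- nrow(ranks)          # T + 1 when t = 0, 1, ..., T
  sum_ranks <- colSums(ranks)     # sum of ranks of each member state
  var(sum_ranks) / (n_times^2 * var(ranks[1, ]))
}

# Toy data: the ranks never change over time, so KI = 1
toy1 <- data.frame(time = 2000:2005,
                   countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
                   countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
                   countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0))
# Same data with the values swapped in 2001 and 2003
toy2 <- toy1
toy2[2, 2:4] <- c(4.2, 3.2, 1.2)
toy2[4, 2:4] <- c(4.0, 1.3, 2.9)
ki1 <- kendall_index(toy1)
ki2 <- kendall_index(toy2)
c(ki1, ki2)
```

Ties receive average ranks, which is the default behaviour of rank().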

The measure of gamma-convergence is obtained with the following function:

gamma_conv(emp_20_64_MS,2002,2016)
#> $res
#> [1] 0.7374964
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that the reference year corresponds to time \(t=0\). To illustrate the computation, a copy of the test dataset is made first:

(timeCounTB <- testTB)
#> # A tibble: 6 × 4
#>    time countryA countryB countryC
#>   <dbl>    <dbl>    <dbl>    <dbl>
#> 1  2000      0.8      2.7      3.9
#> 2  2001      1.2      3.2      4.2
#> 3  2002      0.9      2.9      4.1
#> 4  2003      1.3      2.9      4  
#> 5  2004      1.2      3.1      4.1
#> 6  2005      1.2      3        4

Ranks within each time point are obtained with the R function rank(); for example:

tmp <- c( 3, 6, 9, 1, 12)
rank(tmp)
#> [1] 2 3 4 1 5

Therefore, with the above data:

(gamma_conv(timeCounTB,ref=2000,last=2005,timeName = "time"))
#> $res
#> [1] 0.7346939
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2004,timeName = "time"))
#> $res
#> [1] 0.6944444
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2003,timeName = "time"))
#> $res
#> [1] 0.64
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2002,timeName = "time"))
#> $res
#> [1] 0.5625
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2000,last=2001,timeName = "time"))
#> $res
#> [1] 0.4444444
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

and changing the reference year:

(gamma_conv(timeCounTB,ref=2001,last=2005,timeName = "time"))
#> $res
#> [1] 0.7346939
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
(gamma_conv(timeCounTB,ref=2002,last=2004,timeName = "time"))
#> $res
#> [1] 0.6944444
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Next, the values within two rows are swapped and gamma-convergence is recalculated:

timeCounTB2 <- timeCounTB
timeCounTB2[2,2:4] <-  timeCounTB[2,4:2]
timeCounTB2[4,2:4] <-  timeCounTB[4,c(4,2,3)]
timeCounTB2
#> # A tibble: 6 × 4
#>    time countryA countryB countryC
#>   <dbl>    <dbl>    <dbl>    <dbl>
#> 1  2000      0.8      2.7      3.9
#> 2  2001      4.2      3.2      1.2
#> 3  2002      0.9      2.9      4.1
#> 4  2003      4        1.3      2.9
#> 5  2004      1.2      3.1      4.1
#> 6  2005      1.2      3        4

gamma_conv(timeCounTB2,last=2005,ref=2000, timeName = "time",printRanks = TRUE)
#> Ranks:
#>      countryA countryB countryC
#> [1,]        1        2        3
#> [2,]        3        2        1
#> [3,]        1        2        3
#> [4,]        3        1        2
#> [5,]        1        2        3
#> [6,]        1        2        3
#> $res
#> [1] 0.1428571
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

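and after a random permutation of the values within each row (the output changes from run to run unless a seed is fixed with set.seed()):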

timeCounTB3 <- cbind(timeCounTB[1],t(apply(timeCounTB,1,
                                        function(vet)vet[sample(2:4,3)])))


timeCounTB3
#>   time   1   2   3
#> 1 2000 0.8 2.7 3.9
#> 2 2001 1.2 3.2 4.2
#> 3 2002 4.1 2.9 0.9
#> 4 2003 1.3 4.0 2.9
#> 5 2004 4.1 3.1 1.2
#> 6 2005 1.2 4.0 3.0
(gamma_conv(timeCounTB3,last=2005,ref=2000, timeName = "time",printRanks = TRUE))
#> Ranks:
#>      1 2 3
#> [1,] 1 2 3
#> [2,] 1 2 3
#> [3,] 3 2 1
#> [4,] 1 3 2
#> [5,] 3 2 1
#> [6,] 1 3 2
#> $res
#> [1] 0.08163265
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

2.4 Delta-convergence

Delta-convergence can be calculated as follows:

delta_conv(timeCounTB)
#> $res
#> # A tibble: 6 × 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2000   4.3
#> 2  2001   4  
#> 3  2002   4.4
#> 4  2003   3.8
#> 5  2004   3.9
#> 6  2005   3.8
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

2.5 Absolute change

Absolute change as described in the reserved Eurofound Annex is defined as: \[ \Delta y_{m,i,t} = y_{m,i,t} - y_{m,i,t-1} \] for country \(m\), indicator \(i\) at time \(t\).

The R function abso_change calculates the above quantity, for example for the emp_20_64_MS dataset:

data(emp_20_64_MS)
mySTB <- abso_change(emp_20_64_MS, 
                        time_0 = 2005, 
                        time_t = 2010,
                        all_within=TRUE,
                        timeName = "time")
names(mySTB$res)
#> [1] "abso_change"        "sum_abs_change"     "average_abs_change"

Applying the above equation gives:

mySTB$res$abso_change
#>   time   AT   BE  BG   CY   CZ   DE   DK  EE  EL  ES   FI   FR  HR   HU   IE
#> 1 2003  0.4 -0.2 2.2  0.3 -0.7 -0.4 -0.9 0.8 1.0 1.1 -0.3  1.1 0.5  1.0 -0.4
#> 2 2004 -2.9  1.3 2.5  0.3 -0.9 -0.5  0.7 1.1 0.5 0.9 -0.4 -0.5 1.3 -0.4  0.6
#> 3 2005  2.0  0.7 0.7 -1.3  0.6  1.5 -0.1 1.8 0.1 2.3  0.5  0.2 0.3  0.2  1.6
#> 4 2006  1.2  0.0 3.2  1.4  0.5  1.7  1.4 3.9 1.2 1.5  0.9  0.0 0.6  0.4  0.8
#> 5 2007  1.2  1.2 3.3  1.0  0.8  1.8 -0.4 1.0 0.2 0.7  0.9  0.5 3.3 -0.3  1.7
#>     IT   LT   LU  LV   MT   NL   PL   PT   RO   SE   SI   SK  UK
#> 1  0.9  2.7 -1.2 1.1 -0.4 -0.5 -0.4 -1.1  0.5 -0.3 -1.9  1.8 0.4
#> 2  1.6 -1.1  0.5 0.0 -0.5 -0.4 -0.3 -0.4 -0.1 -0.7  2.9 -1.5 0.2
#> 3 -0.2  1.1  1.3 1.7  0.1 -2.2  1.3 -0.4 -1.1  0.3  0.1  1.0 0.3
#> 4  0.9  0.6  0.1 4.1  0.5  1.0  1.8  0.4  1.2  0.7  0.4  1.5 0.0
#> 5  0.3  1.4  0.5 2.0  0.7  1.8  2.6 -0.1 -0.4  1.3  0.9  1.2 0.0

The sum of absolute values \[ \sum_{t=t_0+1}^{T} | \Delta y_{m,i,t}| \] is:

round(mySTB$res$sum_abs_change,4)
#>   AT   BE   BG   CY   CZ   DE   DK   EE   EL   ES   FI   FR   HR   HU   IE   IT 
#>  7.7  3.4 11.9  4.3  3.5  5.9  3.5  8.6  3.0  6.5  3.0  2.3  6.0  2.3  5.1  3.9 
#>   LT   LU   LV   MT   NL   PL   PT   RO   SE   SI   SK   UK 
#>  6.9  3.6  8.9  2.2  5.9  6.4  2.4  3.3  3.3  6.2  7.0  0.9

and dividing this sum by the number of pairs of consecutive years gives the average absolute change per pair of years:

round(mySTB$res$average_abs_change,4)
#>   AT   BE   BG   CY   CZ   DE   DK   EE   EL   ES   FI   FR   HR   HU   IE   IT 
#> 1.54 0.68 2.38 0.86 0.70 1.18 0.70 1.72 0.60 1.30 0.60 0.46 1.20 0.46 1.02 0.78 
#>   LT   LU   LV   MT   NL   PL   PT   RO   SE   SI   SK   UK 
#> 1.38 0.72 1.78 0.44 1.18 1.28 0.48 0.66 0.66 1.24 1.40 0.18
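The three quantities above can be reproduced for a single country with a few lines of base R. The helper abs_change_summary() is hypothetical; the values used are the IT employment rates for 2002 to 2007 from emp_20_64_MS (as printed in the smoothing section), whose year-on-year changes match the IT column of abso_change shown above.

```r
# Hypothetical helper, not part of convergEU: year-on-year absolute change
# of one country, plus the sum of absolute values and the average per pair of years.
abs_change_summary <- function(y, time) {
  o <- order(time)
  d <- diff(y[o])                    # Delta y_t = y_t - y_{t-1}
  names(d) <- time[o][-1]
  list(change = d,
       sum_abs = sum(abs(d)),
       average_abs = mean(abs(d)))   # sum divided by the number of pairs
}

# IT employment rates 2002-2007 (emp_20_64_MS)
it <- abs_change_summary(c(59.2, 60.1, 61.7, 61.5, 62.4, 62.7), 2002:2007)
it$change
round(it$sum_abs, 4)
round(it$average_abs, 4)
```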

2.6 Convergence measures on Eurofound lifesatisf indicator

Here we assume that the larger the indicator, the better the performance.

Let’s load the Eurofound indicator lifesatisf:

workDF <- extract_indicator_EUF(
  indicator_code ="lifesatisf", #Code_in_database
  fromTime=2000,
  toTime =2018,
  gender= c("Total","Females","Males")[1],
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS)
workDF
#> $res
#> # A tibble: 4 × 29
#>    time sex      AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI
#>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003 Total  7.85  7.52  4.47  7.22  6.57  7.36  8.47  5.94  6.69  7.49  8.14
#> 2  2007 Total  6.95  7.54  5.01  7.05  6.59  7.16  8.48  6.72  6.58  7.25  8.23
#> 3  2011 Total  7.66  7.38  5.55  7.16  6.43  7.20  8.37  6.28  6.16  7.47  8.08
#> 4  2016 Total  7.92  7.31  5.62  6.54  6.48  7.31  8.19  6.73  5.26  6.95  8.07
#> # ℹ 16 more variables: FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>,
#> #   LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>,
#> #   RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

wDF <- workDF$res

Then we check whether the dataset is complete or some values are missing:

check_data(select(wDF,-sex),timeName="time")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: one or more missing values in the dataframe."

Thus at least one value is missing. In the next step, missing values are imputed:

wDFI <- impute_dataset(select(wDF,-sex),
               countries= names(select(wDF,-sex,-time)),
               timeName = "time",
               tailMiss = c("cut", "constant")[2],
               headMiss = c("cut", "constant")[1])

and the result is checked again:

check_data(wDFI$res,timeName="time")
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

which returns TRUE.

First, we calculate the unweighted EU27 average of lifesatisf:

wwTB <- (wDFI$res %>%
   average_clust(timeName="time",cluster="EU27"))$res

wwTB$EU27
#> [1] 6.829626 6.984789 7.014334 6.978321

The time series can be plotted:

mini_EU <- min(wwTB$EU27)
maxi_EU <- max(wwTB$EU27)

ggplot(wwTB, aes(x = time, y = EU27)) +
  geom_point() +
  geom_line(colour = "navyblue") +
  ylim(mini_EU, maxi_EU) +
  ylab("lifesatisf")

2.6.1 Beta convergence

Beta-convergence is now calculated using only two years, 2007 and 2011:

betaRes <- beta_conv(wDFI$res,time_0=2007, time_t=2011, all_within=FALSE)
betaRes 
#> $res
#> $res$workTB
#> # A tibble: 27 × 3
#>    deltaIndic indic countries
#>         <dbl> <dbl> <chr>    
#>  1    0.0244   1.94 AT       
#>  2   -0.00553  2.02 BE       
#>  3    0.0256   1.61 BG       
#>  4    0.00403  1.95 CY       
#>  5   -0.00623  1.89 CZ       
#>  6    0.00138  1.97 DE       
#>  7   -0.00316  2.14 DK       
#>  8   -0.0171   1.91 EE       
#>  9   -0.0165   1.88 EL       
#> 10    0.00719  1.98 ES       
#> # ℹ 17 more rows
#> 
#> $res$model
#> 
#> Call:
#> stats::lm(formula = deltaIndic ~ indic, data = workTB)
#> 
#> Coefficients:
#> (Intercept)        indic  
#>     0.11155     -0.05679  
#> 
#> 
#> $res$summary
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   0.112     0.0314      3.56 0.00153
#> 2 indic        -0.0568    0.0162     -3.51 0.00171
#> 
#> $res$beta1
#> [1] -0.05678881
#> 
#> $res$adj.r.squared
#> [1] 0.3037316
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
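The regression reported in $res$model can be sketched in base R. Judging from the printed workTB (for AT, \(\log 6.95 \approx 1.94\) and \((\log 7.66 - \log 6.95)/4 \approx 0.024\)), indic appears to be the logarithm of the initial value and deltaIndic the average annual log growth between time_0 and time_t. The hypothetical helper beta_sketch() assumes exactly this and is not the convergEU code; a negative slope beta1 indicates beta-convergence.

```r
# Hypothetical helper, not part of convergEU: regress the average annual
# log growth on the log of the initial value.
beta_sketch <- function(y0, yt, t0, tt) {
  workTB <- data.frame(indic = log(y0),
                       deltaIndic = (log(yt) - log(y0)) / (tt - t0))
  fit <- stats::lm(deltaIndic ~ indic, data = workTB)
  list(model = fit, beta1 = unname(stats::coef(fit)["indic"]))
}

# Synthetic check: growth is built as 0.1 - 0.05 * log(initial value),
# so the estimated slope should be -0.05
y0 <- c(5, 6, 7, 8, 9)
growth <- 0.1 - 0.05 * log(y0)
yt <- exp(log(y0) + 4 * growth)
beta_res <- beta_sketch(y0, yt, t0 = 2007, tt = 2011)
beta_res$beta1
```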

A plot of the transformed data together with the fitted straight line may be useful:

mybetaplot<-beta_conv_graph(betaRes,
                            indiName = 'Mean Life Satisfaction',
                            time_0 = 2007,
                            time_t = 2011)
mybetaplot

Note that labels are replicated as many times as the number of subsequent years included.

2.6.2 Sigma convergence

Sigma-convergence is calculated as follows:

mysigmares<-sigma_conv(wwTB)
#mysigmares

It is also possible to obtain a graphical representation of the standard deviation and of the coefficient of variation obtained for sigma-convergence by invoking the sigma_conv_graph function as follows:

mysigmaplot<-sigma_conv_graph(sigmaconvOut=mysigmares, 
         time_0 = 2007, 
         time_t = 2011,
        aggregation='EU27_2020')
mysigmaplot
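For reference, the two dispersion measures can be sketched in base R. The helper sigma_sketch() and its column names are hypothetical; it computes, at each time, the standard deviation across member states using divisor \(n\) (an assumption: convergEU may use a different divisor) and the coefficient of variation as a percentage of the unweighted mean (also an assumption).

```r
# Hypothetical helper, not part of convergEU: cross-country standard deviation
# and coefficient of variation at each time point.
sigma_sketch <- function(wide, timeName = "time") {
  vals <- as.matrix(wide[, setdiff(names(wide), timeName), drop = FALSE])
  mu <- rowMeans(vals)
  # Assumption: divisor n (population variance) across member states
  stdDev <- sqrt(rowMeans((vals - mu)^2))
  data.frame(time = wide[[timeName]], mean = mu, stdDev = stdDev,
             CV = 100 * stdDev / mu)   # assumption: CV expressed in percent
}

toy <- data.frame(time = 1:2, A = c(1, 2), B = c(3, 2), C = c(5, 2))
sg <- sigma_sketch(toy)
sg
```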

2.6.3 Gamma convergence

Let’s reload Eurofound data:

workDF <- extract_indicator_EUF(
  indicator_code ="lifesatisf", #Code_in_database
  fromTime=2000,
  toTime =2018,
  gender= c("Total","Females","Males")[1],
  countries =  convergEU_glb()$EU27_2020$memberStates$codeMS)
wDFI <- impute_dataset(select(workDF$res,-sex),
               countries= names(select(workDF$res,-sex,-time)),
               timeName = "time",
               tailMiss = c("cut", "constant")[2],
               headMiss = c("cut", "constant")[1])

check_data(wDFI$res,timeName="time")
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Now gamma-convergence is computed:

gamma_conv(wDFI$res,ref=2003,last=2016,timeName = "time")
#> $res
#> [1] 0.5879853
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

or, for a different time window:

tmpRes <- gamma_conv(wDFI$res,ref=2007,last=2011,timeName = "time")

It is also possible to perform the calculation for each pair of subsequent years in the dataset, that is, each year is the reference for the subsequent year:

wDFI$res
#> # A tibble: 4 × 28
#>    time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2003  7.85  7.52  4.47  7.22  6.57  7.36  8.47  5.94  6.69  7.49  8.14  6.96
#> 2  2007  6.95  7.54  5.01  7.05  6.59  7.16  8.48  6.72  6.58  7.25  8.23  7.32
#> 3  2011  7.66  7.38  5.55  7.16  6.43  7.20  8.37  6.28  6.16  7.47  8.08  7.23
#> 4  2016  7.92  7.31  5.62  6.54  6.48  7.31  8.19  6.73  5.26  6.95  8.07  7.17
#> # ℹ 15 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>

gamma_conv_msteps(wDFI$res,
                  startTime=2003, 
                  endTime=2016,
                  timeName = "time")
#> $res
#> # A tibble: 4 × 2
#>    time gammaConv
#>   <dbl>     <dbl>
#> 1  2003    NA    
#> 2  2007     0.418
#> 3  2011     0.526
#> 4  2016     0.588
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

2.6.4 Delta convergence

Let \(y_{m,i,t}\) be the value of indicator \(i\) for member state \(m\) at time \(t\), and \(y^{(M)}_{i,t}\) the maximum value over member states in the reference set \(A\), for example \(A = EU27\): \[ y^{(M)}_{i,t} = \max(\{ y_{m,i,t}: m \in A\}) \]

The distance of member state \(m\) from the top performer at time \(t\) is: \[ y^{(M)}_{i,t} - y_{m,i,t} \] thus the overall distance at time \(t\), called delta, is the sum of distances over the reference set \(A\) of MS: \[ \delta_{i,t} = \sum_{m \in A} (y^{(M)}_{i,t} - y_{m,i,t}) \] for the considered indicator \(i\).
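Under the convention stated above (larger values are better), delta can be sketched in base R. The helper delta_index() is hypothetical; on the toy data timeCounTB of Section 2.3 it yields the values printed by delta_conv in Section 2.4 (4.3, 4.0, 4.4, 3.8, 3.9, 3.8).

```r
# Hypothetical helper, not part of convergEU: delta at each time, i.e. the sum
# over member states of the distance from the top performer.
delta_index <- function(wide, timeName = "time") {
  vals <- as.matrix(wide[, setdiff(names(wide), timeName), drop = FALSE])
  top <- apply(vals, 1, max)          # best value at each time point
  # top has one entry per row, so it is recycled down each column
  data.frame(time = wide[[timeName]], delta = rowSums(top - vals))
}

toy1 <- data.frame(time = 2000:2005,
                   countryA = c(0.8, 1.2, 0.9, 1.3, 1.2, 1.2),
                   countryB = c(2.7, 3.2, 2.9, 2.9, 3.1, 3.0),
                   countryC = c(3.9, 4.2, 4.1, 4.0, 4.1, 4.0))
d <- delta_index(toy1)
d
```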

The measure of delta-convergence is obtained as follows:

delta_conv(wwTB)
#> $res
#> # A tibble: 4 × 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2003  46.0
#> 2  2007  41.8
#> 3  2011  38.0
#> 4  2016  34.0
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

Note that delta_conv can also return a declaration of convergence: to this end, set the argument extended to TRUE. For example, for the wwTB dataset the syntax is as follows:

delta_conv(wwTB,"time", extended=TRUE)
#> $res
#> $res$delta_conv
#> # A tibble: 4 × 2
#>    time delta
#>   <dbl> <dbl>
#> 1  2003  46.0
#> 2  2007  41.8
#> 3  2011  38.0
#> 4  2016  34.0
#> 
#> $res$differences
#>             AT        BE       BG       CY       CZ       DE DK       EE
#> [1,] 0.6245208 0.9556756 4.003694 1.250645 1.904858 1.113082  0 2.534678
#> [2,] 1.5314507 0.9352727 3.469857 1.430196 1.883980 1.314598  0 1.756762
#> [3,] 0.7128654 0.9938769 2.823287 1.209136 1.939715 1.168355  0 2.095294
#> [4,] 0.2691503 0.8841901 2.568287 1.657201 1.713479 0.885087  0 1.464275
#>            EL        ES        FI       FR       HR       HU        IE       IT
#> [1,] 1.779027 0.9809632 0.3344421 1.508648 2.035054 2.534398 0.7697177 1.255543
#> [2,] 1.896363 1.2247257 0.2450380 1.154050 2.041460 2.884686 0.8919230 1.889798
#> [3,] 2.211353 0.9066143 0.2946043 1.145797 1.591188 2.598653 0.9794941 1.488251
#> [4,] 2.930435 1.2417102 0.1190214 1.025349 1.860298 1.677889 0.5078416 1.635120
#>            LT        LU       LV        MT        NL        PL       PT
#> [1,] 3.028715 0.7834425 2.897604 1.1496062 0.9260612 2.3119373 2.481104
#> [2,] 2.156122 0.5780816 2.438259 0.9191999 0.6090589 1.5864158 2.288289
#> [3,] 1.671255 0.5828137 2.129854 1.1383724 0.6792879 1.3019018 1.604687
#> [4,] 1.709892 0.2899728 1.869513 0.6240749 0.4548769 0.9941764 1.321342
#>            RO        SE       SI       SK     EU27
#> [1,] 2.348566 0.5827713 1.429040 2.815222 1.642186
#> [2,] 2.003722 0.1419497 1.250872 1.800439 1.493428
#> [3,] 1.638209 0.3387566 1.419971 1.986291 1.357403
#> [4,] 1.692563 0.2608399 1.338965 1.789471 1.214260
#> 
#> $res$difference_last_first
#>          AT          BE          BG          CY          CZ          DE 
#> -0.35537052 -0.07148552 -1.43540668  0.40655661 -0.19137859 -0.22799492 
#>          DK          EE          EL          ES          FI          FR 
#>  0.00000000 -1.07040215  1.15140772  0.26074696 -0.21542072 -0.48329926 
#>          HR          HU          IE          IT          LT          LU 
#> -0.17475605 -0.85650873 -0.26187611  0.37957668 -1.31882334 -0.49346972 
#>          LV          MT          NL          PL          PT          RO 
#> -1.02809095 -0.52553129 -0.47118425 -1.31776094 -1.15976238 -0.65600348 
#>          SE          SI          SK        EU27 
#> -0.32193136 -0.09007549 -1.02575064 -0.42792575 
#> 
#> $res$strict_conv_ini_last
#> [1] FALSE
#> 
#> $res$label_strict
#> [1] " "
#> 
#> $res$converg_ini_last
#> [1] TRUE
#> 
#> $res$label_conver
#> [1] "convergence"
#> 
#> $res$diffe_delta
#> [1] -11.98192
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

It is also useful to evaluate how much a collection of MS deviates from the EU mean for a given indicator over a period of time. To this end, the convergEU package provides the demea_change function:

res1<-demea_change(wwTB,
                   timeName="time",
                   time_0 = 2003,
                   time_t = 2016,
                   sele_countries= NA,
                   doplot=TRUE)
res1
#> $res
#> $res$resDiffe
#> # A tibble: 4 × 29
#>    time      AT    BE    BG      CY     CZ    DE    DK     EE     EL      ES
#>   <dbl>   <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>
#> 1  2003  1.02   0.687 -2.36  0.392  -0.263 0.529  1.64 -0.892 -0.137  0.661 
#> 2  2007 -0.0380 0.558 -1.98  0.0632 -0.391 0.179  1.49 -0.263 -0.403  0.269 
#> 3  2011  0.645  0.364 -1.47  0.148  -0.582 0.189  1.36 -0.738 -0.854  0.451 
#> 4  2016  0.945  0.330 -1.35 -0.443  -0.499 0.329  1.21 -0.250 -1.72  -0.0275
#> # ℹ 18 more variables: FI <dbl>, FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>,
#> #   IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>,
#> #   PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, EU27 <dbl>
#> 
#> $res$diffe_abs_diff
#> # A tibble: 3 × 29
#>    time     AT      BE     BG      CY      CZ      DE     DK     EE    EL     ES
#>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl> <dbl>  <dbl>
#> 1  2007 -0.980 -0.128  -0.385 -0.328   0.128  -0.350  -0.149 -0.629 0.266 -0.393
#> 2  2011  0.607 -0.195  -0.511  0.0850  0.192   0.0102 -0.136  0.475 0.451  0.182
#> 3  2016  0.301 -0.0335 -0.112  0.295  -0.0831  0.140  -0.143 -0.488 0.862 -0.423
#> # ℹ 18 more variables: FI <dbl>, FR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>,
#> #   IT <dbl>, LT <dbl>, LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>,
#> #   PT <dbl>, RO <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, EU27 <dbl>
#> 
#> $res$stats
#> # A tibble: 28 × 4
#>    MS    negaSum posiSum  posi
#>    <chr>   <dbl>   <dbl> <int>
#>  1 AT    -0.980    0.907     1
#>  2 BE    -0.356    0         2
#>  3 BG    -1.01     0         3
#>  4 CY    -0.328    0.380     4
#>  5 CZ    -0.0831   0.320     5
#>  6 DE    -0.350    0.150     6
#>  7 DK    -0.428    0         7
#>  8 EE    -1.12     0.475     8
#>  9 EL     0        1.58      9
#> 10 ES    -0.816    0.182    10
#> # ℹ 18 more rows
#> 
#> $res$miniX
#> [1] -1.117033
#> 
#> $res$maxiX
#> [1] 1.579333
#> 
#> $res$res_graph

#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

To plot the calculated differences, pass the stored graph to plot():

plot(res1$res$res_graph)
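The tables returned by demea_change can be approximated in base R. From the printed output (for AT, \(|0.645| - |-0.0380| \approx 0.607\)), diffe_abs_diff appears to be the change in the absolute deviation from the mean between consecutive times, and negaSum and posiSum the sums of its negative and positive values. The hypothetical helper demean_sketch() assumes exactly this and uses the unweighted mean of the columns supplied, so it is an illustration rather than the package code.

```r
# Hypothetical helper, not part of convergEU: deviations from the unweighted
# mean and changes of their absolute values between consecutive times.
demean_sketch <- function(wide, timeName = "time") {
  vals <- as.matrix(wide[, setdiff(names(wide), timeName), drop = FALSE])
  dev <- vals - rowMeans(vals)        # departure from the unweighted mean
  abs_diff <- diff(abs(dev))          # |d_t| - |d_{t-1}|, one row per later time
  list(deviations = dev,
       abs_diff = abs_diff,
       negaSum = colSums(pmin(abs_diff, 0)),
       posiSum = colSums(pmax(abs_diff, 0)))
}

toy <- data.frame(time = 2001:2003, A = c(1, 2, 3), B = c(3, 2, 5))
dm <- demean_sketch(toy)
dm$negaSum
dm$posiSum
```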

3 Support functions

Several auxiliary functions help to prepare the tidy dataset of time by member states (MS, that is, EU countries), which is needed in almost all computations. The most important ones are described here.

3.1 Summaries and clusters of countries

An important summary is the unweighted average of country values. The cluster of countries to be considered may be specified; clusters are stored within convergEU_glb(), the function generating global static objects and tables. This function is illustrated using the emp_20_64_MS dataframe in the convergEU package.

First note that no element named Eurozone is stored in convergEU_glb(), so the following call returns NULL:

convergEU_glb()$Eurozone
#> NULL

while the labels of the 27 MS in EU27_2020 are:

convergEU_glb()$EU27_2020
#> $dates
#> [1] "01/02/2020" "00/00/0000"
#> 
#> $memberStates
#> # A tibble: 27 × 2
#>    MS          codeMS
#>    <chr>       <chr> 
#>  1 Belgium     BE    
#>  2 Denmark     DK    
#>  3 France      FR    
#>  4 Germany     DE    
#>  5 Greece      EL    
#>  6 Ireland     IE    
#>  7 Italy       IT    
#>  8 Luxembourg  LU    
#>  9 Netherlands NL    
#> 10 Portugal    PT    
#> # ℹ 17 more rows

The list of known MS labels is shown in the appendix.

For example, the unweighted average in the emp_20_64_MS dataset is:

testTB <- emp_20_64_MS
average_clust(testTB,timeName = "time",cluster = "EU27")$res[,c(1,30)]
#> # A tibble: 17 × 2
#>     time  EU27
#>    <dbl> <dbl>
#>  1  2002  67.3
#>  2  2003  67.5
#>  3  2004  67.6
#>  4  2005  68.2
#>  5  2006  69.4
#>  6  2007  70.4
#>  7  2008  70.8
#>  8  2009  68.8
#>  9  2010  67.9
#> 10  2011  67.9
#> 11  2012  67.8
#> 12  2013  67.8
#> 13  2014  68.7
#> 14  2015  69.7
#> 15  2016  70.8
#> 16  2017  72.3
#> 17  2018  73.7

while for EU12 it is:

average_clust(testTB,timeName = "time",cluster = "EU12")$res[,c(1,30)]
#> # A tibble: 17 × 2
#>     time  EU12
#>    <dbl> <dbl>
#>  1  2002  69.1
#>  2  2003  69.1
#>  3  2004  69.4
#>  4  2005  69.9
#>  5  2006  70.6
#>  6  2007  71.3
#>  7  2008  71.4
#>  8  2009  69.9
#>  9  2010  69.2
#> 10  2011  68.6
#> 11  2012  68.0
#> 12  2013  67.8
#> 13  2014  68.4
#> 14  2015  69.2
#> 15  2016  70.1
#> 16  2017  71.2
#> 17  2018  72.3

An unknown label, like “EUspirit”, causes a computation error. In the call below the time variable name “TTime” is also wrong, and that is the error reported:

average_clust(testTB,timeName = "TTime",cluster = "EUspirit")
#> $res
#> NULL
#> 
#> $msg
#> NULL
#> 
#> $err
#> [1] "Error: Time variable not in the dataframe."
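The unweighted cluster average itself is straightforward. A base-R sketch with a hypothetical helper cluster_mean(), taking an explicit vector of member-state codes instead of a cluster label looked up in convergEU_glb(), is:

```r
# Hypothetical helper, not part of convergEU: unweighted average over a
# given set of member-state columns, appended as a new column.
cluster_mean <- function(wide, members, timeName = "time") {
  unknown <- setdiff(members, names(wide))
  if (length(unknown) > 0) {
    stop("Unknown member states: ", paste(unknown, collapse = ", "))
  }
  out <- wide
  out$cluster_mean <- rowMeans(as.matrix(wide[, members, drop = FALSE]))
  out
}

toy <- data.frame(time = 2000:2001, AT = c(1, 2), BE = c(3, 4), BG = c(5, 9))
cm <- cluster_mean(toy, c("AT", "BE"))
cm
```

As with average_clust, an unknown label stops the computation with an error message.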

3.2 Imputing missing values using a straight line

The basic imputation method is deterministic: it assumes that the indicator changed linearly between the two observed time points flanking a run of missing values. With a single missing value, this reduces to the average of the interval endpoints.

intervalTime <-  c(1999,2000,2001) 
intervalMeasure <- c( 66.5, NA,87.2) 
currentData <- tibble(time= intervalTime, veval= intervalMeasure) 
currentData 
#> # A tibble: 3 × 2
#>    time veval
#>   <dbl> <dbl>
#> 1  1999  66.5
#> 2  2000  NA  
#> 3  2001  87.2
resImputed <- impute_dataset(currentData,
                           countries = "veval",
                           timeName = "time",
                           tailMiss = c("cut", "constant")[2],
                           headMiss = c("cut", "constant")[2]) 
resImputed  
#> $res
#> # A tibble: 3 × 2
#>    time veval
#>   <dbl> <dbl>
#> 1  1999  66.5
#> 2  2000  76.8
#> 3  2001  87.2
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

If several consecutive values are missing:

intervalTime <-  c(1999,2000,2001,2002,2003) 
intervalMeasure <- c( 66.5, NA,NA,NA,87.2) 
currentData <- tibble(time= intervalTime, veval= intervalMeasure) 
currentData
#> # A tibble: 5 × 2
#>    time veval
#>   <dbl> <dbl>
#> 1  1999  66.5
#> 2  2000  NA  
#> 3  2001  NA  
#> 4  2002  NA  
#> 5  2003  87.2
resImputed <- impute_dataset(currentData,
                           countries = "veval",
                           timeName = "time",
                           tailMiss = c("cut", "constant")[2],
                           headMiss = c("cut", "constant")[2]) 
tmp <-  as.data.frame(currentData[ c(1,5),] )
tmp2 <- as.data.frame(resImputed$res[2:4,] )

resImputed  
#> $res
#> # A tibble: 5 × 2
#>    time veval
#>   <dbl> <dbl>
#> 1  1999  66.5
#> 2  2000  71.7
#> 3  2001  76.9
#> 4  2002  82.0
#> 5  2003  87.2
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL
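The same straight-line interpolation can be sketched with base R's stats::approx(). The helper impute_linear() is hypothetical; with rule = 1, leading and trailing missing values are left as NA, so it does not mimic the headMiss and tailMiss options of impute_dataset.

```r
# Hypothetical helper, not part of convergEU: linear interpolation of
# internal gaps from the observed neighbouring values.
impute_linear <- function(time, y) {
  ok <- !is.na(y)
  # rule = 1: values outside the observed range stay NA
  stats::approx(x = time[ok], y = y[ok], xout = time,
                method = "linear", rule = 1)$y
}

one_gap <- impute_linear(1999:2001, c(66.5, NA, 87.2))
filled <- impute_linear(1999:2003, c(66.5, NA, NA, NA, 87.2))
one_gap
filled
```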

3.3 Weighted average smoothing of a complete dataset

It may be of interest to assume that part of the variability observed in a country on a given index is not structural, i.e. not due to causal determinants but to transient fluctuations. Furthermore, the interest here is not directed towards prediction but towards smoothing the values observed over the whole time interval considered.

In such a case, a smoothing procedure removes sudden large changes, yielding a less variable time series than the original.

Given that short time series (panel data) are considered here, a three-point weighted average is proposed. The smoother substitutes an original raw value \(y_{m,i,t}\) of country \(m\), indicator \(i\), at time \(t\) with the weighted average \[\check{y}_{m,i,t} = y_{m,i,t-1} ~ (1-w)/2 +w ~y_{m,i,t} +y_{m,i,t+1} ~(1-w)/2\] where \(0< w \leq 1\). The special case \(w=1\) corresponds to no smoothing. An NA is returned in case of missing values, or if the weight is outside the interval \((0,1]\). The first and last values are smoothed using weight \(w\) on the value itself and \(1-w\) on its only neighbour.
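A minimal base-R sketch of the three-point smoother, written directly from the formula above, follows. The helper smooth3() is hypothetical; with w = 0.149 and the IT series of emp_20_64_MS it gives 59.9659 and 60.39785 for the first two years and 62.4043 for the last, consistent with the rounded values 60.0, 60.4 and 62.4 printed by smoo_dataset below. Here missing values simply propagate to the smoothed values that depend on them, which is an assumption about NA handling.

```r
# Hypothetical helper, not part of convergEU: three-point weighted smoother.
smooth3 <- function(y, w) {
  n <- length(y)
  if (n < 2) stop("at least two observations are needed")
  # Weight outside (0, 1]: return NA for every position
  if (!is.numeric(w) || length(w) != 1 || is.na(w) || w <= 0 || w > 1) {
    return(rep(NA_real_, n))
  }
  side <- (1 - w) / 2
  out <- numeric(n)
  if (n > 2) {
    inner <- 2:(n - 1)
    out[inner] <- side * y[inner - 1] + w * y[inner] + side * y[inner + 1]
  }
  # End points: weight w on the value itself and 1 - w on its only neighbour
  out[1] <- w * y[1] + (1 - w) * y[2]
  out[n] <- w * y[n] + (1 - w) * y[n - 1]
  out
}

# IT employment rates 2002-2018 (emp_20_64_MS)
it <- c(59.2, 60.1, 61.7, 61.5, 62.4, 62.7, 62.9, 61.6, 61.0,
        61.0, 60.9, 59.7, 59.9, 60.5, 61.6, 62.3, 63.0)
s <- smooth3(it, 0.149)
round(s, 1)
```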

The procedure consists of loading the data, imputing missing values and finally smoothing. Countries IT and DE are used to illustrate it. First, check whether missing values are present:

workTB <- dplyr::select(emp_20_64_MS, time, IT,DE)
check_data(workTB)
#> $res
#> [1] TRUE
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

The check is passed, so we proceed with the smoothing step after removing the time variable from the data (it is passed separately through timeTB):

resSM <- smoo_dataset(select(workTB,-time), leadW = 0.149, timeTB= select(workTB,time))
resSM
#> # A tibble: 17 × 3
#>     time    IT    DE
#>    <dbl> <dbl> <dbl>
#>  1  2002  60.0  68.5
#>  2  2003  60.4  68.4
#>  3  2004  60.9  68.8
#>  4  2005  62.0  69.5
#>  5  2006  62.1  71.1
#>  6  2007  62.7  72.6
#>  7  2008  62.3  73.6
#>  8  2009  61.9  74.5
#>  9  2010  61.3  75.3
#> 10  2011  61.0  76.0
#> 11  2012  60.4  76.9
#> 12  2013  60.3  77.3
#> 13  2014  60.1  77.7
#> 14  2015  60.7  78.1
#> 15  2016  61.4  78.6
#> 16  2017  62.3  79.2
#> 17  2018  62.4  79.3

and, for comparison:

tmpSM <- dplyr::rename(dplyr::select(resSM,-time),IT1=IT,DE1=DE)
compaTB <- dplyr::select(bind_cols(workTB, tmpSM), time,IT,IT1,DE,DE1)
compaTB
#> # A tibble: 17 × 5
#>     time    IT   IT1    DE   DE1
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002  59.2  60.0  68.8  68.5
#>  2  2003  60.1  60.4  68.4  68.4
#>  3  2004  61.7  60.9  67.9  68.8
#>  4  2005  61.5  62.0  69.4  69.5
#>  5  2006  62.4  62.1  71.1  71.1
#>  6  2007  62.7  62.7  72.9  72.6
#>  7  2008  62.9  62.3  74    73.6
#>  8  2009  61.6  61.9  74.2  74.5
#>  9  2010  61    61.3  75    75.3
#> 10  2011  61    61.0  76.5  76.0
#> 11  2012  60.9  60.4  76.9  76.9
#> 12  2013  59.7  60.3  77.3  77.3
#> 13  2014  59.9  60.1  77.7  77.7
#> 14  2015  60.5  60.7  78    78.1
#> 15  2016  61.6  61.4  78.6  78.6
#> 16  2017  62.3  62.3  79.2  79.2
#> 17  2018  63    62.4  79.9  79.3

A graphical output shows changes for “IT”, with original index in blue and smoothed index in red:

ggplot(compaTB, aes(x = time, y = IT)) +
  geom_point() +
  geom_line(colour = "navyblue") +
  geom_line(aes(x = time, y = IT1), colour = "red") +
  geom_point(aes(x = time, y = IT1), colour = "red", shape = 8)

Similarly for Germany, i.e. “DE”:

ggplot(compaTB, aes(x = time, y = DE)) +
  geom_point() +
  geom_line(colour = "navyblue") +
  geom_line(aes(x = time, y = DE1), colour = "red") +
  geom_point(aes(x = time, y = DE1), colour = "red", shape = 8)

A weight equal to 1 leaves data unchanged:

resSM <- smoo_dataset(dplyr::select(workTB,-time), leadW = 1,
                      timeTB= dplyr::select(workTB,time))
resSM <- dplyr::rename(resSM,IT1=IT, DE1=DE)
compaTB <- dplyr::select(dplyr::bind_cols(workTB, 
                     dplyr::select(resSM,-time)), time,IT,IT1,DE,DE1)
ggplot(compaTB, aes(x = time, y = IT)) +
  geom_point() +
  geom_line(colour = "navyblue") +
  geom_line(aes(x = time, y = IT1), colour = "red") +
  geom_point(aes(x = time, y = IT1), colour = "red", shape = 8)

A time window wider than \(3\) could be considered, but careful thought is recommended about how much economic and social change may happen in \(5\) consecutive years.

3.4 Moving Average smoother

Several alternative smoothing algorithms are available in R. Classical moving-average (MA) smoothers are also available from the caTools package.
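Without caTools, a centred moving average can be sketched with stats::filter(). The helper centred_ma() is hypothetical and, unlike the runmean calls below, simply returns NA wherever a full window of k values is not available.

```r
# Hypothetical helper, not part of convergEU or caTools: centred moving
# average of odd window length k.
centred_ma <- function(y, k = 3) {
  stopifnot(k %% 2 == 1)
  # sides = 2 centres the window; positions without a full window become NA
  as.numeric(stats::filter(y, rep(1 / k, k), sides = 2))
}

# IT employment rates 2002-2018 (emp_20_64_MS)
it <- c(59.2, 60.1, 61.7, 61.5, 62.4, 62.7, 62.9, 61.6, 61.0,
        61.0, 60.9, 59.7, 59.9, 60.5, 61.6, 62.3, 63.0)
ma3 <- centred_ma(it, 3)
ma7 <- centred_ma(it, 7)
round(ma3, 2)
```

Calling stats::filter() with its namespace avoids a clash with dplyr::filter(), which masks it when dplyr is attached.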

The emp_20_64_MS dataset is now chosen for example, first with Italy and then with Germany as member states of interest.

data(emp_20_64_MS)
cuTB <- dplyr::tibble(ITori =emp_20_64_MS$IT)
cuTB <- dplyr::mutate(cuTB,time =emp_20_64_MS$time)

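At the beginning and at the end of the series a full window of k observations is not available; the endrule argument of caTools::runmean controls how these tail values are treated (here the fourth option, “keep”, is selected):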


cuTB <- dplyr::mutate(cuTB, IT_k_3 = caTools::runmean(emp_20_64_MS$IT, k=3, 
        alg=c("C", "R", "fast", "exact")[4],
        endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
        align = c("center", "left", "right")[1]))

cuTB <- dplyr::mutate(cuTB, IT_k_5 = caTools::runmean(emp_20_64_MS$IT, k=5, 
        alg=c("C", "R", "fast", "exact")[4],
        endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
        align = c("center", "left", "right")[1]))

cuTB <- dplyr::mutate(cuTB, IT_k_7 = caTools::runmean(emp_20_64_MS$IT, k=7, 
        alg=c("C", "R", "fast", "exact")[4],
        endrule=c("mean", "NA", "trim", "keep", "constant", "func")[4],
        align = c("center", "left", "right")[1]))

myG <- ggplot(cuTB,aes(x=time,y=ITori))+geom_line()+geom_point()+
       geom_line(aes(x=time,y=IT_k_3),colour="red")+
       geom_point(aes(x=time,y=IT_k_3),colour="red")+
       #
       geom_line(aes(x=time,y=IT_k_5),colour="blue")+
       geom_point(aes(x=time,y=IT_k_5),colour="blue")+
       #
       geom_line(aes(x=time,y=IT_k_7),colour="orange")+
       geom_point(aes(x=time,y=IT_k_7),colour="orange")+
       theme(legend.position = c(.5, .5),
              legend.title = element_text(face = "bold"))

myG
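To make the role of the end rule concrete, the following base R sketch reimplements a centred running mean with two of the end rules offered by caTools::runmean(), "keep" and "mean", following the behaviour described in the caTools documentation. The function run_mean_sketch() and the toy vector are hypothetical illustrations, not part of convergEU or caTools, and the code favours clarity over speed.

```r
# Hypothetical helper, not part of convergEU or caTools: centred running
# mean of odd width k with the "keep" or "mean" end rule.
run_mean_sketch <- function(x, k, endrule = c("keep", "mean")) {
  endrule <- match.arg(endrule)
  n <- length(x)
  stopifnot(k %% 2 == 1, k <= n)
  k2 <- k %/% 2                        # half-bandwidth
  out <- rep(NA_real_, n)
  # interior points: full centred window of k observations
  for (i in (k2 + 1):(n - k2)) {
    out[i] <- mean(x[(i - k2):(i + k2)])
  }
  if (k2 > 0) {
    for (i in seq_len(k2)) {
      j <- n - i + 1                   # mirror position at the end
      if (endrule == "keep") {
        out[i] <- x[i]                 # copy the original values
        out[j] <- x[j]
      } else {
        out[i] <- mean(x[1:(i + k2)])  # shrinking window at the start
        out[j] <- mean(x[(j - k2):n])  # shrinking window at the end
      }
    }
  }
  out
}

x <- c(1, 2, 3, 4, 5, 10)   # toy series
run_mean_sketch(x, k = 3, endrule = "keep")
run_mean_sketch(x, k = 3, endrule = "mean")
```

Only the first and last values differ between the two calls: "keep" returns 1 and 10, while "mean" returns the averages of the two available observations at each end.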

For Germany, a similar implementation provides the following result:

cuTB <- dplyr::mutate(cuTB, DEori =emp_20_64_MS$DE)

# centred running means of width k = 3, 5, 7 for Germany (caTools::runmean)
cuTB <- dplyr::mutate(cuTB,
  DE_k_3 = caTools::runmean(emp_20_64_MS$DE, k = 3, alg = "exact",
                            endrule = "keep", align = "center"),
  DE_k_5 = caTools::runmean(emp_20_64_MS$DE, k = 5, alg = "exact",
                            endrule = "keep", align = "center"),
  DE_k_7 = caTools::runmean(emp_20_64_MS$DE, k = 7, alg = "exact",
                            endrule = "keep", align = "center"))

myG <- ggplot(cuTB,aes(x=time,y=DEori))+geom_line()+geom_point()+
       geom_line(aes(x=time,y=DE_k_3),colour="red")+
       geom_point(aes(x=time,y=DE_k_3),colour="red")+
       #
       geom_line(aes(x=time,y=DE_k_5),colour="blue")+
       geom_point(aes(x=time,y=DE_k_5),colour="blue")+
       #
       geom_line(aes(x=time,y=DE_k_7),colour="orange")+
       geom_point(aes(x=time,y=DE_k_7),colour="orange")+
       theme(legend.position = c(.5, .5),
              legend.title = element_text(face = "bold"))

myG

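The time series is so short that, with \(k=7\), the first three and the last three of the 17 yearly values cannot be centred in a full window of seven observations, so they are handled by the end rule (kept at their original values with endrule = "keep").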
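The arithmetic behind this remark is simple: with an odd window width k, the half-bandwidth is k %/% 2, so 2 * (k %/% 2) values are not covered by a full centred window. The hypothetical helper n_end_values() below computes this count for the 17 years (2002 to 2018) covered by emp_20_64_MS.

```r
# Hypothetical helper: number of values handled by the end rule (rather than
# by a full centred window) when smoothing n values with odd window width k.
n_end_values <- function(n, k) {
  stopifnot(k %% 2 == 1, k <= n)
  2 * (k %/% 2)
}

n_years <- 17   # 2002-2018, the years printed by ma_dataset()
sapply(c(3, 5, 7), function(k) n_end_values(n_years, k))
```

With k = 7, six of the 17 values, more than a third of the series, depend on the end rule.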

The same calculations are performed by the function ma_dataset() of the convergEU package:

cuTB <-  emp_20_64_MS[,c("time","IT","DE")]

ma_dataset(cuTB, kappa=3, timeName= "time")
#> $res
#> # A tibble: 17 × 3
#>     time    IT    DE
#>    <dbl> <dbl> <dbl>
#>  1  2002  59.2  68.8
#>  2  2003  60.3  68.4
#>  3  2004  61.1  68.6
#>  4  2005  61.9  69.5
#>  5  2006  62.2  71.1
#>  6  2007  62.7  72.7
#>  7  2008  62.4  73.7
#>  8  2009  61.8  74.4
#>  9  2010  61.2  75.2
#> 10  2011  61.0  76.1
#> 11  2012  60.5  76.9
#> 12  2013  60.2  77.3
#> 13  2014  60.0  77.7
#> 14  2015  60.7  78.1
#> 15  2016  61.5  78.6
#> 16  2017  62.3  79.2
#> 17  2018  63    79.9
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

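This function is less flexible than the step-by-step approach above, but it produces results in a standard format: a list with the smoothed data in res and the components msg and err.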

4 Scoreboards

The basis of a scoreboard is the raw values (levels) \(y_{m,i,t}\) of indicator \(i\) for member state (MS) \(m\) at time \(t\). Differences between consecutive years (changes) are also important, namely \[ y_{m,i,t} - y_{m,i,t-1}, \] so a function that summarizes both levels and changes is provided.
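A minimal base R sketch of the change computation follows, assuming a data frame with one time column and one column per member state (the same layout as emp_20_64_MS). The helper yearly_change() and the toy data are hypothetical and do not reproduce the internals of scoreb_yrs().

```r
# Hypothetical helper: year-on-year changes y[m,i,t] - y[m,i,t-1] for every
# member-state column of a data frame, after sorting by time.
yearly_change <- function(df, timeName = "time") {
  df <- df[order(df[[timeName]]), , drop = FALSE]
  ms <- setdiff(names(df), timeName)
  out <- df
  for (m in ms) {
    out[[m]] <- c(NA, diff(df[[m]]))   # the first year has no predecessor
  }
  out
}

toy <- data.frame(time = 2002:2005,
                  AA = c(60, 61, 63, 62),
                  BB = c(70, 70.5, 71, 72))
yearly_change(toy)
```

Because the first year has no previous value, its change is NA, which matches the first row of the sco_change component printed below.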

Considering the dataset emp_20_64_MS, such quantities are calculated as follows:

data(emp_20_64_MS)
resTB <- scoreb_yrs(emp_20_64_MS,timeName = "time")
resTB
#> $res
#> $res$sigma_conv
#> # A tibble: 17 × 9
#>     time stdDev     CV  mean devianceT elle1in elle1su elle2in elle2su
#>    <dbl>  <dbl>  <dbl> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1  2002   6.34 0.0939  67.5     1125.    64.3    70.7    61.2    73.9
#>  2  2003   5.95 0.0878  67.8      991.    64.8    70.7    61.8    73.7
#>  3  2004   5.70 0.0839  67.9      909.    65.1    70.8    62.2    73.6
#>  4  2005   5.54 0.0809  68.4      858.    65.7    71.2    62.9    74.0
#>  5  2006   5.57 0.0801  69.6      869.    66.8    72.3    64.0    75.1
#>  6  2007   5.47 0.0775  70.6      838.    67.9    73.3    65.1    76.1
#>  7  2008   5.36 0.0755  71.0      804.    68.3    73.7    65.6    76.3
#>  8  2009   5.03 0.0730  69.0      710.    66.5    71.5    64.0    74.0
#>  9  2010   5.24 0.0769  68.1      768.    65.5    70.7    62.9    73.4
#> 10  2011   5.59 0.0821  68.1      875.    65.3    70.9    62.5    73.7
#> 11  2012   5.98 0.0880  68       1002.    65.0    71.0    62.0    74.0
#> 12  2013   6.28 0.0922  68.0     1103.    64.9    71.2    61.8    74.3
#> 13  2014   5.98 0.0867  69.0     1000.    66.0    72.0    63.0    75.0
#> 14  2015   5.74 0.0820  70.0      922.    67.1    72.9    64.2    75.7
#> 15  2016   5.60 0.0789  71.0      879.    68.2    73.8    65.4    76.6
#> 16  2017   5.37 0.0741  72.5      808.    69.8    75.2    67.1    77.9
#> 17  2018   5.30 0.0717  73.8      786.    71.2    76.5    68.6    79.1
#> 
#> $res$sco_level
#> # A tibble: 17 × 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1  2002     4     3     1     5     4     3     5     3     2     2     4     3
#>  2  2003     4     2     1     5     4     3     5     3     2     2     4     3
#>  3  2004     3     3     1     5     3     3     5     3     2     3     4     3
#>  4  2005     3     3     1     5     3     3     5     4     2     3     4     3
#>  5  2006     3     2     2     5     3     3     5     5     2     3     4     3
#>  6  2007     3     2     3     5     3     3     5     5     2     3     4     3
#>  7  2008     4     2     3     5     3     4     5     5     2     3     4     3
#>  8  2009     4     3     3     5     3     5     5     3     2     2     4     3
#>  9  2010     5     3     2     5     3     5     5     3     2     1     4     3
#> 10  2011     5     3     2     4     4     5     5     3     1     1     5     3
#> 11  2012     5     3     2     3     4     5     5     4     1     1     5     3
#> 12  2013     5     3     2     3     4     5     4     4     1     1     4     3
#> 13  2014     4     3     2     3     4     5     4     4     1     1     4     3
#> 14  2015     4     3     2     3     4     5     4     5     1     1     4     3
#> 15  2016     4     2     2     3     5     5     4     4     1     1     3     3
#> 16  2017     4     2     3     3     5     5     4     5     1     1     3     3
#> 17  2018     3     2     3     3     5     5     4     5     1     1     3     3
#> # ℹ 16 more variables: HR <int>, HU <int>, IE <int>, IT <int>, LT <int>,
#> #   LU <int>, LV <int>, MT <int>, NL <int>, PL <int>, PT <int>, RO <int>,
#> #   SE <int>, SI <int>, SK <int>, UK <int>
#> 
#> $res$sco_change
#> # A tibble: 17 × 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
#>  2  2003     3     3     5     3     2     2     1     4     4     4     2     4
#>  3  2004     1     4     5     3     2     2     3     4     3     4     3     2
#>  4  2005     5     3     3     1     3     4     2     5     3     5     3     3
#>  5  2006     3     1     5     3     2     4     3     5     3     3     3     1
#>  6  2007     3     3     5     3     3     4     1     3     2     3     3     2
#>  7  2008     4     3     5     2     3     4     2     3     3     1     4     3
#>  8  2009     4     3     3     3     3     4     3     1     4     1     3     3
#>  9  2010     5     5     1     3     3     5     3     1     2     3     3     4
#> 10  2011     3     3     1     2     3     4     3     5     1     3     4     3
#> 11  2012     3     3     3     1     3     3     3     5     1     1     3     3
#> 12  2013     3     3     3     1     4     3     3     4     1     2     2     3
#> 13  2014     1     2     4     2     3     2     2     3     2     3     1     1
#> 14  2015     1     1     5     2     3     2     3     5     4     5     1     2
#> 15  2016     2     2     2     3     5     2     2     1     3     5     2     2
#> 16  2017     1     2     5     4     3     1     1     4     3     3     2     1
#> 17  2018     2     3     3     5     3     1     2     2     4     3     5     1
#> # ℹ 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#> 
#> $res$sco_level_num
#> # A tibble: 17 × 29
#>     time    AT    BE    BG    CY    CZ    DE    DK    EE    EL    ES    FI    FR
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  2002   0.5   0    -1     1     0.5   0     1     0    -0.5  -0.5   0.5     0
#>  2  2003   0.5  -0.5  -1     1     0.5   0     1     0    -0.5  -0.5   0.5     0
#>  3  2004   0     0    -1     1     0     0     1     0    -0.5   0     0.5     0
#>  4  2005   0     0    -1     1     0     0     1     0.5  -0.5   0     0.5     0
#>  5  2006   0    -0.5  -0.5   1     0     0     1     1    -0.5   0     0.5     0
#>  6  2007   0    -0.5   0     1     0     0     1     1    -0.5   0     0.5     0
#>  7  2008   0.5  -0.5   0     1     0     0.5   1     1    -0.5   0     0.5     0
#>  8  2009   0.5   0     0     1     0     1     1     0    -0.5  -0.5   0.5     0
#>  9  2010   1     0    -0.5   1     0     1     1     0    -0.5  -1     0.5     0
#> 10  2011   1     0    -0.5   0.5   0.5   1     1     0    -1    -1     1       0
#> 11  2012   1     0    -0.5   0     0.5   1     1     0.5  -1    -1     1       0
#> 12  2013   1     0    -0.5   0     0.5   1     0.5   0.5  -1    -1     0.5     0
#> 13  2014   0.5   0    -0.5   0     0.5   1     0.5   0.5  -1    -1     0.5     0
#> 14  2015   0.5   0    -0.5   0     0.5   1     0.5   1    -1    -1     0.5     0
#> 15  2016   0.5  -0.5  -0.5   0     1     1     0.5   0.5  -1    -1     0       0
#> 16  2017   0.5  -0.5   0     0     1     1     0.5   1    -1    -1     0       0
#> 17  2018   0    -0.5   0     0     1     1     0.5   1    -1    -1     0       0
#> # ℹ 16 more variables: HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>,
#> #   LU <dbl>, LV <dbl>, MT <dbl>, NL <dbl>, PL <dbl>, PT <dbl>, RO <dbl>,
#> #   SE <dbl>, SI <dbl>, SK <dbl>, UK <dbl>
#> 
#> 
#> $msg
#> NULL
#> 
#> $err
#> NULL

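where res is a list of four components: sigma_conv, with yearly summary statistics across member states and the bounds of the partition defined below (elle1in and elle1su at \(m \mp 0.5\cdot s\), elle2in and elle2su at \(m \mp 1\cdot s\)); sco_level and sco_change, with the interval of the partition (coded from 1 to 5) that each level and each change belongs to; and sco_level_num, with the numerical labels of the level intervals.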

Numerical labels are assigned as follows (see the Draft Joint Employment Report from the Commission and the Council), where \(m\) and \(s\) are the mean and the standard deviation of the levels (or of the changes) across member states in a given year:
* value \(-1\) if the original level or change satisfies \(y \leq m - 1\cdot s\);
* value \(-0.5\) if the original level or change satisfies \(m - 1\cdot s < y \leq m - 0.5\cdot s\);
* value \(0\) if the original level or change satisfies \(m - 0.5\cdot s < y \leq m + 0.5\cdot s\);
* value \(+0.5\) if the original level or change satisfies \(m + 0.5\cdot s < y \leq m + 1\cdot s\);
* value \(1\) if the original level or change satisfies \(y > m + 1\cdot s\).
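The rule above can be sketched in base R with cut(), whose right-closed intervals match the inequalities in the list. The helper score_labels() and the toy values are hypothetical. The sketch also assumes that \(s\) is the population (divide-by-\(n\)) standard deviation, which appears consistent with the sigma_conv output printed above: for 2002, \(\sqrt{1125/28} \approx 6.34\) with 28 member states. The interval codes 1 to 5 correspond to sco_level, and (code - 3) / 2 maps them to the values of sco_level_num (for example, AT in 2002 has code 4 and label 0.5).

```r
# Hypothetical helper: numerical labels -1, -0.5, 0, 0.5, 1 for the values y
# (levels or changes) of one year across member states.
# Assumption: s is the population (divide-by-n) standard deviation.
score_labels <- function(y) {
  m <- mean(y)
  s <- sqrt(mean((y - m)^2))
  stopifnot(s > 0)                     # breaks must be distinct
  breaks <- c(-Inf, m - s, m - 0.5 * s, m + 0.5 * s, m + s, Inf)
  # interval codes 1..5; right = TRUE gives intervals (a, b]
  code <- cut(y, breaks = breaks, labels = FALSE, right = TRUE)
  (code - 3) / 2                       # map codes 1..5 to -1, -0.5, 0, 0.5, 1
}

y <- 1:10   # toy values for ten member states in one year
score_labels(y)
```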

We note that the above summaries could also be represented as coloured plots within scoreboards (TO DO).

To compare a country with the EU average starting from raw data, the following steps are recommended:

library(ggplot2)
data(emp_20_64_MS)
selectedCountry <- "IT"
timeName <- "time"
myx_angle <- 45

# EU summary statistics (including the yearly mean) from 2002 to 2016
outSig <- sigma_conv(emp_20_64_MS, timeName = timeName,
           time_0 = 2002, time_t = 2016)
# y-axis limits over all member states and years
miniY <- min(emp_20_64_MS[, -which(names(emp_20_64_MS) == timeName)])
maxiY <- max(emp_20_64_MS[, -which(names(emp_20_64_MS) == timeName)])
# keep the years used in sigma_conv() and bind them to its output
estrattore <- emp_20_64_MS[[timeName]] >= 2002 & emp_20_64_MS[[timeName]] <= 2016
ttmp <- cbind(outSig$res,
              dplyr::select(emp_20_64_MS[estrattore, ], -dplyr::all_of(timeName)))

myG2 <- ggplot(ttmp, aes(x = .data[[timeName]])) +
  ggtitle(paste("EU average (black, solid) and country", selectedCountry,
                "(red, dotted)")) +
  # EU average
  geom_line(aes(y = .data[["mean"]]), colour = "black") +
  geom_point(aes(y = .data[["mean"]]), colour = "black") +
  # selected country; a colour set outside aes() is drawn as given
  geom_line(aes(y = .data[[selectedCountry]]), colour = "red", linetype = "dotted") +
  geom_point(aes(y = .data[[selectedCountry]]), colour = "red") +
  ylim(c(miniY, maxiY)) + xlab("Year") + ylab("Indicator") +
  scale_x_continuous(breaks = ttmp[[timeName]], labels = ttmp[[timeName]]) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = myx_angle))

myG2
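The comparison can also be expressed numerically as a yearly gap between the country and the average. The base R sketch below uses a hypothetical helper, country_gap(), on toy data. It assumes that the EU average is the unweighted mean across member states, which is an assumption about the mean column of sigma_conv() rather than a documented fact.

```r
# Hypothetical helper: yearly gap between one member state and the
# unweighted mean of all member states (the country itself included).
country_gap <- function(df, country, timeName = "time") {
  ms <- setdiff(names(df), timeName)
  avg <- rowMeans(as.matrix(df[, ms, drop = FALSE]))
  data.frame(time = df[[timeName]],
             country = df[[country]],
             average = avg,
             gap = df[[country]] - avg)
}

toy <- data.frame(time = 2015:2016,
                  AA = c(60, 62), BB = c(70, 72), CC = c(80, 76))
country_gap(toy, "AA")
```

A negative gap means that the country lies below the average in that year.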

Departures can also be shown graphically in terms of the partition defined above:

obe_lvl <- scoreb_yrs(emp_20_64_MS,timeName = timeName)$res$sco_level_num
# select subset of time
estrattore <- obe_lvl[[timeName]] >= 2009 & obe_lvl[[timeName]] <= 2016  
scobelvl <- obe_lvl[estrattore,]

my_MSstd <- ms_dynam( scobelvl,
                timeName = "time",
                displace = 0.25,
                displaceh = 0.45,
                dimeFontNum = 3,
                myfont_scale = 1.35,
                x_angle = 45,
                axis_name_y = "Countries",
                axis_name_x = "Time",
                alpha_color = 0.9
                )   
#> Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
#> of ggplot2 3.3.4.
#> ℹ The deprecated feature was likely used in the convergEU package.
#>   Please report the issue to the authors.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

my_MSstd

Convergence monitoring

Federico M. Stefanini, Nedka D. Nikiforova, Chiara Litardi, Eleonora Peruffo and Massimiliano Mascherini

2024-01-21

Index:

1 Datasets on EU member states

1.1 Locally accessible datasets

1.2 Metaresults and missing values check

1.3 Imputation for artificially generated missing values in the Eurofound database

2 On Convergence

2.1 Beta-convergence

2.2 Sigma-convergence

2.3 Gamma-convergence

2.4 Delta-convergence

2.5 Absolute change

2.6 Convergence measures on Eurofound lifesatisf indicator

2.6.1 Beta convergence

2.6.2 Sigma convergence

2.6.3 Gamma convergence

2.6.4 Delta convergence

3 Support functions

3.1 Summaries and clusters of countries

3.2 Imputing missing values using a straight line

3.3 Weighted average smoothing of a complete dataset

3.4 Moving Average smoother

4 Scoreboards

5 Country fiche

6 Indicator fiches

7 References

8 Appendix: clusters over time of EU MS
