88.8%: problems of the “Assessment of the current population of Ukraine”

Demography, Statistics

The data on the sex and age structure in the “Assessment of the current population of Ukraine”, conducted by the team of the Minister of the Cabinet of Ministers Dmytro Dubilet, were obtained by multiplying the State Statistics Service data (on sex and age structure) by 0.888 (or 88.8%). And this is not a “falsification” but part of the applied methodology.

This is an updated version of the article. Changes and additions were made after the explanation of the Minister of the Cabinet of Ministers of Ukraine Dmytro Dubilet on the methodology of the “Assessment of the current population of Ukraine”, which was carried out using available electronic registers, data of mobile operators, household surveys, data of the State Statistics Service.

On January 23, 2020, the Minister of the Cabinet of Ministers of Ukraine Dmytro Dubilet published a PDF file with the presentation of the “Assessment of the current population of Ukraine” in his Telegram channel.

It was reported that the total population of Ukraine (excluding the occupied territories of Donbas and Crimea) was 37 289 000 people as of December 1, 2019.

Note: the total population obtained as a result of the “Assessment” seems quite realistic and does not raise questions.

However, the presented sex and age structure seemed questionable. One of the methods used was described as the “Combined method of assessment of the current population (data on sex and age structure of the population plus data from the registers)”.

01

What was originally known about this method (from the first presentation):

  1. The shares of the population by sex and age structure were calculated according to the State Statistics Service data and state registers data.
  2. The number of persons aged 60+ was determined from the data of the State Register of Compulsory State Social Insurance. That includes pensioners from temporarily occupied territories of Ukraine who visit the government-controlled territory to collect their pensions.
  3. Data on the number of persons aged 60+ were extrapolated to data on sex and age structure.

02

Based on the available information, we assumed that the total population had been derived THROUGH the data obtained on the sex and age structure of the current population (because this was presented as one of the methods of estimating the population).

So we decided to check to what extent the sex and age structure of the population presented in the “Assessment” matches the data published on the website of the State Statistics Service of Ukraine.

We grouped the data that are freely available on the website of the State Statistics Service (Table: 0204. Distribution of the permanent population by sex, age groups and type of locality) by the age groups indicated in the presentation of Mr. Dubilet:

  • children, aged 0-14
  • early working age, aged 15-24
  • basic working age, aged 25-54
  • late working age, aged 55-64
  • elderly, aged 65+

The result is shown in the table:

https://docs.google.com/spreadsheets/d/1il-RCOHuy6kozVcR-V0de1vf50gQC6gieVp5Zgt7gvg/edit?usp=sharing
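The grouping step can be sketched in a few lines. This is only an illustration, assuming the source table is available by single year of age; the function names and the sample counts are hypothetical placeholders, not State Statistics Service figures.

```python
# Age groups used in the presentation: (label, lowest age, highest age or None if open-ended).
AGE_GROUPS = [
    ("0-14", 0, 14),
    ("15-24", 15, 24),
    ("25-54", 25, 54),
    ("55-64", 55, 64),
    ("65+", 65, None),
]


def group_label(age):
    """Return the label of the presentation age group that contains `age`."""
    for label, low, high in AGE_GROUPS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age out of range: {age}")


def aggregate(counts_by_age):
    """Sum counts given per single year of age into the five groups."""
    totals = {label: 0 for label, _, _ in AGE_GROUPS}
    for age, count in counts_by_age.items():
        totals[group_label(age)] += count
    return totals


# Placeholder input only (thousand persons); NOT real statistics.
sample = {0: 10, 14: 20, 15: 30, 24: 40, 25: 50, 54: 60, 55: 70, 64: 80, 65: 90, 100: 5}
grouped = aggregate(sample)
print(grouped)
```

The same aggregation would be run separately for men and women to obtain the sex-age cells compared in the rest of the article.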

Then we compared the data summarized by age groups with the data from the presentation of Mr. Dubilet. We were startled to find practically the same ratio in every sex-age group: about 88.8% (Dubilet’s data relative to the State Statistics Service data).
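The comparison amounts to dividing one table by the other, cell by cell, and looking at how far the ratios spread. A minimal sketch follows; the numbers are placeholders chosen only to illustrate the check, and the real figures are in the linked spreadsheet.

```python
def ratios(presented, official):
    """Ratio presented/official for every sex-age cell."""
    return {cell: presented[cell] / official[cell] for cell in presented}


# Placeholder values (thousand persons), NOT the published figures.
official = {("men", "0-14"): 1000.0, ("women", "0-14"): 950.0, ("men", "65+"): 500.0}
presented = {("men", "0-14"): 888.0, ("women", "0-14"): 844.0, ("men", "65+"): 444.0}

r = ratios(presented, official)
spread = max(r.values()) - min(r.values())

for cell, value in r.items():
    print(cell, f"{value:.4f}")
print(f"spread between largest and smallest ratio: {spread:.4f}")
```

If the structure had really been estimated independently, one would expect the ratios to differ noticeably between groups rather than cluster around a single value.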

03

We started to have a sneaking feeling that there was no “extrapolation” at all. There was a simple multiplication of the State Statistics Service data by a factor of 0.888 (or 88.8%). And the minor differences in the ratios are the result of rounding.

Hoping that this was still the fruit of our imagination and that it could not be that silly, we tried to reproduce the possible application of a factor of 0.888 to the State Statistics Service data. After several iterations (which took half an hour), we were able to reproduce with almost 100% accuracy the data published by Mr. Dubilet, which were claimed to be the result of so-called extrapolation.
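The reproduction attempt can be sketched as follows. This is an illustration under stated assumptions: the input value is a placeholder (only the 1740 and 1741 thousand figures for women aged 15-24 come from this article), and plain rounding to whole thousands stands in for the exact rounding steps, which are recorded in our spreadsheet.

```python
FACTOR = 0.888  # the constant discussed in the article


def scale(official, factor=FACTOR):
    """Multiply every cell by one constant and round to whole thousands.

    Python's round() is used for simplicity; this is an assumption,
    not the exact rounding sequence of the original calculation.
    """
    return {cell: round(value * factor) for cell, value in official.items()}


def mismatches(reproduced, published):
    """Cells where the reproduced value differs from the published one."""
    return {
        cell: (reproduced[cell], published[cell])
        for cell in published
        if reproduced[cell] != published[cell]
    }


# Placeholder official figures (thousand persons), NOT real statistics.
official = {("men", "0-14"): 1000.0, ("women", "15-24"): 1960.5}
# Hypothetical "published" table; 1740 is the figure quoted in the article.
published = {("men", "0-14"): 888, ("women", "15-24"): 1740}

diff = mismatches(scale(official), published)
print(diff)
```

An empty or nearly empty result across all cells is what suggests that a single constant, rather than an independent estimate, produced the published table.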

04

Note that in the second step, rounding to the hundredth was done with ROUNDUP, that is, to the nearest higher value (do not ask why, this is a mystery to us). Anyone can repeat these calculations. To make it easier, we provide free access to the table with our calculations:

https://docs.google.com/spreadsheets/d/1il-RCOHuy6kozVcR-V0de1vf50gQC6gieVp5Zgt7gvg/edit#gid=1131691965
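For readers who want to repeat the ROUNDUP step outside a spreadsheet, the sketch below mimics the spreadsheet function with Python's standard `decimal` module and contrasts it with ordinary half-up rounding. The helper names and the sample value are our own illustrative choices, not figures from the presentation.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_UP


def roundup(value, places=2):
    """Spreadsheet-style ROUNDUP: always round away from zero."""
    quantum = Decimal(10) ** -places  # e.g. Decimal('0.01') for places=2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_UP)


def round_half_up(value, places=2):
    """Ordinary rounding: halves go up, everything below a half goes down."""
    quantum = Decimal(10) ** -places
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


sample = "4.6611"  # arbitrary illustrative value
print(roundup(sample))        # 4.67
print(round_half_up(sample))  # 4.66
```

The two functions disagree whenever the discarded digits are non-zero but below one half, which is why the choice of ROUNDUP matters when trying to match published figures exactly.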

05

As can be seen from the tables, the result of our primitive calculations did not coincide with the advanced methods of processing and analysis of “big data” by Mr. Dubilet only for one sex-age group (women, aged 15-24). Instead of 1740 thousand people, we got 1741 thousand people.

But, at the same time, it should be noted that the totals by age group are inaccurate in Mr. Dubilet’s calculations. If you sum up the numbers for all the men in his presentation, you get 17 million 281 thousand, not 17 million 280 thousand people.

As a result of the analysis, suspicions arose that the “Assessment” had been falsified.

On February 5, 2020, Dmytro Dubilet published his reaction to our article, in which he called our publication a fake (“Yesterday a fake about the alleged falsification of the “Assessment” spread on the Internet”). But at the same time he confirmed that our conclusions were right (“It is unclear why one would present as a “sensation” what we wrote in the description of the methodology.”).

A notification was also posted on the website of the State Statistics Service (http://www.ukrstat.gov.ua/Noviny/new2020/zmist/novini/pr_ochnu.htm ) refuting something we had not actually questioned. Quote:

“The media reported on non-performance of work on estimating the current population of Ukraine as of December 1, 2019 (by regions of Ukraine) and obtaining an estimate of the current population of Ukraine 37 million 289 thousand people by ordinary mathematical operations (multiplied by 0.888).”

This is not the case, because our article was about obtaining data on the GENDER-AGE STRUCTURE of the current population in the presented “Assessment”. It can be assumed that the State Statistics Service meant possible reprints in the media, which accidentally (or intentionally) distorted the meaning of our publication about the detected anomaly.

The mentioned State Statistics Service notification did not disclose in any way the issue of calculating the sex-age structure.

On February 6, 2020, Dmytro Dubilet held a briefing on the methodology of the assessment of the current population of Ukraine. ( https://www.youtube.com/watch?v=ATzfEJYhNOQ )

Regarding the controversial issue of the sex and age distribution data having probably been obtained by multiplying the relevant State Statistics Service data by 0.888, Dmytro Dubilet said the following:

“We are not ready to just use the State Statistics Service “fur tree” [meaning the sex-age pyramid], we need to conduct additional research to understand if this “fur tree” is correct.

And so the work with big data, with many sources of information, began. What did we do? On the left you can see the mentioned demographic “fur tree” […] We took information from the tax system and built our “fur tree”. […] We took the “pyramid” from the State Tax Service and laid it over our “fur tree”. We had two pyramids, and we needed to check whether they matched. I remember the night when Pavlo, sitting to my right [probably Pavlo Polikarchuk, the founder of the “RATING. Business in official figures” project], began to write emotional messages like “wow, they coincided completely, to the thousandth after the decimal point.” If you think about it, it makes sense, because if the State Statistics Service maintains its pyramid on the basis of the data obtained from the registers, and the State Tax Service also maintains its pyramid on the basis of registers, it would be logical for them to coincide. The pyramid on the right does not take children into account, because the tax office has no information about children. Therefore we additionally had to overlay on this pyramid the information on children from the Ministry of Justice. Then, when we cleaned the State Tax Service database, we had two samples: one was just raw data from the State Tax Service (DPS), and the second was the data cleaned according to the criteria mentioned when I talked about the second method. We laid them over each other again, and, again, they coincided in structure to a thousandth.

When we did this pretty big job, we told ourselves that we now have a scientific basis to use the pyramid run by the State Statistics Service, not just to estimate the population, but to make a breakdown by age and sex.

What did we do:

  • We made sure that the sex and age structure maintained by the State Statistics Service is reliable. We did a great job of making sure that this pyramid was correct, and fortunately, it coincided completely.
  • We determined the number of the current population by taking the average value between the methods.
  • We distributed the determined number of the current population by sex and age structure.

What this work looked like is very easy to see on this slide. On the left is the percentage of the population in different groups. Then we impose this on the total number of available population that we have. And finally, multiplying this matrix by this number, we got a matrix based on how many people we have in different gender and age groups.”

Thus, Dmytro Dubilet confirmed that the proportion of the sex and age distribution of the population (according to the State Statistics Service) was directly transferred to the total population obtained during the “Assessment”. He confirmed that the procedure was part of a methodology and he dismissed allegations of fraud.
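Why does distributing a new total by the official shares look exactly like multiplying by a constant? If an official cell is o, the official total is O and the estimated total is T, then the distributed cell is T · (o / O) = o · (T / O), and T / O is the same number for every cell. The sketch below checks this numerically; the cell values are placeholders and the function names are ours.

```python
def distribute(official, estimated_total):
    """Split `estimated_total` across cells in proportion to the official cells."""
    official_total = sum(official.values())
    return {
        cell: estimated_total * value / official_total
        for cell, value in official.items()
    }


def scale(official, factor):
    """Multiply every official cell by the same constant."""
    return {cell: value * factor for cell, value in official.items()}


# Placeholder official cells (thousand persons), NOT real statistics.
official = {"men 0-14": 1000.0, "women 0-14": 950.0, "men 65+": 500.0}
official_total = sum(official.values())
estimated_total = 0.888 * official_total  # any estimate works; 0.888 mirrors the article

by_shares = distribute(official, estimated_total)
by_factor = scale(official, estimated_total / official_total)

same = all(abs(by_shares[c] - by_factor[c]) < 1e-9 for c in official)
print(same)
```

Both routes give the same table, so the described “matrix multiplication” and the multiplication by 0.888 are one and the same operation.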

When asked where the 0.888 coefficient came from, Dmytro Dubilet answered:

“If you divide the estimate we made by the State Statistics Service data (the official census), we get 0.88.”

The fact that Dmytro Dubilet admitted that they used such an approach to the assessment of sex and age structure does not mean that this method is acceptable.

Why can’t we simply transfer the proportion of sex and age groups from the available data of the State Statistics Service to the estimated number of the current population?

According to Andriy Protsyuk, an analyst and consultant at the Ukrainian Center for Social Data, there are many reasons for this:

“If we move to reality, where this pyramid (formed by multiplying the data of the State Statistics Service by 0.888) is true:

  • One year after the birth of children (and all births are registered in the system of the Ministry of Justice, and hence in the State Statistics Service), 11.2% (100% - 88.8%) of them somehow disappear.
  • Labor migrants are equally distributed across all age groups. Do people who go to Poland to pick tomatoes take with them a proportional number of children and pensioners, 11.2%?
  • Mortality, which is also fully registered, is attributed to fewer people, which means that life expectancy becomes ~12.6% lower than the official one.
  • According to the State Statistics Service, since the last census, 220,000 more people have arrived in Ukraine than left, including 90,000 more after 2014. The figures are fantastical, and nevertheless they are included in the sex and age pyramid of the State Statistics Service, which is taken as a basis for calculations.
  • At the beginning of 2019, according to the Pension Fund of Ukraine, there were 8.7 million pensioners (excluding the occupied territories of Donbas and Crimea), and the pension payments are divided among this number. If we take the “calculated” sex and age pyramid as reality, it turns out that there are 11.2% fewer pensioners, which means that either 11.2% of the annual pension fund is stolen or the real pension is 12.6% higher. In that case the Pension Fund of Ukraine understates the size of pensions in its statistics, and pensioners make themselves look poorer.”
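The two percentages in the quote above follow from the same factor: removing 11.2% of the people raises any fixed amount per remaining person by about 12.6%. A short check, using only the figures quoted above (the variable names are ours):

```python
FACTOR = 0.888  # share of the officially recorded population that remains

shortfall = 1 - FACTOR                 # people "missing" from every group
per_capita_increase = 1 / FACTOR - 1   # a fixed total spread over fewer people

pensioners_official = 8.7              # million, figure quoted in the article
pensioners_scaled = pensioners_official * FACTOR

print(f"missing share:       {shortfall:.1%}")
print(f"per-person increase: {per_capita_increase:.1%}")
print(f"pensioners if scaled: {pensioners_scaled:.2f} million")
```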

In fact, with 88.8% of the population, the sex and age pyramid will be significantly different from the State Statistics Service pyramid. And the difference depending on age groups will be parabolic. Deviations will be minimal (less than 5%) in the youngest and oldest age groups, and in the middle age groups – maximum (around 20%).

In addition to the sex and age structure in the “Assessment of the current population” and “0.888”, there are other questions regarding the estimation methodology.

Kyrylo Zakharov, Director of the Court on the Palm project, wrote on his Facebook page:

“A statistical survey of 26.7 thousand households was conducted on a “volunteer basis”. Also “data were collected taking into account the territory of residence, gender, age, social status of respondents.” Show me this army of volunteers.

Can the work of the State Statistics Service employees, who had to ask an additional question about the number of mobile phones in their survey, be called volunteering? Were the staff of the State Statistics Service informed that this was their volunteer work? Let’s suppose that one question with the answer takes 1 minute. 26.7 thousand answers were received. That takes 445 hours of working time. If it’s not volunteering, why have we written almost 56 working days down to zero?

Three mobile operators performed the calculations separately. The data were taken from February-March 2019, almost a year old. Why? One hypothesis is that they were collected for the purpose of the presidential campaign, or it could be related to the trade secrets of the operators. Was any adjustment of the data made, at least at the level of simple linear models? Or was the error caused by the outdated data estimated?

The share of mobile operators that provided data is 98% of the market. In addition, there are nuances with the wording of the question in the survey, the theoretical error of the survey, etc. Can the error of the result really be only 3%?

Why was it decided that the error could be estimated by comparing the results of the three methods?

According to the results of a representative survey, it was determined that the coverage of the population by mobile phones is 88.8% (one of the earlier explanations for the origin of the mysterious coefficient). And the average number of SIM cards is 1.21.

What is the average number of SIM cards per subscriber supposed to tell us if the distribution is not Gaussian but rather fits a Poisson distribution? Why not publish a histogram?”
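The working-time figures in Zakharov's comment are easy to verify. The one-minute duration is his assumption; the 8-hour working day is ours, chosen because it matches his “almost 56 working days”.

```python
answers = 26_700          # households surveyed, from the quote
minutes_per_answer = 1    # assumption stated in the quote
hours_per_working_day = 8 # our assumption

hours = answers * minutes_per_answer / 60
working_days = hours / hours_per_working_day

print(f"{hours:.0f} hours, about {working_days:.1f} working days")
```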

There are many questions about the methodology, and hence the reliability of the assessment.

The only way to obtain reliable data on the population of Ukraine is to conduct a full-fledged census.

Before the census, an information and awareness campaign on its importance should be carried out in order to make citizens more open and to improve the quality of the data. With such a record level of trust, the authorities could make the most of it. Why is the government doing so much to discredit the census?

 

 
