Exploratory data analysis

COVID-19 Data Analysis in Spain using Python

An analysis of COVID-19 data and of the disease's impact on health services.

Repository: https://github.com/octadelsueldo/Master_DS_CUNEF/tree/main/Python.

All the material for this project is available at the link above. The project answers the following questions:

Section 1.

1. Show a list of hospitals for which the incidence of COVID-19 is known. Which hospital has suffered the highest number of deaths? In which province is the hospital located?

2. Show a list of the provinces that have collected data on the evolution of Covid-19. For each province, we want to know the total number of registrations produced. Calculate the total number of registrations by Autonomous Community.

3. Show a list of the autonomous communities of Spain.
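
The first question can be sketched with pandas as below. This is only an illustration on made-up rows: the column names (hospital, provincia, nuevos_fallecimientos) are assumptions and may not match the real CSV files in the repository.

```python
import pandas as pd

# Hypothetical sample data; the real files and column names may differ.
hosp = pd.DataFrame({
    "hospital": ["Hospital A", "Hospital A", "Hospital B", "Hospital C"],
    "provincia": ["Soria", "Soria", "Burgos", "León"],
    "nuevos_fallecimientos": [3, 2, 7, 1],
})

# List of hospitals that appear in the data.
hospital_list = sorted(hosp["hospital"].unique())

# Total deaths per (hospital, province) pair, then the pair with the maximum.
deaths = hosp.groupby(["hospital", "provincia"])["nuevos_fallecimientos"].sum()
worst_hospital, worst_province = deaths.idxmax()

print(hospital_list)
print(worst_hospital, worst_province)
```

Grouping by both hospital and province keeps the province attached to the result, so no second lookup is needed.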

Section 2.

  1. Select the Covid-19 evolution data recorded at the Complejo Asistencial de Soria hospital.

  2. Select the Covid-19 evolution data registered in the province of Soria.

  3. Select the Covid-19 evolution data recorded on 2020-03-22.

  4. We want to study the incidence of Covid-19 by age group. Select the Covid-19 evolution data for the 70-79 age group.
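
Selections like these map onto boolean masks in pandas. A minimal sketch, assuming hypothetical column names (fecha, provincia, grupo_edad, casos) and invented rows:

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "fecha": pd.to_datetime(["2020-03-21", "2020-03-22", "2020-03-22"]),
    "provincia": ["Soria", "Soria", "Burgos"],
    "grupo_edad": ["70-79", "60-69", "70-79"],
    "casos": [4, 6, 9],
})

# Each comparison yields a boolean Series that selects the matching rows.
soria = df[df["provincia"] == "Soria"]
on_date = df[df["fecha"] == pd.Timestamp("2020-03-22")]
age_70_79 = df[df["grupo_edad"] == "70-79"]

print(soria, on_date, age_70_79, sep="\n\n")
```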

Section 3.

  1. We want to study the incidence of Covid-19 by age group. Select the Covid-19 evolution data for the 70-79 age group in the province with code SO.

  2. We want to study the evolution of Covid-19 by age group. Select the Covid-19 evolution data for the province with code SA and the 60-69 age group.

  3. We want to study the evolution of new deaths caused by Covid-19 in the different provinces. Select the Covid-19 evolution data for the province of Valladolid on '2020-05-10'.

  4. We want to study the evolution of new deaths caused by Covid-19 in the different provinces. Select the Covid-19 evolution data collected in the month of April.
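
When two conditions apply at once, the masks can be combined with `&`, and a month can be extracted through the `.dt` accessor. The sketch below uses invented rows and assumed column names (fecha, provincia, grupo_edad, fallecidos):

```python
import pandas as pd

# Hypothetical sample data; column names and province codes are illustrative.
df = pd.DataFrame({
    "fecha": pd.to_datetime(["2020-03-30", "2020-04-02", "2020-04-15", "2020-05-10"]),
    "provincia": ["SO", "SO", "SA", "VA"],
    "grupo_edad": ["70-79", "70-79", "60-69", "70-79"],
    "fallecidos": [1, 2, 3, 4],
})

# Parentheses are required: & binds more tightly than ==.
so_70_79 = df[(df["provincia"] == "SO") & (df["grupo_edad"] == "70-79")]
va_may_10 = df[(df["provincia"] == "VA") & (df["fecha"] == pd.Timestamp("2020-05-10"))]

# Rows recorded in April, whatever the day.
april = df[df["fecha"].dt.month == 4]

print(so_70_79, va_may_10, april, sep="\n\n")
```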

Section 4.

  1. In the file T_situacion-epidemiologica-coronavirus-provincias.csv you can find the daily evolution of patients with Covid-19 in several Spanish provinces. Calculate the total number of confirmed cases in the province of Palencia.

  2. Calculate the total number of deaths in the province of Zamora. Use the file T_situacion-epidemiologica-coronavirus-provincias.csv, where the daily evolution of patients with Covid-19 in several of the Spanish provinces is found.

  3. Calculate the total number of registrations in the province of Valladolid.

  4. Calculate the total number of male patients over 80 years of age in the province of Zamora (ZA).
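
These totals amount to filtering one province and summing a column. A small sketch on invented numbers, with assumed column names (provincia, casos_confirmados, fallecimientos):

```python
import pandas as pd

# Hypothetical daily records; the values are made up for illustration.
df = pd.DataFrame({
    "provincia": ["Palencia", "Palencia", "Zamora", "Zamora"],
    "casos_confirmados": [10, 5, 7, 3],
    "fallecimientos": [1, 0, 2, 1],
})

# .loc takes a row mask and a column label, so filter and select happen together.
palencia_cases = df.loc[df["provincia"] == "Palencia", "casos_confirmados"].sum()
zamora_deaths = df.loc[df["provincia"] == "Zamora", "fallecimientos"].sum()

print(palencia_cases, zamora_deaths)
```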

Section 5.

  1. The file T_situacion-enfermos-por-coronavirus-por-tranmos-sexo.csv collects data on patients by sex and age. Show the data for the province of León (LE), sorted by date and age group in ascending order.

  2. Show the data associated with the '40-49' age group, sorted by date and province in descending order. The file T_situacion-enfermos-por-coronavirus-por-tranmos-sexo.csv collects data on patients by sex and age.

  3. Show the evolution data of Covid-19 patients for the province of Soria. The data must be sorted by date in descending order and by hospital name in ascending order.

  4. Show the data for the provinces of the autonomous community of Andalusia, sorted by surface area in ascending order and by population in descending order.

  5. Show the evolution data of confirmed Covid-19 cases for the province of Ávila. The data must be sorted by date in descending order.
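
Mixed sort directions, as in the Soria question, can be expressed by passing a list to the `ascending` parameter of `sort_values`. The rows and column names below (fecha, hospital, provincia) are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data for a single province.
df = pd.DataFrame({
    "fecha": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-02", "2020-04-01"]),
    "hospital": ["Hospital B", "Hospital B", "Hospital A", "Hospital A"],
    "provincia": ["Soria"] * 4,
})

# Date descending, then hospital name ascending within each date.
ordered = (
    df.sort_values(["fecha", "hospital"], ascending=[False, True])
      .reset_index(drop=True)
)

print(ordered)
```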

Section 6.

  1. Add a new column called men_ratio to the dataframe associated with the file T_situacion-enfermos-por-coronavirus-por-tranmos-sexo.csv. The value of this column represents the percentage of sick men with respect to the total number of sick people.

  2. Add a new column called differences to the dataframe associated with the file T_situacion-enfermos-por-coronavirus-por-tranmos-sexo.csv. The value of this column represents the difference between sick men and sick women.

  3. Add a new column called new_hospitalized patients to the dataframe associated with the file T_situacion-de-hospitalizados-por-coronavirus.csv. The value of this column represents the sum of new ward admissions and new ICU admissions.
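
Derived columns such as these are element-wise arithmetic on existing columns. In this sketch the source columns (hombres, mujeres) are hypothetical names, and men_ratio is read as the share of men among all patients, which is an interpretation of the task rather than something the dataset confirms:

```python
import pandas as pd

# Hypothetical counts of sick men and women per row.
df = pd.DataFrame({"hombres": [30, 10], "mujeres": [20, 30]})

# Percentage of men among all patients in each row (assumed interpretation).
total = df["hombres"] + df["mujeres"]
df["men_ratio"] = 100 * df["hombres"] / total

# Signed difference: positive when there are more men than women.
df["differences"] = df["hombres"] - df["mujeres"]

print(df)
```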

Section 7.

  1. We want to know the average number of new patients admitted to the ward in each of the hospitals.

  2. We want to know the maximum number of new positive cases registered in each of the provinces.

  3. We want to know the maximum number of new positive cases registered on each of the dates.
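
Per-group statistics like these are a natural fit for `groupby`. A minimal sketch with invented values and assumed column names (fecha, provincia, hospital, nuevos_ingresos_planta, nuevos_positivos):

```python
import pandas as pd

# Hypothetical sample data; names and numbers are illustrative only.
df = pd.DataFrame({
    "fecha": ["2020-04-01", "2020-04-01", "2020-04-02", "2020-04-02"],
    "provincia": ["Soria", "Burgos", "Soria", "Burgos"],
    "hospital": ["Hospital A", "Hospital B", "Hospital A", "Hospital B"],
    "nuevos_ingresos_planta": [2, 4, 6, 8],
    "nuevos_positivos": [5, 12, 9, 7],
})

# One aggregate per group: mean per hospital, maximum per province and per date.
mean_by_hospital = df.groupby("hospital")["nuevos_ingresos_planta"].mean()
max_by_province = df.groupby("provincia")["nuevos_positivos"].max()
max_by_date = df.groupby("fecha")["nuevos_positivos"].max()

print(mean_by_hospital, max_by_province, max_by_date, sep="\n\n")
```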

Section 8.

  1. How many NaN values appear in the 'Position' column of the dataframe obtained from reading the file T_situacion-epidemiologica-coronavirus-provincias.csv?

  2. Replace the NaN values of the 'Position' column of the dataframe obtained from reading the file T_situacion-epidemiologica-coronavirus-provincias.csv with the value '0.0'.

  3. Eliminate the rows that contain a NaN value from the dataframe obtained from reading the file T_situacion-epidemiologica-coronavirus-provincias.csv.

  4. Select the date, hospital, discharge, and death columns from the dataframe associated with the file T_situacion-de-hospitalizados-por-coronavirus.csv and save the result in a file named nuevo_f1.csv.
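
The missing-value steps and the export can be sketched as follows. Apart from 'Position', the column names are assumptions, and the CSV is written to an in-memory buffer here instead of nuevo_f1.csv so that the example leaves no file behind:

```python
import io

import pandas as pd

# Hypothetical sample data; None becomes NaN in numeric columns.
df = pd.DataFrame({
    "provincia": ["Soria", "Burgos", "León"],
    "Position": [1.0, None, None],
    "casos": [5, 3, None],
})

# Count missing values in one column.
nan_count = df["Position"].isna().sum()

# Replace NaN in that column with 0.0, leaving the original frame untouched.
filled = df.assign(Position=df["Position"].fillna(0.0))

# Drop every row that contains at least one NaN.
complete_rows = df.dropna()

# Save a subset of columns as CSV; a file path could be passed instead of the buffer.
buffer = io.StringIO()
df[["provincia", "casos"]].to_csv(buffer, index=False)

print(nan_count, len(complete_rows))
print(buffer.getvalue())
```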

Section 9.

  1. Create a new column called size in the dataframe associated with the Poblacion_surefficie_comunidades.csv file. The value of this column will be 'P' for provinces with fewer than 100,000 inhabitants, 'M' for provinces with at least 100,000 but fewer than 1,000,000 inhabitants, and 'G' for provinces with 1,000,000 inhabitants or more.

  2. Create a new column called alarm in the dataframe associated with the file T_situacion-de-hospitalizados-por-coronavirus.csv. The value of this column will be green if the number of new deaths is less than 3, yellow if it is at least 3 but less than 10, and red if it is greater than or equal to 10.

  3. Create a new column called 'initials' in the dataframe associated with the file T_situacion-de-hospitalizados-por-coronavirus.csv. The value of this column is the initial letter of each province's name. For example, the initial of Burgos is 'B'.

  4. Create a new column called 'Numerical code' in the dataframe associated with the Poblacion_surefficie_comunidades.csv file. The value of this column is the numerical code of the Autonomous community. For example, for the Autonomous Community of Catalonia with code 'C09', its numerical code will be 9.
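
One way to build such threshold-based and string-derived columns is `pd.cut` together with the `.str` accessor. The sketch below is one possible approach, not necessarily the one used in the repository. The column names (codigo, poblacion) and the population figures are invented for illustration:

```python
import pandas as pd

# Hypothetical rows; populations are made-up round numbers, not real figures.
regions = pd.DataFrame({
    "codigo": ["C09", "C17", "C13"],
    "poblacion": [7_600_000, 95_000, 500_000],
})

# right=False makes each bin closed on the left: [0, 100k), [100k, 1M), [1M, inf).
regions["size"] = pd.cut(
    regions["poblacion"],
    bins=[0, 100_000, 1_000_000, float("inf")],
    labels=["P", "M", "G"],
    right=False,
)

# Drop the leading letter of the code and convert the rest to an integer.
regions["numerical_code"] = regions["codigo"].str[1:].astype(int)

print(regions)
```

The same `pd.cut` pattern with bins at 3 and 10 would cover the green, yellow and red alarm column.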

Section 10.

  1. Plot the daily evolution of the number of coronavirus deaths by province in Castilla y León.

  2. Plot the daily evolution of deaths in the Autonomous Community of Castilla y León.
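
Before plotting, the data can be reshaped so that each province becomes one column indexed by date. This sketch stops at the reshaping step and uses invented rows and assumed column names (fecha, provincia, fallecimientos):

```python
import pandas as pd

# Hypothetical daily deaths for two provinces.
df = pd.DataFrame({
    "fecha": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-02"]),
    "provincia": ["Soria", "Burgos", "Soria", "Burgos"],
    "fallecimientos": [1, 2, 3, 4],
})

# One row per date, one column per province.
by_province = df.pivot_table(
    index="fecha", columns="provincia", values="fallecimientos", aggfunc="sum"
)

# Regional total per day: sum across the province columns.
regional_total = by_province.sum(axis=1)

print(by_province)
print(regional_total)
```

With matplotlib installed, `by_province.plot()` would draw one line per province and `regional_total.plot()` a single line for the whole community.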