Category: Data Science

  • How to Install Python on Your Windows 10/11

    Before you can write programs in Python, it must be installed and correctly configured on your computer. This post guides you through installing Python and writing your first program, using the simple IDLE interface and the print function with real-life examples.

    Next, let’s begin with the following simple seven steps.

    The process of downloading and installing Python from a credible source.

    Downloading Python on Windows.
    A simple, easy-to-follow, step-by-step way to download Python on Windows 10/11.

    Downloading and installing the current Python release on your Windows computer securely is very easy. To get it done in a few minutes, follow the simple steps below.

    Step 1: Search for software setup using your browser

    Open any of your favorite browsers (Google Chrome, Microsoft Edge, Opera Mini, Apple Safari, or Brave) and search “Python download for Windows.” You will receive the search results shown in the screenshot below.

    Click “Download Python | Python.org,” the first option in the search results.
    Web search results on the Google Chrome browser.

    Click the first link for the Python Organization downloads, which will take you to the next web page in step 2 below. Alternatively, click this link: https://www.python.org/downloads/ to go directly to Python’s official downloads page.

    Step 2: Downloading the Python setup

    Once the next page opens after step 1 above, click the “Download Python” button below the topic “Download the latest version for Windows” circled on the screenshot below.

    Click the button named “Download Python 3.12.2.”
    Python’s official downloads page.

    You will be asked to save the setup file. Please choose the location on your computer where you want to keep it. For example, I stored the Python file in the Downloads folder for this guide, as shown in the screenshot below.

    Saving the Python setup file in your computer’s Downloads folder.

    After that, the download should start, and progress will be indicated on your browser’s download icon.

    Download in progress

    Wait for the setup to finish downloading. Note that the download time depends on your internet speed: the faster your connection, the shorter the wait.

    Step 3: Open the Python setup to install

    Next, after the setup finishes downloading, it is time to install it. Open the folder where you saved the setup file, then double-click the setup icon to start the installation process. The Python setup installation panel opens as shown below.

    If you look at position 1 on the screenshot, the version to be installed is 3.12.2, which is 64-bit. In addition, below the version topic is an installation statement guiding us on what we should choose next.

    The installation wizard instructions on your screen.

    Make sure you select the two checkboxes circled and labeled 2 in red on the screenshot above. Once ticked, each checkbox turns blue with a small check mark in the middle, as in the image below.

    Select the two checkboxes at the bottom of the installation wizard.

    The two options ensure you will not experience challenges using the Python environments after installation.

    Step 4: Starting the installation process

    Next, click the first option on the install menu, “Install Now.”

    Click the “Install Now” option to start the installation.

    Step 5: Managing the Microsoft User Account Control settings

    Next, if Microsoft User Account Control is enabled, you’ll be prompted to allow changes to your device. Once the prompt appears, do the following.

    On the “User Account Control” prompt menu on your screen asking “if you want to allow the app to make changes to your device”, click the “Yes” button.

    Select “Yes” when prompted to allow the application to make changes to your hard disk.

    Immediately, the installation process starts.

    The Python setup installation process and progress.

    Wait until the setup completes the installation.

    Step 6: Finishing the Python setup installation successfully

    After that, the installation progress changes to “Setup was successful” at the top, and a “Close” button is activated on the lower right side of the window. Click the “Close” button to finish the installation.

    Setup completes the installation successfully.

    Step 7: Verifying if the setup was correctly installed

    After closing the installation wizard, the next step is to confirm that Python was installed successfully. The quickest way to verify is with the Windows 10/11 command-line terminal, known as cmd.

    Wondering how to access the cmd terminal? If so, follow the method below.

    First, open the cmd interface by going to the search button (on your taskbar).

    Click the Windows search button.

    Secondly, type “cmd” in the search bar (position 1), and the best match will appear in the list below the search bar.

    Then, choose either the best-match result (position 2) or the “Open” option on the right side of the menu (position 3). After selecting one of them, the command-line interface opens in a moment.

    The Windows 10 command-line interface used to verify the installation.

    Thirdly, once it opens, type the command “python --version” and press “Enter” on the keyboard to display the currently installed version of Python. The results of executing the command are as follows.

    The command used to check the installed Python version, and its output.

    Hurray! You have Python 3.12.2 installed on your Windows operating system machine and are ready to use it.

    Note: When I first wrote this guide, the latest version was Python 3.12.2, so the version may be newer when you read it. Don’t worry: the guide installs the latest version of Python either way, so I promise you a seamless installation journey.
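    Besides the cmd check, you can also confirm the installed version from inside Python itself. A minimal sketch (any Python 3 release reports its version this way):

```python
# Confirm the installed Python version from within Python itself.
# This complements the "python --version" check in the cmd terminal.
import sys

# sys.version_info holds the release as a tuple, e.g. (3, 12, 2, 'final', 0).
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

# Any reasonably recent Python 3 works for following this guide.
assert major >= 3
```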

    Now, let’s access and use the Python IDE to write a program for the first time.

    Writing your first Python program code – “Hello World” with Python IDLE

    By default, installing Python also installs IDLE. The Integrated Development and Learning Environment (IDLE) is the primary IDE bundled with the Python programming language. (The acronym IDE stands for Integrated Development Environment.)

    So, how do you launch and use Python’s IDE to write your first program? Let me tell you something you may not know. The process is very simple and very effective. Are you excited to start writing Python programs? I’m pretty sure you are; therefore, let’s dive in.

    How do you open the IDLE and start writing program codes?

    I know you are eager to write your very first program, right? I know the feeling, too, because I’ve been in the same situation many times. Trust me, waiting for something new, especially something you are excited about, is not easy.

    However, before we begin, let me share one important tip about the Integrated Development and Learning Environment. It will help you get the most out of the Python IDLE coding environment and handle code of any size or complexity as your projects grow.

    Pro Tip

    If your code is small, write and test it directly in the Python Shell window when it opens. The process is simple, and the output appears just below the few lines of code. For example, see the code lines and outputs in the same window below.

    However, if you want to write a longer piece of code, open a new file by going to “File” at the top left and creating a new file from the menu. The new window opens with an advanced menu, enabling you to type and run long code directly. With that said, let us move on to even more interesting stuff.

    Are you ready to start writing your first Python Programming Language program? I know you are, but before that, let’s look at how to launch the Python IDLE on our Windows computers.

    2 Ways to access and open the Python IDLE interactive shell

    Generally, there are two main ways to open the Python coding shell: through the Windows Start menu, or by typing in the search bar after clicking the search button on the taskbar.

    A. Opening the IDLE shell using the Windows search button

    i. Click the search button next to the Windows icon.

    ii. Next, type IDLE in the search bar (position 1 on the screenshot below).

    Searching IDLE in the search bar.

    iii. Then, in the search results, click either the best-match result containing the name (position 2) or the “Open” button on the right side of the menu (position 3).

    iv. After clicking, the IDLE will launch in a moment, and you can begin writing your first program. Therefore, you should start writing on the first line where the cursor is blinking.

    Python IDLE opens immediately with the cursor blinking on the first line.

    v. Finally, when you finish typing the program, press “Enter” to execute it. For example, I typed the code line print("Hello! World") on the first line.

    Start typing your first line of code where the cursor is blinking.

    After that, I pressed the enter button on my keyboard, and the following output was displayed on the next line.

    First program output: “Hello! World”.

    Note: The interactive shell only executes small code pieces and displays the results below the code lines.

    Examples of simple code lines.
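    For instance, short lines like the following are exactly what the interactive shell is made for: type a line, press “Enter,” and the result appears immediately below it.

```python
# Each of these lines can be typed directly into the IDLE interactive shell.
print("Hello! World")            # the classic first program

total = 7 + 3                    # simple arithmetic
print(total)                     # prints 10

name = "Python"
print("I am learning " + name)   # joining (concatenating) text
```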

    B. Opening the Python IDLE shell using the Windows start menu

    On your desktop, go to the Windows start menu and click on it to open the start menu.

    The Windows Start menu.

    Next, in the Start menu, click the “All apps” extension button to open the main menu, which is arranged in alphabetical order.

    Next, scroll through the menu until you find the newly installed Python application under the items starting with the letter “P.”

    The newly installed Python folder and menu.

    After that, click it or the extension on the right side to open all the applications under the Python folder menu.

    Then, click the first item on the menu, IDLE (Python 3.13 64-bit). The IDLE Shell will open in a new window.

    Now that you know how to access and launch the IDLE shell, let us open it, write, and execute a long code.

    How to create and execute long code pieces using the IDLE shell

    Earlier in our guide, I mentioned that you can write and execute small pieces of code directly after opening the interactive shell. What if you want to write a long piece of code or write small ones in a more advanced environment? Here is how to achieve it in simple terms.

    First, open the IDLE shell using either of the methods discussed above. Then click the “File” menu at the top left of the shell window and, on the floating menu, select the “New File” option.

    Click the “File” menu, then choose “New File.”

    Secondly, when the new file opens, start typing your code.

    Finally, after creating your code, save the file as follows before executing it: On the top menu, click File and choose Save on the dropdown menu. The program will not work if you execute the file before saving it.

    Next, choose where to save the program file on your Windows computer. In this case, I saved the file on the desktop.

    Also, remember to save the file with the Python file extension (.py). This enables your computer to recognize and execute the program when you run it from the shell.
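    As an illustration, here is a small program you might type into the new file window and save as, say, first_program.py (the filename is just an example):

```python
# A short program suitable for saving as a .py file and running with
# Run > Run Module (or F5) in IDLE.

def greet(who):
    """Return a greeting for the given name."""
    return "Hello! " + who

message = greet("World")
print(message)  # prints: Hello! World
```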

    Note: Once you get used to shell programming, you can use the shortcuts displayed on each menu. For example, you only need to press the “Ctrl + S” combination to save the program.

    Once the file is saved, head to the top menu and click “Run.” Then choose the first option, “Run Module,” from the drop-down menu.

    Alternatively, you can press the F5 key on the keyboard to run the program directly.

    Summing up everything!

    This guide has covered everything important about downloading and using Python. By following it step by step, you are guaranteed to install a legitimate copy of the programming language on your Windows 10 or 11 computer. The procedure is simple: searching for the setup online, downloading it, and installing it.

    Therefore, you can follow this guide in your favorite browser to download and install the tool successfully. Secondly, you were also guided through accessing the IDLE coding shell and writing your first code. You can access it through:

    1. the Windows Start menu, by opening the “All apps” extension menu and scrolling until you see the Python menu; or
    2. the search button next to the Start menu, by typing the shell name IDLE and selecting it from the search results.

    Furthermore, through the guide, we’ve seen how to verify if the installation was successful and the version of Python. Additionally, we have also learned how to write both short and long code pieces using the IDLE shell. In conclusion, with this guide, you’ve just kickstarted your Python programming journey with us. More tutorials, guides, and courses are coming up in the near future. Stay tuned for more learning content and guides.

  • How to Develop a Simple Web Application for Your Data Science with Python in 2024

    Are you looking for a guide on how to build a web application for your data analysis project? Or are you just looking for a data analysis and machine learning project to inspire your creativity? If your answer is “Yes,” this step-by-step guide is for you. You’ve come to the right place, where we guide, inspire, and motivate people to venture into data science and ML (short for Machine Learning). That said, let’s move on with the project.

    The application landing page:

    Upon opening the application on the web, you will see the app name and tagline. They are listed below and shown in the screenshot after the name and tagline.

    Name:

    The Ultimate Career Decision-Making Guide: – Data Jobs

    Tagline:

    Navigating the Data-Driven Career Landscape: A Deep Dive into Artificial Intelligence (AI), Machine Learning (ML), and Data Science Salaries.

    Project introduction

    This is the second part of the project. As you can remember from the introductory part of the project, I divided the project into two parts. In phase one, I worked on the End-to-End data analysis part of the project using the Jupyter Notebook. As mentioned earlier, it formed the basis for this section – web application development.

    To recap: I performed the exploratory data analysis (EDA) process in step 1 and data preprocessing in step 2, and finally visualized the data in step 3. I then defined nine key analytical dimensions for this web application to share my insights with you and the world.

    I recommend you visit the application using the link at the end of this guide.

    What to expect in this Web Application building guide.

    This is a step-by-step guide on how to design and develop a web application for your data analysis and machine learning project. By the end, you’ll see how the author used the Python Streamlit library and a GitHub repository to create an excellent application, and you’ll gain the skills, confidence, and expertise to build great web apps using free and readily available tools.

    The Challenge

    Today, there are different challenges or issues that hinder people from venturing into the fields of Artificial Intelligence (AI), Machine Learning (ML), and Data Science. One primary challenge individuals face is understanding the nuanced factors that influence career progression and salary structures within these fields.

    Currently, the demand for skilled professionals in AI, ML, and Data Science is high, but so is the competition. For example, 2024 statistics show that job postings in data engineering have seen a 98% increase while AI role postings have surged 119% in 2 years. Additionally, 40% or 1 million new machine learning jobs are expected to be created in the next five years.

    These industry and market trends show increasing demand in AI, ML, and Data Science. The growth calls for more experts and professionals, so it pays to act now and take advantage of the trend. But how do you do that? The question is answered, directly or indirectly, in the subsequent sections of this guide.

    Navigating this landscape requires a comprehensive understanding of how various elements such as years of experience, employment type, company size, and geographic location impact earning potential. That’s why I developed this solution for people like you.

    The Solution: Web Application

    To address these challenges, the project embarks on a journey to demystify the intricate web of factors contributing to success in AI, ML, and Data Science careers. By leveraging data analytics and visualization techniques, we aim to provide actionable insights that empower individuals to make informed decisions about their career trajectories. Keep it here to learn more. Next, let’s look at the project objective.

    The project’s phase 2 objective.

    The goal is to create a simple web application using the Python Streamlit library to uncover patterns and trends that can guide aspiring professionals. The application will be based on comprehensive visualizations highlighting salary distributions prepared in phase 1 of the project.

    The project’s phase 2 implementation.

    A web application built with the Python Streamlit library and the GitHub repository
    Web Application Dashboard: A web application built with the Python Streamlit library and the GitHub repository

    Web Application: the streamlit library in Python

    Streamlit is a Python library that simplifies the process of building interactive web applications for data science and machine learning projects. In this project, it allowed me, as a developer, to create dynamic and intuitive user interfaces directly from Python scripts without needing additional HTML, CSS, or JavaScript.

    With Streamlit, users can easily visualize data, create interactive charts, and incorporate machine learning models into web applications with minimal code. For example, I used several Python functions in this project to generate insights from visualizations and present them in a simple web application.

    This library is particularly beneficial for data scientists and developers who want to share their work or deploy models in a user-friendly manner, enabling them to prototype ideas and create powerful, data-driven applications quickly. This statement describes my project’s intentions from the beginning to the end.

    Creating the web application in Microsoft VS Code.

    To achieve the project’s objective, I used six precise step-by-step procedures. Let us go through each of them in detail in the next section.

    Step 1: Importing the necessary libraries to build the web application

    The first thing to do is import the Python libraries required to load and manipulate data.

    The libraries imported for the project.

    These are the same libraries used in the end-to-end data analysis process in phase 1 of the project. It’s important to note that Streamlit has been added to this phase.

    Step 2: Setting the page layout configurations

    Next, I designed the application interface using the code snippet below. I defined and set the application name, loaded the image on the homepage, and created the “Targeted Audience” and “About the App” buttons.

    Step 3: Loading the engineered data.

    After creating the web application layout, I proceeded to load the data along with the newly engineered features. The two code snippets below show how to load data in Streamlit and ensure that it remains the same every time it is loaded.

    Step 4: Create the web application “Raw Data” & “Warning: User Cases” display buttons

    The “Raw Data” button was designed to ensure a user can see the engineered dataset with the new features on the Web Application. Similarly, the “Warning: User Cases” button was created to warn the user about the application’s purpose and its limitations.

    Step 5: Insights generation and presentation in the Web Application.

    There were nine insights and recommendations. The code snippets below show how each insight was developed. Visit the application to view them by clicking the link given below.

    Step 6: Web application user appreciation note

    After exploring the 9 insights, the summary section congratulates the App user. Additionally, it tells them what they have done and gained by using my application.

    Finally, it bids them goodbye and welcomes them back to read more on the monthly updated insights.

    Key analytical dimensions in the web application

    Building a web application.

    As listed below, I generated nine insights based on the project phase 1 data visualizations. Each analytical dimension is accompanied by the code snippet developed to surface that insight in the web application interface. After reading them, I recommend you click the link at the end of the post to visit the deployed application and see for yourself what the Streamlit library, a simple and freely available tool, makes possible.

    1. Employment Type: Unraveling the nuances of salaries in full-time, part-time, contract, or freelance roles.

    2. Work Years: Examining how salaries evolve over the years.

    3. Remote Ratio: Assessing the influence of remote work arrangements on salaries.

    4. Company Size: Analyzing the correlation between company size and compensation.

    5. Experience Level: Understanding the impact of skill proficiency on earning potential.

    6. Highest Paid Jobs: Exploring which job category earns the most.

    7. Employee Residence Impacts on Salary: Investigating the salary (KES) distribution based on the country where the employee resides.

    8. Company Location: Investigating geographical variations in salary structures.

    9. Salary_in_KES: Standardizing salaries to a common currency for cross-country comparisons.
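    To illustrate how such a dimension translates into code, here is a small pandas sketch (with made-up numbers) of the kind of aggregation behind the “Experience Level” insight:

```python
import pandas as pd

# Made-up sample rows; the real analysis uses the full salaries dataset.
df = pd.DataFrame({
    "experience_level": ["EN", "MI", "SE", "SE", "EX"],
    "salary_in_usd": [70000, 95000, 130000, 150000, 190000],
})

# Median salary per experience level.
medians = df.groupby("experience_level")["salary_in_usd"].median()
print(medians)
```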

    Conclusion

    In conclusion, by examining the critical analytical dimensions, the project seeks to provide a nuanced perspective on the diverse factors shaping salaries in the AI, ML, and Data Science sectors. Armed with these insights, individuals can navigate their career paths with a clearer understanding of the landscape, making strategic decisions that enhance their success in these dynamic and high-demand fields.

    I recommend you access the guidelines with an open mind and not forget to read the user case warning. Do not wait any longer; access “The Ultimate Career Decision-Making Guide: – Data Jobs” web application and get the best insights.

    The Streamlit Application Link: It takes you to the final application deployed using the Streamlit Sharing platform – Streamlit.io. https://career-transition-app-guide–data-jobs-mykju9yk46ziy4cagw9it8.streamlit.app/

  • End-to-End Data Analysis Project with Source Codes: Your Ultimate Guide

    Are you looking for an End-to-End Data Analysis Project with source codes to inspire and guide you in your project? Look no further because you have come to the right place. This project is an excellent step-by-step tutorial to help you accomplish your data science and machine learning (ML) project. Therefore, its originality and uniqueness will greatly inspire you to innovate your own projects and solutions.

    1.0. Introduction

    In the ever-evolving landscape of technology, the allure of Artificial Intelligence (AI), Machine Learning (ML), and Data Science has captivated the ambitions of professionals seeking career transitions or skill advancements.

    However, stepping into these domains has its challenges. Aspiring individuals like me often confront uncertainties surrounding market demands, skill prerequisites, and the intricacies of navigating a competitive job market. Thus, in 2023, I found myself in a data science and machine learning course as I tried to find my way into the industry.

    1.1 Scenario Summary

    Generally, at the end of any learning program or course, a learner has to demonstrate that they meet all the criteria for successful completion, usually by way of a test. The final test serves certification, graduation, and subsequent release to the job market, so a test, exam, or project deliverable is administered to the student. Certification then allows one to practice the acquired skills in real-life settings such as company employment or startups.

    Applying the above to myself, I certified at the end of my three-month intermediate-level course in 2023. The course is “Data Science and Machine Learning with the Python Programming Language,” offered by the Africa Data School (ADS) in Kenya and other African countries.

    At the end of the ADS course, the learner has to work on a project that meets the certification and graduation criteria. From the college guidelines, to graduate I had to work on either:

    1. An end-to-end data analysis project, OR
    2. An end-to-end machine learning project.

    The final project product was to be presented as a Web App deployed using the Streamlit library in Python. To achieve optimum project results, I performed it in two phases: the data analysis phase and the App design and deployment phase.

    1.2. End-to-End data analysis project background: What inspired my project

    Last year, 2023, I found myself at a crossroads in my career path. As a result, I didn’t have a stable income or job. For over six and a half years, I have been a hybrid freelancer focusing on communication and information technology gigs. As a man, when you cannot meet your financial needs, your mental and psychological well-being is affected. Thus, things were not good or going well for me socially, economically, and financially. For a moment, I considered looking for a job to diversify my income to cater to my nuclear family.

    Since I mainly work online and look for new local gigs after my contracts end, I started looking for ways to diversify my knowledge, transition into a new career, and subsequently increase my income. From my online writing gigs and experience, I observed a specific trend over time and identified a gap in the data analysis field. Let us briefly look at how I spotted the market gap in data analysis.

    1.2.1. The identified gap

    I realized that data analysis gigs and tasks that required programming knowledge were highly paid. However, only a few people bid on them on different online job platforms. The big question was why data analysis jobs, especially those requiring a programming language, stayed so long on the platforms with so few bids. Examples of the programming languages required in most of those data analysis jobs include Python, R, SQL, Scala, MATLAB, and JavaScript. The list is not exhaustive; more languages can be found online.

    That phenomenon prompted me to do some research. I concluded that many freelancers, myself included, lacked the programming skills for data analysis and machine learning. To venture into the new field and take advantage of the gap, I needed to learn and gain new skills.

    However, I needed guidance to take advantage of the market gap and transition into the new data analysis field with one of the programming languages. I did not readily find one, so I decided to take a course to gain all the basic and necessary skills and learn the rest later.

    Following strong intuition coupled with online research about data science, I landed at ADS for a course in Data Science and Machine Learning (ML) with the Python Programming Language. It is an instructor-led intermediate course with all the necessary learning resources and support provided.

    Finally, at the end of my course, I decided to build a project that would help people like me make the right decisions. It is a hybrid project: it uses end-to-end data analysis skills and machine learning techniques to keep it current with financial market rates.

    I worked on it in two simple and straightforward phases:

    1.2.2. Phase 1: End-to-End Data Analysis

    Dataset Acquisition, Analysis, and Visualization using the Jupyter Notebook and Anaconda.

    1.2.3. Phase 2: App Design and Deployment

    – Converting the Phase 1 information into a Web App using the Streamlit library.

    Next, let me take you through the first phase. In any project, it is important to start by understanding its overall objective. By comprehending the goal of the project, you can determine if it fits your needs. It is not helpful to spend time reading through a project only to realize that it is not what you wanted.

    Therefore, I’ll start with the phase objective before moving on to the other sections of the project’s phase one.

    1.3. Objective of the end-to-end data analysis project phase 1

    The project analyzes and visualizes a dataset encompassing global salaries in the AI, ML, and Data Science domains to provide you with helpful data-driven insights to make the right career decisions. It delves into critical variables, including the working years, experience levels, employment types, company sizes, employee residence, company locations, remote work ratios, and salary in USD and KES of the dataset. Thus, the final results were useful data visualizations for developing a web application.

    2.0. The End-to-End Data Analysis Process.

    The first phase was the data analysis stage, where I searched and obtained a suitable dataset online.

    Step 1: Exploratory Data Analysis (EDA) Process

    2.1 Dataset choice, collection, description, and loading.

    The project’s data is obtained from the ai-jobs.net platform. The link used to load the data points to the CSV file on the platform’s landing page; the dataset can also be accessed through Kaggle. Since the data is updated weekly, the link enables continuous weekly data fetching so the application stays current with global payment trends.

    Dataset Source = https://ai-jobs.net/salaries/download/salaries.csv
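    Loading the CSV with pandas is a one-liner against that URL. The sketch below uses a tiny inline sample with the same columns so it runs without a network connection; in the project, the URL is passed straight to pd.read_csv:

```python
import io

import pandas as pd

# In the project: df = pd.read_csv("https://ai-jobs.net/salaries/download/salaries.csv")
# Self-contained stand-in with the same 11 columns and two made-up rows:
sample_csv = """\
work_year,experience_level,employment_type,job_title,salary,salary_currency,salary_in_usd,employee_residence,remote_ratio,company_location,company_size
2024,SE,FT,Data Scientist,150000,USD,150000,US,100,US,M
2024,EN,FT,Data Analyst,60000,USD,60000,KE,0,KE,S
"""
df = pd.read_csv(io.StringIO(sample_csv))
print(df.head())  # first rows, as in the notebook
```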

    2.1.1 Raw dataset description

    The dataset contained 11 columns with the following characteristics:

    • work_year: The year the salary was paid.
    • experience_level: The experience level in the job during the year.
    • employment_type: The type of employment for the role.
    • job_title: The role worked during the year.
    • salary: The total gross salary amount paid.
    • salary_currency: The currency of the salary paid, as an ISO 4217 currency code.
    • salary_in_usd: The salary in USD (FX rate divided by the average USD rate for the respective year via data from fxdata.foorilla.com).
    • employee_residence: Employee’s primary country of residence as an ISO 3166 country code during the work year.
    • remote_ratio: The overall amount of work done remotely.
    • company_location: The country of the employer’s main office or contracting branch as an ISO 3166 country code.
    • company_size: The average number of people that worked for the company during the year.

    2.1.2 Data loading for the end-to-end data analysis process

    First, I imported all the necessary libraries and modules to load, manipulate, and visualize the data in cell 1 of the Jupyter Notebook.

    Importing the required Python libraries.

    Then, I loaded the data using Pandas in cell 2 and received the output below. The Pandas “.head()” function displayed the first five rows of the dataset.

    Using Pandas to load the dataset.
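The loading code appears only as screenshots in the post, so here is a minimal sketch of what those two cells might contain. In the notebook the URL https://ai-jobs.net/salaries/download/salaries.csv would be passed to `pd.read_csv`; the inline two-row sample below is hypothetical and only keeps the sketch self-contained.

```python
import io

import pandas as pd

# In the notebook, the dataset is read straight from the source URL:
# salaries = pd.read_csv("https://ai-jobs.net/salaries/download/salaries.csv")
# A small hypothetical sample stands in for it here:
sample_csv = io.StringIO(
    "work_year,experience_level,employment_type,job_title,salary,"
    "salary_currency,salary_in_usd,employee_residence,remote_ratio,"
    "company_location,company_size\n"
    "2023,SE,FT,Data Scientist,150000,USD,150000,US,100,US,M\n"
    "2023,MI,FT,Data Analyst,90000,USD,90000,GB,0,GB,L\n"
)
salaries = pd.read_csv(sample_csv)

# Display the first five rows of the dataset
print(salaries.head())
```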

    After loading the salaries dataset from the URL, I used the Pandas library to study it. I analyzed the dataset’s structure, basic statistics, numerical columns, and any null values present to understand its composition before proceeding with the analysis. The results showed that there were:

    i. Four columns out of the total eleven with numerical data.

    Four columns with numerical data.
    All columns with numerical data types in the dataset were obtained.

    ii. Eleven columns in total containing 14,373 entries: four with numerical data and seven with object data types.

    Data type descriptions: four columns had int64 datatypes and seven columns had object datatypes.

    iii. There was no missing data in the fields of the 11 columns of the dataset. You can confirm this in the screenshot below.

    There were no null fields, since all columns contained a sum of zero null counts.
    (1) The dataset did not have any null field among the 11 columns present.
    There were no null fields, since all columns contained full non-null counts.
    (2) There were 14,374 non-null entries across the int64 and object columns.
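The inspection steps described above (structure, basic statistics, and null counts) can be sketched as follows; the tiny DataFrame is a hypothetical stand-in for the real salaries dataset.

```python
import pandas as pd

# Hypothetical stand-in for the salaries dataset
salaries = pd.DataFrame({
    "work_year": [2023, 2023],
    "salary_in_usd": [150000, 90000],
    "remote_ratio": [100, 0],
    "job_title": ["Data Scientist", "Data Analyst"],
})

salaries.info()                 # column dtypes and non-null counts
print(salaries.describe())      # basic statistics for the numerical columns
print(salaries.isnull().sum())  # per-column null counts (all zero here)

# Which columns hold numerical data?
numeric_cols = salaries.select_dtypes(include="number").columns
print(list(numeric_cols))
```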

    2.2. Conclusions – EDA Process.

    Based on the above results, the dataset does not contain any missing values, and the categorical and numerical datatypes are well organized, as shown in the outputs above. The dataset has eleven columns: four with integer datatypes and seven with object datatypes. Therefore, the data is clean, organized, and ready for use in the analysis phase of my project.

    Step 2: Data Preprocessing

    The main preprocessing activities performed were dropping the unnecessary columns, handling the categorical data columns, and feature engineering.

    2.2.1. Dropping the unnecessary columns

    Dropping the unnecessary columns.

    The columns dropped were salary and salary_currency, for one main reason: the salary column held amounts in different currencies depending on employee residence and company location, and those amounts had already been converted into USD. The two columns were therefore unnecessary, because I only needed the salary amount in a single currency.
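The dropping step is shown as a screenshot in the post; a minimal sketch of it might look like this (the sample row is hypothetical):

```python
import pandas as pd

salaries = pd.DataFrame({
    "salary": [180000],
    "salary_currency": ["USD"],
    "salary_in_usd": [180000],
    "job_title": ["ML Engineer"],
})

# Drop the two redundant columns; salary_in_usd already holds one currency
salaries = salaries.drop(columns=["salary", "salary_currency"])
print(salaries.columns.tolist())
```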

    2.2.2. Handling the categorical data columns

    I developed a code snippet summarizing and displaying all the categorical columns in the salaries dataset. The first five entries were printed out and indexed from zero, as shown in the sample below.
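The summarizing snippet itself is not reproduced in the post, but selecting and previewing the categorical (object-typed) columns with pandas might look like this sketch; the sample values are hypothetical.

```python
import pandas as pd

salaries = pd.DataFrame({
    "experience_level": ["SE", "MI"],
    "employment_type": ["FT", "PT"],
    "salary_in_usd": [150000, 60000],
})

# Keep only the object-typed (categorical) columns
categorical = salaries.select_dtypes(include="object")

# Print the first entries, indexed from zero
print(categorical.head())
```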

    2.2.3. The engineered features in the end-to-end data analysis process

    Making the data in the dataset more meaningful and valuable to the project is crucial. Therefore, I engineered two crucial new features in the dataset: full name labeling and salary conversion. Now that you have a clue about the engineered features, let me describe how each one came about.

    2.2.3.1. Full Name Labeling:

    Initially, the categorical values in the dataset were written in short form using initials. For example, FT was the short form for full-time. Thus, I took all the values written in short form, wrote them out in full, and appended the initials at the end. For example, I changed “FT” to “Full Time (FT)”. This ensured proper labeling and comprehension, especially during data visualization.

    The Python code snippet below was used for full naming.
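The original snippet appears as a screenshot; a sketch of the mapping for the employment_type column could look like this (the other columns would be handled the same way):

```python
import pandas as pd

salaries = pd.DataFrame({"employment_type": ["FT", "PT", "CT", "FL"]})

# Map each short code to its full name with the initials appended
employment_map = {
    "FT": "Full Time (FT)",
    "PT": "Part Time (PT)",
    "CT": "Contract (CT)",
    "FL": "Freelance (FL)",
}
salaries["employment_type"] = salaries["employment_type"].map(employment_map)
print(salaries["employment_type"].tolist())
```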

    2.2.3.2. Salary Conversion:

    The initial salary column was in US dollars. Just like the previous feature, I came up with a method of converting the “salary_in_usd” column into Kenyan Shillings, naming the new column “Salary_in_KES.” Since the dataset is updated weekly, the conversion process was automated: a function requests the current USD-to-KES exchange rate and multiplies it by the salary values in dollars to get the salary in Kenyan money.

    The function uses an API key and the base URL of a service from which it requests the current exchange rate, and it prints the rate in the output.

    Then, the function multiplies the obtained exchange rate by the salary in USD to create a new column named “Salary_in_KES.” The screenshot below shows the new column, circled in red.
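The conversion function is shown only as a screenshot, and the post does not name the exchange-rate service. The sketch below is therefore hypothetical throughout: the API key, base URL, and JSON shape are placeholders, and the rate used in the usage example is hard-coded rather than fetched.

```python
import pandas as pd
import requests

API_KEY = "YOUR_API_KEY"                      # placeholder; the real key is not in the post
BASE_URL = "https://api.example.com/latest"   # placeholder exchange-rate endpoint


def fetch_usd_to_kes_rate():
    """Request the current USD -> KES rate (sketch; endpoint and JSON shape assumed)."""
    response = requests.get(
        BASE_URL, params={"apikey": API_KEY, "base": "USD", "symbols": "KES"}
    )
    rate = response.json()["rates"]["KES"]
    print(f"Current exchange rate: 1 USD = {rate} KES")
    return rate


def add_kes_column(df, rate):
    """Multiply the USD salaries by the exchange rate to create Salary_in_KES."""
    df["Salary_in_KES"] = df["salary_in_usd"] * rate
    return df


# Usage with a hard-coded rate; normally rate = fetch_usd_to_kes_rate()
salaries = pd.DataFrame({"salary_in_usd": [100000]})
salaries = add_kes_column(salaries, rate=137.03)
print(salaries)
```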

    Therefore, every time the data-jobs guide application is launched, the process will be repeated, and the output will be updated accordingly.

    Next, let us briefly prove that the automation really occurs for both the dataset and the exchange rate above.

    2.2.3.3. Proof that the automated processes are effective in the project

    This was proven during the end-to-end data analysis process and the web application development, because the current value was printed out every time the data analysis Jupyter Notebook was opened and the cell was run.

    Exchange rate automation confirmation

    As mentioned earlier, the first output was captured in November 2023, when the exchange rate was 1 USD = 156.023619 KES.

    As I write this post in March 2024, the feature returns an exchange rate of 137.030367 KES. See the screenshot below.

    Let us find the difference by subtracting the current exchange rate from the initial one: 156.023619 – 137.030367 = 18.993252 KES. At this moment, the dollar has depreciated against the Shilling by approximately 19 KES.

    Guide Publication Date

    As you may have noticed from the beginning of this guide, I published it in October, yet I noted above that I was writing in March 2024 when calculating the difference. Both are true: I pulled the whole site down for unavoidable reasons, and I am now recreating the posts. I am keeping the original results to maintain consistency, and I will update the guide again later with 2024 data.

    It is important to note that the automated process runs continuously.

    Proof that the dataset is updated weekly

    The dataset is updated whenever the main data source is updated. To prove this, the total number of entries in the dataset should increase over time. For example, the screenshot below shows 10,456 entries in November 2023.

    Similarly, the following screenshot shows the entry count in March 2024. The increase in entries confirms that changes in the source are automatically reflected in our project.

    Next, let us find the difference: 14,374 updated entries – 10,456 initial entries = 3,918 additional entries.

    2.2.3.4. Employee Company Location Relationship:

    I created a new dataset column named “employee_company_location.”

    The program checks and indicates in the new column whether an employee comes from the company’s country. The value is true when the employee_residence and company_location country codes in the dataset are the same. For example, in the screenshot below, the first person resided in a country different from the company location.
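The comparison described above is a straightforward element-wise equality check in pandas; a sketch with hypothetical rows:

```python
import pandas as pd

salaries = pd.DataFrame({
    "employee_residence": ["US", "GB", "KE"],
    "company_location": ["US", "US", "KE"],
})

# True where the employee lives in the same country as the company
salaries["employee_company_location"] = (
    salaries["employee_residence"] == salaries["company_location"]
)
print(salaries)
```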

    Step 3: Data Visualization

    Here, we are at the last step of phase 1. I hope you have already learned something new and are getting inspired to jump-start your end-to-end data analysis project. Let me make it even more interesting, energizing, and motivating in the next graphical visualization stage. In the next section, I’m going to do some amazing work, letting the data speak for itself.

    I know you may ask yourself: how? Don’t worry, because I will take you through it step by step. We let the data speak by turning it into precise and meaningful statistical visuals, such as bar charts, pie charts, and line graphs.

    In this project, I developed visualizations along nine critical dimensions. The accompanying screenshot figures show the code snippets I developed to process the data and visualize each dimension.

    1. Employment Type:

    Unraveling the significant salary differences based on employment type. The types present in the data were full-time (FT), part-time (PT), contract (CT), and freelance (FL).

    Visualization number 1

    Code snippet for the visualization above.
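The snippet itself is only a screenshot; a bar chart of average salary per employment type might be sketched as below. The sample rows and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

salaries = pd.DataFrame({
    "employment_type": ["Full Time (FT)", "Full Time (FT)", "Part Time (PT)"],
    "salary_in_usd": [150000, 130000, 40000],
})

# Average salary per employment type, drawn as a bar chart
avg_salary = salaries.groupby("employment_type")["salary_in_usd"].mean()
ax = avg_salary.plot(kind="bar", title="Average Salary (USD) by Employment Type")
ax.set_ylabel("Average salary (USD)")
plt.tight_layout()
plt.savefig("employment_type_salaries.png")
```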

    2. Work Years:

    Examining how salaries evolve over the years.

    Code snippet for the visualization above.

    Work Year Code Snippet
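For the work-year dimension, a line chart of the median salary per year could be sketched like this; the sample rows and file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

salaries = pd.DataFrame({
    "work_year": [2021, 2022, 2022, 2023],
    "salary_in_usd": [90000, 110000, 120000, 140000],
})

# Median salary per work year, drawn as a line chart
yearly_median = salaries.groupby("work_year")["salary_in_usd"].median()
yearly_median.plot(marker="o", title="Median Salary (USD) per Work Year")
plt.xlabel("Work year")
plt.ylabel("Median salary (USD)")
plt.savefig("work_year_salaries.png")
```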

    3. Remote Ratio:

    Assessing the influence of remote work arrangements on salaries.

    Code snippet for the visualization above.

    4. Company Size:

    Analyzing the correlation between company size and compensation.

    Code snippet for the visualization above.

    5. Experience Level:

    Understanding the impact of skill proficiency on earning potential.

    Code snippet for the visualization above.

    6. Company Location:

    Investigating geographical variations in salary structures.

    Code snippet for the visualization above.

    7. Employee Residence:

    Exploring the impact of residing in a specific country on earnings.

    Code snippet for the visualization above.

    8. Salary (USD) – Distribution Per Company Location:

    Investigating how earnings are distributed based on employee residence and company location.

    Code snippet for the visualization above.
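The per-location distribution snippet is also a screenshot; one way to sketch it is a bar chart of average salary per company location, with hypothetical sample rows and file name.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

salaries = pd.DataFrame({
    "company_location": ["US", "US", "GB", "KE"],
    "salary_in_usd": [150000, 170000, 90000, 60000],
})

# Average salary per company location, highest first
by_location = (
    salaries.groupby("company_location")["salary_in_usd"]
    .mean()
    .sort_values(ascending=False)
)
by_location.plot(kind="bar", title="Salary (USD) Distribution per Company Location")
plt.ylabel("Average salary (USD)")
plt.tight_layout()
plt.savefig("salary_by_company_location.png")
```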

    9. Salary (KES) – Distribution Based on Different Company Locations:

    Investigating how earnings in Shillings are distributed based on employee residence and company location.

    Code snippet for the visualization above.

    Salary (KES) distribution based on different company locations.

    In Summary:

    We’ve come to the end of phase 1, the end-to-end data analysis phase of the project. You have gained in-depth skills in finding, acquiring, cleaning, preprocessing, and exploring a dataset. With this project phase alone, you can therefore start and complete an end-to-end data analysis project for your own purposes, whether for your job or for class work.

    Additionally, the source code snippets make it easy to visualize your own data: your dataset may be different, but they directly help you produce similar or better visualizations. In my case, and for this project’s design, this phase opens the door to the second project milestone, phase 2.