Data Analysis with Python: Tackling Challenges with NumPy
Announcement
Today I will add a set of videos that show you how to build a simple chatbot with just Python. This is a great project if you are just learning Python. You will find the videos in the functions section of my course, Master Python Fundamentals: The Ultimate Python Course for Beginners, where I have also just added a new set of Python videos.
The course will grow into a full Python course covering all the fundamentals. Join the course to learn Python the easy way.
When you sign up, you get a copy of the Master Python Fundamentals book.
Introduction
Pandas is widely celebrated as one of the best libraries for data analysis, and for good reason: it shines at handling and manipulating structured data. NumPy does not get as much attention. That is understandable, since it lacks the versatility of Pandas, but it still offers many functions that are vital to anyone working with data. In last week's article, we saw how memory-efficient NumPy arrays are compared to Python lists. In this article, we will explore more of NumPy's functionality by tackling five challenges from the book 50 Days of Data Analysis with Python: The Ultimate Challenges Book for Beginners.
Through these hands-on challenges, we’ll see how key NumPy functions apply to common data analysis tasks and how powerful and versatile NumPy can be. So, let’s dive in!
Challenge 1: Create an array and calculate its standard deviation. Use the list below:
list_str = ["23", "12", "90", "28", "30"]
Notice that the data in the list is stored as strings. Since the challenge asks for the standard deviation of the numbers, we must convert them to integers when creating the array, because numeric operations do not work on strings. The np.array() function converts the list to a NumPy array, and dtype=int converts each element to an integer. The standard deviation (np.std) measures how much the values deviate from the mean (average).
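A minimal sketch of this step follows. The names array_int and std_dev are illustrative choices, not taken from the book. Note that np.std computes the population standard deviation by default (ddof=0); pass ddof=1 if you want the sample standard deviation instead.

```python
import numpy as np

list_str = ["23", "12", "90", "28", "30"]

# dtype=int tells NumPy to convert each numeric string to an integer
array_int = np.array(list_str, dtype=int)

# Population standard deviation (ddof=0 is NumPy's default)
std_dev = np.std(array_int)

print(array_int)  # [23 12 90 28 30]
print(std_dev)    # roughly 27.42
```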
Challenge 2: Write code to change the data type of the array you created in Challenge 1 to a floating-point data type. Save the result as a new variable.
This challenge asks us to convert the array from Challenge 1 to floating-point data. Tackling it introduces the astype() method, which demonstrates the flexibility of NumPy arrays. Data comes in many formats, and some formats may not be appropriate for certain tasks, such as analysis. The astype() method changes the data type of an array's elements so the array can be adapted to different processing tasks. It returns a new array rather than modifying the original in place, which is why we save the result as a new variable. We use it to convert the integer array from Challenge 1 to a float data type.
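A minimal sketch of the conversion, assuming the integer array from Challenge 1 (recreated here under the illustrative name array_int so the block runs on its own). The name array_float matches the variable used in Challenge 3.

```python
import numpy as np

# Recreate the integer array from Challenge 1
array_int = np.array(["23", "12", "90", "28", "30"], dtype=int)

# astype() returns a NEW array; array_int keeps its integer dtype
array_float = array_int.astype(float)

print(array_float)        # [23. 12. 90. 28. 30.]
print(array_float.dtype)  # float64
```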
The array now has a float data type. Floating-point types are useful for operations that involve decimals, since results can keep their fractional parts rather than being limited to whole numbers.
Data Analysis with Python: Practice, Practice, Practice
The main purpose of this book is to help you develop data analysis skills with Python by tackling challenges. By the end, you should be confident enough to take on any data analysis project with Python. Start your 50-day journey with "50 Days of Data Analysis with Python: The Ultimate Challenges Book for Beginners."
Challenge 3: Write code that returns the sum of the values [90.0, 28.0, 30.0] from the floating-point array you created in Challenge 2.
There are several ways to answer this question. I will share the boolean mask method, which compares array_float with each target value element by element. First, we create a boolean array (the mask) where each element is True if the corresponding element of array_float equals 90.0 and False otherwise. Next, we update the mask with the |= operator, which performs an element-wise (bitwise) OR, so elements equal to 28.0 or 30.0 are also marked True. The mask is now True exactly where array_float matches 90.0, 28.0, or 30.0. We then use the mask to filter array_float: only the elements where the mask is True are selected, giving a new array containing [90.0, 28.0, 30.0]. Finally, np.sum() adds up the values in the filtered array.
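The boolean mask steps described above can be sketched as follows. The array is recreated inline so the block is self-contained, and the name selected is illustrative.

```python
import numpy as np

array_float = np.array([23.0, 12.0, 90.0, 28.0, 30.0])

# Start the mask: True wherever the element equals 90.0
mask = array_float == 90.0
# Element-wise OR in the other target values
mask |= array_float == 28.0
mask |= array_float == 30.0

# Keep only the elements where the mask is True, then add them up
selected = array_float[mask]
total = np.sum(selected)

print(selected)  # [90. 28. 30.]
print(total)     # 148.0
```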
We get 148.0 as the answer. Filtering with a boolean mask avoids explicit Python loops, which are typically slower, especially on larger datasets. In the book, you will find another indexing method that you can use for this challenge.
Challenge 4: Create a 2-dimensional array of random integers from 0 to 100 with the shape (5, 5). Use the array to find the minimum and maximum values, as well as the mean and standard deviation. Ensure that the results are reproducible.
To generate a 2D array of random integers, we use NumPy's default_rng to create a random number generator initialized with a fixed seed (seed=42). Seeding the generator ensures that the same numbers are produced every time the code runs, which makes the results reproducible. The integers are drawn from a low of 0 to a high of 100; if you draw them with the generator's integers() method, keep in mind that high is exclusive by default, so pass endpoint=True if 100 itself should be a possible value. The resulting array has the shape (5, 5). Once we have the array, we use NumPy's min, max, mean, and std functions to compute the summary statistics.
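Here is a sketch of Challenge 4, assuming the generator's integers() method is used to draw the values. It passes endpoint=True so that 100 is included, matching the wording "from 0 to 100"; drop that argument for values from 0 to 99. The variable names are illustrative, and since the actual numbers depend on the seed, no specific output is shown.

```python
import numpy as np

# Seeding the generator makes every run produce the same numbers
rng = np.random.default_rng(seed=42)

# high is exclusive by default; endpoint=True lets 100 itself be drawn
array_2d = rng.integers(low=0, high=100, size=(5, 5), endpoint=True)

# Summary statistics over all 25 values
arr_min = np.min(array_2d)
arr_max = np.max(array_2d)
arr_mean = np.mean(array_2d)
arr_std = np.std(array_2d)

print(array_2d)
print(f"min={arr_min}, max={arr_max}, mean={arr_mean:.2f}, std={arr_std:.2f}")
```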
Challenge 5: Below are two arrays:
first_names = ["John", "Kenny"]
last_names = ["Smith", "Sakula"]
Using NumPy’s char.join() function, create two arrays by joining the first name with the last name. Your first array should be: array(['John', 'Smith'], dtype='U6'). Your second array should be: array(['Kenny', 'Sakula'], dtype='U6').
This question requires that we create two arrays with dtype='U6'. This data type means that the arrays hold Unicode strings of up to 6 characters. We use NumPy's np.char.join function on the two input lists, passed together as one nested list. Note that np.char.join(sep, seq) works element-wise like Python's str.join: it inserts sep between the characters of each string, so an empty separator returns every name unchanged. The result is a two-dimensional NumPy array whose rows are the first names and the last names. The first column holds "John" and "Smith", which we slice with [:, 0]. The second column holds "Kenny" and "Sakula", which we slice with [:, 1].
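The following is a reconstruction rather than the book's exact solution. It assumes the two lists are passed to np.char.join as one nested list with an empty separator, which yields the expected outputs; the names names, first_pair, and second_pair are illustrative. Because np.char.join calls str.join on each element, the empty separator leaves every name unchanged, and the pairing comes entirely from slicing the columns of the 2D array.

```python
import numpy as np

first_names = ["John", "Kenny"]
last_names = ["Smith", "Sakula"]

# Rows: [["John", "Kenny"], ["Smith", "Sakula"]].
# np.char.join(sep, seq) calls sep.join(s) on every string s; with an
# empty separator each name comes back unchanged.
names = np.char.join("", [first_names, last_names])

first_pair = names[:, 0]   # column 0: John and Smith
second_pair = names[:, 1]  # column 1: Kenny and Sakula

print(repr(first_pair))   # array(['John', 'Smith'], dtype='<U6')
print(repr(second_pair))  # array(['Kenny', 'Sakula'], dtype='<U6')
```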
You can see in the output that we have combined the names.
Final Thoughts
These examples show that NumPy is not just efficient; it also has great functions for processing and manipulating data. This makes NumPy a crucial tool to add to your toolbox if you work with data. However, these examples merely scratch the surface: the library offers a wide range of features beyond what we’ve explored here. These challenges are taken from the book 50 Days of Data Analysis with Python: The Ultimate Challenges Book for Beginners, which is packed with hands-on exercises designed to strengthen your understanding of NumPy and other Python libraries used in data analysis. Keep learning, keep exploring, and keep experimenting. Thanks for reading.