From the course: Python vs. R for Data Science

Data analysis in R

- Analyzing data is an essential part of data science work. It's how you discover insights that shape people's understanding of how the world works, support decision-making, and inform product development. One of the strategies for data analysis is hypothesis testing. Hypothesis testing supports inference, which is the process data scientists use to quantify how certain they are that trends they see in their data will also be found in new data. Inference enables you to draw conclusions about a population. And R is well suited for hypothesis testing.

Data scientists use hypothesis testing to help answer questions about the world. Now, there are many different types of hypothesis tests. Selecting the right one depends on the type of data at hand and the type of trend you want to detect. For instance, let's say that you're interested in the following question: Do grad students with pets spend more time outdoors than grad students without pets? And say that you have a decently sized, relevant data set to go with it. It could be data on a bunch of grad students, where each data point represents a student and documents whether they have a pet and how many hours they spend outdoors on average.

You would want to detect whether there is a significant difference between the mean time spent outdoors among grad students with pets and the mean time spent outdoors among grad students without pets. And since those with pets are not the same folks as those without pets, they're considered two independent samples. Given all of this information, you could perform a two-sample unpaired t-test in this situation. You'd use R's t.test function, and you'd pass in two vectors: one containing the number of hours the sample of grad students with pets spend outdoors, and one containing the number of hours the sample of grad students without pets spend outdoors. Note that this function has an argument named paired, which is set to FALSE by default.
So in this situation, you would not need to specify anything further for the paired argument. But if you were in a different scenario in which you had two dependent samples and wanted to perform a two-sample paired t-test, you would pass in the paired argument and set it to TRUE.

If you'd like to learn more about t.test, there's a great webpage with helpful documentation that you can find at this link here. I've included it in the resources document as well. Also, the t.test function is just one of many functions that support hypothesis testing in R. You can find documentation for the other functions if you visit rdocumentation.org and use the search bar at the top.

So that's it. Hypothesis testing is a really powerful way to engage with data and learn more about the world. And R has useful tools to support this strategy for data analysis.
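
To make the two t.test calls described above concrete, here is a minimal sketch in R. The data is simulated purely for illustration, and the variable names (hours_with_pets, hours_without_pets, hours_before, hours_after) and the before/after scenario used for the paired case are assumptions, not part of the course material. One detail worth knowing: with its defaults, t.test runs the Welch version of the two-sample test, which does not assume equal variances in the two groups.

```r
# Simulated, hypothetical data: average hours spent outdoors per week.
# These numbers are made up for illustration only.
set.seed(42)
hours_with_pets    <- rnorm(40, mean = 9, sd = 3)  # 40 grad students with pets
hours_without_pets <- rnorm(45, mean = 7, sd = 3)  # 45 grad students without pets

# Two-sample unpaired t-test on independent samples.
# paired = FALSE is the default, so it does not need to be specified.
# The default alternative is two-sided; alternative = "greater" would
# test specifically whether the first group's mean is larger.
unpaired_result <- t.test(hours_with_pets, hours_without_pets)
print(unpaired_result)

# Hypothetical dependent samples: the same 20 students measured twice,
# so the two vectors must have the same length and matching order.
hours_before <- rnorm(20, mean = 7, sd = 3)
hours_after  <- hours_before + rnorm(20, mean = 1, sd = 1)

# Two-sample paired t-test: pass the paired argument and set it to TRUE.
paired_result <- t.test(hours_before, hours_after, paired = TRUE)
print(paired_result)
```

Each call returns a list-like object, so individual pieces such as unpaired_result$p.value or paired_result$conf.int can be pulled out for further use instead of reading them off the printed summary.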
