Exploratory Analysis of Froth Characteristics in Iron Flotation
Introduction
For this project, I imagine that I have recently been hired as a data analyst at a mining company called Metals R' Us and have been given data from its flotation plant.
What is Flotation?
The flotation process in iron mining is a method used to separate and concentrate iron minerals from ore.
In iron ore beneficiation, the material fed into the flotation cells or columns is often called "iron ore pulp."
"Ore pulp" refers to a slurry of finely ground ore (in this case, iron ore), water, and various chemical reagents. In iron ore processing, these reagents typically include starch and an amine collector.
The ore pulp is introduced into the flotation cells or columns, where the separation takes place. In the reverse cationic flotation commonly used for iron ore, the starch depresses the iron minerals while the amine collector attaches to the silica, which floats off with the froth. The iron minerals stay behind in the pulp and are recovered as the concentrate.
Overall, the goal of the process is to separate the valuable iron minerals from the gangue minerals (including silica) so that a higher-grade iron concentrate can be produced for further processing.
The Data
The dataset comes from Kaggle and covers March 2017 to September 2017.
The first column holds the date and time of each record. Some columns were sampled every 20 seconds; others were sampled hourly.
Data Dictionary
Let us Explore the Data!
1. Setup
The first thing I did was to import all the necessary libraries and read the CSV file.
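Here is a minimal sketch of that setup step, assuming pandas and a local copy of the Kaggle CSV. The file name is a hypothetical local path, the timestamp column is assumed to be called `date`, and the `decimal=","` setting assumes the file stores numbers with comma decimal separators (e.g. "55,2"); drop it if your copy uses dots.

```python
import pandas as pd

# Hypothetical local path to the Kaggle CSV; adjust to wherever you saved it.
CSV_PATH = "MiningProcess_Flotation_Plant_Database.csv"


def load_flotation_data(path=CSV_PATH):
    """Read the flotation plant CSV into a DataFrame.

    Assumes numbers use a comma as the decimal separator (e.g. "55,2",
    quoted so they are not split into two fields) and that the
    timestamps live in a column called "date".
    """
    return pd.read_csv(path, decimal=",", parse_dates=["date"])


# Usage (requires the CSV on disk):
# df = load_flotation_data()
# df.head()
```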
2. Checking Basic Details
The summary statistics give the basics for every numerical column: count, mean, standard deviation, minimum, quartiles, and maximum.
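A sketch of the checks behind this step, assuming the data is already in a pandas DataFrame. The helper name `basic_details` is hypothetical; `DataFrame.describe()` is what produces the count, mean, std, min, quartiles, and max for numeric columns.

```python
import pandas as pd


def basic_details(df: pd.DataFrame) -> pd.DataFrame:
    """Print shape, dtypes and missing-value counts, then return summary stats."""
    print("Shape:", df.shape)
    print(df.dtypes)
    print("Missing values per column:")
    print(df.isna().sum())
    # By default describe() summarises numeric columns only:
    # count, mean, std, min, 25%, 50%, 75% and max.
    return df.describe()


# Usage: basic_details(df)
```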
3. Inspecting the June 1 Data
An investigation of June 1, 2017 was requested, so we filter the data to check whether anything looks out of place.
We begin by extracting the June 1 data, keeping rows whose timestamp falls strictly between "2017-05-31 23:59:59" and "2017-06-02".
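The filter can be sketched as a boolean mask on the timestamp column. The column name `date` and the helper `filter_day` are assumptions for illustration.

```python
import pandas as pd


def filter_day(df, start="2017-05-31 23:59:59", end="2017-06-02", time_col="date"):
    """Keep rows with start < timestamp < end (both bounds exclusive),
    matching the June 1 filter described above."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df[time_col] > start) & (df[time_col] < end)
    return df.loc[mask]


# Usage: june1 = filter_day(df)
```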
We then keep only the important columns and draw a pair plot to look for relationships between them.
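A sketch of the pair plot, using pandas' `scatter_matrix` as a stand-in (seaborn's `pairplot` would work similarly if installed). Which columns count as "important" is not spelled out here, so `IMPORTANT_COLS` is a hypothetical selection using column names assumed from the Kaggle dataset.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Hypothetical selection; names assumed from the Kaggle headers.
IMPORTANT_COLS = (
    "% Iron Feed",
    "% Silica Feed",
    "Starch Flow",
    "Amina Flow",
    "Ore Pulp pH",
    "% Iron Concentrate",
    "% Silica Concentrate",
)


def pair_plot(df, cols=IMPORTANT_COLS):
    """Draw a scatter matrix (pair plot) of the chosen columns, with
    histograms on the diagonal, and return the grid of Axes."""
    axes = scatter_matrix(df[list(cols)], figsize=(12, 12), diagonal="hist")
    plt.tight_layout()
    return axes


# Usage: pair_plot(june1)
```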
No significant relationship was found.
Next, we draw line graphs of the important columns against time to see whether any of them changes noticeably during the day.
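One way to draw these line graphs is a stack of subplots, one per column, sharing the time axis so that a spike in one series can be lined up with a dip in another. The helper `plot_over_time` is hypothetical and assumes a `date` column of datetimes.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt


def plot_over_time(df, cols, time_col="date"):
    """Draw one line chart per column, stacked with a shared time axis."""
    fig, axes = plt.subplots(
        len(cols), 1, figsize=(12, 2.5 * len(cols)), sharex=True, squeeze=False
    )
    for ax, col in zip(axes[:, 0], cols):
        ax.plot(df[time_col], df[col])
        ax.set_ylabel(col)
    axes[-1, 0].set_xlabel("Time")  # label only the bottom, shared axis
    fig.tight_layout()
    return fig


# Usage: plot_over_time(june1, ["% Iron Concentrate", "% Silica Concentrate"])
```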
Between 11:00 and 13:00, there is a spike in Silica Flow and a dip in Iron %.
We then look more closely at the data for this time window on June 1, 2017.
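A sketch of zooming into the 11:00 to 13:00 window with pandas' `between_time`, which requires a DatetimeIndex (hence the `set_index`). The helper name and the `date` column are assumptions.

```python
import pandas as pd


def time_window(day_df, start="11:00", end="13:00", time_col="date"):
    """Return the rows whose clock time lies between start and end,
    both ends included (between_time's default)."""
    return day_df.set_index(time_col).between_time(start, end).reset_index()


# Usage: window = time_window(june1)
```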
4. Month and Day of the Week Vs Iron Concentrate %
I wanted to see whether the month or the day of the week plays any role in % Iron Concentrate.
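A minimal sketch of this comparison: average % Iron Concentrate per month and per day of the week. The helper name and the column names `date` and `% Iron Concentrate` are assumptions; plotting the two returned Series as bar charts gives a quick visual check.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def iron_by_calendar(df, target="% Iron Concentrate", time_col="date"):
    """Return the mean of `target` per month and per weekday."""
    by_month = df.groupby(df[time_col].dt.month)[target].mean()
    # Reindex so weekdays appear in calendar order (missing days become NaN).
    by_weekday = (
        df.groupby(df[time_col].dt.day_name())[target].mean().reindex(WEEKDAYS)
    )
    return by_month, by_weekday


# Usage:
# by_month, by_weekday = iron_by_calendar(df)
# by_weekday.plot.bar()
```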
5. Correlation Between Important Columns
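The correlations in this section can be computed and screened along these lines. This is a sketch: the 0.5 cut-off for a "strong" correlation is an arbitrary assumption and `strong_correlations` is a hypothetical helper. Passing `df[cols].corr()` to a heatmap (for example seaborn's `heatmap`) gives the usual visual version.

```python
import pandas as pd


def strong_correlations(df, cols, threshold=0.5):
    """List column pairs whose absolute Pearson correlation is at least
    `threshold`, strongest first."""
    corr = df[list(cols)].corr()  # Pearson by default
    names = list(corr.columns)
    rows = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # upper triangle: each pair once, no diagonal
            r = corr.loc[a, b]
            if abs(r) >= threshold:  # NaN correlations fail this test
                rows.append((a, b, r))
    result = pd.DataFrame(rows, columns=["col_a", "col_b", "r"])
    return result.sort_values("r", key=lambda s: s.abs(), ascending=False, ignore_index=True)


# Usage: strong_correlations(df, ["% Iron Feed", "% Silica Feed", "% Iron Concentrate"])
```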
There are only two correlations observed:
6. Correlation Between Flotation Columns
7. Correlation Between % Iron Concentrate and Froth Level/Airflow
No significant correlation was observed between % Iron Concentrate and either the froth levels or the airflow.
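A sketch of how this can be checked in one call: correlate % Iron Concentrate with every flotation column air-flow and level column using `DataFrame.corrwith`. The `Flotation Column` prefix is assumed from the Kaggle headers, and the helper name is hypothetical.

```python
import pandas as pd


def target_vs_flotation(df, target="% Iron Concentrate", prefix="Flotation Column"):
    """Pearson correlation of `target` with each flotation air-flow/level
    column, sorted from most negative to most positive."""
    flotation_cols = [c for c in df.columns if c.startswith(prefix)]
    return df[flotation_cols].corrwith(df[target]).sort_values()


# Usage: target_vs_flotation(df)
```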
8. Histogram of pH
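A minimal histogram sketch with matplotlib, assuming the pH column is named `Ore Pulp pH` as in the Kaggle headers; the helper name and the default of 30 bins are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt


def ph_histogram(df, col="Ore Pulp pH", bins=30):
    """Plot the distribution of pulp pH and return the bin counts and edges."""
    fig, ax = plt.subplots(figsize=(8, 4))
    counts, edges, _ = ax.hist(df[col].dropna(), bins=bins, edgecolor="black")
    ax.set_xlabel("pH")
    ax.set_ylabel("Number of samples")
    ax.set_title("Distribution of ore pulp pH")
    return counts, edges


# Usage: ph_histogram(df)
```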
Key Takeaways
According to its description on Kaggle, the dataset's purpose is to predict % Iron Concentrate and % Silica Concentrate in advance. This may explain why the exploration turned up few striking insights: nothing unusual seems to be happening in the data.
Nevertheless, the exploratory analysis served its purpose: it gave me a solid understanding of the dataset!
Well, that's all, folks!
Thank you so much if you made it this far!
If you liked this article, connect with me on LinkedIn! I will be posting many more fun data analytics projects!
Metallurgist | Data Analyst | Specialist in programming and simulation of industrial processes
A key component in data analysis is the domain expert. As a metallurgist, I have observed conclusions that could lead to discrepancies in meetings on metallurgical processes. For example, a strong correlation between iron and silica in the concentrate would not contribute much, as both are present in the final product. Since the objective is to recover iron, if there is more silica, there will be less iron, so further analysis is unnecessary. Additives such as amine and starch do affect iron recovery. If this is not observed, it could be due to the following reasons: the flow data for amine and starch may not be metallurgically significant enough to influence iron recovery; the data collection interval for these flows may be too short compared to the interval for iron recovery, meaning it is necessary to calculate an average of the independent variables before associating them with the dependent variable (iron recovery). Finally, there may not be a separate correlation between amine and starch with iron recovery, but there could be one when considered as combined variables (interaction: X1*X2, X1/X2, etc.). This should not be confused with a simple multivariable linear regression.
Data Analyst | SQL | Tableau | Excel | R | Data Visualization
Great work, Janani Teklur! I loved your simple explanations and analysis. Very well written.
Consultant -Data Operations and analytics at Genpact | Driving Data Excellence and Business Insights
Amazing work! Python is such a powerful programming language.
Editor of 'The AI Way' a weekly email newsletter focussed on Education and AI. | Pioneering AI in Education & Self-Learning | Explore AI's Frontier with My Weekly Newsletter |1010+ Subscribers & Growing
A great analysis, Janani Teklur Srinivasa. Will you be creating a predictive model as well?
Data Scientist at EMeRG | Data Science | Web Scraping | Statistics | Machine Learning | Deep Learning | Artificial Intelligence | LLM | Big Data | Python | Research | SQL | PowerBI
Excellent and impressive work, Janani! I love the correlation part where you used a heatmap. I have a question, though, just out of curiosity: shouldn't we also use a scatter plot to check the correlation?