Exploratory Analysis of Froth Characteristics in Iron Flotation

Introduction

In this project, I take on the role of a newly hired data analyst at a mining company called Metals R' Us, which has given me data from its flotation plant.

What is Flotation?

The flotation process in iron mining is a method used to separate and concentrate iron minerals from ore.

In iron ore beneficiation, the feed to the flotation cells or columns is a slurry often referred to as "iron ore pulp."

"Ore pulp" typically refers to a mixture of finely ground ore (in this case, iron ore) with water and various chemical reagents. In iron ore processing, these reagents commonly include starch and amines. The main components are:

  • Iron Ore: The primary component of the pulp is finely ground iron ore. This is the material from which iron is extracted.
  • Silica Impurities: Iron ore often contains impurities, including silica. Silica is a common gangue mineral found in many iron ore deposits. The goal of the processing is to separate the valuable iron minerals from these impurities.
  • Starch: Starch is commonly used as a depressant in iron ore flotation. In the reverse flotation typically used for iron ore, it coats the iron minerals and keeps them from floating, so they stay in the pulp.
  • Amines: Amines are commonly used as collectors in iron ore flotation. In reverse flotation they attach to the surface of the silica particles and make them hydrophobic, so the silica sticks to the air bubbles and leaves with the froth.

The ore pulp is then introduced into the flotation cells or columns, where the flotation process takes place. Reagents like starch and amines work together: the silica is floated off while the iron minerals are held back and recovered as concentrate.

Overall, the goal of the process is to separate the valuable iron minerals from the gangue minerals (including silica) so that a higher-grade iron concentrate can be produced for further processing.


The Data

The dataset is from Kaggle and covers March 2017 to September 2017.

The first column holds the date and time of each reading. Some columns were sampled every 20 seconds; others were sampled on an hourly basis.

Data Dictionary

Let us Explore the Data!

1. Setup

The first thing I did was to import all the necessary libraries and read the CSV file.
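
As a rough sketch of this setup step, the snippet below reads a tiny inline stand-in for the CSV. The column names, the values, and the comma decimal separator are my assumptions about the Kaggle file, not something confirmed in this article; with the real data you would pass the path of the downloaded file instead.

```python
import io

import pandas as pd

# Tiny inline stand-in for the Kaggle CSV (hypothetical values).
# Assumption: numbers are stored with a comma as the decimal separator.
sample_csv = io.StringIO(
    "date,% Iron Concentrate,% Silica Concentrate\n"
    '2017-03-10 01:00:00,"66,91","1,31"\n'
    '2017-03-10 01:00:00,"66,91","1,31"\n'
    '2017-03-10 02:00:00,"67,06","1,11"\n'
)

# decimal="," makes pandas parse "66,91" as the float 66.91
df = pd.read_csv(sample_csv, decimal=",")

print(df.dtypes)
print(df.head())
```

If the numeric columns come back as strings (object dtype), a wrong decimal separator is one of the first things worth checking.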

2. Checking Basic Details

  • There were 1171 duplicate records, which I removed.
  • I scanned the data to see what it looked like using the head() method and the shape and columns attributes.
  • Converted the date column from string to datetime.
  • Used the describe() method to see the summary statistics of all the numerical columns in the data.
  • Confirmed that the dataset does in fact cover March 2017 to September 2017.
  • Added a new Time column in HH:MM format.
  • Added a new Day of the Week column for analysis.


The summary stats give the basic statistics of all numerical columns like count, mean, max, standard deviation, etc.
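
The basic checks above could look roughly like this. It is a sketch on a hypothetical four-row frame, and the column names ("date", "% Iron Concentrate", "Time", "Day of Week") are assumptions, so the counts will not match the 1171 duplicates found in the real data.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the second row is a duplicate.
df = pd.DataFrame({
    "date": [
        "2017-03-10 01:00:00",
        "2017-03-10 01:00:00",
        "2017-06-01 11:00:20",
        "2017-09-09 23:00:00",
    ],
    "% Iron Concentrate": [66.91, 66.91, 63.80, 64.27],
})

# Count fully duplicated rows, then drop them
n_dupes = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Convert the date column from string to datetime
df["date"] = pd.to_datetime(df["date"])

# Derived columns: time of day as HH:MM and the weekday name
df["Time"] = df["date"].dt.strftime("%H:%M")
df["Day of Week"] = df["date"].dt.day_name()

print(df.head())
print(df.shape, list(df.columns))
print(df["% Iron Concentrate"].describe())
print(df["date"].min(), "to", df["date"].max())
```

Note that shape and columns are attributes, so they are written without parentheses, while head() and describe() are methods.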

3. Inspecting the Data for June 1, 2017

An investigation of June 01, 2017, was requested, so we filter the data to check whether anything looks out of place.

We began by extracting the June 01 data, keeping only rows whose datetime is later than "2017-05-31 23:59:59" and earlier than "2017-06-02".
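
One way to express this filter is a half-open interval, from midnight on June 1 up to (but not including) midnight on June 2. The sketch below uses a hypothetical five-row frame with an assumed "date" column.

```python
import pandas as pd

# Hypothetical readings around the day of interest
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:59:40",
        "2017-06-01 00:00:00",
        "2017-06-01 12:30:00",
        "2017-06-01 23:59:40",
        "2017-06-02 00:00:00",
    ]),
    "% Iron Concentrate": [65.1, 65.0, 63.9, 64.8, 65.2],
})

start = pd.Timestamp("2017-06-01")
end = pd.Timestamp("2017-06-02")

# Keep rows with start <= date < end, i.e. all of June 1 and nothing else
june_1 = df[(df["date"] >= start) & (df["date"] < end)]

print(june_1)
```

Using >= on the start of the day avoids having to pick a "last second of the previous day" value by hand.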

We then inspect only the important columns and draw a pair plot to look for relationships.

No significant relationship was found.

Next, we draw line graphs to see how each of the important columns changes over time.

Between 11:00 and 13:00, there is a spike in % Silica Concentrate and a dip in % Iron Concentrate.

Looking at the data for this specific time period on June 1, 2017:

  • % Iron Concentrate dropped slightly, to below 64 (the mean is 65.05).
  • % Silica Concentrate rose to slightly above 4 (the mean is 2.32).
  • Starch flow was in the 2800-3000 range, which is about the mean value in the original dataset.
  • Ore Pulp pH was slightly below 9.8 (mean: 9.78).
  • Amina (amine) flow was slightly lower than average: values in the range of 375-500 were observed, against a mean of 488.16.
  • Amina is a reagent that affects % Iron Concentrate. Since there is no large dip in % Iron Concentrate, we can ignore this.
  • Nothing in the data indicates that anything unusual happened on June 01, 2017. Even though there was a slight drop in % Iron Concentrate and an increase in % Silica Concentrate between 11:00 and 13:00, none of the columns examined explains this change.
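
A comparison like the one in this list can be sketched by averaging the 11:00 to 13:00 window and setting it against the overall mean. The frame below is hypothetical (six made-up readings, assumed column names), so the numbers only illustrate the mechanics.

```python
import pandas as pd

# Hypothetical readings indexed by timestamp
df = pd.DataFrame(
    {
        "% Iron Concentrate": [65.4, 65.2, 63.9, 63.7, 65.3, 65.5],
        "% Silica Concentrate": [2.0, 2.2, 4.1, 4.3, 2.1, 1.9],
    },
    index=pd.to_datetime([
        "2017-05-31 12:00",
        "2017-06-01 09:00",
        "2017-06-01 11:30",
        "2017-06-01 12:30",
        "2017-06-01 15:00",
        "2017-06-02 12:00",
    ]),
)

# Select June 1, then keep only readings between 11:00 and 13:00
window = df.loc["2017-06-01"].between_time("11:00", "13:00")

# Put the window mean next to the overall mean for each column
comparison = pd.DataFrame({
    "window_mean": window.mean(),
    "overall_mean": df.mean(),
})
comparison["difference"] = comparison["window_mean"] - comparison["overall_mean"]

print(comparison.round(2))
```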

4. Month and Day of the Week vs. % Iron Concentrate

I wanted to see if the month and day of the week play any role in % Iron Concentrate.

  • There is a slight dip in April (64.85) and August (64.99). September is ignored because the data only runs until mid-September.
  • Every day except Tuesday and Wednesday hits the 65% mark. The difference is not significant.
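
These month and weekday figures are the kind of result a groupby on the date parts produces. Here is a minimal sketch on six hypothetical rows; the column names are assumptions and the values are made up.

```python
import pandas as pd

# Hypothetical readings: a Monday and a Tuesday in each of three months
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-04-03", "2017-04-04",
        "2017-05-01", "2017-05-02",
        "2017-08-07", "2017-08-08",
    ]),
    "% Iron Concentrate": [64.8, 64.9, 65.2, 65.0, 65.1, 64.9],
})

# Mean % Iron Concentrate per month (sort=False keeps calendar order here)
by_month = df.groupby(df["date"].dt.month_name(), sort=False)["% Iron Concentrate"].mean()

# Mean % Iron Concentrate per day of the week
by_day = df.groupby(df["date"].dt.day_name(), sort=False)["% Iron Concentrate"].mean()

print(by_month.round(2))
print(by_day.round(2))
```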

5. Correlation Between Important Columns

Only two notable correlations were observed:

  • A correlation of -0.8 between % Iron Concentrate and % Silica Concentrate, which is expected: the more iron in the concentrate, the less silica, and vice versa.
  • A positive correlation of 0.66 between Ore Pulp Density and Amina flow. This does not appear to affect the final iron concentrate, since % Iron Concentrate stayed consistent.
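
For reference, a correlation table like this can be computed with DataFrame.corr(), which uses Pearson correlation by default. The sketch below runs on six hypothetical rows with assumed column names, so it will not reproduce the -0.8 and 0.66 figures; it only shows how to rank the pairs.

```python
import numpy as np
import pandas as pd

# Hypothetical values for four of the "important" columns
df = pd.DataFrame({
    "% Iron Concentrate": [66.9, 65.1, 64.0, 63.2, 66.0, 64.6],
    "% Silica Concentrate": [1.2, 2.3, 3.6, 4.4, 1.6, 2.9],
    "Amina Flow": [520, 480, 455, 500, 470, 510],
    "Ore Pulp Density": [1.72, 1.68, 1.66, 1.70, 1.67, 1.71],
})

# Pairwise Pearson correlation matrix
corr = df.corr()

# Keep the upper triangle only, so each pair of columns appears once,
# then rank the pairs by absolute correlation
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(key=abs, ascending=False)

print(corr.round(2))
print(pairs.round(2))
```

Pearson correlation only measures linear association, so a scatter plot of the strongest pairs is a useful follow-up.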

6. Correlation Between Flotation Columns

  • Again, I do not see any significant relationship between the air flow and the froth level of the same flotation column.
  • There seems to be a higher correlation between the air flow and froth level of different flotation columns, but this can be ignored because the columns do not correspond to each other.
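
To separate "same column" from "different column" pairs, one option is to slice the correlation matrix so that air flow columns form the rows and level columns form the headers. This is a sketch with two hypothetical flotation columns; the names follow an assumed "Flotation Column NN Air Flow" / "Flotation Column NN Level" pattern.

```python
import pandas as pd

# Hypothetical readings for two flotation columns
df = pd.DataFrame({
    "Flotation Column 01 Air Flow": [250.0, 251.2, 249.1, 252.4, 250.6],
    "Flotation Column 01 Level": [457.0, 432.5, 470.1, 451.3, 440.8],
    "Flotation Column 02 Air Flow": [248.9, 250.3, 251.7, 249.5, 250.8],
    "Flotation Column 02 Level": [452.2, 461.0, 438.4, 447.9, 455.6],
})

air_cols = [c for c in df.columns if c.endswith("Air Flow")]
level_cols = [c for c in df.columns if c.endswith("Level")]

# Rows: air flow, columns: froth level.
# Diagonal entries pair the same flotation column; the rest are cross pairs.
table = df.corr().loc[air_cols, level_cols]

print(table.round(2))
```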

7. Correlation Between % Iron Concentrate and Froth Level/Air Flow

No significant correlation was observed between % Iron Concentrate and either the froth levels or the air flow.
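
When only one target matters, DataFrame.corrwith() gives the correlation of every feature with that target in a single call. A small sketch on hypothetical data with assumed column names:

```python
import pandas as pd

# Hypothetical readings: two process columns and the target
df = pd.DataFrame({
    "Flotation Column 01 Air Flow": [250.0, 251.2, 249.1, 252.4, 250.6],
    "Flotation Column 01 Level": [457.0, 432.5, 470.1, 451.3, 440.8],
    "% Iron Concentrate": [65.1, 64.8, 65.3, 64.9, 65.0],
})

target = "% Iron Concentrate"
features = df.columns.drop(target)

# Pearson correlation of each feature column with the target column
iron_corr = df[features].corrwith(df[target])

print(iron_corr.round(2))
```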

8. Histogram of pH

  • The ideal pH for flotation is 9-12, which has largely been maintained.
  • pH ranges between 8.75 and 10.8 in the data.
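
The same check can be done numerically by binning the pH readings and measuring the share that falls inside the 9-12 range. The six values below are hypothetical, chosen only to span the 8.75 to 10.8 range mentioned above, and the series name is an assumption.

```python
import pandas as pd

# Hypothetical pH readings
ph = pd.Series([8.75, 9.4, 9.8, 9.9, 10.1, 10.8], name="Ore Pulp pH")

# Histogram counts in half-unit bins (each bin excludes its left edge)
bins = [8.5, 9.0, 9.5, 10.0, 10.5, 11.0]
counts = pd.cut(ph, bins=bins).value_counts(sort=False)

# Share of readings inside the 9-12 range (bounds included)
share_in_range = ph.between(9, 12).mean()

print(counts)
print(f"min={ph.min()}, max={ph.max()}, share in 9-12: {share_in_range:.0%}")
```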

Key Takeaways

  • Upon investigation, nothing significant was found in the data for June 01, 2017. There were no abnormalities in the process.
  • The only strong correlations are between the final % Iron Concentrate and % Silica Concentrate (-0.8) and between Ore Pulp Density and Amina (0.66).
  • No correlation between each flotation column's air flow and froth level.
  • Every day except Tuesday and Wednesday hits the 65% mark. The difference is not significant.
  • There is a slight dip in % Iron Concentrate in April (64.85) and August (64.99).
  • pH stayed between 8.75 and 10.8 in the data.

The description of the dataset on Kaggle says that its purpose is to predict % Iron Concentrate and % Silica Concentrate in advance. This could be why there were not many insights to be found (nothing unusual is happening).

Nevertheless, this exploratory analysis served its purpose well, providing a solid understanding of the dataset!


Well, that's all, folks!

Thank you so much if you have made it this far!

If you liked this article, connect with me on LinkedIn! I will be posting many more fun data analytics projects!



Jordan Huayhua

Metallurgist | Data Analyst | Specialist in programming and simulation of industrial processes

2mo

A key component in data analysis is the domain expert. As a metallurgist, I have observed conclusions that could lead to discrepancies in meetings on metallurgical processes. For example, a strong correlation between iron and silica in the concentrate would not contribute much, as both are present in the final product. Since the objective is to recover iron, if there is more silica, there will be less iron, so further analysis is unnecessary. Additives such as amine and starch do affect iron recovery. If this is not observed, it could be due to the following reasons: the flow data for amine and starch may not be metallurgically significant enough to influence iron recovery; the data collection interval for these flows may be too short compared to the interval for iron recovery, meaning it is necessary to calculate an average of the independent variables before associating them with the dependent variable (iron recovery). Finally, there may not be a separate correlation between amine and starch with iron recovery, but there could be one when considered as combined variables (interaction: X1*X2, X1/X2, etc.). This should not be confused with a simple multivariable linear regression.

Madeeha Umar

Data Analyst | SQL | Tableau | Excel | R | Data Visualization

1y

Great work Janani Teklur! I loved your simple explanations and analysis. Very well written.

Asif Saifi

Consultant -Data Operations and analytics at Genpact | Driving Data Excellence and Business Insights

1y

Amazing work! Python is such a powerful programming language.

Niel de Kock

Editor of 'The AI Way' a weekly email newsletter focussed on Education and AI. | Pioneering AI in Education & Self-Learning | Explore AI's Frontier with My Weekly Newsletter |1010+ Subscribers & Growing

1y

A great analysis, Janani Teklur Srinivasa. Will you be creating a predictive analysis model as well?

Nivetha Ramesh

Data Scientist at EMeRG | Data Science | Web Scraping | Statistics | Machine Learning | Deep Learning | Artificial Intelligence | LLM | Big Data | Python | Research | SQL | PowerBI

1y

Excellent and impressive work, Janani! I love the correlation part where you used a heatmap. I have a question though, just out of curiosity: shouldn't we also use a scatter plot to check the correlation?
