Iron Ore Incident Investigation: An Analysis of Manufacturing Processes
Introduction
The United States produced about 46 million metric tons of iron ore in 2022, which accounted for only about 2% of the iron ore mined globally. For the past 50 years, more than 90% of U.S. iron ore production has been mined from Lake Superior deposits. As someone from Michigan, the Great Lakes State, I found learning more about this mining process especially intriguing. Iron ore is a raw material used for building bridges, trains and rails, buildings, bikes, and much more. It's everywhere, including in smartphones, paint, and even some cosmetics.
The Mining Process
Iron ore is not found in pure form. It is mixed in with dirt, rock, and other minerals. Flotation plants are commonly used to separate valuable iron ore from impurities. Here is a basic rundown of the process:
The data for this project focuses on steps three through six of this process.
Objective
For this project, I’m assuming a data analyst role for a mining company to analyze data from a flotation plant. I have been tasked with the following main objectives:
Key Findings
The Data
The data for this project spans March 2017 to September 2017 and can be found on Kaggle, a community platform where data analysts and data scientists can discover and publish datasets. It is real-world data from an iron ore flotation plant in South America, covering one portion of the iron ore refining process.
This dataset contains over 730,000 samples, each a timestamped set of measurements from the flotation process. The timestamp column reveals that some variables were sampled every 20 seconds, while others were recorded hourly. The percent iron feed and percent silica feed columns are quality measures of the ore before it enters the flotation plant. The starch flow, amina flow, ore pulp flow, ore pulp pH, and ore pulp density columns are the most critical variables affecting ore quality at the end of the process. The air flow and froth level columns for each flotation cell also affect ore quality. Lastly, the percent iron concentrate and percent silica concentrate are measured again at the end of the flotation process.
The data was downloaded as a Comma Separated Values (CSV) file and then imported into Python, a computer programming language used to perform data analysis, through Deepnote. Like Jupyter, Deepnote is an interactive development environment (IDE) that allows you to write and run Python code and visualize data, but it is a cloud-based platform.
Python Libraries Used in This Project: Pandas (data manipulation), NumPy (numerical computing), Matplotlib (plotting), and Seaborn (statistical visualization).
Analysis: Data Overview
To begin this analysis, I imported the Python libraries Pandas, Matplotlib, Seaborn, and NumPy, as shown in the code below:
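A minimal sketch of those imports:

```python
# Core libraries for data handling, numerical work, and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```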
Then, I used the Pandas library to read the CSV file into Python. For convenience, I assigned the resulting table to a variable called "df", short for data frame, as shown below:
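A sketch of that step (the CSV file name here is assumed from the Kaggle download and may differ from the file used in the original project):

```python
# Read the flotation plant CSV into a Pandas DataFrame named "df"
df = pd.read_csv("MiningProcess_Flotation_Plant_Database.csv")
```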
To get a snapshot of the data, I utilized the method “head()” on the data frame object to show the first five rows of the dataset. Below is the code and a portion of the corresponding output:
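The call itself is simply:

```python
# Preview the first five rows of the dataset
df.head()
```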
Notice how elements within the columns have a comma instead of a decimal point. This problem would make any analysis inaccurate. As a result, I wrote the Python code below to change each comma within the dataset to a period and save over the original data frame. This way, each time I call “df” in the code, it will use the modified table with decimal points in the appropriate places. Below is the code and snapshot of the corrected data frame.
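A sketch of that cleanup step, assuming the timestamp column is named "date" and every other column holds numbers stored as text with decimal commas:

```python
# Swap decimal commas for periods, then convert the numeric columns back to floats
df = df.replace(",", ".", regex=True)
numeric_cols = [col for col in df.columns if col != "date"]
df[numeric_cols] = df[numeric_cols].astype(float)
df.head()
```

(The same result can also be achieved in one step with pd.read_csv(..., decimal=","), but the replace-and-convert approach mirrors the description above.)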
Next, I wanted to know the size of the dataset. So, I used the “shape” attribute within Pandas to retrieve the dataset's number of rows and columns. Below is the code that I utilized for this overview:
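The attribute in question:

```python
# Return the number of rows and columns as a (rows, columns) tuple
df.shape
```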
The output for this code was (737453, 24). This means there were 737,453 rows, which represent samples or various measurements taken during the flotation process, and 24 columns, which represent the percent iron and silica entering the plant, chemical measures, airflow adjustments, froth levels, and final percentages of iron and silica concentrates.
Next, I wanted a quick rundown of each column's name so I could easily refer back to them throughout my analysis. Below is the code and corresponding output:
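One way to print them, as a sketch:

```python
# List every column name for quick reference
print(df.columns.tolist())
```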
In many datasets, dates are commonly stored as strings or other data types, which can be problematic when performing an analysis. I used the following code to determine how dates were classified within this dataset:
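A sketch of that check, again assuming the timestamp column is named "date":

```python
# Inspect how the first entry of the date column is stored
print(type(df["date"].iloc[0]))
```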
The output for this code was 'str', which means the data within this column is stored as text. Since dates should be stored as timestamps rather than text, I wrote the following code to make the conversion:
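A sketch of the conversion:

```python
# Convert the date column from text to proper timestamps, then re-check the type
df["date"] = pd.to_datetime(df["date"])
print(type(df["date"].iloc[0]))
```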
The output for this code confirmed that I successfully changed the date column from being classified as text to a timestamp.
Now, I wanted to determine the timeframe for the dataset. So, I used the following code to determine the starting and ending dates:
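A sketch of that check:

```python
# Earliest and latest timestamps in the dataset
print(df["date"].min())
print(df["date"].max())
```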
The output for this code showed a starting date of March 10, 2017, and an ending date of September 9, 2017, meaning the dataset contains roughly six months of mining data from the flotation plant.
Analysis: Column Aggregations
Next, I wanted to get some basic aggregations on each column. Python makes this very easy using the “describe” method, as shown below:
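The call in question:

```python
# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
df.describe()
```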
The descriptive statistics in the table above show:
The target pH range for the ore pulp, or slurry, is seven to eleven. Looking at the last column displayed above, Ore Pulp pH, the pH level always stays between 8.8 and 10.8, so this is right on track. The pH levels were very stable when this data was collected, which is a good sign that the flotation plant was running well.
The percent iron concentrate, measured at the end of the flotation process, has a mean of 65%. That is more than a 10% increase over the iron content entering the flotation plant. Additionally, the percent iron concentrate has a range of 5.96% and a standard deviation of 1.12%. Since these measures of spread are relatively small, the flotation plant is consistently and successfully purifying the iron ore.
Analysis: Identifying Linear Relationships
Then, I created a correlation coefficient heatmap to determine whether there were any strong correlations between variables in the dataset. A correlation coefficient, or r-value, is always between –1 and 1 and tells you how closely two variables follow a line of best fit, or linear regression line. Values of r with an absolute value of at least 0.8 are considered relatively strong correlations, with 1 or –1 representing the strongest correlations, where every data point falls precisely on the linear regression line. r-values close to zero indicate variables with relatively no linear correlation. Below is the Python code and corresponding output for this visualization.
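A sketch of the kind of code that produces such a heatmap (the figure size and the Blues colormap, which shades strongly positive values navy and strongly negative values white, are assumptions chosen to match the description below):

```python
# Compute pairwise correlation coefficients and draw them as a heatmap
plt.figure(figsize=(14, 10))
corr = df.corr(numeric_only=True)
sns.heatmap(corr, cmap="Blues")
plt.title("Correlation Coefficient Heatmap")
plt.show()
```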
As previously discussed, r-values greater than or equal to 0.8 or less than or equal to -0.8 are desirable when identifying linear relationships between variables. So, in the heatmap above, we are looking for squares, or cells, that are shaded either navy blue or white. Navy blue cells represent positive correlations between variables, while white cells represent negative linear relationships. In the visualization above, most variables show relatively no linear correlation. However, a few navy blue cells exist between consecutive flotation air flow columns. It is not surprising that a linear relationship may exist as the airflow adjusts from one cell to the next as the pulp or slurry moves through the flotation plant.
Two cells are nearly white and are circled in red in the correlation heatmap above. One is a negative linear relationship between the percent silica feed and the percent iron feed, with a correlation coefficient of –0.97, and the other is a negative linear relationship between the percent silica concentrate and the percent iron concentrate, with a correlation coefficient of –0.80. These strong linear relationships show that as the percent silica feed increases, the percent iron feed tends to decrease, and vice versa; the same relationship holds between the silica and iron concentrates. Since silica is the main impurity that flotation removes from the ore, a higher share of silica necessarily means a lower share of iron, so the negative relationship in the feed is expected. The equally strong negative relationship in the concentrates indicates that the flotation plant was separating silica from iron as intended when this data was collected.
Analysis: Pair Plot of Critical Columns
Management at the flotation plant indicated that an incident may have occurred on June 1st and needed to be investigated. Based on criteria provided by management, critical columns of the data were identified. Below is the Python code and the first few rows of data for the first of June.
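A sketch of that filtering step. The column names are taken from the Kaggle dataset, and the exact set of critical columns is an assumption based on the variables discussed in the rest of this section:

```python
# Columns treated as critical for the June 1st investigation (assumed list)
critical_cols = [
    "% Iron Concentrate",
    "% Silica Concentrate",
    "Ore Pulp pH",
    "Flotation Column 05 Level",
]

# Keep only the samples recorded on June 1, 2017
june1 = df[(df["date"] >= "2017-06-01") & (df["date"] < "2017-06-02")]
june1[critical_cols].head()
```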
Filtering the data for June 1st still resulted in over 4,000 rows of data. While a quick numerical summary of each column is easy to produce, it is not enough to conclude that something significant occurred. So, to examine the relationships between variables within this subset, I created a pair plot using Seaborn. This visualization summarizes a large amount of data in a single figure, making it easy to identify correlations between variables if they exist. Below is the Python code for the pair plot and the corresponding output.
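A sketch of the pair plot, reusing the june1 subset and critical_cols list assumed above:

```python
# Pair plot: histograms on the diagonal, scatterplots with regression lines elsewhere
sns.pairplot(
    june1[critical_cols],
    kind="reg",
    diag_kind="hist",
    plot_kws={"line_kws": {"color": "red"}},
    diag_kws={"color": "grey"},
)
plt.show()
```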
In the pair plot above, joint and marginal distributions are plotted simultaneously. The grey histograms along the diagonal estimate the marginal distributions of each numerical feature of the dataset. The linear regression lines on each scatterplot above emphasize that there is relatively no correlation between any of the given variables. Comparing this to the average day at the flotation plant may reveal if something significant occurred on June 1st. As observed earlier, with the correlation heat map of the whole dataset, there was a strong negative linear correlation between the percent silica concentration and the percent iron concentration. However, as outlined in red in the above pair plot, the data does not closely follow the line of best fit, indicating that there is relatively no linear correlation between these two variables on June 1st.
To confirm the lack of correlation between these two variables on June 1st, I created another correlation heat map in Python with these same critical columns, as shown below.
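A sketch of that heatmap:

```python
# Correlation heatmap restricted to the critical columns on June 1st
plt.figure(figsize=(6, 5))
sns.heatmap(june1[critical_cols].corr(), cmap="Blues", annot=True, fmt=".2f")
plt.title("Correlation Coefficients on June 1st")
plt.show()
```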
Since the r-value is –0.27, there is relatively no linear correlation between the percent silica concentration and the percent iron concentration on June 1st. However, the correlation coefficient for these two variables across the whole dataset is –0.80. This contrast in r-values seems atypical and may indicate that an incident occurred on June 1st, but further analysis is required to be sure.
Analysis: Change in Linear Correlation over Time
Since the relationship between the percent silica and percent iron concentrations changed so drastically on the first of June, I wondered how often this occurs. If it happens regularly, it may be essential to identify a pattern in the loss of this linear relationship in order to understand how best to optimize the flotation plant. So, I wrote Python code that computes the daily correlation coefficient between the percent silica concentration and the percent iron concentration across the span of the provided dataset to see what patterns, if any, exist. Below is the code and corresponding visualization.
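A sketch of that computation, again assuming the concentrate column names from the Kaggle dataset:

```python
# Correlation between % silica and % iron concentrate for each calendar day
daily_corr = df.groupby(df["date"].dt.date).apply(
    lambda day: day["% Silica Concentrate"].corr(day["% Iron Concentrate"])
)

plt.figure(figsize=(14, 5))
plt.plot(daily_corr.index, daily_corr.values, marker="o")
plt.axhline(0.8, color="red", linestyle="--")
plt.axhline(-0.8, color="red", linestyle="--")
plt.title("Daily Correlation: % Silica Concentrate vs. % Iron Concentrate")
plt.xlabel("Date")
plt.ylabel("Correlation coefficient (r)")
plt.show()
```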
The dashed red horizontal lines represent correlation coefficients of 0.8 and –0.8. Points above 0.8 or below –0.8 indicate strong linear correlations for that day. The first of June is circled in red in the above visualization. While the correlation between these two variables across the whole dataset tends to be strong (r = –0.80), it is not unusual for some days to have relatively no correlation between the percent silica concentration and the percent iron concentration. It should be noted that when there is relatively no linear relationship between these two variables, it occurs for a couple of days at most. Then, the strong linear relationship is present again. As a result, there is still no indication that an incident occurred on June 1st.
Analysis: Hourly Changes on June 1st
Investigating further, I created Python code to visualize hourly changes on June 1st within the critical columns management was concerned about. Below is the code utilizing a for-loop to show all graphs and the corresponding visualizations.
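A sketch of that loop, averaging each critical column by hour of day on June 1st:

```python
# Average each critical column within each hour of June 1st and plot the result
hourly = june1.groupby(june1["date"].dt.hour)[critical_cols].mean()

for col in critical_cols:
    plt.figure(figsize=(10, 3))
    plt.plot(hourly.index, hourly[col], marker="o")
    plt.title(f"{col} on June 1st")
    plt.xlabel("Hour of day (0-23)")
    plt.ylabel(col)
    plt.show()
```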
The above visualization shows that the percent iron concentration dips at 1:00 p.m. (13:00) on the first of June and again at the end of the day. Although these dips occur, they represent less than a 4% change and are within normal limits.
Similarly, the percent silica concentration fluctuated throughout the day on the first of June but remained within a 4% range, which is also within normal limits.
Ore Pulp pH also fluctuates but is still within a stable range, indicating that the flotation plant is running well.
The flotation level in cell five takes a significant dip at 6:00 p.m. (18:00) on June 1st. This column shows the froth level within the fifth cell and reflects the thickness of the froth; the lower the level, the higher the concentration grade of the ore being processed. A drastic change like this could have one of several causes.
While this may cause some concern, similar dips in flotation level occurred in other flotation cells on days when the plant ran flawlessly. So, although there were some variations in the samples collected on June 1st, they were still within an acceptable range. This means the flotation plant was running smoothly on June 1st, and there is no cause for concern.
Flotation Plant Findings and Next Steps
Feedback
As a math educator and data analyst, applying my strong background in mathematics to this project was exciting. I am constantly learning and growing my data skills. I value constructive feedback, so please feel free to message me to share any insights or suggestions. I’m also actively pursuing a data analyst role. If you know of any opportunities, I’d love to learn more. Connect with me on LinkedIn, check out my portfolio, and stay tuned for more of my data projects in the future.