Merging Datasets into One DataFrame with Concat in Python

Merging Datasets into One DataFrame with Concat in Python

Merging Datasets into One DataFrame with Concat in Python

Merging Datasets into One DataFrame with Concat in Python for Efficient Data Analysis. Merging several dataset files into one data frame is one of the challenges in the preparation process of data analysis. To meet stakeholder needs, data analysts will need to collect related data that may be sourced from multiple datasets. Then what if the data analyst needs to combine several of these datasets into a single data frame? There are several ways that this can be done, be it using Microsoft Excel, R, SQL, or Python.

BACA ARTIKEL INI DALAM BAHASA INDONESIA DI SINI

In this article, I will discuss how to combine dataset files into one dataframe by using the Concat function of the Pandas library in the Pyhton programming language.

1. Dataset Source

Previously, I was doing data analysis of daily stock data in the S&P 500 index for the period 2010–2023. I got the dataset from Kaggle; here is the link: S&P 500 Daily. The S&P 500 stock dataset consists of daily stock data, including close, open, low, high, and volume price data every day.

Here’s an example of one of the datasets.

READ ALSO: TWITTER DATA SCRAPING

2. Import The Libraries

To merge multiple dataset files into a single dataframe, we need the Pandas library to run the concat function.

import pandas as pd
import numpy as np

3. Import The Dataset

After that, import all of the datasets to be merged and convert them to a new dataframe.

close = pd.read_excel(r'C:\Users\XX\Documents\SSARAS\Data Analysis\SnP500 All assets (close).xlsx')
open = pd.read_excel(r'C:\Users\XX\Documents\SSARAS\Data Analysis\SnP500 All assets (open).xlsx')
low = pd.read_excel(r'C:\Users\XX\Documents\SSARAS\Data Analysis\SnP500 All assets (low).xlsx')
high = pd.read_excel(r'C:\Users\XX\Documents\SSARAS\Data Analysis\SnP500 All assets (high).xlsx')
volume = pd.read_excel(r'C:\Users\XX\Documents\SSARAS\Data Analysis\SnP500 All assets (volume).xlsx')

Don’t forget to use “r” before entering the file location link so that the ” \ ” sign can be read. Here are the steps to get the dataset file location link that has been downloaded and stored on the computer:

  1. Open the folder that contains the dataset files.
  2. Click the dataset file.
  3. On the menu bar (Home) click copy path.
  4. Paste the link into python code.
  5. Remove quotation marks ” at the beginning and end of the file path link.

Enter the code below to see each dataset that has been imported.

print(close)
print(open)
print(low)
print(high)
print(volume)

4. Merge Multiple Datasets Into One DataFrame

Dataset merging comes next. As part of this procedure, I will also index each item of data in the dataframe and arrange them into groups based on the type of data.

Merging Datasets into One DataFrame with Concat in Python

Here’s the Python code to combine dataset files into one dataframe (the snpdata dataframe).

snpdata = pd.concat([close, open, low, high, volume], keys=['close', 'open', 'low', 'high', 'volume'])
print(snpdata)

Let’s look at the results of the merger process:

Merging Datasets into One DataFrame with Concat in Python

5. Export the new dataframe to an Excel spreadsheet

To export a new dataframe into an Excel file, we can use the following code:

snpdata.to_excel('snpdata.xlsx')

Note 1: This dataframe requires an index to show the category of each data point, so I did not add the code index=false in the export code. If the code index is false, then the resulting Excel file will not display the category index of the merged result.

Results of merging datasets (Excel files)

Note 2: Don’t forget to change the date column to date format if the results in the exported Excel file are not in the form of dates.

snpdata.to_excel('snpdata.xlsx')

6. Conclusion

Merging datasets into one DataFrame with Concat in Python makes data processing easier and more efficient. Keep in mind, however, that the data preparation procedure must be tailored to the demands and goals of the data analysis itself.

Hopefully, this post will assist you with your data analysis process. I’ll go over a few more sorts of dataset merges after that. Don’t forget to sign up for the newsletter on this page. Yes, I will notify you by email whenever there is a new post on this website. If you have any suggestions or questions, please leave them in the comments area. Thank you

Let's connect
Passionate about Data Analytics, Entrepreneurship, and Business Strategy. Exploring the intersection of data-driven insights and innovative business decisions. B. Economics | MBA-Entrepreneurship | Data Analytics Certified | Analyst and Research Reach me at saraswatisepti@gmail.com or hai.ssaras@gmail.com

Leave a Reply

Your email address will not be published. Required fields are marked *

Back To Top
Translate »