Extracting Columns into a New DataFrame
Data manipulation is a crucial aspect of any data science project, and the pandas library in Python provides powerful tools for this purpose. In this guide, we’ll explore how to extract specific columns from a dataset into a new DataFrame using pandas.
Step 1: Importing Libraries
We start by importing the necessary libraries: NumPy for numerical operations and pandas for data manipulation.
import numpy as np
import pandas as pd
Step 2: Loading the Dataset
We use the read_csv function from pandas to load a dataset named “Train.csv” into a DataFrame called df. Because only the file name is given, the file must be in the current working directory; otherwise, pass its full path.
df = pd.read_csv("Train.csv")
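If you don't have Train.csv at hand, a minimal sketch like the one below rebuilds the five rows shown in Step 3 so the remaining steps can still be run. The inline CSV text is copied from that output (it is only a stand-in for the full file), and it relies on the fact that read_csv accepts any file-like object, such as io.StringIO, in place of a file path.

```python
import io

import pandas as pd

# Stand-in for Train.csv: the five rows shown by df.head(5) in Step 3.
# Empty Work_Experience fields are parsed as NaN.
csv_text = """ID,Gender,Ever_Married,Age,Graduated,Profession,Work_Experience,Spending_Score,Family_Size,Var_1,Segmentation
462809,Male,No,22,No,Healthcare,1.0,Low,4.0,Cat_4,D
462643,Female,Yes,38,Yes,Engineer,,Average,3.0,Cat_4,A
466315,Female,Yes,67,Yes,Engineer,1.0,Low,1.0,Cat_6,B
461735,Male,Yes,67,Yes,Lawyer,0.0,High,2.0,Cat_6,B
462669,Female,Yes,40,Yes,Entertainment,,High,6.0,Cat_6,A
"""

# read_csv reads from the in-memory buffer exactly as it would from a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (5, 11)
```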
Step 3: Exploring the Dataset
The head(5) method returns the first five rows of our DataFrame, giving a quick snapshot of its columns and values. Calling head() with no argument also returns five rows by default.
df.head(5)
|   | ID | Gender | Ever_Married | Age | Graduated | Profession | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 462809 | Male | No | 22 | No | Healthcare | 1.0 | Low | 4.0 | Cat_4 | D |
| 1 | 462643 | Female | Yes | 38 | Yes | Engineer | NaN | Average | 3.0 | Cat_4 | A |
| 2 | 466315 | Female | Yes | 67 | Yes | Engineer | 1.0 | Low | 1.0 | Cat_6 | B |
| 3 | 461735 | Male | Yes | 67 | Yes | Lawyer | 0.0 | High | 2.0 | Cat_6 | B |
| 4 | 462669 | Female | Yes | 40 | Yes | Entertainment | NaN | High | 6.0 | Cat_6 | A |
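Before extracting columns, it helps to confirm their exact labels, because pandas matches column names case-sensitively. The sketch below uses a small hypothetical stand-in with three of the columns from Train.csv and shows that a misspelled label raises a KeyError.

```python
import io

import pandas as pd

# Hypothetical stand-in with three of the columns from Train.csv.
csv_text = """ID,Gender,Age
462809,Male,22
462643,Female,38
"""
df = pd.read_csv(io.StringIO(csv_text))

# List the exact column labels before selecting any of them.
print(df.columns.tolist())  # ['ID', 'Gender', 'Age']

# Labels are case-sensitive: 'gender' does not match 'Gender'.
try:
    df[["gender"]]
except KeyError as err:
    print("KeyError:", err)
```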
Step 4: Extracting a Single Column
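Here, we create a new DataFrame, df_gender, containing only the ‘Gender’ column of the original DataFrame df. The double brackets matter: passing a list of column names returns a DataFrame, whereas df['Gender'] with single brackets returns a Series. This is useful when you want to focus on a specific attribute.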
df_gender = df[['Gender']]
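To see the effect of the double brackets, the sketch below compares both forms of selection on a small hypothetical three-row frame (not the full Train.csv).

```python
import pandas as pd

# Hypothetical three-row frame with two of the dataset's columns.
df = pd.DataFrame({"Gender": ["Male", "Female", "Female"], "Age": [22, 38, 67]})

gender_series = df["Gender"]  # single label -> pandas Series
df_gender = df[["Gender"]]    # list with one label -> one-column DataFrame

print(type(gender_series).__name__)  # Series
print(type(df_gender).__name__)      # DataFrame
print(df_gender.shape)               # (3, 1)
```

Keeping the result as a DataFrame is convenient when later code expects two-dimensional input, such as calling head() and getting a table back or passing the column to a function that takes a DataFrame.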
Step 5: Extracting Multiple Columns
Similarly, we create another DataFrame, df_new, by extracting both the ‘Gender’ and ‘Age’ columns. This enables us to work with a subset of the original dataset containing only the columns relevant to our analysis.
df_new = df[['Gender', 'Age']]
df_new.head(5)
|   | Gender | Age |
|---|---|---|
| 0 | Male | 22 |
| 1 | Female | 38 |
| 2 | Female | 67 |
| 3 | Male | 67 |
| 4 | Female | 40 |
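pandas offers other ways to select the same set of columns. The sketch below, built on a small hypothetical frame rather than Train.csv, shows that bracket indexing, .loc and DataFrame.filter(items=...) give the same result, and that calling .copy() on the subset keeps later edits from touching the original df. Without .copy(), older pandas versions may emit a SettingWithCopyWarning when you assign to the subset.

```python
import pandas as pd

# Hypothetical frame with three of the dataset's columns.
df = pd.DataFrame({
    "ID": [462809, 462643, 466315],
    "Gender": ["Male", "Female", "Female"],
    "Age": [22, 38, 67],
})

cols = ["Gender", "Age"]

# Three equivalent ways to select the same columns.
by_brackets = df[cols]
by_loc = df.loc[:, cols]
by_filter = df.filter(items=cols)

# The result keeps the column order given in the list.
print(by_loc.columns.tolist())  # ['Gender', 'Age']

# Take an explicit copy before modifying the subset.
df_new = df[cols].copy()
df_new["Age"] = df_new["Age"] + 1

print(df["Age"].tolist())      # [22, 38, 67]  (original unchanged)
print(df_new["Age"].tolist())  # [23, 39, 68]
```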
In conclusion, pandas simplifies the process of extracting columns from a dataset, providing flexibility in tailoring your analysis to specific variables of interest. These operations are foundational in any data science workflow, making pandas an indispensable tool for data manipulation and analysis in Python.