Arif Qodari | Tech Notes - Tidying Dataframe Column Names

Let’s use this sample data.

Code

df = pd.read_csv("https://gist.githubusercontent.com/arifqodari/965b854469b279df026e305864a944c0/raw/43984c8596373d64e602717ed6c3313393c85b2e/flight_dummy.csv")
df

	flightNumber	Airline Name	departure date_time
0	QZ-274	AirAsia Indonesia	2023-11-22 20:30
1	QG-526	Citilink	2023-11-22 16:20
2	QG-522	Citilink	2023-11-22 06:00

Next, let’s define our method tidy_columns containing series of functions to tidy up the column names. Then apply the function to the dataframe above.

def camel_to_snake(str_):
    str_ = re.sub("(.)([A-Z][a-z]+)", r"\1 \2", str_)
    return re.sub("([a-z0-9])([A-Z])", r"\1 \2", str_).lower()

def whitespace_to_underscore(str_):
    return re.sub("\s+", "_", str_)

def tidy_columns(df):
    df.columns = (
        df.columns
        .str.strip()  # strip text
        .map(camel_to_snake)  # change to snake case
        .map(whitespace_to_underscore)  # change any whitespace to underscore
        .str.lower()  # lowercase
        .str.replace("(", "")
        .str.replace(")", "")
    )
    return df


tidy_df = tidy_columns(df)
tidy_df

	flight_number	airline_name	departure_date_time
0	QZ-274	AirAsia Indonesia	2023-11-22 20:30
1	QG-526	Citilink	2023-11-22 16:20
2	QG-522	Citilink	2023-11-22 06:00

Neat! The simple method above tidy_columns do the following:

strip trailing whitespace
change from camel case to snake case (if any)
replace any whitespace (e.g. space) to underscore
change case to lowercase
remove parentheses

If you think that it needs additional treatment, you can simply create another function that accept a string and return a string and put another .map(your_function) in the tidy function above.