Next, let’s define our method tidy_columns containing series of functions to tidy up the column names. Then apply the function to the dataframe above.
def camel_to_snake(str_): str_ = re.sub("(.)([A-Z][a-z]+)", r"\1 \2", str_)return re.sub("([a-z0-9])([A-Z])", r"\1 \2", str_).lower()def whitespace_to_underscore(str_):return re.sub("\s+", "_", str_)def tidy_columns(df): df.columns = ( df.columns .str.strip() # strip text .map(camel_to_snake) # change to snake case .map(whitespace_to_underscore) # change any whitespace to underscore .str.lower() # lowercase .str.replace("(", "") .str.replace(")", "") )return dftidy_df = tidy_columns(df)tidy_df
flight_number
airline_name
departure_date_time
0
QZ-274
AirAsia Indonesia
2023-11-22 20:30
1
QG-526
Citilink
2023-11-22 16:20
2
QG-522
Citilink
2023-11-22 06:00
Neat! The simple method above tidy_columns do the following:
strip trailing whitespace
change from camel case to snake case (if any)
replace any whitespace (e.g. space) to underscore
change case to lowercase
remove parentheses
If you think that it needs additional treatment, you can simply create another function that accept a string and return a string and put another .map(your_function) in the tidy function above.