How to save a pandas DataFrame to a file and…
The pandas DataFrame is a very useful data type in Python for formatting and ordering data in different ways. One of its uses is writing a DataFrame to a file and reading it back into a DataFrame when it is needed.
1. DataFrame to CSV
In this example, we will write a DataFrame to a CSV file.
import pandas as pd
d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)
df.to_csv('test.csv', index=False)
This will create a file called test.csv in the directory where you ran the Python code.
Note that here we have passed index=False, so the index is not written as a column. If you want the index, you can call the to_csv function without that parameter:
df.to_csv('test.csv')
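To see what index=False changes, here is a minimal sketch that compares the two outputs. It assumes you call to_csv without a file path, in which case pandas returns the CSV text as a string instead of writing a file. The variable names without_index and with_index are only for illustration.

```python
import pandas as pd

d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)

# With no path argument, to_csv returns the CSV text instead of writing a file.
without_index = df.to_csv(index=False)
with_index = df.to_csv()

# The header line shows the difference: the index adds an unnamed first column.
print(without_index.splitlines()[0])
print(with_index.splitlines()[0])
```

The second header starts with an empty field, which is the unnamed index column.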
You can read this CSV back by using the read_csv function.
import pandas as pd
dfr = pd.read_csv('test.csv')
print(dfr)
This will print the data in the console as a formatted table.
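If the file was written with the index, reading it back needs a little care. The following is a sketch, not the only way to do it. It assumes the index_col parameter of read_csv is used to turn the first column back into the index, and it writes into a temporary directory so that no real file is overwritten. The names plain and restored are only for illustration.

```python
import os
import tempfile

import pandas as pd

d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)

# A temporary directory keeps the sketch from overwriting a real test.csv.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')

    # Write the index too; it becomes the first column and has no header name.
    df.to_csv(path)

    # Read once as-is, and once telling pandas that column 0 is the index.
    plain = pd.read_csv(path)
    restored = pd.read_csv(path, index_col=0)

print(list(plain.columns))     # ['Unnamed: 0', 'Name', 'Age']
print(list(restored.columns))  # ['Name', 'Age']
```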
When you have data with commas, you can change the separator to a different character using the sep= parameter. Here I have used the | pipe symbol. Remember to pass the same separator when you read the file back.
import pandas as pd
d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)
df.to_csv('test.csv', index=False, sep="|")
dfr = pd.read_csv('test.csv', sep="|")  # read with the same separator used for writing
print(dfr)
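Here is a small sketch of the pipe separator with data that actually contains commas. The names in this sample are made up for illustration, and the CSV text is kept in memory with io.StringIO rather than written to disk.

```python
import io

import pandas as pd

# Hypothetical sample data where the Name values contain commas.
d = {'Name': ['Doe, John', 'Foo', 'Stark, Tony'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)

# Write with | as the separator; the commas stay inside the Name field.
text = df.to_csv(index=False, sep='|')
print(text)

# Read back with the same separator so the columns split in the right place.
dfr = pd.read_csv(io.StringIO(text), sep='|')
print(dfr)
```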
2. DataFrame to JSON
When saving to JSON, the file has to carry the columns and index along with the row data if you need to guarantee the order of the data when it is read back.
For that, we can provide the orient parameter when converting to JSON.
import pandas as pd
d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)
df.to_json('test.json', orient='split')
dfr = pd.read_json('test.json', orient="split")
print(dfr)
This saves the test.json file with three main attributes: columns, index, and data. Together they define how the data should be arranged in the DataFrame.
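To look at the split layout without opening the file, here is a minimal sketch. It assumes to_json is called without a path, in which case it returns the JSON text as a string. The variable names text and parsed are only for illustration.

```python
import json

import pandas as pd

d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)

# With no path argument, to_json returns the JSON text as a string.
text = df.to_json(orient='split')

# Parse it with the standard json module to inspect the three attributes.
parsed = json.loads(text)
print(parsed['columns'])  # ['Name', 'Age']
print(parsed['index'])    # [0, 1, 2]
print(parsed['data'])     # [['John Doe', 10], ['Foo', 12], ['Tony Stark', 48]]
```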
There are other orient values you can use when saving a DataFrame to a JSON file:
orient=records
An array with one record per row. Column names are kept, but they take extra space because they are repeated in every record.
orient=columns
An object keyed by column name, where each column maps index labels to values. This is the default for a DataFrame.
orient=values
An array of rows, where each row is itself an array of values. Similar to records but more compact. The column headers and index are lost.
orient=table
A more detailed table structure that includes a schema. Data types and the pandas version are preserved.
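The orient values above can be compared side by side with a short sketch like this one. The outputs dictionary is just a helper for illustration. Each result is parsed with the standard json module so the structure is easy to inspect.

```python
import json

import pandas as pd

d = {'Name': ['John Doe', 'Foo', 'Tony Stark'], 'Age': [10, 12, 48]}
df = pd.DataFrame(data=d)

# Convert the same DataFrame with each orient and keep the parsed result.
outputs = {}
for orient in ['records', 'columns', 'values', 'table']:
    outputs[orient] = json.loads(df.to_json(orient=orient))
    print(orient, outputs[orient])
```

Note that JSON object keys are always strings, so with orient=columns the index labels 0, 1, 2 appear as "0", "1", "2" in the output.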
Following are some useful references for writing a DataFrame to a file and reading it back:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html