Let’s start with creating dataframes in Python using Pandas.
First of all, you need to download the library Pandas
pip3 install pandas
For using Pandas in a program, import Pandas typically at the top.
import pandas as pd
It is convention to use the variable pd for Pandas.
It seems to be natural to start with an empty dataframe and then fill it as data becomes available.
df = pd.DataFrame()
For getting a print out of the result,
df ### Jupyter print(df) ### Coding
Fill the dataframe column by column.
df["Col_C1"] = [11, 12] df["Col_C2"] = [21, 22]
Fill the dataframe row by row.
df = pd.DataFrame(columns=["Col_R1", "Col_R2"]) df.loc["Col_R1"] = [11, 12] df.loc["Col_R2"] = [21, 22]
But do not loop over dataframes! As it is correctly pointed out in a highly voted question on Stackoverflow: Creating an empty pandas dataframe - then filling it, though you can create and fill dataframes by looping over data, particularly for performance reasons, you should not do so. Get your data first into an efficient structure, e.g. lists, then create the (final) dataframe.
Getting started, we create a table with accounts and months. All values shall be 0. Let’s put the months as columns and accounts as rows.
You should be able to copy and paste the coding below. It should run, using the import statement above.
accounts = ["Account_A", "Account_B", "Account_D", "Account_X"] months = ["January", "February", "March"] df = pd.DataFrame(index=accounts, columns=months) df = df.fillna(0)
Your result should be similar to:
Certainly, though technically a dataframe, this is just mere ‘list’ or ‘table’, still a bit away from a ‘beautiful dataframe’.
Dataframes can be created in an elegant way from a dict.
pd.DataFrame.from_dict(myDict)
In practice, however, this is found to be rarely needed.
Task: Put these arrays as columns to a dataframe with the capitalized array name as column name.
service = ["Consulting", "Installation", "Operation"] account = ["0815", "1234", "9876"] quantity = [10, 100, 50] price = [100, 10, 50] cost = [30, 7, 25]
One (of many) solutions is to create a dataframe using the rows as they are given in the task
rows = [service, account, quantity, price, cost]
Then, create a dataframe based on these rows. I called it “dfPJ” abbreviating “dataframe for Python Julia comparison”.
dfPJ = pd.DataFrame(data=rows)
Transpose the dataframe
dfPJ = dfPJ.T
Rename the columns
colNames = ["Service", "Account", "Quantity", "Price", "Cost"] dfPJ.columns = colNames
Stay tuned for the introduction on transpose dataframes and renaming columns.
See here for
Often, you have an array of dictionary that you want to convert into a dataframe.
myDict = [ {"Email": "Joe@abc.com", "Title": "Mr.", "Firstname": "Joe"}, {"Email": "Sarah@abc.com", "Title": "Mrs.", "Firstname": "Sarah"}, ]
You can create this dataframe as simple as that:
df = pd.DataFrame(myDict)
Done.