Dataframes and Pandas?

Day01: How is an animal even in the same sentence?

The journey, or rather, since I like to be fancy, the Odyssey towards earning the coveted title of "Data Scientist" has started.

Beginning with Pandas: sadly not the animal, but the Python library. Pandas, at version 1.4.2 at the time of writing, is used to manipulate data. Not just any form of data, but Excel-sheet and CSV-like data. Tabular data.

Believe me, after seeing fellow colleagues go over thousands of rows of Excel data manually, my respect for Pandas grew astronomically (and so did my hope for myself).

Beginning with the heart of this library, the Dataframe.

A Dataframe is essentially a table of data with labels attached for identification purposes: row labels (the index) and column labels (the headers). A 2D data structure, basically.
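
To make the "table with labels" idea concrete, here is a minimal sketch. The animal data and the `animals` name are made up by me purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
animals = pd.DataFrame(
    {"name": ["panda", "koala", "sloth"], "weight_kg": [100.0, 9.5, 6.0]},
    index=["a", "b", "c"],  # row labels (the index)
)

print(animals)
print(animals.columns.tolist())  # column labels (the headers)
print(animals.index.tolist())    # row labels
print(animals.ndim)              # number of dimensions of the structure
```

Rows are identified by the index and columns by their headers, which is what lets pandas look cells up by label instead of by position alone.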

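Pandas can work with data from many types of files. It has built-in readers for these:

  1. Comma-separated values (.csv), via pd.read_csv
  2. XLSX, via pd.read_excel
  3. ZIP (a zipped CSV, for example, is decompressed on the fly by pd.read_csv)
  4. Plain Text (.txt), via pd.read_csv or pd.read_table
  5. JSON, via pd.read_json
  6. XML, via pd.read_xml
  7. HTML (tables only), via pd.read_html
  8. Hierarchical Data Format (HDF5), via pd.read_hdf
  9. SQL, via pd.read_sql

The following have no built-in pandas reader, so their contents must first be extracted with another library before they can go into a Dataframe:

  10. Images
  11. PDF
  12. DOCX
  13. MP3
  14. MP4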
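
As a rough sketch of how two of the text-based readers from the list behave, the snippet below feeds the same made-up records to `pd.read_csv` and `pd.read_json`. The data lives in in-memory strings (an assumption of mine, so the example runs without any files on disk).

```python
import io
import pandas as pd

# Hypothetical records, written once as CSV text and once as JSON text.
csv_text = "name,weight_kg\npanda,100\nkoala,9\n"
json_text = '[{"name": "panda", "weight_kg": 100}, {"name": "koala", "weight_kg": 9}]'

# io.StringIO wraps a string so the readers can treat it like an open file.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

Whatever the source format, the result is the same kind of object, so everything that follows applies equally to both.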

The following lines of code help you understand your data and how pandas interprets it.

Reading the data and storing it in a variable

import pandas as pd
DataFrame = pd.read_csv('File path')

What does pandas consider our variable to be?

type(DataFrame)

->pandas.core.frame.DataFrame
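
A self-contained way to try the read-and-check steps without a real file is sketched below. The CSV text is invented, and `io.StringIO` stands in for the file path (both are my assumptions, not part of the original snippet).

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,weight_kg\npanda,100\nkoala,9\n"

DataFrame = pd.read_csv(io.StringIO(csv_text))

# type() reports the class of the object that read_csv returned.
print(type(DataFrame))

# isinstance is the usual way to check the type in code.
is_frame = isinstance(DataFrame, pd.DataFrame)
print(is_frame)
```

Note that naming the variable `DataFrame` works, but it is easy to confuse with the class `pd.DataFrame`; a short name such as `df` is the more common choice.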

Display all the column names of our df

DataFrame.columns

No. of rows and cols in our data, as a (rows, cols) tuple

DataFrame.shape

Size of our df, i.e. rows x cols (the total number of cells)

DataFrame.size
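
Here is a small sketch of how `columns`, `shape` and `size` relate to each other, using a made-up frame of 4 rows and 3 columns (the data and the `frame` name are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data: 4 rows, 3 columns.
frame = pd.DataFrame(
    {
        "name": ["panda", "koala", "sloth", "otter"],
        "weight_kg": [100.0, 9.5, 6.0, 8.0],
        "continent": ["Asia", "Oceania", "South America", "Europe"],
    }
)

rows, cols = frame.shape  # shape is a (rows, cols) tuple

print(frame.columns.tolist())  # the column names
print(frame.shape)             # (4, 3)
print(frame.size)              # 12, which is rows * cols
```

All three are attributes rather than methods, so there are no parentheses after them.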

Setting the no. of rows shown in a Jupyter notebook when a long df gets truncated (x is an integer of your choice)

pd.options.display.min_rows = x
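
As far as I understand it, `display.min_rows` only kicks in once the frame has more rows than `display.max_rows`; at that point the output is truncated down to `min_rows` rows. The sketch below uses `pd.option_context` so the settings apply only inside the `with` block; the values 20 and 6 and the 100-row frame are arbitrary choices of mine.

```python
import pandas as pd

# Hypothetical frame with 100 rows, long enough to trigger truncation.
long_frame = pd.DataFrame({"n": range(100)})

# Temporarily cap the display at 20 rows; beyond that, show only 6 rows.
with pd.option_context("display.max_rows", 20, "display.min_rows", 6):
    text = repr(long_frame)

# Shows the first and last few rows, with a row of dots in between.
print(text)
```

Assigning to `pd.options.display.min_rows` directly does the same thing, but for the rest of the session instead of a single block.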

Get the first 'n' rows (5 if n is left out)

DataFrame.head(), DataFrame.head(n)

Get the last 'n' rows (5 if n is left out)

DataFrame.tail(), DataFrame.tail(n)
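
A quick sketch of `head` and `tail` on a made-up 10-row frame (the `numbers` frame is my own example data):

```python
import pandas as pd

# Hypothetical frame holding the numbers 0 to 9, one per row.
numbers = pd.DataFrame({"n": range(10)})

first_default = numbers.head()  # no argument: the first 5 rows
first_three = numbers.head(3)   # the first 3 rows
last_two = numbers.tail(2)      # the last 2 rows

print(first_default)
print(first_three)
print(last_two)
```

Both methods return a new Dataframe and leave the original untouched.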

Get a summary of the df: each col's name, non-null count and dtype, plus memory usage

DataFrame.info()

Types of data in each col

DataFrame.dtypes
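
To see what `info()` and `dtypes` report, here is a minimal sketch on invented data. `info()` prints its summary rather than returning it, so the example passes a buffer through the `buf` parameter to capture the text; the `zoo` frame is an assumption for illustration.

```python
import io
import pandas as pd

# Hypothetical data with a text, a float and an integer column.
zoo = pd.DataFrame(
    {"name": ["panda", "koala"], "weight_kg": [100.0, 9.5], "count": [2, 5]}
)

# Capture the printed summary in a string buffer.
buffer = io.StringIO()
zoo.info(buf=buffer)
summary = buffer.getvalue()

print(summary)     # column names, non-null counts, dtypes, memory usage
print(zoo.dtypes)  # one dtype per column
```

In a notebook, plain `zoo.info()` is enough; the buffer is only useful when you want the summary as a string.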