Beginning my journey as a Data Scientist

The journey, or rather, since I like to be fancy, the Odyssey towards earning the coveted title of "Data Scientist" has started.

It begins with Pandas: sadly not the animal, but the Python library. Pandas, at version 1.4.2 at the time of writing, is used to manipulate data. Not just any form of data, but Excel-sheet-like, CSV-like data. Tabular data.

Believe me, after seeing fellow colleagues go over thousands of rows of Excel data manually, my respect for Pandas (and hope for myself) grew astronomically.

Let's begin with the heart of this library, the DataFrame.

A DataFrame is essentially a table of data with some fancy labels attached for identifying the data: rows, columns, headers and an index. A 2D data structure, basically.
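To make the description above concrete, here is a minimal sketch that builds a tiny DataFrame by hand. The names, scores and row labels are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"], "score": [91, 78, 85]},
    index=["r1", "r2", "r3"],  # custom row labels (the index)
)

print(df)
print(df.columns.tolist())  # column labels (the headers)
print(df.index.tolist())    # row labels
print(df.ndim)              # number of dimensions of the structure
```

Each value can be located by a row label and a column label together, for example df.loc["r2", "score"].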

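Pandas has built-in readers for many tabular file formats, including:

Comma-separated values (.csv)
XLSX
ZIP (compressed files, such as a zipped CSV)
Plain Text (.txt)
JSON
XML
HTML (tables)
Hierarchical Data Format (HDF5)
SQL

Images, PDF, DOCX, MP3 and MP4 files cannot be read by pandas directly; their contents have to be extracted with other tools first.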
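As a small sketch of two of the formats listed above, the snippet below reads the same made-up table once as CSV and once as JSON. It uses in-memory text via io.StringIO instead of real files, so the data and variable names are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical data, written inline instead of loaded from disk
csv_text = "city,temp\nPune,31\nOslo,12\n"
json_text = '[{"city": "Pune", "temp": 31}, {"city": "Oslo", "temp": 12}]'

# Each format has its own reader, but both return a DataFrame
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

With real files, the io.StringIO(...) argument would simply be replaced by the file path.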

The following lines of code help you understand your data and how pandas sees it.

Reading the data and storing it in a variable (pandas is imported as pd first)

import pandas as pd
DataFrame = pd.read_csv('File path')

What does pandas consider our variable as?

type(DataFrame)

->pandas.core.frame.DataFrame

Display all the columns of our df

DataFrame.columns

No. of rows and cols in our data, returned as a (rows, cols) tuple

DataFrame.shape

Size of our df, i.e. the total number of cells (rows x cols)

DataFrame.size
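A quick sketch of how the two relate, using a made-up 3 x 2 table:

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = df.shape  # shape is a (rows, cols) tuple
print(df.shape)        # (3, 2)
print(df.size)         # 6, the same as rows * cols
```

Note that shape and size are attributes, not methods, so they are written without parentheses.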

Setting the min no. of rows shown in a Jupyter notebook when the output is truncated

pd.options.display.min_rows = x
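As far as I understand it, min_rows only kicks in when the frame has more rows than display.max_rows, at which point pandas shows a shortened view. The sketch below uses pd.option_context so the settings apply only inside the with block; the values 10 and 4 and the 100-row frame are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical frame with 100 rows
df = pd.DataFrame({"n": range(100)})

# Temporarily change the display options; they reset after the block
with pd.option_context("display.max_rows", 10, "display.min_rows", 4):
    text = repr(df)  # the same text a plain print(df) would show

print(text)
```

Using option_context instead of assigning to pd.options directly keeps the change from leaking into the rest of the notebook.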

Get the first 'n' rows (5 if no 'n' is given)

DataFrame.head(), DataFrame.head(n)

Get the last 'n' rows (5 if no 'n' is given)

DataFrame.tail(), DataFrame.tail(n)
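A short sketch of head and tail on a made-up ten-row frame:

```python
import pandas as pd

# Hypothetical data: a single column holding 0..9
df = pd.DataFrame({"n": range(10)})

print(df.head())   # first 5 rows, since no n is given
print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
```

Both calls return a new DataFrame and leave the original untouched.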

Get info about the df, details about all the cols

DataFrame.info()

Types of data in each col

DataFrame.dtypes
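To see what these two report, here is a minimal sketch on a made-up frame with an integer, a float and a boolean column:

```python
import pandas as pd

# Hypothetical data with three different column types
df = pd.DataFrame({"id": [1, 2], "price": [9.5, 3.0], "ok": [True, False]})

df.info()         # prints index range, columns, non-null counts, dtypes, memory use
print(df.dtypes)  # a Series mapping each column name to its dtype
```

info() prints its summary directly, while dtypes returns a Series that can be used in later code.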

GeeKee's Odyssey


Dataframes and Pandas?

Day01: How is an animal even in the same sentence?
