Working with Libraries
Python libraries supercharge your productivity. Business analysts use pandas for Excel-like data manipulation, requests for API calls, and openpyxl for reading and writing Excel files. Master a few key libraries and you can automate tasks that used to take hours.
Estimated reading time: 30–35 minutes
Essential Libraries
- pandas → Excel-like data manipulation (DataFrames)
- requests → HTTP requests, API calls
- openpyxl → read/write Excel files
- datetime → date/time operations (built-in)
Great for: data analysis, automation, reporting
Installation & Usage
- pip install → add libraries to your environment
- import → load library into your script
- Documentation → read docs for each library
- Common aliases → pd for pandas, np for numpy
Great for: getting started quickly
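Before importing, it helps to confirm what is already installed. A minimal stdlib-only sketch (the library names are simply the ones covered in this section):

```python
import importlib.util

# Check whether each library is installed before trying to import it
for name in ["pandas", "requests", "openpyxl"]:
    if importlib.util.find_spec(name) is not None:
        print(f"{name} is installed")
    else:
        print(f"{name} is missing - run: pip install {name}")
```

Running this before a script's real imports turns a cryptic ImportError into an actionable message.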
Installing Libraries
Use pip to install third-party libraries.
# Install pandas for data analysis
pip install pandas
# Install requests for API calls
pip install requests
# Install openpyxl for Excel files
pip install openpyxl
# Install multiple at once
pip install pandas requests openpyxl
Pandas — Excel-like Data Manipulation
Pandas DataFrames are like Excel spreadsheets in Python. Perfect for filtering, grouping, and analyzing data.
import pandas as pd
# Create DataFrame from dict
data = {
    'rep': ['Ana', 'Bob', 'Carol', 'Ana'],
    'region': ['North', 'South', 'North', 'North'],
    'sales': [1200, 1500, 980, 1100]
}
df = pd.DataFrame(data)
print(df)
# Filter rows
high_sales = df[df['sales'] > 1000]
print(high_sales)
# Group by and sum
by_rep = df.groupby('rep')['sales'].sum()
print(by_rep)
# Sort by column
sorted_df = df.sort_values('sales', ascending=False)
print(sorted_df)
Reading and Writing CSV with Pandas
Pandas makes CSV operations trivial.
import pandas as pd
# Read CSV
df = pd.read_csv('sales.csv')
print(df.head()) # first 5 rows
# Basic info
print(df.info())
print(df.describe()) # statistics
# Write CSV
df.to_csv('output.csv', index=False)
Requests — Making API Calls
The requests library simplifies HTTP requests for APIs.
import requests
# GET request
response = requests.get('https://api.example.com/data')
print(response.status_code) # 200 = success
# Parse JSON response
data = response.json()
print(data)
# POST request with data
payload = {'name': 'Ana', 'email': 'ana@co.com'}
response = requests.post('https://api.example.com/users', json=payload)
print(response.json())
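GET requests often need query parameters. requests builds the query string for you from a `params` dict; the stdlib sketch below shows the equivalent encoding (the URL is a placeholder, matching the examples above):

```python
from urllib.parse import urlencode

# requests.get(url, params={...}) appends this query string automatically
query = urlencode({"region": "North", "limit": 10})
print(f"https://api.example.com/sales?{query}")
# https://api.example.com/sales?region=North&limit=10
```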
# Error handling
try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # raises an error for 4xx/5xx responses
    data = response.json()
except requests.exceptions.RequestException as e:
    print("API error:", e)
Working with Excel Files
Pandas reads and writes Excel files directly, using openpyxl as the engine under the hood.
import pandas as pd
# Read Excel file
df = pd.read_excel('sales.xlsx', sheet_name='Sheet1')
print(df.head())
# Write Excel file
df.to_excel('output.xlsx', sheet_name='Sales', index=False)
# Multiple sheets (df1 and df2 are existing DataFrames)
with pd.ExcelWriter('report.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sales', index=False)
    df2.to_excel(writer, sheet_name='Summary', index=False)
Datetime — Working with Dates
Built-in datetime module for date operations.
from datetime import datetime, timedelta
# Current date/time
now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))
# Parse string to date
date_str = "2024-01-15"
date = datetime.strptime(date_str, "%Y-%m-%d")
print(date)
# Date arithmetic
tomorrow = now + timedelta(days=1)
last_week = now - timedelta(weeks=1)
print(tomorrow, last_week)
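Subtracting two datetimes gives a timedelta, which is handy for "days between" calculations such as days until a deadline. A quick sketch:

```python
from datetime import datetime

start = datetime(2024, 1, 15)
deadline = datetime(2024, 3, 1)
remaining = deadline - start  # subtraction yields a timedelta
print(remaining.days)  # 46
```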
# Pandas datetime (assumes df has a 'date' column)
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
Cornerstone Project — Sales Dashboard Data Pipeline (step-by-step)
Build a data pipeline that fetches sales data from an API, cleans it with pandas, calculates KPIs, and exports to Excel for stakeholders. This combines multiple libraries into a real workflow analysts use daily.
Step 1 — Fetch data from API
Simulate the API call (in real life, point this at your company's API).
import requests
import pandas as pd
from datetime import datetime
# Simulate API response (replace with real API)
def fetch_sales_data():
    # In production: response = requests.get('https://api.company.com/sales')
    # For demo, return mock data
    return [
        {"date": "2024-01-15", "rep": "Ana", "amount": 1200, "region": "North"},
        {"date": "2024-01-16", "rep": "Bob", "amount": 1500, "region": "South"},
        {"date": "2024-01-17", "rep": "Carol", "amount": 980, "region": "North"},
        {"date": "2024-01-18", "rep": "Ana", "amount": 1100, "region": "North"},
    ]
raw_data = fetch_sales_data()
print("Fetched", len(raw_data), "records")
Step 2 — Load into pandas DataFrame
Convert to DataFrame for easy manipulation.
df = pd.DataFrame(raw_data)
print(df.head())
# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
Step 3 — Clean and validate data
Remove invalid rows, handle missing values.
# Remove rows with missing values
df = df.dropna()
# Remove negative amounts
df = df[df['amount'] > 0]
# Add calculated columns
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
print("Clean data:", len(df), "rows")
Step 4 — Calculate KPIs
Aggregate data for reporting.
# Total sales
total_sales = df['amount'].sum()
print(f"Total sales: ${total_sales:,.2f}")
# Sales by rep
by_rep = df.groupby('rep')['amount'].agg(['sum', 'count', 'mean'])
by_rep.columns = ['total', 'transactions', 'avg_sale']
print(by_rep)
# Sales by region
by_region = df.groupby('region')['amount'].sum()
print(by_region)
# Top performers
top_reps = by_rep.sort_values('total', ascending=False).head(3)
print("Top performers:")
print(top_reps)
Step 5 — Create summary DataFrame
Build executive summary table.
summary = pd.DataFrame({
    'Metric': ['Total Sales', 'Transactions', 'Average Sale', 'Top Rep'],
    'Value': [
        round(total_sales, 2),
        len(df),
        round(df['amount'].mean(), 2),
        by_rep['total'].idxmax()
    ]
})
print(summary)
Step 6 — Export to Excel
Create multi-sheet Excel report for stakeholders.
with pd.ExcelWriter('sales_dashboard.xlsx', engine='openpyxl') as writer:
    # Raw data
    df.to_excel(writer, sheet_name='Raw Data', index=False)
    # Summary
    summary.to_excel(writer, sheet_name='Summary', index=False)
    # By rep
    by_rep.to_excel(writer, sheet_name='By Rep')
    # By region
    by_region.to_excel(writer, sheet_name='By Region')
print("Excel report saved to sales_dashboard.xlsx")
Step 7 — Put it all together
Combine into reusable pipeline function.
def generate_sales_dashboard():
    # Fetch
    raw_data = fetch_sales_data()
    df = pd.DataFrame(raw_data)
    df['date'] = pd.to_datetime(df['date'])
    # Clean
    df = df.dropna()
    df = df[df['amount'] > 0]
    df['month'] = df['date'].dt.month
    # Calculate KPIs
    total_sales = df['amount'].sum()
    by_rep = df.groupby('rep')['amount'].agg(['sum', 'count', 'mean'])
    by_rep.columns = ['total', 'transactions', 'avg_sale']
    by_region = df.groupby('region')['amount'].sum()
    # Summary
    summary = pd.DataFrame({
        'Metric': ['Total Sales', 'Transactions', 'Average Sale'],
        'Value': [round(total_sales, 2), len(df), round(df['amount'].mean(), 2)]
    })
    # Export
    with pd.ExcelWriter('sales_dashboard.xlsx') as writer:
        df.to_excel(writer, sheet_name='Raw Data', index=False)
        summary.to_excel(writer, sheet_name='Summary', index=False)
        by_rep.to_excel(writer, sheet_name='By Rep')
        by_region.to_excel(writer, sheet_name='By Region')
    print("Dashboard generated successfully!")
# Run it
generate_sales_dashboard()
How this helps at work
- Automated reporting → run daily/weekly, no manual Excel work
- API integration → pull live data from systems
- Multi-sheet reports → stakeholders get complete picture
- Reusable pipeline → adapt for any data source
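The "reusable pipeline" point boils down to parameterizing the steps: pass in the fetch function and the cleaning threshold, and the same code works for any data source. A minimal stdlib-only sketch; the `run_pipeline` name, the mock fetcher, and the KPI shape are illustrative assumptions, not part of the dashboard code above:

```python
from collections import defaultdict

def run_pipeline(fetch, min_amount=0):
    """Fetch records, drop invalid amounts, and total sales by rep."""
    records = [r for r in fetch() if r.get("amount", 0) > min_amount]
    totals = defaultdict(float)
    for r in records:
        totals[r["rep"]] += r["amount"]
    return dict(totals)

# Swap this mock fetcher for a real API call or CSV reader
mock = lambda: [
    {"rep": "Ana", "amount": 1200},
    {"rep": "Bob", "amount": -50},  # invalid, filtered out
    {"rep": "Ana", "amount": 1100},
]
print(run_pipeline(mock))  # {'Ana': 2300.0}
```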
Key Takeaways
- pandas → DataFrames for Excel-like data manipulation
- requests → simple HTTP requests for APIs
- openpyxl → read/write Excel files (via pandas)
- datetime → date operations and formatting
- Combining libraries → build powerful data pipelines
- Cornerstone → sales dashboard pipeline automates reporting
Next Steps
You have mastered essential libraries. Next, explore working with APIs for advanced integration, or dive into data visualization with matplotlib and seaborn for charts and graphs.