Working with Libraries

Python libraries supercharge your productivity. Business analysts use pandas for Excel-like data manipulation, requests for API calls, and openpyxl for reading and writing Excel files. Master a few key libraries and you can automate tasks that used to take hours.

Estimated reading time: 30–35 minutes

Essential Libraries

  • pandas → Excel-like data manipulation (DataFrames)
  • requests → HTTP requests, API calls
  • openpyxl → read/write Excel files
  • datetime → date/time operations (built-in)

Great for: data analysis, automation, reporting

Installation & Usage

  • pip install → add libraries to your environment
  • import → load library into your script
  • Documentation → read docs for each library
  • Common aliases → pd for pandas, np for numpy

Great for: getting started quickly
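
These pieces fit together in a typical script header; a minimal sketch (the tiny DataFrame here is just a throwaway example):

```python
# Conventional alias: the community convention is "import pandas as pd"
import pandas as pd

# After importing, you can check which version is installed
print(pd.__version__)

# The alias is used everywhere afterwards
df = pd.DataFrame({'a': [1, 2, 3]})
print(len(df))  # 3
```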

Installing Libraries

Use pip to install third-party libraries.

shell
# Install pandas for data analysis
pip install pandas

# Install requests for API calls
pip install requests

# Install openpyxl for Excel files
pip install openpyxl

# Install multiple at once
pip install pandas requests openpyxl
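
To confirm an install worked, you can ask Python directly; a small sketch using only the standard library:

```python
import importlib.util

# find_spec returns None when a package is not importable
results = {}
for name in ["pandas", "requests", "openpyxl"]:
    results[name] = importlib.util.find_spec(name) is not None

for name, installed in results.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```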

Pandas — Excel-like Data Manipulation

Pandas DataFrames are like Excel spreadsheets in Python. Perfect for filtering, grouping, and analyzing data.

python
import pandas as pd

# Create DataFrame from dict
data = {
    'rep': ['Ana', 'Bob', 'Carol', 'Ana'],
    'region': ['North', 'South', 'North', 'North'],
    'sales': [1200, 1500, 980, 1100]
}

df = pd.DataFrame(data)
print(df)

# Filter rows
high_sales = df[df['sales'] > 1000]
print(high_sales)

# Group by and sum
by_rep = df.groupby('rep')['sales'].sum()
print(by_rep)

# Sort by column
sorted_df = df.sort_values('sales', ascending=False)
print(sorted_df)
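
Filters can also be combined; a short extension of the DataFrame above (when using & or |, parentheses around each condition are required):

```python
import pandas as pd

df = pd.DataFrame({
    'rep': ['Ana', 'Bob', 'Carol', 'Ana'],
    'region': ['North', 'South', 'North', 'North'],
    'sales': [1200, 1500, 980, 1100]
})

# Combine conditions with & (and) or | (or)
north_high = df[(df['region'] == 'North') & (df['sales'] > 1000)]
print(north_high)  # Ana's two North sales

# Average sales per region
print(df.groupby('region')['sales'].mean().round(2))
```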

Reading and Writing CSV with Pandas

Pandas makes CSV operations trivial.

python
import pandas as pd

# Read CSV
df = pd.read_csv('sales.csv')
print(df.head())  # first 5 rows

# Basic info
print(df.info())
print(df.describe())  # statistics

# Write CSV
df.to_csv('output.csv', index=False)
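
A round trip is easy to verify; this sketch writes a throwaway file demo.csv and reads it back with two commonly used read_csv options, parse_dates and usecols:

```python
import pandas as pd

# Write a small CSV, then read it back
df = pd.DataFrame({
    'date': ['2024-01-15', '2024-01-16'],
    'rep': ['Ana', 'Bob'],
    'sales': [1200, 1500]
})
df.to_csv('demo.csv', index=False)

# parse_dates converts columns to datetime on load; usecols limits columns
loaded = pd.read_csv('demo.csv', parse_dates=['date'], usecols=['date', 'sales'])
print(loaded.dtypes)
```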

Requests — Making API Calls

The requests library simplifies making HTTP requests to APIs.

python
import requests

# GET request
response = requests.get('https://api.example.com/data')
print(response.status_code)  # 200 = success

# Parse JSON response
data = response.json()
print(data)

# POST request with data
payload = {'name': 'Ana', 'email': 'ana@co.com'}
response = requests.post('https://api.example.com/users', json=payload)
print(response.json())

# Error handling
try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # raises error for 4xx/5xx
    data = response.json()
except requests.exceptions.RequestException as e:
    print("API error:", e)

Working with Excel Files

pandas reads and writes Excel files, using openpyxl under the hood to handle the .xlsx format.

python
import pandas as pd

# Read Excel file
df = pd.read_excel('sales.xlsx', sheet_name='Sheet1')
print(df.head())

# Write Excel file
df.to_excel('output.xlsx', sheet_name='Sales', index=False)

# Multiple sheets (df1 and df2 are assumed to be existing DataFrames)
with pd.ExcelWriter('report.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sales', index=False)
    df2.to_excel(writer, sheet_name='Summary', index=False)

Datetime — Working with Dates

The built-in datetime module handles date and time operations.

python
from datetime import datetime, timedelta

# Current date/time
now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))

# Parse string to date
date_str = "2024-01-15"
date = datetime.strptime(date_str, "%Y-%m-%d")
print(date)

# Date arithmetic
tomorrow = now + timedelta(days=1)
last_week = now - timedelta(weeks=1)
print(tomorrow, last_week)

# Pandas datetime (assuming df has a 'date' column of date strings)
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
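
pandas can also generate date sequences, which is handy for building report calendars; a quick sketch:

```python
import pandas as pd

# Five consecutive business days (Mon-Fri), starting from a Monday
days = pd.date_range('2024-01-15', periods=5, freq='B')
for d in days:
    print(d.date(), d.day_name())
```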

Cornerstone Project — Sales Dashboard Data Pipeline (step-by-step)

Build a data pipeline that fetches sales data from an API, cleans it with pandas, calculates KPIs, and exports to Excel for stakeholders. This combines multiple libraries into a real workflow analysts use daily.

Step 1 — Fetch data from API

Simulate the API call (in real life, call your company's API).

python
import requests
import pandas as pd
from datetime import datetime

# Simulate API response (replace with real API)
def fetch_sales_data():
    # In production: response = requests.get('https://api.company.com/sales')
    # For demo, return mock data
    return [
        {"date": "2024-01-15", "rep": "Ana", "amount": 1200, "region": "North"},
        {"date": "2024-01-16", "rep": "Bob", "amount": 1500, "region": "South"},
        {"date": "2024-01-17", "rep": "Carol", "amount": 980, "region": "North"},
        {"date": "2024-01-18", "rep": "Ana", "amount": 1100, "region": "North"},
    ]

raw_data = fetch_sales_data()
print("Fetched", len(raw_data), "records")

Step 2 — Load into pandas DataFrame

Convert to DataFrame for easy manipulation.

python
df = pd.DataFrame(raw_data)
print(df.head())

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)

Step 3 — Clean and validate data

Remove invalid rows, handle missing values.

python
# Remove rows with missing values
df = df.dropna()

# Remove negative amounts
df = df[df['amount'] > 0]

# Add calculated columns
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year

print("Clean data:", len(df), "rows")

Step 4 — Calculate KPIs

Aggregate data for reporting.

python
# Total sales
total_sales = df['amount'].sum()
print("Total sales: $", round(total_sales, 2))

# Sales by rep
by_rep = df.groupby('rep')['amount'].agg(['sum', 'count', 'mean'])
by_rep.columns = ['total', 'transactions', 'avg_sale']
print(by_rep)

# Sales by region
by_region = df.groupby('region')['amount'].sum()
print(by_region)

# Top performers
top_reps = by_rep.sort_values('total', ascending=False).head(3)
print("Top performers:")
print(top_reps)
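
The same KPIs can be cross-tabulated by rep and region with pivot_table; a sketch that reuses the mock records from Step 1:

```python
import pandas as pd

df = pd.DataFrame([
    {"rep": "Ana", "region": "North", "amount": 1200},
    {"rep": "Bob", "region": "South", "amount": 1500},
    {"rep": "Carol", "region": "North", "amount": 980},
    {"rep": "Ana", "region": "North", "amount": 1100},
])

# Rows = reps, columns = regions, values = total sales; fill_value avoids NaN
pivot = df.pivot_table(index='rep', columns='region',
                       values='amount', aggfunc='sum', fill_value=0)
print(pivot)
```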

Step 5 — Create summary DataFrame

Build executive summary table.

python
summary = pd.DataFrame({
    'Metric': ['Total Sales', 'Transactions', 'Average Sale', 'Top Rep'],
    'Value': [
        round(total_sales, 2),
        len(df),
        round(df['amount'].mean(), 2),
        by_rep['total'].idxmax()
    ]
})

print(summary)

Step 6 — Export to Excel

Create multi-sheet Excel report for stakeholders.

python
with pd.ExcelWriter('sales_dashboard.xlsx', engine='openpyxl') as writer:
    # Raw data
    df.to_excel(writer, sheet_name='Raw Data', index=False)
    
    # Summary
    summary.to_excel(writer, sheet_name='Summary', index=False)
    
    # By rep
    by_rep.to_excel(writer, sheet_name='By Rep')
    
    # By region
    by_region.to_excel(writer, sheet_name='By Region')

print("Excel report saved to sales_dashboard.xlsx")

Step 7 — Put it all together

Combine into reusable pipeline function.

python
def generate_sales_dashboard():
    # Fetch
    raw_data = fetch_sales_data()
    df = pd.DataFrame(raw_data)
    df['date'] = pd.to_datetime(df['date'])
    
    # Clean
    df = df.dropna()
    df = df[df['amount'] > 0]
    df['month'] = df['date'].dt.month
    
    # Calculate KPIs
    total_sales = df['amount'].sum()
    by_rep = df.groupby('rep')['amount'].agg(['sum', 'count', 'mean'])
    by_rep.columns = ['total', 'transactions', 'avg_sale']
    by_region = df.groupby('region')['amount'].sum()
    
    # Summary
    summary = pd.DataFrame({
        'Metric': ['Total Sales', 'Transactions', 'Average Sale'],
        'Value': [round(total_sales, 2), len(df), round(df['amount'].mean(), 2)]
    })
    
    # Export
    with pd.ExcelWriter('sales_dashboard.xlsx') as writer:
        df.to_excel(writer, sheet_name='Raw Data', index=False)
        summary.to_excel(writer, sheet_name='Summary', index=False)
        by_rep.to_excel(writer, sheet_name='By Rep')
        by_region.to_excel(writer, sheet_name='By Region')
    
    print("Dashboard generated successfully!")

# Run it
generate_sales_dashboard()

How this helps at work

  • Automated reporting → run daily/weekly, no manual Excel work
  • API integration → pull live data from systems
  • Multi-sheet reports → stakeholders get complete picture
  • Reusable pipeline → adapt for any data source

Key Takeaways

  • pandas → DataFrames for Excel-like data manipulation
  • requests → simple HTTP requests for APIs
  • openpyxl → read/write Excel files (via pandas)
  • datetime → date operations and formatting
  • Combining libraries → build powerful data pipelines
  • Cornerstone → sales dashboard pipeline automates reporting

Next Steps

You now have the essential libraries under your belt. Next, explore working with APIs for more advanced integration, or dive into data visualization with matplotlib and seaborn for charts and graphs.