Working with Libraries
Python libraries supercharge your productivity. Business analysts use pandas for Excel-like data manipulation, requests for API calls, and openpyxl for reading and writing Excel files. Master a few key libraries and you can automate tasks that used to take hours.
Estimated reading time: 30–35 minutes
Essential Libraries
- pandas → Excel-like data manipulation (DataFrames)
- requests → HTTP requests, API calls
- openpyxl → read/write Excel files
- datetime → date/time operations (built-in)
Great for: data analysis, automation, reporting
Installation & Usage
- pip install → add libraries to your environment
- import → load library into your script
- Documentation → read docs for each library
- Common aliases → pd for pandas, np for numpy
Great for: getting started quickly
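Before importing, it helps to confirm what is already installed. A minimal stdlib-only sketch (the library names are simply the ones covered in this section):

```python
import importlib.util

# Check whether each library is installed before trying to import it
for name in ["pandas", "requests", "openpyxl"]:
    if importlib.util.find_spec(name) is not None:
        print(f"{name} is installed")
    else:
        print(f"{name} is missing - run: pip install {name}")
```

Running this before a script's real imports turns a cryptic ImportError into an actionable message.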
Installing Libraries
Use pip to install third-party libraries.
# Install pandas for data analysis
pip install pandas
# Install requests for API calls
pip install requests
# Install openpyxl for Excel files
pip install openpyxl
# Install multiple at once
pip install pandas requests openpyxl
Pandas — Excel-like Data Manipulation
Pandas DataFrames are like Excel spreadsheets in Python. Perfect for filtering, grouping, and analyzing data.
import pandas as pd
# Create DataFrame from dict
data = {
    'rep': ['Ana', 'Bob', 'Carol', 'Ana'],
    'region': ['North', 'South', 'North', 'North'],
    'sales': [1200, 1500, 980, 1100]
}
df = pd.DataFrame(data)
print(df)
# Filter rows
high_sales = df[df['sales'] > 1000]
print(high_sales)
# Group by and sum
by_rep = df.groupby('rep')['sales'].sum()
print(by_rep)
# Sort by column
sorted_df = df.sort_values('sales', ascending=False)
print(sorted_df)
Reading and Writing CSV with Pandas
Pandas makes CSV operations trivial.
import pandas as pd
# Read CSV
df = pd.read_csv('sales.csv')
print(df.head()) # first 5 rows
# Basic info
print(df.info())
print(df.describe()) # statistics
# Write CSV
df.to_csv('output.csv', index=False)
Requests — Making API Calls
The requests library simplifies HTTP requests for APIs.
import requests
# GET request
response = requests.get('https://api.example.com/data')
print(response.status_code) # 200 = success
# Parse JSON response
data = response.json()
print(data)
# POST request with data
payload = {'name': 'Ana', 'email': 'ana@co.com'}
response = requests.post('https://api.example.com/users', json=payload)
print(response.json())
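GET requests often need query parameters. requests builds the query string for you from a `params` dict; the stdlib sketch below shows the equivalent encoding (the URL is a placeholder, matching the examples above):

```python
from urllib.parse import urlencode

# requests.get(url, params={...}) appends this query string automatically
query = urlencode({"region": "North", "limit": 10})
print(f"https://api.example.com/sales?{query}")
# https://api.example.com/sales?region=North&limit=10
```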
# Error handling
try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # raises an error for 4xx/5xx responses
    data = response.json()
except requests.exceptions.RequestException as e:
    print("API error:", e)
Working with Excel Files
Pandas reads and writes Excel files directly, using openpyxl as the engine under the hood.
import pandas as pd
# Read Excel file
df = pd.read_excel('sales.xlsx', sheet_name='Sheet1')
print(df.head())
# Write Excel file
df.to_excel('output.xlsx', sheet_name='Sales', index=False)
# Multiple sheets (df1 and df2 are existing DataFrames)
with pd.ExcelWriter('report.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sales', index=False)
    df2.to_excel(writer, sheet_name='Summary', index=False)
Datetime — Working with Dates
Built-in datetime module for date operations.
from datetime import datetime, timedelta
# Current date/time
now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))
# Parse string to date
date_str = "2024-01-15"
date = datetime.strptime(date_str, "%Y-%m-%d")
print(date)
# Date arithmetic
tomorrow = now + timedelta(days=1)
last_week = now - timedelta(weeks=1)
print(tomorrow, last_week)
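Subtracting two datetimes gives a timedelta, which is handy for "days between" calculations such as days until a deadline. A quick sketch:

```python
from datetime import datetime

start = datetime(2024, 1, 15)
deadline = datetime(2024, 3, 1)
remaining = deadline - start  # subtraction yields a timedelta
print(remaining.days)  # 46
```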
# Pandas datetime (assumes df has a 'date' column)
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
Cornerstone Project — Sales Dashboard Data Pipeline (step-by-step)
Build a data pipeline that fetches sales data from an API, cleans it with pandas, calculates KPIs, and exports to Excel for stakeholders. This combines multiple libraries into a real workflow analysts use daily.
Step 1 — Fetch data from API
Simulate the API call (in real life, point this at your company's API).
import requests
import pandas as pd
from datetime import datetime
# Simulate API response (replace with real API)
def fetch_sales_data():
    # In production: response = requests.get('https://api.company.com/sales')
    # For demo, return mock data
    return [
        {"date": "2024-01-15", "rep": "Ana", "amount": 1200, "region": "North"},
        {"date": "2024-01-16", "rep": "Bob", "amount": 1500, "region": "South"},
        {"date": "2024-01-17", "rep": "Carol", "amount": 980, "region": "North"},
        {"date": "2024-01-18", "rep": "Ana", "amount": 1100, "region": "North"},
    ]
raw_data = fetch_sales_data()
print("Fetched", len(raw_data), "records")
Step 2 — Load into pandas DataFrame
Convert to DataFrame for easy manipulation.
df = pd.DataFrame(raw_data)
print(df.head())
# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
Step 3 — Clean and validate data
Remove invalid rows, handle missing values.
# Remove rows with missing values
df = df.dropna()
# Remove negative amounts
df = df[df['amount'] > 0]
# Add calculated columns
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
print("Clean data:", len(df), "rows")
Step 4 — Calculate KPIs
Aggregate data for reporting.
# Total sales
total_sales = df['amount'].sum()
print(f"Total sales: ${total_sales:,.2f}")
# Sales by rep
by_rep = df.groupby('rep')['amount'].agg(['sum', 'count', 'mean'])
by_rep.columns = ['total', 'transactions', 'avg_sale']
print(by_rep)
# Sales by region
by_region = df.groupby('region')['amount'].sum()
print(by_region)
# Top performers
top_reps = by_rep.sort_values('total', ascending=False).head(3)
print("Top performers:")
print(top_reps)
Step 5 — Create summary DataFrame
Build executive summary table.
summary = pd.DataFrame({
    'Metric': ['Total Sales', 'Transactions', 'Average Sale', 'Top Rep'],
    'Value': [
        round(total_sales, 2),
        len(df),
        round(df['amount'].mean(), 2),
        by_rep['total'].idxmax()
    ]
})
print(summary)
Step 6 — Export to Excel
Create multi-sheet Excel report for stakeholders.
with pd.ExcelWriter('sales_dashboard.xlsx', engine='openpyxl') as writer:
    # Raw data
    df.to_excel(writer, sheet_name='Raw Data', index=False)
    # Summary
    summary.to_excel(writer, sheet_name='Summary', index=False)
    # By rep
    by_rep.to_excel(writer, sheet_name='By Rep')
    # By region
    by_region.to_excel(writer, sheet_name='By Region')
print("Excel report saved to sales_dashboard.xlsx")
Step 7 — Put it all together
Combine into reusable pipeline function.
def generate_sales_dashboard():
    # Fetch
    raw_data = fetch_sales_data()
    df = pd.DataFrame(raw_data)
    df['date'] = pd.to_datetime(df['date'])
    # Clean
    df = df.dropna()
    df = df[df['amount'] > 0]
    df['month'] = df['date'].dt.month
    # Calculate KPIs
    total_sales = df['amount'].sum()
    by_rep = df.groupby('rep')['amount'].agg(['sum', 'count', 'mean'])
    by_rep.columns = ['total', 'transactions', 'avg_sale']
    by_region = df.groupby('region')['amount'].sum()
    # Summary
    summary = pd.DataFrame({
        'Metric': ['Total Sales', 'Transactions', 'Average Sale'],
        'Value': [round(total_sales, 2), len(df), round(df['amount'].mean(), 2)]
    })
    # Export
    with pd.ExcelWriter('sales_dashboard.xlsx') as writer:
        df.to_excel(writer, sheet_name='Raw Data', index=False)
        summary.to_excel(writer, sheet_name='Summary', index=False)
        by_rep.to_excel(writer, sheet_name='By Rep')
        by_region.to_excel(writer, sheet_name='By Region')
    print("Dashboard generated successfully!")
# Run it
generate_sales_dashboard()
How this helps at work
- Automated reporting → run daily/weekly, no manual Excel work
- API integration → pull live data from systems
- Multi-sheet reports → stakeholders get complete picture
- Reusable pipeline → adapt for any data source
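The "reusable pipeline" point boils down to parameterizing the steps: pass in the fetch function and the cleaning threshold, and the same code works for any data source. A minimal stdlib-only sketch; the `run_pipeline` name, the mock fetcher, and the KPI shape are illustrative assumptions, not part of the dashboard code above:

```python
from collections import defaultdict

def run_pipeline(fetch, min_amount=0):
    """Fetch records, drop invalid amounts, and total sales by rep."""
    records = [r for r in fetch() if r.get("amount", 0) > min_amount]
    totals = defaultdict(float)
    for r in records:
        totals[r["rep"]] += r["amount"]
    return dict(totals)

# Swap this mock fetcher for a real API call or CSV reader
mock = lambda: [
    {"rep": "Ana", "amount": 1200},
    {"rep": "Bob", "amount": -50},  # invalid, filtered out
    {"rep": "Ana", "amount": 1100},
]
print(run_pipeline(mock))  # {'Ana': 2300.0}
```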
Key Takeaways
- pandas → DataFrames for Excel-like data manipulation
- requests → simple HTTP requests for APIs
- openpyxl → read/write Excel files (via pandas)
- datetime → date operations and formatting
- Combining libraries → build powerful data pipelines
- Cornerstone → sales dashboard pipeline automates reporting
Next Steps
You have mastered essential libraries. Next, explore working with APIs for advanced integration, or dive into data visualization with matplotlib and seaborn for charts and graphs.