Data Structures
Data structures are containers that organize your data for fast lookups, updates, and analysis. Business analysts use lists for ordered records, dicts for KPI mappings, sets for deduplication, and tuples for immutable configs. Master these four and you can handle most everyday data tasks.
Estimated reading time: 25–30 minutes
When to Use Each Structure
- List → ordered, mutable (sales records, task queues)
- Dict → key-value lookups (employee ID to name, SKU to price)
- Set → unique items, fast membership (dedupe emails, tags)
- Tuple → immutable sequence (coordinates, config constants)
Great for: organizing business data by access pattern
Performance & Best Practices
- Dict/set lookups are O(1) on average → use for find-by-ID tasks
- List append is amortized O(1); insert/delete at the front or middle is O(n) → append when possible
- Comprehensions are usually cleaner, and often somewhat faster, than equivalent append loops
- Use tuples for data that should not change (safer, hashable)
Great for: writing efficient, bug-resistant code
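To make the lookup claim concrete, here is a rough timing sketch using the standard-library timeit module. The collection size, the repeat count, and the helper name time_membership are illustrative assumptions, and absolute timings will differ from machine to machine.

```python
import timeit


def time_membership(n=100_000, repeats=200):
    """Return (list_seconds, set_seconds) for `repeats` membership checks."""
    ids_list = list(range(n))
    ids_set = set(ids_list)
    target = n - 1  # worst case for the list: the value sits at the very end

    # The list check scans element by element; the set check hashes the value.
    list_time = timeit.timeit(lambda: target in ids_list, number=repeats)
    set_time = timeit.timeit(lambda: target in ids_set, number=repeats)
    return list_time, set_time


if __name__ == "__main__":
    list_time, set_time = time_membership()
    print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

On typical hardware the set check should come out far faster, and the gap grows with the size of the collection.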
Lists — Ordered, Mutable Sequences
Lists hold ordered items you can change. Perfect for sales records, task lists, or any sequence you will modify.
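Because lists are ordered and mutable, ranking is a common task. The sketch below reuses this section's sample figures and contrasts sorted(), which returns a new list, with list.sort(), which reorders in place. The variable names ranked and top_two are chosen for illustration.

```python
sales = [1200, 1500, 980, 2100]

# sorted() returns a new list and leaves the original untouched.
ranked = sorted(sales, reverse=True)
top_two = ranked[:2]  # the two largest values

# list.sort() reorders the list in place and returns None.
sales.sort()

print(top_two)  # [2100, 1500]
print(sales)    # [980, 1200, 1500, 2100]
```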
# Creating lists
sales = [1200, 1500, 980, 2100]
regions = ["North", "South", "East", "West"]
# Accessing by index (0-based)
first_sale = sales[0] # 1200
last_sale = sales[-1] # 2100
# Slicing
first_two = sales[:2] # [1200, 1500] (first two by position, not the two largest)
# Modifying
sales.append(1800) # add to end
sales[0] = 1250 # update first
sales.remove(980) # remove by value
Dictionaries — Fast Key-Value Lookups
Dicts map keys to values for instant lookups. Use for employee records, SKU prices, or any find by ID task.
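A frequent use of key-value lookups is accumulating a total per key. Here is a minimal sketch, assuming a hypothetical orders list of (region, amount) pairs that is not part of the original examples. It uses dict.get with a default of 0 so a region's first order needs no special case.

```python
# Hypothetical sample data: (region, amount) pairs.
orders = [("North", 1200), ("South", 1500), ("North", 980), ("East", 2100)]

totals = {}
for region, amount in orders:
    # get() returns 0 the first time a region is seen, avoiding a KeyError.
    totals[region] = totals.get(region, 0) + amount

print(totals)  # {'North': 2180, 'South': 1500, 'East': 2100}
```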
# Creating dicts
employee = {"id": 101, "name": "Ana", "dept": "Sales"}
prices = {"SKU123": 49.99, "SKU456": 89.99}
# Accessing
name = employee["name"] # "Ana"
price = prices.get("SKU123", 0) # 49.99 (safe with default)
# Adding/updating
employee["email"] = "ana@co.com"
prices["SKU789"] = 120.00
# Iterating
for sku, price in prices.items():
    print(sku, ":", price)
Sets — Unique Items, Fast Membership
Sets store unique items with O(1) membership checks. Perfect for deduplication and finding overlaps.
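One caveat when deduplicating: a set does not remember the order in which items arrived. If order matters, one alternative is dict.fromkeys, which relies on dicts preserving insertion order (guaranteed since Python 3.7). The signups list below is made-up sample data.

```python
# Hypothetical sample data with one repeated address.
signups = ["ana@co.com", "bob@co.com", "ana@co.com", "carol@co.com"]

# A set removes duplicates but makes no promise about order.
unique_any_order = set(signups)

# dict.fromkeys keeps the first occurrence of each item, in arrival order.
unique_in_order = list(dict.fromkeys(signups))

print(len(unique_any_order))  # 3
print(unique_in_order)        # ['ana@co.com', 'bob@co.com', 'carol@co.com']
```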
# Creating sets
emails = {"ana@co.com", "bob@co.com", "ana@co.com"} # auto-dedupes
print(emails) # {"ana@co.com", "bob@co.com"} (display order may vary)
# Adding/removing
emails.add("carol@co.com")
emails.discard("bob@co.com") # safe remove (no error if missing)
# Set operations
team_a = {"Ana", "Bob", "Carol"}
team_b = {"Bob", "David", "Eve"}
both = team_a & team_b # intersection: {"Bob"}
either = team_a | team_b # union: all 5 names
only_a = team_a - team_b # difference: {"Ana", "Carol"}
Tuples — Immutable Sequences
Tuples are like lists but cannot be changed. Use for coordinates, config constants, or dict keys.
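To illustrate what "cannot be changed" means in practice, the sketch below tries to modify a tuple and then tries to use a list as a dict key. Both attempts raise TypeError. The variable names assign_error and key_error are illustrative, and the exact error wording can differ between Python versions.

```python
coords = (40.7128, -74.0060)

# Assigning to an element of a tuple raises TypeError.
try:
    coords[0] = 41.0
    assign_error = None
except TypeError as exc:
    assign_error = str(exc)

# A list is mutable and therefore unhashable, so it cannot be a dict key.
try:
    lookup = {[40.7128, -74.0060]: "New York"}
    key_error = None
except TypeError as exc:
    key_error = str(exc)

print(assign_error)
print(key_error)
```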
# Creating tuples
coords = (40.7128, -74.0060) # NYC lat/lon
config = ("prod", 8080, True)
# Accessing
lat, lon = coords # unpacking
env = config[0] # "prod"
# Tuples of hashable items are themselves hashable, so they can be dict keys
locations = {
    (40.7128, -74.0060): "New York",
    (34.0522, -118.2437): "Los Angeles"
}
Comprehensions — Concise Data Transforms
Comprehensions build lists, dicts, and sets in a single expression. They are usually more concise than an equivalent loop and often slightly faster.
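To see what a comprehension replaces, here is the same filter written both ways. The names high_loop and high_comp are illustrative, and the two results are identical.

```python
sales = [1200, 1500, 980, 2100]

# Explicit loop: create an empty list, test each item, append matches.
high_loop = []
for s in sales:
    if s > 1000:
        high_loop.append(s)

# Comprehension: the same filter expressed in one line.
high_comp = [s for s in sales if s > 1000]

print(high_loop == high_comp)  # True
```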
# List comprehension
sales = [1200, 1500, 980, 2100]
high_sales = [s for s in sales if s > 1000] # [1200, 1500, 2100]
# Dict comprehension
prices = {"A": 10, "B": 20, "C": 15}
discounted = {k: v * 0.9 for k, v in prices.items()} # 10% off
# Set comprehension
regions = ["North", "South", "North", "East"]
unique = {r.upper() for r in regions} # {"NORTH", "SOUTH", "EAST"}
Cornerstone Project — Product Inventory Tracker (step-by-step)
Build a simple inventory system that tracks stock levels, flags low inventory, and finds duplicate SKUs. You will combine lists, dicts, and sets to organize data efficiently—skills you will use daily for dashboards, reports, and ETL.
Step 1 — Define the inventory data
Start with a list of dicts (mimics CSV rows).
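To show why a list of dicts mimics CSV rows, here is a small sketch using the standard-library csv.DictReader, which yields one dict per row. The CSV text and the helper name load_rows are assumptions made for illustration. In practice the text would come from a file.

```python
import csv
import io

# Hypothetical CSV text standing in for a real file.
CSV_TEXT = """sku,name,qty,price
A101,Widget,50,12.99
B202,Gadget,5,24.50
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with numeric qty and price."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # DictReader yields every value as a string, so convert the numbers.
        rows.append({
            "sku": row["sku"],
            "name": row["name"],
            "qty": int(row["qty"]),
            "price": float(row["price"]),
        })
    return rows


rows = load_rows(CSV_TEXT)
print(rows[0])  # {'sku': 'A101', 'name': 'Widget', 'qty': 50, 'price': 12.99}
```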
inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99}, # duplicate SKU
]
Step 2 — Find low-stock items (list comprehension)
Filter items below a threshold in one line.
LOW_STOCK = 10
low_items = [item for item in inventory if item["qty"] < LOW_STOCK]
print("Low Stock Alert:")
for item in low_items:
    print(f"  • {item['name']} (SKU {item['sku']}): {item['qty']} left")
Step 3 — Build a SKU to details lookup (dict)
Convert list to dict for instant lookups by SKU.
# Note: if a SKU appears more than once, the last row wins (here, the second A101 row)
sku_map = {item["sku"]: item for item in inventory}
# Fast lookup
if "B202" in sku_map:
    print("Found:", sku_map["B202"]["name"])
Step 4 — Detect duplicate SKUs (set)
Use a set to find SKUs that appear more than once.
seen = set()
duplicates = set()
for item in inventory:
    sku = item["sku"]
    if sku in seen:
        duplicates.add(sku)
    else:
        seen.add(sku)
if duplicates:
    print("Duplicate SKUs found:", duplicates)
Step 5 — Calculate total inventory value
Sum up qty times price for all items.
total_value = sum(item["qty"] * item["price"] for item in inventory)
print(f"Total inventory value: ${total_value:,.2f}")
Step 6 — Put it all together
Combine into a reusable function that returns a summary dict.
def inventory_report(items, low_threshold=10):
    """Summarize low-stock items, duplicate SKUs, and total value."""
    low = [i for i in items if i["qty"] < low_threshold]
    seen, dupes = set(), set()
    for i in items:
        # First sighting of a SKU goes into seen; any repeat goes into dupes
        (dupes if i["sku"] in seen else seen).add(i["sku"])
    total = sum(i["qty"] * i["price"] for i in items)
    return {
        "low_stock": low,
        "duplicates": sorted(dupes), # sorted for a stable, repeatable order
        "total_value": round(total, 2)
    }
report = inventory_report(inventory)
print("Low stock:", len(report["low_stock"]))
print("Duplicates:", report["duplicates"])
print(f"Total value: ${report['total_value']:,.2f}")
How this helps at work
- Instant alerts → spot low stock before customers complain
- Data quality → catch duplicate SKUs that break reports
- Financial visibility → know your inventory value in seconds
- Reusable pattern → adapt for customer lists, order tracking, etc.
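One way to act on a duplicate-SKU finding is to merge the repeated rows. The sketch below assumes that duplicate rows should have their quantities summed and that the first row's name and price are kept. That merge policy and the helper name merge_duplicates are assumptions, not part of the project as described.

```python
def merge_duplicates(items):
    """Combine rows that share a SKU by summing qty (first row's details win)."""
    merged = {}
    for item in items:
        sku = item["sku"]
        if sku in merged:
            merged[sku]["qty"] += item["qty"]
        else:
            merged[sku] = dict(item)  # copy so the input rows are not mutated
    return list(merged.values())


inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99},
]

merged = merge_duplicates(inventory)
print(len(merged))       # 3
print(merged[0]["qty"])  # 80
```

Copying each row before merging keeps the original list intact, so the raw data can still be audited afterwards.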
Key Takeaways
- Lists → ordered, mutable; use for sequences you will modify
- Dicts → key-value pairs; average O(1) lookups for find-by-ID tasks
- Sets → unique items; fast membership and deduplication
- Tuples → immutable sequences; safe for constants and dict keys
- Comprehensions → one-line transforms; usually cleaner and often faster than equivalent loops
- Cornerstone → inventory tracker combining lists, dicts, and sets
Next Steps
You now know Python's core data structures. Next, explore loops and iteration to process these structures efficiently, or jump to file handling to load real CSV/JSON data into your trackers.