Data Structures

Data structures are containers that organize your data for fast lookups, updates, and analysis. Business analysts use lists for ordered records, dicts for KPI mappings, sets for deduplication, and tuples for immutable configs. Master these four and you can handle most everyday data tasks.

Estimated reading time: 25–30 minutes

When to Use Each Structure

  • List → ordered, mutable (sales records, task queues)
  • Dict → key-value lookups (employee ID to name, SKU to price)
  • Set → unique items, fast membership (dedupe emails, tags)
  • Tuple → immutable sequence (coordinates, config constants)

Great for: organizing business data by access pattern

Performance & Best Practices

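  • Dict/set lookups are O(1) on average → use for find-by-ID tasks
  • List append is amortized O(1); insert/delete at the front or in the middle is O(n) → append when possible
  • Comprehensions are usually cleaner, and often somewhat faster, than an equivalent append loop
  • Use tuples for data that should not change (safer, and hashable when every element is hashable)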

Great for: writing efficient, bug-resistant code
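
To make the lookup-cost difference concrete, here is a minimal timing sketch. The data (100,000 integer IDs) and the variable names are assumptions made for illustration, and the exact timings will vary by machine.

```python
import timeit

# Hypothetical data: 100,000 integer customer IDs
ids_list = list(range(100_000))
ids_set = set(ids_list)
target = 99_999  # worst case for the list: the last element

# Time 200 membership checks against each structure
list_time = timeit.timeit(lambda: target in ids_list, number=200)
set_time = timeit.timeit(lambda: target in ids_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The list check scans elements one by one, while the set check hashes the value and jumps to it, so the gap widens as the data grows.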

Lists — Ordered, Mutable Sequences

Lists hold ordered items you can change. Perfect for sales records, task lists, or any sequence you will modify.

python
# Creating lists
sales = [1200, 1500, 980, 2100]
regions = ["North", "South", "East", "West"]

# Accessing by index (0-based)
first_sale = sales[0]      # 1200
last_sale = sales[-1]      # 2100

# Slicing
top_two = sales[:2]        # [1200, 1500]

# Modifying
sales.append(1800)         # add to end
sales[0] = 1250            # update first
sales.remove(980)          # remove by value
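
A few list behaviors are easy to trip over, so here is a short sketch of them. The values 5000 and 750 are made up for illustration.

```python
sales = [1200, 1500, 980, 2100]

# remove() raises ValueError when the value is absent, so check first
if 5000 in sales:
    sales.remove(5000)

# sorted() returns a new list; the original order is untouched
ranked = sorted(sales, reverse=True)

# insert(0, ...) shifts every existing element one slot to the right
sales.insert(0, 750)

print(ranked)  # [2100, 1500, 1200, 980]
print(sales)   # [750, 1200, 1500, 980, 2100]
```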

Dictionaries — Fast Key-Value Lookups

Dicts map keys to values for instant lookups. Use them for employee records, SKU prices, or any find-by-ID task.

python
# Creating dicts
employee = {"id": 101, "name": "Ana", "dept": "Sales"}
prices = {"SKU123": 49.99, "SKU456": 89.99}

# Accessing
name = employee["name"]           # "Ana"
price = prices.get("SKU123", 0)   # 49.99 (safe with default)

# Adding/updating
employee["email"] = "ana@co.com"
prices["SKU789"] = 120.00

# Iterating
for sku, price in prices.items():
    print(sku, ":", price)
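
A common use of get() with a default is building running totals. The sketch below assumes a hypothetical list of (region, amount) order rows.

```python
# Hypothetical (region, amount) order rows
orders = [("North", 1200), ("South", 1500), ("North", 980), ("East", 2100)]

totals = {}
for region, amount in orders:
    # get() returns 0 the first time a region is seen
    totals[region] = totals.get(region, 0) + amount

print(totals)  # {'North': 2180, 'South': 1500, 'East': 2100}

# Square brackets raise KeyError for a missing key; get() does not
west_total = totals.get("West", 0)  # 0
```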

Sets — Unique Items, Fast Membership

Sets store unique items with average O(1) membership checks. Perfect for deduplication and finding overlaps.

python
# Creating sets
emails = {"ana@co.com", "bob@co.com", "ana@co.com"}  # auto-dedupes
print(emails)  # two unique emails; display order is not guaranteed

# Adding/removing
emails.add("carol@co.com")
emails.discard("bob@co.com")  # safe remove (no error if missing)

# Set operations
team_a = {"Ana", "Bob", "Carol"}
team_b = {"Bob", "David", "Eve"}

both = team_a & team_b         # intersection: {"Bob"}
either = team_a | team_b       # union: all 5 names
only_a = team_a - team_b       # difference: {"Ana", "Carol"}
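
Sets do not keep items in their original order. When the order matters, for example a sign-up list, one option is dict.fromkeys(), sketched here with a made-up signups list.

```python
signups = ["ana@co.com", "bob@co.com", "ana@co.com", "carol@co.com", "bob@co.com"]

# set() removes duplicates but does not promise any particular order
unique_unordered = set(signups)

# dict.fromkeys() keeps the first occurrence of each key, in input order
unique_ordered = list(dict.fromkeys(signups))

print(unique_ordered)  # ['ana@co.com', 'bob@co.com', 'carol@co.com']
```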

Tuples — Immutable Sequences

Tuples are like lists but cannot be changed. Use for coordinates, config constants, or dict keys.

python
# Creating tuples
coords = (40.7128, -74.0060)  # NYC lat/lon
config = ("prod", 8080, True)

# Accessing
lat, lon = coords  # unpacking
env = config[0]    # "prod"

# Tuples of hashable items are hashable, so they can be dict keys
locations = {
    (40.7128, -74.0060): "New York",
    (34.0522, -118.2437): "Los Angeles"
}
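
The sketch below shows what "cannot be changed" means in practice, and one limit on using tuples as dict keys. The names new_config and bad_key are illustrative only.

```python
config = ("prod", 8080, True)

try:
    config[1] = 9090  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify:", error_message)

# To "change" a tuple, build a new one instead
new_config = (config[0], 9090, config[2])

# A tuple that contains a list is not hashable, so it cannot be a dict key
try:
    bad_key = {("A101", [1, 2]): "value"}
    hashable = True
except TypeError:
    hashable = False

print(new_config)  # ('prod', 9090, True)
```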

Comprehensions — Concise Data Transforms

Comprehensions build lists, dicts, and sets in a single expression. They are usually cleaner than an equivalent loop and often a little faster.

python
# List comprehension
sales = [1200, 1500, 980, 2100]
high_sales = [s for s in sales if s > 1000]  # [1200, 1500, 2100]

# Dict comprehension
prices = {"A": 10, "B": 20, "C": 15}
discounted = {k: v * 0.9 for k, v in prices.items()}  # 10% off

# Set comprehension
regions = ["North", "South", "North", "East"]
unique = {r.upper() for r in regions}  # {"NORTH", "SOUTH", "EAST"}
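
For comparison, here is the same filter written as a plain loop and as a comprehension, plus a conditional expression that labels every item instead of filtering. The "high"/"low" labels are an assumption for illustration.

```python
sales = [1200, 1500, 980, 2100]

# Loop version: create an empty list, then append matching items
high_sales_loop = []
for s in sales:
    if s > 1000:
        high_sales_loop.append(s)

# Comprehension version: same result in one expression
high_sales = [s for s in sales if s > 1000]

# A condition placed before "for" transforms each item rather than filtering
labels = ["high" if s > 1000 else "low" for s in sales]

print(high_sales == high_sales_loop)  # True
print(labels)  # ['high', 'high', 'low', 'high']
```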

Cornerstone Project — Product Inventory Tracker (step-by-step)

Build a simple inventory system that tracks stock levels, flags low inventory, and finds duplicate SKUs. You will combine lists, dicts, and sets to organize data efficiently. These are skills you will use daily for dashboards, reports, and ETL.

Step 1 — Define the inventory data

Start with a list of dicts (mimics CSV rows).

python
inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99},  # duplicate SKU
]

Step 2 — Find low-stock items (list comprehension)

Filter items below a threshold in one line.

python
LOW_STOCK = 10
low_items = [item for item in inventory if item["qty"] < LOW_STOCK]

print("Low Stock Alert:")
for item in low_items:
    print(f" • {item['name']} (SKU {item['sku']}): {item['qty']} left")
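
If the alert should list the most urgent items first, the filtered rows can be sorted by quantity. This is a sketch only: the REORDER_AT threshold of 60 is a hypothetical value chosen so that several rows match.

```python
inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99},
]

REORDER_AT = 60  # hypothetical threshold for this sketch

# Filter with a generator expression, then sort by the lowest quantity first
low_items = sorted(
    (item for item in inventory if item["qty"] < REORDER_AT),
    key=lambda item: item["qty"],
)

for item in low_items:
    print(f"{item['sku']}: {item['qty']} left")
```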

Step 3 — Build a SKU to details lookup (dict)

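Convert the list to a dict for instant lookups by SKU. Dict keys are unique, so when a SKU appears more than once, the last row overwrites the earlier one.

python
# Later rows overwrite earlier ones that share a SKU (A101 keeps qty 30)
sku_map = {item["sku"]: item for item in inventory}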

# Fast lookup
if "B202" in sku_map:
    print("Found:", sku_map["B202"]["name"])
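
Because a dict holds one value per key, the lookup built from this inventory keeps only the last A101 row. If repeated SKUs should instead be combined, one possible approach is to add up their quantities. The merged dict below is an illustrative name, not part of the original project.

```python
inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99},
]

merged = {}
for item in inventory:
    sku = item["sku"]
    if sku in merged:
        # Repeat SKU: add its quantity to the row already stored
        merged[sku]["qty"] += item["qty"]
    else:
        # First sighting: store a copy so the original row is not mutated
        merged[sku] = dict(item)

print(merged["A101"]["qty"])  # 80
```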

Step 4 — Detect duplicate SKUs (set)

Use a set to find SKUs that appear more than once.

python
seen = set()
duplicates = set()

for item in inventory:
    sku = item["sku"]
    if sku in seen:
        duplicates.add(sku)
    else:
        seen.add(sku)

if duplicates:
    print("Duplicate SKUs found:", duplicates)
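
An alternative sketch uses collections.Counter from the standard library, which also reports how many times each SKU appears. This is a swapped-in technique, not the method used in the steps above.

```python
from collections import Counter

inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99},
]

# Count how many rows carry each SKU
counts = Counter(item["sku"] for item in inventory)

# Keep only the SKUs that appear more than once
duplicates = {sku for sku, n in counts.items() if n > 1}

print(duplicates)      # {'A101'}
print(counts["A101"])  # 2
```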

Step 5 — Calculate total inventory value

Sum up qty times price for all items.

python
total_value = sum(item["qty"] * item["price"] for item in inventory)
print(f"Total inventory value: ${total_value:,.2f}")
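
Prices stored as floats can pick up tiny rounding errors when multiplied and summed. Where exact cents matter, one option is the standard library's decimal module. This is a sketch, assuming prices are converted through str() so that Decimal sees the value as written.

```python
from decimal import Decimal

inventory = [
    {"sku": "A101", "name": "Widget", "qty": 50, "price": 12.99},
    {"sku": "B202", "name": "Gadget", "qty": 5, "price": 24.50},
    {"sku": "C303", "name": "Doohickey", "qty": 120, "price": 8.75},
    {"sku": "A101", "name": "Widget", "qty": 30, "price": 12.99},
]

# Decimal(str(price)) avoids inheriting the float's binary rounding error
total = sum(Decimal(str(item["price"])) * item["qty"] for item in inventory)

print(total)  # 2211.70
```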

Step 6 — Put it all together

Combine into a reusable function that returns a summary dict.

python
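def inventory_report(items, low_threshold=10):
    """Summarize low-stock items, duplicate SKUs, and total inventory value."""
    low = [i for i in items if i["qty"] < low_threshold]

    # A SKU goes into seen the first time and into dupes on any repeat
    seen, dupes = set(), set()
    for i in items:
        if i["sku"] in seen:
            dupes.add(i["sku"])
        else:
            seen.add(i["sku"])

    total = sum(i["qty"] * i["price"] for i in items)

    return {
        "low_stock": low,
        "duplicates": sorted(dupes),  # sorted list for stable output
        "total_value": round(total, 2),
    }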

report = inventory_report(inventory)
print("Low stock:", len(report["low_stock"]))
print("Duplicates:", report["duplicates"])
print(f"Total value: ${report['total_value']:,.2f}")

How this helps at work

  • Instant alerts → spot low stock before customers complain
  • Data quality → catch duplicate SKUs that break reports
  • Financial visibility → know your inventory value in seconds
  • Reusable pattern → adapt for customer lists, order tracking, etc.

Key Takeaways

  • Lists → ordered, mutable; use for sequences you will modify
  • Dicts → key-value pairs; average O(1) lookups for find-by-ID tasks
  • Sets → unique items; fast membership and deduplication
  • Tuples → immutable sequences; safe for constants and dict keys
  • Comprehensions → one-line transforms; usually cleaner and often faster than loops
  • Cornerstone → inventory tracker combining lists, dicts, and sets

Next Steps

You have now worked through Python's core data structures. Next, explore loops and iteration to process these structures efficiently, or jump to file handling to load real CSV/JSON data into your trackers.