Essential functions that empower data insights with Python

By Abhinash Jena on December 11, 2024

The future belongs to those who can combine domain expertise with technological tools. Learning Python is not about becoming a programmer; it’s about empowering yourself with data insights and telling compelling stories.

The ability to handle data, uncover trends, and communicate findings is invaluable today. Python’s built-in functions are not just tools; they are like a versatile Swiss Army knife for any researcher or analyst.

Bridging the gap between data and insight

Whether working with survey responses, demographic datasets, or behavioural patterns, functions like sorted() help organize messy data, while reduce() can condense a dataset into a single meaningful metric.

EXAMPLE

Suppose there are responses from a survey about reading habits among college students, recording the number of books each student reads per month. Before analyzing the data, the responses need to be organized and summarized. Sorting makes it easy to see the range from least to most reading.

survey_data = [
  {"name": "Priya Sharma", "books_per_month": 3},
  {"name": "Raj Patel", "books_per_month": 7},
  {"name": "Sanjay Kumar", "books_per_month": 1},
  {"name": "Amrita Singh", "books_per_month": 5},
  {"name": "Deepak Mehta", "books_per_month": 2},
  {"name": "Kavya Krishnamurthy", "books_per_month": 8},
  {"name": "Arjun Reddy", "books_per_month": 4}
]

# Sorting in ascending order by books read per month
sorted_by_reading = sorted(survey_data, key=lambda x: x["books_per_month"])

# Print only the counts so the range is visible at a glance
print([r["books_per_month"] for r in sorted_by_reading])
[1, 2, 3, 4, 5, 7, 8]

NOTE

lambda is an anonymous or nameless function that can be defined in a single line. It is often used for short, throwaway functions.

lambda arguments: expression

  • Arguments: Inputs to the function.
  • Expression: The operation or condition applied to the arguments.
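To make the syntax concrete, the sketch below compares a lambda with an equivalent named function and uses both as the key for sorted(). The names books_key, books_key_lambda, and records are illustrative assumptions, not part of the survey example.

```python
# A named function and a lambda that do the same job:
# return the value used to order each record.
def books_key(record):
    return record["books_per_month"]

books_key_lambda = lambda record: record["books_per_month"]

# Hypothetical sample records, for illustration only
records = [
    {"name": "A", "books_per_month": 3},
    {"name": "B", "books_per_month": 1},
]

by_def = sorted(records, key=books_key)
by_lambda = sorted(records, key=books_key_lambda)
descending = sorted(records, key=books_key_lambda, reverse=True)

print(by_def == by_lambda)                # True
print([r["name"] for r in by_lambda])     # ['B', 'A']
print([r["name"] for r in descending])    # ['A', 'B']
```

A named function is usually the better choice once the logic grows beyond a single expression or needs to be reused.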

reduce() is the magic summarizer. To find the total number of books read across all survey respondents, reduce() can combine the values in one step instead of adding them up manually.

from functools import reduce

survey_data = [
  {"name": "Priya Sharma", "books_per_month": 3},
  {"name": "Raj Patel", "books_per_month": 7},
  {"name": "Sanjay Kumar", "books_per_month": 1},
  {"name": "Amrita Singh", "books_per_month": 5},
  {"name": "Deepak Mehta", "books_per_month": 2},
  {"name": "Kavya Krishnamurthy", "books_per_month": 8},
  {"name": "Arjun Reddy", "books_per_month": 4}
]

# Calculating average reading
# x is the running total (starting at 0); y is the next survey record
total_books = reduce(lambda x, y: x + y["books_per_month"], survey_data, 0)
average_reading = round(total_books / len(survey_data), 2)

# An f-string formats the number as text; a float cannot be joined to a string with +
print(f"{average_reading} books per month.")
4.29 books per month.

Think of functools as a toolbox in Python and reduce is one of the specialized tools inside this toolbox. The import statement is like picking up that specific tool to use in your workshop of code. functools is a standard Python module that provides higher-order functions and operations on callable objects.

The statement from functools import reduce can be broken down as follows:

  • Accessing the functools module
  • Specifically selecting the reduce function
  • Bringing that function directly into the code
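As a rough illustration of what reduce() does once imported, the sketch below folds a plain list of numbers and compares the result with an explicit loop. The list books simply repeats the books_per_month values from the survey; the variable names are illustrative.

```python
from functools import reduce

books = [3, 7, 1, 5, 2, 8, 4]  # books_per_month values from the survey

# reduce() starts from the initial value 0 and folds in one item at a time
total_with_reduce = reduce(lambda running, n: running + n, books, 0)

# The equivalent explicit loop
running_total = 0
for n in books:
    running_total = running_total + n

print(total_with_reduce, running_total)  # 30 30

# For plain addition, the built-in sum() gives the same result
print(sum(books))  # 30
```

reduce() becomes more useful than sum() when the combining step is something other than simple addition, as in the survey example where each item is a dictionary.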

Python’s built-in functions that are useful for data extraction

Data extraction is like archaeology for information, and it plays an important role in both quantitative and qualitative studies. Python provides powerful tools such as strip(), slice(), filter() and map() to simplify this process. Imagine strip() as a digital eraser that removes unwanted characters, by default whitespace, from the beginning and end of text. In survey research, this matters because data entries are often messy and inconsistent.

Removing spaces from both the beginning and end of the string

text = " Hello, World! "

# repr() keeps the quotes so any remaining spaces are visible
print(repr(text.strip()))
'Hello, World!'

Removing only left-side spaces

text = " Hello, World! "

print(repr(text.lstrip()))
'Hello, World! '

Removing only right-side spaces

text = " Hello, World! "

print(repr(text.rstrip()))
' Hello, World!'

strip() can also be combined with other methods.

raw_data = [ 
  " Priya Sharma ",
  "raj patel ",
  " AMRITA SINGH "
]

# Multiple cleaning steps

cleaned_data = [
  name.strip().lower().title()
  for name in raw_data
]
print(cleaned_data)
['Priya Sharma', 'Raj Patel', 'Amrita Singh']

The technique used for cleaned_data chains several string methods inside a list comprehension, a compact way to build a new list by applying an expression to every item. The for name in raw_data clause is the part that iterates over the raw entries.

Breaking down the cleaning process:

  • strip() removes whitespace from both ends of the string
  • lower() converts to lowercase
  • title() capitalizes first letters
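strip() can also take an argument listing the characters to remove, which helps when entries carry stray punctuation as well as spaces. A minimal sketch, assuming some made-up survey answers (raw_answers is hypothetical):

```python
# Hypothetical messy survey entries
raw_answers = ["**Yes**", "--no--", "  Maybe.  "]

# strip(chars) removes any of the listed characters from both ends,
# and stops at the first character that is not in the set
cleaned = [answer.strip("*-. ").lower() for answer in raw_answers]

print(cleaned)  # ['yes', 'no', 'maybe']
```

Characters in the middle of the string are never touched, so "a - b".strip("- ") still returns "a - b".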

Furthermore, use slice(start, stop, step) to extract specific parts of a sequence like a list or string by defining a start, stop, and step.

  • Start: The index where slicing begins.
  • Stop: The index where slicing ends (exclusive).
  • Step: The interval between elements to include.

Because a slice object stores these parameters, the portion of data being retrieved can change at run time whenever the parameters do.

EXAMPLE

While analyzing temperature data, you may need to extract a subset based on parameters set at run time, such as the readings for days 2 to 4. Because indexing starts at 0 and the stop index is exclusive, days 2 to 4 correspond to start = 1 and stop = 4.

The following code extracts that subset from an existing list.

data = [30, 32, 31, 33, 35, 34] 
start, stop = 1, 4 # Dynamically defined
subset = data[slice(start, stop)]
print(subset)
[32, 31, 33] 
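The step parameter and the reusability of a slice object can be sketched in the same way. The lists week1 and week2 below are made-up readings, and every_other_day is an illustrative name.

```python
# Hypothetical temperature readings for two weeks
week1 = [30, 32, 31, 33, 35, 34, 36]
week2 = [29, 28, 30, 31, 33, 32, 30]

# Start at index 0, no stop (run to the end), step of 2
every_other_day = slice(0, None, 2)

# The same slice object can be applied to any sequence
print(week1[every_other_day])  # [30, 31, 35, 36]
print(week2[every_other_day])  # [29, 30, 33, 30]

# Bracket notation gives the same result
print(week1[0::2] == week1[every_other_day])  # True
```

Naming the slice once keeps the selection rule in a single place, which helps when several datasets must be cut the same way.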

Moreover, as research requirements become more complex, such as selectively extracting elements from a dataset based on a condition, filter(function, iterable) can help.

  • function defines the condition; filter() returns only the elements for which it evaluates to true.
  • iterable is the data to filter, such as a string, list, or tuple.

EXAMPLE

Extracting responses of a survey above a certain threshold.

responses = [5, 8, 2, 9, 3] 
threshold = 5

filtered = filter(lambda x: x > threshold, responses)

# filter() returns a lazy iterator, so convert it to a list to display the items
print(list(filtered))
[8, 9]

filter() iterates over the data and returns only the items that meet the condition, which keeps the result relevant to the question at hand.

Function: lambda x: x > threshold

Determines whether a response is included in the result.

  • x represents each item in the responses list.
  • x > threshold evaluates whether x (a response) is greater than the threshold.

Iterable: responses

This is the list of responses to filter.

filter() processes each item in responses, applying the lambda function to decide if the item should be included in the output.
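The same idea carries over to records rather than bare numbers. The sketch below filters a shortened copy of the survey records and shows that a list comprehension gives an equivalent result; avid_readers and same_with_comprehension are illustrative names.

```python
# A shortened copy of the survey records used earlier
survey_data = [
    {"name": "Priya Sharma", "books_per_month": 3},
    {"name": "Raj Patel", "books_per_month": 7},
    {"name": "Kavya Krishnamurthy", "books_per_month": 8},
]
threshold = 5

# Keep only the respondents who read more than the threshold
avid_readers = list(filter(lambda r: r["books_per_month"] > threshold, survey_data))

# An equivalent list comprehension
same_with_comprehension = [r for r in survey_data if r["books_per_month"] > threshold]

print([r["name"] for r in avid_readers])        # ['Raj Patel', 'Kavya Krishnamurthy']
print(avid_readers == same_with_comprehension)  # True
```

Which form to use is largely a matter of readability; filter() fits well when the condition already exists as a named function.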

Transforming data with Map

The map(function, iterable) function in Python applies a transformation to every item in an iterable, such as a list or tuple, and returns the transformed results. The function can perform any operation, such as a mathematical calculation, string manipulation, or type conversion. map() does not alter the original data structure; it produces a new iterable of transformed items.

celsius_temps = [0, 20, 30]
fahrenheit_temps = map(lambda c: (c * 9/5) + 32, celsius_temps)
print(list(fahrenheit_temps))
[32.0, 68.0, 86.0]

map() pairs perfectly with functions, applying them across datasets without an explicit loop. It returns an iterator that doesn’t calculate results until they are needed, which can reduce memory use for large datasets.
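The lazy behaviour can be observed with a small sketch. The helper to_fahrenheit and the list converted are assumptions added purely to record when each value is actually processed.

```python
celsius_temps = [0, 20, 30]
converted = []  # records which inputs have actually been processed

def to_fahrenheit(c):
    converted.append(c)
    return (c * 9 / 5) + 32

lazy = map(to_fahrenheit, celsius_temps)
print(converted)          # [] because nothing has been computed yet

first = next(lazy)        # computes only the first item
print(first, converted)   # 32.0 [0]

rest = list(lazy)         # computes the remaining items
print(rest, converted)    # [68.0, 86.0] [0, 20, 30]
```

Note that an iterator can be consumed only once; calling list(lazy) again at this point would return an empty list.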

TIP

You can also pass more than one iterable to map() as multiple arguments.

list1 = [1, 2, 3] 
list2 = [4, 5, 6]
result = map(lambda x, y: x + y, list1, list2)
print(list(result))
[5, 7, 9]

Making data manipulation efficient and readable

The functions sorted(), reduce(), strip(), slice(), filter(), and map() play an important role in making data manipulation for research not only efficient but also highly readable. They simplify complex operations, ensure consistency, and provide clarity when working with datasets, enabling researchers to focus more on insights rather than technicalities.

  • sorted() helps organize data systematically, whether by alphabetical order, numerical value, or custom logic, enabling clarity and structure in your datasets.
  • reduce() condenses a collection into a single value by applying a function repeatedly, making it invaluable for cumulative operations like summing or multiplying values.
  • strip() cleans unwanted spaces or characters from strings, ensuring cleaner and more uniform text data, crucial for accurate analysis and processing.
  • slice() dynamically extracts specific portions of data, offering flexibility to work with subsets of lists, strings, or other sequences.
  • filter() selects elements from a dataset based on conditions, ensuring relevance and focus while removing unwanted data.
  • map() transforms data by applying a function to each element in a sequence, enabling efficient and consistent changes across entire datasets.

Exercise program: Employee Performance and Salary Analyzer

Develop a program that analyzes employee performance data, calculates bonuses, and generates insights using the functions sorted(), reduce(), strip(), slice(), filter(), and map().

Program Requirements

  1. Input Phase
    • Use a predefined list of employee records, where each record contains the employee’s name, monthly sales, and department.
employees = [ 
  " Rohan, 8000, Sales ",
  " Anjali, 4500, HR",
  "Pooja, 7000, Marketing",
  " Rajesh, 5200, IT ",
  "Vikram, 3000, Support"
]
  2. Cleaning
    • Remove unnecessary spaces from names and data fields.
  3. Extraction
    • Select employees with sales greater than 5000.
  4. Transformation
    • Calculate bonuses (10% of sales) for filtered employees.
    • Format cleaned employee names to title case.
  5. Summarization
    • Calculate the total sales for all employees.
    • Arrange employees by sales, highest to lowest.
  6. Reporting
    • Extract the top 3 performers for a report.

Expected output

# Cleaned Employee Data

["Rohan, 8000, Sales", "Anjali, 4500, HR", "Pooja, 7000, Marketing", "Rajesh, 5200, IT", "Vikram, 3000, Support"]

# Filtered Employees (Sales > 5000)

["Rohan, 8000, Sales", "Pooja, 7000, Marketing", "Rajesh, 5200, IT"]

# Bonuses (10% of Sales)

[800.0, 700.0, 520.0]

# Total Sales Across All Employees

27700

# Employees Sorted by Sales (Descending)

["Rohan, 8000, Sales", "Pooja, 7000, Marketing", "Rajesh, 5200, IT", "Anjali, 4500, HR", "Vikram, 3000, Support"]

# Top 3 Performers

["Rohan, 8000, Sales", "Pooja, 7000, Marketing", "Rajesh, 5200, IT"]

Complete code
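One possible solution is sketched below. It is a minimal sketch under stated assumptions rather than a definitive implementation: the helper names sales_of and clean are hypothetical, and each record is assumed to hold exactly three comma-separated fields.

```python
from functools import reduce

employees = [
    " Rohan, 8000, Sales ",
    " Anjali, 4500, HR",
    "Pooja, 7000, Marketing",
    " Rajesh, 5200, IT ",
    "Vikram, 3000, Support",
]

def sales_of(record):
    """Return the sales figure (second field) of a record as an int."""
    return int(record.split(",")[1].strip())

def clean(record):
    """Strip spaces around every field and title-case the name."""
    name, sales, department = [field.strip() for field in record.split(",")]
    return f"{name.title()}, {sales}, {department}"

# Cleaning: strip() applied to every field through map()
cleaned = list(map(clean, employees))

# Extraction: keep employees with sales greater than 5000
filtered = list(filter(lambda r: sales_of(r) > 5000, cleaned))

# Transformation: 10% bonus for each filtered employee
bonuses = list(map(lambda r: sales_of(r) * 10 / 100, filtered))

# Summarization: total sales and ranking from highest to lowest
total_sales = reduce(lambda total, r: total + sales_of(r), cleaned, 0)
ranked = sorted(cleaned, key=sales_of, reverse=True)

# Reporting: the top 3 performers
top_three = ranked[slice(0, 3)]

print(cleaned)
print(filtered)
print(bonuses)      # [800.0, 700.0, 520.0]
print(total_sales)  # 27700
print(ranked)
print(top_three)
```

Keeping each record as a cleaned string matches the expected output format; converting the records to dictionaries after cleaning would be a reasonable alternative for larger programs.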

NOTES

I am an interdisciplinary educator, researcher, and technologist with over a decade of experience in applied coding, educational design, and research mentorship in fields spanning management, marketing, behavioral science, machine learning, and natural language processing. I specialize in simplifying complex topics such as sentiment analysis, adaptive assessments, and data visualization. My training approach emphasizes real-world application, clear interpretation of results, and the integration of data mining, processing, and modeling techniques to drive informed strategies across academic and industry domains.
