Pyspark Column is not Iterable : Fixing Generic Error


The PySpark column error covered in this article occurs when we try to call a column as if it were a function. Columns are not callable objects, so Python raises TypeError: 'Column' object is not callable. This is not a PySpark-specific error. It is Python's generic "object is not callable" TypeError, and it names Column only because the object being called is a PySpark dataframe column. The same kind of error is also possible with pandas and other libraries. In this article, we will reproduce the error with one practical example and then look at the best way to fix it.

 

PySpark column is not iterable (root cause and fix) –

Let’s create a dummy PySpark dataframe and then set up a scenario that reproduces this error. Here is the code to create the dataframe.

from pyspark.sql import SparkSession
# Start (or reuse) a Spark session for this example
spark = SparkSession.builder.appName('Data Science Learner').getOrCreate()
# Three rows of sample data and the matching column names
data_df = [[1, "Abhishek", "A"], [2, "Ankita", "B"], [3, "Sukesh", "C"]]
columns = ['Seq', 'Name', 'Identifier']
# Build the dataframe and print its contents
dataframe = spark.createDataFrame(data_df, columns)
dataframe.show()

Let’s run it and check that the dummy PySpark dataframe is created.

[Figure: pyspark dataframe]

Yes, the dataframe has been created. Now let’s apply a condition on one of its columns in a way that reproduces the error.

dataframe.select('Identifier').where(dataframe.Identifier() < 'B').show()
TypeError: 'Column' object is not callable

We get this error because Identifier is a PySpark column, but the parentheses after it treat it as a function. That is the root cause of the error. In Python, only callable objects can be invoked with parentheses: functions, methods, classes, and instances that define __call__. Values such as None, lists, tuples, ints and strs are not callable.
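To make the idea of "callable" concrete, here is a small plain-Python sketch that needs no Spark installation. The names greet, Greeter and candidates are made up for this illustration. It uses the built-in callable() to check a few kinds of objects and then triggers the same family of TypeError with a string instead of a column.

```python
def greet():
    """An ordinary function: callable."""
    return "hello"


class Greeter:
    """A class whose instances define __call__, so they are callable too."""

    def __call__(self):
        return "hello from an instance"


# Hypothetical sample objects, chosen only to illustrate callable()
candidates = {
    "function": greet,
    "class": Greeter,
    "instance with __call__": Greeter(),
    "None": None,
    "list": [1, 2, 3],
    "tuple": (1, 2),
    "int": 42,
    "str": "B",
}

# Map each label to True/False depending on whether the object can be called
callable_report = {name: callable(obj) for name, obj in candidates.items()}

for name, is_callable in callable_report.items():
    print(f"{name}: {is_callable}")

# Calling a non-callable value raises the same kind of TypeError
value = "B"
try:
    value()
except TypeError as exc:
    message = str(exc)
    print(message)
```

A PySpark Column falls in the same group as the string here: adding parentheses after it asks Python to call something that cannot be called.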

Fixing this bug by syntax correction –

As explained above, the fix is a small correction in how we reference the column: remove the parentheses after the column name. In the example above we wrote dataframe.Identifier(), which is incorrect because the parentheses tell Python to call the column as if it were a function (a callable object). If we remove them and access the column the correct way, the error goes away.

dataframe.select('Identifier').where(dataframe.Identifier < 'B').show()

A similar, though not identical, error occurs when we try to call the complete dataframe as if it were a callable object. We hope the basics are now clear.

Thanks
Data Science Learner Team
