One way to remove leading zeros from a string (varchar) column in Spark with Scala is to use the `regexp_replace` function. Here's an example:
```scala
// Assumes spark-shell, where a SparkSession named `spark` already exists
import org.apache.spark.sql.functions._
import spark.implicits._ // enables toDF and the $"col" column syntax

val df = Seq("000123", "004567", "010987", "987654").toDF("num_string")
df.show()
// Output:
// +----------+
// |num_string|
// +----------+
// |    000123|
// |    004567|
// |    010987|
// |    987654|
// +----------+

// Replace the run of zeros anchored at the start of each value with ""
val df2 = df.withColumn("num_string_trimmed", regexp_replace($"num_string", "^0*", ""))
df2.show()
// Output:
// +----------+------------------+
// |num_string|num_string_trimmed|
// +----------+------------------+
// |    000123|               123|
// |    004567|              4567|
// |    010987|             10987|
// |    987654|            987654|
// +----------+------------------+
```
In this example, the regular expression `^0*` matches zero or more occurrences of the digit `0` at the beginning of the string. The `^` character anchors the match at the start of the string, and `*` means "zero or more." The `regexp_replace` function replaces these matches with an empty string (`""`), effectively removing them from the start of the string. The resulting DataFrame has a new column called `num_string_trimmed` with the leading zeros removed from each value in the original `num_string` column.
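Because Spark's `regexp_replace` uses Java regular-expression syntax, the pattern can be sanity-checked in plain Scala with `String.replaceAll`, without starting a Spark session. The sketch below is a script-style snippet (for the Scala REPL or scala-cli), and the helper names are hypothetical. It also shows one edge case: a value made up only of zeros, such as `"0000"`, becomes an empty string under `^0*`. The second helper is an assumed variant, not part of the original answer, that uses the negative lookahead `(?!$)` to keep a single `0` in that case.

```scala
// Same pattern as the Spark example: remove every leading zero.
// Note: "^0*" turns an all-zero value like "0000" into "".
def stripLeadingZeros(s: String): String = s.replaceAll("^0*", "")

// Hypothetical variant: (?!$) forbids the match from reaching the end of the
// string, so an all-zero value keeps its last "0" (e.g. "0000" -> "0").
def stripLeadingZerosKeepOne(s: String): String = s.replaceAll("^0+(?!$)", "")

val samples = Seq("000123", "004567", "010987", "987654", "0000")
samples.foreach { s =>
  println(s"$s -> '${stripLeadingZeros(s)}' / '${stripLeadingZerosKeepOne(s)}'")
}
```

The same pattern strings can be passed to `regexp_replace` unchanged, for example `regexp_replace($"num_string", "^0+(?!$)", "")`, if an all-zero value should map to `"0"` rather than an empty string.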