How can I delete the initial zeros from a string/varchar in Spark Scala?

asked 2022-03-11 11:00:00 +0000

lalupa

1 Answer

answered 2022-01-12 09:00:00 +0000

plato

One way to strip the leading zeros from a string/varchar column in Spark Scala is the regexp_replace function. Here's an example:

import org.apache.spark.sql.functions._
// Needed for toDF and the $"..." column syntax; assumes a SparkSession
// named `spark` is in scope, as in spark-shell.
import spark.implicits._

val df = Seq("000123", "004567", "010987", "987654").toDF("num_string")
df.show()

// Output:
// +----------+
// |num_string|
// +----------+
// |    000123|
// |    004567|
// |    010987|
// |    987654|
// +----------+

// Replace the run of zeros at the start of each value with an empty string.
val df2 = df.withColumn("num_string_trimmed", regexp_replace($"num_string", "^0*", ""))
df2.show()

// Output:
// +----------+------------------+
// |num_string|num_string_trimmed|
// +----------+------------------+
// |    000123|               123|
// |    004567|              4567|
// |    010987|             10987|
// |    987654|            987654|
// +----------+------------------+

In this example, the regular expression ^0* matches zero or more occurrences of the digit 0 at the beginning of the string. The ^ character anchors the match at the start of the string, and the * quantifier means "zero or more." The regexp_replace function replaces the match with an empty string (""), which removes the zeros from the start of the value; zeros elsewhere in the string, such as the second 0 in 010987, are left alone. The resulting DataFrame has a new column, num_string_trimmed, holding each value of the original num_string column with its leading zeros removed.
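To see how the pattern behaves without starting a Spark session, here is a minimal plain-Scala sketch. It relies on String.replaceAll, which uses the same Java regular-expression syntax as Spark's regexp_replace. The object name LeadingZeros and both helper functions are made up for illustration, and the second pattern, ^0+(?!$), is a variant that is not part of the answer above: it keeps one zero when the whole value consists of zeros, a case where ^0* yields an empty string.

```scala
object LeadingZeros {
  // Same pattern as in the answer: removes every leading zero.
  // A value made only of zeros becomes the empty string.
  def stripAll(s: String): String = s.replaceAll("^0*", "")

  // Hypothetical variant (not from the answer): the negative lookahead (?!$)
  // stops the match before the end of the string, so one zero is kept
  // when the value is all zeros.
  def stripKeepLast(s: String): String = s.replaceAll("^0+(?!$)", "")

  def main(args: Array[String]): Unit = {
    Seq("000123", "010987", "987654", "0000").foreach { s =>
      println(s"$s -> '${stripAll(s)}' / '${stripKeepLast(s)}'")
    }
  }
}
```

If the variant suits your data, the same pattern string should work as the second argument of regexp_replace in the withColumn call shown earlier.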


