java - Spark - Extract JSON from text in dataset while keeping the other columns
I am reading data from Kafka (the value is already in JSON format, in a text field).
I am using this bit of code from the documentation:
Dataset<Row> ds = ss.read().format("kafka").option("kafka.bootstrap.servers", ipbrokers).option("subscribe", topic).load().selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic", "partition", "offset", "timestamp", "timestampType");
with "ss" being my SparkSession.
At first I didn't bother keeping any information other than the value, so I used:
JavaRDD<String> jrdd = ds.select(ds.col("value")).toJavaRDD().map(v1 -> v1.mkString()); return ss.read().json(jrdd);
and I was getting a nice dataset built from the value column.
Now I have 2 problems:
- json(JavaRDD) has been deprecated, so I try not to use it if possible
- I can't see how to extract the JSON in the value field while keeping the other information (key, offset...)
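For the first point, a minimal sketch, assuming Spark 2.2 or later, where DataFrameReader also has a json(Dataset<String>) overload that is not deprecated. The class name ValueJsonReader and the method name readValueAsJson are hypothetical, chosen only for illustration; "ss" and "ds" are the same as above.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical helper class, not part of Spark.
public class ValueJsonReader {

    /** Parses the string column "value" as JSON without going through a JavaRDD. */
    public static Dataset<Row> readValueAsJson(SparkSession ss, Dataset<Row> ds) {
        // Turn the single "value" column into a typed Dataset<String>.
        Dataset<String> values = ds.select(ds.col("value")).as(Encoders.STRING());
        // Assumes Spark 2.2+: json(Dataset<String>) infers the schema from the data.
        return ss.read().json(values);
    }
}
```

This only replaces the deprecated call; like the original snippet, it still returns the JSON fields alone, without key, offset and the other Kafka columns.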
I have tried to transform the dataset into a big JSON array and re-read it:
Dataset<Row> ds2 = ss.read().json(ds.toJSON());
but it didn't go into the value field, so I am back where I started...
I think I should be able to do it with a map or a UDF, but I don't know where to start.
Note: I can't use a static mapping, because the JSON can change.
Thanks for the help.
Edit: this has been marked as a possible duplicate of this question, but the answer does not seem to correspond.
In the first case I need a schema (which I don't have, I only have the Kafka message schema); in the second case I can extract the data in value, but not append the rest of the Kafka data (key, offset...).
Example: I have this message in value: {"test1":"a", "test2":"b"}
+------+----------------------------+-------+-----------+--------+----------------------+---------------+
| key  | value                      | topic | partition | offset | timestamp            | timestampType |
+------+----------------------------+-------+-----------+--------+----------------------+---------------+
| null | {"test1":"a", "test2":"b"} | part  | 0         | 0      | 2017-09-11 11:03:... | 0             |
+------+----------------------------+-------+-----------+--------+----------------------+---------------+
and I would like to have:
+------+----------------------------+-------+-----------+--------+----------------------+---------------+-------+-------+
| key  | value                      | topic | partition | offset | timestamp            | timestampType | test1 | test2 |
+------+----------------------------+-------+-----------+--------+----------------------+---------------+-------+-------+
| null | {"test1":"a", "test2":"b"} | part  | 0         | 0      | 2017-09-11 11:03:... | 0             | a     | b     |
+------+----------------------------+-------+-----------+--------+----------------------+---------------+-------+-------+
I don't mind if the value field is dropped.
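One possible way to get that result, sketched under a few assumptions: Spark 2.2 or later, a batch read (ss.read(), not readStream), and JSON field names that do not clash with the Kafka column names. The idea is to infer the schema from the value column itself, then parse value with from_json and flatten the resulting struct next to the existing columns. The class name KafkaJsonExtract, the method name withJsonColumns and the temporary column name "parsed" are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

// Hypothetical helper class, not part of Spark.
public class KafkaJsonExtract {

    /** Adds one column per top-level JSON field found in the string column "value". */
    public static Dataset<Row> withJsonColumns(SparkSession ss, Dataset<Row> ds) {
        // Infer the schema from the data itself; this costs one extra pass over the values.
        Dataset<String> values = ds.select(col("value")).as(Encoders.STRING());
        StructType schema = ss.read().json(values).schema();

        // Parse "value" into a struct column, expand its fields, then drop the temporary struct.
        return ds.withColumn("parsed", from_json(col("value"), schema))
                 .select("*", "parsed.*")
                 .drop("parsed");
    }
}
```

Because the schema is inferred at run time, no static mapping is needed. Calling .drop("value") on the result would also remove the raw JSON text if it is no longer wanted.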