신은섭 (Shin Eun Seop) / Detecting_fraud_clicks
Authored by 신은섭 (Shin Eun Seop), 2018-05-28 12:22:22 +0900
Commit 98173ec8ed5f76a5ded831b76f788120b6233ff0
1 parent: dd04a0b3

add comment
Showing 1 changed file with 7 additions and 1 deletion
src/main/java/AvgAdvTime.java
@@ -12,13 +12,15 @@ import static org.apache.spark.sql.functions.sum;
public class AvgAdvTime {

    public static void main(String[] args) throws Exception {

        // Start Spark session
        SparkSession spark = SparkSession
                .builder()
                .master("local")
                .appName("Java Spark SQL basic example")
                .getOrCreate();

        // Read CSV into a Dataset
        Dataset<Row> df = spark.read().format("csv")
                .option("inferSchema", "true")
                .option("header", "true")
@@ -29,13 +31,17 @@ public class AvgAdvTime {
        newdf = newdf.withColumn("utc_attributed_time", df.col("attributed_time").cast("long"));
        newdf = newdf.drop("click_time").drop("attributed_time");

        // Set window: partition by 'ip' and 'app', order by 'utc_click_time',
        // with a frame from the first row of the partition to the current row
        WindowSpec w = Window.partitionBy("ip", "app")
                .orderBy("utc_click_time")
                .rowsBetween(Window.unboundedPreceding(), Window.currentRow());

        // aggregation
        newdf = newdf.withColumn("cum_count_click", count("utc_click_time").over(w));
        newdf = newdf.withColumn("cum_sum_attributed", sum("is_attributed").over(w));
        newdf = newdf.withColumn("avg_efficient", col("cum_sum_attributed").divide(col("cum_count_click")));

        // print example
        newdf.where("ip == '5348' and app == '19'").show();
        newdf.printSchema();
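The window frame above runs from the first row of each (ip, app) partition to the current row, so cum_count_click is a running click count, cum_sum_attributed a running sum of is_attributed, and avg_efficient their ratio per row. A minimal plain-Java sketch of that running-ratio semantics for a single partition (no Spark; the class name and sample data are illustrative, not from the repo):

    import java.util.List;

    public class CumulativeRatio {
        // For one (ip, app) partition, ordered by click time:
        // cum_count_click grows by 1 per row, cum_sum_attributed adds
        // is_attributed, and avg_efficient is their ratio — mirroring the
        // unboundedPreceding()..currentRow() frame in the diff.
        public static double[] avgEfficient(List<Integer> isAttributed) {
            double[] out = new double[isAttributed.size()];
            long cumCount = 0;
            long cumSum = 0;
            for (int i = 0; i < isAttributed.size(); i++) {
                cumCount += 1;                  // count("utc_click_time")
                cumSum += isAttributed.get(i);  // sum("is_attributed")
                out[i] = (double) cumSum / cumCount;
            }
            return out;
        }

        public static void main(String[] args) {
            // hypothetical clicks for one (ip, app) pair; the third converted
            double[] r = avgEfficient(List.of(0, 0, 1, 0));
            for (double v : r) {
                System.out.println(v);
            }
        }
    }

Spark computes the same quantity per row but in parallel across partitions, which is why the window spec, not a loop, carries the ordering and frame.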