[Spce-user] Fraud calculation takes hours

Rene Krenn rkrenn at sipwise.com
Thu Jan 25 22:08:25 EST 2018


Hi,

 

i have now generated a db of similar size:

 

provision subscribers completed:

  total contracts: 100269 rows

  total subscribers: 100216 rows

  total aliases: 100167 rows

  primary aliases: 100167 rows

 

generate cdrs completed:

  total CDRs: 500100 rows

 

time elapsed: 04:33:39

 

the differnece to Alejandro’s DB is that his one will show a larger billing.billing_mappings

table. (can you report how many records?)

 

on my laptop runnung the currently implemented query against that generated db takes

-19 secs when it does not hit cdrs. „hit“ means the cdr start time is covered by the queries BETWEEN clause.

-3mins40secs when it hits 500k cdrs and reports 45k contracts exceeding a profile’s daily limit

-the query time reduces to 1-2 secs when reapeating it (ie. temptables got cached)

-running the query when the database was under load (while populating the data) went up to 10 minutes, app. 2minutes of which required to gather the actual billing mappings subquery (also for the last query variant i posted)

 

This is what’s expected with the current structure. Ist ok for the fraud jobs running once a day.

i agree it’s too slow for accessing /api/customerfraudevents frequently.

 

In november the query was patched to address profile level fraud limits to work properly (at the cost of performance).

 

so one option for Alejandro ist o use the acc-cleanup.pl tool to move cdrs from the cdr table to monthly tables.

Another option ist o collapse billing mappings records (for which a tool could be made).

 

On the Other hand, a change oft he billing mappings table structure is planned on our side.

Before this bigger refactoring, another goal is to avoid the fraud jobs in favour of getting the fraud alarms generated by the rating engine directly/instantly - a recent fix for „free cash“ also makes the „debit“/“Spent this interval“ summary fields to show concise sums meanwhile. This is also an indirect answer to walters point: customers prefer the fraud alarms more aggressively rather than relaxed/delayed – some prefer to run the fraud jobs hourly or even in 15 min intervals. Changing the BETWEEN clause to eg. the last day will not change the speed oft he query.

 

regards, rene

 

Von: Walter Klomp [mailto:walter at myrepublic.net] 
Gesendet: Donnerstag, 25. Jänner 2018 18:12
An: Alejandro Grijalba <agrijalba at innovasur.es>
Cc: Rene Krenn <rkrenn at sipwise.com>; spce-user at lists.sipwise.com
Betreff: Re: [Spce-user] Fraud calculation takes hours

 

 

Hi,

 

I actually moved my database to external server and optimised some indexes to make this faster (but SSD and more memory definitely helps)

 

While you (Sipwise) are at it to optimise this query, you may also want to consider to calculate this not from the 1st of the month, but from a month before, until now…

 

so if you do the calculation today it should run from 27 Dec to 26 Jan (this time) - so that it’s a more accurate period, otherwise in the beginning of the month somebody can do a huge amount of fraud, and at the end of the month they run out immediately.  Same for the daily limits, they should be from yesterday, current time, to now.

 

as in this example.

 


WHERE CASE WHEN bpinfo.interval = 'month'
  THEN cdr.start_time
    BETWEEN UNIX_TIMESTAMP(DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-%d %H:%i:%s'))
        AND UNIX_TIMESTAMP(DATE_FORMAT(NOW(),
        '%Y-%m-%d %H:%i:%s'))-1
  ELSE cdr.start_time
    BETWEEN UNIX_TIMESTAMP(DATE_FORMAT(NOW() - INTERVAL 1 DAY, '%Y-%m-%d %H:%i:%S'))
        AND UNIX_TIMESTAMP(DATE_FORMAT(NOW(),
        '%Y-%m-%d %H:%i:%S'))
  END

Does that make sense ?

 

Thanks

Walter












On 26 Jan 2018, at 12:08 AM, Alejandro Grijalba <agrijalba at innovasur.es <mailto:agrijalba at innovasur.es> > wrote:

 

10 minutes and counting so far :(


El 25/01/2018 a las 16:45, Rene Krenn escribió:

Hi,
 
thanks for pointing.
 
Can you try running this modified query below on your test database a few times,
and report back how long it took (longer or less 2 minutes)?
 
SELECT  <http://bpinfo.id/> bpinfo.id, bpinfo.reseller_id, bpinfo.interval, bpinfo.type,
       IF (bpinfo.fraud_use_reseller_rates > 0, 
          SUM(cdr.source_reseller_cost),
          SUM(cdr.source_customer_cost)) as interval_cost,
       bpinfo.interval_limit,
       bpinfo.interval_lock,
       bpinfo.interval_notify,
       bpinfo.fraud_use_reseller_rates as use_reseller_rates
FROM (
  SELECT  <http://c.id/> c.id, n.reseller_id, bp.fraud_use_reseller_rates, i.interval,
    IF (i.interval = 'month',
        IF (cfp.fraud_interval_limit > 0,
            'account_limit', 'profile_limit'),
        IF (cfp.fraud_daily_limit > 0,
            'account_limit', 'profile_limit')
       ) AS type,
    IF (i.interval = 'month',
        IF (cfp.fraud_interval_limit > 0,
            cfp.fraud_interval_limit, bp.fraud_interval_limit),
        IF (cfp.fraud_daily_limit > 0,
            cfp.fraud_daily_limit, bp.fraud_daily_limit)
       ) AS interval_limit,
    IF (i.interval = 'month',
        IF (cfp.fraud_interval_limit > 0,
            cfp.fraud_interval_lock, bp.fraud_interval_lock),
        IF (cfp.fraud_daily_limit > 0,
            cfp.fraud_daily_lock, bp.fraud_daily_lock)
       ) AS interval_lock,
    IF (i.interval = 'month',
        IF (cfp.fraud_interval_limit > 0,
            cfp.fraud_interval_notify, bp.fraud_interval_notify),
        IF (cfp.fraud_daily_limit > 0,
            cfp.fraud_daily_notify, bp.fraud_daily_notify)
       ) AS interval_notify
  FROM (SELECT IF("day" = 'month','month','day') AS 'interval') i,
       billing.contracts c
  INNER JOIN (
    select  <http://ca.id/> ca.id as contract_id,
      (select  <http://bma.id/> bma.id from billing.billing_mappings bma where
      bma.contract_id= <http://ca.id/> ca.id and
      (bma.end_date >= NOW() OR bma.end_date IS NULL) 
      AND (bma.start_date <= NOW() OR bma.start_date IS NULL) order
      by bma.start_date desc,  <http://bma.id/> bma.id desc limit 1) as id from
      billing.contracts ca
   ) AS bm_actual ON bm_actual.contract_id =  <http://c.id/> c.id
  JOIN billing.billing_mappings bm ON  <http://bm.id/> bm.id =  <http://bm_actual.id/> bm_actual.id
  JOIN billing.billing_profiles bp ON  <http://bp.id/> bp.id = bm.billing_profile_id
  JOIN billing.contacts n ON  <http://n.id/> n.id = c.contact_id
  JOIN billing.resellers r ON  <http://r.id/> r.id = n.reseller_id
  LEFT JOIN billing.contract_fraud_preferences cfp 
    ON cfp.contract_id =  <http://c.id/> c.id
  WHERE c.status = 'active'
  HAVING interval_limit > 0
) AS bpinfo
JOIN accounting.cdr ON cdr.source_account_id =  <http://bpinfo.id/> bpinfo.id
WHERE CASE WHEN bpinfo.interval = 'month'
  THEN cdr.start_time
    BETWEEN UNIX_TIMESTAMP(DATE_FORMAT(NOW(), '%Y-%m-01 00:00:00'))
        AND UNIX_TIMESTAMP(DATE_FORMAT(NOW() + INTERVAL 1 MONTH,
        '%Y-%m-01 00:00:00'))-1
  ELSE cdr.start_time
    BETWEEN UNIX_TIMESTAMP(DATE_FORMAT(NOW(), '%Y-%m-%d 00:00:00'))
        AND UNIX_TIMESTAMP(DATE_FORMAT(NOW() + INTERVAL 1 DAY,
        '%Y-%m-%d 00:00:00'))-1
  END
GROUP BY  <http://bpinfo.id/> bpinfo.id
HAVING interval_cost >= interval_limit;
 
 
 
 
thanks&regards,
rene
 





-- 

Alejandro Grijalba Martínez
Departamento de Sistemas

<firma_correo.png>

E  <mailto:agrijalba at innovasur.com> agrijalba at innovasur.com
D Calle Sierra Morena 12 A Edif.CTSAII. Pta. 2º, Oficina 9. 23620 Mengíbar (Jaén) / España. 
T  <tel:+34953267919> +34 953 267 919

 <http://www.innovasur.com/> www.innovasur.com

_______________________________________________
Spce-user mailing list
 <mailto:Spce-user at lists.sipwise.com> Spce-user at lists.sipwise.com
 <https://lists.sipwise.com/listinfo/spce-user> https://lists.sipwise.com/listinfo/spce-user

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sipwise.com/pipermail/spce-user_lists.sipwise.com/attachments/20180126/8710adcf/attachment-0001.html>


More information about the Spce-user mailing list