Discussion Forum

Out of memory error

Data migration using the console has been stuck for the last 10 minutes, and I got these error messages in the terminal running the server:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread “logback-1”

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread “grakn-core-async-1::3”

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread “grakn-core-async-1::2”

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread “grakn-core-scheduled::0”

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread “grakn-core-async-1::0”

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread “grakn-core-async-1::1”

How do I resolve this?
Also, %CPU goes above 370.

I tried the solution from python 3.x - Java Heap Space issue in Grakn 1.6.0 - Stack Overflow, but it doesn't work in my case since I'm using 2.0.1.
Is there any other solution?

Hi there - what do you mean you’re using console to do the data import?

Cheers!

Yes, I'm migrating the data using client-python.

Note that client-python and console are two different things!

Try to make sure each transaction only loads 50-100 new concepts before a commit; that should solve your problem.

What if I'm using a dataset of around 2,000 rows with the schema file shown below:
define

Payee_Name sub attribute, value string;
Payee_Type sub attribute, value string;
Payee_Address sub attribute, value string;
City_State_Zip sub attribute, value string;

Paid_By_Name sub attribute, value string;

Expenditure_Type sub attribute, value string;

Report_Filed sub attribute, value string;
Expense_Description sub attribute, value string;
Correction sub attribute, value string;
TRANSACTION_ID sub attribute, value string;
View_Report sub attribute, value string;
Travel_Outside_Texas sub attribute, value boolean;
Political_Obligation sub attribute, value boolean;
Reimbursement_Intended sub attribute, value boolean;

Payment_Date sub attribute, value datetime;
Payment_Year sub attribute, value long;
Payment_Amount_In_Dollars sub attribute, value long;

PayeePerson sub entity,
    owns Payee_Name,
    owns Payee_Type,
    owns Payee_Address,
    owns City_State_Zip,
    plays Paymentresults:payee,
    plays PayerExpenditure:Expenditure_Done_To;

PaidByPerson sub entity,
    owns Paid_By_Name,
    plays Paymentresults:payer,
    plays PayerExpenditure:Expenditure_Done_By;

Expenditure sub entity,
    owns Expenditure_Type,
    owns Report_Filed,
    owns Expense_Description,
    owns Correction,
    owns TRANSACTION_ID,
    owns View_Report,
    owns Travel_Outside_Texas,
    owns Political_Obligation,
    owns Reimbursement_Intended,
    plays PayerExpenditure:Expenditure_Details;

Payment sub entity,
    owns Payment_Date,
    owns Payment_Year,
    owns Payment_Amount_In_Dollars,
    plays Paymentresults:paymentDone,
    plays PayerExpenditure:paymentForExpense;

Paymentresults sub relation,
    relates paymentDone,
    relates payer,
    relates payee;

PayerExpenditure sub relation,
    relates Expenditure_Done_By,
    relates Expenditure_Details,
    relates Expenditure_Done_To,
    relates paymentForExpense;

Is batching still needed here to reduce the load? I didn't fully understand your previous reply!

Yes - what I'm saying is that in the program that loads the data using client-python, you should break your 2000 rows into, say, 10-20 row batches and do a transaction.commit() after each batch. Loading a lot of data in one transaction can lead to OOM errors in some cases.
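For reference, here is a minimal sketch of that batching pattern. It is written against the current typedb-client Python API (module and class names differ slightly in the older grakn client 2.0.x used in this thread), and the database name, batch size, and build_insert_query helper are illustrative placeholders rather than anything prescribed here:

from typedb.client import TypeDB, SessionType, TransactionType

BATCH_SIZE = 20  # commit every 10-20 rows, as suggested above


def build_insert_query(row):
    # Illustrative helper: turn one CSV row into a TypeQL insert using
    # attribute names from the schema above (only two attributes shown).
    return (
        'insert $p isa PayeePerson, '
        f'has Payee_Name "{row["Payee_Name"]}", '
        f'has Payee_Type "{row["Payee_Type"]}";'
    )


def load_rows(rows, database="expenditures", address="localhost:1729"):
    with TypeDB.core_client(address) as client:
        with client.session(database, SessionType.DATA) as session:
            tx = session.transaction(TransactionType.WRITE)
            for i, row in enumerate(rows, start=1):
                # Consume the answer iterator so the insert is executed eagerly.
                list(tx.query().insert(build_insert_query(row)))
                if i % BATCH_SIZE == 0:
                    tx.commit()  # flush this batch
                    tx = session.transaction(TransactionType.WRITE)
            tx.commit()  # commit the final, possibly partial, batch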

Yes, it worked when I tried breaking up the rows. But isn't there any way to load the whole dataset at once? If I end up with a dataset of 10,000 rows or more, it will be difficult to break up the data and load it in batches.

I'm not quite following your text, but it's standard practice (and faster) to use parallel, small transactions (10-20 rows) instead of large transactions - it should be simple to keep using that approach, right?
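A rough sketch of that parallel variant, under the same assumptions as the batching example above (batch size, worker count, database name, and the build_insert_query helper are illustrative placeholders; this sketch shares one session across workers, though depending on the client version you may prefer one session per worker):

from concurrent.futures import ThreadPoolExecutor

from typedb.client import TypeDB, SessionType, TransactionType

BATCH_SIZE = 20
NUM_WORKERS = 4


def load_batch(session, batch):
    # Each batch gets its own small write transaction.
    with session.transaction(TransactionType.WRITE) as tx:
        for row in batch:
            # build_insert_query as sketched above (hypothetical helper)
            list(tx.query().insert(build_insert_query(row)))
        tx.commit()


def load_parallel(rows, database="expenditures", address="localhost:1729"):
    batches = [rows[i:i + BATCH_SIZE] for i in range(0, len(rows), BATCH_SIZE)]
    with TypeDB.core_client(address) as client:
        with client.session(database, SessionType.DATA) as session:
            with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
                list(pool.map(lambda batch: load_batch(session, batch), batches))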

ok. understood…thanks!

Hi there - also wanted to follow up that the next release coming up includes this PR Reduce OOM by reusing rocks iterators in write transactions by flyingsilverfin · Pull Request #6324 · vaticle/typedb · GitHub, which should allow you to do much larger loads in one transaction if you wanted. However, if you're doing parallel loading we still recommend using small transactions :slight_smile:

The java.lang.OutOfMemoryError means that your program needs more memory than the Java Virtual Machine (JVM) allows it to use.

How to track down the error?

  • Increase the amount of memory your program is allowed to use with the -Xmx option (for instance, -Xmx1024m for 1024 MB). By default, the value depends on the JRE version and system configuration. NOTE: increasing the heap size is only a temporary workaround, because you will hit the same issue again if you receive several parallel requests or try to process a bigger file.

  • Find the root cause of memory leaks with the help of profiling tools such as MAT, VisualVM, or jconsole. Once you find the root cause, you can fix the leaks.

  • Optimize your code so that it needs less memory: use smaller data structures and release references to objects once they are no longer needed in your program.

How to avoid this issue?

  • Use local variables wherever possible.
  • Release objects that you know will not be needed any further.
  • Avoid creating new objects inside loops on every iteration.
  • Use caches where appropriate.
  • Consider using multi-threading.

Hi there - what version are you on, @williamholding? Also, what size machine (memory/CPU)?

This might be best filed as an issue on github :slight_smile:

Thanks!! A reproducible issue would be absolutely fantastic to help us avoid this bug in the future!